Re: holes in arrays

Felipe Oliveira Carvalho Thu, 08 Jun 2023 14:18:11 -0700

Hi Arkadiy,

Every array can potentially contain nulls, so the logical type of an
array's values is array<optional<T>>. In practice, though, compute kernels
commonly specialize their loops on the presence or absence of nulls by
calling Array::MayHaveLogicalNulls() before starting the loop.
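Here is a minimal C++ sketch of that kind of specialization. It deliberately does not use the Arrow library. Int32ArrayView, MayHaveNulls and SumValid are hypothetical names for a simplified layout: a values buffer plus an optional validity bitmap with least-significant-bit ordering. MayHaveNulls only loosely mirrors the idea behind Array::MayHaveLogicalNulls(), which handles more array types than this sketch does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for an Int32 array (NOT arrow::Array).
// `validity` may be nullptr, which here means "every value is valid".
struct Int32ArrayView {
  const int32_t* values;
  const uint8_t* validity;  // LSB-ordered bitmap, or nullptr
  int64_t length;
};

// Slot i is valid if there is no bitmap, or if bit i % 8 of byte i / 8 is set.
inline bool IsValid(const Int32ArrayView& a, int64_t i) {
  assert(i >= 0 && i < a.length);
  return a.validity == nullptr || ((a.validity[i / 8] >> (i % 8)) & 1) != 0;
}

// Loose analogue of the "may have nulls" check for this simple layout:
// without a validity buffer, no slot can be NULL.
inline bool MayHaveNulls(const Int32ArrayView& a) {
  return a.validity != nullptr;
}

// Sum of the valid values, with the loop specialized on the presence of nulls.
inline int64_t SumValid(const Int32ArrayView& a) {
  int64_t sum = 0;
  if (!MayHaveNulls(a)) {
    // Fast path: no per-element validity check.
    for (int64_t i = 0; i < a.length; ++i) sum += a.values[i];
  } else {
    // Slow path: skip NULL slots, whose stored values are undefined.
    for (int64_t i = 0; i < a.length; ++i) {
      if (IsValid(a, i)) sum += a.values[i];
    }
  }
  return sum;
}
```

Without a validity buffer the first loop runs with no per-element branch. With one, the second loop checks each bit and skips NULL slots.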

Note that the physical representation of an array is not a contiguous slice
of memory containing std::optional<T> values.

For most array types, the first buffer is a validity bitmap: a sequence of
bytes in which each bit indicates whether the value at that position, in
the array’s other buffers, is valid (bit set) or NULL (bit cleared).
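The sketch below shows how a single bit is located. It assumes the least-significant-bit numbering used by the Arrow columnar format: slot i lives in byte i / 8 at bit i % 8, and a set bit means valid. GetBit, SetBit, MakeBitmap and BitmapBytes are illustrative helpers written for this example, not Arrow C++ functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bytes needed to hold `length` validity bits.
inline int64_t BitmapBytes(int64_t length) { return (length + 7) / 8; }

// LSB numbering: slot i is bit (i % 8) of byte (i / 8). 1 = valid, 0 = NULL.
inline bool GetBit(const uint8_t* bitmap, int64_t i) {
  assert(i >= 0);
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}

inline void SetBit(uint8_t* bitmap, int64_t i, bool valid) {
  assert(i >= 0);
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (valid) {
    bitmap[i / 8] |= mask;
  } else {
    bitmap[i / 8] &= static_cast<uint8_t>(~mask);
  }
}

// Pack per-slot validity flags into a zero-initialized bitmap.
inline std::vector<uint8_t> MakeBitmap(const std::vector<bool>& valid) {
  const auto n = static_cast<int64_t>(valid.size());
  std::vector<uint8_t> bitmap(static_cast<std::size_t>(BitmapBytes(n)), uint8_t{0});
  for (int64_t i = 0; i < n; ++i) {
    SetBit(bitmap.data(), i, valid[static_cast<std::size_t>(i)]);
  }
  return bitmap;
}
```

For example, the flags {valid, NULL, valid, valid} pack into the single byte 0x0D (bits 0, 2 and 3 set).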

So where there is a NULL, there isn’t really a hole, just a value that
shouldn’t be read because the validity bitmap tells you it’s invalid. These
values are usually zeroed, but that’s not guaranteed (they should be
treated as undefined memory).
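In code, reading such an array correctly means consulting the bitmap before touching a value. The sketch below builds the logical std::optional view from the two physical buffers and never copies whatever happens to sit under a NULL slot. ToOptionals is a hypothetical helper, not an Arrow API, and it assumes an Int32 array whose validity bitmap is present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: turn separate value and validity buffers into the
// logical view std::vector<std::optional<int32_t>>. Assumes `validity` is a
// non-null, LSB-ordered bitmap covering `length` slots.
inline std::vector<std::optional<int32_t>> ToOptionals(
    const int32_t* values, const uint8_t* validity, int64_t length) {
  assert(validity != nullptr && length >= 0);
  std::vector<std::optional<int32_t>> out;
  out.reserve(static_cast<std::size_t>(length));
  for (int64_t i = 0; i < length; ++i) {
    const bool valid = ((validity[i / 8] >> (i % 8)) & 1) != 0;
    if (valid) {
      out.emplace_back(values[i]);
    } else {
      // The stored value is undefined (often 0, but not guaranteed),
      // so it is never read.
      out.emplace_back(std::nullopt);
    }
  }
  return out;
}
```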

When compute kernels know they aren’t going to output any NULLs in the
array, they don’t even allocate the validity buffer. That also communicates
the fact that every value is valid (aka non-NULL).
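The sketch below illustrates that convention with a hypothetical fill-null kernel for Int32 values. Int32Output, IsValid and FillNull are illustrative names, not Arrow APIs, and an empty std::vector stands in for an absent validity buffer. Every output slot receives a value, so the kernel never allocates an output bitmap, and readers treat its absence as "all valid".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output array: an empty `validity` vector models an absent
// validity buffer, which means every slot is valid.
struct Int32Output {
  std::vector<int32_t> values;
  std::vector<uint8_t> validity;  // empty => no NULLs at all
};

inline bool IsValid(const Int32Output& a, std::size_t i) {
  assert(i < a.values.size());
  return a.validity.empty() || ((a.validity[i / 8] >> (i % 8)) & 1) != 0;
}

// Replace each NULL input slot with `fill`. The result cannot contain NULLs,
// so no output validity bitmap is allocated.
inline Int32Output FillNull(const int32_t* values, const uint8_t* validity,
                            int64_t length, int32_t fill) {
  Int32Output out;
  out.values.reserve(static_cast<std::size_t>(length));
  for (int64_t i = 0; i < length; ++i) {
    const bool valid =
        validity == nullptr || ((validity[i / 8] >> (i % 8)) & 1) != 0;
    out.values.push_back(valid ? values[i] : fill);
  }
  return out;  // out.validity stays empty
}
```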

Does that answer your question?

—
Felipe

On Thu, 8 Jun 2023 at 17:43 Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) wrote:

> Hi all.
> A question - Arrow API seems to assume arrays can have holes (nulls) - the
> values are std::optional.
> Is this something that is determined by the arrow user, when the array
> gets populated, or is this something that arrow can put there on its own,
> like for alignment, etc.?
> Should the reader always assume there could be holes, or is it subject to
> agreement between writer and reader?
> Thanks.
>
