Hi Arkadiy,

Every array can potentially have nulls, meaning that the logical type of the values of every array is array<optional<T>>. In practice, though, it's common for compute kernels to specialize their loops based on the presence or absence of nulls in an array by calling Array::MayHaveLogicalNulls() before starting the loop.
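To make that specialization concrete, here is a minimal sketch in plain C++. It does not use the Arrow API: Int32Column, MayHaveNulls, IsValid and SumValid are hypothetical names made up for illustration. The optional bitmap (one validity bit per slot) only mimics the idea that a missing validity buffer means "every value is valid".

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an Arrow int32 array; NOT the real arrow::Array.
struct Int32Column {
  std::vector<int32_t> values;                   // one slot per element, null slots included
  std::optional<std::vector<uint8_t>> validity;  // absent means every slot is valid

  bool MayHaveNulls() const { return validity.has_value(); }

  // Least-significant-bit numbering: bit i lives in byte i / 8, position i % 8.
  bool IsValid(std::size_t i) const {
    return !validity || (((*validity)[i / 8] >> (i % 8)) & 1) != 0;
  }
};

// Sums the valid values, with the loop specialized on the presence of nulls.
int64_t SumValid(const Int32Column& col) {
  int64_t sum = 0;
  if (!col.MayHaveNulls()) {
    // Fast path: no bitmap, so no per-element validity check is needed.
    for (int32_t v : col.values) sum += v;
    return sum;
  }
  for (std::size_t i = 0; i < col.values.size(); ++i) {
    // A null slot still holds some value, but it is undefined and is skipped.
    if (col.IsValid(i)) sum += col.values[i];
  }
  return sum;
}
```

With values {1, 999, 3} and a bitmap byte of 0b00000101, slot 1 is null, so its 999 is never read by SumValid. A real kernel would make the same kind of check once, up front, rather than consulting the bitmap unconditionally for every element.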
Note that the physical representation of an array is not a contiguous slice of memory containing std::optional<T> values. The first buffer of every array is a bitmap: an array of integers where each bit indicates if that position, in the other array buffers, stores a valid or NULL value. So where there is a NULL, there isn’t really a hole, just a value that shouldn’t be read because the validity bitmap tells you it’s invalid. These values are usually zeroed, but that’s not guaranteed (they should be treated as undefined memory).

When compute kernels know they aren’t going to output any NULLs in the array, they don’t even allocate the validity buffer. That also communicates the fact that every value is valid (aka non-NULL).

Does that answer your question?

Felipe

On Thu, 8 Jun 2023 at 17:43 Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) < [email protected]> wrote:

> Hi all.
> A question - Arrow API seems to assume arrays can have holes (nulls) - the
> values are std::optional.
> Is this something that is determined by the arrow user, when the array
> gets populated, or is this something that arrow can put there on its own,
> like for alignment, etc.?
> Should the reader always assume there could be holes or is it subject for
> agreement between writer and reader?
> Thanks.
>
