> these do end up being loops at the lower levels

Even if you don't write explicit SIMD, (1) the compiler might
vectorize the loop for you, and (2) the superscalar nature of modern
CPUs means loops with fewer branches and memory indirections will run
faster.
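
For instance, a tight branch-free loop like this one (an illustrative
sketch, nothing Arrow-specific) gives the compiler a good shot at
auto-vectorization:

#include <cstddef>
#include <vector>

// No branches, no pointer chasing: the compiler can turn this into
// SIMD. Assumes x.size() == w.size().
double Dot(const std::vector<double>& x, const std::vector<double>& w) {
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum += x[i] * w[i];
  }
  return sum;
}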

> Now I just need to figure out the best way to do this over multiple columns 
> (row-wise).

You can usually turn loops that go row-by-row into loops that go
column-by-column by maintaining selection vectors or bitmaps that you
can use as masks for operations on the remaining columns.
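
For example (a sketch; the column names and the predicate are made
up): one pass builds a selection vector from a predicate on one
column, and later passes loop over each remaining column with it:

#include <cstdint>
#include <vector>

// Pass 1: evaluate the predicate on column A, collect matching rows.
std::vector<int64_t> SelectRows(const std::vector<double>& col_a) {
  std::vector<int64_t> selection;
  for (int64_t i = 0; i < static_cast<int64_t>(col_a.size()); ++i) {
    if (col_a[i] > 0.0) selection.push_back(i);
  }
  return selection;
}

// Pass 2: column-at-a-time work on column B, masked by the selection
// vector. Repeat per column instead of looping row by row.
double SumSelected(const std::vector<double>& col_b,
                   const std::vector<int64_t>& selection) {
  double sum = 0.0;
  for (int64_t i : selection) {
    sum += col_b[i];
  }
  return sum;
}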

On Thu, Feb 22, 2024 at 1:39 PM Blair Azzopardi <[email protected]> wrote:
>
> Thanks @Weston and @Felipe. This information has been very helpful and thank 
> you for the examples too. I completely agree with vectorizing computations; 
> although, ultimately, these do end up being loops at the lower levels (unless 
> there's some hardware support, eg SIMD/GPU etc).
>
> @Weston, I managed to iterate over my chunked array as you suggested (found 
> some useful examples under the test cases) i.e
>
>     std::vector<double> values;
>     for (auto elem : arrow::stl::Iterate<arrow::DoubleType>(*chunked_array)) {
>         if (elem.has_value()) {
>             values.push_back(*elem);
>         }
>     }
>
> @Felipe, I had to adjust your snippet somewhat to get it to work (perhaps the 
> API is in flux). Eventually I did something like this:
>
>     for (auto &chunk : chunked_array->chunks()) {
>         auto &data = chunk->data();
>         arrow::ArraySpan array_span(*data);
>     auto len = array_span.buffers[1].size /
>                static_cast<int64_t>(sizeof(double));
>         auto raw_values = array_span.GetSpan<double>(1, len);
>         // able to inspect (double)*(raw_values.data_ + N)
>     }
>
> Now I just need to figure out the best way to do this over multiple columns 
> (row-wise).
>
> Thanks again!
>
>
> On Tue, 20 Feb 2024 at 19:51, Felipe Oliveira Carvalho <[email protected]> 
> wrote:
>>
>> In a Vectorized querying system, scalars and conditionals should be
>> avoided at all costs. That's why it's called "vectorized" — it's about
>> the vectors and not the scalars.
>>
>> Arrow Arrays (AKA "vectors" in other systems) are the unit of data you
>> mainly deal with. Data abstraction (in the OOP sense) isn't possible
>> while also keeping performance — classes like Scalar and DoubleScalar
>> are not supposed to be instantiated for every scalar in an array when
>> you're looping. The disadvantage is that your loop now depends on the
>> type of the array you're dealing with (no data abstraction based on
>> virtual dispatching).
>>
>> > Also, is there an efficient way to loop through a slice perhaps by 
>> > incrementing a pointer?
>>
>> That's the right path. Given a ChunkedArray, this is what you can do:
>>
>> auto &dt = chunked_array->type();
>> assert(dt->id() == Type::DOUBLE);
>> for (auto &chunk : chunked_array->chunks()) {
>>    // each chunk is an arrow::Array
>>    ArrayData &data = chunk->data();
>>    // buffer 1 is the data buffer
>>    util::span<const double> raw_values = data.GetSpan<double>(1);
>>    // ^ all the scalars of the chunk are tightly packed here:
>>    // 64 bits for every double, even if it's logically NULL
>> }
>>
>> If data.IsNull(i), the value of raw_values[i] is undefined;
>> depending on what you're doing with the raw_values, you may not have
>> to care.
>> Compute functions commonly have two different loops: one that handles
>> nulls and a faster one (without checks in the loop body) that you can
>> use when data.GetNullCount()==0.
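>>
>> For instance, a sketch of that two-loop pattern (assuming the chunk
>> has already been cast to an arrow::DoubleArray):
>>
>> #include <arrow/array.h>
>>
>> double Sum(const arrow::DoubleArray& arr) {
>>   const double* values = arr.raw_values();
>>   double sum = 0.0;
>>   if (arr.null_count() == 0) {
>>     // fast path: no per-element validity checks
>>     for (int64_t i = 0; i < arr.length(); ++i) sum += values[i];
>>   } else {
>>     // slower path: skip the logically-null slots
>>     for (int64_t i = 0; i < arr.length(); ++i) {
>>       if (arr.IsValid(i)) sum += values[i];
>>     }
>>   }
>>   return sum;
>> }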
>>
>> Another trick is to compute on all the values and carry the same
>> validity bitmap over to the result. This works when the operation
>> handles each value independently of the others.
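>>
>> A sketch of that trick (assumes a DOUBLE array with offset == 0; the
>> function and its name are made up for illustration):
>>
>> #include <arrow/api.h>
>>
>> arrow::Result<std::shared_ptr<arrow::ArrayData>> Negate(
>>     const arrow::ArrayData& in) {
>>   ARROW_ASSIGN_OR_RAISE(
>>       auto out_buf, arrow::AllocateBuffer(in.length * sizeof(double)));
>>   const double* src = in.GetValues<double>(1);
>>   auto* dst = reinterpret_cast<double*>(out_buf->mutable_data());
>>   for (int64_t i = 0; i < in.length; ++i) {
>>     dst[i] = -src[i];  // null slots are computed too; their result
>>                        // is simply never read
>>   }
>>   // buffers[0] is the validity bitmap, shared as-is with the input
>>   return arrow::ArrayData::Make(in.type, in.length,
>>                                 {in.buffers[0], std::move(out_buf)},
>>                                 in.GetNullCount());
>> }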
>>
>> Hope this helps. A fully generic loop over all possible array types
>> isn't possible without many allocations and branches per array
>> element.
>>
>> --
>> Felipe
>>
>>
>>
>> On Mon, Feb 19, 2024 at 9:23 AM Weston Pace <[email protected]> wrote:
>> >
>> > There is no advantage to using a Datum here.  The Datum class is mainly 
>> > intended for representing something that might be a Scalar or might be an 
>> > Array.
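>> >
>> > For illustration (the variables here are hypothetical), both of
>> > these are valid Datums:
>> >
>> > arrow::Datum d1(array);   // holds a std::shared_ptr<arrow::Array>
>> > arrow::Datum d2(scalar);  // holds a std::shared_ptr<arrow::Scalar>
>> > // d1.is_array() and d2.is_scalar() both return true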
>> >
>> > > Also, is there an efficient way to loop through a slice perhaps by 
>> > > incrementing a pointer?
>> >
>> > You will want to cast the Array and avoid Scalar instances entirely.  For 
>> > example, if you know there are no nulls in your data then you can use 
>> > methods like `DoubleArray::raw_values` which will give you a `double*`.  
>> > Since it is a chunked array you would also have to deal with indexing and 
>> > iterating the chunks.
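>> >
>> > For example (a sketch, assuming a DOUBLE column that is known to
>> > have no nulls):
>> >
>> > #include <arrow/api.h>
>> > #include <memory>
>> >
>> > double SumChunked(const arrow::ChunkedArray& chunked) {
>> >   double sum = 0.0;
>> >   for (const auto& chunk : chunked.chunks()) {
>> >     auto dbl = std::static_pointer_cast<arrow::DoubleArray>(chunk);
>> >     const double* values = dbl->raw_values();  // no Scalar instances
>> >     for (int64_t i = 0; i < dbl->length(); ++i) {
>> >       sum += values[i];
>> >     }
>> >   }
>> >   return sum;
>> > }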
>> >
>> > There are also some iterator utility classes like 
>> > `arrow::stl::ChunkedArrayIterator` which can be easier to use.
>> >
>> > On Mon, Feb 19, 2024 at 3:54 AM Blair Azzopardi <[email protected]> wrote:
>> >>
>> >> On second thought, the second method could also be done in a single line.
>> >>
>> >> auto low3 = arrow::Datum(st_s_low.ValueOrDie())
>> >>                 .scalar_as<arrow::DoubleScalar>()
>> >>                 .value;
>> >>
>> >> That said, I'm still keen to hear if there's an advantage to using Datum 
>> >> or without; and on my 2nd question regarding efficiently looping through 
>> >> a slice's values.
>> >>
>> >> On Mon, 19 Feb 2024 at 09:24, Blair Azzopardi <[email protected]> wrote:
>> >>>
>> >>> Hi
>> >>>
>> >>> I'm trying to figure out the optimal way for extracting scalar values 
>> >>> from a table; I've found two ways, using a dynamic cast or using Datum 
>> >>> and cast. Is one better than the other? The advantage of the dynamic
>> >>> cast, at least, seems to be that it's a one-liner.
>> >>>
>> >>> auto c_val1 = table.GetColumnByName("Val1");
>> >>> auto st_c_val1 = c_val1->GetScalar(0);
>> >>> if (st_c_val1.ok()) {
>> >>>
>> >>>     // method 1 - via dynamic cast
>> >>>     auto val1 = std::dynamic_pointer_cast<arrow::DoubleScalar>(
>> >>>                     st_c_val1.ValueOrDie())->value;
>> >>>
>> >>>     // method 2 - via Datum & cast
>> >>>     arrow::Datum val(st_c_val1.ValueOrDie());
>> >>>     auto val2 = val.scalar_as<arrow::DoubleScalar>().value;
>> >>> }
>> >>>
>> >>> Also, is there an efficient way to loop through a slice perhaps by 
>> >>> incrementing a pointer? I know a chunked array might mean that the 
>> >>> underlying data isn't stored contiguously so perhaps this is tricky to 
>> >>> do. I imagine the compute functions might do this. Otherwise, it feels 
>> >>> each access to a value in memory requires calls to several functions 
>> >>> (GetScalar/ok/ValueOrDie etc).
>> >>>
>> >>> Thanks in advance
>> >>> Blair
