If you don't need the performance, you could stay in Python (use to_pylist() for the array or as_py() for scalars).
If you do need the performance then you're probably better served getting the buffers and operating on them directly. Or, even better, making use of the compute kernels:

    import pyarrow as pa
    import pyarrow.compute as pc

    arr = pa.array(['abc', 'ab', 'Xander', None], pa.string())
    desired = pa.array(['Xander'], pa.string())
    pc.any(pc.is_in(arr, value_set=desired)).as_py()  # True

On Wed, Apr 14, 2021 at 6:29 AM Xander Dunn wrote:

> This works for getting a c string out of the CScalar:
> ```
> name_buffer = (<CBaseBinaryScalar*>GetResultValue(names.get().\
>     GetScalar(batch_row_index)).get()).value
> name = <char *>name_buffer.get().data()
> ```
>
> On Tue, Apr 13, 2021 at 10:43 PM, Xander Dunn wrote:
>
>> Here is an example code snippet from a .pyx file that successfully
>> iterates through a CRecordBatch and ensures that the timestamps are
>> ascending:
>> ```
>> while batch_row_index < batch.get().num_rows():
>>     timestamp = GetResultValue(times.get().GetScalar(batch_row_index))
>>     new_timestamp = <CTimestampScalar*>timestamp.get()
>>     current_timestamp = timestamps[name]
>>     if current_timestamp > new_timestamp.value:
>>         abort()
>>     batch_row_index += 1
>> ```
>>
>> However, I'm having difficulty operating on the values in a column of
>> string type. Unlike CTimestampScalar, there is no CStringScalar. Although
>> there is a StringScalar type in C++, it isn't defined in the Cython
>> interface. There is a `CStringType` and a `c_string` type.
>> ```
>> while batch_row_index < batch.get().num_rows():
>>     name = GetResultValue(names.get().GetScalar(batch_row_index))
>>     name_string = <CStringType*>name.get()  # This is wrong
>>     printf("%s\n", name_string)  # This prints garbage
>>     if name_string == b"Xander":  # Doesn't work
>>         print("found it")
>>     batch_row_index += 1
>> ```
>> How do I get the string value as a C type and compare it to other
>> strings?
>>
>> Thanks,
>> Xander
