Re: [C++][C Data][Table]ChunkedArray and Table

Weston Pace Wed, 08 Mar 2023 08:12:22 -0800

I agree with Aldrin.  `arrow_schema` has type ArrowSchema*. You are passing
in `&arrow_schema` which has type `ArrowSchema**` but you are casting it to
`duckdb_arrow_schema*` so it at least compiles.  You can also simplify
things by getting rid of some of this allocation.


```
    ArrowSchema arrow_schema;
    duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema
*)&arrow_schema);
    auto schema = arrow::ImportSchema(&arrow_schema);
    auto output_stream = arrow::io::BufferOutputStream::Create();
    auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream,
*schema);
    while (true)
    {
      ArrowArray arrow_array;
      duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array
*)&arrow_array);
      if (arrow_array.length > 0)
      {
        (*batch_writer)->WriteRecordBatch((const arrow::RecordBatch
&)**arrow::ImportRecordBatch(&arrow_array, *schema));
      }
      else
      {
        arrow_array.release(&arrow_array);
        break;
      }
    }
    auto buffer = (*output_stream)->Finish();
```


On Tue, Mar 7, 2023 at 11:26 AM Aldrin <[email protected]> wrote:

> I might be misreading, but do you need to get the address of your object
> pointers, `*arrow_schema` and `*arrow_array`? If they're already pointers,
> I'm not sure you need to get their address and cast that to a pointer.
> otherwise, I don't see anything else that stands out as problematic to me.
>
> You could try to check the Result objects so that you get better error
> information before the segfault occurs; as it stands now you're blindly
> dereferencing. The other thing that might be helpful is that I think you
> don't have to call `new` every time. In all of these instances you are
> allocating memory for the objects, but you could just create them directly
> and then pass references to them. For example:
>
>    ArrowSchema arrow_schema;
>    duckdb_query_arrow_schema(arrow_result, &arrow_schema);
>    ...
>
> Since ArrowSchema seems to be default constructible, I believe the
> declaration alone will do the default construction (`ArrowSchema
> arrow_schema`). But, my cpp knowledge comes and goes, so apologies if those
> suggestions aren't accurate.
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Sun, Mar 5, 2023 at 8:56 AM Kimmo Linna <[email protected]> wrote:
>
>> Hi,
>>
>> I noticed that my previous code worked only for one ArrowArray from
>> DuckDB. I change a code this but this is still working only for one
>> ArrowArray. The second “loop” causes a segmentation fault. Could someone
>> guide me a bit? What should I change to get this work?
>>
>>
>> ArrowSchema *arrow_schema = new ArrowSchema();
>> duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema *)&
>> arrow_schema);
>> auto schema = arrow::ImportSchema(arrow_schema);
>> auto output_stream = arrow::io::BufferOutputStream::Create();
>> auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, *schema
>> );
>> while (true) {
>> ArrowArray *arrow_array = new ArrowArray();
>> duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array *)&arrow_array
>> );
>> if( arrow_array->length > 0) {
>> (*batch_writer)->WriteRecordBatch((const arrow::RecordBatch&) **arrow::
>> ImportRecordBatch(arrow_array,*schema));
>> delete arrow_array;
>> }else{
>> break;
>> }
>> }
>> auto buffer = (*output_stream)->Finish();
>> Best regards,
>>
>> Kimmo
>>
>> --
>> Kimmo Linna
>> Nihtisalontie 3 as 1
>> 02630  ESPOO
>> [email protected]
>> +358 40 590 1074
>>
>>
>>
>>

Re: [C++][C Data][Table]ChunkedArray and Table

Reply via email to