I have tested this a bit more. The problem seems to be the size of the BufferOutputStream. Everything is fine as long as the data size is less than 32768 bytes, which equals the initial capacity of the buffer (4096 * 8). I have tried to set a bigger value, but I'm unable to do it.
Best regards,

Kimmo
--
Kimmo Linna
Nihtisalontie 3 as 1
02630 ESPOO
[email protected]
+358 40 590 1074

> On 9. Mar 2023, at 17.05, Weston Pace <[email protected]> wrote:
>
> > I think you're not releasing memory correctly; namely, you're missing calls
> > to `release`
>
> If you call `arrow::ImportRecordBatch` it will take ownership of calling
> release on the ArrowArray, so you should not call release yourself. There
> is a variant of this method that receives an ArrowSchema instead of a
> std::shared_ptr<Schema>, and that variant will also take care of calling
> release on the schema. So the only time that release() needs to be called is
> in the else branch.
>
> The ArrowSchema is getting unconditionally passed to `arrow::ImportSchema`,
> which will take care of calling release(). So you should not call release on
> the ArrowSchema.
>
> > I have to have it like this:
> > Otherwise there will be a seg fault in C++ code. It's maybe a requirement
> > for DuckDB.
>
> From DuckDB's perspective it shouldn't matter (and it should be impossible
> for DuckDB to know) how you are allocating the ArrowSchema / ArrowArray. So
> this is very odd.
>
> Is this code something runnable that you can share?
> Can you run the code in gdb and show the stack trace of the segmentation
> fault?
>
> On Wed, Mar 8, 2023 at 10:57 PM Kimmo Linna <[email protected]> wrote:
>> Hi,
>>
>> I have to have it like this:
>> ArrowSchema *arrow_schema = new ArrowSchema;
>> ArrowArray *arrow_array = new ArrowArray;
>>
>> Otherwise there will be a seg fault in C++ code. It's maybe a requirement
>> for DuckDB.
>>
>> Best regards,
>>
>> Kimmo
>> --
>> Kimmo Linna
>> Nihtisalontie 3 as 1
>> 02630 ESPOO
>> [email protected]
>> +358 40 590 1074
>>
>>> On 8. Mar 2023, at 18.10, Weston Pace <[email protected]> wrote:
>>>
>>> I agree with Aldrin. `arrow_schema` has type ArrowSchema*.
>>> You are passing in `&arrow_schema`, which has type `ArrowSchema**`, but
>>> you are casting it to `duckdb_arrow_schema*` so it at least compiles. You
>>> can also simplify things by getting rid of some of this allocation.
>>>
>>> ```
>>> ArrowSchema arrow_schema;
>>> duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema *)&arrow_schema);
>>> auto schema = arrow::ImportSchema(&arrow_schema);
>>> auto output_stream = arrow::io::BufferOutputStream::Create();
>>> auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, *schema);
>>> while (true)
>>> {
>>>     ArrowArray arrow_array;
>>>     duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array *)&arrow_array);
>>>     if (arrow_array.length > 0)
>>>     {
>>>         (*batch_writer)->WriteRecordBatch(
>>>             (const arrow::RecordBatch &)**arrow::ImportRecordBatch(&arrow_array, *schema));
>>>     }
>>>     else
>>>     {
>>>         arrow_array.release(&arrow_array);
>>>         break;
>>>     }
>>> }
>>> auto buffer = (*output_stream)->Finish();
>>> ```
>>>
>>> On Tue, Mar 7, 2023 at 11:26 AM Aldrin <[email protected]> wrote:
>>>> I might be misreading, but do you need to get the address of your object
>>>> pointers, `*arrow_schema` and `*arrow_array`? If they're already pointers,
>>>> I'm not sure you need to get their address and cast that to a pointer.
>>>> Otherwise, I don't see anything else that stands out as problematic to me.
>>>>
>>>> You could try to check the Result objects so that you get better error
>>>> information before the segfault occurs; as it stands now you're blindly
>>>> dereferencing. The other thing that might be helpful is that I think you
>>>> don't have to call `new` every time. In all of these instances you are
>>>> allocating memory for the objects, but you could just create them directly
>>>> and then pass references to them. For example:
>>>>
>>>> ArrowSchema arrow_schema;
>>>> duckdb_query_arrow_schema(arrow_result, &arrow_schema);
>>>> ...
>>>>
>>>> Since ArrowSchema seems to be default constructible, I believe the
>>>> declaration alone will do the default construction (`ArrowSchema
>>>> arrow_schema`). But my cpp knowledge comes and goes, so apologies if
>>>> those suggestions aren't accurate.
>>>>
>>>> Aldrin Montana
>>>> Computer Science PhD Student
>>>> UC Santa Cruz
>>>>
>>>> On Sun, Mar 5, 2023 at 8:56 AM Kimmo Linna <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> I noticed that my previous code worked only for one ArrowArray from
>>>>> DuckDB. I changed the code to this, but it still works only for one
>>>>> ArrowArray. The second "loop" causes a segmentation fault. Could someone
>>>>> guide me a bit? What should I change to get this to work?
>>>>>
>>>>> ArrowSchema *arrow_schema = new ArrowSchema();
>>>>> duckdb_query_arrow_schema(arrow_result, (duckdb_arrow_schema *)&arrow_schema);
>>>>> auto schema = arrow::ImportSchema(arrow_schema);
>>>>> auto output_stream = arrow::io::BufferOutputStream::Create();
>>>>> auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, *schema);
>>>>> while (true) {
>>>>>     ArrowArray *arrow_array = new ArrowArray();
>>>>>     duckdb_query_arrow_array(arrow_result, (duckdb_arrow_array *)&arrow_array);
>>>>>     if (arrow_array->length > 0) {
>>>>>         (*batch_writer)->WriteRecordBatch(
>>>>>             (const arrow::RecordBatch &)**arrow::ImportRecordBatch(arrow_array, *schema));
>>>>>         delete arrow_array;
>>>>>     } else {
>>>>>         break;
>>>>>     }
>>>>> }
>>>>> auto buffer = (*output_stream)->Finish();
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Kimmo
>>>>>
>>>>> --
>>>>> Kimmo Linna
>>>>> Nihtisalontie 3 as 1
>>>>> 02630 ESPOO
>>>>> [email protected]
>>>>> +358 40 590 1074
