Re: [C++][C Data][JavaScript]RecordBatch buffer to JavaScript in Bun

Weston Pace Thu, 02 Mar 2023 12:34:57 -0800

> I think the main question is whether it is possible to use
> RecordBatchReader in JavaScript without an IPC stream.

It is possible but it won't be easy.

Looking at your code I assume arrow_schema is an instance of ArrowSchema in
the C data interface and arrow_array is an instance of ArrowArray in the C
data interface. These are defined in the C data interface[1] which is meant
to be a stable C ABI.  Everything in these structures is either a number, a
pointer (pointers to data, pointers to functions, and pointers to structs),
or a null-terminated string.  So these structures should be able to marshal
across bun:ffi.  You would want to turn the numbers into JS numbers, the
null-terminated strings into JS strings, the function pointers into JS
functions[2], and the data pointers into ArrayBuffers[3].
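
As a rough sketch of the first step (turning the struct fields into JS
values), the snippet below decodes an ArrowArray struct with a DataView.
It assumes a 64-bit little-endian platform, where every field of the struct
is 8 bytes wide and the struct is 80 bytes long.  The names readArrowArray
and ARROW_ARRAY_SIZE are made up for illustration, and the struct bytes are
faked in JS here; in Bun they would presumably come from calling
toArrayBuffer on the struct pointer.

```javascript
// Size of struct ArrowArray, assuming a 64-bit platform (10 fields x 8 bytes).
const ARROW_ARRAY_SIZE = 80;

// Decode an ArrowArray struct that is already visible to JS as an ArrayBuffer.
// Field order follows the struct definition in the C data interface.
function readArrowArray(structBytes) {
  const view = new DataView(structBytes);
  // int64 fields become JS numbers (exact only up to 2^53 - 1).
  const i64 = (off) => Number(view.getBigInt64(off, true));
  // Pointers are kept as BigInt so no address bits are lost.
  const ptr = (off) => view.getBigUint64(off, true);
  return {
    length: i64(0),
    nullCount: i64(8),
    offset: i64(16),
    nBuffers: i64(24),
    nChildren: i64(32),
    buffers: ptr(40),     // const void**
    children: ptr(48),    // struct ArrowArray**
    dictionary: ptr(56),  // struct ArrowArray*
    release: ptr(64),     // release callback
    privateData: ptr(72), // opaque producer data
  };
}

// Stand-in for native memory: a 3-row array with 2 buffers.
const fakeStruct = new ArrayBuffer(ARROW_ARRAY_SIZE);
const writer = new DataView(fakeStruct);
writer.setBigInt64(0, 3n, true);        // length
writer.setBigInt64(24, 2n, true);       // n_buffers
writer.setBigUint64(40, 0x1000n, true); // made-up address of the buffers array

const decoded = readArrowArray(fakeStruct);
console.log(decoded.length, decoded.nBuffers, decoded.buffers);
```

The pointer fields still have to be dereferenced through the FFI layer
before the data they point at can be used, and the release callback has to
be called once the JS side is done with the array.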

Once you've done this you should be able to assemble these various pieces
into an ArrowJS Schema[4] or an ArrowJS RecordBatch[5].

At the end of the day I'd expect most of the work to be JS work and not
much C++ work.  However, it would require a pretty good familiarity with
ArrowJS to know the proper way to assemble these different pieces (e.g. the
ArrayBuffer data buffers would need to be wrapped into things like
Int8Array or Int32Array based on the type of the array).
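
To illustrate the wrapping step, here is a minimal sketch that picks a typed
array from the format string stored in ArrowSchema.  The single-letter
format strings are the ones the C data interface defines for fixed-width
primitive types; the helper names (FORMAT_TO_TYPED_ARRAY, wrapValues,
isValid) are hypothetical and not part of ArrowJS, and the buffers are built
in JS instead of coming from native memory.

```javascript
// C data interface format strings for fixed-width primitives, mapped to the
// JS typed array that can view the corresponding data buffer.
const FORMAT_TO_TYPED_ARRAY = {
  c: Int8Array, C: Uint8Array,
  s: Int16Array, S: Uint16Array,
  i: Int32Array, I: Uint32Array,
  l: BigInt64Array, L: BigUint64Array,
  f: Float32Array, g: Float64Array,
};

// View `length` values of a data buffer, skipping `offset` logical elements
// (the `offset` field of ArrowArray).
function wrapValues(format, dataBuffer, length, offset = 0) {
  const Ctor = FORMAT_TO_TYPED_ARRAY[format];
  if (!Ctor) throw new Error(`unsupported format string: ${format}`);
  return new Ctor(dataBuffer, offset * Ctor.BYTES_PER_ELEMENT, length);
}

// Validity bitmaps are bit-packed, least-significant bit first.
// The caller is expected to add the array offset to `index`.
function isValid(validityBuffer, index) {
  if (validityBuffer === null) return true; // no bitmap means no nulls
  const bytes = new Uint8Array(validityBuffer);
  return (bytes[index >> 3] & (1 << (index & 7))) !== 0;
}

// Stand-in for an int32 column holding [10, null, 30].
const dataBuffer = new Int32Array([10, 0, 30]).buffer;
const validityBuffer = new Uint8Array([0b00000101]).buffer;

const values = wrapValues("i", dataBuffer, 3);
const rows = Array.from(values, (v, i) => (isValid(validityBuffer, i) ? v : null));
console.log(rows);
```

Variable-width and nested types (strings, lists, structs) need more than
one buffer or child arrays, so they would need extra handling beyond this
lookup table.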

[1]
https://arrow.apache.org/docs/format/CDataInterface.html#structure-definitions
[2] https://bun.sh/docs/api/ffi#function-pointers
[3]
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
[4] https://github.com/apache/arrow/blob/main/js/src/schema.ts
[5] https://github.com/apache/arrow/blob/main/js/src/recordbatch.ts

On Thu, Mar 2, 2023 at 11:19 AM Kimmo Linna <[email protected]> wrote:

> Hi Weston,
>
> I’m willing but hardly capable. I’m just a copycat with C/C++. I think
> the main question is whether it is possible to use RecordBatchReader in
> JavaScript without an IPC stream. I haven’t found a way, but that doesn’t
> tell much. Bun can read directly from a buffer if the buffer is
> null-terminated or its size is known. I tried to use TotalBufferSize with
> the RecordBatch from ImportRecordBatch, but it didn’t work.
>
> K.
> --
> Kimmo Linna
> Nihtisalontie 3 as 1
> 02630  ESPOO
> [email protected]
> +358 40 590 1074
>
>
>
> On 2. Mar 2023, at 21.05, Weston Pace <[email protected]> wrote:
>
> I believe you would need a Javascript version of the C data interface.
> This should be doable with bun:ffi but I'm not aware of anyone that has
> done this before.  I also wonder if there is a way to create a C data
> interface based on TypedArray that would be usable in both bun and node.
> I'm also not really up to speed on what arrow-js has in terms of
> capabilities so it is possible it exists and I just didn't know.  Is it
> something you are interested in contributing?
>
> On Wed, Mar 1, 2023 at 10:41 PM Kimmo Linna <[email protected]> wrote:
>
>> Hi,
>>
>> I get the ArrowSchema and ArrowArray directly from DuckDB. I want to
>> transfer the RecordBatch to Bun with bun:ffi. At the moment my procedure
>> is the following:
>> auto schema = arrow::ImportSchema(arrow_schema);
>> auto batch = arrow::ImportRecordBatch(arrow_array, *schema);
>> auto output_stream = arrow::io::BufferOutputStream::Create();
>> auto batch_writer = arrow::ipc::MakeStreamWriter(*output_stream, *schema);
>> auto status = (*batch_writer)->WriteRecordBatch(**batch);
>> auto buffer = (*output_stream)->Finish();
>> (*out).address = (void *)(*buffer)->address();
>> (*out).size = (*buffer)->size();
>>
>> And then I will read the buffer in Bun with toArrayBuffer and
>> RecordBatchReader like this:
>> return RecordBatchReader.from(
>> toArrayBuffer(
>> dab.dab_ipc_address(ipc), 0, Number(dab.dab_ipc_size(ipc))
>> )).readAll()[0];
>>
>> I just wonder: is there a way to read the RecordBatch directly from the
>> RecordBatch produced by ImportRecordBatch, or can I do this without an
>> OutputStream at all?
>>
>> Best regards,
>>
>> Kimmo
>>
>
