I'm also considering derive crates for Arrow, but they seem to be at a very 
early stage.
If I can go from Rust structures to Arrow through derive macros, that would be 
the least amount of work for me as a *user*.
Writing such derive macros is certainly a lot of work, though...
There's arrow2_convert, serde_arrow, and narrow; narrow seems to be the most 
promising.

Although I conceptually like the example you've shown (Python cffi + header 
file to generate the schema, then running the C program), 
I wonder if I'm better off with Python/Rust (rather than C/C++), despite 
needing to type out the structures manually in Python/Rust.
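
To make the "type out the structures manually" option concrete, here is a rough sketch in Python using only the standard library's ctypes. The SampleHeader layout and the arrow_fields helper are hypothetical (they are not taken from smf84fmt.h), and the output is just a list of Arrow type names rather than a real Arrow schema. Unions, bit fields, and nested structs are deliberately not handled.

```python
import ctypes


class SampleHeader(ctypes.Structure):
    """Hypothetical record header; NOT copied from smf84fmt.h."""
    _fields_ = [
        ("length", ctypes.c_uint16),
        ("flags", ctypes.c_uint8),
        ("record_type", ctypes.c_uint8),
        ("timestamp", ctypes.c_uint32),
        ("system_id", ctypes.c_char * 4),
    ]


# Fixed-width ctypes scalars mapped to Arrow type names (assumed naming).
ARROW_NAMES = {
    ctypes.c_int8: "int8",
    ctypes.c_int16: "int16",
    ctypes.c_int32: "int32",
    ctypes.c_int64: "int64",
    ctypes.c_uint8: "uint8",
    ctypes.c_uint16: "uint16",
    ctypes.c_uint32: "uint32",
    ctypes.c_uint64: "uint64",
    ctypes.c_float: "float32",
    ctypes.c_double: "float64",
}


def arrow_fields(struct_cls):
    """Return [(field name, Arrow type name)] for a ctypes.Structure subclass."""
    out = []
    for name, ctype, *bits in struct_cls._fields_:
        if bits:
            # A third tuple element means a bit field; not handled here.
            raise TypeError(f"bit field {name!r} is not supported")
        if issubclass(ctype, ctypes.Array) and ctype._type_ is ctypes.c_char:
            # char[N] becomes a fixed-width binary column of N bytes.
            out.append((name, f"fixed_size_binary[{ctype._length_}]"))
        elif ctype in ARROW_NAMES:
            out.append((name, ARROW_NAMES[ctype]))
        else:
            raise TypeError(f"no Arrow mapping for field {name!r}: {ctype!r}")
    return out


for field_name, arrow_type in arrow_fields(SampleHeader):
    print(f"{field_name}: {arrow_type}")
```

With pyarrow installed, each name could presumably be turned into a real field (e.g. pyarrow.uint16(), or pyarrow.binary(4) for the fixed-width case), but that step is left out so the sketch has no dependencies.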


On Wednesday, March 6th, 2024 at 19:07, Dewey Dunnington via user 
<[email protected]> wrote:

> Hi KB,
> 
> I imagine you will need a mix of generated and manually typed code to
> generate the ArrowSchema from the definition and recipe to build the
> ArrowArray from an instance, perhaps starting with well-tested
> manually typed code that you replace with generated code as patterns
> appear.
> 
> I think nanoarrow is appropriate for what you are trying to do...it
> provides a "straightforward" (in terms of packaging complexity) path
> to wrapping your generator functions in Rust and Python. We haven't
> done a great job of documenting how to do that with examples but feel
> free to ask here or open an issue in apache/arrow-nanoarrow asking for
> help until we do.
> 
> Cheers!
> 
> -dewey
> 
> On Tue, Mar 5, 2024 at 11:14 PM kekronbekron
> [email protected] wrote:
> 
> > Hi Dewey,
> > 
> > Thank you for taking the time.
> > My goal is to convert from a variety of big C data structures like this to 
> > equivalent Arrow spec/schema.
> > Then, I would like to store them (RecordBatches) to parquet or any other 
> > relevant type.
> > The CSV or JSON output from the example C program (smf84fmt) doesn't 
> > matter; just wanted to point to the sample data format as in the header 
> > file.
> > 
> > I had tried bindgen to create Rust definitions from the header files, but 
> > it gets complicated real fast... more than I can comprehend at least.
> > 
> > The types get crazier too, with singly linked lists (not there in the 
> > linked example, but in other types), etc.
> > 
> > Would really like to solve this in a systematic way, without needing to hand 
> > code the Arrow schema...
> > Because the C header files are maintained (by a provider), it would work 
> > out best if it's possible to create a conversion script, and then use the 
> > Arrow schema in Python/Rust/etc.
> > 
> > -KB
> > 
> > On Wednesday, March 6th, 2024 at 07:59, Dewey Dunnington via user 
> > [email protected] wrote:
> > 
> > > Hi KB,
> > > 
> > > There might be some other approaches I'm not aware of; however, I had
> > > some fun with Python's cffi package to generate some (untested)
> > > nanoarrow code based on the struct definitions [1]. If all you need
> > > are the types in Python or some other higher-level language (e.g., to
> > > read one of the CSV or JSON files generated by the tool you linked),
> > > you could generate Python code instead.
> > > 
> > > I hope that's helpful!
> > > 
> > > -dewey
> > > 
> > > [1] https://gist.github.com/paleolimbot/e1667a57f837e4db7e973b9677e33ddb
> > > 
> > > On Sun, Mar 3, 2024 at 10:08 PM kekronbekron
> > > [email protected] wrote:
> > > 
> > > > Hello,
> > > > 
> > > > Say I have a whole bunch of fully typed (with unions and all) data 
> > > > structures like the one here - 
> > > > https://github.com/IBM/IBM-Z-zOS/blob/main/SMF-Tools/SMF84Formatter/smf84fmt.h.
> > > > Say I'm parsing bytes with such a header...is it possible to then use 
> > > > Arrow's C data interface (or maybe nanoarrow) to painlessly convert 
> > > > such a struct to Arrow type(s)?
> > > > 
> > > > - KB
