Re: Developing a "dataset" API / framework for Arrow C++ users

2019-02-27 Thread Wes McKinney
hi Ryan,
On Wed, Feb 27, 2019 at 1:31 PM Ryan Blue wrote:
> Thanks for pointing out that document, Uwe. I really like the intent and it would be really useful to have common components for large datasets. One of the questions we are hitting with an Iceberg Python implementation is the file

Re: Developing a "dataset" API / framework for Arrow C++ users

2019-02-27 Thread Ryan Blue
Thanks for pointing out that document, Uwe. I really like the intent and it would be really useful to have common components for large datasets. One of the questions we are hitting with an Iceberg Python implementation is the file system abstraction, so I think this is very relevant for all of us.

Re: Developing a "dataset" API / framework for Arrow C++ users

2019-02-25 Thread Wes McKinney
hi Joel and Uwe, yes, feedback from the Iceberg community would be useful about what kinds of APIs are required to be able to interact well with table formats like Iceberg. As Uwe says, the objective of the C++ code I am proposing to develop is to have appropriate C++ APIs for interacting with dif

Re: Developing a "dataset" API / framework for Arrow C++ users

2019-02-25 Thread Uwe L. Korn
Hello, this should definitely be shared with the Apache Iceberg community (cc'ed). The title of the document may be a bit confusing. What is proposed in there is actually constructing the building blocks in C++ that are required for supporting Python/C++/.. implementations for things like Icebe

Re: Developing a "dataset" API / framework for Arrow C++ users

2019-02-25 Thread Joel Pfaff
Hello, Thanks for the write-up. Have you considered sharing this document with the Apache Iceberg community? My feeling is that there are some shared goals here between the two projects. And while their implementation is in Java, their spec is language agnostic. Regards, Joel On Sun, Feb 24,

Developing a "dataset" API / framework for Arrow C++ users

2019-02-24 Thread Wes McKinney
hi folks, We've spent a good amount of energy up until now implementing interfaces for reading different kinds of file formats in C++, like Parquet, ORC, CSV, and JSON. There are some higher-level layers missing, though, which are necessary if we want to make use of these file formats in the contex