ARROW-1644 and its children.

On Tue, Aug 25, 2020 at 10:30 AM Anoop Johnson <[email protected]> wrote:
> Thanks Micah. Is there a Jira or pull request I could follow for the C++
> implementation for arbitrary nesting? How about maps?
>
> On Tue, Aug 25, 2020 at 9:10 AM Micah Kornfield <[email protected]>
> wrote:
>
>> Also does the C++ Parquet to Arrow reader have any such limitations?
>>
>>
>> The C++ implementation can currently either read nested structs or nested
>> lists but not a combination of the two. It is actively being worked on to
>> be able to handle arbitrary nesting.
>>
>> On Tue, Aug 25, 2020 at 1:15 AM Anoop Johnson <[email protected]>
>> wrote:
>>
>>> If I read the Iceberg vectorized reader code right, it does not support
>>> nested types (same limitation as Spark's built-in vectorized parquet
>>> reader). Is that correct? Also does the C++ Parquet to Arrow reader have
>>> any such limitations?
>>>
>>> On Wed, Aug 19, 2020 at 9:37 AM Jacques Nadeau <[email protected]>
>>> wrote:
>>>
>>>> I believe there is code in the iceberg project to do this in pure Java
>>>> [1]. Right now, there isn't a pure java implementation in the Arrow
>>>> project.
>>>>
>>>> [1]
>>>> https://github.com/apache/iceberg/tree/master/arrow/src/main/java/org/apache/iceberg/arrow/vectorized
>>>>
>>>> On Wed, Aug 19, 2020 at 5:18 AM Chris Nuernberger <[email protected]>
>>>> wrote:
>>>>
>>>>> Also, javacpp has prepackaged C++ bindings to arrow for multiple OS's:
>>>>>
>>>>> http://bytedeco.org/javacpp-presets/arrow/apidocs/
>>>>>
>>>>> We have had success with javacpp
>>>>> <https://github.com/techascent/tech.opencv> in the past and it is
>>>>> much better now that their preprocess is based on Clang.
>>>>>
>>>>> On Tue, Aug 18, 2020 at 4:16 PM Chris Nuernberger <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks, that is helpful.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On Tue, Aug 18, 2020 at 10:24 AM Micah Kornfield <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Chris,
>>>>>>> There is an open PR to support this through C++'s Dataset
>>>>>>> functionality [1]. There was also a prior attempt that went stale and I
>>>>>>> can't find at the moment.
>>>>>>>
>>>>>>> IIUC the main missing component at this point before the PR gets
>>>>>>> merged is integration to honor "-XX:MaxDirectMemorySize" settings.
>>>>>>>
>>>>>>> -Micah
>>>>>>>
>>>>>>> [1] https://github.com/apache/arrow/pull/7030
>>>>>>>
>>>>>>> On Tue, Aug 18, 2020 at 6:48 AM Chris Nuernberger <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> We were wondering what the best way to convert a parquet file to an
>>>>>>>> arrow file would be via a java pathway. I notice that the C++ layer
>>>>>>>> appears to have this conversion.
>>>>>>>>
>>>>>>>> The best hint I have seen so far is this gist:
>>>>>>>>
>>>>>>>> https://gist.github.com/animeshtrivedi/76de64f9dab1453958e1d4f8eca1605f
>>>>>>>>
>>>>>>>> I also found this JNI pathway for ORC files:
>>>>>>>> https://github.com/apache/arrow/tree/master/cpp/src/jni
>>>>>>>>
>>>>>>>> Another thought I had was to use the JNA or JNR and bind to the C
>>>>>>>> glib pathway.
>>>>>>>>
>>>>>>>> Thanks for any help,
>>>>>>>>
>>>>>>>> Chris
