There is also ORC's orc-mapreduce library, which provides a row-by-row wrapper layer. It reads each row into an OrcStruct, which is just an object with a list of fields. See the interface here: https://orc.apache.org/api/orc-mapreduce/org/apache/orc/mapred/OrcStruct.html
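To make the row-by-row idea concrete, here is a minimal plain-Java sketch of what such a wrapper does conceptually: it walks column-oriented data and hands back one row at a time as a list of fields. It deliberately does not use the ORC classes. `RowWrapperSketch`, `RowReader`, and the `columns[c][r]` layout are hypothetical names and assumptions made for illustration, not the orc-mapreduce API.

```java
import java.util.ArrayList;
import java.util.List;

public class RowWrapperSketch {

    /** Hypothetical row-by-row reader over column arrays, where columns[c][r] is column c of row r. */
    public static final class RowReader {
        private final Object[][] columns;
        private final int rowCount;
        private int nextRow = 0;

        public RowReader(Object[][] columns) {
            this.columns = columns;
            this.rowCount = columns.length == 0 ? 0 : columns[0].length;
            for (Object[] column : columns) {
                if (column.length != rowCount) {
                    throw new IllegalArgumentException("all columns must have the same length");
                }
            }
        }

        /** Fills 'row' with the next row's fields in column order; returns false when no rows remain. */
        public boolean next(List<Object> row) {
            if (nextRow >= rowCount) {
                return false;
            }
            row.clear(); // the caller's row object is reused for every row
            for (Object[] column : columns) {
                row.add(column[nextRow]);
            }
            nextRow++;
            return true;
        }
    }

    public static void main(String[] args) {
        // Two columns (an id column and a name column) holding three rows.
        Object[][] columns = {
            {1, 2, 3},
            {"ann", "bob", "cy"}
        };
        RowReader reader = new RowReader(columns);
        List<Object> row = new ArrayList<>();
        while (reader.next(row)) {
            System.out.println(row);
        }
    }
}
```

The sketch reuses one row object across calls to `next`, so a caller that wants to keep a row has to copy it first. That is a design choice of this sketch, not a statement about how the ORC library behaves.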
Basically, just wrap the ORC core reader or writer in a mapred reader or writer. See https://orc.apache.org/api/orc-mapreduce/org/apache/orc/mapred/OrcMapredRecordReader.html

.. Owen

On Tue, Jul 13, 2021 at 1:38 PM Karthik Abram <kart...@eclecticlogic.com> wrote:

> Also have a look at github.com/eclecticlogic/eclectic-orc
> It allows creating orc files from hibernate and other annotated java
> classes.
> I haven't kept up with orc version updates but should be an easy bump up.
>
> On Jul 13, 2021, at 3:38 PM, Ian Kaplan <i...@bearcave.com> wrote:
> >
> > I worked on a Java project where I needed to convert a large number of
> > JSON files into ORC files. These ORC files would be used in an AWS
> > Athena-based data lake. I found that one of the challenges in completing
> > this project was understanding the Java ORC API.
> >
> > I had previously used PyORC (a Python library for writing ORC files) to
> > build a prototype of the application. A Python tuple for an ORC row can be
> > passed to the PyORC writer, which allows ORC files to be created without
> > knowledge of an underlying API, as is the case with Java.
> >
> > I developed the javaorc library to provide the type of functionality
> > that PyORC provides for Java developers. In the case of the javaorc
> > library, a List<Object> for each ORC row is passed to the ORC file writer.
> >
> > To support testing of the javaorc code, I also developed an ORC file
> > reader.
> >
> > The javaorc library and documentation can be found at
> > https://github.com/IanLKaplan/javaorc The library is published under the
> > Apache 2 license.
> >
> > I have also written a Medium article on the javaorc library:
> >
> > https://nderground-net.medium.com/javaorc-making-orc-files-simple-2e04c43bc978
> >
> > I hope that this library will be useful to the Java/ORC community.
> >
> > Best,
> >
> > Ian Kaplan
> > www.topstonesoftware.com