The ORC project's orc-mapreduce library also provides a row-by-row
wrapper layer. It reads each row into an OrcStruct, which is just an object
with a list of fields. See the interface here:
https://orc.apache.org/api/orc-mapreduce/org/apache/orc/mapred/OrcStruct.html

Basically, just wrap the ORC core reader or writer in a mapred reader or
writer. See
https://orc.apache.org/api/orc-mapreduce/org/apache/orc/mapred/OrcMapredRecordReader.html
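
To make the idea concrete, here is a minimal sketch in plain Java of what a
row-by-row wrapper over columnar data does. It deliberately does not use the
ORC API: ColumnBatch and rows() are hypothetical names invented for this
illustration, and a List<Object> stands in for OrcStruct. The real classes
are documented at the links above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class RowWrapperSketch {

    /** Hypothetical columnar batch (not an ORC class): columns[c][r] is row r of column c. */
    static final class ColumnBatch {
        final Object[][] columns;
        final int rowCount;

        ColumnBatch(Object[]... columns) {
            this.columns = columns;
            this.rowCount = columns.length == 0 ? 0 : columns[0].length;
            for (Object[] column : columns) {
                if (column.length != rowCount) {
                    throw new IllegalArgumentException("columns must have equal length");
                }
            }
        }
    }

    /** Row-by-row view over a batch; each row is materialized as a list of field values. */
    static Iterator<List<Object>> rows(ColumnBatch batch) {
        return new Iterator<List<Object>>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < batch.rowCount;
            }

            @Override
            public List<Object> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Gather the value at the current row position from every column.
                List<Object> row = new ArrayList<>(batch.columns.length);
                for (Object[] column : batch.columns) {
                    row.add(column[index]);
                }
                index++;
                return row;
            }
        };
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(
            new Object[] {1L, 2L, 3L},
            new Object[] {"a", "b", "c"});
        Iterator<List<Object>> it = rows(batch);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

The caller only ever sees one row at a time and never touches the column
arrays directly, which is the convenience the wrapper layer is meant to give.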

.. Owen

On Tue, Jul 13, 2021 at 1:38 PM Karthik Abram <kart...@eclecticlogic.com>
wrote:

> Also have a look at github.com/eclecticlogic/eclectic-orc
> It allows creating ORC files from Hibernate and other annotated Java
> classes.
> I haven't kept up with ORC version updates, but it should be an easy
> version bump.
>
> On Jul 13, 2021, at 3:38 PM, Ian Kaplan <i...@bearcave.com> wrote:
>
> 
>
>   I worked on a Java project where I needed to convert a large number of
> JSON files into ORC files.  These ORC files would be used in an AWS
> Athena-based data lake.  I found that one of the challenges in completing
> this project was understanding the Java ORC API.
>
>   I had previously used PyORC (a Python library for writing ORC files) to
> build a prototype of the application.  A Python tuple for an ORC row can be
> passed to the PyORC writer, which allows ORC files to be created without
> the knowledge of an underlying API that Java requires.
>
>   I developed the javaorc library to give Java developers the kind of
> functionality that PyORC provides. In the case of the javaorc
> library, a List<Object> for each ORC row is passed to the ORC file writer.
>
>   To support testing of the javaorc code, I also developed an ORC file
> reader.
>
>   The javaorc library and documentation can be found on GitHub
> (https://github.com/IanLKaplan/javaorc). The library is published under the
> Apache 2 license.
>
>   I have also written a Medium article on the javaorc library:
>
> https://nderground-net.medium.com/javaorc-making-orc-files-simple-2e04c43bc978
>
>    I hope that this library will be useful to the Java/ORC community.
>
>   Best,
>
>   Ian Kaplan
>   www.topstonesoftware.com
>
>
