I worked on a Java project where I needed to convert a large number of
JSON files into ORC files.  These ORC files would be used in an AWS
Athena-based data lake.  One of the challenges in this project was
understanding the Java ORC API.

  I had previously used PyORC (a Python library for writing ORC files) to
build a prototype of the application.  With PyORC, each ORC row is passed
to the writer as a Python tuple, so ORC files can be created without
learning a lower-level API, which the Java ORC library requires.

  I developed the javaorc library to give Java developers the kind of
functionality that PyORC provides.  With javaorc, each ORC row is passed to
the ORC file writer as a List<Object>.
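
  To illustrate the row-at-a-time idea, here is a minimal sketch in plain
Java.  It is not the javaorc API: the RowBuffer class and its writeRow and
column methods are hypothetical names made up for this illustration, and no
ORC file is written.  It only shows how a List<Object> row can be spread
across per-column buffers, which is roughly the bookkeeping a row-oriented
wrapper has to do on top of a columnar format such as ORC.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the javaorc API.
// It accepts rows as List<Object> and stores the values column by column.
final class RowBuffer {
    private final List<String> columnNames;
    private final List<List<Object>> columns = new ArrayList<>();
    private int rowCount = 0;

    public RowBuffer(List<String> columnNames) {
        this.columnNames = new ArrayList<>(columnNames);
        // One value list per column, in schema order.
        for (int i = 0; i < this.columnNames.size(); i++) {
            columns.add(new ArrayList<>());
        }
    }

    // Takes one row (one value per column) and appends each value
    // to the buffer of the column at the same position.
    public void writeRow(List<Object> row) {
        if (row.size() != columns.size()) {
            throw new IllegalArgumentException(
                "expected " + columns.size() + " values, got " + row.size());
        }
        for (int i = 0; i < row.size(); i++) {
            columns.get(i).add(row.get(i));
        }
        rowCount++;
    }

    // Returns the values collected so far for the named column.
    public List<Object> column(String name) {
        int index = columnNames.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown column: " + name);
        }
        return columns.get(index);
    }

    public int rowCount() {
        return rowCount;
    }
}
```

  A caller would create the buffer with its column names (for example "id"
and "name") and then call writeRow once per row.  The caller never touches
the per-column storage directly, which is the convenience a row-based writer
is meant to offer.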

  To support testing of the javaorc code, I also developed an ORC file
reader.

  The javaorc library and documentation can be found at
https://github.com/IanLKaplan/javaorc
The library is published under the Apache 2 license.

  I have also written a Medium article on the javaorc library:
https://nderground-net.medium.com/javaorc-making-orc-files-simple-2e04c43bc978

  I hope that this library will be useful to the Java/ORC community.

  Best,

  Ian Kaplan
  www.topstonesoftware.com
