I worked on a Java project where I needed to convert a large number of JSON files into ORC files. These ORC files would be used in an AWS Athena-based data lake. I found that one of the challenges in completing this project was understanding the Java ORC API.
I had previously used PyORC (a Python library for writing ORC files) to build a prototype of the application. With PyORC, each ORC row is passed to the writer as a Python tuple, so ORC files can be created without knowledge of the underlying API. This is not the case with the Java ORC API. I developed the javaorc library to give Java developers the kind of functionality that PyORC provides. With javaorc, each ORC row is passed to the ORC file writer as a List<Object>. To support testing of the javaorc code, I also developed an ORC file reader.

The javaorc library and documentation can be found at https://github.com/IanLKaplan/javaorc
The library is published under the Apache 2 license.

I have also written a Medium article on the javaorc library: https://nderground-net.medium.com/javaorc-making-orc-files-simple-2e04c43bc978

I hope that this library will be useful to the Java/ORC community.

Best,
Ian Kaplan
www.topstonesoftware.com