Hi,

I'm trying to read S3 objects in ORC format and parse their content from a Lambda function. I obtain the object like this:

S3ObjectInputStream s3ObjectInputStream = amazonS3.getObject(request).getObjectContent();

where S3ObjectInputStream extends java.io.InputStream.

From what I have found, ORC was designed for the Hadoop ecosystem, and although Spark and Presto can read data in that format, I could not find any support for reading these files outside of a distributed processing framework. I checked org.apache.orc.impl.ReaderImpl and it is tied to org.apache.hadoop.fs.Path. Even if there were an S3-based path class extending org.apache.hadoop.fs.Path, I would still have to instantiate org.apache.orc.impl.ReaderImpl and add the Hadoop dependencies to the Lambda's zip, which I would rather avoid.

Is there a lightweight library that would let me read either ORC files or a java.io.InputStream in ORC format?

Thanks.

Regards,
Juan Carlos Blanco Martínez
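
For context on why a plain java.io.InputStream is awkward here: an ORC file keeps its footer and postscript at the end of the file, so a reader needs random access rather than a forward-only stream. A possible stopgap, sketched below under assumptions, is to spool the stream to a local temporary file that a Path-based reader could then open. The class and method names (OrcStreamSpool, spoolToTempFile) are hypothetical, the sketch assumes the default temp directory is writable (on Lambda that would normally be /tmp), and it does not remove the Hadoop dependency itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of ORC or the AWS SDK.
public class OrcStreamSpool {

    // Copies the whole stream into a new temporary file and returns its path.
    // The input stream is closed once the copy finishes.
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("s3-object-", ".orc");
        try (InputStream src = in) {
            // createTempFile already created the file, so overwrite it.
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; in the Lambda this would be the S3 object's content stream.
        byte[] fake = {'O', 'R', 'C', 1, 2, 3};
        Path local = spoolToTempFile(new ByteArrayInputStream(fake));
        System.out.println("Spooled " + Files.size(local) + " bytes to " + local);
        Files.delete(local); // clean up; temp storage on Lambda is limited
    }
}
```

The local file path could then be wrapped in whatever path type the chosen reader expects. The trade-off is that the whole object is written to disk first, so this only suits objects that fit in the function's temporary storage.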
