Reading S3 objects in ORC format from Lambda

Juan Carlos Blanco Martínez Wed, 16 Jan 2019 03:47:14 -0800

Hi,
I’m trying to read S3 objects in ORC format and parse their content from
an AWS Lambda function. I obtain each object as a stream:


S3ObjectInputStream s3ObjectInputStream =
    amazonS3.getObject(request).getObjectContent();

where S3ObjectInputStream extends java.io.InputStream.

As far as I can tell, ORC was designed for the Hadoop ecosystem, and
although Spark and Presto can read data in that format, I found no
support for reading those files outside a distributed processing framework.
I've checked org.apache.orc.impl.ReaderImpl and it is tied to
org.apache.hadoop.fs.Path.
Even if there were an S3-based path class extending
org.apache.hadoop.fs.Path, I would still have to instantiate
org.apache.orc.impl.ReaderImpl and add the Hadoop dependencies to the
Lambda's zip, which I would rather avoid.
Is there a lightweight library that would let me read ORC data either
from a file or from a java.io.InputStream?
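
To illustrate the constraint: because the reader opens files by path, the
stream would first have to be written to local storage. Below is a minimal
sketch using only the JDK. The class and method names are hypothetical, and
it assumes the object fits in the function's temporary storage and that the
default temp directory is writable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of ORC or the AWS SDK.
final class OrcStreamSpooler {
    private OrcStreamSpooler() {}

    // Copies the whole stream into a new temp file and returns its path.
    // The stream is closed when the copy finishes or fails.
    static Path spoolToTempFile(InputStream in) throws IOException {
        // createTempFile uses the JVM's default temp directory (java.io.tmpdir).
        Path tmp = Files.createTempFile("s3-object-", ".orc");
        try (InputStream source = in) {
            // REPLACE_EXISTING is needed because createTempFile already created the file.
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Do not leave a partial file behind if the copy fails.
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }
}
```

The result of OrcStreamSpooler.spoolToTempFile(s3ObjectInputStream) could
then be wrapped as new org.apache.hadoop.fs.Path(tmp.toUri()), but that
still requires the Hadoop dependencies mentioned above, so it does not
answer the question of a lighter reader.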

Thanks.
Regards.

Juan Carlos Blanco Martínez
