Hi,

Is there any way to read ORC files from HDFS directly using Apache Beam?

I’m looking at loading up Kafka with data stored in ORC files backing Hive 
tables.

After doing some research it doesn’t look possible, but I thought I’d ask to make 
sure.

It may be possible to use JDBC or HCatalog to query the data out, but I’d 
rather scale out by pulling the data straight from the datanodes.

The runner I’m using is Spark 1.6.3 on the HDP 2.6.2 distro.



