Hi, is there any way to read ORC files from HDFS directly using Apache Beam?
I’m looking at loading Kafka with data stored in ORC files backing Hive tables. After doing some research it doesn’t look possible, but I thought I’d ask to make sure. It may be possible to use JDBC or HCatalog to query the data out, but I’d rather scale out by pulling the data straight from the datanodes. The runner I’m using is Spark 1.6.3 on the HDP 2.6.2 distro.