Hello,

Beam does not yet support reading ORC files directly. You can track
progress on this issue here:
https://issues.apache.org/jira/browse/BEAM-1861

In the meantime, you are better off using HCatalogIO than JdbcIO, since
its reads should split better across workers.

On Mon, Dec 18, 2017 at 4:17 AM, Allan Wilson <[email protected]> wrote:
> Hi,
>
> Is there any way to read ORC files from HDFS directly using Apache Beam?
>
> I’m looking at loading up Kafka with data stored in ORC files backing Hive
> tables.
>
> After doing some research it doesn’t look possible, but I thought I’d ask to
> make sure.
>
> It may be possible to use jdbc or hcatalog to query the data out, but I’d
> rather scale out by pulling the data straight from the datanodes.
>
> The runner I’m using is Spark 1.6.3 on the HDP 2.6.2 distro.