Had a feeling that would be the answer, but being new to Beam I wanted to make sure I wasn’t missing something. :)
Thanks, Ismael

On 12/18/17, 3:07 AM, "Ismaël Mejía" <ieme...@gmail.com> wrote:

>Hello,
>
>There is no support yet for reading ORC files directly in Beam. You can
>track the progress of this issue here:
>https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_BEAM-2D1861&d=DwIFaQ&c=gFTBenQ7Vj71sUi1A4CkFnmPzqwDo07QsHw-JRepxyw&r=ZpzaEtcaU94NK3jHb3YffLFtq_DRaHEGobEO2J_3zIw&m=M0Hv4VMVlhVQOflTfehE_mOiOJXTz5Y-Mc7Hk-ybtF8&s=BVnOfRDnazZ6nFSJN0tyuBb-qNOUTvab47qT5Nykuws&e=
>
>You would be better off using HCatalogIO than JdbcIO (the split should be better).
>
>On Mon, Dec 18, 2017 at 4:17 AM, Allan Wilson <awils...@pandora.com> wrote:
>> Hi,
>>
>> Is there any way to read ORC files from HDFS directly using Apache Beam?
>>
>> I’m looking at loading up Kafka with data stored in ORC files backing Hive
>> tables.
>>
>> After doing some research it doesn’t look possible, but I thought I’d ask to
>> make sure.
>>
>> It may be possible to use JDBC or HCatalog to query the data out, but I’d
>> rather scale out by pulling the data straight from the datanodes.
>>
>> The runner I’m using is Spark 1.6.3 on the HDP 2.6.2 distro.