Re: Reading from ORC Files in HDFS

Allan Wilson Tue, 19 Dec 2017 07:16:25 -0800

Had a feeling that would be the answer, but being new to Beam I wanted to make
sure I wasn’t missing something. :)



Thanks, Ismaël.



On 12/18/17, 3:07 AM, "Ismaël Mejía" <[email protected]> wrote:

>Hello,
>
>There is no support yet for reading ORC files directly in Beam. You can
>track the progress of this issue here:
>https://issues.apache.org/jira/browse/BEAM-1861
>
>You would be better off using HCatalogIO than JdbcIO (it should split the
>read better).
>
>On Mon, Dec 18, 2017 at 4:17 AM, Allan Wilson <[email protected]> wrote:
>> Hi,
>>
>> Is there any way to read ORC files from HDFS directly using Apache Beam?
>>
>> I’m looking at loading up Kafka with data stored in ORC files backing Hive
>> tables.
>>
>> After doing some research it doesn’t look possible, but I thought I’d ask to
>> make sure.
>>
>> It may be possible to use JDBC or HCatalog to query the data out, but I’d
>> rather scale out by pulling the data straight from the datanodes.
>>
>> The runner I’m using is Spark 1.6.3 on the HDP 2.6.2 distro.
>>
>>
>>
>>
