Yes, we are storing the data in ORC files correctly; the problem appears when 
we read it via JDBC. We generate the ORC files with the org.apache.orc library 
and load them into Hive with the LOAD DATA INPATH command. But when we read 
them, JDBC does that awful split.

From: Owen O'Malley [mailto:owen.omal...@gmail.com]
Sent: Thursday, November 02, 2017 6:21 PM
To: user@hive.apache.org
Subject: Re: READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER 
PRODUCES DIRTY DATA

ORC stores the data in UTF-8 with the length of the value stored explicitly. 
Therefore, it doesn't do any parsing of newlines.
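
To illustrate the point about explicit lengths, here is a minimal Python sketch. 
It is not ORC's actual on-disk encoding; the 4-byte length prefix and the helper 
names encode_values/decode_values are assumptions made up for this example. It 
only shows why a length-prefixed value survives embedded \r\n, while a 
newline-delimited text representation does not.

```python
import struct


def encode_values(values):
    """Hypothetical format: 4-byte big-endian length, then the UTF-8 bytes."""
    out = bytearray()
    for value in values:
        data = value.encode("utf-8")
        out += struct.pack(">I", len(data))
        out += data
    return bytes(out)


def decode_values(buf):
    """Read values back using the stored lengths; newlines are never parsed."""
    values = []
    pos = 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        values.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return values


rows = ["first line\r\nsecond line", "plain value"]

# Length-prefixed round trip: both values come back intact.
round_tripped = decode_values(encode_values(rows))

# Newline-delimited text: the embedded \r\n breaks the first value in two.
naive_split = "\n".join(rows).split("\n")
```

In the sketch, round_tripped equals the original two values, whereas naive_split 
contains three fragments, which is the kind of "dirty data" a line-oriented 
reader would produce.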

You can see the contents of an ORC file by using:

% hive --orcfiledump -d <path_to_file>

from https://orc.apache.org/docs/hive-ddl.html . How did you load the data into 
Hive?

... Owen

On Thu, Nov 2, 2017 at 5:29 AM, Залеский Александр Андреевич 
<aazal...@mts.ru> wrote:
My problem is reading data that contains a “newline” character from ORC via 
JDBC. The standard behavior when reading a string is to split the row at every 
newline symbol, and that seems like a bug. Why can't I store arbitrary symbols 
in my data? Why does JDBC read them as control symbols? I have opened an issue 
with Teradata 
(https://tays.teradata.com/home/?language=en_US&aidIncidentId=RECHDBRVV) and 
they advised me to write my own SerDe. Perhaps this is not a unique task and 
you have already written such a SerDe; could I ask for it?
