...@gmail.com]
Sent: Thursday, November 02, 2017 6:21 PM
To: user@hive.apache.org
Subject: Re: READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA
> ORC stores the data in UTF-8 with the length of the value stored explicitly.
> Therefore, it doesn't do any parsing of newlines.
> Why does JDBC read them as control symbols?
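Most likely this is already fixed by
https://issues.apache.org/jira/browse/HIVE-1608
That change pretty much makes the default equivalent to
set hive.query.result.fileformat=SequenceFile;
The previous default, TextFile, stores query results one row per line, so a
newline inside a value splits the row.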
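A toy Python sketch of why the result file format matters. The helpers are hypothetical, not Hive code. It assumes a text result file holds one row per line with \x01 between fields (Hive's default text field delimiter) and is read back by splitting on \n, \r or \r\n, while the framed variant stores each row with an explicit byte length, loosely like a SequenceFile record:

```python
import re
import struct

# Hypothetical toy model, not Hive's implementation.
FIELD_SEP = "\x01"  # Ctrl-A, Hive's default field delimiter for text tables
LINE_END = re.compile(r"\r\n|\r|\n")  # assumed reader behavior: \n, \r and \r\n all end a row


def write_text_rows(rows):
    """One row per line, fields joined by FIELD_SEP (TextFile-style)."""
    return "".join(FIELD_SEP.join(row) + "\n" for row in rows)


def read_text_rows(text):
    """Split on line breaks first, then on FIELD_SEP; an embedded newline ends a row early."""
    lines = LINE_END.split(text)
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final terminator
    return [line.split(FIELD_SEP) for line in lines]


def write_framed_rows(rows):
    """Each row as a 4-byte big-endian byte length followed by its UTF-8 bytes."""
    out = bytearray()
    for row in rows:
        data = FIELD_SEP.join(row).encode("utf-8")
        out += struct.pack(">I", len(data)) + data
    return bytes(out)


def read_framed_rows(blob):
    """Read exactly the stored number of bytes per row; line breaks are plain data."""
    rows, pos = [], 0
    while pos < len(blob):
        (size,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        rows.append(blob[pos:pos + size].decode("utf-8").split(FIELD_SEP))
        pos += size
    return rows


rows = [["1", "line1\r\nline2"], ["2", "ok"]]
print(read_text_rows(write_text_rows(rows)))      # → [['1', 'line1'], ['line2'], ['2', 'ok']]
print(read_framed_rows(write_framed_rows(rows)))  # → [['1', 'line1\r\nline2'], ['2', 'ok']]
```

With the line-based reader, the single value "line1\r\nline2" comes back as two rows, the second with only one field, which resembles the "dirty data" in the subject; the length-framed reader returns the original rows unchanged.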
Cheers,
Gopal
ORC stores the data in UTF-8 with the length of the value stored
explicitly. Therefore, it doesn't do any parsing of newlines.
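A toy Python sketch of why storing the length explicitly makes embedded newlines harmless. It is not ORC's file format (real ORC keeps string bytes and their lengths in separate, compressed streams and may dictionary-encode them); encode_strings and decode_strings are hypothetical helpers that only model "length first, then the UTF-8 bytes":

```python
import struct

# Hypothetical toy model of "length stored explicitly" string storage.
# It mirrors the idea only, not ORC's actual on-disk layout.


def encode_strings(values):
    """Store each value as a 4-byte big-endian byte length, then its UTF-8 bytes."""
    out = bytearray()
    for value in values:
        data = value.encode("utf-8")
        out += struct.pack(">I", len(data)) + data
    return bytes(out)


def decode_strings(blob):
    """Read back exactly `length` bytes per value; no byte is treated as a delimiter."""
    values, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        values.append(blob[pos:pos + length].decode("utf-8"))
        pos += length
    return values


column = ["plain", "line1\r\nline2", "trailing newline\n", "ünïcödé"]
print(decode_strings(encode_strings(column)) == column)  # → True
```

Because the decoder always consumes exactly the stored number of bytes, a \r\n inside a value is just data; nothing scans for line breaks, and multi-byte UTF-8 characters are handled because the length counts bytes, not characters.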
You can see the contents of an ORC file by using:
% hive --orcfiledump -d <path-to-orc-file>
(from https://orc.apache.org/docs/hive-ddl.html). How did you load the data
into Hive?
...
My problem is reading data that contains a "newline" character from ORC via
JDBC. The standard behavior when reading a string is to split the row at every
newline symbol, and that seems like a bug. Why can't I store arbitrary symbols
in my data? Why does JDBC read them as control symbols? I have created an issue
with Teradata