RE: READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA

2017-11-03 Thread Залеский Александр Андреевич
...@gmail.com]
Sent: Thursday, November 02, 2017 6:21 PM
To: user@hive.apache.org
Subject: Re: READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA
ORC stores the data in UTF-8 with the length of the value stored explicitly. Therefore, it doesn't do any parsing of newlines

Re: READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA

2017-11-02 Thread Gopal Vijayaraghavan
> Why jdbc read them as control symbols?
Most likely this is already fixed by https://issues.apache.org/jira/browse/HIVE-1608
That pretty much makes the default as
set hive.query.result.fileformat=SequenceFile;
Cheers, Gopal
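On Hive versions where that default is not yet in place, the same setting could be applied per session from the JDBC client before running the query. The following is only a sketch under assumptions: the class name, method name, and query are hypothetical, and the caller is assumed to supply an already-open HiveServer2 JDBC connection. Only the property name and value come from the reply above.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive or its JDBC driver.
class HiveResultFormatSketch {

    // Session-level setting quoted in the reply above.
    static final String RESULT_FORMAT_SETTING =
            "SET hive.query.result.fileformat=SequenceFile";

    /**
     * Applies the setting on the given session, runs the query, and
     * returns the first column of every row as a string.
     * Assumption: conn is an open connection to HiveServer2.
     */
    static List<String> readFirstColumn(Connection conn, String query)
            throws SQLException {
        List<String> values = new ArrayList<>();
        try (Statement stmt = conn.createStatement()) {
            // The setting applies to this session only.
            stmt.execute(RESULT_FORMAT_SETTING);
            try (ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    values.add(rs.getString(1));
                }
            }
        }
        return values;
    }
}
```

If the setting takes effect, a value containing \r\n should come back as a single string in a single row; whether it does depends on the Hive version and driver in use.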

Re: READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA

2017-11-02 Thread Owen O'Malley
ORC stores the data in UTF-8 with the length of the value stored explicitly. Therefore, it doesn't do any parsing of newlines.
You can see the contents of an ORC file by using:
% hive --orcfiledump -d
from https://orc.apache.org/docs/hive-ddl.html .
How did you load the data into Hive? ...
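The point about explicit lengths can be illustrated with a toy example. This is a simplified sketch, not ORC's actual on-disk layout: it assumes a made-up format in which each value is written as a 4-byte length followed by its UTF-8 bytes, and contrasts it with a newline-delimited text format. All class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only; real ORC files use a different, columnar layout.
class LengthPrefixedSketch {

    // Writes each value as a 4-byte length followed by its UTF-8 bytes.
    static byte[] encode(List<String> values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String value : values) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            encoded.add(utf8);
            total += Integer.BYTES + utf8.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] utf8 : encoded) {
            buffer.putInt(utf8.length);
            buffer.put(utf8);
        }
        return buffer.array();
    }

    // Reads values back using only the stored lengths; bytes are never scanned.
    static List<String> decode(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        List<String> values = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] utf8 = new byte[buffer.getInt()];
            buffer.get(utf8);
            values.add(new String(utf8, StandardCharsets.UTF_8));
        }
        return values;
    }

    // Contrast: a newline-delimited format cannot tell data from delimiters.
    static List<String> newlineRoundTrip(List<String> values) {
        String text = String.join("\n", values);
        return Arrays.asList(text.split("\n"));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("line1\r\nline2", "plain");
        System.out.println(decode(encode(rows)).size());   // prints 2
        System.out.println(newlineRoundTrip(rows).size()); // prints 3
    }
}
```

The length-prefixed round trip returns the two original rows unchanged, while the newline-delimited one breaks the first value in two. That matches the symptom reported in this thread and suggests the splitting happens outside the ORC file itself.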

READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA

2017-11-02 Thread Залеский Александр Андреевич
My problem is reading data that contains "newline" characters from ORC via JDBC. The standard behavior when reading a string is to split the row at every newline symbol, and that seems like a bug. Why can't I store arbitrary symbols in my data? Why does JDBC read them as control symbols? I have created an issue with Teradata