Re: READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA
> Why does JDBC read them as control symbols?

Most likely this is already fixed by https://issues.apache.org/jira/browse/HIVE-1608, which pretty much makes this the default:

set hive.query.result.fileformat=SequenceFile;

Cheers,
Gopal
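A minimal sketch of the workaround Gopal refers to (whether this is already the default depends on your Hive version — HIVE-1608 changed it), set per session before running the query:

```sql
-- Store query results as SequenceFile instead of plain text, so embedded
-- \r\n inside string values is not re-interpreted as a row delimiter
-- when the JDBC driver reads the result files back.
set hive.query.result.fileformat=SequenceFile;
```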
Re: READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA
ORC stores the data in UTF-8 with the length of each value stored explicitly, so it doesn't do any parsing of newlines. You can see the contents of an ORC file by using:

% hive --orcfiledump -d

from https://orc.apache.org/docs/hive-ddl.html. How did you load the data into Hive?

... Owen

On Thu, Nov 2, 2017 at 5:29 AM, Залеский Александр Андреевич <aazal...@mts.ru> wrote:
> My problem is reading data that contains "newline" characters from ORC via JDBC. The standard behavior when reading strings is to split the row at every newline character, and that seems like a bug. Why can't I store arbitrary characters in my data? Why does JDBC read them as control symbols? I have created an issue with Teradata (https://tays.teradata.com/home/?language=en_US=RECHDBRVV) and they advised me to write my own SerDe. Perhaps this is not a unique task and you have already written such a SerDe; may I ask for it?
READING STRING, CONTAINS \R\N, FROM ORC FILES VIA JDBC DRIVER PRODUCES DIRTY DATA
My problem is reading data that contains "newline" characters from ORC via JDBC. The standard behavior when reading strings is to split the row at every newline character, and that seems like a bug. Why can't I store arbitrary characters in my data? Why does JDBC read them as control symbols? I have created an issue with Teradata (https://tays.teradata.com/home/?language=en_US=RECHDBRVV) and they advised me to write my own SerDe. Perhaps this is not a unique task and you have already written such a SerDe; may I ask for it?
Hive LIMIT clause slows query
I'm using HDP 2.5.0 with Hive 1.2.1. While running some tests I noticed that my query performs better without the LIMIT clause. My query is:

insert into table results_table partition (task_id=xxx)
select * from data_table
where dt=20171102 and .
limit 100

This query runs in about 30 seconds, but without the LIMIT clause it takes about 20 seconds. Query execution plan with limit: https://pastebin.com/Cmp2rPNr, and without: https://pastebin.com/z1ps2EhG. I can't remove the LIMIT clause because in some cases there are too many results and I don't want to store them all in the result table. Why does LIMIT affect performance so much? Intuitively, it seems the query should run faster with LIMIT. What can I do to improve performance?
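A likely cause is that a global LIMIT forces an extra single-reducer stage to enforce the row cap across all mappers. One thing worth trying is Hive's LIMIT optimization, which samples the source data instead of scanning it fully. This is a hedged sketch, not a guaranteed fix: `hive.limit.optimize.enable` and `hive.limit.row.max.size` are real Hive settings, but their effect depends on the Hive version and data layout, and the table/column names below are taken from the question:

```sql
-- Allow Hive to short-circuit the input scan when a LIMIT is present.
set hive.limit.optimize.enable=true;
-- Upper bound on the data Hive samples per row when the optimization kicks in.
set hive.limit.row.max.size=100000;

insert into table results_table partition (task_id=xxx)
select * from data_table
where dt=20171102
limit 100;
```

Note that with this optimization Hive may under-sample and fall back to a full run, so measure both plans with EXPLAIN before relying on it.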
Retry the failed stage.
Hi, I am working with Hive running on MapReduce. Our query creates multiple stages of MR jobs. If any stage fails due to an intermittent issue, we have to retry the full query. Is there any config in Hive so that only the failed stage is retried instead of failing the full query? Thanks. -piyush
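As far as I know, Hive on MapReduce has no stage-level retry, but MapReduce itself retries individual failed task attempts, which often absorbs intermittent failures before a stage fails outright. A sketch of the relevant knobs, settable per session (these are the standard MR2 property names; on very old Hadoop versions they were `mapred.map.max.attempts` / `mapred.reduce.max.attempts`, and the value 8 below is just an illustrative increase over the default of 4):

```sql
-- Attempts per map task before the task, and therefore the stage, fails.
set mapreduce.map.maxattempts=8;
-- Attempts per reduce task before the task, and therefore the stage, fails.
set mapreduce.reduce.maxattempts=8;
```

If a whole stage still fails, the query must be re-run from the start; the usual workaround is to break a long query into steps that write intermediate tables, so you can restart from the last completed step.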