[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2015-01-06 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266175#comment-14266175
 ] 

Yongzhi Chen commented on HIVE-9201:


Even if we support line terminators other than \n in the future, we still have 
to handle the case where the line terminator appears inside a string value. Any 
suggestions or corrections for my current approach? Or any better ideas? Thanks.

 Lazy functions do not handle newlines and carriage returns properly
 ---

 Key: HIVE-9201
 URL: https://issues.apache.org/jira/browse/HIVE-9201
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-9201.1.patch


 Hive returns wrong results when the returned string contains \r or \n.
 This happens when the query triggers a MapReduce job.
 For example, for a table named strsim with only one row, query 1 below
 returns 1 row while query 2 returns 3 rows.
 Query 1:
 select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
 Query 2:
 select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
 select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
 INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2014-12-23 15:00:08,958 Stage-1 map = 0%,  reduce = 0%
 INFO  : Ended Job = job_local1178499218_0015
 +------+---------+--+
 1 row selected (1.283 seconds)
 | _c0  | narray  |
 +------+---------+--+
 | abc  | 1       |
 +------+---------+--+
 select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
 INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
 INFO  : Job running in-process (local Hadoop)
 INFO  : 2014-12-23 15:04:35,441 Stage-1 map = 0%,  reduce = 0%
 INFO  : Ended Job = job_local1816711099_0016
 +------+---------+--+
 3 rows selected (1.135 seconds)
 | _c0  | narray  |
 +------+---------+--+
 | a    | NULL    |
 | b    | NULL    |
 | c    | 1       |
 +------+---------+--+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2015-01-05 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265035#comment-14265035
 ] 

Yongzhi Chen commented on HIVE-9201:


I just found that in SerDeUtils, escapeString and lightEscapeString escape \n 
and \r the same way as my fix for this issue:

https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java#L98

https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java#L129





[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2015-01-05 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264965#comment-14264965
 ] 

Yongzhi Chen commented on HIVE-9201:


[~ashutoshgupt...@gmail.com],
Are you suggesting that we start implementing LINES TERMINATED BY for Hive? It 
is treated as not fixable in 
https://issues.apache.org/jira/browse/HIVE-302
In the current Hive code, it seems we simply raise an error for any line 
terminator other than \n, and many places just assume that \n is the only line 
terminator:
case HiveParser.TOK_TABLEROWFORMATLINES:
  String lineDelim = unescapeSQLString(rowChild.getChild(0).getText());
  tblDesc.getProperties().setProperty(serdeConstants.LINE_DELIM, lineDelim);
  if (!lineDelim.equals("\n") && !lineDelim.equals("10")) {
    throw new SemanticException(generateErrorMessage(rowChild,
        ErrorMsg.LINES_TERMINATED_BY_NON_NEWLINE.getMsg()));
  }
  break;
But with MAPREDUCE-2602 fixed, it is possible for Hive to support changing the 
line terminator. I just wonder whether it may not be an easy change.

Thanks.



[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2015-01-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263354#comment-14263354
 ] 

Ashutosh Chauhan commented on HIVE-9201:


I wonder whether MAPREDUCE-2602 is useful here. It lets you set a multi-byte 
record delimiter via the {{textinputformat.record.delimiter}} config param.
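
To make the idea of a configurable record delimiter concrete, here is a toy 
sketch in plain Java. It is not Hadoop's record reader and does not read 
{{textinputformat.record.delimiter}}; the class name RecordDelimiterSketch, the 
split method, and the two-character delimiter are all assumptions made for 
illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of splitting on a configurable delimiter; not Hadoop's record reader.
class RecordDelimiterSketch {
    // Splits the stream into records at every occurrence of the given delimiter.
    static List<String> split(String stream, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = stream.indexOf(delimiter, start)) >= 0) {
            records.add(stream.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < stream.length()) {
            records.add(stream.substring(start)); // trailing record without a delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        String delimiter = "\u0001\n"; // hypothetical two-character record delimiter
        String stream = "a\rb\nc" + delimiter + "second" + delimiter;
        System.out.println(split(stream, delimiter).size()); // 2
        System.out.println(split(stream, "\n").size());      // 3
    }
}
```

With the two-character delimiter, the embedded \r and \n stay inside the first 
record. Any delimiter can still occur inside a string value, so this narrows 
the problem rather than removing the need for escaping.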



[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2014-12-29 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260340#comment-14260340
 ] 

Yongzhi Chen commented on HIVE-9201:


The four failed tests are not related to the change.
The first two tests are old failures with an age of 3 or more.
I ran the other two tests,
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
on my own machine, and both succeeded.



[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2014-12-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258112#comment-14258112
 ] 

Hive QA commented on HIVE-9201:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12688942/HIVE-9201.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6723 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_query_result_fileformat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2185/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2185/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2185/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12688942 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2014-12-24 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258311#comment-14258311
 ] 

Brock Noland commented on HIVE-9201:


Hi,

Are there any downsides to this approach? Assuming that users already have data 
that contains actual newlines, {{\\n}}, and {{\n}}, will anyone be impacted 
negatively?



[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2014-12-24 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258326#comment-14258326
 ] 

Yongzhi Chen commented on HIVE-9201:


We will escape the newline, changing it to '\' 'n' in the stream. We also 
escape \ in the customer's data. So if the customer's data is a newline 
followed by char \ and char n, then in the stream it will be 
\n\\n, 
and later it will be converted back to newline, char \, char n by the method 
copyAndEscapeStringDataToText.
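
The round trip described above can be checked with a small stand-alone sketch. 
This is not the code from the patch and not copyAndEscapeStringDataToText; the 
class NewlineRoundTripSketch and its encode and decode methods are hypothetical, 
and backslash is assumed to be the escape character.

```java
// Hypothetical sketch of the escaping scheme discussed here; not the code from the patch.
class NewlineRoundTripSketch {
    // Doubles any backslash already in the data, then encodes newline as backslash + 'n'.
    static String encode(String data) {
        return data.replace("\\", "\\\\").replace("\n", "\\n");
    }

    // Decodes left to right so an escaped backslash is never mistaken for the start of "\n".
    static String decode(String stream) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < stream.length(); i++) {
            char c = stream.charAt(i);
            if (c == '\\' && i + 1 < stream.length()) {
                char next = stream.charAt(++i);
                out.append(next == 'n' ? '\n' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Customer data: a real newline followed by the two characters '\' and 'n'.
        String data = "\n" + "\\n";
        String stream = encode(data);
        System.out.println(stream);                      // the five characters \n\\n
        System.out.println(decode(stream).equals(data)); // true
    }
}
```

Because existing backslashes are doubled before the newline is rewritten, a real 
newline and a literal backslash followed by n produce different encodings, so 
decoding is unambiguous.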



[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2014-12-24 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258376#comment-14258376
 ] 

Brock Noland commented on HIVE-9201:


Hmm I am a little nervous about this and I don't have too much experience with 
the text serialized formats. [~ashutoshc] any thoughts on this one?



[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2014-12-23 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257577#comment-14257577
 ] 

Brock Noland commented on HIVE-9201:


https://github.com/apache/hive/blob/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L746

?



[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly

2014-12-23 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257688#comment-14257688
 ] 

Yongzhi Chen commented on HIVE-9201:


Three rows are returned because the Hadoop method 
org.apache.hadoop.mapred.LineRecordReader.readDefaultLine uses \r and \n as line 
terminators, so Hive needs to process the \r and \n chars before calling the 
method.

The map job uses the LazyUtils.writeEscaped method to escape special chars (such 
as control characters). The method just blindly adds the escape char before each 
char that needs escaping. There are two issues. First, \r and \n are not among 
the chars to be escaped. Second, even if they are added, they must be escaped 
differently: just adding the escape char (such as \) before them cannot solve 
our problem, because the chars with values 13 and 10 are still in the stream. So 
we should process these two chars differently, for example by replacing '\r' 
with two chars: the escape char and char 'r'. This logic can be added to the 
LazyUtils.writeEscaped method. The processed stream can then go through 
org.apache.hadoop.mapred.LineRecordReader.readDefaultLine without logic errors 
(such as one row becoming 3 rows). Then, in the LazyString.init method, when we 
remove the escape chars, we know to convert '\' 'r' to char 13.

I have attached the fix patch.
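
The approach described above can be sketched in plain Java. This is an 
illustration only, not the contents of HIVE-9201.1.patch: the class 
LineEscapeSketch and its methods splitLikeLineReader, escape, and unescape are 
hypothetical names, backslash is assumed to be the escape character, and the 
splitter merely mimics a reader that treats \r and \n as record terminators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Hive's LazyUtils/LazyString or Hadoop's line reader.
class LineEscapeSketch {
    static final char ESC = '\\'; // assumed escape character

    // Mimics a reader that ends a record at '\r', '\n', or "\r\n".
    static List<String> splitLikeLineReader(String stream) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < stream.length(); i++) {
            char c = stream.charAt(i);
            if (c == '\r' || c == '\n') {
                records.add(current.toString());
                current.setLength(0);
                // Treat "\r\n" as a single terminator.
                if (c == '\r' && i + 1 < stream.length() && stream.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    // Replaces CR/LF with ESC + 'r' / ESC + 'n' so chars 13 and 10 never reach the stream.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\r': out.append(ESC).append('r'); break;
                case '\n': out.append(ESC).append('n'); break;
                case ESC:  out.append(ESC).append(ESC); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Reverses escape(): ESC + 'r' -> CR, ESC + 'n' -> LF, ESC + ESC -> ESC.
    static String unescape(String escaped) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == ESC && i + 1 < escaped.length()) {
                char next = escaped.charAt(++i);
                if (next == 'r') out.append('\r');
                else if (next == 'n') out.append('\n');
                else out.append(next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "a\rb\nc";
        // Unescaped: the single value is split into three records.
        System.out.println(splitLikeLineReader(value + "\n").size());
        // Escaped: one record, which unescapes back to the original value.
        List<String> records = splitLikeLineReader(escape(value) + "\n");
        System.out.println(records.size());
        System.out.println(unescape(records.get(0)).equals(value));
    }
}
```

The key design point is that the escaped form contains neither char 13 nor char 
10, so a line-oriented reader sees exactly one record per row.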

