[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266175#comment-14266175 ]

Yongzhi Chen commented on HIVE-9201:

Even if we support line terminators other than \n in the future, we still have to handle the case where the line terminator appears inside a string value. Any suggestions or corrections for my current approach? Or any better ideas? Thanks

Lazy functions do not handle newlines and carriage returns properly
---

Key: HIVE-9201
URL: https://issues.apache.org/jira/browse/HIVE-9201
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0, 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Attachments: HIVE-9201.1.patch

Hive returns a wrong result when the returned string has the char \r or \n in it. This happens when the query triggers MapReduce jobs. For example, take a table named strsim with only one row. As shown below, query 1 returns 1 row while query 2 returns 3 rows.

Query 1: select abc, narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
Query 2: select a\rb\nc, narray from strsim LATERAL VIEW explode(array(1)) C AS narray;

select abc, narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : Job running in-process (local Hadoop)
INFO : 2014-12-23 15:00:08,958 Stage-1 map = 0%, reduce = 0%
INFO : Ended Job = job_local1178499218_0015
+------+---------+--+
| _c0  | narray  |
+------+---------+--+
| abc  | 1       |
+------+---------+--+
1 row selected (1.283 seconds)

select a\rb\nc, narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : Job running in-process (local Hadoop)
INFO : 2014-12-23 15:04:35,441 Stage-1 map = 0%, reduce = 0%
INFO : Ended Job = job_local1816711099_0016
+------+---------+--+
| _c0  | narray  |
+------+---------+--+
| a    | NULL    |
| b    | NULL    |
| c    | 1       |
+------+---------+--+
3 rows selected (1.135 seconds)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
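The row-splitting symptom in the description can be reproduced outside Hive with a minimal sketch. It assumes a reader that treats \r, \n, and \r\n as line terminators; java.io.BufferedReader.readLine is used here only as a stand-in and is not Hive's or Hadoop's actual code path. The class name, method name, and the tab used as field separator are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Hive.
class LineSplitDemo {
    // Reads records the way a line-oriented reader does:
    // \r, \n and \r\n each terminate a record.
    static List<String> readRecords(String stream) {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(stream))) {
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // One serialized row whose string column has no control chars.
        System.out.println(readRecords("abc\t1\n").size());      // prints 1
        // One serialized row whose string column contains raw \r and \n.
        System.out.println(readRecords("a\rb\nc\t1\n").size());  // prints 3
    }
}
```

The second call yields the fragments "a", "b", and "c<TAB>1", which matches the shape of the three-row result above: only the last fragment still carries the second column.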
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265035#comment-14265035 ]

Yongzhi Chen commented on HIVE-9201:

I just found out that, in SerDeUtils, escapeString and lightEscapeString escape \n and \r the same way as my fix for this issue:
https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java#L98
https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java#L129
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264965#comment-14264965 ]

Yongzhi Chen commented on HIVE-9201:

[~ashutoshgupt...@gmail.com], are you suggesting that we start implementing LINES TERMINATED BY for Hive? That was treated as not fixable in https://issues.apache.org/jira/browse/HIVE-302

In the current Hive code, it seems we simply raise an error for any line terminator other than \n, and many places just assume that \n is the only line terminator:

    case HiveParser.TOK_TABLEROWFORMATLINES:
      String lineDelim = unescapeSQLString(rowChild.getChild(0).getText());
      tblDesc.getProperties().setProperty(serdeConstants.LINE_DELIM, lineDelim);
      // Only a newline (given literally or as the code "10") is accepted.
      if (!lineDelim.equals("\n") && !lineDelim.equals("10")) {
        throw new SemanticException(generateErrorMessage(rowChild,
            ErrorMsg.LINES_TERMINATED_BY_NON_NEWLINE.getMsg()));
      }
      break;

But with MAPREDUCE-2602 fixed, it is possible for Hive to support changing the line terminator. I just suspect it may not be an easy change. Thanks.
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263354#comment-14263354 ]

Ashutosh Chauhan commented on HIVE-9201:

Wondering if MAPREDUCE-2602 is useful here; it lets you set a multi-byte record delimiter via the {{textinputformat.record.delimiter}} config param.
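To illustrate what a multi-byte record delimiter would buy, here is a rough sketch in plain Java. It does not use Hadoop or the {{textinputformat.record.delimiter}} property itself; it only mimics the idea of splitting a stream on a configurable delimiter. The class name, method name, and the "|#|" delimiter are hypothetical choices made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; a stand-in for a reader with a configurable record delimiter.
class RecordDelimiterDemo {
    // Splits the stream on the given delimiter, taken literally (not as a regex).
    static List<String> splitRecords(String stream, String delimiter) {
        return Arrays.asList(stream.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        String delimiter = "|#|";
        // Two records; the first one has raw \r and \n inside its string column.
        String stream = "a\rb\nc\t1" + delimiter + "abc\t1" + delimiter;
        List<String> records = splitRecords(stream, delimiter);
        System.out.println(records.size());                       // prints 2
        System.out.println(records.get(0).equals("a\rb\nc\t1"));  // prints true
    }
}
```

With a delimiter other than \n, embedded \r and \n no longer break a record apart. A value that happens to contain the delimiter itself would still be split, so some form of escaping would presumably still be needed.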
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260340#comment-14260340 ]

Yongzhi Chen commented on HIVE-9201:

The four failed tests are not related to the change. The first two are old failures with an age of 3 or more. I ran the other two tests,
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
on my own machine, and both succeeded.
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258112#comment-14258112 ]

Hive QA commented on HIVE-9201:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12688942/HIVE-9201.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6723 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_query_result_fileformat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2185/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2185/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2185/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12688942 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258311#comment-14258311 ]

Brock Noland commented on HIVE-9201:

Hi, are there any downsides to this approach? Assuming that users already have data that contains actual newlines, {{\\n}}, and {{\n}}, will anyone be impacted negatively?
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258326#comment-14258326 ]

Yongzhi Chen commented on HIVE-9201:

We will escape the newline, changing it to '\' 'n' in the stream. We also escape \ in the customer's data. So if the customer's data is a newline followed by char \ and char n, then in the stream it will be \n\\n, and later it will be converted back to newline, char \, char n by the method copyAndEscapeStringDataToText.
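The round trip described in this comment can be sketched as follows. This is an illustration under assumptions, not the code in the attached patch: the class and method names are hypothetical, and backslash is assumed to be the escape char.

```java
// Hypothetical demo class; illustrates the escaping scheme, not Hive's implementation.
class EscapeRoundTrip {
    static final char ESCAPE = '\\';

    // Writer side: raw \n and \r become ESCAPE + letter; ESCAPE itself is doubled.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\n': out.append(ESCAPE).append('n'); break;
                case '\r': out.append(ESCAPE).append('r'); break;
                case ESCAPE: out.append(ESCAPE).append(ESCAPE); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Reader side: ESCAPE + 'n' or 'r' becomes the control char;
    // ESCAPE + anything else yields that char unchanged.
    static String unescape(String stream) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < stream.length(); i++) {
            char c = stream.charAt(i);
            if (c == ESCAPE && i + 1 < stream.length()) {
                char next = stream.charAt(++i);
                if (next == 'n') out.append('\n');
                else if (next == 'r') out.append('\r');
                else out.append(next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Customer data: a real newline, then the two chars '\' and 'n'.
        String data = "\n\\n";
        String stream = escape(data);
        System.out.println(stream);                         // prints \n\\n
        System.out.println(unescape(stream).equals(data));  // prints true
    }
}
```

Because the escape char in the data is doubled, a literal backslash followed by n stays distinguishable from an escaped newline, which is the case raised in the question this comment answers.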
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258376#comment-14258376 ]

Brock Noland commented on HIVE-9201:

Hmm, I am a little nervous about this, and I don't have much experience with the text serialized formats. [~ashutoshc], any thoughts on this one?
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257577#comment-14257577 ]

Brock Noland commented on HIVE-9201:

https://github.com/apache/hive/blob/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L746 ?
[jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
[ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257688#comment-14257688 ]

Yongzhi Chen commented on HIVE-9201:

Three rows are returned because the Hadoop method org.apache.hadoop.mapred.LineRecordReader.readDefaultLine treats both \r and \n as line terminators. So Hive needs to process the \r and \n chars before that method is called.

The map job uses the LazyUtils.writeEscaped method to escape special chars (such as control characters). The method just blindly adds the escape char before each char that needs escaping. There are two issues:
First, \r and \n are not among the chars that need to be escaped.
Second, even if they are added, they must be escaped differently: just adding the escape char (such as \) before them does not solve our problem, because the chars with values 13 and 10 are still in the stream.

So we should process these two chars differently, for example by replacing '\r' with two chars: the escape char and the char 'r'. This logic can be added to the LazyUtils.writeEscaped method. The processed stream can then go through org.apache.hadoop.mapred.LineRecordReader.readDefaultLine without logic errors (such as one row becoming 3 rows). Then, in the LazyString.init method, when we remove the escape chars, we know to convert '\' 'r' to char 13.

The fix patch is attached.
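The difference between the two escaping strategies discussed in this comment can be shown with a small sketch. It is not the code of LazyUtils.writeEscaped or of the attached patch; the class and method names are hypothetical, backslash is assumed to be the escape char, and a regex split on \r and \n stands in for the line reader.

```java
// Hypothetical demo class; contrasts the two strategies described above.
class WriteEscapedSketch {
    static final char ESCAPE = '\\';

    // Prefix-only escaping: the escape char is added, but the raw
    // control char (value 13 or 10) stays in the stream.
    static String prefixOnly(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\r' || c == '\n' || c == ESCAPE) {
                out.append(ESCAPE);
            }
            out.append(c);
        }
        return out.toString();
    }

    // Replacement escaping: the control char is replaced by the escape
    // char plus a printable letter, so no raw terminator remains.
    static String replaceWithLetter(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\r') out.append(ESCAPE).append('r');
            else if (c == '\n') out.append(ESCAPE).append('n');
            else if (c == ESCAPE) out.append(ESCAPE).append(ESCAPE);
            else out.append(c);
        }
        return out.toString();
    }

    // Counts the records a line-oriented reader would see in the stream.
    static int countRecords(String stream) {
        return stream.split("\r\n|\r|\n", -1).length;
    }

    public static void main(String[] args) {
        String value = "a\rb\nc";
        System.out.println(countRecords(prefixOnly(value)));         // prints 3
        System.out.println(countRecords(replaceWithLetter(value)));  // prints 1
    }
}
```

With prefix-only escaping the stream still contains the raw terminators, so the reader sees three records. With replacement escaping the row stays in one piece, and the reader side can map the escape char followed by 'r' or 'n' back to char 13 or 10.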