[jira] [Updated] (HIVE-11094) Beeline redirecting all output to ErrorStream
[ https://issues.apache.org/jira/browse/HIVE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11094: --- Assignee: (was: Jesus Camacho Rodriguez) Beeline redirecting all output to ErrorStream - Key: HIVE-11094 URL: https://issues.apache.org/jira/browse/HIVE-11094 Project: Hive Issue Type: Bug Components: CLI Reporter: Jesus Camacho Rodriguez Attachments: HIVE-11094.patch Beeline is sending all output to ErrorStream, instead of using OutputStream and reserving ErrorStream for info or debug information. The problem can be reproduced by running: {noformat} ./bin/beeline -u jdbc:hive2:// -e "show databases" > exec.out {noformat} It will still print the output through the terminal. The reason seems to be that the normal output is also sent through the ErrorStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11104) Select operator doesn't propagate constants appearing in expressions
[ https://issues.apache.org/jira/browse/HIVE-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602451#comment-14602451 ] Hive QA commented on HIVE-11104: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741994/HIVE-11104.3.patch {color:green}SUCCESS:{color} +1 9024 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4387/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4387/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4387/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12741994 - PreCommit-HIVE-TRUNK-Build Select operator doesn't propagate constants appearing in expressions Key: HIVE-11104 URL: https://issues.apache.org/jira/browse/HIVE-11104 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11104.2.patch, HIVE-11104.3.patch, HIVE-11104.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11094) Beeline redirecting all output to ErrorStream
[ https://issues.apache.org/jira/browse/HIVE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602450#comment-14602450 ] Thejas M Nair commented on HIVE-11094: -- The 'correct' behavior (based on hive-cli as well as most other tools) is to send only query output to stdout; all info/warning/etc. messages go to stderr. The info messages are considered similar to log messages, just at a lower level than warn. Beeline redirecting all output to ErrorStream - Key: HIVE-11094 URL: https://issues.apache.org/jira/browse/HIVE-11094 Project: Hive Issue Type: Bug Components: CLI Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11094.patch Beeline is sending all output to ErrorStream, instead of using OutputStream and reserving ErrorStream for info or debug information. The problem can be reproduced by running: {noformat} ./bin/beeline -u jdbc:hive2:// -e "show databases" > exec.out {noformat} It will still print the output through the terminal. The reason seems to be that the normal output is also sent through the ErrorStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
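The convention Thejas describes can be sketched as a tiny stream-routing rule. This is a hypothetical illustration, not actual Beeline code: result rows go to stdout, everything diagnostic goes to stderr, so a shell redirect such as `2>/dev/null` leaves only query output.

```java
import java.io.PrintStream;

// Hypothetical sketch of the stream convention described above, not
// actual Beeline code: query results -> stdout, diagnostics -> stderr.
public class Main {
    public static PrintStream streamFor(String kind) {
        // Only result rows belong on stdout; INFO/WARN/ERROR are log-like.
        return "RESULT".equals(kind) ? System.out : System.err;
    }

    public static void main(String[] args) {
        streamFor("INFO").println("Connecting to jdbc:hive2://"); // stderr
        streamFor("RESULT").println("default");                   // stdout
        streamFor("WARN").println("deprecated option");           // stderr
    }
}
```

With this split, `./bin/beeline -u jdbc:hive2:// -e "show databases" > exec.out` would capture only the result rows in exec.out, which is what the reporter expected.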
[jira] [Updated] (HIVE-6791) Support variable substition for Beeline shell command
[ https://issues.apache.org/jira/browse/HIVE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-6791: --- Attachment: HIVE-6791.4-beeline-cli.patch Update the patch addressing Xuefu's latest comments Support variable substition for Beeline shell command - Key: HIVE-6791 URL: https://issues.apache.org/jira/browse/HIVE-6791 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.14.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Attachments: HIVE-6791-beeline-cli.2.patch, HIVE-6791-beeline-cli.3.patch, HIVE-6791-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.4-beeline-cli.patch A follow-up task from HIVE-6694. Similar to HIVE-6570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11094) Beeline redirecting all output to ErrorStream
[ https://issues.apache.org/jira/browse/HIVE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602454#comment-14602454 ] Jesus Camacho Rodriguez commented on HIVE-11094: Ok, thanks for the clarification, I was confused. I'll proceed and close the issue then. Thanks! Beeline redirecting all output to ErrorStream - Key: HIVE-11094 URL: https://issues.apache.org/jira/browse/HIVE-11094 Project: Hive Issue Type: Bug Components: CLI Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11094.patch Beeline is sending all output to ErrorStream, instead of using OutputStream and reserving ErrorStream for info or debug information. The problem can be reproduced by running: {noformat} ./bin/beeline -u jdbc:hive2:// -e "show databases" > exec.out {noformat} It will still print the output through the terminal. The reason seems to be that the normal output is also sent through the ErrorStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9970) Hive on spark
[ https://issues.apache.org/jira/browse/HIVE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602467#comment-14602467 ] JoneZhang commented on HIVE-9970: - I have resolved the problem. First, the Hive CLI correctly loads $HIVE_HOME/lib/*.jar. Then Spark loads an old version of the Hive jars, because $SPARK_HOME/conf/spark-env.sh contains: export SPARK_CLASSPATH=$SPARK_HOME/lib/*:$HADOOP_HOME/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar:$HIVE_HOME/lib/hive-contrib-0.12.0.jar:$HIVE_HOME/lib/hive-common-0.12.0.jar:$HIVE_HOME/bin/hive-cli-0.12.0.jar:$HIVE_HOME/lib/hive-serde-0.12.0.jar:$HIVE_HOME/lib/:$EXTRA_CLASSPATH However, HiveConf.class in hive-common-0.12.0.jar does not contain SPARK_RPC_CLIENT_CONNECT_TIMEOUT, so java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT occurred. Hive on spark - Key: HIVE-9970 URL: https://issues.apache.org/jira/browse/HIVE-9970 Project: Hive Issue Type: Bug Reporter: Amithsha Hi all, Recently I have configured Spark 1.2.0, and my environment is Hadoop 2.6.0 and Hive 1.1.0. Here I have tried Hive on Spark; while executing INSERT INTO I am getting the following error.
Query ID = hadoop2_20150313162828_8764adad-a8e4-49da-9ef5-35e4ebd6bc63 Total jobs = 1 Launching Job 1 out of 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapreduce.job.reduces=number Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask I have added the spark-assembly jar in the Hive lib directory, and also in the Hive console using the add jar command, followed by these steps: set spark.home=/opt/spark-1.2.1/; add jar /opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar; set hive.execution.engine=spark; set spark.master=spark://xxx:7077; set spark.eventLog.enabled=true; set spark.executor.memory=512m; set spark.serializer=org.apache.spark.serializer.KryoSerializer; Can anyone suggest a fix? Thanks & Regards, Amithsha -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6791) Support variable substition for Beeline shell command
[ https://issues.apache.org/jira/browse/HIVE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-6791: --- Attachment: HIVE-6791.5-beeline-cli.patch Rebase the patch Support variable substition for Beeline shell command - Key: HIVE-6791 URL: https://issues.apache.org/jira/browse/HIVE-6791 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.14.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Attachments: HIVE-6791-beeline-cli.2.patch, HIVE-6791-beeline-cli.3.patch, HIVE-6791-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.4-beeline-cli.patch, HIVE-6791.5-beeline-cli.patch A follow-up task from HIVE-6694. Similar to HIVE-6570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602551#comment-14602551 ] Chengxiang Li commented on HIVE-10983: -- Nice find, thanks for working on this issue, [~xiaowei]. For the patch, do you think we can just use {code:java} return new Text(new String(text.getBytes(), 0, text.getLength(), previousCharset)) {code} so that we do not need the extra memory copy introduced in the patch? SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1, 1.2.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt {noformat} The method transformTextToUTF8 has a bug: it invokes a misleading method of Text, getBytes(). The getBytes() method of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data, but copyBytes() was only added after hadoop1. {noformat} When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL query: {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is below. Notice that the second row's content contains the first row's content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is below, just 2 rows.
{noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found the solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub'; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
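The failure mode and Chengxiang's suggested one-liner can be reproduced without Hadoop by simulating Text's reused backing buffer with a plain byte array (the buffer contents below are illustrative row values, not taken from the actual SerDe):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Main {
    // Buggy pattern: decode the whole backing buffer returned by getBytes(),
    // which may contain stale bytes from a previous, longer row.
    public static String decodeWholeBuffer(byte[] buf, Charset cs) {
        return new String(buf, cs);
    }

    // Fixed pattern, mirroring the suggested one-liner
    // new String(text.getBytes(), 0, text.getLength(), previousCharset):
    // decode only the first `length` valid bytes.
    public static String decodeValidRange(byte[] buf, int length, Charset cs) {
        return new String(buf, 0, length, cs);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        byte[] row1 = "J\u00F8rgensen,J\u00F8rgen".getBytes(latin1); // 16 bytes
        byte[] row2 = "Pe\u00F1a,Andr\u00E9s".getBytes(latin1);      // 11 bytes

        // Reused buffer: row 2 overwrites only the first 11 bytes of row 1,
        // leaving a stale 5-byte tail ("ørgen") behind.
        byte[] buf = row1.clone();
        System.arraycopy(row2, 0, buf, 0, row2.length);

        System.out.println(decodeWholeBuffer(buf, latin1));             // prints "Peña,Andrésørgen"
        System.out.println(decodeValidRange(buf, row2.length, latin1)); // prints "Peña,Andrés"
    }
}
```

The corrupted string produced by the whole-buffer decode is exactly the "current row contains the contents of the previous row" symptom reported above; bounding the decode at Text.getLength() removes it without the extra array copy.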
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602518#comment-14602518 ] Hive QA commented on HIVE-11112: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741917/HIVE-11112.1.patch {color:green}SUCCESS:{color} +1 9026 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4388/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4388/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4388/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12741917 - PreCommit-HIVE-TRUNK-Build ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding: CREATE TABLE person_lat1 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1'); 2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text: Müller,Thomas Jørgensen,Jørgen Peña,Andrés Nåm,Fæk 3.
Execute SELECT * FROM person_lat1 Result - The following output appears:
{noformat}
+-------------------+--+
| person_lat1.name  |
+-------------------+--+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+--+
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6791) Support variable substition for Beeline shell command
[ https://issues.apache.org/jira/browse/HIVE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602519#comment-14602519 ] Hive QA commented on HIVE-6791: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742061/HIVE-6791.4-beeline-cli.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BEELINE-Build/3/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-BEELINE-Build/3/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-BEELINE-Build-3/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-BEELINE-Build-3/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = 
\g\i\t ]] + [[ -z beeline-cli ]] + [[ -d apache-git-beeline-source ]] + [[ ! -d apache-git-beeline-source/.git ]] + [[ ! -d apache-git-beeline-source ]] + cd apache-git-beeline-source + git fetch origin From https://github.com/apache/hive 2243de3..00e0d55 beeline-cli - origin/beeline-cli c5dc87a..cc4075b llap - origin/llap + git reset --hard HEAD HEAD is now at 2243de3 HIVE-10905 QuitExit fails ending with ';' [beeline-cli Branch](Chinna Rao Lalam, reviewed by Ferdinand Xu) + git clean -f -d Removing common/src/java/org/apache/hadoop/hive/conf/HiveVariableSource.java Removing common/src/java/org/apache/hadoop/hive/conf/VariableSubstitution.java Removing common/src/test/org/apache/hadoop/hive/conf/TestVariableSubstitution.java + git checkout beeline-cli Already on 'beeline-cli' Your branch is behind 'origin/beeline-cli' by 259 commits, and can be fast-forwarded. + git reset --hard origin/beeline-cli HEAD is now at 00e0d55 Merge branch 'master' into beeline-cli + git merge --ff-only origin/beeline-cli Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12742061 - PreCommit-HIVE-BEELINE-Build Support variable substition for Beeline shell command - Key: HIVE-6791 URL: https://issues.apache.org/jira/browse/HIVE-6791 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.14.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Attachments: HIVE-6791-beeline-cli.2.patch, HIVE-6791-beeline-cli.3.patch, HIVE-6791-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.4-beeline-cli.patch A follow-up task from HIVE-6694. Similar to HIVE-6570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6791) Support variable substition for Beeline shell command
[ https://issues.apache.org/jira/browse/HIVE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-6791: --- Attachment: HIVE-6791.5-beeline-cli.patch Support variable substition for Beeline shell command - Key: HIVE-6791 URL: https://issues.apache.org/jira/browse/HIVE-6791 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.14.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Attachments: HIVE-6791-beeline-cli.2.patch, HIVE-6791-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.4-beeline-cli.patch, HIVE-6791.5-beeline-cli.patch A follow-up task from HIVE-6694. Similar to HIVE-6570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6791) Support variable substition for Beeline shell command
[ https://issues.apache.org/jira/browse/HIVE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-6791: --- Attachment: (was: HIVE-6791-beeline-cli.3.patch) Support variable substition for Beeline shell command - Key: HIVE-6791 URL: https://issues.apache.org/jira/browse/HIVE-6791 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.14.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Attachments: HIVE-6791-beeline-cli.2.patch, HIVE-6791-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.4-beeline-cli.patch, HIVE-6791.5-beeline-cli.patch A follow-up task from HIVE-6694. Similar to HIVE-6570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6791) Support variable substition for Beeline shell command
[ https://issues.apache.org/jira/browse/HIVE-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-6791: --- Attachment: (was: HIVE-6791.5-beeline-cli.patch) Support variable substition for Beeline shell command - Key: HIVE-6791 URL: https://issues.apache.org/jira/browse/HIVE-6791 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.14.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu Attachments: HIVE-6791-beeline-cli.2.patch, HIVE-6791-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.3-beeline-cli.patch, HIVE-6791.4-beeline-cli.patch, HIVE-6791.5-beeline-cli.patch A follow-up task from HIVE-6694. Similar to HIVE-6570. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602583#comment-14602583 ] xiaowei wang commented on HIVE-10983: - Your method is better and more concise. Following your suggestion, I will put up another patch. Thanks very much! SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1, 1.2.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt {noformat} The method transformTextToUTF8 has a bug: it invokes a misleading method of Text, getBytes(). The getBytes() method of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data, but copyBytes() was only added after hadoop1. {noformat} When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL query: {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is below. Notice that the second row's content contains the first row's content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is below, just 2 rows.
{noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found the solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub'; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603108#comment-14603108 ] Hive QA commented on HIVE-10983: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742078/HIVE-10983.4.patch.txt {color:green}SUCCESS:{color} +1 9025 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4394/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4394/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4394/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742078 - PreCommit-HIVE-TRUNK-Build SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1, 1.2.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt {noformat} The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they invoke a misleading method of Text, getBytes(). The getBytes() method of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data, but copyBytes() was only added after hadoop1.
{noformat} When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL query: {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is below. Notice that the second row's content contains the first row's content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is below, just 2 rows. {noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found the solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub'; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11125) when i run a sql use hive on spark, it seem like the hive cli finished, but the application is always running
[ https://issues.apache.org/jira/browse/HIVE-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11125: --- Labels: TODOC-SPARK (was: ) when i run a sql use hive on spark, it seem like the hive cli finished, but the application is always running - Key: HIVE-11125 URL: https://issues.apache.org/jira/browse/HIVE-11125 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: 1.2.0 Environment: Hive 1.2.0 Spark 1.3.1 Hadoop 2.5.1 Reporter: JoneZhang Assignee: Xuefu Zhang Labels: TODOC-SPARK When I run a SQL query using Hive on Spark, the Hive CLI finishes: hive (default)> select count(id) from t1 where id > 100; Query ID = mqq_20150626174732_9e18f0c9-7b56-46ab-bf90-3b66f1a51300 Total jobs = 1 Launching Job 1 out of 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapreduce.job.reduces=number Starting Spark Job = 7d34cb8c-eaad-4724-a99a-37e517db80d9 Query Hive on Spark job[0] stages: 0 1 Status: Running (Hive on Spark job[0]) Job Progress Format CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost] 2015-06-26 17:47:53,746 Stage-0_0: 0(+1)/5 Stage-1_0: 0/1 2015-06-26 17:47:56,771 Stage-0_0: 1(+0)/5 Stage-1_0: 0/1 2015-06-26 17:47:57,778 Stage-0_0: 4(+1)/5 Stage-1_0: 0/1 2015-06-26 17:47:59,791 Stage-0_0: 5/5 Finished Stage-1_0: 0(+1)/1 2015-06-26 17:48:00,797 Stage-0_0: 5/5 Finished Stage-1_0: 1/1 Finished Status: Finished successfully in 18.08 seconds OK 5 Time taken: 28.512 seconds, Fetched: 1 row(s) But the application is always in RUNNING state on the ResourceManager: User: mqq Name: Hive on Spark Application Type: SPARK Application Tags: State: RUNNING FinalStatus: UNDEFINED Started: 2015-06-26 17:47:38 Elapsed: 24mins, 33sec Tracking URL: ApplicationMaster
Diagnostics: the hive.log is 2015-06-26 18:12:26,878 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/06/26 18:12:26 main INFO org.apache.spark.deploy.yarn.Client Application report for application_1433328839160_0071 (state: RUNNING) 2015-06-26 18:12:27,879 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/06/26 18:12:27 main INFO org.apache.spark.deploy.yarn.Client Application report for application_1433328839160_0071 (state: RUNNING) 2015-06-26 18:12:28,880 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/06/26 18:12:28 main INFO org.apache.spark.deploy.yarn.Client Application report for application_1433328839160_0071 (state: RUNNING) ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11100) Beeline should escape semi-colon in queries
[ https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-11100: --- Attachment: (was: HIVE-11100.patch) Beeline should escape semi-colon in queries --- Key: HIVE-11100 URL: https://issues.apache.org/jira/browse/HIVE-11100 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0, 1.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Beeline should escape the semicolon in queries. For example, queries like the following: CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n'; or CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LINES TERMINATED BY '\n'; both fail. But the second query, with the semicolon escaped with \, works in the CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
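The behavior being requested above, splitting a script on semicolons while leaving quoted semicolons and a backslash escape intact, is a small state machine. The sketch below is a hypothetical illustration of that behavior, not Beeline's actual parser:

```java
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Hypothetical sketch, not Beeline's actual parser: split a script on
    // ';', but treat ';' inside single or double quotes, and a "\;" escape
    // outside quotes, as literal text rather than a statement terminator.
    public static List<String> splitStatements(String script) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0; // active quote character, or 0 when outside quotes
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (quote == 0 && c == '\\' && i + 1 < script.length()
                    && script.charAt(i + 1) == ';') {
                cur.append(';');          // "\;" outside quotes -> literal ';'
                i++;
            } else if (quote != 0 && c == quote) {
                quote = 0;                // closing quote
                cur.append(c);
            } else if (quote == 0 && (c == '\'' || c == '"')) {
                quote = c;                // opening quote
                cur.append(c);
            } else if (quote == 0 && c == ';') {
                String stmt = cur.toString().trim();
                if (!stmt.isEmpty()) out.add(stmt); // unquoted ';' ends a statement
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        String stmt = cur.toString().trim();
        if (!stmt.isEmpty()) out.add(stmt);
        return out;
    }

    public static void main(String[] args) {
        // The quoted ';' stays inside the first statement; the bare ';' splits.
        System.out.println(splitStatements(
            "CREATE TABLE t (c1 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'; SELECT 1"));
    }
}
```

Under this scheme, both CREATE TABLE statements from the report would parse as single statements: the first because its semicolon sits inside quotes, the second because "\;" is consumed as an escape.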
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602848#comment-14602848 ] Yongzhi Chen commented on HIVE-11112: - [~ctang.ma], [~xuefuz], could you review the change? Thanks. ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding: CREATE TABLE person_lat1 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1'); 2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text: Müller,Thomas Jørgensen,Jørgen Peña,Andrés Nåm,Fæk 3. Execute SELECT * FROM person_lat1 Result - The following output appears:
{noformat}
+-------------------+--+
| person_lat1.name  |
+-------------------+--+
| Müller,Thomas     |
| Jørgensen,Jørgen  |
| Peña,Andrésørgen  |
| Nåm,Fækdrésørgen  |
+-------------------+--+
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11125) when i run a sql use hive on spark, it seem like the hive cli finished, but the application is always running
[ https://issues.apache.org/jira/browse/HIVE-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603083#comment-14603083 ] JoneZhang commented on HIVE-11125: -- Thank you very much. This situation is different from Hive on MapReduce. I suggest we add some description about this to the wiki. when i run a sql use hive on spark, it seem like the hive cli finished, but the application is always running - Key: HIVE-11125 URL: https://issues.apache.org/jira/browse/HIVE-11125 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: 1.2.0 Environment: Hive1.2.0 Spark1.3.1 Hadoop2.5.1 Reporter: JoneZhang Assignee: Xuefu Zhang when i run a sql use hive on spark,. The hive cli has finished hive (default) select count(id) from t1 where id100; Query ID = mqq_20150626174732_9e18f0c9-7b56-46ab-bf90-3b66f1a51300 Total jobs = 1 Launching Job 1 out of 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapreduce.job.reduces=number Starting Spark Job = 7d34cb8c-eaad-4724-a99a-37e517db80d9 Query Hive on Spark job[0] stages: 0 1 Status: Running (Hive on Spark job[0]) Job Progress Format CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost] 2015-06-26 17:47:53,746 Stage-0_0: 0(+1)/5 Stage-1_0: 0/1 2015-06-26 17:47:56,771 Stage-0_0: 1(+0)/5 Stage-1_0: 0/1 2015-06-26 17:47:57,778 Stage-0_0: 4(+1)/5 Stage-1_0: 0/1 2015-06-26 17:47:59,791 Stage-0_0: 5/5 Finished Stage-1_0: 0(+1)/1 2015-06-26 17:48:00,797 Stage-0_0: 5/5 Finished Stage-1_0: 1/1 Finished Status: Finished successfully in 18.08 seconds OK 5 Time taken: 28.512 seconds, Fetched: 1 row(s) But the application is always running state on resourcemanager User: mqq Name: Hive on Spark Application Type: SPARK 
Application Tags: State:RUNNING FinalStatus: UNDEFINED Started: 2015-06-26 17:47:38 Elapsed: 24mins, 33sec Tracking URL: ApplicationMaster Diagnostics: the hive.log is 2015-06-26 18:12:26,878 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/06/26 18:12:26 main INFO org.apache.spark.deploy.yarn.Client Application report for application_1433328839160_0071 (state: RUNNING) 2015-06-26 18:12:27,879 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/06/26 18:12:27 main INFO org.apache.spark.deploy.yarn.Client Application report for application_1433328839160_0071 (state: RUNNING) 2015-06-26 18:12:28,880 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/06/26 18:12:28 main INFO org.apache.spark.deploy.yarn.Client Application report for application_1433328839160_0071 (state: RUNNING) ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11125) when i run a sql use hive on spark, it seem like the hive cli finished, but the application is always running
[ https://issues.apache.org/jira/browse/HIVE-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-11125. Resolution: Not A Problem when i run a sql use hive on spark, it seem like the hive cli finished, but the application is always running - Key: HIVE-11125 URL: https://issues.apache.org/jira/browse/HIVE-11125 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: 1.2.0 Environment: Hive1.2.0 Spark1.3.1 Hadoop2.5.1 Reporter: JoneZhang Assignee: Xuefu Zhang when i run a sql use hive on spark,. The hive cli has finished hive (default) select count(id) from t1 where id100; Query ID = mqq_20150626174732_9e18f0c9-7b56-46ab-bf90-3b66f1a51300 Total jobs = 1 Launching Job 1 out of 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapreduce.job.reduces=number Starting Spark Job = 7d34cb8c-eaad-4724-a99a-37e517db80d9 Query Hive on Spark job[0] stages: 0 1 Status: Running (Hive on Spark job[0]) Job Progress Format CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost] 2015-06-26 17:47:53,746 Stage-0_0: 0(+1)/5 Stage-1_0: 0/1 2015-06-26 17:47:56,771 Stage-0_0: 1(+0)/5 Stage-1_0: 0/1 2015-06-26 17:47:57,778 Stage-0_0: 4(+1)/5 Stage-1_0: 0/1 2015-06-26 17:47:59,791 Stage-0_0: 5/5 Finished Stage-1_0: 0(+1)/1 2015-06-26 17:48:00,797 Stage-0_0: 5/5 Finished Stage-1_0: 1/1 Finished Status: Finished successfully in 18.08 seconds OK 5 Time taken: 28.512 seconds, Fetched: 1 row(s) But the application is always running state on resourcemanager User: mqq Name: Hive on Spark Application Type: SPARK Application Tags: State:RUNNING FinalStatus: UNDEFINED Started: 2015-06-26 17:47:38 Elapsed: 24mins, 33sec Tracking URL: ApplicationMaster Diagnostics: the hive.log is 
2015-06-26 18:12:26,878 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/06/26 18:12:26 main INFO org.apache.spark.deploy.yarn.Client Application report for application_1433328839160_0071 (state: RUNNING) 2015-06-26 18:12:27,879 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/06/26 18:12:27 main INFO org.apache.spark.deploy.yarn.Client Application report for application_1433328839160_0071 (state: RUNNING) 2015-06-26 18:12:28,880 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/06/26 18:12:28 main INFO org.apache.spark.deploy.yarn.Client Application report for application_1433328839160_0071 (state: RUNNING) ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11100) Beeline should escape semi-colon in queries
[ https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603184#comment-14603184 ] Xuefu Zhang commented on HIVE-11100: Can we find out how the CLI escapes? I feel a little uncomfortable with the way we are processing multiple command lines: splitting on ; and then manually fixing the escaping problem. Ideally, we should add a grammar that can handle this. Beeline should escape semi-colon in queries --- Key: HIVE-11100 URL: https://issues.apache.org/jira/browse/HIVE-11100 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0, 1.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-11100.patch Beeline should escape the semicolon in queries. For example, queries like the following: CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n'; or CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LINES TERMINATED BY '\n'; Both fail, but the 2nd query, with the semicolon escaped with \, works in the CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
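The concern above — splitting on ; and then patching up escapes — can be contrasted with a quote-aware scan. The following is a minimal hypothetical sketch (not Beeline's actual code; the class and method names are invented) of splitting a command line into statements while honoring single/double quotes and backslash escapes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical escape-aware statement splitter, instead of a plain split(";").
public class StatementSplitter {
    public static List<String> split(String line) {
        List<String> stmts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;          // current quote char, or 0 when outside quotes
        boolean escaped = false; // previous char was a backslash
        for (char c : line.toCharArray()) {
            if (escaped) { cur.append(c); escaped = false; continue; }
            if (c == '\\') { escaped = true; cur.append(c); continue; }
            if (quote != 0) {                  // inside a quoted region
                if (c == quote) quote = 0;
                cur.append(c);
            } else if (c == '\'' || c == '"') { // quote opens
                quote = c; cur.append(c);
            } else if (c == ';') {              // unquoted ; ends a statement
                if (cur.length() > 0) stmts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) stmts.add(cur.toString());
        return stmts;
    }
}
```

With this scan, FIELDS TERMINATED BY ';' no longer terminates the statement, because that semicolon is seen inside single quotes.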
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603194#comment-14603194 ] Xuefu Zhang commented on HIVE-11112: +1 ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding: CREATE TABLE person_lat1 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1'); 2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text: Müller,Thomas Jørgensen,Jørgen Peña,Andrés Nåm,Fæk 3. Execute SELECT * FROM person_lat1 Result - The following output appears: +---+--+ | person_lat1.name | +---+--+ | Müller,Thomas | | Jørgensen,Jørgen | | Peña,Andrésørgen | | Nåm,Fækdrésørgen | +---+--+ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11100) Beeline should escape semi-colon in queries
[ https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-11100: --- Attachment: HIVE-11100.patch For an unknown reason, the precommit test did not run. Reattaching the patch to kick off the build. Beeline should escape semi-colon in queries --- Key: HIVE-11100 URL: https://issues.apache.org/jira/browse/HIVE-11100 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0, 1.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-11100.patch Beeline should escape the semicolon in queries. For example, queries like the following: CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n'; or CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LINES TERMINATED BY '\n'; Both fail, but the 2nd query, with the semicolon escaped with \, works in the CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11100) Beeline should escape semi-colon in queries
[ https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602982#comment-14602982 ] Chaoyu Tang commented on HIVE-11100: The patch was uploaded to https://reviews.apache.org/r/35907/ and a review requested. Thanks in advance. Beeline should escape semi-colon in queries --- Key: HIVE-11100 URL: https://issues.apache.org/jira/browse/HIVE-11100 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0, 1.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-11100.patch Beeline should escape the semicolon in queries. For example, queries like the following: CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n'; or CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LINES TERMINATED BY '\n'; Both fail, but the 2nd query, with the semicolon escaped with \, works in the CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603259#comment-14603259 ] Aihua Xu commented on HIVE-10895: - [~thejas], [~xuefuz], [~ctang.ma], could you please review the code? ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
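As an illustration of the fix pattern for the leak above (a sketch only — the real code closes javax.jdo.Query objects via closeAll(); here an invented stand-in keeps the example runnable without a metastore), closing the query in try-with-resources guarantees the cursor is released even when execution throws:

```java
// Hypothetical stand-in for a JDO query whose cursor must always be closed.
public class QueryCloseDemo {
    public static int openQueries = 0; // tracks unclosed cursors for the demo

    public static class FakeQuery implements AutoCloseable {
        public FakeQuery() { openQueries++; }       // opening consumes a cursor
        public Object execute() { return "rows"; }
        @Override public void close() { openQueries--; } // releases the cursor
    }

    public static Object runQuery() {
        // try-with-resources calls close() even if execute() throws,
        // unlike the leaky pattern of closing only on the success path.
        try (FakeQuery q = new FakeQuery()) {
            return q.execute();
        }
    }
}
```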
[jira] [Updated] (HIVE-7598) Potential null pointer dereference in MergeTask#closeJob()
[ https://issues.apache.org/jira/browse/HIVE-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-7598: - Description: Call to Utilities.mvFileToFinalPath() passes null as second last parameter, conf. null gets passed to createEmptyBuckets() which dereferences conf directly: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} was: Call to Utilities.mvFileToFinalPath() passes null as second last parameter, conf. null gets passed to createEmptyBuckets() which dereferences conf directly: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} Potential null pointer dereference in MergeTask#closeJob() -- Key: HIVE-7598 URL: https://issues.apache.org/jira/browse/HIVE-7598 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Attachments: HIVE-7598.patch Call to Utilities.mvFileToFinalPath() passes null as second last parameter, conf. null gets passed to createEmptyBuckets() which dereferences conf directly: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
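A minimal sketch of the defensive fix (the class here is an invented stand-in for Hive's actual descriptor, not the real FileSinkDesc): guard the descriptor before dereferencing it so a null conf no longer causes an NPE.

```java
// Hypothetical stand-in for the descriptor that createEmptyBuckets() dereferences.
public class EmptyBucketsDemo {
    public static class FileSinkDesc {
        public boolean compressed;
        public boolean getCompressed() { return compressed; }
    }

    // Null-safe accessor: falls back to "not compressed" when conf is null,
    // instead of dereferencing it directly as in the reported code.
    public static boolean isCompressed(FileSinkDesc conf) {
        return conf != null && conf.getCompressed();
    }
}
```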
[jira] [Updated] (HIVE-7672) Potential resource leak in EximUtil#createExportDump()
[ https://issues.apache.org/jira/browse/HIVE-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-7672: - Description: Here is related code: {code} OutputStream out = fs.create(metadataPath); out.write(jsonContainer.toString().getBytes("UTF-8")); out.close(); {code} If out.write() throws an exception, out would be left unclosed. out.close() should be enclosed in a finally block. was: Here is related code: {code} OutputStream out = fs.create(metadataPath); out.write(jsonContainer.toString().getBytes("UTF-8")); out.close(); {code} If out.write() throws an exception, out would be left unclosed. out.close() should be enclosed in a finally block. Potential resource leak in EximUtil#createExportDump() -- Key: HIVE-7672 URL: https://issues.apache.org/jira/browse/HIVE-7672 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Attachments: HIVE-7672.patch Here is related code: {code} OutputStream out = fs.create(metadataPath); out.write(jsonContainer.toString().getBytes("UTF-8")); out.close(); {code} If out.write() throws an exception, out would be left unclosed. out.close() should be enclosed in a finally block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
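The suggested fix maps directly onto try-with-resources, which subsumes the explicit finally block. A sketch, with ByteArrayOutputStream standing in for fs.create(metadataPath) so it runs without HDFS:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExportDumpDemo {
    // Writes the JSON metadata; the stream is closed even if write() throws.
    public static byte[] writeMetadata(String json) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = sink) { // close() is guaranteed on every path
            out.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {       // ByteArrayOutputStream won't throw here
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```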
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603287#comment-14603287 ] Hive QA commented on HIVE-11055: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742081/HIVE-11055.2.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4396/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4396/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4396/ Messages: {noformat} This message was trimmed, see log for full details Downloaded: http://repo.maven.apache.org/maven2/org/abego/treelayout/org.abego.treelayout.core/1.0.1/org.abego.treelayout.core-1.0.1.pom (4 KB at 275.5 KB/sec) Downloaded: http://repo.maven.apache.org/maven2/org/abego/treelayout/org.abego.treelayout.core/1.0.1/org.abego.treelayout.core-1.0.1.jar (25 KB at 692.1 KB/sec) Downloaded: http://repo.maven.apache.org/maven2/org/antlr/antlr4-runtime/4.5/antlr4-runtime-4.5.jar (366 KB at 2536.6 KB/sec) [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-hplsql --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/hplsql (includes = [datanucleus.log, derby.log], excludes = []) [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-hplsql --- [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-hplsql --- [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-hplsql --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/hplsql/src/main/resources [INFO] Copying 3 resources [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-hplsql --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-hplsql --- [INFO] Compiling 30 source files to /data/hive-ptest/working/apache-github-source-source/hplsql/target/classes [ERROR] COMPILATION ERROR : [ERROR] /data/hive-ptest/working/apache-github-source-source/hplsql/src/main/java/org/apache/hive/hplsql/Copy.java:[292,38] cannot find symbol symbol: method resolvePath(org.apache.hadoop.fs.Path) location: variable fs of type
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.18.patch Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10754) new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog
[ https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603366#comment-14603366 ] Aihua Xu commented on HIVE-10754: - [~mithun] and [~ctang.ma], can you guys take a look? Should be straightforward. The failed test is not related. new Job() is deprecated. Replaced all with Job.getInstance() for Hcatalog - Key: HIVE-10754 URL: https://issues.apache.org/jira/browse/HIVE-10754 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10754.patch Replace all the deprecated new Job() with Job.getInstance() in HCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11128) Stats annotation should consider select star same as select without column list
[ https://issues.apache.org/jira/browse/HIVE-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603384#comment-14603384 ] Prasanth Jayachandran commented on HIVE-11128: -- I think this is not the actual issue. Select 1 from table is a valid query, although not a select * query, and it will contain an empty column list. But the output column expression map and output signature will have references to the constant, which will be taken into account during data size estimation. Stats annotation should consider select star same as select without column list --- Key: HIVE-11128 URL: https://issues.apache.org/jira/browse/HIVE-11128 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11128.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11119) Spark reduce vectorization doesnt account for scratch columns
[ https://issues.apache.org/jira/browse/HIVE-11119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603406#comment-14603406 ] Xuefu Zhang commented on HIVE-11119: Patch looks good to me. Just left a minor question on RB. Thanks. +1 Spark reduce vectorization doesnt account for scratch columns - Key: HIVE-11119 URL: https://issues.apache.org/jira/browse/HIVE-11119 Project: Hive Issue Type: Bug Components: Spark, Vectorization Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11119.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11119) Spark reduce vectorization doesnt account for scratch columns
[ https://issues.apache.org/jira/browse/HIVE-11119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603417#comment-14603417 ] Ashutosh Chauhan commented on HIVE-11119: - yeah.. will move to util class Spark reduce vectorization doesnt account for scratch columns - Key: HIVE-11119 URL: https://issues.apache.org/jira/browse/HIVE-11119 Project: Hive Issue Type: Bug Components: Spark, Vectorization Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11119.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10895: Attachment: (was: HIVE-10895.3.patch) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603272#comment-14603272 ] Hive QA commented on HIVE-11095: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742079/HIVE-11095.2.patch.txt {color:green}SUCCESS:{color} +1 9025 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4395/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4395/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4395/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742079 - PreCommit-HIVE-TRUNK-Build SerDeUtils another bug ,when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 1.2.0 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt {noformat} The method transformTextFromUTF8 has a bug: it invokes a bad method of Text, getBytes(). The getBytes() method of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data. But copyBytes() was only added after hadoop1. {noformat} How I found this bug? 
When I query data from an LZO table, I found in the results that the length of the current row is always larger than the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL query, {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row's content contains the first row's content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is below, just 2 rows. {noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by Text reuse, and I found the solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
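The getBytes()/copyBytes() contract described above can be made concrete with a Hadoop-free stand-in for Text (an illustration only, not Hive's code): the raw backing array may be longer than the valid data, so callers must either honor the length field or take a trimmed copy.

```java
import java.util.Arrays;

// Stand-in mimicking org.apache.hadoop.io.Text's reused, grow-only buffer.
public class TextLikeBuffer {
    public byte[] bytes = new byte[0];
    public int length;

    public void set(byte[] data) {
        // backing array only grows; shorter data leaves stale tail bytes
        if (data.length > bytes.length) bytes = Arrays.copyOf(data, data.length);
        else System.arraycopy(data, 0, bytes, 0, data.length);
        length = data.length;
    }

    // Like Text.getBytes(): the raw array, possibly longer than the data.
    public byte[] getBytes() { return bytes; }

    // Like Text.copyBytes(): an array of exactly `length` valid bytes.
    public byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
}
```

Code that passes getBytes() to a decoder without also passing length reads the stale tail, which is exactly the row-bleeding shown in the query output above.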
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.17.patch Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10950) Unit test against HBase Metastore
[ https://issues.apache.org/jira/browse/HIVE-10950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-10950: -- Attachment: HIVE-10950-2.patch The previous patch cannot recover from failures, so it can only run an individual qtest, not the whole TestCliDriver. Attaching a new patch that solves the problem: it brings back a clean snapshot of the HBase metastore after every test. Unit test against HBase Metastore - Key: HIVE-10950 URL: https://issues.apache.org/jira/browse/HIVE-10950 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Daniel Dai Assignee: Daniel Dai Fix For: hbase-metastore-branch Attachments: HIVE-10950-1.patch, HIVE-10950-2.patch We need to run the entire Hive UT suite against the HBase Metastore and make sure the tests pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10376) Move code to create jar for ivydownload.q to a separate id in maven ant-run-plugin in itests/pom.xml and remove sed dependency.
[ https://issues.apache.org/jira/browse/HIVE-10376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603296#comment-14603296 ] Hive QA commented on HIVE-10376: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12726129/HIVE-10376.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4397/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4397/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4397/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4397/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ 
-z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 2a77e87 HIVE-11051: Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object; (Matt McCline via Gopal V) + git clean -f -d Removing bin/ext/hplsql.sh Removing bin/hplsql Removing bin/hplsql.cmd Removing hplsql/ + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at 2a77e87 HIVE-11051: Hive 1.2.0 MapJoin w/Tez - LazyBinaryArray cannot be cast to [Ljava.lang.Object; (Matt McCline via Gopal V) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12726129 - PreCommit-HIVE-TRUNK-Build Move code to create jar for ivydownload.q to a separate id in maven ant-run-plugin in itests/pom.xml and remove sed dependency. --- Key: HIVE-10376 URL: https://issues.apache.org/jira/browse/HIVE-10376 Project: Hive Issue Type: Improvement Reporter: Anant Nag Assignee: Anant Nag Attachments: HIVE-10376.patch Currently the code to create an example jar for ivyDownload.q is piggybanked on the download-spark ant-run-plugin id. This code should be moved to a separate execution id called something like create-ivytest-jar or more generally itests-setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11128) Stats annotation should consider select star same as select without column list
[ https://issues.apache.org/jira/browse/HIVE-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11128: Attachment: HIVE-11128.patch Stats annotation should consider select star same as select without column list --- Key: HIVE-11128 URL: https://issues.apache.org/jira/browse/HIVE-11128 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11128.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11104) Select operator doesn't propagate constants appearing in expressions
[ https://issues.apache.org/jira/browse/HIVE-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603307#comment-14603307 ] Ashutosh Chauhan commented on HIVE-11104: - [~prasanth_j] It's an unrelated issue which exists in StatsAnnotation rules. Opened HIVE-11128 for it. Select operator doesn't propagate constants appearing in expressions Key: HIVE-11104 URL: https://issues.apache.org/jira/browse/HIVE-11104 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11104.2.patch, HIVE-11104.3.patch, HIVE-11104.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11104) Select operator doesn't propagate constants appearing in expressions
[ https://issues.apache.org/jira/browse/HIVE-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603205#comment-14603205 ] Ashutosh Chauhan commented on HIVE-11104: - [~prasanth_j] Can you please take a look? Select operator doesn't propagate constants appearing in expressions Key: HIVE-11104 URL: https://issues.apache.org/jira/browse/HIVE-11104 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11104.2.patch, HIVE-11104.3.patch, HIVE-11104.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10895: Attachment: HIVE-10895.3.patch ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7305) Return value from in.read() is ignored in SerializationUtils#readLongLE()
[ https://issues.apache.org/jira/browse/HIVE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-7305: - Description: {code} long readLongLE(InputStream in) throws IOException { in.read(readBuffer, 0, 8); return (((readBuffer[0] & 0xff) << 0) + ((readBuffer[1] & 0xff) << 8) {code} Return value from read() may indicate fewer than 8 bytes read. The return value should be checked. was: {code} long readLongLE(InputStream in) throws IOException { in.read(readBuffer, 0, 8); return (((readBuffer[0] & 0xff) << 0) + ((readBuffer[1] & 0xff) << 8) {code} Return value from read() may indicate fewer than 8 bytes read. The return value should be checked. Return value from in.read() is ignored in SerializationUtils#readLongLE() - Key: HIVE-7305 URL: https://issues.apache.org/jira/browse/HIVE-7305 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor Attachments: HIVE-7305_001.patch {code} long readLongLE(InputStream in) throws IOException { in.read(readBuffer, 0, 8); return (((readBuffer[0] & 0xff) << 0) + ((readBuffer[1] & 0xff) << 8) {code} Return value from read() may indicate fewer than 8 bytes read. The return value should be checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
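The defect above is the classic short-read bug: InputStream.read may return fewer than the requested 8 bytes, leaving the remainder of readBuffer stale. A minimal sketch of the usual fix, looping until the buffer is full (hypothetical class and helper names, not the actual HIVE-7305 patch):

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyExample {
    // Loop until all requested bytes arrive instead of ignoring read()'s return value.
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n < 0) {
                throw new EOFException("stream ended with " + len + " bytes still expected");
            }
            off += n;
            len -= n;
        }
    }

    // Little-endian long: buf[0] is the least significant byte.
    static long readLongLE(InputStream in) throws IOException {
        byte[] b = new byte[8];
        readFully(in, b, 0, 8);
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (b[i] & 0xff);
        }
        return v;
    }

    public static void main(String[] args) throws IOException {
        byte[] le = {1, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(readLongLE(new ByteArrayInputStream(le))); // prints 1
    }
}
```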
[jira] [Commented] (HIVE-11104) Select operator doesn't propagate constants appearing in expressions
[ https://issues.apache.org/jira/browse/HIVE-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603284#comment-14603284 ] Prasanth Jayachandran commented on HIVE-11104: -- The stats diff does not look correct. All data sizes are now 0 which will break all join optimizations. Select operator doesn't propagate constants appearing in expressions Key: HIVE-11104 URL: https://issues.apache.org/jira/browse/HIVE-11104 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11104.2.patch, HIVE-11104.3.patch, HIVE-11104.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11112) ISO-8859-1 text output has fragments of previous longer rows appended
[ https://issues.apache.org/jira/browse/HIVE-11112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603301#comment-14603301 ] Yongzhi Chen commented on HIVE-11112: - Thanks [~xuefuz] for reviewing it. ISO-8859-1 text output has fragments of previous longer rows appended - Key: HIVE-11112 URL: https://issues.apache.org/jira/browse/HIVE-11112 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-11112.1.patch If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string. Example steps to reproduce: 1. Create a table using ISO 8859-1 encoding: CREATE TABLE person_lat1 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1'); 2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text: Müller,Thomas Jørgensen,Jørgen Peña,Andrés Nåm,Fæk 3. Execute SELECT * FROM person_lat1 Result - The following output appears:
+--------------------+
| person_lat1.name   |
+--------------------+
| Müller,Thomas      |
| Jørgensen,Jørgen   |
| Peña,Andrésørgen   |
| Nåm,Fækdrésørgen   |
+--------------------+
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11118) Load data query should valide file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-11118: -- Attachment: HIVE-11118.4.patch Load data query should valide file formats with destination tables -- Key: HIVE-11118 URL: https://issues.apache.org/jira/browse/HIVE-11118 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, HIVE-11118.4.patch, HIVE-11118.patch Load data local inpath queries does not do any validation wrt file format. If the destination table is ORC and if we try to load files that are not ORC, the load will succeed but querying such tables will result in runtime exceptions. We can do some simple sanity checks to prevent loading of files that does not match the destination table file format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11104) Select operator doesn't propagate constants appearing in expressions
[ https://issues.apache.org/jira/browse/HIVE-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603378#comment-14603378 ] Prasanth Jayachandran commented on HIVE-11104: -- I think the issue seems to be with the column expression map not containing ExprNodeConstantDesc. Stats annotation is aware of constant projections if they are contained in colExprMap. That being said, I am fine with taking this in a subsequent follow-up jira. +1 Select operator doesn't propagate constants appearing in expressions Key: HIVE-11104 URL: https://issues.apache.org/jira/browse/HIVE-11104 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11104.2.patch, HIVE-11104.3.patch, HIVE-11104.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11032) Enable more tests for grouping by skewed data [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603622#comment-14603622 ] Xuefu Zhang commented on HIVE-11032: [~mohitsabharwal], could you please create a JIRA tracking the missing feature of hive.explain.user and related tests? Thanks. Enable more tests for grouping by skewed data [Spark Branch] Key: HIVE-11032 URL: https://issues.apache.org/jira/browse/HIVE-11032 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Mohit Sabharwal Priority: Minor Attachments: HIVE-11032.1-spark.patch, HIVE-11032.2-spark.patch Not all of such tests are enabled, e.g. {{groupby1_map_skew.q}}. We can use this JIRA to track whether we need more of them. Basically, we need to look at all tests with {{set hive.groupby.skewindata=true;}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.20.patch Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603439#comment-14603439 ] Prasanth Jayachandran commented on HIVE-11031: -- It could be because of this or HIVE-10685. Can you try with branch-1.2 and see if it works for your query? Alternatively you can provide me a small repro. I can verify and confirm. ORC concatenation of old files can fail while merging column statistics --- Key: HIVE-11031 URL: https://issues.apache.org/jira/browse/HIVE-11031 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Fix For: 1.2.1 Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch, HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types like decimal, date, timestamp etc. But column statistics merging assumes column statistics exists for these types and invokes merge. For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without doing instanceof check. If the ORC file contains time stamp column statistics then this will work else it will throw ClassCastException. Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603458#comment-14603458 ] Hive QA commented on HIVE-10983: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742100/HIVE-10983.5.patch.txt {color:green}SUCCESS:{color} +1 9025 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4398/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4398/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4398/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742100 - PreCommit-HIVE-TRUNK-Build SerDeUtils bug ,when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1, 1.2.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt {noformat} The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they invoke a bad method of Text, getBytes()! The getBytes method of Text returns the raw bytes; however, only data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data. But copyBytes was only added after hadoop1.
{noformat} When I query data from an LZO table, I found in the results that the length of the current row is always larger than the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL query: {code:sql} select * from web_searchhub where logdate=2015061003 {code} The result of the SQL is shown below. Notice that the second row's content contains the first row's content. {noformat} INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 {noformat} The content of the original LZO file is shown below, just 2 rows. {noformat} INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 {noformat} I think this error is caused by the Text reuse, and I found the solution. Additionally, the table create SQL is: {code:sql} CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
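The root cause described above can be shown without Hadoop: Text keeps a reusable backing array that grows but never shrinks, so reading the whole array (what getBytes() hands back) instead of only the first getLength() bytes leaks the tail of a previous, longer record. A minimal plain-Java sketch of the hazard and of the copyBytes()-style fix (hypothetical names; Hadoop's Text is simulated with a byte array):

```java
import java.util.Arrays;

public class ReusedBufferExample {
    public static void main(String[] args) {
        // The backing array still holds a previous, longer record...
        byte[] backing = "session=3151,thread=254".getBytes();
        // ...and a shorter record is then written into its front.
        byte[] nextRow = "session=901".getBytes();
        System.arraycopy(nextRow, 0, backing, 0, nextRow.length);
        int validLength = nextRow.length; // what Text.getLength() would report

        // Bug: decoding the whole raw array, like misusing getBytes().
        String wrong = new String(backing);
        // Fix: copy only the valid prefix, which is what copyBytes() does.
        String right = new String(Arrays.copyOf(backing, validLength));

        System.out.println(wrong); // "session=9011,thread=254" - stale tail leaks in
        System.out.println(right); // "session=901"
    }
}
```

This reproduces exactly the symptom in the report: the short row comes back with the suffix of the longer row that preceded it.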
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603489#comment-14603489 ] Dmitry Tolpeko commented on HIVE-11055: --- I will need to compile for hadoop-1 and try to find a replacement for the method resolvePath(org.apache.hadoop.fs.Path), which is available in hadoop-2 only. Any hints on how to deal with such cases? Thanks. HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under the HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11130) Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object
[ https://issues.apache.org/jira/browse/HIVE-11130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11130: Attachment: HIVE-11130.patch Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object Key: HIVE-11130 URL: https://issues.apache.org/jira/browse/HIVE-11130 Project: Hive Issue Type: Sub-task Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11130.patch This is just a refactoring step which keeps the current logic, but it exposes the explicit lock/unlock table and database in HiveTxnManager which should be implemented differently by the subclasses ( currently it's not. e.g., for ZooKeeper implementation, we should lock table and database when we try to lock the table). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11104) Select operator doesn't propagate constants appearing in expressions
[ https://issues.apache.org/jira/browse/HIVE-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603549#comment-14603549 ] Ashutosh Chauhan commented on HIVE-11104: - Pushed to branch-1 as well. Select operator doesn't propagate constants appearing in expressions Key: HIVE-11104 URL: https://issues.apache.org/jira/browse/HIVE-11104 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 2.0.0 Attachments: HIVE-11104.2.patch, HIVE-11104.3.patch, HIVE-11104.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11100) Beeline should escape semi-colon in queries
[ https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603561#comment-14603561 ] Hive QA commented on HIVE-11100: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742137/HIVE-11100.patch {color:green}SUCCESS:{color} +1 9032 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4399/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4399/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4399/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742137 - PreCommit-HIVE-TRUNK-Build Beeline should escape semi-colon in queries --- Key: HIVE-11100 URL: https://issues.apache.org/jira/browse/HIVE-11100 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 1.2.0, 1.1.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-11100.patch Beeline should escape the semicolon in queries. For example, queries like the following: CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\n'; or CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;' LINES TERMINATED BY '\n'; both fail. But the 2nd query, with the semicolon escaped with \, works in the CLI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11104) Select operator doesn't propagate constants appearing in expressions
[ https://issues.apache.org/jira/browse/HIVE-11104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603468#comment-14603468 ] Gunther Hagleitner commented on HIVE-11104: --- [~ashutoshc] branch-1? Select operator doesn't propagate constants appearing in expressions Key: HIVE-11104 URL: https://issues.apache.org/jira/browse/HIVE-11104 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 2.0.0 Attachments: HIVE-11104.2.patch, HIVE-11104.3.patch, HIVE-11104.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.19.patch Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603478#comment-14603478 ] Gunther Hagleitner commented on HIVE-10233: --- .19 has more reported fixes (fallback in case of all joins are small, actually making fallback work...) Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11129) Issue a warning when copied from UTF-8 to ISO 8859-1
[ https://issues.apache.org/jira/browse/HIVE-11129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-11129: --- Assignee: Aihua Xu Issue a warning when copied from UTF-8 to ISO 8859-1 Key: HIVE-11129 URL: https://issues.apache.org/jira/browse/HIVE-11129 Project: Hive Issue Type: Bug Components: File Formats Reporter: Aihua Xu Assignee: Aihua Xu Copying data from a table using UTF-8 encoding to one using ISO 8859-1 encoding causes data corruption without warning. {noformat} CREATE TABLE person_utf8 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF8'); {noformat} Put the following data in the table: Müller,Thomas Jørgensen,Jørgen Vega,Andrés 中村,浩人 אביה,נועם {noformat} CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1') AS select * from person_utf8; {noformat} expected to get mangled data but we should give a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
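The warning proposed above needs a way to detect that a value read as UTF-8 is not representable in the target encoding. A minimal sketch of one such check using the JDK's CharsetEncoder (illustrative only, not the mechanism of any HIVE-11129 patch; the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Returns true if every character of value is representable in the target
    // charset; false means the round-trip would mangle the data and a warning
    // (or error) should be raised before writing.
    static boolean fitsIn(String value, Charset target) {
        CharsetEncoder enc = target.newEncoder();
        return enc.canEncode(value);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.println(fitsIn("Müller,Thomas", latin1)); // true - Latin-1 covers ü
        System.out.println(fitsIn("中村,浩人", latin1));       // false - would be corrupted
    }
}
```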
[jira] [Commented] (HIVE-11032) Enable more tests for grouping by skewed data [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603481#comment-14603481 ] Mohit Sabharwal commented on HIVE-11032: Thanks [~lirui], yes verified that query plan is in line with what we see in MR. When {{hive.groupby.skewindata=true}} is set, unless there is a distinct clause, the Reduce Output Operator partitions based on {{rand()}}. (The subsequent Reducer then does partial aggregation and the following reducer does final aggregation.) I also verified the behavior for other cases as well, for example when {{hive.map.aggr=true}} is set in addition to {{hive.groupby.skewindata=true}} as documented here: https://cwiki.apache.org/confluence/display/Hive/GroupByWithRollup The {{index_bitmap3}} test failure is unrelated to this patch. Enable more tests for grouping by skewed data [Spark Branch] Key: HIVE-11032 URL: https://issues.apache.org/jira/browse/HIVE-11032 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Mohit Sabharwal Priority: Minor Attachments: HIVE-11032.1-spark.patch, HIVE-11032.2-spark.patch Not all of such tests are enabled, e.g. {{groupby1_map_skew.q}}. We can use this JIRA to track whether we need more of them. Basically, we need to look at all tests with {{set hive.groupby.skewindata=true;}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11106) HiveServer2 JDBC (greater than v0.13.1) cannot connect to non-default database
[ https://issues.apache.org/jira/browse/HIVE-11106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Coleman updated HIVE-11106: --- Description: Using HiveServer 0.14.0 or greater, I cannot connect to a non-default database. For example, when connecting to HiveServer via the following URL, the session uses the 'default' database instead of the intended database. jdbc://localhost:10000/customDb This exact issue was fixed in 0.13.1 of HiveServer in https://issues.apache.org/jira/browse/HIVE-5904 but for some reason this fix was not ported to v0.14.0 or greater. From looking at the source, it looks as if this fix was overridden by another change to the HiveConnection class; was this intentional, or a defect reintroduced by another defect fix? This means that we need to use 0.13.1 in order to connect to a non-default database via JDBC and we cannot upgrade Hive versions. We don't want to place a JDBC interceptor that injects 'use customDb' each time a connection is borrowed from the pool in production code. One should be able to connect straight to the non-default database via the JDBC URL. Now, it could perhaps be a simple oversight on my behalf in that the syntax to connect to a non-default database has changed from 0.14.0 onwards, but I'd be grateful if this could be confirmed. was: Using HiveServer 0.14.0 or greater, I cannot connect to a non-default database. For example, when connecting to HiveServer via the following URL, the session uses the 'default' database instead of the intended database. jdbc://localhost:10000/customDb This exact issue was fixed in 0.13.1 of HiveServer in https://issues.apache.org/jira/browse/HIVE-5904 but for some reason this fix was not ported to v0.14.0 or greater. From looking at the source, it looks as if this fix was overridden by another change to the HiveConnection class; was this intentional, or a defect reintroduced by another defect fix?
This means that we need to use 0.13.1 in order to connect to a non-default database via JDBC and we cannot upgrade Hive versions. Now, it could perhaps be a simple oversight on my behalf in that the syntax to connect to a non-default database has changed from 0.14.0 onwards, but I'd be grateful if this could be confirmed. HiveServer2 JDBC (greater than v0.13.1) cannot connect to non-default database -- Key: HIVE-11106 URL: https://issues.apache.org/jira/browse/HIVE-11106 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.14.0 Reporter: Tom Coleman Using HiveServer 0.14.0 or greater, I cannot connect to a non-default database. For example, when connecting to HiveServer via the following URL, the session uses the 'default' database instead of the intended database. jdbc://localhost:10000/customDb This exact issue was fixed in 0.13.1 of HiveServer in https://issues.apache.org/jira/browse/HIVE-5904 but for some reason this fix was not ported to v0.14.0 or greater. From looking at the source, it looks as if this fix was overridden by another change to the HiveConnection class; was this intentional, or a defect reintroduced by another defect fix? This means that we need to use 0.13.1 in order to connect to a non-default database via JDBC and we cannot upgrade Hive versions. We don't want to place a JDBC interceptor that injects 'use customDb' each time a connection is borrowed from the pool in production code. One should be able to connect straight to the non-default database via the JDBC URL. Now, it could perhaps be a simple oversight on my behalf in that the syntax to connect to a non-default database has changed from 0.14.0 onwards, but I'd be grateful if this could be confirmed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602637#comment-14602637 ] Chengxiang Li commented on HIVE-10983: -- Great, [~xiaowei], let's wait for the unit test result. Besides, could you also test it with your own test case? SerDeUtils bug, when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1, 1.2.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt
{noformat}
The method transformTextToUTF8 has a bug: it invokes a bad method of Text, getBytes(). getBytes() returns the raw backing bytes, but only the data up to Text.length is valid. A better way is to use copyBytes() if you need the returned array to be precisely the length of the data; however, copyBytes() was only added after hadoop1.
{noformat}
When I query data from an LZO table, I see in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute the SQL
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
and the result is shown below. Notice that the second row's content contains the first row's content.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is below, just 2 rows:
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by the Text reuse, and I found the solution. Additionally, the table create SQL is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat
  OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat;
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub'
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
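The getBytes()/copyBytes() hazard described above can be reproduced without Hadoop on the classpath. The sketch below uses a stand-in `ReusedText` class (hypothetical; it mimics only the grow-but-never-shrink buffer behavior of `org.apache.hadoop.io.Text`, not Hive's or Hadoop's actual code):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for org.apache.hadoop.io.Text reuse semantics:
// the backing array grows but never shrinks, so getBytes() may expose
// stale bytes left over from a previous, longer record.
class ReusedText {
    private byte[] buf = new byte[0];
    private int length = 0;

    void set(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        if (b.length > buf.length) {
            buf = new byte[b.length]; // grow only, like Text's internal buffer
        }
        System.arraycopy(b, 0, buf, 0, b.length);
        length = b.length;
    }

    byte[] getBytes() { return buf; }                         // raw backing array
    byte[] copyBytes() { return Arrays.copyOf(buf, length); } // exactly 'length' bytes
    int getLength() { return length; }
}

public class TextReuseDemo {
    // Buggy decode: trusts getBytes().length, leaking the previous row's tail.
    static String wrongDecode(ReusedText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    // Correct decode: honors the valid length.
    static String rightDecode(ReusedText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusedText t = new ReusedText();
        t.set("a-long-first-row"); // 16 bytes fill the buffer
        t.set("row2");             // 4 bytes; 12 stale bytes remain past length
        if (!"row2ng-first-row".equals(wrongDecode(t))) throw new AssertionError();
        if (!"row2".equals(rightDecode(t))) throw new AssertionError();
        if (!"row2".equals(new String(t.copyBytes(), StandardCharsets.UTF_8))) throw new AssertionError();
    }
}
```

Decoding with getBytes() plus getLength(), or copying via copyBytes(), avoids the stale tail — which matches the fix direction the reporter describes.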
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602645#comment-14602645 ] Dmitry Tolpeko commented on HIVE-11055: --- Correction: plhql-site.xml should be hplsql-site.xml. HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually for any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under the HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch contributing the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-11118: -- Summary: Load data query should validate file formats with destination tables (was: Load data query should valide file formats with destination tables) Load data query should validate file formats with destination tables Key: HIVE-11118 URL: https://issues.apache.org/jira/browse/HIVE-11118 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, HIVE-11118.4.patch, HIVE-11118.patch Load data local inpath queries do not do any validation with respect to file format. If the destination table is ORC and we try to load files that are not ORC, the load will succeed, but querying such tables will result in runtime exceptions. We can do some simple sanity checks to prevent loading files that do not match the destination table's file format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
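One cheap form of the sanity check proposed above: ORC files begin with the magic bytes "ORC", so a pre-load check can reject obviously mismatched files before they reach an ORC table. This is a hedged sketch, not Hive's actual implementation; the helper names `hasOrcMagic` and `looksLikeOrc` are invented for illustration, and a real check would also have to handle other destination formats:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OrcMagicCheck {
    private static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    // Pure check: do the first bytes look like an ORC header?
    static boolean hasOrcMagic(byte[] head) {
        if (head.length < ORC_MAGIC.length) return false;
        for (int i = 0; i < ORC_MAGIC.length; i++) {
            if (head[i] != ORC_MAGIC[i]) return false;
        }
        return true;
    }

    // Cheap pre-LOAD sanity check on a candidate file: read only the first 3 bytes.
    static boolean looksLikeOrc(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[ORC_MAGIC.length];
            int n = in.readNBytes(head, 0, head.length);
            return n == head.length && hasOrcMagic(head);
        }
    }

    public static void main(String[] args) throws IOException {
        Path orcish = Files.createTempFile("demo", ".orc");
        Files.write(orcish, "ORC...pretend-payload".getBytes(StandardCharsets.US_ASCII));
        Path textFile = Files.createTempFile("demo", ".txt");
        Files.write(textFile, "1,aa,bb".getBytes(StandardCharsets.US_ASCII));
        if (!looksLikeOrc(orcish)) throw new AssertionError();
        if (looksLikeOrc(textFile)) throw new AssertionError();
    }
}
```

The point is that the check costs one small read per file, which is negligible next to the runtime exceptions it prevents at query time.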
[jira] [Updated] (HIVE-11128) Stats annotation should consider select star same as select without column list
[ https://issues.apache.org/jira/browse/HIVE-11128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11128: Attachment: HIVE-11128.2.patch Stats annotation should consider select star same as select without column list --- Key: HIVE-11128 URL: https://issues.apache.org/jira/browse/HIVE-11128 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11128.2.patch, HIVE-11128.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11118) Load data query should validate file formats with destination tables
[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603885#comment-14603885 ] Hive QA commented on HIVE-11118: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742179/HIVE-11118.4.patch {color:green}SUCCESS:{color} +1 9030 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4402/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4402/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4402/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742179 - PreCommit-HIVE-TRUNK-Build Load data query should validate file formats with destination tables Key: HIVE-11118 URL: https://issues.apache.org/jira/browse/HIVE-11118 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch, HIVE-11118.4.patch, HIVE-11118.patch Load data local inpath queries do not do any validation with respect to file format. If the destination table is ORC and we try to load files that are not ORC, the load will succeed, but querying such tables will result in runtime exceptions. We can do some simple sanity checks to prevent loading files that do not match the destination table's file format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11133) Support hive.explain.user for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-11133: --- Summary: Support hive.explain.user for Spark [Spark Branch] (was: Support hive.explain.user for Spark) Support hive.explain.user for Spark [Spark Branch] -- Key: HIVE-11133 URL: https://issues.apache.org/jira/browse/HIVE-11133 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Mohit Sabharwal User friendly explain output ({{set hive.explain.user=true}}) should support Spark as well. Once supported, we should also enable related q-tests like {{explainuser_1.q}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603901#comment-14603901 ] Prasanth Jayachandran commented on HIVE-10233: -- Looks good to me too. Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch, HIVE-10233.22.patch, HIVE-10233.23.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.22.patch Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch, HIVE-10233.22.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11032) Enable more tests for grouping by skewed data [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603814#comment-14603814 ] Mohit Sabharwal commented on HIVE-11032: Created HIVE-11133 to support {{hive.explain.user}} for Spark. Enable more tests for grouping by skewed data [Spark Branch] Key: HIVE-11032 URL: https://issues.apache.org/jira/browse/HIVE-11032 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Mohit Sabharwal Priority: Minor Attachments: HIVE-11032.1-spark.patch, HIVE-11032.2-spark.patch Not all such tests are enabled, e.g. {{groupby1_map_skew.q}}. We can use this JIRA to track whether we need more of them. Basically, we need to look at all tests with {{set hive.groupby.skewindata=true;}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11028) Tez: table self join and join with another table fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-11028: -- Fix Version/s: 1.2.2 Tez: table self join and join with another table fails with IndexOutOfBoundsException - Key: HIVE-11028 URL: https://issues.apache.org/jira/browse/HIVE-11028 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Jason Dere Assignee: Jason Dere Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11028.1.patch, HIVE-11028.2.patch, HIVE-11028.3.patch
{noformat}
create table tez_self_join1(id1 int, id2 string, id3 string);
insert into table tez_self_join1 values(1, 'aa','bb'), (2, 'ab','ab'), (3,'ba','ba');
create table tez_self_join2(id1 int);
insert into table tez_self_join2 values(1),(2),(3);
explain
select s.id2, s.id3
from (
  select self1.id1, self1.id2, self1.id3
  from tez_self_join1 self1 join tez_self_join1 self2 on self1.id2=self2.id3
) s
join tez_self_join2 on s.id1=tez_self_join2.id1
where s.id2='ab';
{noformat}
fails with error:
{noformat}
2015-06-16 15:41:55,759 ERROR [main]: ql.Driver (SessionState.java:printError(979)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Reducer 3, vertexId=vertex_1434494327112_0002_4_04, diagnostics=[Task failed, taskId=task_1434494327112_0002_4_04_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
	at java.util.ArrayList.get(ArrayList.java:411)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:118)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:109)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:290)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:275)
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:175)
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:313)
	at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:71)
	at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeOp(CommonMergeJoinOperator.java:99)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:146)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
	... 13 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603817#comment-14603817 ] Hive QA commented on HIVE-10233: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742236/HIVE-10233.21.patch {color:green}SUCCESS:{color} +1 9027 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4401/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4401/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4401/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742236 - PreCommit-HIVE-TRUNK-Build Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch, HIVE-10233.22.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.23.patch Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch, HIVE-10233.22.patch, HIVE-10233.23.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.21.patch Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11131) Get row information on DataWritableWriter once for better writing performance
[ https://issues.apache.org/jira/browse/HIVE-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-11131: --- Attachment: HIVE-11131.1.patch Get row information on DataWritableWriter once for better writing performance - Key: HIVE-11131 URL: https://issues.apache.org/jira/browse/HIVE-11131 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11131.1.patch DataWritableWriter is a class used to write Hive records to Parquet files. This class is getting all the information about how to parse a record, such as schema and object inspector, every time a record is written (or write() is called). We can make this class perform better by initializing some writers per data type once, and saving all object inspectors on each writer. The class expects that the next records written will have the same object inspectors and schema, so there is no need to have conditions for that. When a new schema is written, DataWritableWriter is created again by Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
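The optimization HIVE-11131 describes — resolve the per-type writer once instead of re-inspecting the schema and object inspector on every write() — can be sketched generically. The names below (`ValueWriter`, `writerFor`) are invented for illustration and are not the actual DataWritableWriter API:

```java
// Sketch of the one-time-dispatch optimization: pick the per-type writer
// once, at schema-resolution time, instead of re-checking the type on
// every record written.
public class CachedWriterDemo {
    interface ValueWriter { String write(Object v); }

    // Resolved once per schema; reused for every record thereafter.
    static ValueWriter writerFor(Class<?> type) {
        if (type == Integer.class) return v -> "int:" + v;
        if (type == String.class)  return v -> "str:" + v;
        throw new IllegalArgumentException("unsupported type " + type);
    }

    public static void main(String[] args) {
        ValueWriter intWriter = writerFor(Integer.class); // one-time dispatch
        // Hot loop: no per-record type checks, mirroring the patch's intent.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 3; i++) out.append(intWriter.write(i)).append(';');
        if (!"int:0;int:1;int:2;".equals(out.toString())) throw new AssertionError();
    }
}
```

This also matches the issue's stated assumption: since Parquet recreates the writer whenever the schema changes, the cached dispatch never goes stale.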
[jira] [Updated] (HIVE-11131) Get row information on DataWritableWriter once for better writing performance
[ https://issues.apache.org/jira/browse/HIVE-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-11131: --- Attachment: (was: HIVE-11131.1.patch) Get row information on DataWritableWriter once for better writing performance - Key: HIVE-11131 URL: https://issues.apache.org/jira/browse/HIVE-11131 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11131.1.patch DataWritableWriter is a class used to write Hive records to Parquet files. This class is getting all the information about how to parse a record, such as schema and object inspector, every time a record is written (or write() is called). We can make this class perform better by initializing some writers per data type once, and saving all object inspectors on each writer. The class expects that the next records written will have the same object inspectors and schema, so there is no need to have conditions for that. When a new schema is written, DataWritableWriter is created again by Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603852#comment-14603852 ] Vikram Dixit K commented on HIVE-10233: --- +1 LGTM. Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch, HIVE-10233.15.patch, HIVE-10233.16.patch, HIVE-10233.17.patch, HIVE-10233.18.patch, HIVE-10233.19.patch, HIVE-10233.20.patch, HIVE-10233.21.patch, HIVE-10233.22.patch, HIVE-10233.23.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8177) Wrong parameter order in ExplainTask#getJSONLogicalPlan()
[ https://issues.apache.org/jira/browse/HIVE-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8177: - Description: {code} JSONObject jsonPlan = outputMap(work.getParseContext().getTopOps(), true, out, jsonOutput, work.getExtended(), 0); {code} The order of the 4th and 5th parameters is reversed. was: {code} JSONObject jsonPlan = outputMap(work.getParseContext().getTopOps(), true, out, jsonOutput, work.getExtended(), 0); {code} The order of the 4th and 5th parameters is reversed. Wrong parameter order in ExplainTask#getJSONLogicalPlan() - Key: HIVE-8177 URL: https://issues.apache.org/jira/browse/HIVE-8177 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: SUYEON LEE Priority: Minor Attachments: HIVE-8177.patch {code} JSONObject jsonPlan = outputMap(work.getParseContext().getTopOps(), true, out, jsonOutput, work.getExtended(), 0); {code} The order of the 4th and 5th parameters is reversed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8342) Potential null dereference in ColumnTruncateMapper#jobClose()
[ https://issues.apache.org/jira/browse/HIVE-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8342: - Description: {code} Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter); {code} Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is dereferenced: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} was: {code} Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter); {code} Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is dereferenced: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} Potential null dereference in ColumnTruncateMapper#jobClose() - Key: HIVE-8342 URL: https://issues.apache.org/jira/browse/HIVE-8342 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor Attachments: HIVE-8342_001.patch, HIVE-8342_002.patch {code} Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null, reporter); {code} Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is dereferenced: {code} boolean isCompressed = conf.getCompressed(); TableDesc tableInfo = conf.getTableInfo(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8343: - Description: In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked: if false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html was: In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked: if false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner Key: HIVE-8343 URL: https://issues.apache.org/jira/browse/HIVE-8343 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: JongWon Park Priority: Minor Attachments: HIVE-8343.patch In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked: if false is returned, the event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
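The hazard HIVE-8343 points at is easy to demonstrate with a bounded queue (the pruner's queue may well be unbounded, in which case offer() cannot fail, but the contract below is why the return value deserves a check): LinkedBlockingQueue.offer() returns false on a full queue rather than throwing, so an unchecked call silently drops the element.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OfferDemo {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1); // capacity 1

        boolean first = queue.offer("event-1");  // accepted
        boolean second = queue.offer("event-2"); // rejected silently: queue is full

        if (!first) throw new AssertionError();
        if (second) throw new AssertionError("offer() must return false on a full queue");

        // Louder alternatives: add() throws on a full queue; put() blocks until space.
        try {
            queue.add("event-3");
            throw new AssertionError("add() should have thrown");
        } catch (IllegalStateException expected) {
            // the full queue is reported loudly instead of the event being dropped
        }
    }
}
```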
[jira] [Updated] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
[ https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8285: - Description: {code} if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) { {code} equals() should be used in the above comparison. was: {code} if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) { {code} equals() should be used in the above comparison. Reference equality is used on boolean values in PartitionPruner#removeTruePredciates() -- Key: HIVE-8285 URL: https://issues.apache.org/jira/browse/HIVE-8285 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Ted Yu Priority: Minor Attachments: HIVE-8285.patch {code} if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) { {code} equals() should be used in the above comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
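A standalone illustration of HIVE-8285's point — why equals() is the safe comparison for boxed Booleans: only the canonical Boolean.TRUE/FALSE instances (which autoboxing happens to return) are reference-equal, while any separately constructed Boolean is not, even though it is logically equal. (`new Boolean(true)` is deprecated and used here only to produce a distinct instance.)

```java
public class BooleanEqualityDemo {
    public static void main(String[] args) {
        // Autoboxing goes through Boolean.valueOf, which returns the canonical instance.
        Boolean fromParse = Boolean.parseBoolean("true");
        // A constructed Boolean is a distinct object (deprecated ctor, for demonstration).
        Boolean fresh = new Boolean(true);

        if (fromParse != Boolean.TRUE) throw new AssertionError(); // canonical, so identical
        if (fresh == Boolean.TRUE) throw new AssertionError("reference equality lied");
        if (!fresh.equals(Boolean.TRUE)) throw new AssertionError(); // equals() is reliable
        if (!Boolean.TRUE.equals(fresh)) throw new AssertionError(); // either direction works
    }
}
```

A value pulled out of an expression tree, as in the PartitionPruner code, gives no guarantee of being the canonical instance, so `Boolean.TRUE.equals(eC.getValue())` is the robust form.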
[jira] [Updated] (HIVE-11131) Get row information on DataWritableWriter once for better writing performance
[ https://issues.apache.org/jira/browse/HIVE-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-11131: --- Attachment: (was: HIVE-11131.1.patch) Get row information on DataWritableWriter once for better writing performance - Key: HIVE-11131 URL: https://issues.apache.org/jira/browse/HIVE-11131 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11131.1.patch DataWritableWriter is a class used to write Hive records to Parquet files. This class is getting all the information about how to parse a record, such as schema and object inspector, every time a record is written (or write() is called). We can make this class perform better by initializing some writers per data type once, and saving all object inspectors on each writer. The class expects that the next records written will have the same object inspectors and schema, so there is no need to have conditions for that. When a new schema is written, DataWritableWriter is created again by Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11131) Get row information on DataWritableWriter once for better writing performance
[ https://issues.apache.org/jira/browse/HIVE-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-11131: --- Attachment: HIVE-11131.1.patch Get row information on DataWritableWriter once for better writing performance - Key: HIVE-11131 URL: https://issues.apache.org/jira/browse/HIVE-11131 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11131.1.patch DataWritableWriter is a class used to write Hive records to Parquet files. This class is getting all the information about how to parse a record, such as schema and object inspector, every time a record is written (or write() is called). We can make this class perform better by initializing some writers per data type once, and saving all object inspectors on each writer. The class expects that the next records written will have the same object inspectors and schema, so there is no need to have conditions for that. When a new schema is written, DataWritableWriter is created again by Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11132) Queries using join and group by produce incorrect output when hive.auto.convert.join=false and hive.optimize.reducededuplication=true
[ https://issues.apache.org/jira/browse/HIVE-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603705#comment-14603705 ] Rich Haase commented on HIVE-11132: --- Explain plan when hive.auto.convert.join=false and hive.optimize.reducededuplication=true:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: mooo
            Statistics: Num rows: 1511511 Data size: 3402058087 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (((oppty_id is not null and oppty_line_id is not null) and (order_order_system 'sfdc_performance')) and (oppty_id = '006400CZbnWAAT')) (type: boolean)
              Statistics: Num rows: 188939 Data size: 425257542 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: '006400CZbnWAAT' (type: string), oppty_line_id (type: string)
                sort order: ++
                Map-reduce partition columns: '006400CZbnWAAT' (type: string), oppty_line_id (type: string)
                Statistics: Num rows: 188939 Data size: 425257542 Basic stats: COMPLETE Column stats: NONE
          TableScan
            alias: mooo_s
            Statistics: Num rows: 1511511 Data size: 940228122 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((oppty_id is not null and oppty_line_id is not null) and (oppty_id = '006400CZbnWAAT')) (type: boolean)
              Statistics: Num rows: 188939 Data size: 117528593 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: '006400CZbnWAAT' (type: string), oppty_line_id (type: string)
                sort order: ++
                Map-reduce partition columns: '006400CZbnWAAT' (type: string), oppty_line_id (type: string)
                Statistics: Num rows: 188939 Data size: 117528593 Basic stats: COMPLETE Column stats: NONE
          TableScan
            alias: forecast
            Statistics: Num rows: 29923099 Data size: 7723657280 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((oppty_id is not null and oppty_line_id is not null) and (oppty_id = '006400CZbnWAAT')) (type: boolean)
              Statistics: Num rows: 3740387 Data size: 965457063 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: '006400CZbnWAAT' (type: string), oppty_line_id (type: string)
                sort order: ++
                Map-reduce partition columns: '006400CZbnWAAT' (type: string), oppty_line_id (type: string)
                Statistics: Num rows: 3740387 Data size: 965457063 Basic stats: COMPLETE Column stats: NONE
          TableScan
            alias: split
            Statistics: Num rows: 2072636 Data size: 524862652 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((oppty_id is not null and oppty_line_id is not null) and (oppty_id = '006400CZbnWAAT')) (type: boolean)
              Statistics: Num rows: 259079 Data size: 65607704 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: '006400CZbnWAAT' (type: string), oppty_line_id (type: string)
                sort order: ++
                Map-reduce partition columns: '006400CZbnWAAT' (type: string), oppty_line_id (type: string)
                Statistics: Num rows: 259079 Data size: 65607704 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
               Inner Join 0 to 2
               Inner Join 0 to 3
          condition expressions:
            0
            1
            2
            3
          Statistics: Num rows: 12343277 Data size: 3186008376 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: '006400CZbnWAAT' (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 12343277 Data size: 3186008376 Basic stats: COMPLETE Column stats: NONE
            Group By Operator
              aggregations: count()
              keys: _col0 (type: string)
              mode: complete
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 6171638 Data size: 1593004058 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: bigint)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 6171638 Data size: 1593004058 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed:
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603712#comment-14603712 ] Hive QA commented on HIVE-10895: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742160/HIVE-10895.3.patch {color:green}SUCCESS:{color} +1 9032 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4400/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4400/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4400/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12742160 - PreCommit-HIVE-TRUNK-Build ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
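The leak described in HIVE-10895 comes from JDO Query objects whose server-side cursors (e.g. Oracle's) are only released by close()/closeAll(). The sketch below is not Hive's ObjectStore code; MiniQuery is a hypothetical stand-in for javax.jdo.Query, used only to contrast the leaky and fixed shapes of the pattern the patch applies.

```java
import java.util.Arrays;
import java.util.List;

public class QueryCleanup {
    /** Hypothetical stand-in for javax.jdo.Query; real code would use JDO. */
    static class MiniQuery {
        boolean closed = false;
        List<String> execute() { return Arrays.asList("db1", "db2"); }
        void closeAll() { closed = true; }  // releases server-side cursors
    }

    /** Leaky shape: the query (and its DB cursor) is never released. */
    static List<String> listDatabasesLeaky(MiniQuery query) {
        return query.execute();
    }

    /** Fixed shape: materialize the results, then close in finally. */
    static List<String> listDatabasesFixed(MiniQuery query) {
        try {
            // Copy before closing; a lazy result list dies with the query.
            return Arrays.asList(query.execute().toArray(new String[0]));
        } finally {
            query.closeAll();
        }
    }

    public static void main(String[] args) {
        MiniQuery q = new MiniQuery();
        System.out.println(listDatabasesFixed(q) + " closed=" + q.closed);
    }
}
```

The try/finally (rather than try-with-resources) shape matters here because JDO Query predates AutoCloseable in older JDKs.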
[jira] [Updated] (HIVE-10983) SerDeUtils bug when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Attachment: HIVE-10983.3.patch.txt SerDeUtils bug when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1, 1.2.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt
{noformat}
The method transformTextToUTF8 has a bug: it invokes Text.getBytes(), which returns the raw backing array; only the data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after hadoop1.
{noformat}
When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I executed this SQL:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is below, just 2 rows:
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by Text reuse, and I found the solution.
Additionally, the table's CREATE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub` (`line` string)
PARTITIONED BY (`logdate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
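The getBytes()/copyBytes() pitfall reported here can be reproduced without Hadoop on the classpath. ReusableBuffer below is a hypothetical stand-in that mimics how org.apache.hadoop.io.Text reuses its backing array (it grows but never shrinks); the decodeBuggy/decodeFixed names are illustrative, not Hive's.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextReuseDemo {
    // Minimal stand-in for Text's reuse behavior: set() grows the backing
    // array but never shrinks it, so getBytes() can expose stale bytes
    // from a previous, longer record.
    static class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;
        void set(String s) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            if (b.length > bytes.length) bytes = new byte[b.length];
            System.arraycopy(b, 0, bytes, 0, b.length);
            length = b.length;
        }
        byte[] getBytes() { return bytes; }   // raw array, may exceed length
        int getLength() { return length; }
        byte[] copyBytes() { return Arrays.copyOf(bytes, length); }  // exact-length copy
    }

    // Buggy: decodes the whole backing array, including stale tail bytes.
    static String decodeBuggy(ReusableBuffer t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    // Fixed: only the first getLength() bytes are valid.
    static String decodeFixed(ReusableBuffer t) {
        return new String(t.copyBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableBuffer t = new ReusableBuffer();
        t.set("a much longer first row");
        t.set("short");
        System.out.println("buggy: [" + decodeBuggy(t) + "]");  // tail of row 1 leaks in
        System.out.println("fixed: [" + decodeFixed(t) + "]");
    }
}
```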
[jira] [Commented] (HIVE-2998) Making Hive run on Windows Server and Windows Azure environment
[ https://issues.apache.org/jira/browse/HIVE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602616#comment-14602616 ] nevi_me commented on HIVE-2998: --- Will this be for Windows Server only? Making Hive run on Windows Server and Windows Azure environment --- Key: HIVE-2998 URL: https://issues.apache.org/jira/browse/HIVE-2998 Project: Hive Issue Type: Improvement Affects Versions: 0.7.1, 0.8.1 Environment: Windows Server 2008 R2 and Windows Azure Reporter: Lengning Liu This is the master JIRA for improvements to Hive that would enable it to run natively on Windows Server and Windows Azure environments. Microsoft has done the initial work here to have Hive (releases 0.7.1 and 0.8.1) running on Windows and would like to contribute this work back to the community. The end-to-end HiveQL query tests pass. We are currently investigating failed unit test cases. We expect to post the initial patches within a few weeks for review. Looking forward to the collaboration. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11095) SerDeUtils: another bug when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602617#comment-14602617 ] xiaowei wang commented on HIVE-11095: - According to the suggestion of Chengxiang Li, I put up a new patch, HIVE-11095.2.patch.txt SerDeUtils: another bug when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 1.2.0 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt
{noformat}
The method transformTextFromUTF8 has a bug: it invokes Text.getBytes(), which returns the raw backing array; only the data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after hadoop1.
{noformat}
How did I find this bug? When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I executed this SQL:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is below, just 2 rows:
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by Text reuse, and I found the solution. Additionally, the table's CREATE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub` (`line` string)
PARTITIONED BY (`logdate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
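Where copyBytes() is unavailable (the report notes it only exists after hadoop1), the same effect can be had by decoding just the valid region of the raw array via the String(byte[], int, int, Charset) overload, avoiding the extra copy entirely. A minimal sketch of that alternative, not the actual HIVE-11095 patch:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeWithLength {
    // Transform only the first `length` valid bytes of a reused buffer,
    // without materializing an exact-length copy first.
    static String transform(byte[] raw, int length, Charset from) {
        // raw may carry stale bytes past `length`; decode the valid region only
        return new String(raw, 0, length, from);
    }

    public static void main(String[] args) {
        // Simulate a reused backing array whose tail holds stale bytes.
        byte[] backing = "GBK row oneSTALE".getBytes(StandardCharsets.UTF_8);
        System.out.println(transform(backing, 11, StandardCharsets.UTF_8));
    }
}
```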
[jira] [Updated] (HIVE-10983) SerDeUtils bug when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Attachment: HIVE-10983.4.patch.txt SerDeUtils bug when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1, 1.2.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt
{noformat}
The method transformTextToUTF8 has a bug: it invokes Text.getBytes(), which returns the raw backing array; only the data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after hadoop1.
{noformat}
When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I executed this SQL:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is below, just 2 rows:
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by Text reuse, and I found the solution.
Additionally, the table's CREATE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub` (`line` string)
PARTITIONED BY (`logdate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) SerDeUtils bug when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602596#comment-14602596 ] xiaowei wang commented on HIVE-10983: - According to the suggestion of Chengxiang Li, I put up a new patch, HIVE-10983.4.patch.txt SerDeUtils bug when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1, 1.2.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt
{noformat}
The method transformTextToUTF8 has a bug: it invokes Text.getBytes(), which returns the raw backing array; only the data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after hadoop1.
{noformat}
When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I executed this SQL:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is below, just 2 rows:
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by Text reuse, and I found the solution. Additionally, the table's CREATE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub` (`line` string)
PARTITIONED BY (`logdate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602603#comment-14602603 ] Hive QA commented on HIVE-10895: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741934/HIVE-10895.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9029 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.metastore.TestAdminUser.testCreateAdminNAddUser {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4389/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4389/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4389/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12741934 - PreCommit-HIVE-TRUNK-Build ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11031) ORC concatenation of old files can fail while merging column statistics
[ https://issues.apache.org/jira/browse/HIVE-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602613#comment-14602613 ] Demeter Sztanko commented on HIVE-11031: Hello [~prasanth_j], my MR jobs are getting this error when concatenating ORC files:
{code}
java.io.IOException: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:226)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:136)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 0
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:105)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:224)
	... 11 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0
	at java.util.Collections$EmptyList.get(Collections.java:3212)
	at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.nextStripe(OrcFileStripeMergeRecordReader.java:82)
	at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.next(OrcFileStripeMergeRecordReader.java:71)
	at org.apache.hadoop.hive.ql.io.orc.OrcFileStripeMergeRecordReader.next(OrcFileStripeMergeRecordReader.java:31)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
	... 15 more
2015-06-26 08:24:19,248 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
{code}
Is this failure a result of the bug described in this ticket, or could it be a different problem? ORC concatenation of old files can fail while merging column statistics --- Key: HIVE-11031 URL: https://issues.apache.org/jira/browse/HIVE-11031 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Fix For: 1.2.1 Attachments: HIVE-11031-branch-1.0.patch, HIVE-11031.2.patch, HIVE-11031.3.patch, HIVE-11031.4.patch, HIVE-11031.patch Column statistics in ORC are optional protobuf fields. Old ORC files might not have statistics for newly added types like decimal, date, timestamp, etc. But column statistics merging assumes column statistics exist for these types and invokes merge.
For example, merging of TimestampColumnStatistics directly casts the received ColumnStatistics object without an instanceof check. If the ORC file contains timestamp column statistics this works; otherwise it throws a ClassCastException. Also, the file merge operator swallows the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
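The defensive pattern the fix calls for can be sketched as below. The class names echo ORC's ColumnStatistics hierarchy, but the bodies here are invented for illustration; the real merge logic lives in ORC's statistics implementations.

```java
public class StatsMergeSketch {
    interface ColumnStatistics { long getNumberOfValues(); }

    static class TimestampColumnStatistics implements ColumnStatistics {
        long count;
        TimestampColumnStatistics(long c) { count = c; }
        public long getNumberOfValues() { return count; }
    }

    // Stands in for statistics from an old file without timestamp stats.
    static class BooleanColumnStatistics implements ColumnStatistics {
        public long getNumberOfValues() { return 0; }
    }

    // Old (crashing) shape: unconditional cast.
    static long mergeUnchecked(ColumnStatistics other) {
        return ((TimestampColumnStatistics) other).count;  // ClassCastException on old files
    }

    // Fixed shape: check before casting.
    static long mergeChecked(ColumnStatistics other) {
        if (other instanceof TimestampColumnStatistics) {
            return ((TimestampColumnStatistics) other).count;
        }
        return 0;  // old file without timestamp statistics: nothing to merge
    }

    public static void main(String[] args) {
        System.out.println(mergeChecked(new TimestampColumnStatistics(7)));
        System.out.println(mergeChecked(new BooleanColumnStatistics()));
    }
}
```

Note that the report also flags the merge operator swallowing the exception, which is why the crash surfaces as silent bad behavior rather than a clear failure.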
[jira] [Updated] (HIVE-11095) SerDeUtils: another bug when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-11095: Attachment: HIVE-11095.2.patch.txt SerDeUtils: another bug when Text is reused Key: HIVE-11095 URL: https://issues.apache.org/jira/browse/HIVE-11095 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0, 1.0.0, 1.2.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 1.2.0 Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt
{noformat}
The method transformTextFromUTF8 has a bug: it invokes Text.getBytes(), which returns the raw backing array; only the data up to Text.length is valid. A better way is copyBytes(), which returns an array precisely the length of the data, but copyBytes() was only added after hadoop1.
{noformat}
How did I find this bug? When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I executed this SQL:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is below, just 2 rows:
{noformat}
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by Text reuse, and I found the solution.
Additionally, the table's CREATE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub` (`line` string)
PARTITIONED BY (`logdate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES ('serialization.encoding'='GBK')
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Tolpeko updated HIVE-11055: -- Attachment: HIVE-11055.2.patch HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch There is PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11055) HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution)
[ https://issues.apache.org/jira/browse/HIVE-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602626#comment-14602626 ] Dmitry Tolpeko commented on HIVE-11055: --- HIVE-11055.2.patch created. Updates: 1) Modified the Hive pom.xml, adding a <module>hplsql</module> entry to build the HPL/SQL tool 2) Added hplsql/pom.xml 3) Added bin/hplsql and bin/ext/hplsql.sh to run the tool from the shell 4) Added bin/hplsql.cmd for Windows. Open issues: 1) The tool depends on antlr-runtime-4.5.jar, which needs to be placed in $HIVE_LIB 2) The tool uses a plhql-site.xml configuration file that needs to be distributed as well HPL/SQL - Implementing Procedural SQL in Hive (PL/HQL Contribution) --- Key: HIVE-11055 URL: https://issues.apache.org/jira/browse/HIVE-11055 Project: Hive Issue Type: Improvement Reporter: Dmitry Tolpeko Assignee: Dmitry Tolpeko Attachments: HIVE-11055.1.patch, HIVE-11055.2.patch There is a PL/HQL tool (www.plhql.org) that implements procedural SQL for Hive (actually for any SQL-on-Hadoop implementation and any JDBC source). Alan Gates offered to contribute it to Hive under the HPL/SQL name (org.apache.hive.hplsql package). This JIRA is to create a patch to contribute the PL/HQL code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11131) Get row information on DataWritableWriter once for better writing performance
[ https://issues.apache.org/jira/browse/HIVE-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-11131: --- Attachment: HIVE-11131.2.patch Get row information on DataWritableWriter once for better writing performance - Key: HIVE-11131 URL: https://issues.apache.org/jira/browse/HIVE-11131 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11131.1.patch, HIVE-11131.2.patch DataWritableWriter is a class used to write Hive records to Parquet files. This class is getting all the information about how to parse a record, such as schema and object inspector, every time a record is written (or write() is called). We can make this class perform better by initializing some writers per data type once, and saving all object inspectors on each writer. The class expects that the next records written will have the same object inspectors and schema, so there is no need to have conditions for that. When a new schema is written, DataWritableWriter is created again by Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
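The optimization HIVE-11131 describes, resolving per-column "how do I write this type" logic once instead of re-deriving it from the schema on every write() call, can be sketched as follows. The Function stand-ins play the role of Hive's per-type writers and object inspectors; none of these names come from the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class CachedWriterSketch {
    static class RecordWriter {
        // One pre-resolved writer per column, built once per schema.
        private final List<Function<Object, String>> columnWriters = new ArrayList<>();

        RecordWriter(List<String> schema) {
            for (String type : schema) {          // schema inspected once, here
                switch (type) {
                    case "int":    columnWriters.add(v -> "i:" + v); break;
                    case "string": columnWriters.add(v -> "s:" + v); break;
                    default: throw new IllegalArgumentException(type);
                }
            }
        }

        // Hot path: no type dispatch or schema inspection per record.
        String write(Object[] row) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < row.length; i++) {
                out.append(columnWriters.get(i).apply(row[i])).append('|');
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        RecordWriter w = new RecordWriter(Arrays.asList("int", "string"));
        System.out.println(w.write(new Object[]{42, "hive"}));
    }
}
```

This matches the JIRA's stated invariant: successive records share the same schema and object inspectors, and when the schema changes, Parquet creates a fresh DataWritableWriter anyway, so the cache never goes stale.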
[jira] [Updated] (HIVE-11131) Get row information on DataWritableWriter once for better writing performance
[ https://issues.apache.org/jira/browse/HIVE-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-11131: --- Attachment: (was: HIVE-11131.1.patch) Get row information on DataWritableWriter once for better writing performance - Key: HIVE-11131 URL: https://issues.apache.org/jira/browse/HIVE-11131 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-11131.2.patch DataWritableWriter is a class used to write Hive records to Parquet files. This class is getting all the information about how to parse a record, such as schema and object inspector, every time a record is written (or write() is called). We can make this class perform better by initializing some writers per data type once, and saving all object inspectors on each writer. The class expects that the next records written will have the same object inspectors and schema, so there is no need to have conditions for that. When a new schema is written, DataWritableWriter is created again by Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11123) Fix how to confirm the RDBMS product name at Metastore.
[ https://issues.apache.org/jira/browse/HIVE-11123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shinichi Yamashita updated HIVE-11123: -- Attachment: HIVE-11123.1.patch I attach a patch file. Fix how to confirm the RDBMS product name at Metastore. --- Key: HIVE-11123 URL: https://issues.apache.org/jira/browse/HIVE-11123 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.2.0 Environment: PostgreSQL Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: HIVE-11123.1.patch I use PostgreSQL for the Hive Metastore, and I saw the following messages in the PostgreSQL log:
{code}
2015-06-26 10:58:15.488 JST ERROR: syntax error at or near @@ at character 5
2015-06-26 10:58:15.488 JST STATEMENT: SET @@session.sql_mode=ANSI_QUOTES
2015-06-26 10:58:15.489 JST ERROR: relation v$instance does not exist at character 21
2015-06-26 10:58:15.489 JST STATEMENT: SELECT version FROM v$instance
2015-06-26 10:58:15.490 JST ERROR: column version does not exist at character 10
2015-06-26 10:58:15.490 JST STATEMENT: SELECT @@version
{code}
These messages are written to the PostgreSQL log whenever the Hive CLI or Beeline in embedded mode runs. The queries are issued from MetaStoreDirectSql#determineDbType; if we use MetaStoreDirectSql#getProductName instead, we do not need to issue them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
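The suggested getProductName approach boils down to asking JDBC metadata for the product name instead of probing with vendor-specific SQL (SELECT @@version, v$instance, ...) that spams a foreign database's error log. A sketch under that assumption; the mapping and method names are illustrative, and fakeConnection exists only so the example runs without a live database:

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.SQLException;

public class ProductNameSketch {
    // One metadata call, no vendor-specific probe queries.
    static String dbType(Connection conn) throws SQLException {
        String name = conn.getMetaData().getDatabaseProductName().toLowerCase();
        if (name.contains("postgres")) return "postgres";
        if (name.contains("mysql"))    return "mysql";
        if (name.contains("oracle"))   return "oracle";
        return "other";
    }

    // Test scaffolding: a dynamic-proxy Connection whose metadata reports
    // the given product name, purely to exercise dbType() offline.
    static Connection fakeConnection(String product) {
        DatabaseMetaData md = (DatabaseMetaData) Proxy.newProxyInstance(
                ProductNameSketch.class.getClassLoader(),
                new Class<?>[]{DatabaseMetaData.class},
                (p, m, a) -> m.getName().equals("getDatabaseProductName") ? product : null);
        return (Connection) Proxy.newProxyInstance(
                ProductNameSketch.class.getClassLoader(),
                new Class<?>[]{Connection.class},
                (p, m, a) -> m.getName().equals("getMetaData") ? md : null);
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(dbType(fakeConnection("PostgreSQL")));
    }
}
```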
[jira] [Commented] (HIVE-11130) Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object
[ https://issues.apache.org/jira/browse/HIVE-11130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603929#comment-14603929 ] Hive QA commented on HIVE-11130: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12742216/HIVE-11130.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9027 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4403/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4403/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4403/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12742216 - PreCommit-HIVE-TRUNK-Build Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object Key: HIVE-11130 URL: https://issues.apache.org/jira/browse/HIVE-11130 Project: Hive Issue Type: Sub-task Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11130.patch This is just a refactoring step which keeps the current logic, but it exposes explicit lock/unlock operations for tables and databases in HiveTxnManager, which should be implemented differently by the subclasses (currently they are not; e.g., for the ZooKeeper implementation, we should lock the table and database when we try to lock the table). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-7150) FileInputStream is not closed in HiveConnection#getHttpClient()
[ https://issues.apache.org/jira/browse/HIVE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov reassigned HIVE-7150: - Assignee: Alexander Pivovarov FileInputStream is not closed in HiveConnection#getHttpClient() --- Key: HIVE-7150 URL: https://issues.apache.org/jira/browse/HIVE-7150 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: Alexander Pivovarov Labels: jdbc Attachments: HIVE-7150.1.patch, HIVE-7150.2.patch Here is related code: {code} sslTrustStore.load(new FileInputStream(sslTrustStorePath), sslTrustStorePassword.toCharArray()); {code} The FileInputStream is not closed upon returning from the method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
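The idiomatic fix for this kind of leak is try-with-resources, which closes the stream on every exit path, including when load() throws. It is sketched here on a plain file read rather than KeyStore.load(), which would need a real trust store file; the method name and temp-file setup are illustrative, not HiveConnection's code.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseStreamSketch {
    // The leak in HIVE-7150: `new FileInputStream(path)` passed straight
    // into KeyStore.load() is never closed. Declaring the stream in a
    // try-with-resources header guarantees close() on success or failure.
    static byte[] readAll(Path path) throws IOException {
        try (InputStream in = new FileInputStream(path.toFile())) {
            return in.readAllBytes();  // stream closed even if this throws
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("truststore", ".jks");
        Files.write(tmp, new byte[]{1, 2, 3});
        System.out.println(readAll(tmp).length);
        Files.delete(tmp);
    }
}
```

In the actual patch context, the same shape would wrap the trust-store stream around the sslTrustStore.load(...) call.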