[jira] [Updated] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-8128: Attachment: HIVE-8128.1-parquet.patch Rebased to the parquet branch based on HIVE-10975. Build passes locally. Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Fix For: parquet-branch Attachments: HIVE-8128-parquet.patch.POC, HIVE-8128.1-parquet.patch NO PRECOMMIT TESTS We'll want to finish the vectorization work (e.g. VectorizedOrcSerde), which was partially done in HIVE-5998. As discussed in PARQUET-131, we will work out a Hive POC based on the new Parquet vectorized API, and then finish the implementation after the API is finalized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10820) Hive Server2 should register itself again when it encounters failures in HA mode
[ https://issues.apache.org/jira/browse/HIVE-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581715#comment-14581715 ] Nemon Lou commented on HIVE-10820: -- This has been fixed in 1.2.0 by HIVE-8890. Hive Server2 should register itself again when it encounters failures in HA mode -- Key: HIVE-10820 URL: https://issues.apache.org/jira/browse/HIVE-10820 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Wang Hao Hive Server2 should register itself again when it encounters a failure in HA mode. For example, a network problem will cause the session to expire in ZK, and the HiveServer2 ephemeral sequential node will be deleted with it. So I think we can add a watch to handle it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
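The proposal above can be sketched without a live ZooKeeper. The following is a minimal, self-contained simulation with hypothetical names (the actual fix, HIVE-8890, uses the real ZooKeeper watcher API): an ephemeral registration vanishes when the session expires, and a watch callback re-creates it so clients can rediscover the server.

```java
import java.util.HashSet;
import java.util.Set;

public class ReRegisterSketch {
    // In-memory stand-in for a ZooKeeper ensemble's znodes.
    static final Set<String> znodes = new HashSet<>();

    static void createEphemeral(String path) { znodes.add(path); }

    // Session expiry deletes all ephemeral nodes owned by that session.
    static void expireSession() { znodes.clear(); }

    // Proposed watch callback: if our registration disappeared after an
    // "Expired" event, register again instead of staying invisible.
    static boolean handleExpired(String path) {
        if (!znodes.contains(path)) createEphemeral(path);
        return znodes.contains(path);
    }

    public static void main(String[] args) {
        String path = "/hiveserver2/serverUri=host:10000;seq=0000000001";
        createEphemeral(path);
        expireSession();                          // network blip kills the ZK session
        System.out.println(znodes.contains(path)); // false: clients cannot find HS2
        handleExpired(path);                       // the watch re-registers the node
        System.out.println(znodes.contains(path)); // true
    }
}
```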
[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Summary: LazySimpleSerDe bug, when Text is reused (was: LazySimpleSerDe bug when Text is reused ) LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 1.2.0 Attachments: HIVE-10983.1.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL statement, select * from web_searchhub where logdate=2015061003; the result is shown below. Notice that the second row's content contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
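For context on the suspected cause: Hadoop's org.apache.hadoop.io.Text object is reused across records, its backing byte array only grows, and getBytes() returns the whole backing array rather than just the first getLength() bytes. A decoder that ignores getLength() therefore appends stale bytes from a longer previous row, which matches the symptom above. Below is a self-contained mimic of that reuse semantics (not the real Text class):

```java
import java.nio.charset.StandardCharsets;

public class TextReuseSketch {
    // Mimic of org.apache.hadoop.io.Text reuse: the backing array grows but is
    // never shrunk, so bytes past 'length' are stale data from earlier rows.
    static byte[] buf = new byte[0];
    static int length = 0;

    static void set(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        if (b.length > buf.length) buf = new byte[b.length]; // grow only
        System.arraycopy(b, 0, buf, 0, b.length);
        length = b.length;
    }

    // Buggy decode: converts the whole backing array, ignoring 'length'.
    static String buggy() { return new String(buf, StandardCharsets.UTF_8); }

    // Correct decode: honors the valid length.
    static String correct() { return new String(buf, 0, length, StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        set("session=3151,thread=254"); // a long row is read first
        set("session=901");             // a shorter row reuses the same buffer
        System.out.println(buggy());    // session=9011,thread=254  <- previous row bleeds in
        System.out.println(correct());  // session=901
    }
}
```

The buggy output reproduces exactly the kind of concatenated tail seen in the reported query results.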
[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Summary: LazySimpleSerDe bug when Text is reused (was: Lazysimpleserde bug when Text is reused ) LazySimpleSerDe bug when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 1.2.0 Attachments: HIVE-10983.1.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL statement, select * from web_searchhub where logdate=2015061003; the result is shown below. Notice that the second row's content contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) Lazysimpleserde bug when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Component/s: CLI Lazysimpleserde bug when Text is reused - Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute a SQL statement, select * from web_searchhub where logdate=2015061003; the result is shown below. Notice that the second row's content contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581723#comment-14581723 ] xiaowei wang commented on HIVE-10790: - OK, I have put up a patch. ORC file SQL execution fails - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, like insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; throws an error: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
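The "Caused by" line is instructive: ViewFileSystem is only a client-side mount table, so server defaults such as replication exist per backing cluster and can only be answered for a path that resolves to a mount point. The no-arg getDefaultReplication() has no path to resolve, hence NotInMountpointException; the likely fix is for the ORC writer to ask about the file it is actually writing. An illustrative toy mount table (hypothetical values, not the real Hadoop API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ViewFsSketch {
    // Toy mount table: mount prefix -> default replication of the backing
    // cluster. Values are invented; the real ViewFileSystem delegates each
    // resolved path to the child FileSystem that owns the mount point.
    static final Map<String, Integer> mounts = new LinkedHashMap<>();
    static {
        mounts.put("/user", 3);
        mounts.put("/tmp", 2);
    }

    static int getDefaultReplication(String path) {
        if (path == null || path.isEmpty())
            throw new IllegalStateException("getDefaultReplication on empty path is invalid");
        for (Map.Entry<String, Integer> m : mounts.entrySet())
            if (path.startsWith(m.getKey())) return m.getValue();
        throw new IllegalStateException("not in mount point: " + path);
    }

    public static void main(String[] args) {
        // The writer must ask about a concrete file path, not "the" filesystem:
        System.out.println(getDefaultReplication("/user/hive/warehouse/t/part-0")); // 3
        try {
            getDefaultReplication(""); // what the no-arg overload amounts to under viewfs
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```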
[jira] [Updated] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10790: Attachment: HIVE-10790.0.patch.txt ORC file SQL execution fails - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, like insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; throws an error: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581724#comment-14581724 ] xiaowei wang commented on HIVE-10790: - OK, I have put up a patch. ORC file SQL execution fails - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, like insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; throws an error: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581728#comment-14581728 ] Hive QA commented on HIVE-10983: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739021/HIVE-10983.1.patch.txt Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4247/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4247/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4247/ Messages: {noformat} This message was trimmed, see log for full details [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/tmp/conf [copy] Copying 11 files to /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-shims --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-shims --- [INFO] Tests are skipped. 
[INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-shims --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/hive-shims-2.0.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-shims --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims --- [INFO] Installing /data/hive-ptest/working/apache-github-source-source/shims/aggregator/target/hive-shims-2.0.0-SNAPSHOT.jar to /home/hiveptest/.m2/repository/org/apache/hive/hive-shims/2.0.0-SNAPSHOT/hive-shims-2.0.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-github-source-source/shims/aggregator/pom.xml to /home/hiveptest/.m2/repository/org/apache/hive/hive-shims/2.0.0-SNAPSHOT/hive-shims-2.0.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Common 2.0.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-common --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/common/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/common (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-common --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (generate-version-annotation) @ hive-common --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- build-helper-maven-plugin:1.8:add-source (add-source) @ hive-common --- [INFO] Source directory: /data/hive-ptest/working/apache-github-source-source/common/src/gen added. [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-common --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-common --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] Copying 1 resource [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-common --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-common --- [INFO] Compiling 74 source files to /data/hive-ptest/working/apache-github-source-source/common/target/classes [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java: /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java uses or overrides a deprecated API. [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java: Recompile with -Xlint:deprecation for details. [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/ObjectPair.java: Some input files use unchecked or unsafe operations. [WARNING] /data/hive-ptest/working/apache-github-source-source/common/src/java/org/apache/hadoop/hive/common/ObjectPair.java: Recompile with -Xlint:unchecked for details. [INFO] [INFO] ---
[jira] [Commented] (HIVE-10866) Give a warning when a client tries to insert into a bucketed table
[ https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581827#comment-14581827 ] Yongzhi Chen commented on HIVE-10866: - The 3 failures are not related to the patch; their age is 3 builds or more. Give a warning when a client tries to insert into a bucketed table Key: HIVE-10866 URL: https://issues.apache.org/jira/browse/HIVE-10866 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0, 1.3.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch, HIVE-10866.3.patch, HIVE-10866.4.patch Currently, Hive does not support appends (insert into) to bucketed tables; see open JIRA HIVE-3608. When inserting into such a table, the data will be corrupted and no longer fit for sort-merge bucket mapjoin. We need to find a way to prevent clients from inserting into such tables, or at least give a warning. Reproduce: {noformat} CREATE TABLE IF NOT EXISTS buckettestoutput1( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE IF NOT EXISTS buckettestoutput2( data string )CLUSTERED BY(data) INTO 2 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; set hive.enforce.bucketing = true; set hive.enforce.sorting=true; insert into table buckettestoutput1 select code from sample_07 where total_emp 134354250 limit 10; After this first insert, I did: set hive.auto.convert.sortmerge.join=true; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.auto.convert.sortmerge.join.noconditionaltask=true; 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); +---+---+ | data | data | +---+---+ +---+---+ So select works fine. 
Second insert: 0: jdbc:hive2://localhost:1 insert into table buckettestoutput1 select code from sample_07 where total_emp = 134354250 limit 10; No rows affected (61.235 seconds) Then select: 0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 (state=42000,code=10141) 0: jdbc:hive2://localhost:1 {noformat} Inserting into an empty table or partition is fine, but after inserting into a non-empty one (the second insert in the reproduction above), the bucketmapjoin throws an error. We should not let the second insert succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
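Why 4 files break the join: bucketed reads assume file i holds exactly the rows whose bucket id is i, and every INSERT writes one fresh file per bucket. A small sketch of that arithmetic, with Java's hashCode standing in for Hive's object-inspector hash:

```java
public class BucketSketch {
    // Bucket assignment: non-negative hash modulo bucket count.
    // (Java's String.hashCode is a stand-in for Hive's actual hash here.)
    static int bucketId(String key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Each INSERT writes one new file per bucket, so appends multiply files.
    static int filesAfter(int inserts, int numBuckets) {
        return inserts * numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 2;
        System.out.println(filesAfter(1, numBuckets)); // 2 files: file i == bucket i holds
        System.out.println(filesAfter(2, numBuckets)); // 4 files: the invariant is broken,
                                                       // matching "buckets is 2 ... files is 4"
        // Rows still hash into only 2 buckets no matter how many files exist:
        System.out.println(bucketId("some-row-key", numBuckets));
    }
}
```

This is why the sort-merge bucket mapjoin's metadata check rejects the table after the second insert: it can no longer pair file i with bucket i.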
[jira] [Commented] (HIVE-10975) Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-10975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582026#comment-14582026 ] Sergio Peña commented on HIVE-10975: I don't know when 1.8.0 will be released yet, but I think 1.7.0 has the new import changes you want. If you need the new org.apache.parquet imports, you can try bumping the version to 1.7.0. Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT - Key: HIVE-10975 URL: https://issues.apache.org/jira/browse/HIVE-10975 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor Attachments: HIVE-10975-parquet.patch, HIVE-10975.1-parquet.patch There are lots of changes since Parquet's graduation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582001#comment-14582001 ] Laljo John Pullokkaran commented on HIVE-10841: --- RB link posted. [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result of the following SELECT query is 3 rows, but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive: 1. prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run the SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10 122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10 122 {code} 2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats:
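The intended semantics are easy to check outside Hive: joining the three acct rows on aid and then applying the WHERE predicate must leave exactly one row. A plain-Java rendering of the reproduction (the bug is that Hive's plan effectively drops the brn is not null filter):

```java
public class NotNullFilterSketch {
    // acct rows from the reproduction: {aid, acc_n, brn}; the two extra
    // inserted rows carry NULLs, modeled here as Java nulls.
    static final Integer[][] ACCT = {
        {4748, 10, 122},
        {4748, null, null},
        {4748, null, null},
    };

    // Every acct row joins through A.id = acct.aid, so the raw join has 3 rows.
    static long joinedCount() {
        long n = 0;
        for (Integer[] r : ACCT) if (r[0] == 4748) n++;
        return n;
    }

    // Correct semantics: "acct.brn is not null" filters the join result.
    static long filteredCount() {
        long n = 0;
        for (Integer[] r : ACCT) if (r[0] == 4748 && r[2] != null) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(joinedCount());   // 3: what the buggy Hive plan returns
        System.out.println(filteredCount()); // 1: the expected answer (matches MySQL)
    }
}
```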
[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases
[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582031#comment-14582031 ] Yongzhi Chen commented on HIVE-6867: [~pxiong], could you explain how to do step (2)? Will you or [~hsubramaniyan] fix HIVE-3608: Support appends (INSERT INTO) for bucketed tables? Thanks. Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch The bucketized table feature fails in some cases. If the destination is bucketed on the same key and the actual data in the src is not bucketed (because the data was loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1; -- perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression; it has never worked. It was only discovered due to Hadoop2 changes. In Hadoop1, in local mode, the number of reducers is always 1, regardless of what is requested by the app. Hadoop2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent LOAD DATA for bucketed tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.
[ https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582041#comment-14582041 ] Aihua Xu commented on HIVE-10972: - I think we have another issue in ZooKeeperHiveLockManager.java: when locking an object exclusively, we should also check whether its children are locked. The test passed before because we always locked the current database. If we do {{use default; lock table lockneg2.tstsrcpart shared; lock database lockneg2 exclusive;}}, it will be allowed, which is not correct. HIVE-10984 has been filed to get it fixed. I will leave the test failure as it is. DummyTxnManager always locks the current database in shared mode, which is incorrect. - Key: HIVE-10972 URL: https://issues.apache.org/jira/browse/HIVE-10972 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10972.patch In DummyTxnManager [line 163 | http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163], it always locks the current database. That is not correct, since the current database can be db1 while the query is select * from db2.tb1, which would lock db1 unnecessarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
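The hole described in the comment is a lock-compatibility question: an EXCLUSIVE request on a database must conflict with a SHARED lock already held on any child object, not just with a lock on the exact same path. A minimal sketch of the two checks (hypothetical path scheme, not ZooKeeperHiveLockManager's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class LockSketch {
    enum Mode { SHARED, EXCLUSIVE }
    // Currently held locks: object path -> mode.
    static final Map<String, Mode> held = new HashMap<>();

    // Naive check: only looks at the exact path (the behavior being fixed).
    static boolean canLockNaive(String path, Mode m) {
        Mode h = held.get(path);
        return h == null || (h == Mode.SHARED && m == Mode.SHARED);
    }

    // Hierarchical check: an EXCLUSIVE request also conflicts with any lock
    // held on a descendant of the path (e.g. a table inside the database).
    static boolean canLockHierarchical(String path, Mode m) {
        for (Map.Entry<String, Mode> e : held.entrySet()) {
            boolean related = e.getKey().equals(path) || e.getKey().startsWith(path + "/");
            if (related && (m == Mode.EXCLUSIVE || e.getValue() == Mode.EXCLUSIVE))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        held.put("/lockneg2/tstsrcpart", Mode.SHARED); // lock table ... shared
        // lock database lockneg2 exclusive:
        System.out.println(canLockNaive("/lockneg2", Mode.EXCLUSIVE));        // true (the bug)
        System.out.println(canLockHierarchical("/lockneg2", Mode.EXCLUSIVE)); // false (correct)
    }
}
```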
[jira] [Commented] (HIVE-10975) Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-10975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582045#comment-14582045 ] Ferdinand Xu commented on HIVE-10975: - For my part, there is no strong requirement, since the Parquet bloom filter is still under development. I am not sure about the case for vectorization. Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT - Key: HIVE-10975 URL: https://issues.apache.org/jira/browse/HIVE-10975 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor Attachments: HIVE-10975-parquet.patch, HIVE-10975.1-parquet.patch There are lots of changes since Parquet's graduation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
[ https://issues.apache.org/jira/browse/HIVE-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582135#comment-14582135 ] Jimmy Xiang commented on HIVE-10976: [~ctang.ma], is it possible to remove the whole method from CLIService? Redundant HiveMetaStore connect check in HS2 CLIService start - Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Attachments: HIVE-10976.patch During HS2 startup, CLIService start() does a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, which starts a SessionState and establishes a connection to HMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10959) webhcat launcher job should reconnect to the running child job on task retry
[ https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582156#comment-14582156 ] Ivan Mitic commented on HIVE-10959: --- Thanks [~thejas] for the review and commit! webhcat launcher job should reconnect to the running child job on task retry Key: HIVE-10959 URL: https://issues.apache.org/jira/browse/HIVE-10959 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.15.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Fix For: 1.2.1 Attachments: HIVE-10959.2.patch, HIVE-10959.3.patch, HIVE-10959.4.patch, HIVE-10959.patch Currently, the Templeton launcher kills all child jobs (jobs tagged with the parent job's id) upon task retry. Instead, upon launcher task retry, Templeton should reconnect to the running job and continue tracking its progress that way. This logic cannot be used for all job kinds (e.g. jobs that are driven by the client side, like regular Hive). However, for MapReduce v2, and possibly Tez and Hive on Tez, this should be the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
[ https://issues.apache.org/jira/browse/HIVE-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582182#comment-14582182 ] Chaoyu Tang commented on HIVE-10976: [~jxiang] It is possible, since it only calls the superclass start(). I considered that but still left it there: 1) to be consistent with the stop() method; 2) in case CLIService needs to add something to start() in the future, since the method already exists. Redundant HiveMetaStore connect check in HS2 CLIService start - Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Attachments: HIVE-10976.patch During HS2 startup, CLIService start() does a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, which starts a SessionState and establishes a connection to HMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582188#comment-14582188 ] Chaoyu Tang commented on HIVE-7018: --- [~ychena], have you verified that it works with SchemaTool with [~hsubramaniyan]'s HIVE-10659 fix? Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch, HIVE-7018.3.patch It appears that at least the Postgres and Oracle scripts do not have the LINK_TARGET_ID column, while the MySQL ones do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
[ https://issues.apache.org/jira/browse/HIVE-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582190#comment-14582190 ] Jimmy Xiang commented on HIVE-10976: I see. The other choice is to remove the stop() as well :). Either way is fine with me. Thanks. Redundant HiveMetaStore connect check in HS2 CLIService start - Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Attachments: HIVE-10976.patch During HS2 startup, CLIService start() does a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, which starts a SessionState and establishes a connection to HMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10976) Redundant HiveMetaStore connect check in HS2 CLIService start
[ https://issues.apache.org/jira/browse/HIVE-10976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582207#comment-14582207 ] Chaoyu Tang commented on HIVE-10976: Thanks, [~jxiang]. Could you help to commit it? Redundant HiveMetaStore connect check in HS2 CLIService start - Key: HIVE-10976 URL: https://issues.apache.org/jira/browse/HIVE-10976 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Trivial Attachments: HIVE-10976.patch During HS2 startup, CLIService start() does a connection test against HMS. It is redundant, since in its init stage CLIService calls applyAuthorizationConfigPolicy, which starts a SessionState and establishes a connection to HMS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582261#comment-14582261 ] Hive QA commented on HIVE-10983: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739041/HIVE-10983.2.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9007 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 org.apache.hive.beeline.TestSchemaTool.testSchemaInit org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4250/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4250/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4250/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12739041 - PreCommit-HIVE-TRUNK-Build LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 0.14.1 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, when I execute the SQL select * from web_searchhub where logdate=2015061003, the result is shown below. Notice that the second row contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases
[ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582330#comment-14582330 ] Pengcheng Xiong commented on HIVE-6867: --- [~ychena], yes, it would be best if insert into were also supported. That depends on how far [~hsubramaniyan] would like to go. Thanks. Bucketized Table feature fails in some cases Key: HIVE-6867 URL: https://issues.apache.org/jira/browse/HIVE-6867 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch Bucketized Table feature fails in some cases. If the destination is bucketed on the same key but the actual data in the source is not bucketed (because the data got loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed while writing to the destination. Example -- CREATE TABLE P1(key STRING, val STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1; -- perform an insert to make sure there are 2 files INSERT OVERWRITE TABLE P1 select key, val from P1; -- This is not a regression. This has never worked. It was only discovered due to Hadoop 2 changes. In Hadoop 1, in local mode, the number of reducers is always 1, regardless of what the app requests. Hadoop 2 now honors the number-of-reducers setting in local mode (by spawning threads). The long-term solution seems to be to prevent LOAD DATA for bucketed tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10977) No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
[ https://issues.apache.org/jira/browse/HIVE-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582375#comment-14582375 ] Sergey Shelukhin commented on HIVE-10977: - +1 No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled Key: HIVE-10977 URL: https://issues.apache.org/jira/browse/HIVE-10977 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10977.patch When hive.metastore.try.direct.sql is set to false, HMS will use JDO to retrieve data; therefore it is not necessary to instantiate an expensive MetaStoreDirectSql during ObjectStore initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
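The idea behind the patch can be sketched as follows. This is a hypothetical, simplified model of guarding an expensive helper's construction behind the config flag that would use it; it is not Hive's actual ObjectStore code, and all names here are illustrative:

```java
// Sketch: only construct the costly helper when the feature flag is on,
// and construct it at most once.
class ConditionalInit {
    static int constructions = 0; // visible for the demo only

    static final class ExpensiveDirectSql {
        ExpensiveDirectSql() { constructions++; } // stands in for MetaStoreDirectSql's costly setup
    }

    private final boolean tryDirectSql;      // e.g. hive.metastore.try.direct.sql
    private ExpensiveDirectSql directSql;    // created lazily, only if enabled

    ConditionalInit(boolean tryDirectSql) { this.tryDirectSql = tryDirectSql; }

    /** Returns the helper if direct SQL is enabled, else null (caller falls back to JDO). */
    ExpensiveDirectSql directSql() {
        if (!tryDirectSql) return null;          // disabled: never pay the construction cost
        if (directSql == null) directSql = new ExpensiveDirectSql();
        return directSql;
    }
}
```

With the flag off, the expensive object is never built; with it on, it is built exactly once on first use.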
[jira] [Commented] (HIVE-10979) Fix failed tests in TestSchemaTool after the version number change in HIVE-10921
[ https://issues.apache.org/jira/browse/HIVE-10979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582447#comment-14582447 ] Sergey Shelukhin commented on HIVE-10979: - +1, sorry I missed those Fix failed tests in TestSchemaTool after the version number change in HIVE-10921 Key: HIVE-10979 URL: https://issues.apache.org/jira/browse/HIVE-10979 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10979.patch Some version variables in the SQL scripts were not updated in HIVE-10921, which caused unit test failures. See http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4241/testReport/org.apache.hive.beeline/TestSchemaTool/testSchemaUpgrade/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Fix Version/s: (was: 1.2.0) 0.14.1 LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 0.14.1 Attachments: HIVE-10983.1.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, when I execute the SQL select * from web_searchhub where logdate=2015061003, the result is shown below. Notice that the second row contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution. Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10983: Attachment: HIVE-10983.2.patch.txt LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 0.14.1 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, when I execute the SQL select * from web_searchhub where logdate=2015061003, the result is shown below. Notice that the second row contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10790) ORC file SQL execution fails
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10790: Fix Version/s: 0.14.1 ORC file SQL execution fails - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Fix For: 0.14.1 Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, e.g. insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; throws an error: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
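The failure pattern in the stack trace above can be illustrated with a toy model; all names here are hypothetical and this is not the real Hadoop ViewFileSystem API. A mount-table filesystem multiplexes several underlying filesystems, so it has no single "default" replication: the no-argument query is invalid by design, and callers such as the ORC writer must pass the actual output path:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of why "getDefaultReplication on empty path is invalid" happens on viewfs.
class ViewFs {
    private final Map<String, Integer> mountReplication = new HashMap<>();

    ViewFs() {
        // each mount point can be backed by a different cluster with a different default
        mountReplication.put("/user", 3);
        mountReplication.put("/tmp", 1);
    }

    int getDefaultReplication() {
        // mirrors the exception text in the stack trace above
        throw new IllegalStateException("getDefaultReplication on empty path is invalid");
    }

    int getDefaultReplication(String path) {
        for (Map.Entry<String, Integer> m : mountReplication.entrySet()) {
            if (path.startsWith(m.getKey())) return m.getValue();
        }
        throw new IllegalArgumentException("no mount point for " + path);
    }
}
```

The fix direction is accordingly to have the writer call the per-path variant with its real output path instead of the no-argument one.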
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581895#comment-14581895 ] Hive QA commented on HIVE-10841: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12739033/HIVE-10841.03.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9008 tests executed *Failed tests:* {noformat} org.apache.hive.beeline.TestSchemaTool.testSchemaInit org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4248/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4248/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4248/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12739033 - PreCommit-HIVE-TRUNK-Build [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. 
prepare tables {code} drop table if exists L; drop table if exists LA; drop table if exists FR; drop table if exists A; drop table if exists PI; drop table if exists acct; create table L as select 4436 id; create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id; create table FR as select 4436 loan_id; create table A as select 4748 id; create table PI as select 4415 id; create table acct as select 4748 aid, 10 acc_n, 122 brn; insert into table acct values(4748, null, null); insert into table acct values(4748, null, null); {code} 2. run SELECT query {code} select acct.ACC_N, acct.brn FROM L JOIN LA ON L.id = LA.loan_id JOIN FR ON L.id = FR.loan_id JOIN A ON LA.aid = A.id JOIN PI ON PI.id = LA.pi_id JOIN acct ON A.id = acct.aid WHERE L.id = 4436 and acct.brn is not null; {code} the result is 3 rows {code} 10122 NULL NULL NULL NULL {code} but it should be 1 row {code} 10122 {code} 2.1 explain select ... output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: 
Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean)
[jira] [Commented] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581906#comment-14581906 ] Xuefu Zhang commented on HIVE-10855: +1 Make HIVE-10568 work with Spark [Spark Branch] -- Key: HIVE-10855 URL: https://issues.apache.org/jira/browse/HIVE-10855 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10855.1-spark.patch, HIVE-10855.2-spark.patch, HIVE-10855.3-spark.patch HIVE-10568 only works with Tez. It's good to make it also work for Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10983) LazySimpleSerDe bug, when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581882#comment-14581882 ] xiaowei wang commented on HIVE-10983: - In the first patch I invoked a Text method, copyBytes(), which was only added after Hadoop 1.0, so the compile failed. I then put up the second patch. LazySimpleSerDe bug, when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 0.14.1 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt When I query data from an LZO table, I found in the results that the length of the current row is always larger than that of the previous row, and sometimes the current row contains the contents of the previous row. For example, when I execute the SQL select * from web_searchhub where logdate=2015061003, the result is shown below. Notice that the second row contains the first row's content. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is below, just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by Text reuse, and I found a solution.
Additionally, the table create SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
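The reuse pitfall the reporter describes can be reproduced with a minimal stand-in for Hadoop's Text (a hypothetical class sketched here, not the real org.apache.hadoop.io.Text): set() reuses a growing backing buffer that never shrinks, so a consumer that reads the whole backing array instead of honoring getLength() sees the tail of the previous, longer row appended to the current one:

```java
import java.nio.charset.StandardCharsets;

// Illustration of the Text-reuse bug pattern behind HIVE-10983.
class ReusableText {
    private byte[] buf = new byte[0];
    private int len = 0;

    void set(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        if (b.length > buf.length) buf = new byte[b.length]; // grow, never shrink
        System.arraycopy(b, 0, buf, 0, b.length);
        len = b.length;
    }
    byte[] getBytes() { return buf; }  // backing array; may carry stale tail bytes
    int getLength()   { return len; }  // the only trustworthy length

    // Buggy consumer: decodes the whole backing array.
    static String decodeBuggy(ReusableText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }
    // Fixed consumer: respects getLength() (equivalently, take a defensive copy,
    // which is what Text.copyBytes() does in newer Hadoop versions).
    static String decodeFixed(ReusableText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }
}
```

After a long row followed by a short one, the buggy decode yields the short row with the long row's leftover bytes attached, matching the symptom in the issue description.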
[jira] [Commented] (HIVE-10971) count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true
[ https://issues.apache.org/jira/browse/HIVE-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581911#comment-14581911 ] Xuefu Zhang commented on HIVE-10971: Similar to TestCliDriver, TestSparkCliDriver is generated as part of the .q tests. count(*) with count(distinct) gives wrong results when hive.groupby.skewindata=true --- Key: HIVE-10971 URL: https://issues.apache.org/jira/browse/HIVE-10971 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.2.0 Reporter: WangMeng Assignee: WangMeng Fix For: 1.2.1 Attachments: HIVE-10971.01.patch, HIVE-10971.1.patch When hive.groupby.skewindata=true, the following query based on TPC-H gives wrong results: {code} set hive.groupby.skewindata=true; select l_returnflag, count(*), count(distinct l_linestatus) from lineitem group by l_returnflag limit 10; {code} The query plan shows that it generates only one MapReduce job instead of the two that hive.groupby.skewindata=true should dictate. The problem arises only when {noformat}count(*){noformat} and {noformat}count(distinct){noformat} appear together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10977) No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
[ https://issues.apache.org/jira/browse/HIVE-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581887#comment-14581887 ] Chaoyu Tang commented on HIVE-10977: The two failed tests do not seem relevant to this patch. No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled Key: HIVE-10977 URL: https://issues.apache.org/jira/browse/HIVE-10977 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10977.patch When hive.metastore.try.direct.sql is set to false, HMS will use JDO to retrieve data; therefore it is not necessary to instantiate an expensive MetaStoreDirectSql during ObjectStore initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10977) No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled
[ https://issues.apache.org/jira/browse/HIVE-10977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581904#comment-14581904 ] Xuefu Zhang commented on HIVE-10977: +1. Patch makes sense. No need to instantiate MetaStoreDirectSql when HMS DirectSql is disabled Key: HIVE-10977 URL: https://issues.apache.org/jira/browse/HIVE-10977 Project: Hive Issue Type: Bug Reporter: Chaoyu Tang Assignee: Chaoyu Tang Priority: Minor Attachments: HIVE-10977.patch When hive.metastore.try.direct.sql is set to false, HMS will use JDO to retrieve data; therefore it is not necessary to instantiate an expensive MetaStoreDirectSql during ObjectStore initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10142) Calculating formula based on difference between each row's value and current row's in Windowing function
[ https://issues.apache.org/jira/browse/HIVE-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581936#comment-14581936 ] Aihua Xu commented on HIVE-10142: - {{x preceding and y preceding}} is supported now. I'm investigating what could still be missing. [~zhangyii], if you have something in mind, could you please post the queries here? I can give them a try to see whether they are already supported or still missing. Calculating formula based on difference between each row's value and current row's in Windowing function Key: HIVE-10142 URL: https://issues.apache.org/jira/browse/HIVE-10142 Project: Hive Issue Type: New Feature Components: PTF-Windowing Affects Versions: 1.0.0 Reporter: Yi Zhang Assignee: Aihua Xu For analytics with windowing functions, the calculation formula sometimes needs to be evaluated over each row's value against the current row's value. Decay values are a good example, such as sums of values weighted by a decay function based on the timestamp difference between each row and the current row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.
[ https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581930#comment-14581930 ] Aihua Xu commented on HIVE-10972: - I'm investigating lockneg_try_lock_db_in_use test case failure. Seems related. DummyTxnManager always locks the current database in shared mode, which is incorrect. - Key: HIVE-10972 URL: https://issues.apache.org/jira/browse/HIVE-10972 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10972.patch In DummyTxnManager [line 163 | http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163], it always locks the current database. That is not correct since the current database can be db1, and the query can be select * from db2.tb1, which will lock db1 unnecessarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.
[ https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582615#comment-14582615 ] Chaoyu Tang commented on HIVE-10972: [~aihuaxu] I left some comments and questions on the RB. DummyTxnManager always locks the current database in shared mode, which is incorrect. - Key: HIVE-10972 URL: https://issues.apache.org/jira/browse/HIVE-10972 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10972.patch In DummyTxnManager [line 163 | http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163], it always locks the current database. That is not correct since the current database can be db1, and the query can be select * from db2.tb1, which will lock db1 unnecessarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582700#comment-14582700 ] Mostafa Mokhtar commented on HIVE-10704: Replied to your comments. Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
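The two failure modes above (wrong largest-table selection and divide-by-zero when all estimated sizes are 0) can be sketched as follows. This is a hedged Python model of the guarding logic, not the actual Tez HashTableLoader code; `default_size` is an assumed placeholder constant.

```python
def pick_small_tables(sizes, default_size=1):
    """Given {table: estimated_size}, substitute a positive default for zero
    estimates so that (a) the biggest table is still excluded deterministically
    and (b) memory division over the small-table total never hits zero."""
    safe = {t: (s if s > 0 else default_size) for t, s in sizes.items()}
    # tie-break on name so all-zero stats still give a deterministic choice
    big = max(safe, key=lambda t: (safe[t], t))
    total_small = sum(s for t, s in safe.items() if t != big)
    return big, total_small
```

Because `total_small` is always positive, a memory-fraction computation such as `budget / total_small` can no longer divide by zero or allocate 0 bytes.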
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582596#comment-14582596 ] Laljo John Pullokkaran commented on HIVE-10841: --- Committed to 1.2.1 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}
2. run SELECT query
{code}
select acct.ACC_N, acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE L.id = 4436 and acct.brn is not null;
{code}
the result is 3 rows
{code}
10   122
NULL NULL
NULL NULL
{code}
but it should be 1 row
{code}
10   122
{code}
2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats:
[jira] [Updated] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10841: -- Fix Version/s: 1.2.1 [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Fix For: 1.2.1 Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}
2. run SELECT query
{code}
select acct.ACC_N, acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE L.id = 4436 and acct.brn is not null;
{code}
the result is 3 rows
{code}
10   122
NULL NULL
NULL NULL
{code}
but it should be 1 row
{code}
10   122
{code}
2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats:
[jira] [Commented] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581581#comment-14581581 ] Hive QA commented on HIVE-10855: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12738993/HIVE-10855.3-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7943 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/876/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/876/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-876/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12738993 - PreCommit-HIVE-SPARK-Build Make HIVE-10568 work with Spark [Spark Branch] -- Key: HIVE-10855 URL: https://issues.apache.org/jira/browse/HIVE-10855 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10855.1-spark.patch, HIVE-10855.2-spark.patch, HIVE-10855.3-spark.patch HIVE-10568 only works with Tez. It's good to make it also work for Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10855) Make HIVE-10568 work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581586#comment-14581586 ] Rui Li commented on HIVE-10855: --- Latest failures are not related. Make HIVE-10568 work with Spark [Spark Branch] -- Key: HIVE-10855 URL: https://issues.apache.org/jira/browse/HIVE-10855 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10855.1-spark.patch, HIVE-10855.2-spark.patch, HIVE-10855.3-spark.patch HIVE-10568 only works with Tez. It's good to make it also work for Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10866) Give a warning when clients try to insert into bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581519#comment-14581519 ] Hive QA commented on HIVE-10866: {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12738983/HIVE-10866.4.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9008 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
org.apache.hive.beeline.TestSchemaTool.testSchemaUpgrade
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4245/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4245/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4245/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12738983 - PreCommit-HIVE-TRUNK-Build Give a warning when clients try to insert into bucketed tables Key: HIVE-10866 URL: https://issues.apache.org/jira/browse/HIVE-10866 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0, 1.3.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-10866.1.patch, HIVE-10866.2.patch, HIVE-10866.3.patch, HIVE-10866.4.patch Currently, Hive does not support appending (insert into) to bucketed tables; see open JIRA HIVE-3608. Inserting into such a table corrupts the data and makes it unfit for sort-merge bucket map join. We need to find a way to prevent clients from inserting into such tables, or at least give a warning.
Reproduce:
{noformat}
CREATE TABLE IF NOT EXISTS buckettestoutput1(
data string
)CLUSTERED BY(data)
INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE IF NOT EXISTS buckettestoutput2(
data string
)CLUSTERED BY(data)
INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
set hive.enforce.bucketing = true;
set hive.enforce.sorting=true;
insert into table buckettestoutput1 select code from sample_07 where total_emp 134354250 limit 10;
After this first insert, I did:
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
+---+---+
| data | data |
+---+---+
+---+---+
So select works fine. Second insert:
0: jdbc:hive2://localhost:1 insert into table buckettestoutput1 select code from sample_07 where total_emp = 134354250 limit 10;
No rows affected (61.235 seconds)
Then select:
0: jdbc:hive2://localhost:1 select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 (state=42000,code=10141)
0: jdbc:hive2://localhost:1
{noformat}
Inserting into an empty table or partition is fine, but after inserting into a non-empty one (the second insert in the repro above), the bucket map join throws an error. We should not let the second insert succeed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
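The SemanticException above comes down to a simple consistency check: a bucketed table is only usable for bucket map join if its number of data files matches the declared bucket count, and each append-style insert adds a fresh set of files. A minimal Python sketch of that check (the function and file names are illustrative, not Hive internals):

```python
def check_bucket_files(num_buckets, data_files):
    """Reject a bucketed table whose file count no longer matches its bucket
    count. After one insert into a 2-bucket table there are 2 files; a second
    insert appends 2 more, so the check fails with 4 files vs 2 buckets."""
    if len(data_files) != num_buckets:
        raise ValueError(
            f"number of buckets is {num_buckets}, "
            f"whereas the number of files is {len(data_files)}")
    return True
```

This is why the ticket argues the second insert should be blocked (or at least warned about) up front, rather than letting the table silently become unusable for sort-merge bucket map join.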
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
[ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581540#comment-14581540 ] Gopal V commented on HIVE-10980: Are you using MapReduce or Tez? Merge of dynamic partitions loads all data to default partition --- Key: HIVE-10980 URL: https://issues.apache.org/jira/browse/HIVE-10980 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Environment: HDP 2.2.4 (also reproduced on Apache Hive built from trunk) Reporter: Illya Yalovyy Conditions that lead to the issue: 1. Partition columns have different types 2. Both static and dynamic partitions are used in the query 3. Dynamically generated partitions require merge Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__. Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create external table sdp ( dataint bigint, hour int, req string, cid string, caid string ) row format delimited fields terminated by ',';
load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
create table tdp (cid string, caid string) partitioned by (dataint bigint, hour int, req string);
insert overwrite table tdp partition (dataint=20150316, hour=16, req) select cid, caid, req from sdp where dataint=20150316 and hour=16;
select * from tdp order by caid;
show partitions tdp;
Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2
20150316,16,reqA,clusterIdC,cacheId3
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5
Actual result:
clusterIdA cacheId1 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdA cacheId1 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdB cacheId2 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdC cacheId3 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdD cacheId4 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdA cacheId5 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdD cacheId8 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdB cacheId9 20150316 16 __HIVE_DEFAULT_PARTITION__
dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
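As a rough model of how dynamic partition values can collapse to the default partition: when a dynamic partition value cannot be converted to the partition column's declared type (plausible here, where the merge path mixes partition columns of different types), Hive substitutes `__HIVE_DEFAULT_PARTITION__`. This Python sketch is a speculative simplification of that behavior, not actual Hive code.

```python
def partition_value(raw, col_type, default_partition="__HIVE_DEFAULT_PARTITION__"):
    """Return the directory value for a dynamic partition column: the raw
    value if it converts to the column's type, else the default partition.
    Simplified model of Hive's null/invalid-partition-value handling."""
    try:
        col_type(raw)  # e.g. int for a bigint partition column
        return str(raw)
    except (TypeError, ValueError):
        return default_partition
```

Under this model, a string like `reqA` routed through a `bigint` partition column lands in `__HIVE_DEFAULT_PARTITION__`, matching the observed result above.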
[jira] [Commented] (HIVE-10959) Templeton launcher job should reconnect to the running child job on task retry
[ https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581550#comment-14581550 ] Thejas M Nair commented on HIVE-10959: -- The test failures are unrelated. Templeton launcher job should reconnect to the running child job on task retry -- Key: HIVE-10959 URL: https://issues.apache.org/jira/browse/HIVE-10959 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.15.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HIVE-10959.2.patch, HIVE-10959.3.patch, HIVE-10959.4.patch, HIVE-10959.patch Currently, Templeton launcher kills all child jobs (jobs tagged with the parent job's id) upon task retry. Upon templeton launcher task retry, templeton should reconnect to the running job and continue tracking its progress that way. This logic cannot be used for all job kinds (e.g. for jobs that are driven by the client side like regular hive). However, for MapReduceV2, and possibly Tez and HiveOnTez, this should be the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
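The proposed retry behavior can be sketched as follows. All the callables here (`list_jobs_by_tag`, `kill`, `reconnect`) are hypothetical stand-ins for YARN/Templeton operations, and `can_reconnect` models the "only for MapReduceV2/Tez-style jobs" condition from the ticket.

```python
def handle_retry(tag, list_jobs_by_tag, kill, reconnect, can_reconnect):
    """On launcher-task retry: look up child jobs tagged with the parent job
    id and, when the job kind supports it, reattach to them and keep tracking
    progress; otherwise fall back to the old behavior of killing them."""
    children = list_jobs_by_tag(tag)
    if can_reconnect and children:
        return [reconnect(j) for j in children]  # resume tracking running jobs
    for j in children:                           # old behavior: kill, then resubmit
        kill(j)
    return []
```

Client-driven job kinds (like plain Hive CLI jobs) would pass `can_reconnect=False` and keep the kill-and-resubmit path.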
[jira] [Updated] (HIVE-10959) webhcat launcher job should reconnect to the running child job on task retry
[ https://issues.apache.org/jira/browse/HIVE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10959: - Summary: webhcat launcher job should reconnect to the running child job on task retry (was: Templeton launcher job should reconnect to the running child job on task retry) webhcat launcher job should reconnect to the running child job on task retry Key: HIVE-10959 URL: https://issues.apache.org/jira/browse/HIVE-10959 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.15.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: HIVE-10959.2.patch, HIVE-10959.3.patch, HIVE-10959.4.patch, HIVE-10959.patch Currently, Templeton launcher kills all child jobs (jobs tagged with the parent job's id) upon task retry. Upon templeton launcher task retry, templeton should reconnect to the running job and continue tracking its progress that way. This logic cannot be used for all job kinds (e.g. for jobs that are driven by the client side like regular hive). However, for MapReduceV2, and possibly Tez and HiveOnTez, this should be the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10790) orc file sql execute fail
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang updated HIVE-10790: Flags: Patch,Important Labels: patch (was: ) orc file sql execute fail - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1 Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, for example
insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500';
throws an error:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid
at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
... 8 more
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0
[ https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582813#comment-14582813 ] Alexander Pivovarov commented on HIVE-10704: +1 Errors in Tez HashTableLoader when estimated table size is 0 Key: HIVE-10704 URL: https://issues.apache.org/jira/browse/HIVE-10704 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Mostafa Mokhtar Fix For: 1.2.1 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, HIVE-10704.3.patch Couple of issues: - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all tables, the largest small table selection is wrong and could select the large table (which results in NPE) - The memory estimates can either divide-by-zero, or allocate 0 memory if the table size is 0. Try to come up with a sensible default for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
[ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582092#comment-14582092 ] Illya Yalovyy commented on HIVE-10980: -- Good point. I observed this behavior on MapReduce. I'll update the ticket. Merge of dynamic partitions loads all data to default partition --- Key: HIVE-10980 URL: https://issues.apache.org/jira/browse/HIVE-10980 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 0.14.0 Environment: HDP 2.2.4 (also reproduced on Apache Hive built from trunk) Reporter: Illya Yalovyy Conditions that lead to the issue: 1. Partition columns have different types 2. Both static and dynamic partitions are used in the query 3. Dynamically generated partitions require merge Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__. Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create external table sdp ( dataint bigint, hour int, req string, cid string, caid string ) row format delimited fields terminated by ',';
load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
create table tdp (cid string, caid string) partitioned by (dataint bigint, hour int, req string);
insert overwrite table tdp partition (dataint=20150316, hour=16, req) select cid, caid, req from sdp where dataint=20150316 and hour=16;
select * from tdp order by caid;
show partitions tdp;
Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2
20150316,16,reqA,clusterIdC,cacheId3
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5
Actual result:
clusterIdA cacheId1 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdA cacheId1 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdB cacheId2 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdC cacheId3 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdD cacheId4 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdA cacheId5 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdD cacheId8 20150316 16 __HIVE_DEFAULT_PARTITION__
clusterIdB cacheId9 20150316 16 __HIVE_DEFAULT_PARTITION__
dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements
[ https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582055#comment-14582055 ] Ashutosh Chauhan commented on HIVE-10841: - +1 LGTM [WHERE col is not null] does not work sometimes for queries with many JOIN statements - Key: HIVE-10841 URL: https://issues.apache.org/jira/browse/HIVE-10841 Project: Hive Issue Type: Bug Components: Query Planning, Query Processor Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0 Reporter: Alexander Pivovarov Assignee: Laljo John Pullokkaran Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, HIVE-10841.2.patch, HIVE-10841.patch The result from the following SELECT query is 3 rows but it should be 1 row. I checked it in MySQL - it returned 1 row. To reproduce the issue in Hive 1. prepare tables
{code}
drop table if exists L;
drop table if exists LA;
drop table if exists FR;
drop table if exists A;
drop table if exists PI;
drop table if exists acct;
create table L as select 4436 id;
create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
create table FR as select 4436 loan_id;
create table A as select 4748 id;
create table PI as select 4415 id;
create table acct as select 4748 aid, 10 acc_n, 122 brn;
insert into table acct values(4748, null, null);
insert into table acct values(4748, null, null);
{code}
2. run SELECT query
{code}
select acct.ACC_N, acct.brn
FROM L
JOIN LA ON L.id = LA.loan_id
JOIN FR ON L.id = FR.loan_id
JOIN A ON LA.aid = A.id
JOIN PI ON PI.id = LA.pi_id
JOIN acct ON A.id = acct.aid
WHERE L.id = 4436 and acct.brn is not null;
{code}
the result is 3 rows
{code}
10   122
NULL NULL
NULL NULL
{code}
but it should be 1 row
{code}
10   122
{code}
2.1 explain select ... 
output for hive-1.3.0 MR {code} STAGE DEPENDENCIES: Stage-12 is a root stage Stage-9 depends on stages: Stage-12 Stage-0 depends on stages: Stage-9 STAGE PLANS: Stage: Stage-12 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 acct Fetch Operator limit: -1 fr Fetch Operator limit: -1 l Fetch Operator limit: -1 pi Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) acct TableScan alias: acct Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: aid is not null (type: boolean) Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 _col5 (type: int) 1 id (type: int) 2 aid (type: int) fr TableScan alias: fr Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (loan_id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) l TableScan alias: l Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (id = 4436) (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator keys: 0 4436 (type: int) 1 4436 (type: int) 2 4436 (type: int) pi TableScan alias: pi Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats:
[jira] [Updated] (HIVE-10980) Merge of dynamic partitions loads all data to default partition
[ https://issues.apache.org/jira/browse/HIVE-10980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illya Yalovyy updated HIVE-10980:
- Description:
Conditions that lead to the issue:
1. Execution engine set to MapReduce
2. Partition columns have different types
3. Both static and dynamic partitions are used in the query
4. Dynamically generated partitions require merge
Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__.
Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.execution.engine=mr;
create external table sdp (
  dataint bigint,
  hour int,
  req string,
  cid string,
  caid string
) row format delimited fields terminated by ',';
load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
create table tdp (cid string, caid string)
partitioned by (dataint bigint, hour int, req string);
insert overwrite table tdp partition (dataint=20150316, hour=16, req)
select cid, caid, req from sdp where dataint=20150316 and hour=16;
select * from tdp order by caid;
show partitions tdp;
Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2
20150316,16,reqA,clusterIdC,cacheId3
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5
Actual result:
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId2  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdC  cacheId3  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId4  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId5  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId8  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId9  20150316  16  __HIVE_DEFAULT_PARTITION__
dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__
was:
Conditions that lead to the issue:
1. Partition columns have different types
2. Both static and dynamic partitions are used in the query
3. Dynamically generated partitions require merge
Result: Final data is loaded to __HIVE_DEFAULT_PARTITION__.
Steps to reproduce:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=strict;
set hive.optimize.sort.dynamic.partition=false;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
create external table sdp (
  dataint bigint,
  hour int,
  req string,
  cid string,
  caid string
) row format delimited fields terminated by ',';
load data local inpath '../../data/files/dynpartdata1.txt' into table sdp;
load data local inpath '../../data/files/dynpartdata2.txt' into table sdp;
...
load data local inpath '../../data/files/dynpartdataN.txt' into table sdp;
create table tdp (cid string, caid string)
partitioned by (dataint bigint, hour int, req string);
insert overwrite table tdp partition (dataint=20150316, hour=16, req)
select cid, caid, req from sdp where dataint=20150316 and hour=16;
select * from tdp order by caid;
show partitions tdp;
Example of the input file:
20150316,16,reqA,clusterIdA,cacheId1
20150316,16,reqB,clusterIdB,cacheId2
20150316,16,reqA,clusterIdC,cacheId3
20150316,16,reqD,clusterIdD,cacheId4
20150316,16,reqA,clusterIdA,cacheId5
Actual result:
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId1  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId2  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdC  cacheId3  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId4  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdA  cacheId5  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdD  cacheId8  20150316  16  __HIVE_DEFAULT_PARTITION__
clusterIdB  cacheId9  20150316  16  __HIVE_DEFAULT_PARTITION__
dataint=20150316/hour=16/req=__HIVE_DEFAULT_PARTITION__
Merge of dynamic partitions loads all data to default partition
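Background on where __HIVE_DEFAULT_PARTITION__ comes from: when a dynamic partition column's value is null or empty, Hive routes the row to a default partition directory. A simplified Python sketch of that path-building rule (illustrative, not Hive's code); the symptom above is consistent with the merge task losing the dynamic `req` values, so every row falls through to the default branch:

```python
# Simplified sketch of how a partition directory path is derived from an
# ordered (column, value) spec. A null/empty dynamic value maps to the
# default partition -- which is how rows can land under
# __HIVE_DEFAULT_PARTITION__. Function name is hypothetical.
DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"

def partition_path(spec):
    """Build a partition path like dataint=20150316/hour=16/req=reqA."""
    parts = []
    for col, val in spec:
        if val is None or val == "":
            # Missing dynamic value: fall back to the default partition.
            parts.append(f"{col}={DEFAULT_PARTITION}")
        else:
            parts.append(f"{col}={val}")
    return "/".join(parts)

# Normal case: all partition values are present.
ok = partition_path([("dataint", 20150316), ("hour", 16), ("req", "reqA")])
# Failure mode seen in this issue: the dynamic value is missing at merge time.
bad = partition_path([("dataint", 20150316), ("hour", 16), ("req", None)])
```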
[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.
[ https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582053#comment-14582053 ] Aihua Xu commented on HIVE-10972: - [~damien.carol] Can you take another look at my comments below? DummyTxnManager always locks the current database in shared mode, which is incorrect. - Key: HIVE-10972 URL: https://issues.apache.org/jira/browse/HIVE-10972 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10972.patch In DummyTxnManager [line 163 | http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163], it always locks the current database. That is incorrect: the current database can be db1 while the query is select * from db2.tb1, so db1 gets locked unnecessarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
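The fix direction described above is to derive lock targets from the objects the query actually reads or writes, rather than unconditionally locking the session's current database. A hedged Python sketch of that selection logic (DummyTxnManager itself is Java; function and entity names here are hypothetical):

```python
# Sketch of lock-target selection: lock the databases/tables a query
# touches, not the session's current database. Names are hypothetical,
# illustrating the intent, not Hive's actual lock manager code.
def lock_targets(current_db, read_entities, write_entities):
    """Return the objects a shared lock should cover.

    Buggy behavior: always lock current_db, even when the query never
    touches it. Intended behavior (sketched here): walk the query's
    inputs/outputs, e.g. 'db2.tb1' -> lock db2 and db2.tb1.
    """
    targets = set()
    for entity in read_entities | write_entities:
        db, _, table = entity.partition(".")
        targets.add(db)            # shared lock on the owning database
        if table:
            targets.add(entity)    # plus the table itself
    return targets

# Query: select * from db2.tb1, issued while "use db1" is in effect.
locks = lock_targets("db1", {"db2.tb1"}, set())
# db1 is not locked at all -- only db2 and db2.tb1.
```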