[jira] [Commented] (HIVE-8300) Missing guava lib causes IllegalStateException when deserializing a task [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152843#comment-14152843 ] Hive QA commented on HIVE-8300: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671962/HIVE-8300.1-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6508 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/181/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/181/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-181/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671962 Missing guava lib causes IllegalStateException when deserializing a task [Spark Branch] --- Key: HIVE-8300 URL: https://issues.apache.org/jira/browse/HIVE-8300 Project: Hive Issue Type: Bug Components: Spark Environment: Spark-1.2.0-SNAPSHOT Reporter: Rui Li Attachments: HIVE-8300.1-spark.patch In spark-1.2, we have guava shaded in spark-assembly. And we only ship hive-exec to spark cluster. So spark executor won't have (original) guava in its class path. 
This can cause problems when TaskRunner deserializes a task, producing an exception like this:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, node13-1): java.lang.IllegalStateException: unread block data
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:164)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
{code}
We may have to verify this issue and ship guava to the Spark cluster.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4224) Upgrade to Thrift 1.0 when available
[ https://issues.apache.org/jira/browse/HIVE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152849#comment-14152849 ]
Nemon Lou commented on HIVE-4224:
THRIFT-1869 has been fixed in Thrift 0.9.1, which was released on 21/Aug/13. Is there any plan to upgrade Thrift to 0.9.1?
Upgrade to Thrift 1.0 when available
Key: HIVE-4224
URL: https://issues.apache.org/jira/browse/HIVE-4224
Project: Hive
Issue Type: Sub-task
Components: HiveServer2, Metastore, Server Infrastructure
Affects Versions: 0.11.0
Reporter: Brock Noland
Priority: Minor
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy
[ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8151: - Attachment: HIVE-8151.7.patch Fixes yet another failure. Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy -- Key: HIVE-8151 URL: https://issues.apache.org/jira/browse/HIVE-8151 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Prasanth J Assignee: Prasanth J Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch HIVE-6455 added dynamic partition sort optimization. It added startGroup() method to FileSink operator to look for changes in reduce key for creating partition directories. This method however is not reliable as the key called with startGroup() is different from the key called with processOp(). startGroup() is called with newly changed key whereas processOp() is called with previously aggregated key. This will result in processOp() writing the last row of previous group as the first row of next group. This happens only when used with group by operator. The fix is to not rely on startGroup() and do the partition directory creation in processOp() itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
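The fix described in HIVE-8151, deciding the partition directory inside processOp() rather than in startGroup(), can be illustrated with a minimal sketch. This is not Hive's FileSinkOperator: the class `PartitionSinkSketch`, the method `processRow`, and the string partition keys are all hypothetical names chosen for illustration. The point it tries to show is that the directory switch is driven by the key of the row actually being written, so the last row of one group cannot land in the next group's directory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a file sink; not Hive's FileSinkOperator. */
class PartitionSinkSketch {
    // Partition directory name -> rows written to it, in arrival order.
    private final Map<String, List<String>> partitions = new LinkedHashMap<>();
    private String currentKey = null;        // key of the partition currently open
    private List<String> currentRows = null; // rows of the partition currently open

    /** Handles one row; the partition switch is decided from the row's own key. */
    void processRow(String partitionKey, String row) {
        if (!partitionKey.equals(currentKey)) {
            // Key changed: open (or reopen) the directory for this row's partition.
            currentKey = partitionKey;
            currentRows = partitions.computeIfAbsent(partitionKey, k -> new ArrayList<>());
        }
        currentRows.add(row);
    }

    Map<String, List<String>> partitions() {
        return partitions;
    }

    public static void main(String[] args) {
        PartitionSinkSketch sink = new PartitionSinkSketch();
        sink.processRow("p=a", "row1");
        sink.processRow("p=a", "row2");
        sink.processRow("p=b", "row3");
        System.out.println(sink.partitions());
    }
}
```

Because no separate group-start callback is involved, the sketch does not depend on whether that callback sees the new key or the previous one.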
[jira] [Commented] (HIVE-8287) Metadata action errors don't have information about cause
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152893#comment-14152893 ]
Hive QA commented on HIVE-8287:
{color:green}Overall{color}: +1 all checks pass
Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671866/HIVE-8287.3.patch
{color:green}SUCCESS:{color} +1 6372 tests passed
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1047/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1047/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1047/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12671866
Metadata action errors don't have information about cause
Key: HIVE-8287
URL: https://issues.apache.org/jira/browse/HIVE-8287
Project: Hive
Issue Type: Bug
Components: Authorization, Logging
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Blocker
Fix For: 0.14.0
Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch, HIVE-8287.3.patch
Example of an error message that doesn't give enough useful information:
{noformat}
0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def');
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1)
{noformat}
Some calls to get database and get table from metastore also treat all exceptions as an 'object does not exist' exception, and end up ignoring the errors.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152951#comment-14152951 ]
Damien Carol commented on HIVE-8231:
Restarting HDFS, YARN, the remote Metastore, and HiveServer2 didn't help. I think the bug occurs because there is no base dir. When I run a major compaction, the base is created and I can see the new rows.
Error when insert into empty table with ACID
Key: HIVE-8231
URL: https://issues.apache.org/jira/browse/HIVE-8231
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
Fix For: 0.14.0
Steps to show the bug:
1. Create the table
{code}
create table encaissement_1b_64m like encaissement_1b;
{code}
2. Check the table
{code}
desc encaissement_1b_64m;
dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
{code}
Everything is OK:
{noformat}
0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m;
+------------+------------+----------+--+
|  col_name  | data_type  | comment  |
+------------+------------+----------+--+
| id         | int        |          |
| idmagasin  | int        |          |
| zibzin     | string     |          |
| cheque     | int        |          |
| montant    | double     |          |
| date       | timestamp  |          |
| col_6      | string     |          |
| col_7      | string     |          |
| col_8      | string     |          |
+------------+------------+----------+--+
9 rows selected (0.158 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+-------------+--+
| DFS Output  |
+-------------+--+
+-------------+--+
No rows selected (0.01 seconds)
{noformat}
3. Insert values into the new table
{noformat}
insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','','');
{noformat}
4. Check
{noformat}
0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m;
+-----+--+
| id  |
+-----+--+
+-----+--+
No rows selected (0.091 seconds)
{noformat}
There is already a problem: I don't see the inserted row.
5. When I check the HDFS directory, I see the {{delta_421_421}} folder
{noformat}
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+---------------------------------------------------------------------------------------------------------------------------------+--+
| DFS Output                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------+--+
| Found 1 items                                                                                                                   |
| drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 |
+---------------------------------------------------------------------------------------------------------------------------------+--+
2 rows selected (0.014 seconds)
{noformat}
6. Doing a major compaction solves the bug
{noformat}
0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major';
No rows affected (0.046 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+----------------------------------------------------------------------------------------------------------------------------+--+
| DFS Output                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------+--+
| Found 1 items                                                                                                              |
| drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421 |
+----------------------------------------------------------------------------------------------------------------------------+--+
2 rows selected (0.02 seconds)
{noformat}
-- This message was sent by Atlassian JIRA
[jira] [Updated] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
[ https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanghyun Yun updated HIVE-8285:
Attachment: HIVE-8285.patch
I changed the comparison to use equals(). Please review, [~tedyu] :)
Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
Key: HIVE-8285
URL: https://issues.apache.org/jira/browse/HIVE-8285
Project: Hive
Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
Attachments: HIVE-8285.patch
{code}
if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) {
{code}
equals() should be used in the above comparison.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
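To illustrate why HIVE-8285 asks for equals(), here is a minimal sketch, assuming the value being tested arrives as a plain `Object`. The class `BooleanEqualitySketch` and the helper `isTrueConstant` are hypothetical names and are not part of Hive. A `==` comparison against `Boolean.TRUE` only succeeds when the value is that exact shared instance, whereas `Boolean.TRUE.equals(value)` compares by value and is also safe for `null`.

```java
/** Hypothetical helper illustrating value comparison for boxed booleans; not Hive code. */
class BooleanEqualitySketch {
    /**
     * Returns true for any Boolean object holding true.
     * Returns false for null, for Boolean false, and for non-Boolean objects.
     */
    static boolean isTrueConstant(Object value) {
        // Calling equals() on the constant avoids a NullPointerException when value is null.
        return Boolean.TRUE.equals(value);
    }

    public static void main(String[] args) {
        System.out.println(isTrueConstant(Boolean.TRUE));
        System.out.println(isTrueConstant(Boolean.FALSE));
        System.out.println(isTrueConstant(null));
    }
}
```

Putting the constant on the left-hand side is a common idiom; whether the actual patch does this is not stated in the message above.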
[jira] [Updated] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
[ https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanghyun Yun updated HIVE-8285:
Affects Version/s: 0.14.0
Status: Patch Available (was: Open)
Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
Key: HIVE-8285
URL: https://issues.apache.org/jira/browse/HIVE-8285
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
Attachments: HIVE-8285.patch
{code}
if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) {
{code}
equals() should be used in the above comparison.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8295) Add batch retrieve partition objects for metastore direct sql
[ https://issues.apache.org/jira/browse/HIVE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152972#comment-14152972 ]
Hive QA commented on HIVE-8295:
{color:red}Overall{color}: -1 at least one tests failed
Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671876/HIVE-8295.1.patch
{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6370 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1048/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1048/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1048/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12671876
Add batch retrieve partition objects for metastore direct sql
Key: HIVE-8295
URL: https://issues.apache.org/jira/browse/HIVE-8295
Project: Hive
Issue Type: Bug
Reporter: Selina Zhang
Assignee: Selina Zhang
Attachments: HIVE-8295.1.patch
Currently in MetastoreDirectSql, partition objects are constructed by first fetching the partition ids. However, if the number of partition ids matching the filter exceeds 1000, direct SQL fails with the following stack trace:
{code}
2014-09-29 19:30:02,942 DEBUG [pool-1-thread-1] metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:timingTrace(604)) - Direct SQL query in 122.085893ms + 13.048901ms, the query is [select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ? inner join DBS on TBLS.DB_ID = DBS.DB_ID and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER2 on FILTER2.PART_ID = PARTITIONS.PART_ID and FILTER2.INTEGER_IDX = 2 where ((FILTER2.PART_KEY_VAL = ?))]
2014-09-29 19:30:02,949 ERROR [pool-1-thread-1] metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select PARTITIONS.PART_ID, SDS.SD_ID, SDS.CD_ID, SERDES.SERDE_ID, PARTITIONS.CREATE_TIME, PARTITIONS.LAST_ACCESS_TIME, SDS.INPUT_FORMAT, SDS.IS_COMPRESSED, SDS.IS_STOREDASSUBDIRECTORIES, SDS.LOCATION, SDS.NUM_BUCKETS, SDS.OUTPUT_FORMAT, SERDES.NAME, SERDES.SLIB from PARTITIONS left outer join SDS on PARTITIONS.SD_ID = SDS.SD_ID left outer join SERDES on SDS.SERDE_ID = SERDES.SERDE_ID where PART_ID in (136,140,143,147,152,156,160,163,167,171,174,180,185,191,196,198,203,208,212,217... ) order by PART_NAME asc.
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:331)
at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
at
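The batch retrieval that HIVE-8295 proposes can be sketched roughly as follows. This is not the attached patch and not MetaStoreDirectSql code: the class `IdBatchingSketch`, the methods `partition` and `inClause`, and the batch size of 1000 (taken from the failure threshold mentioned in the report) are assumptions for illustration only. The idea is to split the list of partition ids into bounded chunks and build one `PART_ID in (...)` clause per chunk instead of a single oversized one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical sketch of splitting an id list into bounded batches; not Hive code. */
class IdBatchingSketch {
    /** Splits ids into consecutive batches of at most batchSize elements each. */
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    /** Builds the IN clause text for one batch, e.g. for ids 1, 2, 3. */
    static String inClause(List<Long> batch) {
        StringJoiner joiner = new StringJoiner(",", "PART_ID in (", ")");
        for (Long id : batch) {
            joiner.add(String.valueOf(id));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 2500; i++) {
            ids.add(i);
        }
        // Assumed batch size of 1000; one query would be issued per batch.
        for (List<Long> batch : partition(ids, 1000)) {
            System.out.println(batch.size() + " ids starting at " + batch.get(0));
        }
    }
}
```

The results of the per-batch queries would then need to be concatenated and, if ordering matters, re-sorted by the caller, since each query orders only its own batch.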
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152978#comment-14152978 ]
Damien Carol commented on HIVE-8231:
I don't think it's a cache issue. Stopping and restarting all daemons of the cluster changes nothing.
Error when insert into empty table with ACID
Key: HIVE-8231
URL: https://issues.apache.org/jira/browse/HIVE-8231
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
Fix For: 0.14.0
Steps to show the bug:
1. Create the table
{code}
create table encaissement_1b_64m like encaissement_1b;
{code}
2. Check the table
{code}
desc encaissement_1b_64m;
dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
{code}
Everything is OK:
{noformat}
0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m;
+------------+------------+----------+--+
|  col_name  | data_type  | comment  |
+------------+------------+----------+--+
| id         | int        |          |
| idmagasin  | int        |          |
| zibzin     | string     |          |
| cheque     | int        |          |
| montant    | double     |          |
| date       | timestamp  |          |
| col_6      | string     |          |
| col_7      | string     |          |
| col_8      | string     |          |
+------------+------------+----------+--+
9 rows selected (0.158 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+-------------+--+
| DFS Output  |
+-------------+--+
+-------------+--+
No rows selected (0.01 seconds)
{noformat}
3. Insert values into the new table
{noformat}
insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','','');
{noformat}
4. Check
{noformat}
0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m;
+-----+--+
| id  |
+-----+--+
+-----+--+
No rows selected (0.091 seconds)
{noformat}
There is already a problem: I don't see the inserted row.
5. When I check the HDFS directory, I see the {{delta_421_421}} folder
{noformat}
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+---------------------------------------------------------------------------------------------------------------------------------+--+
| DFS Output                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------+--+
| Found 1 items                                                                                                                   |
| drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 |
+---------------------------------------------------------------------------------------------------------------------------------+--+
2 rows selected (0.014 seconds)
{noformat}
6. Doing a major compaction solves the bug
{noformat}
0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major';
No rows affected (0.046 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+----------------------------------------------------------------------------------------------------------------------------+--+
| DFS Output                                                                                                                 |
+----------------------------------------------------------------------------------------------------------------------------+--+
| Found 1 items                                                                                                              |
| drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421 |
+----------------------------------------------------------------------------------------------------------------------------+--+
2 rows selected (0.02 seconds)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8282) Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
[ https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanghyun Yun updated HIVE-8282:
Attachment: HIVE-8282.patch
I added a null check and logging. Please review, [~yuzhih...@gmail.com]. :)
Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
Key: HIVE-8282
URL: https://issues.apache.org/jira/browse/HIVE-8282
Project: Hive
Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
Attachments: HIVE-8282.patch
In convertJoinMapJoin():
{code}
for (Operator<? extends OperatorDesc> parentOp : joinOp.getParentOperators()) {
  if (parentOp instanceof MuxOperator) {
    return null;
  }
}
{code}
An NPE would result if convertJoinMapJoin() returns null:
{code}
MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, bigTablePosition);
MapJoinDesc joinDesc = mapJoinOp.getConf();
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
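The null check mentioned in the HIVE-8282 comment follows a common guard pattern, sketched below under stated assumptions. The class `NullGuardSketch` and the methods `convert` and `convertedLength` are hypothetical and stand in for a conversion step that may decline by returning null; they are not the Hive methods, and the attached patch may handle the case differently.

```java
/** Hypothetical illustration of guarding a nullable result before use; not Hive code. */
class NullGuardSketch {
    /** Stand-in for a conversion that declines (returns null) for some inputs. */
    static String convert(String input) {
        return input.startsWith("mux") ? null : input.toUpperCase();
    }

    /**
     * Checks the conversion result before dereferencing it.
     * Returns -1 when the conversion was skipped, otherwise the converted length.
     */
    static int convertedLength(String input) {
        String converted = convert(input);
        if (converted == null) {
            // Log and bail out instead of letting a NullPointerException escape.
            System.err.println("Conversion skipped for input: " + input);
            return -1;
        }
        return converted.length();
    }

    public static void main(String[] args) {
        System.out.println(convertedLength("join"));
        System.out.println(convertedLength("mux-join"));
    }
}
```

The sentinel return value is only one option; a caller could equally return early or throw a descriptive exception, depending on what the surrounding code expects.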
[jira] [Updated] (HIVE-8282) Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
[ https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sanghyun Yun updated HIVE-8282:
Affects Version/s: 0.14.0
Status: Patch Available (was: Open)
Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
Key: HIVE-8282
URL: https://issues.apache.org/jira/browse/HIVE-8282
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
Attachments: HIVE-8282.patch
In convertJoinMapJoin():
{code}
for (Operator<? extends OperatorDesc> parentOp : joinOp.getParentOperators()) {
  if (parentOp instanceof MuxOperator) {
    return null;
  }
}
{code}
An NPE would result if convertJoinMapJoin() returns null:
{code}
MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, bigTablePosition);
MapJoinDesc joinDesc = mapJoinOp.getConf();
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153012#comment-14153012 ] Damien Carol commented on HIVE-8231: Another strange result, when I'm doing this query: {code} select ROW__ID, INPUT__FILE__NAME, * from foo7; {code} I'm having this result : {noformat} +---+-+--+--+ |row__id| input__file__name | foo7.id | +---+-+--+--+ | {transactionid:536,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | | {transactionid:537,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | | {transactionid:538,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | | {transactionid:539,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | | {transactionid:540,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | | {transactionid:541,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | | {transactionid:542,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | | {transactionid:544,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | | {transactionid:545,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | | {transactionid:546,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_542/bucket_0 | 1 | +---+-+--+--+ 10 rows selected (0.168 seconds) {noformat} Which is wrong. 
See here : {noformat} 0: jdbc:hive2://nc-h04:1/casino dfs -ls /user/hive/warehouse/casino.db/foo7; +-+--+ | DFS Output | +-+--+ | Found 4 items | | drwxr-xr-x - hduser supergroup 0 2014-09-30 11:29 /user/hive/warehouse/casino.db/foo7/base_542 | | drwxr-xr-x - hduser supergroup 0 2014-09-30 11:30 /user/hive/warehouse/casino.db/foo7/delta_544_544 | | drwxr-xr-x - hduser supergroup 0 2014-09-30 11:30 /user/hive/warehouse/casino.db/foo7/delta_545_545 | | drwxr-xr-x - hduser supergroup 0 2014-09-30 11:30 /user/hive/warehouse/casino.db/foo7/delta_546_546 | +-+--+ 5 rows selected (0.025 seconds) {noformat} Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Critical Fix For: 0.14.0 Steps to show the bug : 1. create table {code} create table encaissement_1b_64m like encaissement_1b; {code} 2. check table {code} desc encaissement_1b_64m; dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m; {code} everything is ok: {noformat} 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m; +++--+--+ | col_name | data_type | comment | +++--+--+ | id | int| | | idmagasin | int| | | zibzin | string | | | cheque | int| | | montant| double | | | date | timestamp | | | col_6 | string | | | col_7 | string | | | col_8 | string |
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153022#comment-14153022 ] Damien Carol commented on HIVE-8231: With block offset VC : {noformat} +---+-+--+--+--+ |row__id| input__file__name | block__offset__inside__file | foo7.id | +---+-+--+--+--+ | {transactionid:536,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 61 | 1| | {transactionid:537,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 122 | 1| | {transactionid:538,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 183 | 1| | {transactionid:539,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 244 | 1| | {transactionid:540,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 306 | 1| | {transactionid:541,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 367 | 1| | {transactionid:542,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 428 | 1| | {transactionid:544,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 489 | 1| | {transactionid:545,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 550 | 1| | {transactionid:546,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 612 | 1| | {transactionid:547,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 612 | 1| | {transactionid:548,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 612 | 1| | {transactionid:549,bucketid:0,rowid:0} | hdfs://nc-h04/user/hive/warehouse/casino.db/foo7/base_546/bucket_0 | 612 | 1| +---+-+--+--+--+ 13 rows selected (0.162 seconds) {noformat} Column 
{{input\_\_file\_\_name}} and {{block\_\_offset\_\_inside\_\_file}} have wrong values for the last 3 rows. These rows are in delta directories. Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Critical Fix For: 0.14.0 Steps to show the bug : 1. create table {code} create table encaissement_1b_64m like encaissement_1b; {code} 2. check table {code} desc encaissement_1b_64m; dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m; {code} everything is ok: {noformat} 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m; +++--+--+ | col_name | data_type | comment | +++--+--+ | id | int| | | idmagasin | int| | | zibzin | string | | | cheque | int| | | montant| double | | | date | timestamp | | | col_6 | string | | | col_7 | string | | | col_8 | string | | +++--+--+ 9 rows selected (0.158 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ +-+--+ No rows selected (0.01 seconds) {noformat} 3.
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Attachment: HIVE-7689.9.patch Rebased on last trunk. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Blocker Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Description: This patch fix wrong lower case tables names in Postgres Metastore back end. I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and STATS on postgres metastore. was: I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Blocker Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch This patch fix wrong lower case tables names in Postgres Metastore back end. I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Summary: Fix wrong lower case table names in Postgres Metastore back end (was: Enable Postgres as METASTORE back-end) Fix wrong lower case table names in Postgres Metastore back end --- Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Blocker Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch This patch fix wrong lower case tables names in Postgres Metastore back end. I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Description: Current 0.14 patch create table with lower case nmae. This patch fix wrong lower case tables names in Postgres Metastore back end. was: This patch fix wrong lower case tables names in Postgres Metastore back end. I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and STATS on postgres metastore. Fix wrong lower case table names in Postgres Metastore back end --- Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Blocker Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch Current 0.14 patch create table with lower case nmae. This patch fix wrong lower case tables names in Postgres Metastore back end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8296) Tez ReduceShuffle Vectorization needs 2 data buffers (key and value) for adding rows
[ https://issues.apache.org/jira/browse/HIVE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153040#comment-14153040 ] Hive QA commented on HIVE-8296: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671904/HIVE-8296.02.patch {color:green}SUCCESS:{color} +1 6371 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1049/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1049/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1049/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12671904 Tez ReduceShuffle Vectorization needs 2 data buffers (key and value) for adding rows Key: HIVE-8296 URL: https://issues.apache.org/jira/browse/HIVE-8296 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8296.01.patch, HIVE-8296.02.patch We reuse the keys for the vectorized row batch, so we need a separate buffer (for strings) in order to reuse the batch for new values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Description: Current 0.14 patch create table with lower case names. This patch fix wrong lower case tables names in Postgres Metastore back end. was: Current 0.14 patch create table with lower case nmae. This patch fix wrong lower case tables names in Postgres Metastore back end. Fix wrong lower case table names in Postgres Metastore back end --- Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Blocker Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch Current 0.14 patch create table with lower case names. This patch fix wrong lower case tables names in Postgres Metastore back end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 26172: HiveServer2 in http-kerberos doAs=true is failing with org.apache.hadoop.security.AccessControlException
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26172/ --- Review request for hive and Thejas Nair. Bugs: HIVE-8299 https://issues.apache.org/jira/browse/HIVE-8299 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-8299 Diffs - service/src/java/org/apache/hive/service/auth/HiveAuthFactory.java a0f7667 service/src/java/org/apache/hive/service/auth/HttpAuthUtils.java 07e8c9a service/src/java/org/apache/hive/service/auth/HttpCLIServiceUGIProcessor.java 245d793 service/src/java/org/apache/hive/service/auth/TSetIpAddressProcessor.java 0149dcf service/src/java/org/apache/hive/service/cli/session/SessionManager.java 4654acc service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java c4b273c service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 795115e Diff: https://reviews.apache.org/r/26172/diff/ Testing --- Manually on a secure cluster. Thanks, Vaibhav Gumashta
[jira] [Updated] (HIVE-8299) HiveServer2 in http-kerberos doAs=true is failing with org.apache.hadoop.security.AccessControlException
[ https://issues.apache.org/jira/browse/HIVE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8299: --- Attachment: HIVE-8299.1.patch HiveServer2 in http-kerberos doAs=true is failing with org.apache.hadoop.security.AccessControlException -- Key: HIVE-8299 URL: https://issues.apache.org/jira/browse/HIVE-8299 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8299.1.patch The issue is that it does a doAs at processor level and fails at scratch dir creation before the session is opened. Since we are now using a proxy class to implement doAs at HiveSession level, we should get rid of HttpCLIServiceUGIProcessor and related classes that were used before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8299) HiveServer2 in http-kerberos doAs=true is failing with org.apache.hadoop.security.AccessControlException
[ https://issues.apache.org/jira/browse/HIVE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8299: --- Status: Patch Available (was: Open) HiveServer2 in http-kerberos doAs=true is failing with org.apache.hadoop.security.AccessControlException -- Key: HIVE-8299 URL: https://issues.apache.org/jira/browse/HIVE-8299 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8299.1.patch The issue is that it does a doAs at processor level and fails at scratch dir creation before the session is opened. Since we are now using a proxy class to implement doAs at HiveSession level, we should get rid of HttpCLIServiceUGIProcessor and related classes that were used before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins
[ https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153099#comment-14153099 ] Hive QA commented on HIVE-8298: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671935/HIVE-8298.patch {color:green}SUCCESS:{color} +1 6371 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1051/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1051/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1051/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12671935 Incorrect results for n-way join when join expressions are not in same order across joins - Key: HIVE-8298 URL: https://issues.apache.org/jira/browse/HIVE-8298 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Attachments: HIVE-8298.patch select * from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join srcpart c on a.hr = c.hr and a.key = c.key; is minimal query which reproduces it -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy
[ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153118#comment-14153118 ] Zhichun Wu commented on HIVE-8151: -- @ [~prasanth_j], after applying HIVE-8151.7.patch , the bug still exists, here is the testcase: {code} use test; set hive.exec.dynamic.partition.mode=nonstrict; set hive.exec.dynamic.partition=true; set hive.optimize.sort.dynamic.partition=true; drop table if exists src1; create table src1 ( key int, val string ); load data local inpath '../hive/examples/files/kv1.txt' overwrite into table src1; drop table if exists hive13_dp1; create table if not exists hive13_dp1 ( k1 int, k2 int ) PARTITIONED BY(`day` string COMMENT 'days') STORED AS ORC; insert overwrite table `hive13_dp1` partition(`day`) select key k1, count(val) k2, day `day` from src1 group by day, key; select * from hive13_dp1 limit 5; {code} Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy -- Key: HIVE-8151 URL: https://issues.apache.org/jira/browse/HIVE-8151 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Prasanth J Assignee: Prasanth J Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch HIVE-6455 added dynamic partition sort optimization. It added startGroup() method to FileSink operator to look for changes in reduce key for creating partition directories. This method however is not reliable as the key called with startGroup() is different from the key called with processOp(). startGroup() is called with newly changed key whereas processOp() is called with previously aggregated key. This will result in processOp() writing the last row of previous group as the first row of next group. This happens only when used with group by operator. 
The fix is to not rely on startGroup() and do the partition directory creation in processOp() itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
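The ordering problem described above can be sketched as follows. This is a minimal simulation, not Hive code: the classes `Row`, `GroupSignalSink` and `RowDrivenSink` are hypothetical stand-ins for the FileSink operator, and the event order (key change signalled before the aggregated row of the previous group arrives) is taken from the issue description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DynamicPartitionSinkDemo {
    /** Hypothetical aggregated row that carries its own partition value. */
    static final class Row {
        final String partition;
        final int count;
        Row(String partition, int count) {
            this.partition = partition;
            this.count = count;
        }
    }

    /** Picks the output directory in startGroup(), as in the behaviour the issue describes. */
    static final class GroupSignalSink {
        final Map<String, List<Integer>> dirs = new LinkedHashMap<>();
        private String currentDir;
        void startGroup(String newKey) { currentDir = newKey; }
        void processOp(Row row) {
            dirs.computeIfAbsent(currentDir, k -> new ArrayList<>()).add(row.count);
        }
    }

    /** Picks the output directory from the row itself inside processOp(). */
    static final class RowDrivenSink {
        final Map<String, List<Integer>> dirs = new LinkedHashMap<>();
        void startGroup(String newKey) { /* intentionally ignored */ }
        void processOp(Row row) {
            dirs.computeIfAbsent(row.partition, k -> new ArrayList<>()).add(row.count);
        }
    }

    static Map<String, List<Integer>> runGroupSignal() {
        GroupSignalSink sink = new GroupSignalSink();
        sink.startGroup("day=1");
        sink.startGroup("day=2");              // the key change is signalled first
        sink.processOp(new Row("day=1", 10));  // then the aggregate for day=1 arrives
        sink.processOp(new Row("day=2", 20));  // aggregate for day=2, flushed at end of input
        return sink.dirs;
    }

    static Map<String, List<Integer>> runRowDriven() {
        RowDrivenSink sink = new RowDrivenSink();
        sink.startGroup("day=1");
        sink.startGroup("day=2");
        sink.processOp(new Row("day=1", 10));
        sink.processOp(new Row("day=2", 20));
        return sink.dirs;
    }

    public static void main(String[] args) {
        System.out.println("startGroup-driven: " + runGroupSignal());
        System.out.println("processOp-driven:  " + runRowDriven());
    }
}
```

In this simulation the startGroup-driven sink puts the day=1 aggregate into the day=2 directory, while the row-driven sink keeps each row in its own partition, which is the direction the proposed fix takes.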
[jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results it very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153149#comment-14153149 ] Hive QA commented on HIVE-8196: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671936/HIVE-8196.6.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6371 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1052/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1052/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1052/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12671936 Joining on partition columns with fetch column stats enabled results it very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, HIVE-8196.4.patch, HIVE-8196.5.patch, HIVE-8196.6.patch To get the most out of dynamic partition pruning, joins should be on the partitioning columns, so that partitions of the fact table are pruned dynamically based on the qualifying column keys from the dimension table. However, this type of join negatively affects cardinality estimates when fetch column stats is enabled. Currently we don't have statistics for partition columns, so NDV is set to the row count, which negatively affects the estimated join selectivity. A workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used.
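To illustrate why an NDV equal to the row count shrinks the join estimate, here is a rough sketch. It assumes the textbook equi-join estimate |R| * |S| / max(NDV_R, NDV_S), which is not necessarily the exact formula Hive applies; the dimension row count, its NDV and the number of partitions are hypothetical values, and only the store_sales row count is taken from the plan quoted in this issue.

```java
class JoinCardinalitySketch {
    /**
     * Textbook equi-join estimate: |R| * |S| / max(NDV_R, NDV_S).
     * Assumption for illustration only; Hive's own estimator may differ.
     */
    static double estimateJoinRows(double leftRows, double leftNdv,
                                   double rightRows, double rightNdv) {
        return leftRows * rightRows / Math.max(leftNdv, rightNdv);
    }

    public static void main(String[] args) {
        double factRows = 550076554d; // store_sales row count from the plan in this issue
        double dimRows = 365d;        // hypothetical: date_dim rows left after d_year = 1998
        double dimNdv = 365d;         // hypothetical: one distinct d_date_sk per remaining row
        double partitions = 1800d;    // hypothetical number of store_sales partitions

        // NDV of the partition column defaulted to the row count
        System.out.println(estimateJoinRows(factRows, factRows, dimRows, dimNdv));
        // NDV taken from the number of partitions instead (the suggested workaround)
        System.out.println(estimateJoinRows(factRows, partitions, dimRows, dimNdv));
    }
}
```

Under these assumptions the first estimate collapses to roughly the dimension-side row count, while the second stays several orders of magnitude larger, which matches the direction of the problem reported here.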
In StatsUtils.getColStatisticsFromExpression is where count distincts gets set to row count {code} if (encd.getIsPartitionColOrVirtualCol()) { // vitual columns colType = encd.getTypeInfo().getTypeName(); countDistincts = numRows; oi = encd.getWritableObjectInspector(); {code} Query used to repro the issue : {code} set hive.stats.fetch.column.stats=true; set hive.tez.dynamic.partition.pruning=true; explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998; {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 - Map 2 (BROADCAST_EDGE) DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_sold_date_sk is not null (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} 1 {d_date_sk} {d_date} keys: 0 ss_sold_date_sk (type: int) 1 d_date_sk (type: int) outputColumnNames: _col22, _col26, _col28 input vertices: 1 Map 2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col22 = _col26) (type: boolean) Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col28 (type: string) outputColumnNames: _col0
[jira] [Updated] (HIVE-8300) Missing guava lib causes IllegalStateException when deserializing a task [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8300: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Patch committed to Spark branch. Missing guava lib causes IllegalStateException when deserializing a task [Spark Branch] --- Key: HIVE-8300 URL: https://issues.apache.org/jira/browse/HIVE-8300 Project: Hive Issue Type: Bug Components: Spark Environment: Spark-1.2.0-SNAPSHOT Reporter: Rui Li Fix For: spark-branch Attachments: HIVE-8300.1-spark.patch In spark-1.2, we have guava shaded in spark-assembly. And we only ship hive-exec to spark cluster. So spark executor won't have (original) guava in its class path. This can cause some problem when TaskRunner deserializes a task, and throws something like this: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, node13-1): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:164) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
java.lang.Thread.run(Thread.java:744) {code} We may have to verify this issue and ship guava to spark cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
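One way to check the classpath hypothesis on an executor is to probe for an unshaded Guava class. The sketch below is only an illustration: the helper `isLoadable` is hypothetical, and `com.google.common.base.Preconditions` is used simply because it is a well-known class in the original Guava package.

```java
class ClasspathProbe {
    /** Returns true if the named class can be loaded by the given loader, without initializing it. */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // Probe for a class from the original (unshaded) Guava package.
        System.out.println("guava visible: "
                + isLoadable("com.google.common.base.Preconditions", loader));
    }
}
```

If the probe reports false inside the executor JVM while it reports true on the client, that would be consistent with the missing-library explanation given in this issue.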
[jira] [Commented] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
[ https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153220#comment-14153220 ] Ted Yu commented on HIVE-8285: -- +1 Reference equality is used on boolean values in PartitionPruner#removeTruePredciates() -- Key: HIVE-8285 URL: https://issues.apache.org/jira/browse/HIVE-8285 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Ted Yu Priority: Minor Attachments: HIVE-8285.patch {code} if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) { {code} equals() should be used in the above comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
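A small standalone sketch of why reference equality on boxed booleans is fragile. The class and method names below are hypothetical and not part of Hive; the point is only that a Boolean instance that is not the cached Boolean.TRUE fails the == check but passes equals().

```java
class BooleanEqualityDemo {
    /** Reference comparison, in the style of the snippet quoted in the issue. */
    static boolean isTrueByReference(Object value) {
        return value == Boolean.TRUE;
    }

    /** Value comparison; also safe when value is null. */
    static boolean isTrueByEquals(Object value) {
        return Boolean.TRUE.equals(value);
    }

    /** Builds a Boolean that is equal to, but not the same object as, Boolean.TRUE. */
    @SuppressWarnings({"deprecation", "removal"})
    static Object distinctTrue() {
        return new Boolean(true);
    }

    public static void main(String[] args) {
        Object distinct = distinctTrue();
        System.out.println("by reference: " + isTrueByReference(distinct));
        System.out.println("by equals:    " + isTrueByEquals(distinct));
    }
}
```

Writing the check as Boolean.TRUE.equals(value) also avoids a NullPointerException when the constant value is null.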
[jira] [Commented] (HIVE-8282) Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
[ https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153232#comment-14153232 ] Ted Yu commented on HIVE-8282: -- lgtm nit: 'bucket map join' was mentioned in the log message @ line 321. It should appear in the new message as well. Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin() - Key: HIVE-8282 URL: https://issues.apache.org/jira/browse/HIVE-8282 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Ted Yu Priority: Minor Attachments: HIVE-8282.patch In convertJoinMapJoin(): {code} for (Operator<? extends OperatorDesc> parentOp : joinOp.getParentOperators()) { if (parentOp instanceof MuxOperator) { return null; } } {code} NPE would result if convertJoinMapJoin() returns null: {code} MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, bigTablePosition); MapJoinDesc joinDesc = mapJoinOp.getConf(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
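The guard that this issue calls for can be sketched as below. All types here are hypothetical stand-ins (`MapJoinOp` for MapJoinOperator, `convert` for convertJoinMapJoin()); the actual patch may handle the null case differently, for example by logging and falling back.

```java
class NullGuardDemo {
    /** Hypothetical stand-in for MapJoinOperator. */
    static final class MapJoinOp {
        String getConf() {
            return "map-join-conf";
        }
    }

    /** Hypothetical stand-in for convertJoinMapJoin(): returns null when conversion is not possible. */
    static MapJoinOp convert(boolean hasMuxParent) {
        return hasMuxParent ? null : new MapJoinOp();
    }

    /** Caller that checks for null before dereferencing the result. */
    static boolean convertBucketMapJoin(boolean hasMuxParent) {
        MapJoinOp op = convert(hasMuxParent);
        if (op == null) {
            // Conversion was refused; report failure instead of dereferencing null.
            return false;
        }
        return op.getConf() != null;
    }

    public static void main(String[] args) {
        System.out.println("without Mux parent: " + convertBucketMapJoin(false));
        System.out.println("with Mux parent:    " + convertBucketMapJoin(true));
    }
}
```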
[jira] [Updated] (HIVE-8278) Restoring a graph representation of SparkPlan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8278: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Patch committed to Spark branch. Thanks to Chao for the nice contribution. Restoring a graph representation of SparkPlan [Spark Branch] Key: HIVE-8278 URL: https://issues.apache.org/jira/browse/HIVE-8278 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8278.1-spark.patch, HIVE-8278.2-spark.patch, HIVE-8278.3-spark.patch HIVE-8249 greatly simplified the SparkPlan model and the SparkPlanGenerator logic. As a side effect, however, a visual representation of SparkPlan got lost. Such a representation is helpful for debugging and performance profiling. In addition, it would also be good to separate plan generation and plan execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8302) GroupByShuffler.java missing apache license header [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8302: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Patch committed to Spark branch. Thanks to Chao for the contribution. GroupByShuffler.java missing apache license header [Spark Branch] - Key: HIVE-8302 URL: https://issues.apache.org/jira/browse/HIVE-8302 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Fix For: spark-branch Attachments: HIVE-8302.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8263) CBO : TPC-DS Q64 is item is joined last with store_sales while it should be first as it is the most selective
[ https://issues.apache.org/jira/browse/HIVE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153237#comment-14153237 ] Hive QA commented on HIVE-8263: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671940/HIVE-8263.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6371 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_bigdata {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1053/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1053/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1053/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671940 CBO : TPC-DS Q64 is item is joined last with store_sales while it should be first as it is the most selective - Key: HIVE-8263 URL: https://issues.apache.org/jira/browse/HIVE-8263 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8263.1.patch, Q64_cbo_on_explain_log.txt.zip The plan for TPC-DS Q64 shows that item is joined last with store_sales, although store_sales x item is the most selective join in the plan. Interestingly, predicate push down is applied on item, yet item comes very late in the join order, which most likely means that the calculation of the join selectivity gave too high a number or that it was never considered.
This is a subset of the logical plan showing that item was joined very last {code} HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$4], _o__col4=[$5], _o__col5=[$6], _o__col6=[$7], _o__col7=[$8], _o__col8=[$9], _o__col9=[$10], _o__col10=[$11], _o__col11=[$12], _o__col12=[$13], _o__col13=[$14], _o__col14=[$15], _o__col15=[$16], _o__col16=[$22], _o__col17=[$23], _o__col18=[$24], _o__col19=[$20], _o__col20=[$21]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 990 HiveFilterRel(condition=[=($21, $13)]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 988 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], _o__col12=[$12], _o__col15=[$13], _o__col16=[$14], _o__col17=[$15], _o__col18=[$16], _o__col13=[$17], _o__col20=[$18], _o__col30=[$19], _o__col120=[$20], _o__col150=[$21], _o__col160=[$22], _o__col170=[$23], _o__col180=[$24]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 3571 HiveJoinRel(condition=[AND(AND(=($1, $17), =($2, $18)), =($3, $19))], joinType=[inner]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 3566 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], _o__col12=[$12], _o__col15=[$15], _o__col16=[$16], _o__col17=[$17], _o__col18=[$18]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 890 HiveFilterRel(condition=[=($12, 2000)]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 888 HiveAggregateRel(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}], agg#0=[count()], agg#1=[sum($15)], agg#2=[sum($16)], 
agg#3=[sum($17)]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 886 HiveProjectRel($f0=[$53], $f1=[$50], $f2=[$27], $f3=[$28], $f4=[$39], $f5=[$40], $f6=[$41], $f7=[$42], $f8=[$44], $f9=[$45], $f10=[$46], $f11=[$47], $f12=[$21], $f13=[$23], $f14=[$25], $f15=[$9], $f16=[$10], $f17=[$11]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 884 HiveProjectRel(ss_sold_date_sk=[$17], ss_item_sk=[$18], ss_customer_sk=[$19], ss_cdemo_sk=[$20], ss_hdemo_sk=[$21], ss_addr_sk=[$22], ss_store_sk=[$23], ss_promo_sk=[$24],
[jira] [Created] (HIVE-8307) null character in columns.comments schema property breaks jobconf.xml
Carl Laird created HIVE-8307: Summary: null character in columns.comments schema property breaks jobconf.xml Key: HIVE-8307 URL: https://issues.apache.org/jira/browse/HIVE-8307 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1, 0.13.0 Reporter: Carl Laird It would appear that the fix for https://issues.apache.org/jira/browse/HIVE-6681 is causing the null character to show up in job config xml files: I get the following when trying to insert into an elasticsearch backed table: [Fatal Error] :336:51: Character reference # 14/06/17 14:40:11 FATAL conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException; lineNumber: 336; columnNumber: 51; Character reference # Exception in thread main java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 336; columnNumber: 51; Character reference # at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1263) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1129) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1063) at org.apache.hadoop.conf.Configuration.get(Configuration.java:416) at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:604) at org.apache.hadoop.hive.conf.HiveConf.getBoolVar(HiveConf.java:1273) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:667) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: org.xml.sax.SAXParseException; lineNumber: 336; columnNumber: 51; Character reference # at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:251) at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:300) at 
javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1181) ... 11 more Execution failed with exit status: 1 Line 336 of jobconf.xml: <property><name>columns.comments</name><value>&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;&#0;</value></property> See https://groups.google.com/forum/#!msg/mongodb-user/lKbha0SzMP8/jvE8ZrJom4AJ for more discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
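The parser behaviour behind this failure can be reproduced outside Hadoop with the JDK's own DOM parser. The sketch below is an illustration only: the helper `parses` is hypothetical, and the property element is a shortened version of the jobconf.xml line quoted above. A character reference to code point 0 is not a legal XML 1.0 character, so the parser rejects the document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class NullCharXmlDemo {
    /** Returns true if the XML text parses, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Malformed input, e.g. an illegal character reference.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<property><name>columns.comments</name><value>a,b</value></property>";
        String bad = "<property><name>columns.comments</name><value>&#0;&#0;</value></property>";
        System.out.println("plain comments parse: " + parses(ok));
        System.out.println("null char refs parse: " + parses(bad));
    }
}
```

The default error handler may also print a "[Fatal Error]" line to stderr for the rejected document, similar to the first line of the log in this report.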
[jira] [Commented] (HIVE-6681) Describe table sometimes shows from deserializer for column comments
[ https://issues.apache.org/jira/browse/HIVE-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153278#comment-14153278 ] Carl Laird commented on HIVE-6681: -- I believe this fix has caused another issue: https://issues.apache.org/jira/browse/HIVE-8307 Describe table sometimes shows from deserializer for column comments -- Key: HIVE-6681 URL: https://issues.apache.org/jira/browse/HIVE-6681 Project: Hive Issue Type: Bug Components: Metastore, Serializers/Deserializers Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.13.0 Attachments: HIVE-6681.2.patch, HIVE-6681.3.patch, HIVE-6681.4.patch, HIVE-6681.5.patch, HIVE-6681.6.patch, HIVE-6681.7.patch, HIVE-6681.8.patch, HIVE-6681.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153279#comment-14153279 ] Alan Gates commented on HIVE-8231: -- Ok, I'm not sure if we're chasing the same bug or not. But I'll keep chasing the one I see and if we get lucky it will turn out to have the same root cause. Could you turn on debug level logging on your hive client and HiveServer2 instance, then do the insert and select that reproduces the error and attach both logs. That would help me have an idea where things may be going wrong. Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Critical Fix For: 0.14.0 Steps to show the bug : 1. create table {code} create table encaissement_1b_64m like encaissement_1b; {code} 2. check table {code} desc encaissement_1b_64m; dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m; {code} everything is ok: {noformat} 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m; +++--+--+ | col_name | data_type | comment | +++--+--+ | id | int| | | idmagasin | int| | | zibzin | string | | | cheque | int| | | montant| double | | | date | timestamp | | | col_6 | string | | | col_7 | string | | | col_8 | string | | +++--+--+ 9 rows selected (0.158 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ +-+--+ No rows selected (0.01 seconds) {noformat} 3. Insert values into the new table {noformat} insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','',''); {noformat} 4. Check {noformat} 0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m; +-+--+ | id | +-+--+ +-+--+ No rows selected (0.091 seconds) {noformat} There are already a pb. I don't see the inserted row. 5. 
When I'm checking HDFS directory, I see {{delta_421_421}} folder {noformat} 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 | +-+--+ 2 rows selected (0.014 seconds) {noformat} 6. Doing a major compaction solves the bug {noformat} 0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major'; No rows affected (0.046 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; ++--+ | DFS Output | ++--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421 |
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153281#comment-14153281 ]

Alan Gates commented on HIVE-8231:
--

I mean JDBC client, not hive client.

Error when insert into empty table with ACID

Key: HIVE-8231
URL: https://issues.apache.org/jira/browse/HIVE-8231
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Damien Carol
Assignee: Damien Carol
Priority: Critical
Fix For: 0.14.0

Steps to show the bug:

1. Create the table
{code}
create table encaissement_1b_64m like encaissement_1b;
{code}

2. Check the table
{code}
desc encaissement_1b_64m;
dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
{code}
Everything is OK:
{noformat}
0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m;
+------------+------------+----------+--+
| col_name   | data_type  | comment  |
+------------+------------+----------+--+
| id         | int        |          |
| idmagasin  | int        |          |
| zibzin     | string     |          |
| cheque     | int        |          |
| montant    | double     |          |
| date       | timestamp  |          |
| col_6      | string     |          |
| col_7      | string     |          |
| col_8      | string     |          |
+------------+------------+----------+--+
9 rows selected (0.158 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+-------------+--+
| DFS Output  |
+-------------+--+
+-------------+--+
No rows selected (0.01 seconds)
{noformat}

3. Insert values into the new table
{noformat}
insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','','');
{noformat}

4. Check
{noformat}
0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m;
+-----+--+
| id  |
+-----+--+
+-----+--+
No rows selected (0.091 seconds)
{noformat}
There is already a problem: the inserted row is not visible.

5. When I check the HDFS directory, I see the {{delta_421_421}} folder
{noformat}
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+----------------------------------------------------------------------------------------------------------------------------------+--+
| DFS Output                                                                                                                       |
+----------------------------------------------------------------------------------------------------------------------------------+--+
| Found 1 items                                                                                                                    |
| drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421  |
+----------------------------------------------------------------------------------------------------------------------------------+--+
2 rows selected (0.014 seconds)
{noformat}

6. Doing a major compaction solves the bug
{noformat}
0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major';
No rows affected (0.046 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+-----------------------------------------------------------------------------------------------------------------------------+--+
| DFS Output                                                                                                                  |
+-----------------------------------------------------------------------------------------------------------------------------+--+
| Found 1 items                                                                                                               |
| drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421  |
+-----------------------------------------------------------------------------------------------------------------------------+--+
2 rows selected (0.02 seconds)
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6905) Implement Auto increment, primary-foreign Key, not null constraints and default value in Hive Table columns
[ https://issues.apache.org/jira/browse/HIVE-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153297#comment-14153297 ] Greg W commented on HIVE-6905: -- Now that HIVE-5317 is resolved, is it still conceivable that this feature (particularly the auto-increment component) will be available in Hive 0.14? Implement Auto increment, primary-foreign Key, not null constraints and default value in Hive Table columns Key: HIVE-6905 URL: https://issues.apache.org/jira/browse/HIVE-6905 Project: Hive Issue Type: New Feature Components: Database/Schema Affects Versions: 0.14.0 Reporter: Pardeep Kumar For Hive to replace a modern RDBMS-based data warehouse, it must support keys, constraints, auto-increment values, surrogate keys, NOT NULL columns, and similar features. Many customers do not move their EDW to Hive because these features have been challenging to maintain in Hive. This must be implemented once https://issues.apache.org/jira/browse/HIVE-5317 (Updates, Deletes and Inserts) is done in Hive. This should be the next stop for Hive enhancement, taking it closer to wide mainstream adoption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins
[ https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8298: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. [~vikram.dixit] It would be good to have this in 0.14 as well. Incorrect results for n-way join when join expressions are not in same order across joins - Key: HIVE-8298 URL: https://issues.apache.org/jira/browse/HIVE-8298 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.15.0 Attachments: HIVE-8298.patch select * from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join srcpart c on a.hr = c.hr and a.key = c.key; is a minimal query that reproduces it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
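The reproducing query in HIVE-8298 lists the same two equalities in both ON clauses, only in a different order. As a purely illustrative sketch (an assumption about how such a mismatch can surface, not a description of Hive's join code or of the committed patch), the Python below matches rows on key tuples built in each input's own column order. The function names and the sample row are hypothetical.

```python
def key_tuple(row, columns):
    """Build a positional join key from the given column order."""
    return tuple(row[column] for column in columns)


def three_way_join(a_rows, b_rows, c_rows, a_cols, b_cols, c_cols):
    """Return (a, b, c) triples whose positional key tuples are all equal.

    Hypothetical helper: it compares keys position by position, so the
    column order passed for each input matters.
    """
    return [
        (a, b, c)
        for a in a_rows
        for b in b_rows
        for c in c_rows
        if key_tuple(a, a_cols) == key_tuple(b, b_cols) == key_tuple(c, c_cols)
    ]


# One sample row standing in for the srcpart table (values are made up).
SAMPLE_ROWS = [{"key": "238", "hr": "11"}]

# Keys taken in the order each ON clause writes them: (key, hr) vs (hr, key).
misaligned = three_way_join(
    SAMPLE_ROWS, SAMPLE_ROWS, SAMPLE_ROWS,
    ("key", "hr"), ("key", "hr"), ("hr", "key"),
)

# Keys aligned to one common order before comparison.
aligned = three_way_join(
    SAMPLE_ROWS, SAMPLE_ROWS, SAMPLE_ROWS,
    ("key", "hr"), ("key", "hr"), ("key", "hr"),
)
```

With aligned key columns the sample row joins with itself once; with `c` keyed as `(hr, key)` the positional comparison finds no match, even though the two ON clauses are logically equivalent.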
[jira] [Commented] (HIVE-6905) Implement Auto increment, primary-foreign Key, not null constraints and default value in Hive Table columns
[ https://issues.apache.org/jira/browse/HIVE-6905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153321#comment-14153321 ] Damien Carol commented on HIVE-6905: [~grw] Wait a few days; I will create a new JIRA with a Sequence Generator after the 0.14 release. My implementation is less intrusive than the one in my previous comment. In any case, this feature must wait for the 0.14 release because a few bugs are still present in the ACID code. Implement Auto increment, primary-foreign Key, not null constraints and default value in Hive Table columns Key: HIVE-6905 URL: https://issues.apache.org/jira/browse/HIVE-6905 Project: Hive Issue Type: New Feature Components: Database/Schema Affects Versions: 0.14.0 Reporter: Pardeep Kumar For Hive to replace a modern RDBMS-based data warehouse, it must support keys, constraints, auto-increment values, surrogate keys, NOT NULL columns, and similar features. Many customers do not move their EDW to Hive because these features have been challenging to maintain in Hive. This must be implemented once https://issues.apache.org/jira/browse/HIVE-5317 (Updates, Deletes and Inserts) is done in Hive. This should be the next stop for Hive enhancement, taking it closer to wide mainstream adoption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Fix wrong lower case table names in Postgres Metastore back end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153326#comment-14153326 ] Brock Noland commented on HIVE-7689: [~damien.carol] Do you mean that without the double quotes, the tables are created as lowercase and thus do not work? Fix wrong lower case table names in Postgres Metastore back end --- Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Blocker Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7689.9.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch The current 0.14 patch creates tables with lower-case names. This patch fixes the wrong lower-case table names in the Postgres Metastore back end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
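The question about double quotes in HIVE-7689 comes down to PostgreSQL's identifier rule: an unquoted identifier is folded to lower case, while a double-quoted one keeps its case. A minimal Python sketch of that rule follows; the helper names are hypothetical and `TBLS` is only an example table name, neither is taken from the patch.

```python
def fold_identifier(identifier: str) -> str:
    """Return the name PostgreSQL would store for an identifier as written in SQL.

    Quoted identifiers keep their case; unquoted ones are folded to lower case.
    Hypothetical helper for illustration only.
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        # Strip the surrounding quotes and un-escape doubled quotes.
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


def quote_identifier(name: str) -> str:
    """Wrap a name in double quotes so that its case is preserved."""
    return '"' + name.replace('"', '""') + '"'
```

Under this rule, a script that creates `TBLS` without quotes ends up with a table named `tbls`, so a client that later asks for the quoted name `"TBLS"` would not find it.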
[jira] [Commented] (HIVE-4224) Upgrade to Thrift 1.0 when available
[ https://issues.apache.org/jira/browse/HIVE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153329#comment-14153329 ] Brock Noland commented on HIVE-4224: [~nemon] that'd be great. Can you create a separate JIRA to do that upgrade? Upgrade to Thrift 1.0 when available Key: HIVE-4224 URL: https://issues.apache.org/jira/browse/HIVE-4224 Project: Hive Issue Type: Sub-task Components: HiveServer2, Metastore, Server Infrastructure Affects Versions: 0.11.0 Reporter: Brock Noland Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8270) JDBC uber jar is missing some classes required in secure setup.
[ https://issues.apache.org/jira/browse/HIVE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153344#comment-14153344 ] Vikram Dixit K commented on HIVE-8270: -- +1 for 0.14. JDBC uber jar is missing some classes required in secure setup. --- Key: HIVE-8270 URL: https://issues.apache.org/jira/browse/HIVE-8270 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Labels: TODOC14 Fix For: 0.15.0 Attachments: HIVE-8270.1.patch JDBC uber jar is missing some required classes for a secure setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins
[ https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153345#comment-14153345 ] Vikram Dixit K commented on HIVE-8298: -- +1 for 0.14. Incorrect results for n-way join when join expressions are not in same order across joins - Key: HIVE-8298 URL: https://issues.apache.org/jira/browse/HIVE-8298 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.15.0 Attachments: HIVE-8298.patch select * from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join srcpart c on a.hr = c.hr and a.key = c.key; is a minimal query that reproduces it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153346#comment-14153346 ] Hive QA commented on HIVE-8290: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671955/HIVE-8290.2.patch {color:green}SUCCESS:{color} +1 6380 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1054/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1054/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1054/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12671955 With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.2.patch, HIVE-8290.patch Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8270) JDBC uber jar is missing some classes required in secure setup.
[ https://issues.apache.org/jira/browse/HIVE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8270: --- Fix Version/s: (was: 0.15.0) 0.14.0 JDBC uber jar is missing some classes required in secure setup. --- Key: HIVE-8270 URL: https://issues.apache.org/jira/browse/HIVE-8270 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8270.1.patch JDBC uber jar is missing some required classes for a secure setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins
[ https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8298: --- Fix Version/s: (was: 0.15.0) 0.14.0 Incorrect results for n-way join when join expressions are not in same order across joins - Key: HIVE-8298 URL: https://issues.apache.org/jira/browse/HIVE-8298 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8298.patch select * from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join srcpart c on a.hr = c.hr and a.key = c.key; is a minimal query that reproduces it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8270) JDBC uber jar is missing some classes required in secure setup.
[ https://issues.apache.org/jira/browse/HIVE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153358#comment-14153358 ] Ashutosh Chauhan commented on HIVE-8270: Committed to 0.14 JDBC uber jar is missing some classes required in secure setup. --- Key: HIVE-8270 URL: https://issues.apache.org/jira/browse/HIVE-8270 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8270.1.patch JDBC uber jar is missing some required classes for a secure setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8298) Incorrect results for n-way join when join expressions are not in same order across joins
[ https://issues.apache.org/jira/browse/HIVE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153354#comment-14153354 ] Ashutosh Chauhan commented on HIVE-8298: Committed to 0.14 Incorrect results for n-way join when join expressions are not in same order across joins - Key: HIVE-8298 URL: https://issues.apache.org/jira/browse/HIVE-8298 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8298.patch select * from srcpart a join srcpart b on a.key = b.key and a.hr = b.hr join srcpart c on a.hr = c.hr and a.key = c.key; is a minimal query that reproduces it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-8180: --- Attachment: HIVE-8180.3-spark.patch Removed trailing spaces. Update SparkReduceRecordHandler for processing the vectors [spark branch] - Key: HIVE-8180 URL: https://issues.apache.org/jira/browse/HIVE-8180 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Labels: Spark-M1 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, HIVE-8180.2-spark.patch, HIVE-8180.3-spark.patch Update SparkReduceRecordHandler for processing the vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153437#comment-14153437 ] Xuefu Zhang commented on HIVE-8180: --- +1 Update SparkReduceRecordHandler for processing the vectors [spark branch] - Key: HIVE-8180 URL: https://issues.apache.org/jira/browse/HIVE-8180 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Labels: Spark-M1 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, HIVE-8180.2-spark.patch, HIVE-8180.3-spark.patch Update SparkReduceRecordHandler for processing the vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8263) CBO : TPC-DS Q64 is item is joined last with store_sales while it should be first as it is the most selective
[ https://issues.apache.org/jira/browse/HIVE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153446#comment-14153446 ] Harish Butani commented on HIVE-8263: - Failure in 'groupby_bigdata' is not related to this patch. CBO : TPC-DS Q64 is item is joined last with store_sales while it should be first as it is the most selective - Key: HIVE-8263 URL: https://issues.apache.org/jira/browse/HIVE-8263 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8263.1.patch, Q64_cbo_on_explain_log.txt.zip The plan for TPC-DS Q64 shows that item is joined last with store_sales, although store_sales x item is the most selective join in the plan. Interestingly, predicate pushdown is applied on item, but item comes very late in the join order, which most likely means that the calculation of the join selectivity gave too high a number or it was never considered. This is a subset of the logical plan showing that item was joined last: {code} HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$4], _o__col4=[$5], _o__col5=[$6], _o__col6=[$7], _o__col7=[$8], _o__col8=[$9], _o__col9=[$10], _o__col10=[$11], _o__col11=[$12], _o__col12=[$13], _o__col13=[$14], _o__col14=[$15], _o__col15=[$16], _o__col16=[$22], _o__col17=[$23], _o__col18=[$24], _o__col19=[$20], _o__col20=[$21]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 990 HiveFilterRel(condition=[=($21, $13)]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 988 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], _o__col12=[$12], _o__col15=[$13], _o__col16=[$14], _o__col17=[$15], _o__col18=[$16], _o__col13=[$17], _o__col20=[$18], _o__col30=[$19], _o__col120=[$20], 
_o__col150=[$21], _o__col160=[$22], _o__col170=[$23], _o__col180=[$24]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 3571 HiveJoinRel(condition=[AND(AND(=($1, $17), =($2, $18)), =($3, $19))], joinType=[inner]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 3566 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], _o__col12=[$12], _o__col15=[$15], _o__col16=[$16], _o__col17=[$17], _o__col18=[$18]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 890 HiveFilterRel(condition=[=($12, 2000)]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 888 HiveAggregateRel(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}], agg#0=[count()], agg#1=[sum($15)], agg#2=[sum($16)], agg#3=[sum($17)]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 886 HiveProjectRel($f0=[$53], $f1=[$50], $f2=[$27], $f3=[$28], $f4=[$39], $f5=[$40], $f6=[$41], $f7=[$42], $f8=[$44], $f9=[$45], $f10=[$46], $f11=[$47], $f12=[$21], $f13=[$23], $f14=[$25], $f15=[$9], $f16=[$10], $f17=[$11]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 884 HiveProjectRel(ss_sold_date_sk=[$17], ss_item_sk=[$18], ss_customer_sk=[$19], ss_cdemo_sk=[$20], ss_hdemo_sk=[$21], ss_addr_sk=[$22], ss_store_sk=[$23], ss_promo_sk=[$24], ss_ticket_number=[$25], ss_wholesale_cost=[$26], ss_list_price=[$27], ss_coupon_amt=[$28], sr_item_sk=[$29], sr_ticket_number=[$30], c_customer_sk=[$31], c_current_cdemo_sk=[$32], c_current_hdemo_sk=[$33], c_current_addr_sk=[$34], c_first_shipto_date_sk=[$35], c_first_sales_date_sk=[$36], d_date_sk=[$37], d_year=[$38], d_date_sk0=[$39], d_year0=[$40], d_date_sk1=[$41], d_year1=[$42], s_store_sk=[$43], s_store_name=[$44], 
s_zip=[$45], cd_demo_sk=[$46], cd_marital_status=[$47], cd_demo_sk0=[$48], cd_marital_status0=[$49], p_promo_sk=[$0], hd_demo_sk=[$15], hd_income_band_sk=[$16], hd_demo_sk0=[$13], hd_income_band_sk0=[$14], ca_address_sk=[$6], ca_street_number=[$7], ca_street_name=[$8], ca_city=[$9], ca_zip=[$10], ca_address_sk0=[$1], ca_street_number0=[$2], ca_street_name0=[$3], ca_city0=[$4], ca_zip0=[$5], ib_income_band_sk=[$12], ib_income_band_sk0=[$11], i_item_sk=[$51], i_current_price=[$52], i_color=[$53], i_product_name=[$54], _o__col0=[$50]):
[jira] [Updated] (HIVE-8250) Truncating table doesnt invalidate stats
[ https://issues.apache.org/jira/browse/HIVE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8250: --- Status: Open (was: Patch Available) Truncating table doesnt invalidate stats Key: HIVE-8250 URL: https://issues.apache.org/jira/browse/HIVE-8250 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.1, 0.13.0 Reporter: Jagruti Varia Assignee: Ashutosh Chauhan Attachments: HIVE-8250.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8250) Truncating table doesnt invalidate stats
[ https://issues.apache.org/jira/browse/HIVE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8250: --- Attachment: HIVE-8250.1.patch Updated .q.out for failed test. Truncating table doesnt invalidate stats Key: HIVE-8250 URL: https://issues.apache.org/jira/browse/HIVE-8250 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0, 0.13.1 Reporter: Jagruti Varia Assignee: Ashutosh Chauhan Attachments: HIVE-8250.1.patch, HIVE-8250.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8250) Truncating table doesnt invalidate stats
[ https://issues.apache.org/jira/browse/HIVE-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8250: --- Status: Patch Available (was: Open) Truncating table doesnt invalidate stats Key: HIVE-8250 URL: https://issues.apache.org/jira/browse/HIVE-8250 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.1, 0.13.0 Reporter: Jagruti Varia Assignee: Ashutosh Chauhan Attachments: HIVE-8250.1.patch, HIVE-8250.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 26178: Truncating table doesnt invalidate stats
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26178/ --- Review request for hive and Prasanth_J. Bugs: HIVE-8250 https://issues.apache.org/jira/browse/HIVE-8250 Repository: hive-git Description --- Truncating table doesnt invalidate stats Diffs - metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java c95473c ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java dc00d66 ql/src/test/results/clientpositive/alter_numbuckets_partitioned_table_h23.q.out 5047b23 Diff: https://reviews.apache.org/r/26178/diff/ Testing --- Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153450#comment-14153450 ] Eugene Koifman commented on HIVE-8290: -- There is an unused import, hive_metastoreConstants. Also, could you add a comment on ACID_TABLE_PROPERTY, basically the equivalent of the Description of this Jira ticket? This is minor, but would it make sense to move the constant to AcidInputFormat or some other more directly ACID-related class? Otherwise, LGTM +1. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.2.patch, HIVE-8290.patch Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
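The "specific mark" requested in HIVE-8290 can be pictured as a table property that must be set explicitly before a table is treated as transactional. The Python sketch below is only an illustration of that idea: the thread names the constant ACID_TABLE_PROPERTY but not its value, so the key `transactional` used here is an assumption, and both functions are hypothetical rather than Hive code.

```python
# Assumed property key; the thread names only the constant ACID_TABLE_PROPERTY.
ACID_TABLE_PROPERTY = "transactional"


def is_marked_transactional(table_properties: dict) -> bool:
    """True only when the table carries an explicit transactional mark."""
    value = table_properties.get(ACID_TABLE_PROPERTY)
    return value is not None and value.strip().lower() == "true"


def needs_acid_handling(txn_manager: str, file_format: str, table_properties: dict) -> bool:
    """Decide whether a table should be handled as an ACID table.

    The behaviour described in the issue keys off the transaction manager and
    the ORC format alone; this sketch additionally requires the explicit mark.
    """
    return (
        txn_manager == "DbTxnManager"
        and file_format == "ORC"
        and is_marked_transactional(table_properties)
    )
```

With such a check, an ORC table without the mark would stay non-transactional (and would not need buckets) even when DbTxnManager is configured.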
[jira] [Updated] (HIVE-8261) CBO : Predicate pushdown is removed by Optiq
[ https://issues.apache.org/jira/browse/HIVE-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-8261: Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. CBO : Predicate pushdown is removed by Optiq - Key: HIVE-8261 URL: https://issues.apache.org/jira/browse/HIVE-8261 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0, 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8261.1.patch The plan for TPC-DS Q64 wasn't optimal. Upon looking at the logical plan, I realized that predicate pushdown is not applied on date_dim d1. Interestingly, before Optiq we have the predicate pushed: {code} HiveFilterRel(condition=[=($5, $1)]) HiveJoinRel(condition=[=($3, $6)], joinType=[inner]) HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$1]) HiveFilterRel(condition=[=($0, 2000)]) HiveAggregateRel(group=[{0, 1}], agg#0=[count()], agg#1=[sum($2)]) HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2]) HiveJoinRel(condition=[=($1, $8)], joinType=[inner]) HiveJoinRel(condition=[=($1, $5)], joinType=[inner]) HiveJoinRel(condition=[=($0, $3)], joinType=[inner]) HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]]) HiveProjectRel(d_date_sk=[$0], d_year=[$6]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]]) HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))]) HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]]) HiveProjectRel(_o__col0=[$0]) HiveAggregateRel(group=[{0}]) HiveProjectRel($f0=[$0]) HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner]) HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17]) 
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]]) HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]]) HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col3=[$1]) HiveFilterRel(condition=[=($0, +(2000, 1))]) HiveAggregateRel(group=[{0, 1}], agg#0=[count()]) HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2]) HiveJoinRel(condition=[=($1, $8)], joinType=[inner]) HiveJoinRel(condition=[=($1, $5)], joinType=[inner]) HiveJoinRel(condition=[=($0, $3)], joinType=[inner]) HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]]) HiveProjectRel(d_date_sk=[$0], d_year=[$6]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]]) HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))]) HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]]) HiveProjectRel(_o__col0=[$0]) HiveAggregateRel(group=[{0}]) HiveProjectRel($f0=[$0]) HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner]) HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]]) HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]]) {code} While after Optiq the filter on date_dim gets pulled up the plan {code} HiveFilterRel(condition=[=($5, $1)]): rowcount = 1.0, cumulative cost = {5.50188454E8 rows, 0.0 cpu, 0.0 io}, id = 6895 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col00=[$4], _o__col10=[$5], _o__col30=[$6]): rowcount =
[jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy
[ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153456#comment-14153456 ] Hive QA commented on HIVE-8151: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671975/HIVE-8151.7.patch {color:green}SUCCESS:{color} +1 6374 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1055/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1055/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1055/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12671975 Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy -- Key: HIVE-8151 URL: https://issues.apache.org/jira/browse/HIVE-8151 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Prasanth J Assignee: Prasanth J Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch HIVE-6455 added dynamic partition sort optimization. It added startGroup() method to FileSink operator to look for changes in reduce key for creating partition directories. This method however is not reliable as the key called with startGroup() is different from the key called with processOp(). startGroup() is called with newly changed key whereas processOp() is called with previously aggregated key. This will result in processOp() writing the last row of previous group as the first row of next group. 
This happens only when used with group by operator. The fix is to not rely on startGroup() and do the partition directory creation in processOp() itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
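The ordering problem described in HIVE-8151 can be reproduced with a small simulation. The Python sketch below is an editorial model built from the description only: `group_by_events` emits a `start_group` event with the new key before the aggregated row of the previous key arrives as `process_op`, and the two sink functions stand in for the FileSink behaviour before and after the fix. All names and the sample data are hypothetical; this is not the Hive operator code.

```python
from collections import defaultdict


def group_by_events(sorted_rows):
    """Yield (event, payload) pairs for rows sorted by partition key.

    On a key change, start_group(new key) is emitted first and only then the
    aggregated row of the previous key, mirroring the order in the issue.
    """
    current_key = None
    total = 0
    for key, value in sorted_rows:
        if key != current_key:
            yield ("start_group", key)
            if current_key is not None:
                yield ("process_op", (current_key, total))
            current_key, total = key, 0
        total += value
    if current_key is not None:
        yield ("process_op", (current_key, total))


def sink_using_start_group(events):
    """Sink that switches the partition directory when start_group is seen."""
    partitions = defaultdict(list)
    active = None
    for event, payload in events:
        if event == "start_group":
            active = payload
        else:
            partitions[active].append(payload)
    return dict(partitions)


def sink_using_row_key(events):
    """Sink that picks the partition from the row itself in process_op."""
    partitions = defaultdict(list)
    for event, payload in events:
        if event == "process_op":
            key, _total = payload
            partitions[key].append(payload)
    return dict(partitions)
```

Feeding both sinks the events for rows such as `[("p1", 1), ("p1", 2), ("p2", 5), ("p3", 7)]` shows the difference: the first sink files each aggregated row under the following group's partition, while the second keeps every row under its own key.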
[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8182: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you for the contribution Sergio! I have committed this to trunk! beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As the title indicates, when executing a multi-line query with trailing spaces, beeline reports a syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If the query is put on a single line, beeline executes it successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
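The ParseException in HIVE-8182 complains about an extraneous ';', which suggests that the terminator was not recognised once spaces followed it. The Python sketch below shows one way to make the terminator check insensitive to trailing whitespace. It is an assumption about the general technique, not the committed beeline patch, and `assemble_statement` is a hypothetical helper.

```python
def assemble_statement(lines):
    """Join the lines of a multi-line query and drop the terminating ';'.

    Trailing whitespace is stripped before the terminator check, so a last
    line such as 'from t;   ' is treated the same way as 'from t;'.
    Returns None while the statement is not yet terminated.
    """
    text = "\n".join(lines).rstrip()
    if not text.endswith(";"):
        return None
    # Remove the ';' itself and any whitespace that preceded it.
    return text[:-1].rstrip()
```

A caller would keep appending input lines until the function returns a statement, then send that statement (without the ';') for compilation.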
[jira] [Updated] (HIVE-6148) Support arbitrary structs stored in HBase
[ https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6148: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you very much Swarnim! I have committed this to trunk! Support arbitrary structs stored in HBase - Key: HIVE-6148 URL: https://issues.apache.org/jira/browse/HIVE-6148 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Fix For: 0.14.0 Attachments: HIVE-6148.1.patch.txt, HIVE-6148.2.patch.txt, HIVE-6148.3.patch.txt We should add support to be able to query arbitrary structs stored in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8182: --- Fix Version/s: (was: 0.14.0) 0.15.0 beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.15.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As the title indicates, when executing a multi-line query with trailing spaces, beeline reports a syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If the query is put on a single line, beeline executes it successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6148) Support arbitrary structs stored in HBase
[ https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6148: --- Fix Version/s: (was: 0.14.0) 0.15.0 Support arbitrary structs stored in HBase - Key: HIVE-6148 URL: https://issues.apache.org/jira/browse/HIVE-6148 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Fix For: 0.15.0 Attachments: HIVE-6148.1.patch.txt, HIVE-6148.2.patch.txt, HIVE-6148.3.patch.txt We should add support to be able to query arbitrary structs stored in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8262) Create CacheTran that transforms the input RDD by caching it [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8262: --- Attachment: HIVE-8262.1-spark.patch Create CacheTran that transforms the input RDD by caching it [Spark Branch] --- Key: HIVE-8262 URL: https://issues.apache.org/jira/browse/HIVE-8262 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Attachments: HIVE-8262.1-spark.patch In a few cases we need to cache an RDD to avoid recomputing it, for better performance. However, caching a map input RDD is different from caching a regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the input to MapWork, is to cache the result RDD that is transformed from the original Hadoop RDD by applying a map function in which the key, value pairs are copied. Caching intermediate RDDs, such as one from a shuffle, just requires calling .cache(). This task is to create a CacheTran to capture this, which can be plugged into the Spark plan when caching is desirable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
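A minimal plain-Java sketch of why the copy step in the description matters. It assumes, as is typical for Hadoop record readers, that the reader hands back the same mutable object for every record. The names ReusingReader and CopyBeforeCache are hypothetical, and nothing here uses the Spark or Hive API; it only illustrates the aliasing problem that copying avoids.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hadoop-style reader that reuses one mutable
// buffer for every record it returns (an assumption made for illustration).
class ReusingReader {
    private final String[] data;
    private final StringBuilder reused = new StringBuilder();
    private int pos = 0;

    ReusingReader(String... data) {
        this.data = data;
    }

    boolean hasNext() {
        return pos < data.length;
    }

    // Returns the SAME StringBuilder instance each time, with new contents.
    StringBuilder next() {
        reused.setLength(0);
        reused.append(data[pos++]);
        return reused;
    }
}

class CopyBeforeCache {
    // Caching the returned references directly: every cached element
    // aliases the single reused buffer.
    static List<String> cacheWithoutCopy(ReusingReader reader) {
        List<StringBuilder> cached = new ArrayList<>();
        while (reader.hasNext()) {
            cached.add(reader.next());
        }
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : cached) {
            out.add(sb.toString());
        }
        return out;
    }

    // Applying a "map" step that copies each record before caching it.
    static List<String> cacheWithCopy(ReusingReader reader) {
        List<String> cached = new ArrayList<>();
        while (reader.hasNext()) {
            cached.add(reader.next().toString()); // toString() makes a copy
        }
        return cached;
    }

    public static void main(String[] args) {
        System.out.println(cacheWithoutCopy(new ReusingReader("a", "b", "c")));
        System.out.println(cacheWithCopy(new ReusingReader("a", "b", "c")));
    }
}
```

Without the copy, the cached list ends up holding three views of the last record; with the copy, each record is preserved.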
[jira] [Updated] (HIVE-8262) Create CacheTran that transforms the input RDD by caching it [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8262: --- Status: Patch Available (was: Open) Create CacheTran that transforms the input RDD by caching it [Spark Branch] --- Key: HIVE-8262 URL: https://issues.apache.org/jira/browse/HIVE-8262 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Attachments: HIVE-8262.1-spark.patch In a few cases we need to cache an RDD to avoid recomputing it, for better performance. However, caching a map input RDD is different from caching a regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the input to MapWork, is to cache the result RDD that is transformed from the original Hadoop RDD by applying a map function in which the key, value pairs are copied. Caching intermediate RDDs, such as one from a shuffle, just requires calling .cache(). This task is to create a CacheTran to capture this, which can be plugged into the Spark plan when caching is desirable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153473#comment-14153473 ] Alan Gates commented on HIVE-8290: -- bq. This is minor, but would it make sense to move the constant to AcidInputFormat or some other more directly ACID related class? I didn't see a general place to put table parameter keys. According to the Hive jedi master (Ashutosh), there is no central place for them. I agree it makes sense to collect ACID related ones into one place. In addition to ACID_TABLE_PROPERTY there's NO_AUTO_COMPACT in Initiator. I'll file a separate ticket to collect those together, and then the patch to do that will be trivial. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.2.patch, HIVE-8290.patch Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8308) Acid related table properties should be defined in one place
Alan Gates created HIVE-8308: Summary: Acid related table properties should be defined in one place Key: HIVE-8308 URL: https://issues.apache.org/jira/browse/HIVE-8308 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Minor Currently SemanticAnalyzer.ACID_TABLE_PROPERTY and Initiator.NO_AUTO_COMPACT are defined in the classes that use them. Since these are both potential table properties and they both are ACID related it makes sense to collect them together. There's no central place for Table properties at this point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
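One way the consolidation proposed above could look, as a rough sketch only. The class name AcidTableProperties, the helper methods, and both string values are hypothetical placeholders; they are not taken from the Hive source or from any patch on this ticket.

```java
import java.util.Map;

// Hypothetical holder class. The name and the string values below are
// illustrative placeholders, not Hive's actual table property keys.
final class AcidTableProperties {
    static final String ACID_TABLE_PROPERTY = "example.acid.table";
    static final String NO_AUTO_COMPACT = "example.no.auto.compact";

    private AcidTableProperties() {
        // constants and static helpers only
    }

    // True when the table parameters mark the table as transactional.
    static boolean isAcidTable(Map<String, String> tableParams) {
        return "true".equalsIgnoreCase(tableParams.get(ACID_TABLE_PROPERTY));
    }

    // True when the table parameters switch off automatic compaction.
    static boolean isAutoCompactDisabled(Map<String, String> tableParams) {
        return "true".equalsIgnoreCase(tableParams.get(NO_AUTO_COMPACT));
    }
}
```

Callers such as the semantic analyzer and the compaction initiator would then refer to one class instead of each defining its own key.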
[jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results it very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153477#comment-14153477 ] Prasanth J commented on HIVE-8196: -- The last test failures are unrelated. Joining on partition columns with fetch column stats enabled results it very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, HIVE-8196.4.patch, HIVE-8196.5.patch, HIVE-8196.6.patch To make the best of dynamic partition pruning, joins should be on the partitioning columns, which results in dynamically pruning the partitions from the fact table based on the qualifying column keys from the dimension table. This type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns and as a result NDV is set to row count, which negatively affects the estimated join selectivity. The workaround is to capture statistics for partition columns or use the number of partitions in case dynamic partitioning is used.
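To make the effect of the NDV choice concrete, here is a small sketch using the common textbook join estimate, rows(A) * rows(B) / max(NDV(A.key), NDV(B.key)). Whether Hive's optimizer applies exactly this formula is an assumption, the class JoinCardinality is hypothetical, and the partition count of 1824 is an invented figure; the two row counts are the ones shown in the plan for store_sales and date_dim.

```java
// Hypothetical helper. The formula is the common textbook join estimate,
// assumed here for illustration rather than taken from Hive's StatsUtils.
class JoinCardinality {
    static double estimate(long leftRows, long leftNdv, long rightRows, long rightNdv) {
        return (double) leftRows * rightRows / Math.max(leftNdv, rightNdv);
    }

    public static void main(String[] args) {
        long factRows = 550076554L; // store_sales row count from the plan
        long dimRows = 73049L;      // date_dim row count from the plan
        long dimNdv = 73049L;       // assumes d_date_sk is unique in date_dim
        long partitions = 1824L;    // hypothetical number of partitions

        // NDV of the partition column defaulted to the fact table's row count.
        System.out.println(estimate(factRows, factRows, dimRows, dimNdv));
        // NDV of the partition column taken from the number of partitions.
        System.out.println(estimate(factRows, partitions, dimRows, dimNdv));
    }
}
```

Under these assumptions the first call yields the dimension table's row count, while the second yields the fact table's row count, so defaulting NDV to the row count shrinks the estimate by several thousand times.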
StatsUtils.getColStatisticsFromExpression is where the distinct count gets set to the row count {code} if (encd.getIsPartitionColOrVirtualCol()) { // virtual columns colType = encd.getTypeInfo().getTypeName(); countDistincts = numRows; oi = encd.getWritableObjectInspector(); {code} Query used to repro the issue : {code} set hive.stats.fetch.column.stats=true; set hive.tez.dynamic.partition.pruning=true; explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998; {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 <- Map 2 (BROADCAST_EDGE) DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_sold_date_sk is not null (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} 1 {d_date_sk} {d_date} keys: 0 ss_sold_date_sk (type: int) 1 d_date_sk (type: int) outputColumnNames: _col22, _col26, _col28 input vertices: 1 Map 2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col22 = _col26) (type: boolean) Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col28 (type: string) outputColumnNames: _col0 Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Execution mode: vectorized Map 2 Map
Operator Tree: TableScan alias: date_dim filterExpr: (d_date_sk is not null and (d_year = 1998)) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (d_date_sk is not null and
Review Request 26181: HIVE-8262 - Create CacheTran that transforms the input RDD by caching it [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26181/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8262 https://issues.apache.org/jira/browse/HIVE-8262 Repository: hive-git Description --- In a few cases we need to cache an RDD to avoid recomputing it, for better performance. However, caching a map input RDD is different from caching a regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the input to MapWork, is to cache the result RDD that is transformed from the original Hadoop RDD by applying a map function in which the key, value pairs are copied. Caching intermediate RDDs, such as one from a shuffle, just requires calling .cache(). This task is to create a CacheTran to capture this, which can be plugged into the Spark plan when caching is desirable. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CachedTran.java PRE-CREATION Diff: https://reviews.apache.org/r/26181/diff/ Testing --- Thanks, Chao Sun
[jira] [Updated] (HIVE-8196) Joining on partition columns with fetch column stats enabled results it very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8196: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and branch 0.14. Joining on partition columns with fetch column stats enabled results it very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, HIVE-8196.4.patch, HIVE-8196.5.patch, HIVE-8196.6.patch To make the best of dynamic partition pruning, joins should be on the partitioning columns, which results in dynamically pruning the partitions from the fact table based on the qualifying column keys from the dimension table. This type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns and as a result NDV is set to row count, which negatively affects the estimated join selectivity. The workaround is to capture statistics for partition columns or use the number of partitions in case dynamic partitioning is used.
StatsUtils.getColStatisticsFromExpression is where the distinct count gets set to the row count {code} if (encd.getIsPartitionColOrVirtualCol()) { // virtual columns colType = encd.getTypeInfo().getTypeName(); countDistincts = numRows; oi = encd.getWritableObjectInspector(); {code} Query used to repro the issue : {code} set hive.stats.fetch.column.stats=true; set hive.tez.dynamic.partition.pruning=true; explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998; {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 <- Map 2 (BROADCAST_EDGE) DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_sold_date_sk is not null (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} 1 {d_date_sk} {d_date} keys: 0 ss_sold_date_sk (type: int) 1 d_date_sk (type: int) outputColumnNames: _col22, _col26, _col28 input vertices: 1 Map 2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col22 = _col26) (type: boolean) Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col28 (type: string) outputColumnNames: _col0 Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Execution mode: vectorized Map 2 Map
Operator Tree: TableScan alias: date_dim filterExpr: (d_date_sk is not null and (d_year = 1998)) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (d_date_sk is not
[jira] [Updated] (HIVE-7939) Refactoring GraphTran to make it conform to SparkTran interface. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-7939: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) No longer needed since {{GraphTran}} is removed. Refactoring GraphTran to make it conform to SparkTran interface. [Spark Branch] --- Key: HIVE-7939 URL: https://issues.apache.org/jira/browse/HIVE-7939 Project: Hive Issue Type: Task Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-7939.1-spark.patch Currently, {{GraphTran}} uses its own {{execute}} method, which executes the operator plan in a DFS fashion, and does something special for union. The goal for this JIRA is to do some refactoring and make it conform to the {{SparkTran}} interface. The initial idea is to use varargs for {{SparkTran::transform}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7525) Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao resolved HIVE-7525. Resolution: Fixed Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch] Key: HIVE-7525 URL: https://issues.apache.org/jira/browse/HIVE-7525 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Refer to HIVE-7503 and SPARK-2688. Find out if it's possible to submit multiple Spark jobs concurrently using a shared SparkContext. SparkClient's code can be manipulated for this test. Here is the process: 1. Transform rdd1 to rdd2 using some transformation. 2. Call rdd2.cache() to persist it in memory. 3. In two threads, call respectively: Thread a. rdd2 -> rdd3; rdd3.foreach() Thread b. rdd2 -> rdd4; rdd4.foreach() It would also be nice to find out about the monitoring and error reporting aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
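The three-step process above can be mimicked in plain Java as a rough, Spark-free analogy. Ordinary lists stand in for RDDs, an unmodifiable list stands in for the cached rdd2, and a two-thread executor stands in for the two submitting threads. The class SharedContextExperiment and the chosen transformations are hypothetical; this says nothing about how a real SparkContext behaves under concurrent submission.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

// Plain-Java analogy of the experiment; no Spark classes are involved.
class SharedContextExperiment {
    static int[] run() throws InterruptedException, ExecutionException {
        // Step 1: derive "rdd2" from "rdd1" with some transformation.
        List<Integer> rdd1 = Arrays.asList(1, 2, 3);
        // Step 2: materialize once and share read-only (stand-in for cache()).
        final List<Integer> rdd2 = Collections.unmodifiableList(
                rdd1.stream().map(x -> x * 10).collect(Collectors.toList()));

        // Step 3: two concurrent "jobs", each deriving its own result from rdd2.
        Callable<Integer> jobA = () -> rdd2.stream().mapToInt(x -> x + 1).sum();
        Callable<Integer> jobB = () -> rdd2.stream().mapToInt(x -> x * 2).sum();

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> a = pool.submit(jobA);
            Future<Integer> b = pool.submit(jobB);
            // Future.get() rethrows a job failure wrapped in ExecutionException,
            // which is where per-job error reporting would surface here.
            return new int[] { a.get(), b.get() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(run()));
    }
}
```

Because the shared intermediate result is read-only, the two jobs need no locking between them in this analogy.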
[jira] [Assigned] (HIVE-8276) Separate shuffle from ReduceTran and so create ShuffleTran [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-8276: -- Assignee: Chao Separate shuffle from ReduceTran and so create ShuffleTran [Spark Branch] - Key: HIVE-8276 URL: https://issues.apache.org/jira/browse/HIVE-8276 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Currently ReduceTran captures both shuffle and reduce side processing. Per HIVE-8118, sometimes the output RDD from shuffle needs to be cached for better performance. Thus, it makes sense to separate shuffle from ReduceTran and create a ShuffleTran class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7857) Hive query fails after Tez session times out
[ https://issues.apache.org/jira/browse/HIVE-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153513#comment-14153513 ] Gunther Hagleitner commented on HIVE-7857: -- +1. [~vikram.dixit] hive-14? Hive query fails after Tez session times out Key: HIVE-7857 URL: https://issues.apache.org/jira/browse/HIVE-7857 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Critical Fix For: 0.14.0 Attachments: HIVE-7857.1.patch, HIVE-7857.2.patch, HIVE-7857.3.patch Originally reported by [~deepesh] Steps to reproduce: Open the Hive CLI, ensure that HIVE_AUX_JARS_PATH has hcatalog-core.jar in the path. Keep it idle for more than 5 minutes (this is the default tez session timeout). Essentially Tez session should time out. Run a Hive on Tez query, the query fails. Here is a sample CLI session: {noformat} hive select from_unixtime(unix_timestamp(), dd-MMM-) from vectortab10korc limit 1; Query ID = hrt_qa_20140626002525_6e964079-4031-406b-85ed-cda9c65dca22 Total jobs = 1 Launching Job 1 out of 1 Tez session was closed. Reopening... Session re-established.
Status: Running (application id: application_1403688364015_1930) Map 1: -/- Map 1: 0/1 Map 1: 0/1 Map 1: 0/1 Map 1: 0/1 Map 1: 0/1 Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1403688364015_1930_1_00, diagnostics=[Task failed, taskId=task_1403688364015_1930_1_00_00, diagnostics=[AttemptID:attempt_1403688364015_1930_1_00_00_0 Info:Container container_1403688364015_1930_01_02 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ], AttemptID:attempt_1403688364015_1930_1_00_00_1 Info:Container container_1403688364015_1930_01_03 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ], AttemptID:attempt_1403688364015_1930_1_00_00_2 Info:Container container_1403688364015_1930_01_04 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ], AttemptID:attempt_1403688364015_1930_1_00_00_3 Info:Container container_1403688364015_1930_01_05 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ]], Vertex failed as one or more tasks failed. failedTasks:1] DAG failed due to vertex failure. 
failedVertices:1 killedVertices:0 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7857) Hive query fails after Tez session times out
[ https://issues.apache.org/jira/browse/HIVE-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153518#comment-14153518 ] Vikram Dixit K commented on HIVE-7857: -- Yes. Will be required in 0.14 as well. Hive query fails after Tez session times out Key: HIVE-7857 URL: https://issues.apache.org/jira/browse/HIVE-7857 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Critical Fix For: 0.14.0 Attachments: HIVE-7857.1.patch, HIVE-7857.2.patch, HIVE-7857.3.patch Originally reported by [~deepesh] Steps to reproduce: Open the Hive CLI, ensure that HIVE_AUX_JARS_PATH has hcatalog-core.jar in the path. Keep it idle for more than 5 minutes (this is the default tez session timeout). Essentially Tez session should time out. Run a Hive on Tez query, the query fails. Here is a sample CLI session: {noformat} hive select from_unixtime(unix_timestamp(), dd-MMM-) from vectortab10korc limit 1; Query ID = hrt_qa_20140626002525_6e964079-4031-406b-85ed-cda9c65dca22 Total jobs = 1 Launching Job 1 out of 1 Tez session was closed. Reopening... Session re-established.
Status: Running (application id: application_1403688364015_1930) Map 1: -/- Map 1: 0/1 Map 1: 0/1 Map 1: 0/1 Map 1: 0/1 Map 1: 0/1 Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1403688364015_1930_1_00, diagnostics=[Task failed, taskId=task_1403688364015_1930_1_00_00, diagnostics=[AttemptID:attempt_1403688364015_1930_1_00_00_0 Info:Container container_1403688364015_1930_01_02 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ], AttemptID:attempt_1403688364015_1930_1_00_00_1 Info:Container container_1403688364015_1930_01_03 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ], AttemptID:attempt_1403688364015_1930_1_00_00_2 Info:Container container_1403688364015_1930_01_04 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ], AttemptID:attempt_1403688364015_1930_1_00_00_3 Info:Container container_1403688364015_1930_01_05 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ]], Vertex failed as one or more tasks failed. failedTasks:1] DAG failed due to vertex failure. 
failedVertices:1 killedVertices:0 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy
[ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8151: - Attachment: HIVE-8151.8.patch Rebase patch after HIVE-8196 commit. Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy -- Key: HIVE-8151 URL: https://issues.apache.org/jira/browse/HIVE-8151 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Prasanth J Assignee: Prasanth J Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch, HIVE-8151.8.patch HIVE-6455 added dynamic partition sort optimization. It added startGroup() method to FileSink operator to look for changes in reduce key for creating partition directories. This method however is not reliable as the key called with startGroup() is different from the key called with processOp(). startGroup() is called with newly changed key whereas processOp() is called with previously aggregated key. This will result in processOp() writing the last row of previous group as the first row of next group. This happens only when used with group by operator. The fix is to not rely on startGroup() and do the partition directory creation in processOp() itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
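The call ordering described above can be modelled in a few lines of plain Java. This is a simplified simulation under stated assumptions, not Hive's FileSink operator: the class PartitionSinkDemo, the Row type, and the driver loop are hypothetical. The loop assumes that the notification for the new key fires before the previous group's aggregated row is forwarded, which is the ordering the description attributes to the group by operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the ordering problem; all names are hypothetical.
class PartitionSinkDemo {
    // One aggregated row per group, carrying its own partition key.
    static final class Row {
        final String part;
        final int total;

        Row(String part, int total) {
            this.part = part;
            this.total = total;
        }
    }

    // Returns partition directory -> values written into it.
    // trustStartGroup = true models choosing the directory in startGroup();
    // false models choosing it in processOp() from the row itself.
    static Map<String, List<Integer>> run(List<Row> groups, boolean trustStartGroup) {
        Map<String, List<Integer>> written = new LinkedHashMap<>();
        String currentDir = null;
        Row pending = null; // previous group's aggregated row, not yet emitted
        for (Row next : groups) {
            // "startGroup(newKey)": fires as soon as the key changes ...
            currentDir = next.part;
            // ... and only then is the previous group's row forwarded.
            if (pending != null) {
                write(written, trustStartGroup ? currentDir : pending.part, pending.total);
            }
            pending = next;
        }
        if (pending != null) {
            // Final flush when the input ends.
            write(written, trustStartGroup ? currentDir : pending.part, pending.total);
        }
        return written;
    }

    private static void write(Map<String, List<Integer>> written, String dir, int value) {
        written.computeIfAbsent(dir, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        List<Row> groups = Arrays.asList(new Row("p1", 10), new Row("p2", 20), new Row("p3", 30));
        System.out.println(run(groups, true));
        System.out.println(run(groups, false));
    }
}
```

In this model, trusting the start-of-group notification shifts every row into the following partition, while deriving the directory from the row being written keeps each row in its own partition.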
[jira] [Updated] (HIVE-7857) Hive query fails after Tez session times out
[ https://issues.apache.org/jira/browse/HIVE-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7857: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to both 0.14 and trunk. Hive query fails after Tez session times out Key: HIVE-7857 URL: https://issues.apache.org/jira/browse/HIVE-7857 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Critical Fix For: 0.14.0 Attachments: HIVE-7857.1.patch, HIVE-7857.2.patch, HIVE-7857.3.patch Originally reported by [~deepesh] Steps to reproduce: Open the Hive CLI, ensure that HIVE_AUX_JARS_PATH has hcatalog-core.jar in the path. Keep it idle for more than 5 minutes (this is the default tez session timeout). Essentially Tez session should time out. Run a Hive on Tez query, the query fails. Here is a sample CLI session: {noformat} hive select from_unixtime(unix_timestamp(), dd-MMM-) from vectortab10korc limit 1; Query ID = hrt_qa_20140626002525_6e964079-4031-406b-85ed-cda9c65dca22 Total jobs = 1 Launching Job 1 out of 1 Tez session was closed. Reopening... Session re-established. 
Status: Running (application id: application_1403688364015_1930) Map 1: -/- Map 1: 0/1 Map 1: 0/1 Map 1: 0/1 Map 1: 0/1 Map 1: 0/1 Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1403688364015_1930_1_00, diagnostics=[Task failed, taskId=task_1403688364015_1930_1_00_00, diagnostics=[AttemptID:attempt_1403688364015_1930_1_00_00_0 Info:Container container_1403688364015_1930_01_02 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ], AttemptID:attempt_1403688364015_1930_1_00_00_1 Info:Container container_1403688364015_1930_01_03 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ], AttemptID:attempt_1403688364015_1930_1_00_00_2 Info:Container container_1403688364015_1930_01_04 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ], AttemptID:attempt_1403688364015_1930_1_00_00_3 Info:Container container_1403688364015_1930_01_05 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351 ]], Vertex failed as one or more tasks failed. failedTasks:1] DAG failed due to vertex failure. 
failedVertices:1 killedVertices:0 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153534#comment-14153534 ] Hive QA commented on HIVE-8180: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12672070/HIVE-8180.3-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6511 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/182/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/182/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-182/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12672070 Update SparkReduceRecordHandler for processing the vectors [spark branch] - Key: HIVE-8180 URL: https://issues.apache.org/jira/browse/HIVE-8180 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Labels: Spark-M1 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, HIVE-8180.2-spark.patch, HIVE-8180.3-spark.patch Update SparkReduceRecordHandler for processing the vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8309) CBO: Fix OB by removing constraining DT, Use external names for col Aliases, Remove unnecessary Selects, Make DT Name counter query specific
Laljo John Pullokkaran created HIVE-8309: Summary: CBO: Fix OB by removing constraining DT, Use external names for col Aliases, Remove unnecessary Selects, Make DT Name counter query specific Key: HIVE-8309 URL: https://issues.apache.org/jira/browse/HIVE-8309 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8262) Create CacheTran that transforms the input RDD by caching it [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153537#comment-14153537 ] Xuefu Zhang commented on HIVE-8262: --- Let's put this one on hold until we find out if it's simpler just to put a caching flag in other SparkTran subclasses. Create CacheTran that transforms the input RDD by caching it [Spark Branch] --- Key: HIVE-8262 URL: https://issues.apache.org/jira/browse/HIVE-8262 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Attachments: HIVE-8262.1-spark.patch In a few cases we need to cache an RDD to avoid recomputing it, for better performance. However, caching a map input RDD is different from caching a regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the input to MapWork, is to cache the result RDD that is transformed from the original Hadoop RDD by applying a map function in which the key, value pairs are copied. Caching intermediate RDDs, such as one from a shuffle, just requires calling .cache(). This task is to create a CacheTran to capture this, which can be plugged into the Spark plan when caching is desirable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8310) RetryingHMSHandler is not used when kerberos auth enabled
Thejas M Nair created HIVE-8310: --- Summary: RetryingHMSHandler is not used when kerberos auth enabled Key: HIVE-8310 URL: https://issues.apache.org/jira/browse/HIVE-8310 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Blocker Fix For: 0.14.0 RetryingHMSHandler is not being used when Kerberos auth is enabled, after the changes in HIVE-3255. The changes in HIVE-4996 also removed the lower-level retrying layer, RetryingRawStore. This means that in Kerberos mode, retries are not done for database query failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
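To illustrate what is lost when the retrying layer is bypassed, here is a minimal sketch of a retrying wrapper built on `java.lang.reflect.Proxy`. It is not Hive's RetryingHMSHandler; `Store`, `FlakyStore`, `RetryingHandler` and `withRetries` are hypothetical names, and the retry policy (a fixed number of immediate attempts) is an assumption made for brevity.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class RetryProxyDemo {
    /** Hypothetical stand-in for a metastore handler interface. */
    interface Store {
        String getTable(String name);
    }

    /** Fails a fixed number of times before succeeding, imitating a flaky database. */
    static final class FlakyStore implements Store {
        private int failuresLeft;
        int calls = 0;

        FlakyStore(int failures) {
            this.failuresLeft = failures;
        }

        @Override
        public String getTable(String name) {
            calls++;
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IllegalStateException("transient database error");
            }
            return "table:" + name;
        }
    }

    /** Re-invokes the target method until it succeeds or attempts run out. */
    static final class RetryingHandler implements InvocationHandler {
        private final Object target;
        private final int maxAttempts;

        RetryingHandler(Object target, int maxAttempts) {
            if (maxAttempts < 1) {
                throw new IllegalArgumentException("maxAttempts must be >= 1");
            }
            this.target = target;
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            Throwable last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    last = e.getCause(); // remember the real failure and retry
                }
            }
            throw last;
        }
    }

    static Store withRetries(Store target, int maxAttempts) {
        return (Store) Proxy.newProxyInstance(
                Store.class.getClassLoader(),
                new Class<?>[] {Store.class},
                new RetryingHandler(target, maxAttempts));
    }

    public static void main(String[] args) {
        FlakyStore raw = new FlakyStore(2);
        Store retrying = withRetries(raw, 3);
        System.out.println(retrying.getTable("t") + " after " + raw.calls + " calls");
    }
}
```

If a code path hands out the raw `FlakyStore` instead of the proxy, the first transient failure reaches the caller, which is the kind of regression the issue describes for Kerberos mode.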
[jira] [Commented] (HIVE-8310) RetryingHMSHandler is not used when kerberos auth enabled
[ https://issues.apache.org/jira/browse/HIVE-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153546#comment-14153546 ] Thejas M Nair commented on HIVE-8310: - [~vikram.dixit] This will be a very useful fix for Hive 0.14; it will make the metastore more resilient to database failures. It is a regression. RetryingHMSHandler is not used when kerberos auth enabled - Key: HIVE-8310 URL: https://issues.apache.org/jira/browse/HIVE-8310 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Blocker Fix For: 0.14.0 RetryingHMSHandler is not being used when Kerberos auth is enabled, after the changes in HIVE-3255. The changes in HIVE-4996 also removed the lower-level retrying layer, RetryingRawStore. This means that in Kerberos mode, retries are not done for database query failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8263) CBO : TPC-DS Q64 is item is joined last with store_sales while it should be first as it is the most selective
[ https://issues.apache.org/jira/browse/HIVE-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153548#comment-14153548 ] Ashutosh Chauhan commented on HIVE-8263: +1 [~vikram.dixit] It will be good to have this in 0.14 as well. CBO : TPC-DS Q64 is item is joined last with store_sales while it should be first as it is the most selective - Key: HIVE-8263 URL: https://issues.apache.org/jira/browse/HIVE-8263 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8263.1.patch, Q64_cbo_on_explain_log.txt.zip Plan for TPC-DS Q64 shows that item is joined last with store_sales while store_sales x item is the most selective join in the plan. Interestingly predicate push down is applied on item but item comes so late in the join which most likely means that calculation of the join selectivity gave too high of a number of it was never considered. This is a subset of the logical plan showing that item was joined very last {code} HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$4], _o__col4=[$5], _o__col5=[$6], _o__col6=[$7], _o__col7=[$8], _o__col8=[$9], _o__col9=[$10], _o__col10=[$11], _o__col11=[$12], _o__col12=[$13], _o__col13=[$14], _o__col14=[$15], _o__col15=[$16], _o__col16=[$22], _o__col17=[$23], _o__col18=[$24], _o__col19=[$20], _o__col20=[$21]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 990 HiveFilterRel(condition=[=($21, $13)]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 988 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], _o__col12=[$12], _o__col15=[$13], _o__col16=[$14], _o__col17=[$15], _o__col18=[$16], _o__col13=[$17], _o__col20=[$18], _o__col30=[$19], _o__col120=[$20], 
_o__col150=[$21], _o__col160=[$22], _o__col170=[$23], _o__col180=[$24]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 3571 HiveJoinRel(condition=[AND(AND(=($1, $17), =($2, $18)), =($3, $19))], joinType=[inner]): rowcount = 1.0, cumulative cost = {1.1593403796322412E9 rows, 0.0 cpu, 0.0 io}, id = 3566 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col4=[$4], _o__col5=[$5], _o__col6=[$6], _o__col7=[$7], _o__col8=[$8], _o__col9=[$9], _o__col10=[$10], _o__col11=[$11], _o__col12=[$12], _o__col15=[$15], _o__col16=[$16], _o__col17=[$17], _o__col18=[$18]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 890 HiveFilterRel(condition=[=($12, 2000)]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 888 HiveAggregateRel(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}], agg#0=[count()], agg#1=[sum($15)], agg#2=[sum($16)], agg#3=[sum($17)]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 886 HiveProjectRel($f0=[$53], $f1=[$50], $f2=[$27], $f3=[$28], $f4=[$39], $f5=[$40], $f6=[$41], $f7=[$42], $f8=[$44], $f9=[$45], $f10=[$46], $f11=[$47], $f12=[$21], $f13=[$23], $f14=[$25], $f15=[$9], $f16=[$10], $f17=[$11]): rowcount = 1.0, cumulative cost = {1.1593403776322412E9 rows, 0.0 cpu, 0.0 io}, id = 884 HiveProjectRel(ss_sold_date_sk=[$17], ss_item_sk=[$18], ss_customer_sk=[$19], ss_cdemo_sk=[$20], ss_hdemo_sk=[$21], ss_addr_sk=[$22], ss_store_sk=[$23], ss_promo_sk=[$24], ss_ticket_number=[$25], ss_wholesale_cost=[$26], ss_list_price=[$27], ss_coupon_amt=[$28], sr_item_sk=[$29], sr_ticket_number=[$30], c_customer_sk=[$31], c_current_cdemo_sk=[$32], c_current_hdemo_sk=[$33], c_current_addr_sk=[$34], c_first_shipto_date_sk=[$35], c_first_sales_date_sk=[$36], d_date_sk=[$37], d_year=[$38], d_date_sk0=[$39], d_year0=[$40], d_date_sk1=[$41], d_year1=[$42], s_store_sk=[$43], s_store_name=[$44], 
s_zip=[$45], cd_demo_sk=[$46], cd_marital_status=[$47], cd_demo_sk0=[$48], cd_marital_status0=[$49], p_promo_sk=[$0], hd_demo_sk=[$15], hd_income_band_sk=[$16], hd_demo_sk0=[$13], hd_income_band_sk0=[$14], ca_address_sk=[$6], ca_street_number=[$7], ca_street_name=[$8], ca_city=[$9], ca_zip=[$10], ca_address_sk0=[$1], ca_street_number0=[$2], ca_street_name0=[$3], ca_city0=[$4], ca_zip0=[$5], ib_income_band_sk=[$12], ib_income_band_sk0=[$11], i_item_sk=[$51], i_current_price=[$52], i_color=[$53], i_product_name=[$54],
[jira] [Created] (HIVE-8311) Driver is encoding transaction information too late
Alan Gates created HIVE-8311: Summary: Driver is encoding transaction information too late Key: HIVE-8311 URL: https://issues.apache.org/jira/browse/HIVE-8311 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Currently Driver is obtaining the transaction information and encoding it in the conf in runInternal. But this is too late, as the query has already been planned. Either we need to change the plan when this info is obtained or we need to obtain it at compile time. This bug was introduced by HIVE-8203. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8311) Driver is encoding transaction information too late
[ https://issues.apache.org/jira/browse/HIVE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153560#comment-14153560 ] Alan Gates commented on HIVE-8311: -- [~vikram.dixit] I'd like to get this into 0.14, as it produces wrong results. I should have a patch in a few hours. Driver is encoding transaction information too late --- Key: HIVE-8311 URL: https://issues.apache.org/jira/browse/HIVE-8311 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Currently Driver is obtaining the transaction information and encoding it in the conf in runInternal. But this is too late, as the query has already been planned. Either we need to change the plan when this info is obtained or we need to obtain it at compile time. This bug was introduced by HIVE-8203. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8265) Build failure on hadoop-1
[ https://issues.apache.org/jira/browse/HIVE-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153565#comment-14153565 ] Szehon Ho commented on HIVE-8265: - [~vikram.dixit] I would like to get this fixed for 0.14. Could you take a look at this patch if you have the cycles? Thanks. Build failure on hadoop-1 -- Key: HIVE-8265 URL: https://issues.apache.org/jira/browse/HIVE-8265 Project: Hive Issue Type: Task Components: Tests Affects Versions: 0.14.0 Reporter: Navis Assignee: Navis Priority: Blocker Attachments: HIVE-8265.1.patch.txt, HIVE-8265.2.patch no pre-commit-tests Fails from CustomPartitionVertex and TestHive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26178: Truncating table doesnt invalidate stats
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26178/#review54998 --- Ship it! Ship It! - Prasanth_J On Sept. 30, 2014, 5:47 p.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26178/ --- (Updated Sept. 30, 2014, 5:47 p.m.) Review request for hive and Prasanth_J. Bugs: HIVE-8250 https://issues.apache.org/jira/browse/HIVE-8250 Repository: hive-git Description --- Truncating table doesnt invalidate stats Diffs - metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java c95473c ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java dc00d66 ql/src/test/results/clientpositive/alter_numbuckets_partitioned_table_h23.q.out 5047b23 Diff: https://reviews.apache.org/r/26178/diff/ Testing --- Thanks, Ashutosh Chauhan
[jira] [Resolved] (HIVE-7293) Hive-trunk does not build against JDK8 with generic class checks
[ https://issues.apache.org/jira/browse/HIVE-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V resolved HIVE-7293. --- Resolution: Not a Problem Builds are succeeding on JDK8. Hive-trunk does not build against JDK8 with generic class checks Key: HIVE-7293 URL: https://issues.apache.org/jira/browse/HIVE-7293 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: java version 1.8.0 Java(TM) SE Runtime Environment (build 1.8.0-b132) Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode) Reporter: Gopal V Assignee: Gopal V Priority: Minor Labels: Vectorization The current build and tests on my laptop are failing due to generic argument mismatch errors. {code} hive-trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPGreaterThan.java:[46,82] incompatible types found : java.lang.Class<org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterDoubleScalarGreaterDoubleColumn> required: java.lang.Class<? extends org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8296) Tez ReduceShuffle Vectorization needs 2 data buffers (key and value) for adding rows
[ https://issues.apache.org/jira/browse/HIVE-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153580#comment-14153580 ] Gopal V commented on HIVE-8296: --- LGTM - +1. [~vikram.dixit]: this is necessary for 0.14 over the HIVE-8156 fix. Tez ReduceShuffle Vectorization needs 2 data buffers (key and value) for adding rows Key: HIVE-8296 URL: https://issues.apache.org/jira/browse/HIVE-8296 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8296.01.patch, HIVE-8296.02.patch We reuse the keys for the vectorized row batch and need to use a separate buffer (for strings) so that the batch can be reused for new values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
[ https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153616#comment-14153616 ] Hive QA commented on HIVE-8285: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671999/HIVE-8285.patch {color:green}SUCCESS:{color} +1 6373 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1056/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1056/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1056/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12671999 Reference equality is used on boolean values in PartitionPruner#removeTruePredciates() -- Key: HIVE-8285 URL: https://issues.apache.org/jira/browse/HIVE-8285 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Ted Yu Priority: Minor Attachments: HIVE-8285.patch {code} if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) { {code} equals() should be used in the above comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
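A small, self-contained sketch of the pitfall named in HIVE-8285: `==` on a `Boolean` compares object identity, so a `Boolean` that is true but is not the canonical `Boolean.TRUE` instance fails the check. The helper names are hypothetical, and the deprecated `Boolean(boolean)` constructor is used only to obtain a distinct instance for the demonstration.

```java
class BooleanEqualityDemo {
    /** Mirrors the pattern flagged in the issue: identity comparison. */
    static boolean isTrueByReference(Object value) {
        return value == Boolean.TRUE;
    }

    /** Value comparison; also safe when value is null. */
    static boolean isTrueByEquals(Object value) {
        return Boolean.TRUE.equals(value);
    }

    /** Returns a Boolean that is true but is not the Boolean.TRUE instance. */
    @SuppressWarnings({"deprecation", "removal"})
    static Object distinctTrue() {
        return new Boolean(true);
    }

    public static void main(String[] args) {
        Object canonical = Boolean.valueOf(true); // same instance as Boolean.TRUE
        Object distinct = distinctTrue();

        System.out.println(isTrueByReference(canonical)); // prints true
        System.out.println(isTrueByReference(distinct));  // prints false
        System.out.println(isTrueByEquals(distinct));     // prints true
    }
}
```

Whether a non-canonical `Boolean` can actually reach `removeTruePredciates()` depends on how the constant was created, which this thread does not show; `equals()` removes the dependency either way.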
[jira] [Commented] (HIVE-8262) Create CacheTran that transforms the input RDD by caching it [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153622#comment-14153622 ] Hive QA commented on HIVE-8262: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12672078/HIVE-8262.1-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6509 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/183/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/183/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-183/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12672078 Create CacheTran that transforms the input RDD by caching it [Spark Branch] --- Key: HIVE-8262 URL: https://issues.apache.org/jira/browse/HIVE-8262 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Attachments: HIVE-8262.1-spark.patch In a few cases we need to cache a RDD to avoid recompute it for better performance. However, caching a map input RDD is different from caching a regular RDD due to SPARK-3693. The way to cache a Hadoop RDD, which is the input to MapWork, is to cache, the result RDD that is transformed from the original Hadoop RDD by applying a map function, in which key, value pairs are copied. 
Caching intermediate RDDs, such as those from a shuffle, only requires calling .cache(). This task is to create a CacheTran to capture this, which can be plugged into the Spark plan when caching is desirable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153627#comment-14153627 ] Alan Gates commented on HIVE-8231: -- I definitely think we are seeing separate issues. I have filed a new issue, HIVE-8311, for what I am seeing. Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Critical Fix For: 0.14.0 Steps to show the bug : 1. create table {code} create table encaissement_1b_64m like encaissement_1b; {code} 2. check table {code} desc encaissement_1b_64m; dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m; {code} everything is ok: {noformat} 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m; +++--+--+ | col_name | data_type | comment | +++--+--+ | id | int| | | idmagasin | int| | | zibzin | string | | | cheque | int| | | montant| double | | | date | timestamp | | | col_6 | string | | | col_7 | string | | | col_8 | string | | +++--+--+ 9 rows selected (0.158 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ +-+--+ No rows selected (0.01 seconds) {noformat} 3. Insert values into the new table {noformat} insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','',''); {noformat} 4. Check {noformat} 0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m; +-+--+ | id | +-+--+ +-+--+ No rows selected (0.091 seconds) {noformat} There is already a problem: I don't see the inserted row. 5. 
When I'm checking HDFS directory, I see {{delta_421_421}} folder {noformat} 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 | +-+--+ 2 rows selected (0.014 seconds) {noformat} 6. Doing a major compaction solves the bug {noformat} 0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major'; No rows affected (0.046 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; ++--+ | DFS Output | ++--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421 | ++--+ 2 rows selected (0.02 seconds) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8309) CBO: Fix OB by removing constraining DT, Use external names for col Aliases, Remove unnecessary Selects, Make DT Name counter query specific
[ https://issues.apache.org/jira/browse/HIVE-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-8309: - Attachment: HIVE-8309.patch CBO: Fix OB by removing constraining DT, Use external names for col Aliases, Remove unnecessary Selects, Make DT Name counter query specific Key: HIVE-8309 URL: https://issues.apache.org/jira/browse/HIVE-8309 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-8309.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8306) Map join sizing done by auto.convert.join.noconditionaltask.size doesn't take into account Hash table overhead and results in OOM
[ https://issues.apache.org/jira/browse/HIVE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8306: -- Priority: Minor (was: Critical) Map join sizing done by auto.convert.join.noconditionaltask.size doesn't take into account Hash table overhead and results in OOM - Key: HIVE-8306 URL: https://issues.apache.org/jira/browse/HIVE-8306 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Minor Fix For: 0.14.0 Attachments: query64_oom_trim.txt When hive.auto.convert.join.noconditionaltask = true, we check noconditionaltask.size, and if the sum of the table sizes in the map join is less than noconditionaltask.size, the plan generates a map join. The issue is that this calculation doesn't take into account the overhead introduced by the different HashTable implementations; as a result, if the sum of the input sizes is smaller than the noconditionaltask size by only a small margin, queries will hit OOM. TPC-DS query 64 is a good example of this issue: the noconditionaltask size is set to 1,280,000,000 while the sum of the inputs is 1,012,379,321, which is about 21% smaller than the configured size. 
Vertex {code} Map 28 - Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 14 (BROADCAST_EDGE), Map 15 (BROADCAST_EDGE), Map 16 (BROADCAST_EDGE), Map 24 (BROADCAST_EDGE), Map 26 (BROADCAST_EDGE), Map 30 (BROADCAST_EDGE), Map 31 (BROADCAST_EDGE), Map 32 (BROADCAST_EDGE), Map 39 (BROADCAST_EDGE), Map 40 (BROADCAST_EDGE), Map 43 (BROADCAST_EDGE), Map 45 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE) {code} Exception {code} , TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:169) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:206) at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:182) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$LazyBinaryKvWriter.writeKey(MapJoinBytesTableContainer.java:189) at 
org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.put(BytesBytesMultiHashMap.java:200) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.putRow(MapJoinBytesTableContainer.java:267) at org.apache.hadoop.hive.ql.exec.tez.HashTableLoader.load(HashTableLoader.java:114) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:184) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:210) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1036) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1040) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1040) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1040) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:37) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:186) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:164) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
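The sizing check described in HIVE-8306 can be worked through with the numbers quoted in the issue. This is a sketch only: the method names are hypothetical, and the overhead factor of 1.5 is a made-up illustrative value, not a measured property of any Hive hash table implementation.

```java
import java.util.Locale;

class MapJoinSizingDemo {
    /** The check as described in the issue: raw input sizes against the threshold. */
    static boolean fitsRaw(long sumOfInputs, long threshold) {
        return sumOfInputs < threshold;
    }

    /** Same check, but with the inputs scaled by an assumed in-memory overhead factor. */
    static boolean fitsWithOverhead(long sumOfInputs, long threshold, double overheadFactor) {
        return sumOfInputs * overheadFactor < threshold;
    }

    /** How far below the threshold the inputs are, as a percentage of the threshold. */
    static double headroomPercent(long sumOfInputs, long threshold) {
        return 100.0 * (threshold - sumOfInputs) / threshold;
    }

    public static void main(String[] args) {
        long threshold = 1_280_000_000L;   // noconditionaltask size from the issue
        long sumOfInputs = 1_012_379_321L; // sum of map join inputs from the issue
        double assumedOverhead = 1.5;      // hypothetical factor, for illustration only

        System.out.println(String.format(Locale.ROOT, "headroom: %.1f%%",
                headroomPercent(sumOfInputs, threshold)));            // headroom: 20.9%
        System.out.println(fitsRaw(sumOfInputs, threshold));          // true: map join is chosen
        System.out.println(fitsWithOverhead(sumOfInputs, threshold, assumedOverhead)); // false
    }
}
```

With any overhead factor above roughly 1.26 (1,280,000,000 / 1,012,379,321), the inputs that pass the raw check would no longer fit, which is consistent with the OutOfMemoryError raised while loading the hash table in the trace above.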
[jira] [Updated] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy
[ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8151: - Status: Open (was: Patch Available) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy -- Key: HIVE-8151 URL: https://issues.apache.org/jira/browse/HIVE-8151 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch, HIVE-8151.8.patch HIVE-6455 added dynamic partition sort optimization. It added startGroup() method to FileSink operator to look for changes in reduce key for creating partition directories. This method however is not reliable as the key called with startGroup() is different from the key called with processOp(). startGroup() is called with newly changed key whereas processOp() is called with previously aggregated key. This will result in processOp() writing the last row of previous group as the first row of next group. This happens only when used with group by operator. The fix is to not rely on startGroup() and do the partition directory creation in processOp() itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy
[ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153665#comment-14153665 ] Prasanth J commented on HIVE-8151: -- [~wzc1989] Thanks for providing a test case. It looks like there is some issue with casting before writing the file. I will put up a fix for it soon in the next version of this patch. Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy -- Key: HIVE-8151 URL: https://issues.apache.org/jira/browse/HIVE-8151 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Prasanth J Assignee: Prasanth J Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch, HIVE-8151.4.patch, HIVE-8151.5.patch, HIVE-8151.6.patch, HIVE-8151.7.patch, HIVE-8151.8.patch HIVE-6455 added dynamic partition sort optimization. It added startGroup() method to FileSink operator to look for changes in reduce key for creating partition directories. This method however is not reliable as the key called with startGroup() is different from the key called with processOp(). startGroup() is called with newly changed key whereas processOp() is called with previously aggregated key. This will result in processOp() writing the last row of previous group as the first row of next group. This happens only when used with group by operator. The fix is to not rely on startGroup() and do the partition directory creation in processOp() itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
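The ordering problem in the HIVE-8151 description can be reproduced with a simplified model. This is a sketch under stated assumptions, not Hive's operator code: `Sink` stands in for the FileSink operator, the loop in `run` stands in for a group-by that emits a group's aggregate only once the key changes, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StartGroupDemo {
    /** Simplified sink: maps a partition directory to the rows written into it. */
    static final class Sink {
        private final boolean trustStartGroup;
        final Map<String, List<String>> dirs = new LinkedHashMap<>();
        private String currentDir;

        Sink(boolean trustStartGroup) {
            this.trustStartGroup = trustStartGroup;
        }

        void startGroup(String newKey) {
            if (trustStartGroup) {
                currentDir = newKey; // buggy variant: switch directory on the new key
            }
        }

        void processOp(String partitionKey, long count) {
            if (!trustStartGroup) {
                currentDir = partitionKey; // fixed variant: derive directory from the row
            }
            dirs.computeIfAbsent(currentDir, d -> new ArrayList<>())
                .add(partitionKey + "=" + count);
        }
    }

    /** Counts rows per key over sorted input and forwards each group's aggregate. */
    static Map<String, List<String>> run(String[] sortedKeys, boolean trustStartGroup) {
        Sink sink = new Sink(trustStartGroup);
        String prevKey = null;
        long count = 0;
        for (String key : sortedKeys) {
            if (!key.equals(prevKey)) {
                sink.startGroup(key);               // the sink sees the NEW key first
                if (prevKey != null) {
                    sink.processOp(prevKey, count); // then the PREVIOUS group's aggregate
                }
                prevKey = key;
                count = 0;
            }
            count++;
        }
        if (prevKey != null) {
            sink.processOp(prevKey, count);         // flush the last group on close
        }
        return sink.dirs;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "a", "b", "c", "c"};
        System.out.println(run(keys, true));  // prints {b=[a=2], c=[b=1, c=2]}
        System.out.println(run(keys, false)); // prints {a=[a=2], b=[b=1], c=[c=2]}
    }
}
```

In the first run each aggregate lands in the directory of the following group, matching the description; choosing the directory inside `processOp()` from the row itself, as the proposed fix does, puts every aggregate in its own partition.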
[jira] [Updated] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-8231: --- Attachment: a_beeline_insert.txt a_hiveserver2_insert.txt b_beeline_insert.txt b_hiveserver2_insert.txt Attached few files. Use case : 1. drop table if exists foo7 (no log) 2. create table foo7 (id int) STORED AS ORC (no log) 3. insert into table foo7 VALUES(1) (log a_hiveserver2 and a_beeline) 4. select * from foo7 (log b_hiveserver2 and b_beeline) Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Critical Fix For: 0.14.0 Attachments: a_beeline_insert.txt, a_hiveserver2_insert.txt, b_beeline_insert.txt, b_hiveserver2_insert.txt Steps to show the bug : 1. create table {code} create table encaissement_1b_64m like encaissement_1b; {code} 2. check table {code} desc encaissement_1b_64m; dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m; {code} everything is ok: {noformat} 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m; +++--+--+ | col_name | data_type | comment | +++--+--+ | id | int| | | idmagasin | int| | | zibzin | string | | | cheque | int| | | montant| double | | | date | timestamp | | | col_6 | string | | | col_7 | string | | | col_8 | string | | +++--+--+ 9 rows selected (0.158 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ +-+--+ No rows selected (0.01 seconds) {noformat} 3. Insert values into the new table {noformat} insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','',''); {noformat} 4. Check {noformat} 0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m; +-+--+ | id | +-+--+ +-+--+ No rows selected (0.091 seconds) {noformat} There are already a pb. 
I don't see the inserted row. 5. When I'm checking HDFS directory, I see {{delta_421_421}} folder {noformat} 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 | +-+--+ 2 rows selected (0.014 seconds) {noformat} 6. Doing a major compaction solves the bug {noformat} 0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major'; No rows affected (0.046 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; ++--+ | DFS Output | ++--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21
[jira] [Updated] (HIVE-8309) CBO: Fix OB by removing constraining DT, Use external names for col Aliases, Remove unnecessary Selects, Make DT Name counter query specific
[ https://issues.apache.org/jira/browse/HIVE-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8309: --- Status: Patch Available (was: Open) CBO: Fix OB by removing constraining DT, Use external names for col Aliases, Remove unnecessary Selects, Make DT Name counter query specific Key: HIVE-8309 URL: https://issues.apache.org/jira/browse/HIVE-8309 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-8309.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8312) Implicit type conversion on Join keys
Lin Liu created HIVE-8312: - Summary: Implicit type conversion on Join keys Key: HIVE-8312 URL: https://issues.apache.org/jira/browse/HIVE-8312 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Lin Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8311) Driver is encoding transaction information too late
[ https://issues.apache.org/jira/browse/HIVE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8311: - Status: Patch Available (was: Open) Driver is encoding transaction information too late --- Key: HIVE-8311 URL: https://issues.apache.org/jira/browse/HIVE-8311 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8311.patch Currently Driver is obtaining the transaction information and encoding it in the conf in runInternal. But this is too late, as the query has already been planned. Either we need to change the plan when this info is obtained or we need to obtain it at compile time. This bug was introduced by HIVE-8203. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8311) Driver is encoding transaction information too late
[ https://issues.apache.org/jira/browse/HIVE-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated HIVE-8311:
---
Attachment: HIVE-8311.patch

This patch moves the encoding of the transaction information from runInternal to compile.

Driver is encoding transaction information too late
---
Key: HIVE-8311
URL: https://issues.apache.org/jira/browse/HIVE-8311
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
Fix For: 0.14.0
Attachments: HIVE-8311.patch

Currently Driver is obtaining the transaction information and encoding it in the conf in runInternal. But this is too late, as the query has already been planned. Either we need to change the plan when this info is obtained or we need to obtain it at compile time. This bug was introduced by HIVE-8203.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
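The ordering problem described in HIVE-8311 can be pictured with a small Python model. This is only a sketch of the general issue, not Hive's Driver code: `Plan`, `plan_query`, `late_encoding`, `early_encoding`, and the `example.txn.info` key are all hypothetical names made up for illustration. The assumption is simply that a plan reads whatever the conf holds at planning time and is not revisited afterwards.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical config key for illustration; not a real Hive property name.
TXN_KEY = "example.txn.info"


@dataclass(frozen=True)
class Plan:
    """A toy query plan that remembers what the conf held when it was built."""
    query: str
    txn_info: Optional[str]


def plan_query(query: str, conf: Dict[str, str]) -> Plan:
    # Planning reads the conf once; later conf changes do not reach the plan.
    return Plan(query=query, txn_info=conf.get(TXN_KEY))


def late_encoding(query: str, txn_info: str) -> Plan:
    """Models the reported behaviour: conf is updated after planning."""
    conf: Dict[str, str] = {}
    plan = plan_query(query, conf)  # "compile" step: plan is fixed here
    conf[TXN_KEY] = txn_info        # "run" step: too late for the plan
    return plan


def early_encoding(query: str, txn_info: str) -> Plan:
    """Models the patched ordering: conf is updated before planning."""
    conf = {TXN_KEY: txn_info}
    return plan_query(query, conf)


late_plan = late_encoding("select 1", "txn:42")
early_plan = early_encoding("select 1", "txn:42")
```

In this model `late_plan.txn_info` stays empty while `early_plan.txn_info` carries the value, which is the difference the patch is aiming for by moving the encoding to compile time.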
[jira] [Updated] (HIVE-8312) Implicit type conversion on Join keys
[ https://issues.apache.org/jira/browse/HIVE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lin Liu updated HIVE-8312:
---
Description:
Suppose we have a query as follows.

SELECT FROM A LEFT SEMI JOIN B ON (A.col1 = B.col2) WHERE ...

If A.col1 is of STRING type but B.col2 is of BIGINT or DOUBLE type, Hive finds the common compatible type for both columns (here, DOUBLE) and performs an implicit type conversion. However, this implicit conversion from STRING to DOUBLE can produce NULL values, which in turn can cause unexpected results such as skew.

Is this behavior by design? If so, what is the logic? If not, how can we solve it?

Implicit type conversion on Join keys
---
Key: HIVE-8312
URL: https://issues.apache.org/jira/browse/HIVE-8312
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Lin Liu

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
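To make the concern in HIVE-8312 concrete, here is a rough Python model of what happens to STRING join keys when they are converted to DOUBLE. It is a sketch under stated assumptions, not Hive's implementation: `cast_to_double` and `partition_for` are hypothetical helpers, the sample values are invented, and sending every NULL key to partition 0 is an assumption chosen to show how skew could arise.

```python
from collections import Counter
from typing import List, Optional


def cast_to_double(value: Optional[str]) -> Optional[float]:
    """Model a lenient STRING -> DOUBLE cast: unparseable input becomes None (NULL)."""
    if value is None:
        return None
    try:
        return float(value.strip())
    except ValueError:
        return None


def partition_for(key: Optional[float], num_partitions: int) -> int:
    """Hypothetical partitioner: all NULL keys land in partition 0."""
    if key is None:
        return 0
    return hash(key) % num_partitions


# Invented sample of A.col1 values; half of them are not numeric.
a_col1: List[str] = ["1", "2", "abc", "x-17", "", "3", "n/a", "4"]

converted = [cast_to_double(v) for v in a_col1]
null_count = sum(1 for v in converted if v is None)

# Count how many rows each of 4 partitions would receive.
load = Counter(partition_for(k, 4) for k in converted)
```

In this toy run the four non-numeric strings all collapse to the same NULL key, so one partition receives most of the rows. Whether Hive distributes NULL join keys this way depends on the join type and execution engine, so treat the partitioning part as illustrative only.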