[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=482445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482445 ]

ASF GitHub Bot logged work on HIVE-22782:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 12/Sep/20 07:54
        Start Date: 12/Sep/20 07:54
    Worklog Time Spent: 10m

Work Description: ashish-kumar-sharma commented on a change in pull request #1419:
URL: https://github.com/apache/hive/pull/1419#discussion_r487382568

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java

@@ -1176,145 +1166,101 @@ public void setStatsStateLikeNewTable() {
   * Note that set apis are used by DESCRIBE only, although get apis return RELY or ENABLE
   * constraints DESCRIBE could set all type of constraints
   * */
-
-  /* This only return PK which are created with RELY */
-  public PrimaryKeyInfo getPrimaryKeyInfo() {
-    if(!this.isPKFetched) {
+  public TableConstraintsInfo getTableConstraintsInfo() {
+    if (!this.isTableConstraintsFetched) {
       try {
-        pki = Hive.get().getReliablePrimaryKeys(this.getDbName(), this.getTableName());
-        this.isPKFetched = true;
+        tableConstraintsInfo = Hive.get().getReliableAndEnableTableConstraints(this.getDbName(), this.getTableName());
+        this.isTableConstraintsFetched = true;
       } catch (HiveException e) {
-        LOG.warn("Cannot retrieve PK info for table : " + this.getTableName()
-            + " ignoring exception: " + e);
+        LOG.warn(
+            "Cannot retrieve table constraints info for table : " + this.getTableName() + " ignoring exception: " + e);

Review comment:
   replaced with complete name

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 482445)
    Time Spent: 4h 20m  (was: 4h 10m)

> Consolidate metastore call to fetch constraints
> -----------------------------------------------
>
>                 Key: HIVE-22782
>                 URL: https://issues.apache.org/jira/browse/HIVE-22782
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently, separate calls are made to the metastore to fetch constraints such as PK, FK, NOT NULL, etc. Since the planner always retrieves these constraints, we should retrieve all of them in one call.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=482443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482443 ]

ASF GitHub Bot logged work on HIVE-22782:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 12/Sep/20 07:54
        Start Date: 12/Sep/20 07:54
    Worklog Time Spent: 10m

Work Description: ashish-kumar-sharma commented on a change in pull request #1419:
URL: https://github.com/apache/hive/pull/1419#discussion_r487382545

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java

@@ -116,22 +116,12 @@
   private transient Boolean outdatedForRewritingMaterializedView;

   /** Constraint related objects */
-  private transient PrimaryKeyInfo pki;
-  private transient ForeignKeyInfo fki;
-  private transient UniqueConstraint uki;
-  private transient NotNullConstraint nnc;
-  private transient DefaultConstraint dc;
-  private transient CheckConstraint cc;
+  private transient TableConstraintsInfo tableConstraintsInfo;

   /** Constraint related flags
    * This is to track if constraints are retrieved from metastore or not */
-  private transient boolean isPKFetched=false;
-  private transient boolean isFKFetched=false;
-  private transient boolean isUniqueFetched=false;
-  private transient boolean isNotNullFetched=false;
-  private transient boolean isDefaultFetched=false;
-  private transient boolean isCheckFetched=false;
+  private transient boolean isTableConstraintsFetched=false;

Review comment:
   done

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 482443)
    Time Spent: 4h 10m  (was: 4h)

> Consolidate metastore call to fetch constraints
> -----------------------------------------------
>
>                 Key: HIVE-22782
>                 URL: https://issues.apache.org/jira/browse/HIVE-22782
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently, separate calls are made to the metastore to fetch constraints such as PK, FK, NOT NULL, etc. Since the planner always retrieves these constraints, we should retrieve all of them in one call.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=482446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482446 ]

ASF GitHub Bot logged work on HIVE-22782:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 12/Sep/20 07:56
        Start Date: 12/Sep/20 07:56
    Worklog Time Spent: 10m

Work Description: ashish-kumar-sharma commented on a change in pull request #1419:
URL: https://github.com/apache/hive/pull/1419#discussion_r487382720

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java

@@ -116,22 +116,12 @@
   private transient Boolean outdatedForRewritingMaterializedView;

   /** Constraint related objects */
-  private transient PrimaryKeyInfo pki;
-  private transient ForeignKeyInfo fki;
-  private transient UniqueConstraint uki;
-  private transient NotNullConstraint nnc;
-  private transient DefaultConstraint dc;
-  private transient CheckConstraint cc;
+  private transient TableConstraintsInfo tableConstraintsInfo;

   /** Constraint related flags
    * This is to track if constraints are retrieved from metastore or not */
-  private transient boolean isPKFetched=false;
-  private transient boolean isFKFetched=false;
-  private transient boolean isUniqueFetched=false;
-  private transient boolean isNotNullFetched=false;
-  private transient boolean isDefaultFetched=false;
-  private transient boolean isCheckFetched=false;
+  private transient boolean isTableConstraintsFetched=false;

Review comment:
   Since we have the wrapper in place, we no longer need an extra flag to track whether the object has been fetched; an if condition can check whether the wrapper is null.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 482446)
    Time Spent: 4.5h  (was: 4h 20m)

> Consolidate metastore call to fetch constraints
> -----------------------------------------------
>
>                 Key: HIVE-22782
>                 URL: https://issues.apache.org/jira/browse/HIVE-22782
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently, separate calls are made to the metastore to fetch constraints such as PK, FK, NOT NULL, etc. Since the planner always retrieves these constraints, we should retrieve all of them in one call.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
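The suggestion in the review comment above (drop the isTableConstraintsFetched flag and null-check the wrapper instead) can be sketched roughly as follows. This is a simplified stand-alone illustration, not the actual Hive Table class: TableConstraintsInfo here is a bare stand-in, and fetchFromMetastore() and the metastoreCalls counter are hypothetical names used only for the demo.

```java
// Sketch: lazy fetch guarded by a null check instead of a separate
// "isFetched" boolean flag. All names below are simplified stand-ins,
// not the real Hive classes.
public class LazyConstraintsDemo {

    // Stand-in for the real TableConstraintsInfo wrapper.
    static class TableConstraintsInfo {
        final String primaryKeyInfo;
        TableConstraintsInfo(String primaryKeyInfo) { this.primaryKeyInfo = primaryKeyInfo; }
    }

    // Counts simulated metastore round trips (for demonstration only).
    static int metastoreCalls = 0;

    // Null until first use -- the null check below plays the role of the
    // isTableConstraintsFetched flag in the diff.
    private TableConstraintsInfo tableConstraintsInfo;

    // Hypothetical stand-in for Hive.get().getReliableAndEnableTableConstraints(...).
    private TableConstraintsInfo fetchFromMetastore() {
        metastoreCalls++;
        return new TableConstraintsInfo("pk_demo");
    }

    TableConstraintsInfo getTableConstraintsInfo() {
        if (tableConstraintsInfo == null) {   // null check instead of a boolean flag
            tableConstraintsInfo = fetchFromMetastore();
        }
        return tableConstraintsInfo;
    }

    public static void main(String[] args) {
        LazyConstraintsDemo t = new LazyConstraintsDemo();
        t.getTableConstraintsInfo();
        t.getTableConstraintsInfo();          // second call reuses the cached wrapper
        System.out.println("metastore calls: " + metastoreCalls);  // prints "metastore calls: 1"
    }
}
```

One behavioural difference worth noting: with a plain null check, a fetch that fails (the real code logs and swallows HiveException) would be retried on the next getter call, whereas the boolean flag also suppressed retries after a failed attempt.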
[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=482447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482447 ]

ASF GitHub Bot logged work on HIVE-22782:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 12/Sep/20 07:57
        Start Date: 12/Sep/20 07:57
    Worklog Time Spent: 10m

Work Description: ashish-kumar-sharma commented on a change in pull request #1419:
URL: https://github.com/apache/hive/pull/1419#discussion_r487375471

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java

@@ -1176,145 +1166,101 @@ public void setStatsStateLikeNewTable() {
   * Note that set apis are used by DESCRIBE only, although get apis return RELY or ENABLE
   * constraints DESCRIBE could set all type of constraints
   * */
-
-  /* This only return PK which are created with RELY */
-  public PrimaryKeyInfo getPrimaryKeyInfo() {
-    if(!this.isPKFetched) {
+  public TableConstraintsInfo getTableConstraintsInfo() {
+    if (!this.isTableConstraintsFetched) {
       try {
-        pki = Hive.get().getReliablePrimaryKeys(this.getDbName(), this.getTableName());
-        this.isPKFetched = true;
+        tableConstraintsInfo = Hive.get().getReliableAndEnableTableConstraints(this.getDbName(), this.getTableName());
+        this.isTableConstraintsFetched = true;
       } catch (HiveException e) {
-        LOG.warn("Cannot retrieve PK info for table : " + this.getTableName()
-            + " ignoring exception: " + e);
+        LOG.warn(
+            "Cannot retrieve table constraints info for table : " + this.getTableName() + " ignoring exception: " + e);
       }
     }
-    return pki;
+    return tableConstraintsInfo;
   }

-  public void setPrimaryKeyInfo(PrimaryKeyInfo pki) {
-    this.pki = pki;
-    this.isPKFetched = true;
+  /**
+   * TableConstraintsInfo setter
+   * @param tableConstraintsInfo
+   */
+  public void setTableConstraintsInfo(TableConstraintsInfo tableConstraintsInfo) {
+    this.tableConstraintsInfo = tableConstraintsInfo;
+    this.isTableConstraintsFetched = true;
  }

-  /* This only return FK constraints which are created with RELY */
-  public ForeignKeyInfo getForeignKeyInfo() {
-    if(!isFKFetched) {
-      try {
-        fki = Hive.get().getReliableForeignKeys(this.getDbName(), this.getTableName());
-        this.isFKFetched = true;
-      } catch (HiveException e) {
-        LOG.warn(
-            "Cannot retrieve FK info for table : " + this.getTableName()
-            + " ignoring exception: " + e);
-      }
+  /**
+   * This only return PK which are created with RELY
+   * @return primary key constraint list
+   */
+  public PrimaryKeyInfo getPrimaryKeyInfo() {
+    if (!this.isTableConstraintsFetched) {
+      getTableConstraintsInfo();
     }
-    return fki;
+    return tableConstraintsInfo.getPrimaryKeyInfo();
   }

-  public void setForeignKeyInfo(ForeignKeyInfo fki) {
-    this.fki = fki;
-    this.isFKFetched = true;
+  /**
+   * This only return FK constraints which are created with RELY
+   * @return foreign key constraint list
+   */
+  public ForeignKeyInfo getForeignKeyInfo() {
+    if (!isTableConstraintsFetched) {
+      getTableConstraintsInfo();
+    }
+    return tableConstraintsInfo.getForeignKeyInfo();
   }

-  /* This only return UNIQUE constraint defined with RELY */
+  /**
+   * This only return UNIQUE constraint defined with RELY
+   * @return unique constraint list
+   */
   public UniqueConstraint getUniqueKeyInfo() {
-    if(!isUniqueFetched) {
-      try {
-        uki = Hive.get().getReliableUniqueConstraints(this.getDbName(), this.getTableName());
-        this.isUniqueFetched = true;
-      } catch (HiveException e) {
-        LOG.warn(
-            "Cannot retrieve Unique Key info for table : " + this.getTableName()
-            + " ignoring exception: " + e);
-      }
+    if (!isTableConstraintsFetched) {
+      getTableConstraintsInfo();
     }
-    return uki;
-  }
-
-  public void setUniqueKeyInfo(UniqueConstraint uki) {
-    this.uki = uki;
-    this.isUniqueFetched = true;
+    return tableConstraintsInfo.getUniqueConstraint();
   }

-  /* This only return NOT NULL constraint defined with RELY */
+  /**
+   * This only return NOT NULL constraint defined with RELY
+   * @return not null constraint list
+   */
   public NotNullConstraint getNotNullConstraint() {
-    if(!isNotNullFetched) {
-      try {
-        nnc = Hive.get().getReliableNotNullConstraints(this.getDbName(), this.getTableName());
-        this.isNotNullFetched = true;
-      } catch (HiveException e) {
-        LOG.warn("Cannot retrieve Not Null constraint info for table : "
-            + this.getTableName() + " ignoring exception: " + e);
-      }
+    if (!isTableConstraintsFetched) {
+      getTableConstraintsInfo();
     }
-    return nnc;
-  }
-
-  public void setNotNullCon
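The consolidation pattern in the diff above, where each per-constraint getter delegates to a single lazily fetched wrapper instead of issuing its own metastore call, can be sketched as follows. The classes here are simplified stand-ins for illustration only (no metastore, no exception handling), not the real Hive types.

```java
// Sketch of the consolidation: one fetch populates a wrapper holding all
// constraint types; per-constraint getters delegate to it. Every name below
// is a simplified stand-in, not the actual Hive code.
public class ConstraintsDemo {

    static class PrimaryKeyInfo { }
    static class ForeignKeyInfo { }

    // Stand-in for TableConstraintsInfo: holds every constraint type at once.
    static class TableConstraintsInfo {
        PrimaryKeyInfo getPrimaryKeyInfo() { return new PrimaryKeyInfo(); }
        ForeignKeyInfo getForeignKeyInfo() { return new ForeignKeyInfo(); }
    }

    // Counts simulated metastore round trips (for demonstration only).
    static int fetches = 0;

    private TableConstraintsInfo tableConstraintsInfo;
    private boolean isTableConstraintsFetched = false;

    TableConstraintsInfo getTableConstraintsInfo() {
        if (!isTableConstraintsFetched) {     // one round trip covers all constraint types
            fetches++;
            tableConstraintsInfo = new TableConstraintsInfo();
            isTableConstraintsFetched = true;
        }
        return tableConstraintsInfo;
    }

    // Per-constraint getters delegate instead of issuing separate calls.
    PrimaryKeyInfo getPrimaryKeyInfo() {
        return getTableConstraintsInfo().getPrimaryKeyInfo();
    }

    ForeignKeyInfo getForeignKeyInfo() {
        return getTableConstraintsInfo().getForeignKeyInfo();
    }

    public static void main(String[] args) {
        ConstraintsDemo t = new ConstraintsDemo();
        t.getPrimaryKeyInfo();
        t.getForeignKeyInfo();
        System.out.println("fetches: " + fetches);   // prints "fetches: 1"
    }
}
```

The design point of the patch is visible here: before the change, each getter would have triggered its own round trip; after it, any number of constraint lookups costs at most one fetch per Table instance.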
[jira] [Updated] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24138: -- Labels: pull-request-available (was: ) > Llap external client flow is broken due to netty shading > > > Key: HIVE-24138 > URL: https://issues.apache.org/jira/browse/HIVE-24138 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Ayush Saxena >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We shaded netty in hive-exec in - > https://issues.apache.org/jira/browse/HIVE-23073 > This breaks LLAP external client flow on LLAP daemon side - > LLAP daemon stacktrace - > {code} > 2020-09-09T18:22:13,413 INFO [TezTR-222977_4_0_0_0_0 > (497418324441977_0004_0_00_00_0)] llap.LlapOutputFormat: Returning > writer for: attempt_497418324441977_0004_0_00_00_0 > 2020-09-09T18:22:13,419 ERROR [TezTR-222977_4_0_0_0_0 > (497418324441977_0004_0_00_00_0)] tez.MapRecordSource: > java.lang.NoSuchMethodError: > org.apache.arrow.memory.BufferAllocator.buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf; > at > org.apache.hadoop.hive.llap.WritableByteChannelAdapter.write(WritableByteChannelAdapter.java:96) > at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:74) > at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:57) > at > org.apache.arrow.vector.ipc.WriteChannel.writeIntLittleEndian(WriteChannel.java:89) > at > org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:88) > at > org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:130) > at > org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:102) > at > org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:85) > at > org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:46) > at > 
org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:137) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Arrow method signature mismatch mainly happens due to the fact that arrow > contains some classes which are packaged under {{io.netty.buffer.*}} - > {code} > io.netty.buffer.ArrowBuf > io.netty.buffer.ExpandableByteBuf >
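The NoSuchMethodError in the stack trace above (BufferAllocator.buffer(I) resolving to org/apache/hive/io/netty/buffer/ArrowBuf) is the visible effect of a shading relocation along the lines of the fragment below. This is a hedged, illustrative maven-shade-plugin sketch, not the actual hive-exec pom.xml changed in HIVE-23073.

```xml
<!-- Illustrative relocation fragment (assumed shape, not the real pom).
     A rule like this rewrites both the bundled netty classes AND every
     bytecode reference to io.netty.* inside the shaded jar. Because Arrow
     ships classes packaged under io.netty.buffer (e.g.
     io.netty.buffer.ArrowBuf), shaded hive-exec code ends up expecting
     org.apache.hive.io.netty.buffer.ArrowBuf, while the unshaded Arrow jar
     on the daemon classpath still returns the original type: the method
     signatures no longer match at runtime. -->
<relocation>
  <pattern>io.netty</pattern>
  <shadedPattern>org.apache.hive.io.netty</shadedPattern>
</relocation>
```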
[jira] [Work logged] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?focusedWorklogId=482488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482488 ] ASF GitHub Bot logged work on HIVE-24138: - Author: ASF GitHub Bot Created on: 12/Sep/20 15:00 Start Date: 12/Sep/20 15:00 Worklog Time Spent: 10m Work Description: ayushtkn opened a new pull request #1491: URL: https://github.com/apache/hive/pull/1491 https://issues.apache.org/jira/browse/HIVE-24138 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482488) Remaining Estimate: 0h Time Spent: 10m > Llap external client flow is broken due to netty shading > > > Key: HIVE-24138 > URL: https://issues.apache.org/jira/browse/HIVE-24138 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Ayush Saxena >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > We shaded netty in hive-exec in - > https://issues.apache.org/jira/browse/HIVE-23073 > This breaks LLAP external client flow on LLAP daemon side - > LLAP daemon stacktrace - > {code} > 2020-09-09T18:22:13,413 INFO [TezTR-222977_4_0_0_0_0 > (497418324441977_0004_0_00_00_0)] llap.LlapOutputFormat: Returning > writer for: attempt_497418324441977_0004_0_00_00_0 > 2020-09-09T18:22:13,419 ERROR [TezTR-222977_4_0_0_0_0 > (497418324441977_0004_0_00_00_0)] tez.MapRecordSource: > java.lang.NoSuchMethodError: > org.apache.arrow.memory.BufferAllocator.buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf; > at > org.apache.hadoop.hive.llap.WritableByteChannelAdapter.write(WritableByteChannelAdapter.java:96) > at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:74) > at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:57) > at > 
org.apache.arrow.vector.ipc.WriteChannel.writeIntLittleEndian(WriteChannel.java:89) > at > org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:88) > at > org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:130) > at > org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:102) > at > org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:85) > at > org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:46) > at > org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:137) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) > at o
[jira] [Work logged] (HIVE-24151) MultiDelimitSerDe shifts data if strings contain non-ASCII characters
[ https://issues.apache.org/jira/browse/HIVE-24151?focusedWorklogId=482552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482552 ]

ASF GitHub Bot logged work on HIVE-24151:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 12/Sep/20 20:01
        Start Date: 12/Sep/20 20:01
    Worklog Time Spent: 10m

Work Description: szlta commented on pull request #1490:
URL: https://github.com/apache/hive/pull/1490#issuecomment-691122063

   Since this is a partial revert I'm placing the diff of LazyStruct.java for: before HIVE-22360 vs my current commit:

   szita@szita-MBP16:~/shadow/CDH/hive$ git diff 463dae9ee8f694002af492e7d05924423aeaed09:serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java 5de36f990d89fcd5c3d7d2344a28e16e4c1f8c24:serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java

diff --git a/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java b/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java
index f066aaa3bf5..66b15374dda 100644
--- a/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java
+++ b/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java
@@ -22,6 +22,8 @@
 import java.util.List;

 import com.google.common.primitives.Bytes;
+
+import org.apache.hadoop.hive.serde2.MultiDelimitSerDe;
 import org.apache.hadoop.hive.serde2.SerDeException;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -294,10 +296,10 @@ public void parseMultiDelimit(byte[] rawRow, byte[] fieldDelimit) {
     }
     // the indexes of the delimiters
     int[] delimitIndexes = findIndexes(rawRow, fieldDelimit);
-    int diff = fieldDelimit.length - 1;
+    int diff = fieldDelimit.length - MultiDelimitSerDe.REPLACEMENT_DELIM_LENGTH;
     // first field always starts from 0, even when missing
     startPosition[0] = 0;
-    for (int i = 1; i < fields.length; i++) {
+    for (int i = 1; i <= fields.length; i++) {
       if (delimitIndexes[i - 1] != -1) {
         int start = delimitIndexes[i - 1] + fieldDelimit.length;
         startPosition[i] = start - i * diff;
@@ -305,7 +307,6 @@ public void parseMultiDelimit(byte[] rawRow, byte[] fieldDelimit) {
         startPosition[i] = length + 1;
       }
     }
-    startPosition[fields.length] = length + 1;
     Arrays.fill(fieldInited, false);
     parsed = true;
   }
@@ -315,7 +316,7 @@ public void parseMultiDelimit(byte[] rawRow, byte[] fieldDelimit) {
     if (fields.length <= 1) {
       return new int[0];
     }
-    int[] indexes = new int[fields.length - 1];
+    int[] indexes = new int[fields.length];
     Arrays.fill(indexes, -1);
     indexes[0] = Bytes.indexOf(array, target);
     if (indexes[0] == -1) {

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 482552)
    Time Spent: 40m  (was: 0.5h)

> MultiDelimitSerDe shifts data if strings contain non-ASCII characters
> ---------------------------------------------------------------------
>
>                 Key: HIVE-24151
>                 URL: https://issues.apache.org/jira/browse/HIVE-24151
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVE-22360 intended to fix another MultiDelimitSerde problem (with NULL last columns) but introduced a regression: the approach of that fix was wrong, as the existing logic that operated on bytes was replaced by regex matcher logic that deals in character positions rather than byte positions. Since some non-ASCII characters consist of more than one byte, the whole record may get shifted as a result.
> With this ticket I'm going to restore the old logic and apply the proper fix on top of it, while keeping (and extending) the test cases added with HIVE-22360 so that we have a solution for both issues.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
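The root cause described above, regex matcher logic dealing in character positions while field extraction operates on byte positions, comes down to multi-byte UTF-8 characters. A minimal stand-alone illustration (not Hive code) of why the two offset systems diverge:

```java
import java.nio.charset.StandardCharsets;

// Demonstrates why character positions and byte positions diverge for
// non-ASCII data: a multi-byte UTF-8 character makes every byte offset
// after it larger than the corresponding character offset, which is how
// byte-based field extraction driven by character indexes shifts records.
public class ByteVsCharDemo {
    public static void main(String[] args) {
        String ascii = "abc|def";
        String nonAscii = "ábc|def";   // 'á' occupies two bytes in UTF-8

        // The delimiter sits at the same CHARACTER index in both strings...
        System.out.println(ascii.indexOf('|'));      // prints 3
        System.out.println(nonAscii.indexOf('|'));   // prints 3

        // ...but the encoded byte lengths differ, so the delimiter's BYTE
        // offset in the second string is 4, not 3.
        System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);     // prints 7
        System.out.println(nonAscii.getBytes(StandardCharsets.UTF_8).length);  // prints 8
    }
}
```

Using the character index 3 as a byte offset into the UTF-8 bytes of "ábc|def" would land one byte short of the delimiter, and every subsequent field would be read from the wrong position.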
[jira] [Work logged] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.
[ https://issues.apache.org/jira/browse/HIVE-23841?focusedWorklogId=482603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482603 ] ASF GitHub Bot logged work on HIVE-23841: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:05 Start Date: 12/Sep/20 20:05 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1248: URL: https://github.com/apache/hive/pull/1248#issuecomment-691367907 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482603) Time Spent: 40m (was: 0.5h) > Field writers is an HashSet, i.e., not thread-safe. Field writers is > typically protected by synchronization on lock, but not in 1 location. > > > Key: HIVE-23841 > URL: https://issues.apache.org/jira/browse/HIVE-23841 > Project: Hive > Issue Type: Bug > Environment: Any environment >Reporter: Adrian Nistor >Priority: Major > Labels: patch-available, pull-request-available > Attachments: HIVE-23841.patch > > Time Spent: 40m > Remaining Estimate: 0h > > I also submitted a pull request on github at: > > [https://github.com/apache/hive/pull/1248] > > (same patch) > h1. Description > > Field {{writers}} is a {{HashSet}} ([line > 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]), > i.e., not thread-safe. 
> Accesses to field {{writers}} are protected by synchronization on {{lock}}, > e.g., at lines: > [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144], > > [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213], > and > [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215]. > However, the {{writers.remove()}} at [line > 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249] > is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}. > Synchronizing on 2 different objects does not ensure mutual exclusion. This > is because 2 threads synchronizing on different objects can still execute in > parallel at the same time. > Note that lines > [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215] > and > [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249] > are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. > h1. The Code for This Fix > This fix is very simple: just change {{synchronized (INSTANCE)}} to > {{synchronized (lock)}}, just like the methods containing the other lines > listed above.[] -- This message was sent by Atlassian Jira (v8.3.4#803005)
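The fix described above can be illustrated with a minimal sketch: every path that touches the non-thread-safe HashSet must synchronize on the same monitor object. The class below is a simplified stand-in for LlapOutputFormatService, not the actual Hive code.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the fix: all access to the non-thread-safe HashSet goes through
// the SAME monitor ("lock"). Synchronizing one code path on a different
// object (e.g. a singleton INSTANCE, as in the bug) does not exclude threads
// holding "lock", because two threads synchronizing on different objects can
// run in parallel. Stand-in class, not the real LlapOutputFormatService.
public class WritersRegistry {
    private final Object lock = new Object();
    private final Set<String> writers = new HashSet<>();  // not thread-safe by itself

    public void register(String id) {
        synchronized (lock) {             // same monitor...
            writers.add(id);
        }
    }

    public void unregister(String id) {
        synchronized (lock) {             // ...on every path, including removal
            writers.remove(id);           // (the buggy path synchronized elsewhere)
        }
    }

    public int size() {
        synchronized (lock) {
            return writers.size();
        }
    }

    public static void main(String[] args) {
        WritersRegistry r = new WritersRegistry();
        r.register("writer-1");
        r.register("writer-2");
        r.unregister("writer-1");
        System.out.println(r.size());     // prints 1
    }
}
```

Mutual exclusion is a property of the monitor, not of the data structure: the patch is one line precisely because only the monitor in the removal path needed to change.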
[jira] [Work logged] (HIVE-24146) Cleanup TaskExecutionException in GenericUDTFExplode
[ https://issues.apache.org/jira/browse/HIVE-24146?focusedWorklogId=482615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482615 ] ASF GitHub Bot logged work on HIVE-24146: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:06 Start Date: 12/Sep/20 20:06 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1483: URL: https://github.com/apache/hive/pull/1483 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482615) Time Spent: 20m (was: 10m) > Cleanup TaskExecutionException in GenericUDTFExplode > > > Key: HIVE-24146 > URL: https://issues.apache.org/jira/browse/HIVE-24146 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > - Remove TaskExecutionException, which may be not used anymore; > - Remove the default handling in GenericUDTFExplode#process, which has been > verified during the function initializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=482607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482607 ] ASF GitHub Bot logged work on HIVE-24143: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:06 Start Date: 12/Sep/20 20:06 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1482: URL: https://github.com/apache/hive/pull/1482 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482607) Time Spent: 1h 20m (was: 1h 10m) > Include convention in JDBC converter operator in Calcite plan > - > > Key: HIVE-24143 > URL: https://issues.apache.org/jira/browse/HIVE-24143 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 4.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Among others, it will be useful to debug the dialect being chosen for query > generation. For instance: > {code} > HiveProject(jdbc_type_conversion_table1.ikey=[$0], > jdbc_type_conversion_table1.bkey=[$1], jdbc_type_conversion_table1.fkey=[$2], > jdbc_type_conversion_table1.dkey=[$3], > jdbc_type_conversion_table1.chkey=[$4], > jdbc_type_conversion_table1.dekey=[$5], > jdbc_type_conversion_table1.dtkey=[$6], jdbc_type_conversion_table1.tkey=[$7]) > HiveProject(ikey=[$0], bkey=[$1], fkey=[$2], dkey=[$3], chkey=[$4], > dekey=[$5], dtkey=[$6], tkey=[$7]) > ->HiveJdbcConverter(convention=[JDBC.DERBY]) > JdbcHiveTableScan(table=[[default, jdbc_type_conversion_table1]], > table:alias=[jdbc_type_conversion_table1]) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=482621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482621 ]

ASF GitHub Bot logged work on HIVE-24143:
- Author: ASF GitHub Bot
- Created on: 12/Sep/20 20:07
- Start Date: 12/Sep/20 20:07
- Worklog Time Spent: 10m

Work Description: vineetgarg02 commented on pull request #1482:
URL: https://github.com/apache/hive/pull/1482#issuecomment-691326259

+1, looks good to me

Issue Time Tracking
-------------------

Worklog Id: (was: 482621)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (HIVE-24147) Table column names are not extracted correctly in Hive JDBC storage handler
[ https://issues.apache.org/jira/browse/HIVE-24147?focusedWorklogId=482637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482637 ]

ASF GitHub Bot logged work on HIVE-24147:
- Author: ASF GitHub Bot
- Created on: 12/Sep/20 20:08
- Start Date: 12/Sep/20 20:08
- Worklog Time Spent: 10m

Work Description: jcamachor opened a new pull request #1486:
URL: https://github.com/apache/hive/pull/1486
…BC storage handler

### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

Issue Time Tracking
-------------------

Worklog Id: (was: 482637)
Time Spent: 20m (was: 10m)

> Table column names are not extracted correctly in Hive JDBC storage handler
> ---------------------------------------------------------------------------
>
>                Key: HIVE-24147
>                URL: https://issues.apache.org/jira/browse/HIVE-24147
>            Project: Hive
>         Issue Type: Bug
>         Components: JDBC storage handler
>   Affects Versions: 4.0.0
>           Reporter: Jesus Camacho Rodriguez
>           Assignee: Jesus Camacho Rodriguez
>           Priority: Major
>             Labels: pull-request-available
>         Time Spent: 20m
> Remaining Estimate: 0h
>
> It seems the {{ResultSetMetaData}} for the query used to retrieve the table column names contains fully qualified names, instead of possibly supporting the {{getTableName}} method. This ends up throwing the storage handler off and leading to exceptions, in both the CBO path and the non-CBO path.
[jira] [Work logged] (HIVE-24150) Refactor CommitTxnRequest field order
[ https://issues.apache.org/jira/browse/HIVE-24150?focusedWorklogId=482672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482672 ]

ASF GitHub Bot logged work on HIVE-24150:
- Author: ASF GitHub Bot
- Created on: 12/Sep/20 20:11
- Start Date: 12/Sep/20 20:11
- Worklog Time Spent: 10m

Work Description: deniskuzZ opened a new pull request #1489:
URL: https://github.com/apache/hive/pull/1489

### What changes were proposed in this pull request?
Refactor CommitTxnRequest field order (keyValue and exclWriteEnabled).

### Why are the changes needed?
HIVE-24125 introduced a backward-incompatible change.

### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

Issue Time Tracking
-------------------

Worklog Id: (was: 482672)
Time Spent: 20m (was: 10m)

> Refactor CommitTxnRequest field order
> -------------------------------------
>
>                Key: HIVE-24150
>                URL: https://issues.apache.org/jira/browse/HIVE-24150
>            Project: Hive
>         Issue Type: Bug
>           Reporter: Denys Kuzmenko
>           Assignee: Denys Kuzmenko
>           Priority: Major
>             Labels: pull-request-available
>         Time Spent: 20m
> Remaining Estimate: 0h
>
> Refactor CommitTxnRequest field order (keyValue and exclWriteEnabled).
> HIVE-24125 introduced a backward-incompatible change.
[jira] [Work logged] (HIVE-24145) Fix preemption issues in reducers and file sink operators
[ https://issues.apache.org/jira/browse/HIVE-24145?focusedWorklogId=482691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482691 ]

ASF GitHub Bot logged work on HIVE-24145:
- Author: ASF GitHub Bot
- Created on: 12/Sep/20 20:12
- Start Date: 12/Sep/20 20:12
- Worklog Time Spent: 10m

Work Description: rbalamohan commented on a change in pull request #1485:
URL: https://github.com/apache/hive/pull/1485#discussion_r486786544

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
## @@ -216,29 +216,47 @@ public FSPaths(Path specPath, boolean isMmTable, boolean isDirectInsert, boolean

{code}
     public void closeWriters(boolean abort) throws HiveException {
+      Exception exception = null;
       for (int idx = 0; idx < outWriters.length; idx++) {
         if (outWriters[idx] != null) {
           try {
             outWriters[idx].close(abort);
             updateProgress();
           } catch (IOException e) {
-            throw new HiveException(e);
+            exception = e;
+            LOG.error("Error closing " + outWriters[idx].toString(), e);
+            // continue closing others
           }
         }
       }
-      try {
+      for (int i = 0; i < updaters.length; i++) {
+        if (updaters[i] != null) {
+          SerDeStats stats = updaters[i].getStats();
+          // Ignore 0 row files except in case of insert overwrite
+          if (isDirectInsert && (stats.getRowCount() > 0 || isInsertOverwrite)) {
+            outPathsCommitted[i] = updaters[i].getUpdatedFilePath();
+          }
+          try {
+            updaters[i].close(abort);
+          } catch (IOException e) {
+            exception = e;
+            LOG.error("Error closing " + updaters[i].toString(), e);
+            // continue closing others
+          }
+        }
+      }
+      // Made an attempt to close all writers.
+      if (exception != null) {
         for (int i = 0; i < updaters.length; i++) {
           if (updaters[i] != null) {
-            SerDeStats stats = updaters[i].getStats();
-            // Ignore 0 row files except in case of insert overwrite
-            if (isDirectInsert && (stats.getRowCount() > 0 || isInsertOverwrite)) {
-              outPathsCommitted[i] = updaters[i].getUpdatedFilePath();
+            try {
+              fs.delete(updaters[i].getUpdatedFilePath(), true);
+            } catch (IOException e) {
+              e.printStackTrace();
{code}

Review comment: LOG?

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
## @@ -284,6 +285,11 @@ public Object process(Node nd, Stack stack, NodeProcessorCtx procCtx,

{code}
     // Create ReduceSink operator
     ReduceSinkOperator rsOp = getReduceSinkOp(partitionPositions, sortPositions, sortOrder, sortNullOrder,
         allRSCols, bucketColumns, numBuckets, fsParent, fsOp.getConf().getWriteType());
+    // We have to make sure not to reorder the child operators, as doing so might cause weird behavior in
+    // tasks at the same level. When auto stats gather runs at the same level as another operation, it might
+    // cause unnecessary preemption. Maintain the order here to avoid such preemption and possible errors.
{code}

Review comment: Please add TEZ-3296 as a reference if possible.

Issue Time Tracking
-------------------

Worklog Id: (was: 482691)
Time Spent: 40m (was: 0.5h)

> Fix preemption issues in reducers and file sink operators
> ----------------------------------------------------------
>
>                Key: HIVE-24145
>                URL: https://issues.apache.org/jira/browse/HIVE-24145
>            Project: Hive
>         Issue Type: Bug
>           Reporter: Ramesh Kumar Thangarajan
>           Assignee: Ramesh Kumar Thangarajan
>           Priority: Major
>             Labels: pull-request-available
>         Time Spent: 40m
> Remaining Estimate: 0h
>
> There are two issues caused by preemption:
> # Reducers are getting reordered as part of optimizations, because of which more preemption happens.
> # Preemption in the middle of writing can cause the file not to be closed, leading to errors when the file is read later.
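The closeWriters() change reviewed above follows a common "close everything, fail late" pattern: every writer gets a close attempt even when an earlier close throws, and the first failure is surfaced only after the loop finishes. A minimal standalone sketch of that pattern (class and method names here are illustrative, not Hive's actual code):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CloseAll {

  // Attempts to close every writer; remembers the first failure instead of
  // aborting the loop, so later writers still get their close() call.
  public static IOException closeQuietly(List<Closeable> writers) {
    IOException first = null;
    for (Closeable w : writers) {
      if (w == null) {
        continue; // slot was never opened
      }
      try {
        w.close();
      } catch (IOException e) {
        if (first == null) {
          first = e; // keep the first failure, continue closing the rest
        }
      }
    }
    return first; // caller decides whether to wrap and rethrow
  }

  public static void main(String[] args) {
    List<Closeable> writers = new ArrayList<>();
    final boolean[] closed = {false};
    writers.add(() -> { throw new IOException("boom"); }); // first close fails
    writers.add(() -> closed[0] = true);                    // second must still run
    IOException first = closeQuietly(writers);
    System.out.println(first.getMessage()); // prints "boom"
    System.out.println(closed[0]);          // prints "true": later writers still closed
  }
}
```

Deferring the rethrow is what lets the operator clean up partially written files afterwards, as the patch does before propagating the exception.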
[jira] [Work logged] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?focusedWorklogId=482687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482687 ]

ASF GitHub Bot logged work on HIVE-24138:
- Author: ASF GitHub Bot
- Created on: 12/Sep/20 20:12
- Start Date: 12/Sep/20 20:12
- Worklog Time Spent: 10m

Work Description: ayushtkn opened a new pull request #1491:
URL: https://github.com/apache/hive/pull/1491
https://issues.apache.org/jira/browse/HIVE-24138

Issue Time Tracking
-------------------

Worklog Id: (was: 482687)
Time Spent: 20m (was: 10m)

> Llap external client flow is broken due to netty shading
> ---------------------------------------------------------
>
>                Key: HIVE-24138
>                URL: https://issues.apache.org/jira/browse/HIVE-24138
>            Project: Hive
>         Issue Type: Bug
>         Components: llap
>           Reporter: Shubham Chaurasia
>           Assignee: Ayush Saxena
>           Priority: Critical
>             Labels: pull-request-available
>         Time Spent: 20m
> Remaining Estimate: 0h
>
> We shaded netty in hive-exec in https://issues.apache.org/jira/browse/HIVE-23073
> This breaks the LLAP external client flow on the LLAP daemon side.
>
> LLAP daemon stacktrace:
> {code}
> 2020-09-09T18:22:13,413 INFO [TezTR-222977_4_0_0_0_0 (497418324441977_0004_0_00_00_0)] llap.LlapOutputFormat: Returning writer for: attempt_497418324441977_0004_0_00_00_0
> 2020-09-09T18:22:13,419 ERROR [TezTR-222977_4_0_0_0_0 (497418324441977_0004_0_00_00_0)] tez.MapRecordSource: java.lang.NoSuchMethodError: org.apache.arrow.memory.BufferAllocator.buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf;
>   at org.apache.hadoop.hive.llap.WritableByteChannelAdapter.write(WritableByteChannelAdapter.java:96)
>   at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:74)
>   at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:57)
>   at org.apache.arrow.vector.ipc.WriteChannel.writeIntLittleEndian(WriteChannel.java:89)
>   at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:88)
>   at org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:130)
>   at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:102)
>   at org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:85)
>   at org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:46)
>   at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:137)
>   at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
>   at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>   at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:172)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:3
> {code}
[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=482700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482700 ]

ASF GitHub Bot logged work on HIVE-22782:
- Author: ASF GitHub Bot
- Created on: 12/Sep/20 20:13
- Start Date: 12/Sep/20 20:13
- Worklog Time Spent: 10m

Work Description: sankarh commented on a change in pull request #1419:
URL: https://github.com/apache/hive/pull/1419#discussion_r486978810

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
## @@ -1155,22 +1150,19 @@ void dumpConstraintMetadata(String dbName, String tblName, Path dbRoot, Hive hiv

{code}
   Path constraintsRoot = new Path(dbRoot, ReplUtils.CONSTRAINTS_ROOT_DIR_NAME);
   Path commonConstraintsFile = new Path(constraintsRoot, ConstraintFileType.COMMON.getPrefix() + tblName);
   Path fkConstraintsFile = new Path(constraintsRoot, ConstraintFileType.FOREIGNKEY.getPrefix() + tblName);
-  List pks = hiveDb.getPrimaryKeyList(dbName, tblName);
-  List fks = hiveDb.getForeignKeyList(dbName, tblName);
-  List uks = hiveDb.getUniqueConstraintList(dbName, tblName);
-  List nns = hiveDb.getNotNullConstraintList(dbName, tblName);
-  if ((pks != null && !pks.isEmpty()) || (uks != null && !uks.isEmpty())
-      || (nns != null && !nns.isEmpty())) {
+  SQLAllTableConstraints tableConstraints = hiveDb.getTableConstraints(dbName, tblName);
+  if ((tableConstraints.getPrimaryKeys() != null && !tableConstraints.getPrimaryKeys().isEmpty()) || (tableConstraints.getUniqueConstraints() != null && !tableConstraints.getUniqueConstraints().isEmpty())
{code}

Review comment: Can add a utility method to check for null and empty of a given list; it is used multiple times. Also use local variables to reduce the code.

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
## @@ -5661,184 +5663,79 @@ public void dropConstraint(String dbName, String tableName, String constraintNam

{code}
-  public List getDefaultConstraintList(String dbName, String tblName) throws HiveException, NoSuchObjectException {
+  public SQLAllTableConstraints getTableConstraints(String dbName, String tblName) throws HiveException, NoSuchObjectException {
     try {
-      return getMSC().getDefaultConstraints(new DefaultConstraintsRequest(getDefaultCatalog(conf), dbName, tblName));
+      AllTableConstraintsRequest tableConstraintsRequest = new AllTableConstraintsRequest();
+      tableConstraintsRequest.setDbName(dbName);
+      tableConstraintsRequest.setTblName(tblName);
+      tableConstraintsRequest.setCatName(getDefaultCatalog(conf));
+      return getMSC().getAllTableConstraints(tableConstraintsRequest);
     } catch (NoSuchObjectException e) {
       throw e;
     } catch (Exception e) {
       throw new HiveException(e);
     }
   }
-
-  public List getCheckConstraintList(String dbName, String tblName) throws HiveException, NoSuchObjectException {
-    try {
-      return getMSC().getCheckConstraints(new CheckConstraintsRequest(getDefaultCatalog(conf),
-          dbName, tblName));
-    } catch (NoSuchObjectException e) {
-      throw e;
-    } catch (Exception e) {
-      throw new HiveException(e);
-    }
+  public TableConstraintsInfo getAllTableConstraints(String dbName, String tblName) throws HiveException {
+    return getTableConstraints(dbName, tblName, false, false);
   }
-
-  /**
-   * Get all primary key columns associated with the table.
-   *
-   * @param dbName Database Name
-   * @param tblName Table Name
-   * @return Primary Key associated with the table.
-   * @throws HiveException
-   */
-  public PrimaryKeyInfo getPrimaryKeys(String dbName, String tblName) throws HiveException {
-    return getPrimaryKeys(dbName, tblName, false);
+  public TableConstraintsInfo getReliableAndEnableTableConstraints(String dbName, String tblName) throws HiveException {
+    return getTableConstraints(dbName, tblName, true, true);
   }
-
-  /**
-   * Get primary key columns associated with the table that are available for optimization.
-   *
-   * @param dbName Database Name
-   * @param tblName Table Name
-   * @return Primary Key associated with the table.
-   * @throws HiveException
-   */
-  public PrimaryKeyInfo getReliablePrimaryKeys(String dbName, String tblName) throws HiveException {
-    return getPrimaryKeys(dbName, tblName, true);
-  }
-
-  private PrimaryKeyInfo getPrimaryKeys(String dbName, String tblName, boolean onlyReliable)
+  private TableConstraintsInfo getTableConstraints(String dbName, String tblName, boolean reliable, boolean enable)
{code}

Review comment: nit: Use "fetchReliable" and "fetchEnabled" instead of "reliable" and "enable", as the latter sound like flags that enable something.

## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
## @@ -116,22 +
[jira] [Work logged] (HIVE-24151) MultiDelimitSerDe shifts data if strings contain non-ASCII characters
[ https://issues.apache.org/jira/browse/HIVE-24151?focusedWorklogId=482711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482711 ]

ASF GitHub Bot logged work on HIVE-24151:
- Author: ASF GitHub Bot
- Created on: 12/Sep/20 20:14
- Start Date: 12/Sep/20 20:14
- Worklog Time Spent: 10m

Work Description: szlta opened a new pull request #1490:
URL: https://github.com/apache/hive/pull/1490

Issue Time Tracking
-------------------

Worklog Id: (was: 482711)
Time Spent: 50m (was: 40m)

> MultiDelimitSerDe shifts data if strings contain non-ASCII characters
> ----------------------------------------------------------------------
>
>                Key: HIVE-24151
>                URL: https://issues.apache.org/jira/browse/HIVE-24151
>            Project: Hive
>         Issue Type: Bug
>           Reporter: Ádám Szita
>           Assignee: Ádám Szita
>           Priority: Major
>             Labels: pull-request-available
>         Time Spent: 50m
> Remaining Estimate: 0h
>
> HIVE-22360 intended to fix another MultiDelimitSerDe problem (with NULL last columns) but introduced a regression: the approach of that fix was wrong, because the existing logic that operated on bytes was replaced by regex matcher logic, which deals in character positions rather than byte positions. Since some non-ASCII characters consist of more than one byte, the whole record may get shifted as a result.
>
> With this ticket I'm going to restore the old logic and apply the proper fix on top of it, while keeping (and extending) the test cases added with HIVE-22360, so that we have a solution for both issues.
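The regression described in HIVE-24151 comes from mixing two coordinate systems: a regex matcher reports character offsets, while the serde splits records at byte offsets, and the two agree only for pure-ASCII data. A small standalone illustration (not Hive code):

```java
import java.nio.charset.StandardCharsets;

public class ByteVsChar {
  public static void main(String[] args) {
    String ascii = "abc";
    String accented = "\u00E1bc"; // "ábc": 'á' encodes to 2 bytes in UTF-8

    // For ASCII, character count and UTF-8 byte count match.
    System.out.println(ascii.length());                                   // prints 3
    System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);    // prints 3

    // For non-ASCII, they diverge: using a character offset as a byte
    // offset would split the record one byte too early.
    System.out.println(accented.length());                                // prints 3
    System.out.println(accented.getBytes(StandardCharsets.UTF_8).length); // prints 4
  }
}
```

This is why the fix restores byte-level parsing instead of trying to translate regex character positions into byte positions.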
[jira] [Work logged] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns
[ https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=482731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482731 ]

ASF GitHub Bot logged work on HIVE-24084:
- Author: ASF GitHub Bot
- Created on: 12/Sep/20 20:15
- Start Date: 12/Sep/20 20:15
- Worklog Time Spent: 10m

Work Description: kgyrtkirk commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r487096103

## File path: ql/src/test/queries/clientpositive/tpch18.q
## @@ -0,0 +1,133 @@

{code}
--! qt:dataset:tpch_0_001.customer
--! qt:dataset:tpch_0_001.lineitem
--! qt:dataset:tpch_0_001.nation
--! qt:dataset:tpch_0_001.orders
--! qt:dataset:tpch_0_001.part
--! qt:dataset:tpch_0_001.partsupp
--! qt:dataset:tpch_0_001.region
--! qt:dataset:tpch_0_001.supplier

use tpch_0_001;

set hive.transpose.aggr.join=true;
set hive.transpose.aggr.join.unique=true;
set hive.mapred.mode=nonstrict;

create view q18_tmp_cached as
select
  l_orderkey,
  sum(l_quantity) as t_sum_quantity
from
  lineitem
where
  l_orderkey is not null
group by
  l_orderkey;

explain cbo select
  c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from
  customer, orders, q18_tmp_cached t, lineitem l
where
  c_custkey = o_custkey
  and o_orderkey = t.l_orderkey
  and o_orderkey is not null
  and t.t_sum_quantity > 300
  and o_orderkey = l.l_orderkey
  and l.l_orderkey is not null
group by
  c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
order by
  o_totalprice desc, o_orderdate
limit 100;

select 'add constraints';

alter table orders add constraint pk_o primary key (o_orderkey) disable novalidate rely;
alter table customer add constraint pk_c primary key (c_custkey) disable novalidate rely;
{code}

Review comment: I've added both constraints; it only removed the IS NOT NULL filter. It seems to me that one of the sum() calls is used as an output and the other is used to filter by > 300, so both of them are being "used".

## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
## @@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) {

{code}
+  /**
+   * Determines whether the given grouping is unique.
+   *
+   * Consider a join which might produce non-unique rows, whose results are later aggregated again.
+   * This method determines if there are sufficient columns in the grouping which were previously unique column(s).
+   */
+  private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) {
+    if (groups.isEmpty()) {
+      return false;
+    }
+    RelMetadataQuery mq = input.getCluster().getMetadataQuery();
+    Set uKeys = mq.getUniqueKeys(input);
+    for (ImmutableBitSet u : uKeys) {
+      if (groups.contains(u)) {
+        return true;
+      }
+    }
+    if (input instanceof Join) {
+      Join join = (Join) input;
+      RexBuilder rexBuilder = input.getCluster().getRexBuilder();
+      SimpleConditionInfo cond = new SimpleConditionInfo(join.getCondition(), rexBuilder);
+
+      if (cond.valid) {
+        ImmutableBitSet newGroup = groups.intersect(ImmutableBitSet.fromBitSet(cond.fields));
+        RelNode l = join.getLeft();
+        RelNode r = join.getRight();
+
+        int joinFieldCount = join.getRowType().getFieldCount();
+        int lFieldCount = l.getRowType().getFieldCount();
+
+        ImmutableBitSet groupL = newGroup.get(0, lFieldCount);
+        ImmutableBitSet groupR = newGroup.get(lFieldCount, joinFieldCount).shift(-lFieldCount);
+
+        if (isGroupingUnique(l, groupL)) {
{code}

Review comment: That could be done, and I'm sure it was true in this case, but this logic works better if it can walk down through as many joins as it can: we might have an aggregate on top and a bunch of joins under it, so I feel it will be beneficial to retain it. I was tempted to write a RelMd handler, but I don't think I could easily introduce a new one, and RelShuttle doesn't look like a good match either, so I'll leave it as a set of instanceof calls for now. I'll upload a new patch to see whether digging deeper into the tree could do more or not.

Issue Time Tracking
-------------------

Worklog Id: (was: 482731)
Time Spent: 3h 10m (was: 3h)

> Push Aggregates thru joins in case it re-groups previously unique columns
> -
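The containment test at the heart of isGroupingUnique() above ("the grouping is unique if it covers some known unique key") can be sketched with plain java.util.BitSet standing in for Calcite's ImmutableBitSet. This is a simplified, hypothetical stand-in for the rule's check, not the actual Calcite/Hive code:

```java
import java.util.BitSet;
import java.util.List;

public class GroupingUnique {

  // A grouping is unique if it covers every column of at least one known
  // unique key (Calcite expresses this as uniqueKey ⊆ grouping).
  static boolean isGroupingUnique(List<BitSet> uniqueKeys, BitSet group) {
    for (BitSet key : uniqueKeys) {
      BitSet uncovered = (BitSet) key.clone();
      uncovered.andNot(group);   // key columns NOT present in the grouping
      if (uncovered.isEmpty()) {
        return true;             // key fully covered -> grouping is unique
      }
    }
    return false;
  }

  public static void main(String[] args) {
    BitSet pk = new BitSet();
    pk.set(0);                   // pretend column 0 is a primary key

    BitSet group = new BitSet();
    group.set(0);                // grouping on {0, 3} covers the key
    group.set(3);

    System.out.println(isGroupingUnique(List.of(pk), group)); // prints true
  }
}
```

Regrouping such a set cannot merge rows, which is what makes pushing the Aggregate through the Join safe.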
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=482740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482740 ]

ASF GitHub Bot logged work on HIVE-24143:
- Author: ASF GitHub Bot
- Created on: 12/Sep/20 20:16
- Start Date: 12/Sep/20 20:16
- Worklog Time Spent: 10m

Work Description: jcamachor commented on a change in pull request #1482:
URL: https://github.com/apache/hive/pull/1482#discussion_r487299089

## File path: ql/src/test/queries/clientpositive/external_jdbc_table2.q
## @@ -7,43 +6,43 @@ CREATE TEMPORARY FUNCTION dboutput AS 'org.apache.hadoop.hive.contrib.genericudf

{code}
 FROM src
 SELECT
-dboutput ('jdbc:derby:;databaseName=${system:test.tmp.dir}/test_derby_auth1;create=true','user1','passwd1',
+dboutput ('jdbc:derby:;databaseName=${system:test.tmp.dir}/test_derby_auth1_1;create=true','user1','passwd1',
{code}

Review comment: I changed it to avoid a clash with another test in the temp directory, which I believe was causing HIVE-23910.

## File path: ql/src/test/results/clientpositive/llap/external_jdbc_table_perf.q.out
## @@ -6115,32 +6112,39 @@ WHERE "d_year" = 1999 AND "d_moy" BETWEEN 1 AND 3 AND "d_date_sk" IS NOT NULL) A

{code}
       Reduce Operator Tree:
         Merge Join Operator
           condition map:
-            Anti Join 0 to 1
+            Left Outer Join 0 to 1
{code}

Review comment: This change is not actually related to this patch. Note that the test was disabled by default (HIVE-23910); it seems a preliminary version of HIVE-23716 changed these q files when it should not have.

Issue Time Tracking
-------------------

Worklog Id: (was: 482740)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=482765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482765 ] ASF GitHub Bot logged work on HIVE-22782: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:18 Start Date: 12/Sep/20 20:18 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma commented on a change in pull request #1419: URL: https://github.com/apache/hive/pull/1419#discussion_r487366456 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -1155,22 +1150,19 @@ void dumpConstraintMetadata(String dbName, String tblName, Path dbRoot, Hive hiv Path constraintsRoot = new Path(dbRoot, ReplUtils.CONSTRAINTS_ROOT_DIR_NAME); Path commonConstraintsFile = new Path(constraintsRoot, ConstraintFileType.COMMON.getPrefix() + tblName); Path fkConstraintsFile = new Path(constraintsRoot, ConstraintFileType.FOREIGNKEY.getPrefix() + tblName); - List pks = hiveDb.getPrimaryKeyList(dbName, tblName); - List fks = hiveDb.getForeignKeyList(dbName, tblName); - List uks = hiveDb.getUniqueConstraintList(dbName, tblName); - List nns = hiveDb.getNotNullConstraintList(dbName, tblName); - if ((pks != null && !pks.isEmpty()) || (uks != null && !uks.isEmpty()) - || (nns != null && !nns.isEmpty())) { + SQLAllTableConstraints tableConstraints = hiveDb.getTableConstraints(dbName,tblName); Review comment: Done ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -1155,22 +1150,19 @@ void dumpConstraintMetadata(String dbName, String tblName, Path dbRoot, Hive hiv Path constraintsRoot = new Path(dbRoot, ReplUtils.CONSTRAINTS_ROOT_DIR_NAME); Path commonConstraintsFile = new Path(constraintsRoot, ConstraintFileType.COMMON.getPrefix() + tblName); Path fkConstraintsFile = new Path(constraintsRoot, ConstraintFileType.FOREIGNKEY.getPrefix() + tblName); - List pks = hiveDb.getPrimaryKeyList(dbName, tblName); - List fks = hiveDb.getForeignKeyList(dbName, tblName); - List uks = 
hiveDb.getUniqueConstraintList(dbName, tblName); - List nns = hiveDb.getNotNullConstraintList(dbName, tblName); - if ((pks != null && !pks.isEmpty()) || (uks != null && !uks.isEmpty()) - || (nns != null && !nns.isEmpty())) { + SQLAllTableConstraints tableConstraints = hiveDb.getTableConstraints(dbName,tblName); + if ((tableConstraints.getPrimaryKeys() != null && !tableConstraints.getPrimaryKeys().isEmpty()) || (tableConstraints.getUniqueConstraints() != null && !tableConstraints.getUniqueConstraints().isEmpty()) Review comment: Yes, code is very redundant. I have replaced it with CollectionsUtils.isNotEmpty() which does the same check i.e not null and isEmpty() ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -5661,184 +5663,79 @@ public void dropConstraint(String dbName, String tableName, String constraintNam } } - public List getDefaultConstraintList(String dbName, String tblName) throws HiveException, NoSuchObjectException { + public SQLAllTableConstraints getTableConstraints(String dbName, String tblName) throws HiveException, NoSuchObjectException { try { - return getMSC().getDefaultConstraints(new DefaultConstraintsRequest(getDefaultCatalog(conf), dbName, tblName)); + AllTableConstraintsRequest tableConstraintsRequest = new AllTableConstraintsRequest(); + tableConstraintsRequest.setDbName(dbName); + tableConstraintsRequest.setTblName(tblName); + tableConstraintsRequest.setCatName(getDefaultCatalog(conf)); + return getMSC().getAllTableConstraints(tableConstraintsRequest); } catch (NoSuchObjectException e) { throw e; } catch (Exception e) { throw new HiveException(e); } } - - public List getCheckConstraintList(String dbName, String tblName) throws HiveException, NoSuchObjectException { -try { - return getMSC().getCheckConstraints(new CheckConstraintsRequest(getDefaultCatalog(conf), - dbName, tblName)); -} catch (NoSuchObjectException e) { - throw e; -} catch (Exception e) { - throw new HiveException(e); -} + public 
TableConstraintsInfo getAllTableConstraints(String dbName, String tblName) throws HiveException { +return getTableConstraints(dbName, tblName, false, false); } - /** - * Get all primary key columns associated with the table. - * - * @param dbName Database Name - * @param tblName Table Name - * @return Primary Key associated with the table. - * @throws HiveException - */ - public PrimaryKeyInfo getPrimaryKeys(String dbName, String tblName) throws HiveException { -return getPrimaryKeys(dbName, tblName, false); + public TableConstraintsInfo ge
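The review thread above replaces the repeated `x != null && !x.isEmpty()` guards with Commons Collections' `CollectionUtils.isNotEmpty()`. A dependency-free sketch of the equivalent check (the constraint names below are illustrative, not Hive's actual call sites):

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class ConstraintCheck {
    // Null-safe emptiness check equivalent to
    // org.apache.commons.collections4.CollectionUtils.isNotEmpty(...),
    // replacing the repeated "c != null && !c.isEmpty()" pattern in the diff above.
    static boolean isNotEmpty(Collection<?> c) {
        return c != null && !c.isEmpty();
    }

    public static void main(String[] args) {
        List<String> primaryKeys = null;
        System.out.println(isNotEmpty(primaryKeys));             // false
        System.out.println(isNotEmpty(Collections.emptyList())); // false
        System.out.println(isNotEmpty(List.of("pk_o")));         // true
    }
}
```

With a helper like this, the long condition in `dumpConstraintMetadata` collapses to one call per constraint type instead of two checks each.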
[jira] [Work logged] (HIVE-24145) Fix preemption issues in reducers and file sink operators
[ https://issues.apache.org/jira/browse/HIVE-24145?focusedWorklogId=482783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482783 ] ASF GitHub Bot logged work on HIVE-24145: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:21 Start Date: 12/Sep/20 20:21 Worklog Time Spent: 10m Work Description: ramesh0201 opened a new pull request #1485: URL: https://github.com/apache/hive/pull/1485 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482783) Time Spent: 50m (was: 40m) > Fix preemption issues in reducers and file sink operators > - > > Key: HIVE-24145 > URL: https://issues.apache.org/jira/browse/HIVE-24145 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > There are two issues because of preemption: > # Reducers are getting reordered as part of optimizations because of which > more preemption happen > # Preemption in the middle of writing can cause the file to not close and > lead to errors when we read the file later -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24151) MultiDelimitSerDe shifts data if strings contain non-ASCII characters
[ https://issues.apache.org/jira/browse/HIVE-24151?focusedWorklogId=482791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482791 ] ASF GitHub Bot logged work on HIVE-24151: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:22 Start Date: 12/Sep/20 20:22 Worklog Time Spent: 10m Work Description: szlta edited a comment on pull request #1490: URL: https://github.com/apache/hive/pull/1490#issuecomment-691122063 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482791) Time Spent: 1h (was: 50m) > MultiDelimitSerDe shifts data if strings contain non-ASCII characters > - > > Key: HIVE-24151 > URL: https://issues.apache.org/jira/browse/HIVE-24151 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > HIVE-22360 intended to fix another MultiDelimitSerde problem (with NULL last > columns) but introduced a regression: the approach of the fix is pretty much > all wrong, as the existing logic that operated on bytes got replaced by regex > matcher logic which deals in character positions, rather than byte positions. > As some non ASCII characters consist of more than 1 byte, the whole record > may get shifted due to this. > With this ticket I'm going to restore the old logic, and apply the proper fix > on that, but keeping (and extending) the test cases added with HIVE-22360 so > that we have a solution for both issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
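The byte-versus-character mismatch described in HIVE-24151 can be reproduced in a few lines (illustrative, not the SerDe's actual code): a regex matcher reports character positions, while the record layout is defined in bytes, and the two diverge as soon as a multi-byte UTF-8 character precedes the delimiter.

```java
import java.nio.charset.StandardCharsets;

public class DelimiterOffsets {
    // Character index of the first delimiter, as char-based regex logic sees it.
    static int charIndexOf(String row, char delim) {
        return row.indexOf(delim);
    }

    // Byte offset of the same delimiter in UTF-8, as byte-based SerDe logic sees it.
    static int byteIndexOf(String row, char delim) {
        return row.substring(0, row.indexOf(delim))
                  .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String row = "Ádám|Szita"; // '|' is the field delimiter
        System.out.println(charIndexOf(row, '|')); // 4
        System.out.println(byteIndexOf(row, '|')); // 6: 'Á' and 'á' are two bytes each
    }
}
```

Using the character index (4) to slice the byte buffer would cut the field two bytes short, which is exactly the record shift the ticket describes.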
[jira] [Work logged] (HIVE-24149) HiveStreamingConnection doesn't close HMS connection
[ https://issues.apache.org/jira/browse/HIVE-24149?focusedWorklogId=482789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482789 ] ASF GitHub Bot logged work on HIVE-24149: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:22 Start Date: 12/Sep/20 20:22 Worklog Time Spent: 10m Work Description: zeroflag opened a new pull request #1488: URL: https://github.com/apache/hive/pull/1488 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482789) Time Spent: 20m (was: 10m) > HiveStreamingConnection doesn't close HMS connection > > > Key: HIVE-24149 > URL: https://issues.apache.org/jira/browse/HIVE-24149 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > There are 3 HMS connections used by HiveStreamingConnection: one for TX, one for > heartbeat and one for notifications. The close method only closes the first 2, > leaving the last one open, which eventually overloads HMS and makes it > unresponsive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
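The leak pattern in HIVE-24149 generalizes to any object owning several closeable clients: close() must release every one of them, collecting failures rather than stopping at the first. The sketch below is illustrative only — the field names are hypothetical, not Hive's actual internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: a connection owning three metastore clients, where close()
// releases all of them, including the notification client that the ticket
// says was previously left open.
public class StreamingConnectionSketch implements AutoCloseable {
    private final List<AutoCloseable> metastoreClients = new ArrayList<>();

    public StreamingConnectionSketch(AutoCloseable txnClient,
                                     AutoCloseable heartbeatClient,
                                     AutoCloseable notificationClient) {
        metastoreClients.add(txnClient);
        metastoreClients.add(heartbeatClient);
        metastoreClients.add(notificationClient); // the one left open before the fix
    }

    @Override
    public void close() throws Exception {
        Exception first = null;
        for (AutoCloseable client : metastoreClients) {
            try {
                client.close();
            } catch (Exception e) {
                // Keep closing the remaining clients even if one fails.
                if (first == null) { first = e; } else { first.addSuppressed(e); }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```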
[jira] [Work logged] (HIVE-24035) Add Jenkinsfile for branch-2.3
[ https://issues.apache.org/jira/browse/HIVE-24035?focusedWorklogId=482804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482804 ] ASF GitHub Bot logged work on HIVE-24035: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:23 Start Date: 12/Sep/20 20:23 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1398: URL: https://github.com/apache/hive/pull/1398#issuecomment-691243264 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482804) Time Spent: 2h 40m (was: 2.5h) > Add Jenkinsfile for branch-2.3 > -- > > Key: HIVE-24035 > URL: https://issues.apache.org/jira/browse/HIVE-24035 > Project: Hive > Issue Type: Test >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > To enable precommit tests for github PR, we need to have a Jenkinsfile in the > repo. This is already done for master and branch-2. This adds the same for > branch-2.3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=482822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482822 ] ASF GitHub Bot logged work on HIVE-24143: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:24 Start Date: 12/Sep/20 20:24 Worklog Time Spent: 10m Work Description: vineetgarg02 commented on a change in pull request #1482: URL: https://github.com/apache/hive/pull/1482#discussion_r487292490 ## File path: ql/src/test/results/clientpositive/llap/external_jdbc_table_perf.q.out ## @@ -6115,32 +6112,39 @@ WHERE "d_year" = 1999 AND "d_moy" BETWEEN 1 AND 3 AND "d_date_sk" IS NOT NULL) A Reduce Operator Tree: Merge Join Operator condition map: - Anti Join 0 to 1 + Left Outer Join 0 to 1 Review comment: Also not sure what caused this change in plan. ## File path: ql/src/test/queries/clientpositive/external_jdbc_table2.q ## @@ -7,43 +6,43 @@ CREATE TEMPORARY FUNCTION dboutput AS 'org.apache.hadoop.hive.contrib.genericudf FROM src SELECT -dboutput ('jdbc:derby:;databaseName=${system:test.tmp.dir}/test_derby_auth1;create=true','user1','passwd1', +dboutput ('jdbc:derby:;databaseName=${system:test.tmp.dir}/test_derby_auth1_1;create=true','user1','passwd1', Review comment: Curious as to why this changed? ## File path: ql/src/test/results/clientpositive/llap/external_jdbc_table_perf.q.out ## @@ -6115,32 +6112,39 @@ WHERE "d_year" = 1999 AND "d_moy" BETWEEN 1 AND 3 AND "d_date_sk" IS NOT NULL) A Reduce Operator Tree: Merge Join Operator condition map: - Anti Join 0 to 1 + Left Outer Join 0 to 1 Review comment: Got it, make sense This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482822) Time Spent: 1h 50m (was: 1h 40m) > Include convention in JDBC converter operator in Calcite plan > - > > Key: HIVE-24143 > URL: https://issues.apache.org/jira/browse/HIVE-24143 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 4.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Among others, it will be useful to debug the dialect being chosen for query > generation. 
For instance: > {code} > HiveProject(jdbc_type_conversion_table1.ikey=[$0], > jdbc_type_conversion_table1.bkey=[$1], jdbc_type_conversion_table1.fkey=[$2], > jdbc_type_conversion_table1.dkey=[$3], > jdbc_type_conversion_table1.chkey=[$4], > jdbc_type_conversion_table1.dekey=[$5], > jdbc_type_conversion_table1.dtkey=[$6], jdbc_type_conversion_table1.tkey=[$7]) > HiveProject(ikey=[$0], bkey=[$1], fkey=[$2], dkey=[$3], chkey=[$4], > dekey=[$5], dtkey=[$6], tkey=[$7]) > ->HiveJdbc
[jira] [Work logged] (HIVE-24127) Dump events from default catalog only
[ https://issues.apache.org/jira/browse/HIVE-24127?focusedWorklogId=482836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482836 ] ASF GitHub Bot logged work on HIVE-24127: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:25 Start Date: 12/Sep/20 20:25 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1478: URL: https://github.com/apache/hive/pull/1478#discussion_r486826515 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/messaging/event/filters/CatalogFilter.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.metastore.messaging.event.filters; + +import org.apache.hadoop.hive.metastore.api.NotificationEvent; + +/** + * Utility function that constructs a notification filter to match a given catalog name. 
+ */ +public class CatalogFilter extends BasicFilter { + private final String catalogName; + + public CatalogFilter(final String catalogName) { +this.catalogName = catalogName; + } + + @Override + boolean shouldAccept(final NotificationEvent event) { +if (catalogName == null || event.getCatName() == null || catalogName.equalsIgnoreCase(event.getCatName())) { Review comment: catalogName should never be null. If not configured also, it must be default which is "hive". I think we should let it fail if filter is initialized with catalogName being null . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482836) Time Spent: 0.5h (was: 20m) > Dump events from default catalog only > - > > Key: HIVE-24127 > URL: https://issues.apache.org/jira/browse/HIVE-24127 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-avai
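The reviewer's suggestion — fail fast if the filter is constructed with a null catalog name instead of letting the null check silently accept every event — can be sketched as below. This is illustrative, not the merged code: the event is reduced to its catalog-name string, and the class name differs from the PR's.

```java
import java.util.Objects;

public class CatalogNameFilter {
    private final String catalogName;

    public CatalogNameFilter(String catalogName) {
        // Fail fast: the catalog name must always be configured
        // (the default catalog is "hive"), so null is a caller bug.
        this.catalogName = Objects.requireNonNull(catalogName,
                "catalogName must be set; the default catalog is \"hive\"");
    }

    public boolean shouldAccept(String eventCatalogName) {
        // Events with no catalog recorded are still accepted, matching the
        // quoted condition's event.getCatName() == null branch.
        return eventCatalogName == null || catalogName.equalsIgnoreCase(eventCatalogName);
    }
}
```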
[jira] [Work logged] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
[ https://issues.apache.org/jira/browse/HIVE-24144?focusedWorklogId=482877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482877 ] ASF GitHub Bot logged work on HIVE-24144: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:28 Start Date: 12/Sep/20 20:28 Worklog Time Spent: 10m Work Description: jcamachor opened a new pull request #1487: URL: https://github.com/apache/hive/pull/1487 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482877) Remaining Estimate: 0h Time Spent: 10m > getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value > > > Key: HIVE-24144 > URL: https://issues.apache.org/jira/browse/HIVE-24144 > Project: Hive > Issue Type: Bug > Components: JDBC, JDBC storage handler >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code} > public String getIdentifierQuoteString() throws SQLException { > return " "; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
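Per the JDBC contract, `DatabaseMetaData.getIdentifierQuoteString()` returns the quoting string, or a single space if identifier quoting is not supported. Hive quotes identifiers with backticks, so returning `" "` misleads any caller that generates SQL from the metadata. The helper below is an illustrative demonstration of the consequence, not Hive's code:

```java
public class IdentifierQuoting {
    // Mimics what a SQL generator does with the metadata value: a single
    // space means "quoting unsupported", so the identifier is emitted bare.
    static String quote(String identifier, String quoteString) {
        if (" ".equals(quoteString)) {
            return identifier; // reserved words like "select" would break here
        }
        return quoteString + identifier + quoteString;
    }

    public static void main(String[] args) {
        System.out.println(quote("select", " ")); // select  (unquoted, unsafe)
        System.out.println(quote("select", "`")); // `select`
    }
}
```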
[jira] [Work logged] (HIVE-24097) correct NPE exception in HiveMetastoreAuthorizer
[ https://issues.apache.org/jira/browse/HIVE-24097?focusedWorklogId=482879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482879 ] ASF GitHub Bot logged work on HIVE-24097: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:29 Start Date: 12/Sep/20 20:29 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #1448: URL: https://github.com/apache/hive/pull/1448#issuecomment-691254659 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482879) Time Spent: 0.5h (was: 20m) > correct NPE exception in HiveMetastoreAuthorizer > > > Key: HIVE-24097 > URL: https://issues.apache.org/jira/browse/HIVE-24097 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: Sam An >Assignee: Sam An >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > In some testing, we found it's possible to have NPE if the preEventType does > not fall within the several the HMS currently checks. This makes the > AuthzContext a null pointer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns
[ https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=482911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482911 ] ASF GitHub Bot logged work on HIVE-24084: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:32 Start Date: 12/Sep/20 20:32 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1439: URL: https://github.com/apache/hive/pull/1439#discussion_r487130820 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java ## @@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) { } } + /** + * Determines weather the give grouping is unique. + * + * Consider a join which might produce non-unique rows; but later the results are aggregated again. + * This method determines if there are sufficient columns in the grouping which have been present previously as unique column(s). + */ + private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) { +if (groups.isEmpty()) { + return false; +} +RelMetadataQuery mq = input.getCluster().getMetadataQuery(); +Set uKeys = mq.getUniqueKeys(input); +for (ImmutableBitSet u : uKeys) { + if (groups.contains(u)) { +return true; + } +} +if (input instanceof Join) { + Join join = (Join) input; + RexBuilder rexBuilder = input.getCluster().getRexBuilder(); + SimpleConditionInfo cond = new SimpleConditionInfo(join.getCondition(), rexBuilder); + + if (cond.valid) { +ImmutableBitSet newGroup = groups.intersect(ImmutableBitSet.fromBitSet(cond.fields)); +RelNode l = join.getLeft(); +RelNode r = join.getRight(); + +int joinFieldCount = join.getRowType().getFieldCount(); +int lFieldCount = l.getRowType().getFieldCount(); + +ImmutableBitSet groupL = newGroup.get(0, lFieldCount); +ImmutableBitSet groupR = newGroup.get(lFieldCount, joinFieldCount).shift(-lFieldCount); + +if (isGroupingUnique(l, groupL)) { Review comment: OK. 
If it turns out there are many changes and it may need some time to be fixed, feel free to defer to follow-up JIRA and let's merge this one. ## File path: ql/src/test/queries/clientpositive/tpch18.q ## @@ -0,0 +1,133 @@ +--! qt:dataset:tpch_0_001.customer +--! qt:dataset:tpch_0_001.lineitem +--! qt:dataset:tpch_0_001.nation +--! qt:dataset:tpch_0_001.orders +--! qt:dataset:tpch_0_001.part +--! qt:dataset:tpch_0_001.partsupp +--! qt:dataset:tpch_0_001.region +--! qt:dataset:tpch_0_001.supplier + + +use tpch_0_001; + +set hive.transpose.aggr.join=true; +set hive.transpose.aggr.join.unique=true; +set hive.mapred.mode=nonstrict; + +create view q18_tmp_cached as +select + l_orderkey, + sum(l_quantity) as t_sum_quantity +from + lineitem +where + l_orderkey is not null +group by + l_orderkey; + + + +explain cbo select +c_name, +c_custkey, +o_orderkey, +o_orderdate, +o_totalprice, +sum(l_quantity) +from + customer, + orders, + q18_tmp_cached t, + lineitem l +where +c_custkey = o_custkey +and o_orderkey = t.l_orderkey +and o_orderkey is not null +and t.t_sum_quantity > 300 +and o_orderkey = l.l_orderkey +and l.l_orderkey is not null +group by +c_name, +c_custkey, +o_orderkey, +o_orderdate, +o_totalprice +order by +o_totalprice desc, +o_orderdate +limit 100; + + + +select 'add constraints'; + +alter table orders add constraint pk_o primary key (o_orderkey) disable novalidate rely; +alter table customer add constraint pk_c primary key (c_custkey) disable novalidate rely; + Review comment: Thanks @kgyrtkirk . This seems to need further exploration, we thought https://issues.apache.org/jira/browse/HIVE-24087 was going to help here. @vineetgarg02 , could you take a look at this once this patch is merged? Maybe the shape of the plan is slightly different to the one we anticipated. 
## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java ## @@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) { } } + /** + * Determines weather the give grouping is unique. + * + * Consider a join which might produce non-unique rows; but later the results are aggregated again. + * This method determines if there are sufficient columns in the grouping which have been present previously as unique column(s). + */ + private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) { +if (groups.isEmpty()) { + return false; +} +RelMetadataQuery mq = input.getCluster().getMetadataQuery(); +Set uKeys = mq.getUniqueKeys(input); +for (ImmutableBitSet u : uKeys) { + if (groups.contains(u)) { +return true; + } +} +if (inpu
[jira] [Work logged] (HIVE-24022) Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer
[ https://issues.apache.org/jira/browse/HIVE-24022?focusedWorklogId=482924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482924 ] ASF GitHub Bot logged work on HIVE-24022: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:33 Start Date: 12/Sep/20 20:33 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #1385: URL: https://github.com/apache/hive/pull/1385#issuecomment-691213984 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482924) Time Spent: 50m (was: 40m) > Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer > -- > > Key: HIVE-24022 > URL: https://issues.apache.org/jira/browse/HIVE-24022 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Sam An >Priority: Minor > Labels: performance, pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > For a table with 3000+ partitions, analyze table takes a lot longer time as > HiveMetaStoreAuthorizer tries to create HiveConf for every partition request. > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L319] > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L447] -- This message was sent by Atlassian Jira (v8.3.4#803005)
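The optimization described in HIVE-24022 — stop re-creating an expensive configuration object for every partition-level request — boils down to building it once and reusing it. A generic sketch under that assumption (the class and names are hypothetical, not Hive's authorizer code):

```java
import java.util.function.Supplier;

// Lazily builds the expensive value once and caches it; subsequent calls
// reuse the cached instance. Double-checked locking with a volatile field
// keeps the fast path lock-free after initialization.
public class CachedFactory<T> {
    private final Supplier<T> expensiveFactory;
    private volatile T cached;

    public CachedFactory(Supplier<T> expensiveFactory) {
        this.expensiveFactory = expensiveFactory;
    }

    public T get() {
        T local = cached;
        if (local == null) {
            synchronized (this) {
                if (cached == null) {
                    cached = expensiveFactory.get(); // runs at most once
                }
                local = cached;
            }
        }
        return local;
    }
}
```

With 3000+ partitions, this turns 3000+ constructions into one, which is the effect the ticket is after.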
[jira] [Work logged] (HIVE-22290) ObjectStore.cleanWriteNotificationEvents and ObjectStore.cleanupEvents OutOfMemory on large number of pending events
[ https://issues.apache.org/jira/browse/HIVE-22290?focusedWorklogId=482929&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482929 ] ASF GitHub Bot logged work on HIVE-22290: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:34 Start Date: 12/Sep/20 20:34 Worklog Time Spent: 10m Work Description: nareshpr opened a new pull request #1484: URL: https://github.com/apache/hive/pull/1484 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482929) Time Spent: 20m (was: 10m) > ObjectStore.cleanWriteNotificationEvents and ObjectStore.cleanupEvents > OutOfMemory on large number of pending events > > > Key: HIVE-22290 > URL: https://issues.apache.org/jira/browse/HIVE-22290 > Project: Hive > Issue Type: Bug > Components: HCatalog, repl >Affects Versions: 4.0.0 >Reporter: Thomas Prelle >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > As in [https://jira.apache.org/jira/browse/HIVE-19430] if there are large > number of events that haven't been cleaned up for some reason, then > ObjectStore.cleanWriteNotificationEvents() and ObjectStore.cleanupEvents can > run out of memory while it loads all the events to be deleted. > It should fetch events in batches. -- This message was sent by Atlassian Jira (v8.3.4#803005)
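The fix described in HIVE-22290 is the standard batching pattern: instead of materializing every pending event before deleting (risking OutOfMemory), walk the ids in fixed-size batches. The sketch below only shows the slicing; the per-batch delete is a stand-in for a bounded metastore DELETE, and the names are hypothetical, not ObjectStore's API.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedEventCleaner {
    // Splits the event ids into consecutive batches of at most batchSize,
    // so each delete round-trip touches a bounded amount of memory.
    static List<List<Long>> toBatches(List<Long> eventIds, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int from = 0; from < eventIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, eventIds.size());
            batches.add(new ArrayList<>(eventIds.subList(from, to)));
        }
        return batches;
    }
}
```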
[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks
[ https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=482961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-482961 ] ASF GitHub Bot logged work on HIVE-23413: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:38 Start Date: 12/Sep/20 20:38 Worklog Time Spent: 10m Work Description: pvary commented on pull request #1220: URL: https://github.com/apache/hive/pull/1220#issuecomment-690931518 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 482961) Time Spent: 1h 40m (was: 1.5h) > Create a new config to skip all locks > - > > Key: HIVE-23413 > URL: https://issues.apache.org/jira/browse/HIVE-23413 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > From time-to-time some query is blocked on locks which should not. > To have a quick workaround for this we should have a config which the user > can set in the session to disable acquiring/checking locks, so we can provide > it immediately and then later investigate and fix the root cause. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks
[ https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=483002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483002 ] ASF GitHub Bot logged work on HIVE-23413: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:45 Start Date: 12/Sep/20 20:45 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1220: URL: https://github.com/apache/hive/pull/1220#issuecomment-690929320 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483002) Time Spent: 1h 50m (was: 1h 40m) > Create a new config to skip all locks > - > > Key: HIVE-23413 > URL: https://issues.apache.org/jira/browse/HIVE-23413 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > From time-to-time some query is blocked on locks which should not. > To have a quick workaround for this we should have a config which the user > can set in the session to disable acquiring/checking locks, so we can provide > it immediately and then later investigate and fix the root cause. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24151) MultiDelimitSerDe shifts data if strings contain non-ASCII characters
[ https://issues.apache.org/jira/browse/HIVE-24151?focusedWorklogId=483017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483017 ] ASF GitHub Bot logged work on HIVE-24151: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:47 Start Date: 12/Sep/20 20:47 Worklog Time Spent: 10m Work Description: szlta commented on pull request #1490: URL: https://github.com/apache/hive/pull/1490#issuecomment-691122063 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483017) Time Spent: 1h 10m (was: 1h) > MultiDelimitSerDe shifts data if strings contain non-ASCII characters > - > > Key: HIVE-24151 > URL: https://issues.apache.org/jira/browse/HIVE-24151 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > HIVE-22360 intended to fix another MultiDelimitSerde problem (with NULL last > columns) but introduced a regression: the approach of the fix is pretty much > all wrong, as the existing logic that operated on bytes got replaced by regex > matcher logic which deals in character positions, rather than byte positions. > As some non ASCII characters consist of more than 1 byte, the whole record > may get shifted due to this. > With this ticket I'm going to restore the old logic, and apply the proper fix > on that, but keeping (and extending) the test cases added with HIVE-22360 so > that we have a solution for both issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.
[ https://issues.apache.org/jira/browse/HIVE-23841?focusedWorklogId=483063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483063 ] ASF GitHub Bot logged work on HIVE-23841: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:51 Start Date: 12/Sep/20 20:51 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1248: URL: https://github.com/apache/hive/pull/1248#issuecomment-691367907 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483063) Time Spent: 50m (was: 40m) > Field writers is an HashSet, i.e., not thread-safe. Field writers is > typically protected by synchronization on lock, but not in 1 location. > > > Key: HIVE-23841 > URL: https://issues.apache.org/jira/browse/HIVE-23841 > Project: Hive > Issue Type: Bug > Environment: Any environment >Reporter: Adrian Nistor >Priority: Major > Labels: patch-available, pull-request-available > Attachments: HIVE-23841.patch > > Time Spent: 50m > Remaining Estimate: 0h > > I also submitted a pull request on github at: > > [https://github.com/apache/hive/pull/1248] > > (same patch) > h1. Description > > Field {{writers}} is a {{HashSet}} ([line > 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]), > i.e., not thread-safe. 
> Accesses to field {{writers}} are protected by synchronization on {{lock}}, > e.g., at lines: > [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144], > > [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213], > and > [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215]. > However, the {{writers.remove()}} at [line > 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249] > is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}. > Synchronizing on 2 different objects does not ensure mutual exclusion. This > is because 2 threads synchronizing on different objects can still execute in > parallel at the same time. > Note that lines > [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215] > and > [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249] > are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. > h1. The Code for This Fix > This fix is very simple: just change {{synchronized (INSTANCE)}} to > {{synchronized (lock)}}, just like the methods containing the other lines > listed above.[] -- This message was sent by Atlassian Jira (v8.3.4#803005)
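The fix described in HIVE-23841 is a standard pattern: every access to a shared collection must synchronize on one and the same monitor, because two threads synchronizing on different objects can still run in parallel. A minimal sketch of that pattern (names are ours, not Hive's LlapOutputFormatService fields):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the corrected locking discipline: all reads and writes of the
// shared HashSet go through synchronized (lock), never a second monitor.
final class WriterRegistry {
    private final Object lock = new Object();
    private final Set<String> writers = new HashSet<>();

    void add(String id) {
        synchronized (lock) {     // same monitor as every other access
            writers.add(id);
        }
    }

    boolean remove(String id) {
        // Before the fix, the removal path synchronized on a different
        // object (the service INSTANCE), so it could interleave with add().
        synchronized (lock) {
            return writers.remove(id);
        }
    }

    int size() {
        synchronized (lock) {
            return writers.size();
        }
    }
}
```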
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=483067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483067 ] ASF GitHub Bot logged work on HIVE-24143: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:52 Start Date: 12/Sep/20 20:52 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1482: URL: https://github.com/apache/hive/pull/1482 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483067) Time Spent: 2h (was: 1h 50m) > Include convention in JDBC converter operator in Calcite plan > - > > Key: HIVE-24143 > URL: https://issues.apache.org/jira/browse/HIVE-24143 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 4.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Among others, it will be useful to debug the dialect being chosen for query > generation. For instance: > {code} > HiveProject(jdbc_type_conversion_table1.ikey=[$0], > jdbc_type_conversion_table1.bkey=[$1], jdbc_type_conversion_table1.fkey=[$2], > jdbc_type_conversion_table1.dkey=[$3], > jdbc_type_conversion_table1.chkey=[$4], > jdbc_type_conversion_table1.dekey=[$5], > jdbc_type_conversion_table1.dtkey=[$6], jdbc_type_conversion_table1.tkey=[$7]) > HiveProject(ikey=[$0], bkey=[$1], fkey=[$2], dkey=[$3], chkey=[$4], > dekey=[$5], dtkey=[$6], tkey=[$7]) > ->HiveJdbcConverter(convention=[JDBC.DERBY]) > JdbcHiveTableScan(table=[[default, jdbc_type_conversion_table1]], > table:alias=[jdbc_type_conversion_table1]) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24146) Cleanup TaskExecutionException in GenericUDTFExplode
[ https://issues.apache.org/jira/browse/HIVE-24146?focusedWorklogId=483075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483075 ] ASF GitHub Bot logged work on HIVE-24146: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:52 Start Date: 12/Sep/20 20:52 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1483: URL: https://github.com/apache/hive/pull/1483 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483075) Time Spent: 0.5h (was: 20m) > Cleanup TaskExecutionException in GenericUDTFExplode > > > Key: HIVE-24146 > URL: https://issues.apache.org/jira/browse/HIVE-24146 > Project: Hive > Issue Type: Improvement > Components: UDF >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > - Remove TaskExecutionException, which may be not used anymore; > - Remove the default handling in GenericUDTFExplode#process, which has been > verified during the function initializing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=483081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483081 ] ASF GitHub Bot logged work on HIVE-24143: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:53 Start Date: 12/Sep/20 20:53 Worklog Time Spent: 10m Work Description: vineetgarg02 commented on pull request #1482: URL: https://github.com/apache/hive/pull/1482#issuecomment-691326259 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483081) Time Spent: 2h 10m (was: 2h) > Include convention in JDBC converter operator in Calcite plan > - > > Key: HIVE-24143 > URL: https://issues.apache.org/jira/browse/HIVE-24143 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 4.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Among others, it will be useful to debug the dialect being chosen for query > generation. For instance: > {code} > HiveProject(jdbc_type_conversion_table1.ikey=[$0], > jdbc_type_conversion_table1.bkey=[$1], jdbc_type_conversion_table1.fkey=[$2], > jdbc_type_conversion_table1.dkey=[$3], > jdbc_type_conversion_table1.chkey=[$4], > jdbc_type_conversion_table1.dekey=[$5], > jdbc_type_conversion_table1.dtkey=[$6], jdbc_type_conversion_table1.tkey=[$7]) > HiveProject(ikey=[$0], bkey=[$1], fkey=[$2], dkey=[$3], chkey=[$4], > dekey=[$5], dtkey=[$6], tkey=[$7]) > ->HiveJdbcConverter(convention=[JDBC.DERBY]) > JdbcHiveTableScan(table=[[default, jdbc_type_conversion_table1]], > table:alias=[jdbc_type_conversion_table1]) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24147) Table column names are not extracted correctly in Hive JDBC storage handler
[ https://issues.apache.org/jira/browse/HIVE-24147?focusedWorklogId=483096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483096 ] ASF GitHub Bot logged work on HIVE-24147: - Author: ASF GitHub Bot Created on: 12/Sep/20 20:56 Start Date: 12/Sep/20 20:56 Worklog Time Spent: 10m Work Description: jcamachor opened a new pull request #1486: URL: https://github.com/apache/hive/pull/1486 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483096) Time Spent: 0.5h (was: 20m) > Table column names are not extracted correctly in Hive JDBC storage handler > --- > > Key: HIVE-24147 > URL: https://issues.apache.org/jira/browse/HIVE-24147 > Project: Hive > Issue Type: Bug > Components: JDBC storage handler >Affects Versions: 4.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > It seems the `ResultSetMetaData` for the query used to retrieve the table > columns names contains fully qualified names, instead of possibly supporting > the {{getTableName}} method. This ends up throwing the storage handler off > and leading to exceptions, both in CBO path and non-CBO path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24151) MultiDelimitSerDe shifts data if strings contain non-ASCII characters
[ https://issues.apache.org/jira/browse/HIVE-24151?focusedWorklogId=483125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483125 ] ASF GitHub Bot logged work on HIVE-24151: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:15 Start Date: 12/Sep/20 21:15 Worklog Time Spent: 10m Work Description: szlta opened a new pull request #1490: URL: https://github.com/apache/hive/pull/1490 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483125) Time Spent: 1h 20m (was: 1h 10m) > MultiDelimitSerDe shifts data if strings contain non-ASCII characters > - > > Key: HIVE-24151 > URL: https://issues.apache.org/jira/browse/HIVE-24151 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > HIVE-22360 intended to fix another MultiDelimitSerde problem (with NULL last > columns) but introduced a regression: the approach of the fix is pretty much > all wrong, as the existing logic that operated on bytes got replaced by regex > matcher logic which deals in character positions, rather than byte positions. > As some non ASCII characters consist of more than 1 byte, the whole record > may get shifted due to this. > With this ticket I'm going to restore the old logic, and apply the proper fix > on that, but keeping (and extending) the test cases added with HIVE-22360 so > that we have a solution for both issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns
[ https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=483143&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483143 ] ASF GitHub Bot logged work on HIVE-24084: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:20 Start Date: 12/Sep/20 21:20 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1439: URL: https://github.com/apache/hive/pull/1439#discussion_r487096103 ## File path: ql/src/test/queries/clientpositive/tpch18.q ## @@ -0,0 +1,133 @@ +--! qt:dataset:tpch_0_001.customer +--! qt:dataset:tpch_0_001.lineitem +--! qt:dataset:tpch_0_001.nation +--! qt:dataset:tpch_0_001.orders +--! qt:dataset:tpch_0_001.part +--! qt:dataset:tpch_0_001.partsupp +--! qt:dataset:tpch_0_001.region +--! qt:dataset:tpch_0_001.supplier + + +use tpch_0_001; + +set hive.transpose.aggr.join=true; +set hive.transpose.aggr.join.unique=true; +set hive.mapred.mode=nonstrict; + +create view q18_tmp_cached as +select + l_orderkey, + sum(l_quantity) as t_sum_quantity +from + lineitem +where + l_orderkey is not null +group by + l_orderkey; + + + +explain cbo select +c_name, +c_custkey, +o_orderkey, +o_orderdate, +o_totalprice, +sum(l_quantity) +from + customer, + orders, + q18_tmp_cached t, + lineitem l +where +c_custkey = o_custkey +and o_orderkey = t.l_orderkey +and o_orderkey is not null +and t.t_sum_quantity > 300 +and o_orderkey = l.l_orderkey +and l.l_orderkey is not null +group by +c_name, +c_custkey, +o_orderkey, +o_orderdate, +o_totalprice +order by +o_totalprice desc, +o_orderdate +limit 100; + + + +select 'add constraints'; + +alter table orders add constraint pk_o primary key (o_orderkey) disable novalidate rely; +alter table customer add constraint pk_c primary key (c_custkey) disable novalidate rely; + Review comment: I've added both constraints - it only removed the IS NOT NULL filter it seems to me that 1 of the sum() is used as an output and the other is being used to filter by >300 - so both of 
them is being "used" ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java ## @@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) { } } + /** + * Determines weather the give grouping is unique. + * + * Consider a join which might produce non-unique rows; but later the results are aggregated again. + * This method determines if there are sufficient columns in the grouping which have been present previously as unique column(s). + */ + private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) { +if (groups.isEmpty()) { + return false; +} +RelMetadataQuery mq = input.getCluster().getMetadataQuery(); +Set uKeys = mq.getUniqueKeys(input); +for (ImmutableBitSet u : uKeys) { + if (groups.contains(u)) { +return true; + } +} +if (input instanceof Join) { + Join join = (Join) input; + RexBuilder rexBuilder = input.getCluster().getRexBuilder(); + SimpleConditionInfo cond = new SimpleConditionInfo(join.getCondition(), rexBuilder); + + if (cond.valid) { +ImmutableBitSet newGroup = groups.intersect(ImmutableBitSet.fromBitSet(cond.fields)); +RelNode l = join.getLeft(); +RelNode r = join.getRight(); + +int joinFieldCount = join.getRowType().getFieldCount(); +int lFieldCount = l.getRowType().getFieldCount(); + +ImmutableBitSet groupL = newGroup.get(0, lFieldCount); +ImmutableBitSet groupR = newGroup.get(lFieldCount, joinFieldCount).shift(-lFieldCount); + +if (isGroupingUnique(l, groupL)) { Review comment: That could be done; and I'm sure it was true in this - but this logic will work better if it could walk down as many joins as it could - we might have an aggregate on top in the meantime a bunch of joins under it...so I feel that it will be beneficial to retain it. I feeled tempted to write a RelMd handler - however I don't think I could just introduce a new one easily. RelShuttle doesn't look like a good match - I'll leave it as a set of `instanceof` calls for now. 
I'll upload a new patch to see if digging deeper in the tree could do more or not.
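The uniqueness check that HiveAggregateJoinTransposeRule performs with Calcite's ImmutableBitSet can be sketched with plain java.util.BitSet: a grouping stays unique if it covers at least one known unique key of its input, since grouping on a superset of a unique key cannot merge two distinct rows. This is an illustrative reimplementation, not the Hive code:

```java
import java.util.BitSet;
import java.util.List;

// Plain-BitSet sketch of the "groups.contains(u)" test in the rule above.
final class GroupingUniqueness {
    /** true iff every bit of candidate is also set in group. */
    static boolean covers(BitSet group, BitSet candidate) {
        BitSet leftover = (BitSet) candidate.clone();
        leftover.andNot(group);      // bits in candidate but not in group
        return leftover.isEmpty();
    }

    /** The grouping is unique if it covers some unique key of the input. */
    static boolean isGroupingUnique(List<BitSet> uniqueKeys, BitSet group) {
        for (BitSet u : uniqueKeys) {
            if (covers(group, u)) {
                return true;
            }
        }
        return false;
    }
}
```

The rule's join case then recurses: it splits the grouping bitset at the join's field boundary and asks the same question of each join input.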
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=483151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483151 ] ASF GitHub Bot logged work on HIVE-24143: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:21 Start Date: 12/Sep/20 21:21 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1482: URL: https://github.com/apache/hive/pull/1482#discussion_r487299089 ## File path: ql/src/test/queries/clientpositive/external_jdbc_table2.q ## @@ -7,43 +6,43 @@ CREATE TEMPORARY FUNCTION dboutput AS 'org.apache.hadoop.hive.contrib.genericudf FROM src SELECT -dboutput ('jdbc:derby:;databaseName=${system:test.tmp.dir}/test_derby_auth1;create=true','user1','passwd1', +dboutput ('jdbc:derby:;databaseName=${system:test.tmp.dir}/test_derby_auth1_1;create=true','user1','passwd1', Review comment: I changed it to avoid a clash with other test in the temp directory, which I believe was causing HIVE-23910. ## File path: ql/src/test/results/clientpositive/llap/external_jdbc_table_perf.q.out ## @@ -6115,32 +6112,39 @@ WHERE "d_year" = 1999 AND "d_moy" BETWEEN 1 AND 3 AND "d_date_sk" IS NOT NULL) A Reduce Operator Tree: Merge Join Operator condition map: - Anti Join 0 to 1 + Left Outer Join 0 to 1 Review comment: This change is not actually related to this patch. Note that the test was disabled by default (HIVE-23910); it seems maybe a preliminary version of HIVE-23716 changed these q files and it should have not.
[jira] [Work logged] (HIVE-24145) Fix preemption issues in reducers and file sink operators
[ https://issues.apache.org/jira/browse/HIVE-24145?focusedWorklogId=483190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483190 ] ASF GitHub Bot logged work on HIVE-24145: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:31 Start Date: 12/Sep/20 21:31 Worklog Time Spent: 10m Work Description: ramesh0201 opened a new pull request #1485: URL: https://github.com/apache/hive/pull/1485 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483190) Time Spent: 1h (was: 50m) > Fix preemption issues in reducers and file sink operators > - > > Key: HIVE-24145 > URL: https://issues.apache.org/jira/browse/HIVE-24145 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > There are two issues because of preemption: > # Reducers are getting reordered as part of optimizations because of which > more preemption happen > # Preemption in the middle of writing can cause the file to not close and > lead to errors when we read the file later -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24149) HiveStreamingConnection doesn't close HMS connection
[ https://issues.apache.org/jira/browse/HIVE-24149?focusedWorklogId=483197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483197 ] ASF GitHub Bot logged work on HIVE-24149: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:31 Start Date: 12/Sep/20 21:31 Worklog Time Spent: 10m Work Description: zeroflag opened a new pull request #1488: URL: https://github.com/apache/hive/pull/1488 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483197) Time Spent: 0.5h (was: 20m) > HiveStreamingConnection doesn't close HMS connection > > > Key: HIVE-24149 > URL: https://issues.apache.org/jira/browse/HIVE-24149 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > There 3 HMS connections used by HiveStreamingConnection. One for TX one for > hearbeat and for notifications. The close method only closes the first 2 > leaving the last one open which eventually overloads HMS and it becomes > unresponsive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
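The HIVE-24149 leak follows the classic close-all-resources bug: close() released two of the three metastore clients and left the notification client open. A hedged sketch of a defensive close that releases every client even when one of them throws (the class and field names here are invented, not Hive's):

```java
// Sketch only: each underlying client is modeled as an AutoCloseable.
// close() is best-effort per client, so one failing client cannot keep
// the remaining connections alive.
final class StreamingConnectionSketch implements AutoCloseable {
    private final AutoCloseable txnClient;
    private final AutoCloseable heartbeatClient;
    private final AutoCloseable notificationClient; // the one the bug leaked

    StreamingConnectionSketch(AutoCloseable txn, AutoCloseable heartbeat,
                              AutoCloseable notification) {
        this.txnClient = txn;
        this.heartbeatClient = heartbeat;
        this.notificationClient = notification;
    }

    @Override
    public void close() {
        AutoCloseable[] all = {txnClient, heartbeatClient, notificationClient};
        for (AutoCloseable c : all) {
            try {
                c.close();
            } catch (Exception e) {
                // best effort: keep closing the remaining clients
            }
        }
    }
}
```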
[jira] [Work logged] (HIVE-24151) MultiDelimitSerDe shifts data if strings contain non-ASCII characters
[ https://issues.apache.org/jira/browse/HIVE-24151?focusedWorklogId=483199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483199 ] ASF GitHub Bot logged work on HIVE-24151: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:31 Start Date: 12/Sep/20 21:31 Worklog Time Spent: 10m Work Description: szlta edited a comment on pull request #1490: URL: https://github.com/apache/hive/pull/1490#issuecomment-691122063 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483199) Time Spent: 1.5h (was: 1h 20m) > MultiDelimitSerDe shifts data if strings contain non-ASCII characters > - > > Key: HIVE-24151 > URL: https://issues.apache.org/jira/browse/HIVE-24151 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > HIVE-22360 intended to fix another MultiDelimitSerde problem (with NULL last > columns) but introduced a regression: the approach of the fix is pretty much > all wrong, as the existing logic that operated on bytes got replaced by regex > matcher logic which deals in character positions, rather than byte positions. > As some non ASCII characters consist of more than 1 byte, the whole record > may get shifted due to this. > With this ticket I'm going to restore the old logic, and apply the proper fix > on that, but keeping (and extending) the test cases added with HIVE-22360 so > that we have a solution for both issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24035) Add Jenkinsfile for branch-2.3
[ https://issues.apache.org/jira/browse/HIVE-24035?focusedWorklogId=483211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483211 ] ASF GitHub Bot logged work on HIVE-24035: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:32 Start Date: 12/Sep/20 21:32 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1398: URL: https://github.com/apache/hive/pull/1398#issuecomment-691243264 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483211) Time Spent: 2h 50m (was: 2h 40m) > Add Jenkinsfile for branch-2.3 > -- > > Key: HIVE-24035 > URL: https://issues.apache.org/jira/browse/HIVE-24035 > Project: Hive > Issue Type: Test >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > To enable precommit tests for github PR, we need to have a Jenkinsfile in the > repo. This is already done for master and branch-2. This adds the same for > branch-2.3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=483227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483227 ] ASF GitHub Bot logged work on HIVE-24143: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:35 Start Date: 12/Sep/20 21:35 Worklog Time Spent: 10m Work Description: vineetgarg02 commented on a change in pull request #1482: URL: https://github.com/apache/hive/pull/1482#discussion_r487292490 ## File path: ql/src/test/results/clientpositive/llap/external_jdbc_table_perf.q.out ## @@ -6115,32 +6112,39 @@ WHERE "d_year" = 1999 AND "d_moy" BETWEEN 1 AND 3 AND "d_date_sk" IS NOT NULL) A Reduce Operator Tree: Merge Join Operator condition map: - Anti Join 0 to 1 + Left Outer Join 0 to 1 Review comment: Also not sure what caused this change in plan. ## File path: ql/src/test/queries/clientpositive/external_jdbc_table2.q ## @@ -7,43 +6,43 @@ CREATE TEMPORARY FUNCTION dboutput AS 'org.apache.hadoop.hive.contrib.genericudf FROM src SELECT -dboutput ('jdbc:derby:;databaseName=${system:test.tmp.dir}/test_derby_auth1;create=true','user1','passwd1', +dboutput ('jdbc:derby:;databaseName=${system:test.tmp.dir}/test_derby_auth1_1;create=true','user1','passwd1', Review comment: Curious as to why this changed? ## File path: ql/src/test/results/clientpositive/llap/external_jdbc_table_perf.q.out ## @@ -6115,32 +6112,39 @@ WHERE "d_year" = 1999 AND "d_moy" BETWEEN 1 AND 3 AND "d_date_sk" IS NOT NULL) A Reduce Operator Tree: Merge Join Operator condition map: - Anti Join 0 to 1 + Left Outer Join 0 to 1 Review comment: Got it, make sense
[jira] [Work logged] (HIVE-24127) Dump events from default catalog only
[ https://issues.apache.org/jira/browse/HIVE-24127?focusedWorklogId=483241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483241 ] ASF GitHub Bot logged work on HIVE-24127: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:37 Start Date: 12/Sep/20 21:37 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1478: URL: https://github.com/apache/hive/pull/1478#discussion_r486826515 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/messaging/event/filters/CatalogFilter.java ## @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hive.metastore.messaging.event.filters; + +import org.apache.hadoop.hive.metastore.api.NotificationEvent; + +/** + * Utility function that constructs a notification filter to match a given catalog name. 
+ */ +public class CatalogFilter extends BasicFilter { + private final String catalogName; + + public CatalogFilter(final String catalogName) { +this.catalogName = catalogName; + } + + @Override + boolean shouldAccept(final NotificationEvent event) { +if (catalogName == null || event.getCatName() == null || catalogName.equalsIgnoreCase(event.getCatName())) { Review comment: catalogName should never be null. Even if it is not configured, it must be the default, which is "hive". I think we should let it fail if the filter is initialized with a null catalogName.
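The fail-fast behavior the reviewer asks for can be sketched as follows; `CatalogFilterSketch` and its simplified `shouldAccept(String)` signature are illustrative stand-ins, not the merged patch:

```java
import java.util.Objects;

// Sketch only: reject a null catalog name at construction time instead of
// tolerating it inside shouldAccept(). The metastore always has a catalog
// ("hive" by default), so a null here indicates a caller bug.
class CatalogFilterSketch {
    private final String catalogName;

    CatalogFilterSketch(String catalogName) {
        // Fail fast on misconfiguration rather than silently accepting all events.
        this.catalogName = Objects.requireNonNull(catalogName, "catalogName must not be null");
    }

    boolean shouldAccept(String eventCatName) {
        // Events recorded before catalog support may lack a catalog name;
        // accept those, otherwise compare case-insensitively.
        return eventCatName == null || catalogName.equalsIgnoreCase(eventCatName);
    }
}
```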
[jira] [Work logged] (HIVE-24144) getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value
[ https://issues.apache.org/jira/browse/HIVE-24144?focusedWorklogId=483280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483280 ] ASF GitHub Bot logged work on HIVE-24144: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:40 Start Date: 12/Sep/20 21:40 Worklog Time Spent: 10m Work Description: jcamachor opened a new pull request #1487: URL: https://github.com/apache/hive/pull/1487 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483280) Time Spent: 20m (was: 10m) > getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value > > > Key: HIVE-24144 > URL: https://issues.apache.org/jira/browse/HIVE-24144 > Project: Hive > Issue Type: Bug > Components: JDBC, JDBC storage handler >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {code} > public String getIdentifierQuoteString() throws SQLException { > return " "; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
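Per the JDBC contract, `getIdentifierQuoteString` should return the actual quoting string, and a single space only when identifier quoting is unsupported; Hive quotes identifiers with backticks. A minimal sketch of the fix direction (illustrative, not the merged patch):

```java
// Returning " " tells JDBC clients that identifier quoting is unsupported,
// while Hive actually quotes identifiers with backticks. Sketch only.
class IdentifierQuoting {
    static String getIdentifierQuoteString() {
        return "`"; // Hive's identifier quote character
    }

    // How a client typically uses the value when building qualified names:
    static String quote(String identifier) {
        String q = getIdentifierQuoteString();
        return q + identifier + q;
    }
}
```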
[jira] [Work logged] (HIVE-24097) correct NPE exception in HiveMetastoreAuthorizer
[ https://issues.apache.org/jira/browse/HIVE-24097?focusedWorklogId=483282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483282 ] ASF GitHub Bot logged work on HIVE-24097: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:43 Start Date: 12/Sep/20 21:43 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #1448: URL: https://github.com/apache/hive/pull/1448#issuecomment-691254659 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483282) Time Spent: 40m (was: 0.5h) > correct NPE exception in HiveMetastoreAuthorizer > > > Key: HIVE-24097 > URL: https://issues.apache.org/jira/browse/HIVE-24097 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0 >Reporter: Sam An >Assignee: Sam An >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > In some testing, we found it's possible to have an NPE if the preEventType does > not fall within the several event types the HMS currently checks. This leaves the > AuthzContext as a null pointer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns
[ https://issues.apache.org/jira/browse/HIVE-24084?focusedWorklogId=483311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483311 ] ASF GitHub Bot logged work on HIVE-24084: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:45 Start Date: 12/Sep/20 21:45 Worklog Time Spent: 10m Work Description: jcamachor commented on a change in pull request #1439: URL: https://github.com/apache/hive/pull/1439#discussion_r487130820 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java ## @@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) { } } + /** + * Determines whether the given grouping is unique. + * + * Consider a join which might produce non-unique rows; but later the results are aggregated again. + * This method determines if there are sufficient columns in the grouping which have been present previously as unique column(s). + */ + private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) { +if (groups.isEmpty()) { + return false; +} +RelMetadataQuery mq = input.getCluster().getMetadataQuery(); +Set<ImmutableBitSet> uKeys = mq.getUniqueKeys(input); +for (ImmutableBitSet u : uKeys) { + if (groups.contains(u)) { +return true; + } +} +if (input instanceof Join) { + Join join = (Join) input; + RexBuilder rexBuilder = input.getCluster().getRexBuilder(); + SimpleConditionInfo cond = new SimpleConditionInfo(join.getCondition(), rexBuilder); + + if (cond.valid) { +ImmutableBitSet newGroup = groups.intersect(ImmutableBitSet.fromBitSet(cond.fields)); +RelNode l = join.getLeft(); +RelNode r = join.getRight(); + +int joinFieldCount = join.getRowType().getFieldCount(); +int lFieldCount = l.getRowType().getFieldCount(); + +ImmutableBitSet groupL = newGroup.get(0, lFieldCount); +ImmutableBitSet groupR = newGroup.get(lFieldCount, joinFieldCount).shift(-lFieldCount); + +if (isGroupingUnique(l, groupL)) { Review comment: OK. 
If it turns out there are many changes and it may need some time to be fixed, feel free to defer to follow-up JIRA and let's merge this one. ## File path: ql/src/test/queries/clientpositive/tpch18.q ## @@ -0,0 +1,133 @@ +--! qt:dataset:tpch_0_001.customer +--! qt:dataset:tpch_0_001.lineitem +--! qt:dataset:tpch_0_001.nation +--! qt:dataset:tpch_0_001.orders +--! qt:dataset:tpch_0_001.part +--! qt:dataset:tpch_0_001.partsupp +--! qt:dataset:tpch_0_001.region +--! qt:dataset:tpch_0_001.supplier + + +use tpch_0_001; + +set hive.transpose.aggr.join=true; +set hive.transpose.aggr.join.unique=true; +set hive.mapred.mode=nonstrict; + +create view q18_tmp_cached as +select + l_orderkey, + sum(l_quantity) as t_sum_quantity +from + lineitem +where + l_orderkey is not null +group by + l_orderkey; + + + +explain cbo select +c_name, +c_custkey, +o_orderkey, +o_orderdate, +o_totalprice, +sum(l_quantity) +from + customer, + orders, + q18_tmp_cached t, + lineitem l +where +c_custkey = o_custkey +and o_orderkey = t.l_orderkey +and o_orderkey is not null +and t.t_sum_quantity > 300 +and o_orderkey = l.l_orderkey +and l.l_orderkey is not null +group by +c_name, +c_custkey, +o_orderkey, +o_orderdate, +o_totalprice +order by +o_totalprice desc, +o_orderdate +limit 100; + + + +select 'add constraints'; + +alter table orders add constraint pk_o primary key (o_orderkey) disable novalidate rely; +alter table customer add constraint pk_c primary key (c_custkey) disable novalidate rely; + Review comment: Thanks @kgyrtkirk . This seems to need further exploration, we thought https://issues.apache.org/jira/browse/HIVE-24087 was going to help here. @vineetgarg02 , could you take a look at this once this patch is merged? Maybe the shape of the plan is slightly different to the one we anticipated. 
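The core test in `isGroupingUnique` can be sketched with plain `java.util.BitSet` standing in for Calcite's `ImmutableBitSet` (an assumption made for self-containment): a grouping is already unique when it covers some known unique key of the input.

```java
import java.util.*;

// A grouping that contains a unique key of its input as a subset cannot
// merge distinct rows, so re-aggregating over it produces the same groups.
class GroupingUniqueness {
    static boolean coversSomeUniqueKey(BitSet grouping, Set<BitSet> uniqueKeys) {
        for (BitSet key : uniqueKeys) {
            BitSet uncovered = (BitSet) key.clone();
            uncovered.andNot(grouping); // key bits missing from the grouping
            if (uncovered.isEmpty()) {
                return true;            // grouping contains the whole key
            }
        }
        return false;
    }
}
```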
[jira] [Work logged] (HIVE-24022) Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer
[ https://issues.apache.org/jira/browse/HIVE-24022?focusedWorklogId=483324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483324 ] ASF GitHub Bot logged work on HIVE-24022: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:46 Start Date: 12/Sep/20 21:46 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #1385: URL: https://github.com/apache/hive/pull/1385#issuecomment-691213984 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483324) Time Spent: 1h (was: 50m) > Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer > -- > > Key: HIVE-24022 > URL: https://issues.apache.org/jira/browse/HIVE-24022 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Sam An >Priority: Minor > Labels: performance, pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > For a table with 3000+ partitions, analyze table takes a lot longer time as > HiveMetaStoreAuthorizer tries to create HiveConf for every partition request. > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L319] > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L447] -- This message was sent by Atlassian Jira (v8.3.4#803005)
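The optimization idea is to build the expensive configuration object once and reuse it across authorization calls, instead of constructing a fresh `HiveConf` per partition request. A hedged sketch, with a plain map standing in for `HiveConf` and all names illustrative:

```java
import java.util.*;

// Lazily builds the configuration once and hands out the cached instance.
// The Map stands in for HiveConf, whose constructor parses XML config files
// and is what made per-partition construction expensive.
class AuthorizerConfCache {
    private static Map<String, String> cached;
    static int buildCount = 0; // exposed so the caching effect is observable

    static synchronized Map<String, String> getConf() {
        if (cached == null) {
            buildCount++;          // the expensive build happens only once
            cached = new HashMap<>();
        }
        return cached;
    }
}
```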
[jira] [Work logged] (HIVE-22290) ObjectStore.cleanWriteNotificationEvents and ObjectStore.cleanupEvents OutOfMemory on large number of pending events
[ https://issues.apache.org/jira/browse/HIVE-22290?focusedWorklogId=483327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483327 ] ASF GitHub Bot logged work on HIVE-22290: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:47 Start Date: 12/Sep/20 21:47 Worklog Time Spent: 10m Work Description: nareshpr opened a new pull request #1484: URL: https://github.com/apache/hive/pull/1484 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483327) Time Spent: 0.5h (was: 20m) > ObjectStore.cleanWriteNotificationEvents and ObjectStore.cleanupEvents > OutOfMemory on large number of pending events > > > Key: HIVE-22290 > URL: https://issues.apache.org/jira/browse/HIVE-22290 > Project: Hive > Issue Type: Bug > Components: HCatalog, repl >Affects Versions: 4.0.0 >Reporter: Thomas Prelle >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > As in [https://jira.apache.org/jira/browse/HIVE-19430] if there are large > number of events that haven't been cleaned up for some reason, then > ObjectStore.cleanWriteNotificationEvents() and ObjectStore.cleanupEvents can > run out of memory while it loads all the events to be deleted. > It should fetch events in batches. -- This message was sent by Atlassian Jira (v8.3.4#803005)
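The fix the ticket asks for is to fetch and delete expired events in fixed-size batches instead of materializing them all in memory. A minimal sketch; `cleanupInBatches` and the recorded `deletedBatches` list are illustrative stand-ins for the real ObjectStore fetch/delete round-trips:

```java
import java.util.*;

// Walks the pending event ids in slices of batchSize; each slice would be a
// separate fetch + delete round-trip in the real cleanup code, bounding the
// memory used regardless of how many events have accumulated.
class BatchCleaner {
    static int cleanupInBatches(List<Long> pendingIds, int batchSize, List<List<Long>> deletedBatches) {
        int total = 0;
        for (int i = 0; i < pendingIds.size(); i += batchSize) {
            List<Long> batch = pendingIds.subList(i, Math.min(i + batchSize, pendingIds.size()));
            deletedBatches.add(new ArrayList<>(batch)); // stand-in for deleteBatch(batch)
            total += batch.size();
        }
        return total;
    }
}
```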
[jira] [Work logged] (HIVE-24151) MultiDelimitSerDe shifts data if strings contain non-ASCII characters
[ https://issues.apache.org/jira/browse/HIVE-24151?focusedWorklogId=483384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483384 ] ASF GitHub Bot logged work on HIVE-24151: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:51 Start Date: 12/Sep/20 21:51 Worklog Time Spent: 10m Work Description: szlta commented on pull request #1490: URL: https://github.com/apache/hive/pull/1490#issuecomment-691122063 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483384) Time Spent: 1h 40m (was: 1.5h) > MultiDelimitSerDe shifts data if strings contain non-ASCII characters > - > > Key: HIVE-24151 > URL: https://issues.apache.org/jira/browse/HIVE-24151 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > HIVE-22360 intended to fix another MultiDelimitSerde problem (with NULL last > columns) but introduced a regression: the approach of the fix is pretty much > all wrong, as the existing logic that operated on bytes got replaced by regex > matcher logic which deals in character positions, rather than byte positions. > As some non ASCII characters consist of more than 1 byte, the whole record > may get shifted due to this. > With this ticket I'm going to restore the old logic, and apply the proper fix > on that, but keeping (and extending) the test cases added with HIVE-22360 so > that we have a solution for both issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
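The regression is easy to see in isolation: character offsets and UTF-8 byte offsets diverge as soon as a string contains non-ASCII characters, so splitting a raw byte buffer at character positions shifts every later column. A small demonstration:

```java
import java.nio.charset.StandardCharsets;

// For ASCII-only text the two lengths agree; add accented characters and the
// byte length grows while the character count does not, which is exactly the
// offset skew that shifted MultiDelimitSerDe columns.
class ByteVsChar {
    static int charLen(String s) {
        return s.length();
    }

    static int byteLen(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```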
[jira] [Work logged] (HIVE-23841) Field writers is an HashSet, i.e., not thread-safe. Field writers is typically protected by synchronization on lock, but not in 1 location.
[ https://issues.apache.org/jira/browse/HIVE-23841?focusedWorklogId=483426&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483426 ] ASF GitHub Bot logged work on HIVE-23841: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:54 Start Date: 12/Sep/20 21:54 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1248: URL: https://github.com/apache/hive/pull/1248#issuecomment-691367907 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483426) Time Spent: 1h (was: 50m) > Field writers is an HashSet, i.e., not thread-safe. Field writers is > typically protected by synchronization on lock, but not in 1 location. > > > Key: HIVE-23841 > URL: https://issues.apache.org/jira/browse/HIVE-23841 > Project: Hive > Issue Type: Bug > Environment: Any environment >Reporter: Adrian Nistor >Priority: Major > Labels: patch-available, pull-request-available > Attachments: HIVE-23841.patch > > Time Spent: 1h > Remaining Estimate: 0h > > I also submitted a pull request on github at: > > [https://github.com/apache/hive/pull/1248] > > (same patch) > h1. Description > > Field {{writers}} is a {{HashSet}} ([line > 70|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L70]), > i.e., not thread-safe. 
> Accesses to field {{writers}} are protected by synchronization on {{lock}}, > e.g., at lines: > [141-144|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L141-L144], > > [212-213|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L213], > and > [212-215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L212-L215]. > However, the {{writers.remove()}} at [line > 249|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249] > is protected by synchronization on {{INSTANCE}}, *not* on {{lock}}. > Synchronizing on 2 different objects does not ensure mutual exclusion. This > is because 2 threads synchronizing on different objects can still execute in > parallel at the same time. > Note that lines > [215|https://github.com/apache/hive/blob/c93d7797329103d6c509bada68b6da7f907b3dee/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L215] > and > [249|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/llap/LlapOutputFormatService.java#L249] > are modifying {{writers}} with {{put()}} and {{remove()}}, respectively. > h1. The Code for This Fix > This fix is very simple: just change {{synchronized (INSTANCE)}} to > {{synchronized (lock)}}, just like the methods containing the other lines > listed above.[] -- This message was sent by Atlassian Jira (v8.3.4#803005)
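The fix pattern, sketched with a plain map guarded by one monitor; in the real class `writers` is the shared collection and `lock` the existing guard, while the bug synchronized one access path on `INSTANCE` instead:

```java
import java.util.*;

// All access paths synchronize on the SAME monitor. Guarding remove() with a
// different object (as the pre-fix code did with INSTANCE) provides no mutual
// exclusion: two threads holding two different monitors run concurrently.
class WriterRegistry {
    private final Object lock = new Object();
    private final Map<String, String> writers = new HashMap<>();

    void put(String id, String writer) {
        synchronized (lock) { writers.put(id, writer); }
    }

    String remove(String id) {
        synchronized (lock) { return writers.remove(id); } // same lock as put()
    }

    int size() {
        synchronized (lock) { return writers.size(); }
    }
}
```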
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=483430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483430 ] ASF GitHub Bot logged work on HIVE-24143: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:55 Start Date: 12/Sep/20 21:55 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1482: URL: https://github.com/apache/hive/pull/1482 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483430) Time Spent: 2h 40m (was: 2.5h) > Include convention in JDBC converter operator in Calcite plan > - > > Key: HIVE-24143 > URL: https://issues.apache.org/jira/browse/HIVE-24143 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 4.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Among others, it will be useful to debug the dialect being chosen for query > generation. For instance: > {code} > HiveProject(jdbc_type_conversion_table1.ikey=[$0], > jdbc_type_conversion_table1.bkey=[$1], jdbc_type_conversion_table1.fkey=[$2], > jdbc_type_conversion_table1.dkey=[$3], > jdbc_type_conversion_table1.chkey=[$4], > jdbc_type_conversion_table1.dekey=[$5], > jdbc_type_conversion_table1.dtkey=[$6], jdbc_type_conversion_table1.tkey=[$7]) > HiveProject(ikey=[$0], bkey=[$1], fkey=[$2], dkey=[$3], chkey=[$4], > dekey=[$5], dtkey=[$6], tkey=[$7]) > ->HiveJdbcConverter(convention=[JDBC.DERBY]) > JdbcHiveTableScan(table=[[default, jdbc_type_conversion_table1]], > table:alias=[jdbc_type_conversion_table1]) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24143) Include convention in JDBC converter operator in Calcite plan
[ https://issues.apache.org/jira/browse/HIVE-24143?focusedWorklogId=483441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483441 ] ASF GitHub Bot logged work on HIVE-24143: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:56 Start Date: 12/Sep/20 21:56 Worklog Time Spent: 10m Work Description: vineetgarg02 commented on pull request #1482: URL: https://github.com/apache/hive/pull/1482#issuecomment-691326259 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483441) Time Spent: 2h 50m (was: 2h 40m) > Include convention in JDBC converter operator in Calcite plan > - > > Key: HIVE-24143 > URL: https://issues.apache.org/jira/browse/HIVE-24143 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 4.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Among others, it will be useful to debug the dialect being chosen for query > generation. For instance: > {code} > HiveProject(jdbc_type_conversion_table1.ikey=[$0], > jdbc_type_conversion_table1.bkey=[$1], jdbc_type_conversion_table1.fkey=[$2], > jdbc_type_conversion_table1.dkey=[$3], > jdbc_type_conversion_table1.chkey=[$4], > jdbc_type_conversion_table1.dekey=[$5], > jdbc_type_conversion_table1.dtkey=[$6], jdbc_type_conversion_table1.tkey=[$7]) > HiveProject(ikey=[$0], bkey=[$1], fkey=[$2], dkey=[$3], chkey=[$4], > dekey=[$5], dtkey=[$6], tkey=[$7]) > ->HiveJdbcConverter(convention=[JDBC.DERBY]) > JdbcHiveTableScan(table=[[default, jdbc_type_conversion_table1]], > table:alias=[jdbc_type_conversion_table1]) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24147) Table column names are not extracted correctly in Hive JDBC storage handler
[ https://issues.apache.org/jira/browse/HIVE-24147?focusedWorklogId=483459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483459 ] ASF GitHub Bot logged work on HIVE-24147: - Author: ASF GitHub Bot Created on: 12/Sep/20 21:57 Start Date: 12/Sep/20 21:57 Worklog Time Spent: 10m Work Description: jcamachor opened a new pull request #1486: URL: https://github.com/apache/hive/pull/1486 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483459) Time Spent: 40m (was: 0.5h) > Table column names are not extracted correctly in Hive JDBC storage handler > --- > > Key: HIVE-24147 > URL: https://issues.apache.org/jira/browse/HIVE-24147 > Project: Hive > Issue Type: Bug > Components: JDBC storage handler >Affects Versions: 4.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > It seems the `ResultSetMetaData` for the query used to retrieve the table > columns names contains fully qualified names, instead of possibly supporting > the {{getTableName}} method. This ends up throwing the storage handler off > and leading to exceptions, both in CBO path and non-CBO path. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24150) Refactor CommitTxnRequest field order
[ https://issues.apache.org/jira/browse/HIVE-24150?focusedWorklogId=483492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483492 ] ASF GitHub Bot logged work on HIVE-24150: - Author: ASF GitHub Bot Created on: 12/Sep/20 22:00 Start Date: 12/Sep/20 22:00 Worklog Time Spent: 10m Work Description: deniskuzZ opened a new pull request #1489: URL: https://github.com/apache/hive/pull/1489 ### What changes were proposed in this pull request? Refactor CommitTxnRequest field order (keyValue and exclWriteEnabled). ### Why are the changes needed? HIVE-24125 introduced backward incompatible change. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483492) Time Spent: 0.5h (was: 20m) > Refactor CommitTxnRequest field order > - > > Key: HIVE-24150 > URL: https://issues.apache.org/jira/browse/HIVE-24150 > Project: Hive > Issue Type: Bug >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Refactor CommitTxnRequest field order (keyValue and exclWriteEnabled). > HIVE-24125 introduced backward incompatible change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
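The incompatibility comes from how Thrift encodes structs: fields are identified on the wire by their numeric ids, so renumbering or reordering the ids of already-released fields changes the wire format for existing readers. A hedged illustration; the field ids below are made up, and only `txnid`, `keyValue`, and `exclWriteEnabled` come from the ticket:

```thrift
struct CommitTxnRequest {
  1: required i64 txnid,
  // ...previously released fields must keep their original ids...
  // New fields must take fresh, previously unused ids; reassigning an id that
  // a released field already used (or swapping two ids) changes the wire
  // format and breaks deployed readers, which is what this fix reverts.
  8: optional list<string> keyValue,
  9: optional bool exclWriteEnabled
}
```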
[jira] [Work logged] (HIVE-24145) Fix preemption issues in reducers and file sink operators
[ https://issues.apache.org/jira/browse/HIVE-24145?focusedWorklogId=483513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483513 ] ASF GitHub Bot logged work on HIVE-24145: - Author: ASF GitHub Bot Created on: 12/Sep/20 22:01 Start Date: 12/Sep/20 22:01 Worklog Time Spent: 10m Work Description: rbalamohan commented on a change in pull request #1485: URL: https://github.com/apache/hive/pull/1485#discussion_r486786544 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java ## @@ -216,29 +216,47 @@ public FSPaths(Path specPath, boolean isMmTable, boolean isDirectInsert, boolean } public void closeWriters(boolean abort) throws HiveException { + Exception exception = null; for (int idx = 0; idx < outWriters.length; idx++) { if (outWriters[idx] != null) { try { outWriters[idx].close(abort); updateProgress(); } catch (IOException e) { -throw new HiveException(e); +exception = e; +LOG.error("Error closing " + outWriters[idx].toString(), e); +// continue closing others } } } - try { + for (int i = 0; i < updaters.length; i++) { +if (updaters[i] != null) { + SerDeStats stats = updaters[i].getStats(); + // Ignore 0 row files except in case of insert overwrite + if (isDirectInsert && (stats.getRowCount() > 0 || isInsertOverwrite)) { +outPathsCommitted[i] = updaters[i].getUpdatedFilePath(); + } + try { +updaters[i].close(abort); + } catch (IOException e) { +exception = e; +LOG.error("Error closing " + updaters[i].toString(), e); +// continue closing others + } +} + } + // Made an attempt to close all writers. 
+ if (exception != null) { for (int i = 0; i < updaters.length; i++) { if (updaters[i] != null) { -SerDeStats stats = updaters[i].getStats(); -// Ignore 0 row files except in case of insert overwrite -if (isDirectInsert && (stats.getRowCount() > 0 || isInsertOverwrite)) { - outPathsCommitted[i] = updaters[i].getUpdatedFilePath(); +try { + fs.delete(updaters[i].getUpdatedFilePath(), true); +} catch (IOException e) { + e.printStackTrace(); Review comment: LOG? ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java ## @@ -284,6 +285,11 @@ public Object process(Node nd, Stack stack, NodeProcessorCtx procCtx, // Create ReduceSink operator ReduceSinkOperator rsOp = getReduceSinkOp(partitionPositions, sortPositions, sortOrder, sortNullOrder, allRSCols, bucketColumns, numBuckets, fsParent, fsOp.getConf().getWriteType()); + // we have to make sure not to reorder the child operators as it might cause weird behavior in the tasks at + // the same level. when there is auto stats gather at the same level as another operation then it might + // cause unnecessary preemption. Maintaining the order here to avoid such preemption and possible errors Review comment: Plz add TEZ-3296 as ref if possible. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483513) Time Spent: 1h 10m (was: 1h) > Fix preemption issues in reducers and file sink operators > - > > Key: HIVE-24145 > URL: https://issues.apache.org/jira/browse/HIVE-24145 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > There are two issues because of preemption: > # Reducers are getting reordered as part of optimizations because of which > more preemption happen > # Preemption in the middle of writing can cause the file to not close and > lead to errors when we read the file later -- This message was sent by Atlassian Jira (v8.3.4#803005)
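The close-all-then-throw pattern under review can be sketched generically: attempt to close every writer, remember a failure, and raise it only after all closes were tried, so one failing writer cannot leak the others.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

// Mirrors the shape of the reviewed closeWriters(): iterate all writers, keep
// the last exception, continue closing, and throw only at the end.
class CloseAll {
    static void closeAll(List<Closeable> writers) throws IOException {
        IOException failure = null;
        for (Closeable w : writers) {
            if (w == null) {
                continue;
            }
            try {
                w.close();
            } catch (IOException e) {
                failure = e; // remember it, but keep closing the others
            }
        }
        if (failure != null) {
            throw failure; // surfaced only after every close was attempted
        }
    }
}
```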
[jira] [Work logged] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?focusedWorklogId=483508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483508 ] ASF GitHub Bot logged work on HIVE-24138: - Author: ASF GitHub Bot Created on: 12/Sep/20 22:01 Start Date: 12/Sep/20 22:01 Worklog Time Spent: 10m Work Description: ayushtkn opened a new pull request #1491: URL: https://github.com/apache/hive/pull/1491 https://issues.apache.org/jira/browse/HIVE-24138 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483508) Time Spent: 0.5h (was: 20m) > Llap external client flow is broken due to netty shading > > > Key: HIVE-24138 > URL: https://issues.apache.org/jira/browse/HIVE-24138 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Ayush Saxena >Priority: Critical > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > We shaded netty in hive-exec in - > https://issues.apache.org/jira/browse/HIVE-23073 > This breaks LLAP external client flow on LLAP daemon side - > LLAP daemon stacktrace - > {code} > 2020-09-09T18:22:13,413 INFO [TezTR-222977_4_0_0_0_0 > (497418324441977_0004_0_00_00_0)] llap.LlapOutputFormat: Returning > writer for: attempt_497418324441977_0004_0_00_00_0 > 2020-09-09T18:22:13,419 ERROR [TezTR-222977_4_0_0_0_0 > (497418324441977_0004_0_00_00_0)] tez.MapRecordSource: > java.lang.NoSuchMethodError: > org.apache.arrow.memory.BufferAllocator.buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf; > at > org.apache.hadoop.hive.llap.WritableByteChannelAdapter.write(WritableByteChannelAdapter.java:96) > at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:74) > at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:57) > at > 
org.apache.arrow.vector.ipc.WriteChannel.writeIntLittleEndian(WriteChannel.java:89) > at > org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:88) > at > org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:130) > at > org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:102) > at > org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:85) > at > org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:46) > at > org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:137) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) > at > org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java
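The NoSuchMethodError above follows from how the JVM links method calls: the full method descriptor, return type included, must match at link time. Relocating netty in hive-exec rewrites Arrow call sites to expect `buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf;`, a descriptor the unshaded Arrow jar on the daemon does not declare. A minimal sketch of the descriptor mismatch; the two interfaces below are hypothetical stand-ins, not the real Arrow API:

```java
import java.lang.reflect.Method;

public class ShadingDemo {
    // Hypothetical stand-ins for the unshaded and shaded ArrowBuf classes.
    public static class ArrowBuf {}
    public static class ShadedArrowBuf {}

    // Same method name and parameter list, but different return types: to the
    // JVM these are two unrelated method descriptors.
    public interface BufferAllocator { ArrowBuf buffer(int size); }
    public interface ShadedBufferAllocator { ShadedArrowBuf buffer(int size); }

    // Returns the name of buffer(int)'s return type, which is part of the
    // descriptor the JVM resolves against.
    public static String returnTypeOf(Class<?> iface) {
        try {
            Method m = iface.getMethod("buffer", int.class);
            return m.getReturnType().getName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Code compiled against one variant cannot resolve calls against the
        // other; at runtime that surfaces as NoSuchMethodError.
        System.out.println(returnTypeOf(BufferAllocator.class));
        System.out.println(returnTypeOf(ShadedBufferAllocator.class));
    }
}
```

This is why the fix has to make both sides agree on one relocation, not merely put the right jars on the classpath.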
[jira] [Work logged] (HIVE-24151) MultiDelimitSerDe shifts data if strings contain non-ASCII characters
[ https://issues.apache.org/jira/browse/HIVE-24151?focusedWorklogId=483531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483531 ] ASF GitHub Bot logged work on HIVE-24151: - Author: ASF GitHub Bot Created on: 12/Sep/20 22:02 Start Date: 12/Sep/20 22:02 Worklog Time Spent: 10m Work Description: szlta opened a new pull request #1490: URL: https://github.com/apache/hive/pull/1490 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483531) Time Spent: 1h 50m (was: 1h 40m) > MultiDelimitSerDe shifts data if strings contain non-ASCII characters > - > > Key: HIVE-24151 > URL: https://issues.apache.org/jira/browse/HIVE-24151 > Project: Hive > Issue Type: Bug >Reporter: Ádám Szita >Assignee: Ádám Szita >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > HIVE-22360 intended to fix another MultiDelimitSerDe problem (with NULL last > columns) but introduced a regression: the approach of that fix was flawed, as the existing logic that operated on bytes was replaced by regex > matcher logic that deals in character positions rather than byte positions. > Since some non-ASCII characters consist of more than one byte, the whole record > may get shifted as a result. > With this ticket I'm going to restore the old logic and apply the proper fix > on top of it, while keeping (and extending) the test cases added with HIVE-22360 so > that we have a solution for both issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
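The byte-vs-character mismatch described in HIVE-24151 is easy to demonstrate: a regex Matcher reports character offsets, while the serde must slice the UTF-8 encoded row at byte offsets. A small standalone illustration (not Hive code):

```java
import java.nio.charset.StandardCharsets;

public class ByteVsChar {
    // Delimiter position in character terms -- what a regex Matcher reports.
    public static int charIndex(String row, char delim) {
        return row.indexOf(delim);
    }

    // Delimiter position in byte terms -- what the serde needs when slicing
    // the UTF-8 encoded row.
    public static int byteIndex(String row, char delim) {
        byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == (byte) delim) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String row = "na\u00EFve,42";              // 'ï' occupies 2 bytes in UTF-8
        System.out.println(charIndex(row, ',')); // 5
        System.out.println(byteIndex(row, ',')); // 6
        // Using the character offset to slice the byte array shifts every
        // subsequent column by one byte per preceding multi-byte character --
        // exactly the data shift the ticket reports.
    }
}
```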
[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=483521&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483521 ] ASF GitHub Bot logged work on HIVE-22782: - Author: ASF GitHub Bot Created on: 12/Sep/20 22:02 Start Date: 12/Sep/20 22:02 Worklog Time Spent: 10m Work Description: sankarh commented on a change in pull request #1419: URL: https://github.com/apache/hive/pull/1419#discussion_r486978810 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -1155,22 +1150,19 @@ void dumpConstraintMetadata(String dbName, String tblName, Path dbRoot, Hive hiv Path constraintsRoot = new Path(dbRoot, ReplUtils.CONSTRAINTS_ROOT_DIR_NAME); Path commonConstraintsFile = new Path(constraintsRoot, ConstraintFileType.COMMON.getPrefix() + tblName); Path fkConstraintsFile = new Path(constraintsRoot, ConstraintFileType.FOREIGNKEY.getPrefix() + tblName); - List pks = hiveDb.getPrimaryKeyList(dbName, tblName); - List fks = hiveDb.getForeignKeyList(dbName, tblName); - List uks = hiveDb.getUniqueConstraintList(dbName, tblName); - List nns = hiveDb.getNotNullConstraintList(dbName, tblName); - if ((pks != null && !pks.isEmpty()) || (uks != null && !uks.isEmpty()) - || (nns != null && !nns.isEmpty())) { + SQLAllTableConstraints tableConstraints = hiveDb.getTableConstraints(dbName,tblName); + if ((tableConstraints.getPrimaryKeys() != null && !tableConstraints.getPrimaryKeys().isEmpty()) || (tableConstraints.getUniqueConstraints() != null && !tableConstraints.getUniqueConstraints().isEmpty()) Review comment: Can add utility method to check for null and empty of given list. Used multiple times. Also use local variables to reduce the code. 
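The reviewer's suggestion of a shared null-and-empty check could look like the following sketch; the class and method names are illustrative, not Hive's actual API:

```java
import java.util.Collection;
import java.util.List;

public final class ConstraintUtils {
    private ConstraintUtils() {}

    // True when the collection is non-null and has at least one element --
    // replaces the repeated "x != null && !x.isEmpty()" pattern in
    // dumpConstraintMetadata.
    public static boolean isNotEmpty(Collection<?> c) {
        return c != null && !c.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isNotEmpty(List.of("pk_col"))); // true
        System.out.println(isNotEmpty(null));              // false
        System.out.println(isNotEmpty(List.of()));         // false
    }
}
```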
## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -5661,184 +5663,79 @@ public void dropConstraint(String dbName, String tableName, String constraintNam } } - public List getDefaultConstraintList(String dbName, String tblName) throws HiveException, NoSuchObjectException { + public SQLAllTableConstraints getTableConstraints(String dbName, String tblName) throws HiveException, NoSuchObjectException { try { - return getMSC().getDefaultConstraints(new DefaultConstraintsRequest(getDefaultCatalog(conf), dbName, tblName)); + AllTableConstraintsRequest tableConstraintsRequest = new AllTableConstraintsRequest(); + tableConstraintsRequest.setDbName(dbName); + tableConstraintsRequest.setTblName(tblName); + tableConstraintsRequest.setCatName(getDefaultCatalog(conf)); + return getMSC().getAllTableConstraints(tableConstraintsRequest); } catch (NoSuchObjectException e) { throw e; } catch (Exception e) { throw new HiveException(e); } } - - public List getCheckConstraintList(String dbName, String tblName) throws HiveException, NoSuchObjectException { -try { - return getMSC().getCheckConstraints(new CheckConstraintsRequest(getDefaultCatalog(conf), - dbName, tblName)); -} catch (NoSuchObjectException e) { - throw e; -} catch (Exception e) { - throw new HiveException(e); -} + public TableConstraintsInfo getAllTableConstraints(String dbName, String tblName) throws HiveException { +return getTableConstraints(dbName, tblName, false, false); } - /** - * Get all primary key columns associated with the table. - * - * @param dbName Database Name - * @param tblName Table Name - * @return Primary Key associated with the table. 
- * @throws HiveException - */ - public PrimaryKeyInfo getPrimaryKeys(String dbName, String tblName) throws HiveException { -return getPrimaryKeys(dbName, tblName, false); + public TableConstraintsInfo getReliableAndEnableTableConstraints(String dbName, String tblName) throws HiveException { +return getTableConstraints(dbName, tblName, true, true); } - /** - * Get primary key columns associated with the table that are available for optimization. - * - * @param dbName Database Name - * @param tblName Table Name - * @return Primary Key associated with the table. - * @throws HiveException - */ - public PrimaryKeyInfo getReliablePrimaryKeys(String dbName, String tblName) throws HiveException { -return getPrimaryKeys(dbName, tblName, true); - } - - private PrimaryKeyInfo getPrimaryKeys(String dbName, String tblName, boolean onlyReliable) + private TableConstraintsInfo getTableConstraints(String dbName, String tblName, boolean reliable, boolean enable) Review comment: nit: Use "fetchReliable" and "fetchEnabled" instead of "reliable" and "enable" as it sound like flag to enable something. ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java ## @@ -116,22 +
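The consolidated fetch with the reviewer's suggested parameter names (`fetchReliable`, `fetchEnabled`) amounts to filtering one constraint list by two flags. A conceptual sketch; the Constraint class and its fields are hypothetical, not the metastore's SQL* thrift classes:

```java
import java.util.ArrayList;
import java.util.List;

public class ConstraintFilter {
    // Hypothetical constraint carrying the two flags the planner cares about.
    public static class Constraint {
        public final String name;
        public final boolean rely;
        public final boolean enable;
        public Constraint(String name, boolean rely, boolean enable) {
            this.name = name;
            this.rely = rely;
            this.enable = enable;
        }
    }

    // With both flags false everything is returned (the DESCRIBE path);
    // with both flags true only RELY+ENABLE constraints survive (the
    // planner/optimizer path).
    public static List<Constraint> filter(List<Constraint> all,
            boolean fetchReliable, boolean fetchEnabled) {
        List<Constraint> out = new ArrayList<>();
        for (Constraint c : all) {
            if ((!fetchReliable || c.rely) && (!fetchEnabled || c.enable)) {
                out.add(c);
            }
        }
        return out;
    }
}
```

Collapsing the per-type calls into one request plus this kind of filter is what lets the planner fetch PK, FK, unique, and not-null constraints in a single metastore round trip.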
[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks
[ https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=483545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483545 ] ASF GitHub Bot logged work on HIVE-23413: - Author: ASF GitHub Bot Created on: 12/Sep/20 22:03 Start Date: 12/Sep/20 22:03 Worklog Time Spent: 10m Work Description: pvary commented on pull request #1220: URL: https://github.com/apache/hive/pull/1220#issuecomment-690931518 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483545) Time Spent: 2h (was: 1h 50m) > Create a new config to skip all locks > - > > Key: HIVE-23413 > URL: https://issues.apache.org/jira/browse/HIVE-23413 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch > > Time Spent: 2h > Remaining Estimate: 0h > > From time to time a query is blocked on locks when it should not be. > To have a quick workaround for this, we should have a config that the user > can set in the session to disable acquiring/checking locks, so we can provide > relief immediately and then investigate and fix the root cause later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks
[ https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=483581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483581 ] ASF GitHub Bot logged work on HIVE-23413: - Author: ASF GitHub Bot Created on: 12/Sep/20 22:06 Start Date: 12/Sep/20 22:06 Worklog Time Spent: 10m Work Description: pvargacl commented on pull request #1220: URL: https://github.com/apache/hive/pull/1220#issuecomment-690929320 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483581) Time Spent: 2h 10m (was: 2h) > Create a new config to skip all locks > - > > Key: HIVE-23413 > URL: https://issues.apache.org/jira/browse/HIVE-23413 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > From time to time a query is blocked on locks when it should not be. > To have a quick workaround for this, we should have a config that the user > can set in the session to disable acquiring/checking locks, so we can provide > relief immediately and then investigate and fix the root cause later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23665) Rewrite last_value to first_value to enable streaming results
[ https://issues.apache.org/jira/browse/HIVE-23665?focusedWorklogId=483601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483601 ] ASF GitHub Bot logged work on HIVE-23665: - Author: ASF GitHub Bot Created on: 13/Sep/20 00:48 Start Date: 13/Sep/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1177: URL: https://github.com/apache/hive/pull/1177 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483601) Time Spent: 1.5h (was: 1h 20m) > Rewrite last_value to first_value to enable streaming results > - > > Key: HIVE-23665 > URL: https://issues.apache.org/jira/browse/HIVE-23665 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23665.1.patch, HIVE-23665.2.patch, > HIVE-23665.3.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > Rewrite last_value to first_value to enable streaming results > last_value cannot be streamed because the intermediate results need to be > buffered to determine the window result till we get the last row in the > window. But if we can rewrite to first_value we can stream the results, > although the order of results will not be guaranteed (also not important) -- This message was sent by Atlassian Jira (v8.3.4#803005)
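The last_value-to-first_value rewrite above relies on a simple symmetry: the last value under an ordering equals the first value under the reversed ordering, so a non-streamable last_value can be answered by a streamable first_value. In plain list terms (a conceptual sketch, not Hive's windowing code):

```java
import java.util.Comparator;
import java.util.List;

public class LastToFirst {
    // first_value over ascending order: head of the ascending sort.
    public static int firstValueAsc(List<Integer> xs) {
        return xs.stream().sorted().findFirst().orElseThrow();
    }

    // last_value over ascending order == first_value over descending order,
    // which is the identity the rewrite exploits: the first row can be
    // emitted as soon as it is seen, so results stream.
    public static int lastValueAsc(List<Integer> xs) {
        return xs.stream().sorted(Comparator.reverseOrder()).findFirst().orElseThrow();
    }
}
```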
[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console
[ https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=483599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483599 ] ASF GitHub Bot logged work on HIVE-23779: - Author: ASF GitHub Bot Created on: 13/Sep/20 00:48 Start Date: 13/Sep/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1191: URL: https://github.com/apache/hive/pull/1191 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483599) Time Spent: 1h (was: 50m) > BasicStatsTask Info is not getting printed in beeline console > - > > Key: HIVE-23779 > URL: https://issues.apache.org/jira/browse/HIVE-23779 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Assignee: Naresh P R >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > After HIVE-16061, partition basic stats are not getting printed in beeline > console. > {code:java} > INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, > totalSize=14607, rawDataSize=0]{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23802) “merge files” job was submitted to default queue when set hive.merge.tezfiles to true
[ https://issues.apache.org/jira/browse/HIVE-23802?focusedWorklogId=483600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483600 ] ASF GitHub Bot logged work on HIVE-23802: - Author: ASF GitHub Bot Created on: 13/Sep/20 00:48 Start Date: 13/Sep/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1206: URL: https://github.com/apache/hive/pull/1206 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483600) Time Spent: 0.5h (was: 20m) > “merge files” job was submitted to default queue when hive.merge.tezfiles > is set to true > > > Key: HIVE-23802 > URL: https://issues.apache.org/jira/browse/HIVE-23802 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.1.0 >Reporter: gaozhan ding >Assignee: gaozhan ding >Priority: Major > Labels: pull-request-available > Attachments: 15940042679272.png, HIVE-23802.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > We use Tez as the query engine. When hive.merge.tezfiles is set to true, the merge > files task, which follows the original task, will be submitted to the default queue > rather than the same queue as the original task. > I studied this issue for days and found that every time a container is started, > "tez.queue.name" will be unset in the current session. The code is as below: > {code:java} > // TezSessionState.startSessionAndContainers() > // sessionState.getQueueName() comes from cluster wide configured queue names. > // sessionState.getConf().get("tez.queue.name") is explicitly set by user in > a session. > // TezSessionPoolManager sets tez.queue.name if user has specified one or > use the one from > // cluster wide queue names. > // There is no way to differentiate how this was set (user vs system). 
> // Unset this after opening the session so that reopening of session uses > the correct queue > // names i.e, if client has not died and if the user has explicitly set a > queue name > // then reopened session will use user specified queue name else default > cluster queue names. > conf.unset(TezConfiguration.TEZ_QUEUE_NAME); > {code} > So after the original task is submitted to YARN, "tez.queue.name" will be unset. > While starting the merge file task, it will try to use the same session as the original job, but gets false because tez.queue.name was unset. It seems we should not > unset this property. > {code:java} > // TezSessionPoolManager.canWorkWithSameSession() > if (!session.isDefault()) { > String queueName = session.getQueueName(); > String confQueueName = conf.get(TezConfiguration.TEZ_QUEUE_NAME); > LOG.info("Current queue name is " + queueName + " incoming queue name is " > + confQueueName); > return (queueName == null) ? confQueueName == null : > queueName.equals(confQueueName); > } else { > // this session should never be a default session unless something has > messed up. > throw new HiveException("The pool session " + session + " should have been > returned to the pool"); > } > {code} > !15940042679272.png! > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
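The null-unsafe queue comparison in canWorkWithSameSession is the crux: once TEZ_QUEUE_NAME is unset, confQueueName comes back null and the equality check fails even though the merge task should share the original job's session. A minimal sketch of the comparison (the method name is illustrative, not Hive's API):

```java
import java.util.Objects;

public class QueueMatch {
    // Mirrors the ternary quoted above from TezSessionPoolManager:
    //   (queueName == null) ? confQueueName == null : queueName.equals(confQueueName)
    // Objects.equals expresses the same null-safe comparison in one call.
    public static boolean sameQueue(String sessionQueue, String requestedQueue) {
        return Objects.equals(sessionQueue, requestedQueue);
    }

    public static void main(String[] args) {
        // Hypothetical scenario: the original job ran on queue "etl"; after
        // tez.queue.name is unset the merge task requests null, session reuse
        // fails, and the merge job falls back to the default queue.
        System.out.println(sameQueue("etl", "etl")); // session reused
        System.out.println(sameQueue("etl", null));  // reuse refused
    }
}
```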
[jira] [Work logged] (HIVE-23731) Review of AvroInstance Cache
[ https://issues.apache.org/jira/browse/HIVE-23731?focusedWorklogId=483598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483598 ] ASF GitHub Bot logged work on HIVE-23731: - Author: ASF GitHub Bot Created on: 13/Sep/20 00:48 Start Date: 13/Sep/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1153: URL: https://github.com/apache/hive/pull/1153 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483598) Time Spent: 1h 10m (was: 1h) > Review of AvroInstance Cache > > > Key: HIVE-23731 > URL: https://issues.apache.org/jira/browse/HIVE-23731 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22634) Improper SemanticException when filter is optimized to False on a partition table
[ https://issues.apache.org/jira/browse/HIVE-22634?focusedWorklogId=483597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483597 ] ASF GitHub Bot logged work on HIVE-22634: - Author: ASF GitHub Bot Created on: 13/Sep/20 00:48 Start Date: 13/Sep/20 00:48 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #865: URL: https://github.com/apache/hive/pull/865 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483597) Time Spent: 1h 20m (was: 1h 10m) > Improper SemanticException when filter is optimized to False on a partition > table > --- > > Key: HIVE-22634 > URL: https://issues.apache.org/jira/browse/HIVE-22634 > Project: Hive > Issue Type: Improvement >Reporter: EdisonWang >Assignee: EdisonWang >Priority: Minor > Labels: pull-request-available > Attachments: HIVE-22634.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > When the filter is optimized to False on a partition table, it will throw an > improper SemanticException reporting that no partition predicate was > found. > The steps to reproduce are > {code:java} > set hive.strict.checks.no.partition.filter=true; > CREATE TABLE test(id int, name string) PARTITIONED BY (`date` string); > select * from test where `date` = '20191201' and 1<>1; > {code} > > The above SQL will throw a "Queries against partitioned tables without a > partition filter" exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses
[ https://issues.apache.org/jira/browse/HIVE-24154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-24154: -- > Missing simplification opportunity with IN and EQUALS clauses > - > > Key: HIVE-24154 > URL: https://issues.apache.org/jira/browse/HIVE-24154 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Minor > > For instance, in perf driver CBO query 74, there are several filters that > could be simplified further: > {code} > HiveFilter(condition=[AND(=($1, 1999), IN($1, 1998, 1999))]) > {code} > This may lead to incorrect estimates and unnecessary execution time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses
[ https://issues.apache.org/jira/browse/HIVE-24154?focusedWorklogId=483602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483602 ] ASF GitHub Bot logged work on HIVE-24154: - Author: ASF GitHub Bot Created on: 13/Sep/20 01:24 Start Date: 13/Sep/20 01:24 Worklog Time Spent: 10m Work Description: jcamachor opened a new pull request #1492: URL: https://github.com/apache/hive/pull/1492 …uses ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 483602) Remaining Estimate: 0h Time Spent: 10m > Missing simplification opportunity with IN and EQUALS clauses > - > > Key: HIVE-24154 > URL: https://issues.apache.org/jira/browse/HIVE-24154 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > For instance, in perf driver CBO query 74, there are several filters that > could be simplified further: > {code} > HiveFilter(condition=[AND(=($1, 1999), IN($1, 1998, 1999))]) > {code} > This may lead to incorrect estimates and unnecessary execution time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24154) Missing simplification opportunity with IN and EQUALS clauses
[ https://issues.apache.org/jira/browse/HIVE-24154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24154: -- Labels: pull-request-available (was: ) > Missing simplification opportunity with IN and EQUALS clauses > - > > Key: HIVE-24154 > URL: https://issues.apache.org/jira/browse/HIVE-24154 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > For instance, in perf driver CBO query 74, there are several filters that > could be simplified further: > {code} > HiveFilter(condition=[AND(=($1, 1999), IN($1, 1998, 1999))]) > {code} > This may lead to incorrect estimates and unnecessary execution time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
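The missed simplification in HIVE-24154 is set intersection: AND(=($1, c), IN($1, S)) reduces to =($1, c) when c is a member of S, and to FALSE otherwise. A conceptual sketch in plain Java, not Calcite's RexSimplify API:

```java
import java.util.Optional;
import java.util.Set;

public class InEqSimplify {
    // Returns the surviving equality value, or empty for the always-false case.
    public static Optional<Integer> simplify(int eqValue, Set<Integer> inValues) {
        // AND(=(x, c), IN(x, S))  ==>  =(x, c) if c is in S, else FALSE
        return inValues.contains(eqValue) ? Optional.of(eqValue) : Optional.empty();
    }

    public static void main(String[] args) {
        // The perf-driver example: AND(=($1, 1999), IN($1, 1998, 1999))
        // collapses to =($1, 1999).
        System.out.println(simplify(1999, Set.of(1998, 1999)));
        // A disjoint pair would collapse to FALSE instead.
        System.out.println(simplify(1997, Set.of(1998, 1999)));
    }
}
```

Folding the IN into the equality before costing means the estimator sees one selective predicate instead of two overlapping ones, which is the estimate improvement the ticket targets.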