[jira] [Created] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.
Arko Sharma created HIVE-24675:
-------------------------------

Summary: Handle external table replication for HA with same NS and lazy copy.
Key: HIVE-24675
URL: https://issues.apache.org/jira/browse/HIVE-24675
Project: Hive
Issue Type: Bug
Reporter: Arko Sharma
Assignee: Arko Sharma

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (HIVE-24674) Set repl.source.for property in the db if db is under replication
Ayush Saxena created HIVE-24674:
--------------------------------

Summary: Set repl.source.for property in the db if db is under replication
Key: HIVE-24674
URL: https://issues.apache.org/jira/browse/HIVE-24674
Project: Hive
Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena

Set the repl.source.for property on the database, in case it is not already set, when the database is under replication.
[jira] [Created] (HIVE-24673) Migrate NegativeCliDriver and NegativeMinimrCliDriver to llap
Mustafa İman created HIVE-24673:
--------------------------------

Summary: Migrate NegativeCliDriver and NegativeMinimrCliDriver to llap
Key: HIVE-24673
URL: https://issues.apache.org/jira/browse/HIVE-24673
Project: Hive
Issue Type: Improvement
Reporter: Mustafa İman
Assignee: Mustafa İman

These test drivers should run on llap. Otherwise we can run into situations where certain queries correctly fail on MapReduce but not on Tez. Also, it is better if the negative cli drivers do not mask "Caused by" lines in test output; otherwise a query may start to fail for a reason other than the expected one without us realizing it.
[jira] [Created] (HIVE-24672) compute_stats_long.q fails for wrong reasons
Mustafa İman created HIVE-24672:
--------------------------------

Summary: compute_stats_long.q fails for wrong reasons
Key: HIVE-24672
URL: https://issues.apache.org/jira/browse/HIVE-24672
Project: Hive
Issue Type: Bug
Reporter: Mustafa İman
Assignee: Mustafa İman

TestNegativeCliDriver[compute_stats_long] intends to test that fmsketch has a hard limit on the number of bit vectors (1024). However, the test currently fails for the following wrong reason:
{code:java}
Caused by: java.lang.RuntimeException: Can not recognize 1
 at org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimatorFactory.getEmptyNumDistinctValueEstimator(NumDistinctValueEstimatorFactory.java:71)
{code}
Instead it should fail with:
{code:java}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: The maximum allowed value for number of bit vectors is 1024, but was passed 1 bit vectors
 at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeBitVectorFMSketch$NumericStatsEvaluator.iterate(GenericUDAFComputeBitVectorFMSketch.java:125) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
{code}
Since this function is superseded by compute_bit_vector_fm, it is best if we add the same test for compute_bit_vector_fm too.
[jira] [Created] (HIVE-24671) Semijoin removal should not run into an NPE in case the SJ filter contains a UDF
Zoltan Haindrich created HIVE-24671:
------------------------------------

Summary: Semijoin removal should not run into an NPE in case the SJ filter contains a UDF
Key: HIVE-24671
URL: https://issues.apache.org/jira/browse/HIVE-24671
Project: Hive
Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich

{code}
set hive.optimize.index.filter=true;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.vectorized.execution.enabled=true;

drop table if exists t1;
drop table if exists t2;

create table t1 ( v1 string );
create table t2 ( v2 string );

insert into t1 values ('e123456789'),('x123456789');
insert into t2 values ('123'), ('e123456789');

-- alter table t1 update statistics set ('numRows'='9348843574','rawDataSize'='0');
alter table t1 update statistics set ('numRows'='934884357','rawDataSize'='0');
alter table t2 update statistics set ('numRows'='9348','rawDataSize'='0');

alter table t1 update statistics for column v1 set ('numNulls'='0','numDVs'='15541355','avgColLen'='10.0','maxColLen'='10');
alter table t2 update statistics for column v2 set ('numNulls'='0','numDVs'='155','avgColLen'='5.0','maxColLen'='10');
-- alter table t2 update statistics for column k set ('numNulls'='0','numDVs'='13876472','avgColLen'='15.9836','maxColLen'='16');

explain select v1,v2 from t1 join t2 on (substr(v1,1,3) = v2);
{code}
results in:
{code}
java.lang.NullPointerException
 at org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1944)
 at org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:544)
 at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:240)
 at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:161)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:12467)
 at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12672)
 at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
 at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
 at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
 [...]
{code}
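For illustration only, a toy model of the guard such a fix might need (none of these class or method names are Hive's): the benefit estimation presumably looks up column statistics for the semijoin source key, and when that key is wrapped in a UDF such as substr(v1,1,3) rather than being a bare column, the lookup can come back null and must be handled instead of dereferenced.

```java
import java.util.Map;

public class SemijoinBenefitSketch {
    // Hypothetical stats table keyed by bare column name; the NDV values are
    // taken from the repro script above. An expression like "substr(v1,1,3)"
    // has no entry, modeling the missing-statistics case behind the NPE.
    static final Map<String, Long> COLUMN_NDV = Map.of("v1", 15541355L, "v2", 155L);

    static long benefitOrSkip(String joinKeyExpr) {
        Long ndv = COLUMN_NDV.get(joinKeyExpr);   // null for UDF-wrapped keys
        if (ndv == null) {
            return -1;   // unknown stats: skip the removal decision instead of NPE-ing
        }
        return ndv;
    }

    public static void main(String[] args) {
        System.out.println(benefitOrSkip("v2"));             // bare column: stats found
        System.out.println(benefitOrSkip("substr(v1,1,3)")); // UDF key: guarded skip
    }
}
```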
[jira] [Created] (HIVE-24670) DeleteReaderValue should not allocate empty vectors for delete delta files
Ádám Szita created HIVE-24670:
------------------------------

Summary: DeleteReaderValue should not allocate empty vectors for delete delta files
Key: HIVE-24670
URL: https://issues.apache.org/jira/browse/HIVE-24670
Project: Hive
Issue Type: Improvement
Reporter: Ádám Szita
Assignee: Ádám Szita

If delete delta caching is turned off, the plain record reader inside DeleteReaderValue allocates a batch with a schema that is equivalent to that of an insert delta. This is unnecessary as the struct part in a delete delta file is always empty. In cases where we have many delete delta files (e.g. due to compaction failures) and a wide table definition (e.g. 200+ cols) this puts a significant amount of memory pressure on the executor, while these empty structures will never be filled or otherwise utilized. I propose we specify an ACID schema with an empty struct part to this record reader to counter this.
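The memory argument above can be sketched with back-of-the-envelope arithmetic. This is purely illustrative (not Hive's actual allocator): the batch size matches VectorizedRowBatch's default of 1024, each column vector is approximated as one long per row, and the five ACID metadata columns are operation, originalTransaction, bucket, rowId and currentTransaction.

```java
public class DeleteDeltaBatchSketch {
    // Illustrative estimate of per-batch footprint when a delete-delta reader
    // allocates vectors for the full row struct versus an empty one.
    static final int BATCH_SIZE = 1024;      // default VectorizedRowBatch size
    static final int BYTES_PER_LONG = 8;     // crude stand-in for one column vector cell

    static long batchBytes(int rowStructCols) {
        int totalCols = 5 + rowStructCols;   // ACID metadata columns + row payload
        return (long) totalCols * BATCH_SIZE * BYTES_PER_LONG;
    }

    public static void main(String[] args) {
        // A 200-column table pays for payload vectors the delete delta never fills.
        System.out.println("wide schema:  " + batchBytes(200) + " bytes/batch"); // 1679360
        System.out.println("empty struct: " + batchBytes(0) + " bytes/batch");   // 40960
    }
}
```

With many delete delta files open at once, that roughly 40x difference per batch is where the memory pressure comes from.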
[jira] [Created] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal
Peter Varga created HIVE-24669:
-------------------------------

Summary: Improve Filesystem usage in Hive::loadPartitionInternal
Key: HIVE-24669
URL: https://issues.apache.org/jira/browse/HIVE-24669
Project: Hive
Issue Type: Sub-task
Reporter: Peter Varga
Assignee: Peter Varga

* Use native recursive listing instead of doing it on the Hive side
* Reuse the file list determined for the writeNotificationlogs in quickstat generation
[jira] [Created] (HIVE-24668) Improve FileSystem usage in dynamic partition handling
Peter Varga created HIVE-24668:
-------------------------------

Summary: Improve FileSystem usage in dynamic partition handling
Key: HIVE-24668
URL: https://issues.apache.org/jira/browse/HIVE-24668
Project: Hive
Issue Type: Improvement
Reporter: Peter Varga
Assignee: Peter Varga

Possible improvements:
* In the MoveTask process, both getFullDPSpecs and later Hive::getValidPartitionsInPath do a listing for dynamic partitions in the table; the result of the first can be reused
* Hive::listFilesCreatedByQuery does the recursive listing on the Hive side; the native recursive listing should be used instead
* When we add a new partition we populate the quickstats, which does another listing for the new partition; the files already collected for the writeNotificationlogs can be used
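The "native recursive listing" point above can be illustrated with a small self-contained sketch. It uses java.nio as a stand-in for Hadoop's FileSystem API; the Hadoop analogue would be FileSystem.listFiles(path, true), which pushes the recursion into the filesystem layer (a single bulk operation on stores like S3) instead of issuing one listing call per directory from the client.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class RecursiveListingSketch {

    // Client-side recursion: one listing call per directory, as in
    // Hive-side recursive listing.
    static List<Path> manualRecursive(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                if (Files.isDirectory(p)) files.addAll(manualRecursive(p));
                else files.add(p);
            }
        }
        return files;
    }

    // Native recursive walk: a single call, traversal handled by the library.
    static List<Path> nativeRecursive(Path dir) throws IOException {
        try (Stream<Path> s = Files.walk(dir)) {
            return s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Mimic a partitioned table layout: root/p=1/f1, root/p=2/f2
        Path root = Files.createTempDirectory("part");
        Files.createFile(Files.createDirectory(root.resolve("p=1")).resolve("f1"));
        Files.createFile(Files.createDirectory(root.resolve("p=2")).resolve("f2"));
        System.out.println(manualRecursive(root).size() + " == " + nativeRecursive(root).size());
    }
}
```

Both approaches return the same files; the difference is the number of round trips to the filesystem, which is what the ticket targets.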
[jira] [Created] (HIVE-24667) Truncate optimization to avoid unnecessary per partition DB get operations
Denys Kuzmenko created HIVE-24667:
----------------------------------

Summary: Truncate optimization to avoid unnecessary per partition DB get operations
Key: HIVE-24667
URL: https://issues.apache.org/jira/browse/HIVE-24667
Project: Hive
Issue Type: Sub-task
Reporter: Denys Kuzmenko
[jira] [Created] (HIVE-24666) Vectorized UDFToBoolean may be unable to filter rows if input is string
Zhihua Deng created HIVE-24666:
-------------------------------

Summary: Vectorized UDFToBoolean may be unable to filter rows if input is string
Key: HIVE-24666
URL: https://issues.apache.org/jira/browse/HIVE-24666
Project: Hive
Issue Type: Bug
Components: Vectorization
Reporter: Zhihua Deng
Assignee: Zhihua Deng

If we use a cast to boolean in a where condition to filter rows, in vectorized execution the filter fails to filter the rows. Steps to reproduce:
{code:java}
create table vtb (key string, value string);
insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 'valoff'),('no','valno'),('vk', 'valvk');
select distinct value from vtb where cast(key as boolean);
{code}
It seems we don't generate a SelectColumnIsTrue to filter the rows when the cast input type is string:
https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996
[jira] [Created] (HIVE-24665) Add commitAlterTable method to the HiveMetaHook interface
Marton Bod created HIVE-24665:
------------------------------

Summary: Add commitAlterTable method to the HiveMetaHook interface
Key: HIVE-24665
URL: https://issues.apache.org/jira/browse/HIVE-24665
Project: Hive
Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod

Currently we have pre and post hooks for create table and drop table commands, but only a pre hook for alter table commands. We should add a post hook as well (with a default implementation).
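A minimal sketch of the proposed shape, assuming only what the ticket states (a commitAlterTable post hook with a default implementation). The surrounding interface and method names here are illustrative, not Hive's actual HiveMetaHook signatures; the point is that a default method keeps every existing implementor source- and binary-compatible.

```java
public class AlterHookSketch {

    // Hypothetical stand-in for HiveMetaHook: a pre hook plus a new
    // post-alter hook that defaults to a no-op.
    interface MetaHook {
        void preAlterTable(String tableName);
        default void commitAlterTable(String tableName) {
            // no-op default: legacy implementors need not change
        }
    }

    // Legacy implementor: compiles unchanged because commitAlterTable has a default.
    static class LegacyHook implements MetaHook {
        public void preAlterTable(String tableName) { }
    }

    static String runAlter(MetaHook hook, String table) {
        hook.preAlterTable(table);                 // existing pre hook
        String result = "altered:" + table;        // the metadata change itself
        hook.commitAlterTable(table);              // new post hook fires afterwards
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runAlter(new LegacyHook(), "t1"));
    }
}
```

Implementors that do need post-alter behavior (e.g. syncing an external catalog) simply override commitAlterTable.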