[jira] [Created] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-20 Thread Arko Sharma (Jira)
Arko Sharma created HIVE-24675:
--

 Summary: Handle external table replication for HA with same NS and 
lazy copy.
 Key: HIVE-24675
 URL: https://issues.apache.org/jira/browse/HIVE-24675
 Project: Hive
  Issue Type: Bug
Reporter: Arko Sharma
Assignee: Arko Sharma






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24674) Set repl.source.for property in the db if db is under replication

2021-01-20 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-24674:
---

 Summary: Set repl.source.for property in the db if db is under 
replication
 Key: HIVE-24674
 URL: https://issues.apache.org/jira/browse/HIVE-24674
 Project: Hive
  Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Add the repl.source.for property to the database, if not already set, when the database is under replication.
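A minimal sketch of the "set only if absent" behavior described above, using a plain map as a stand-in for the database's parameters (the helper name and policy id are illustrative, not Hive's actual API):

```java
import java.util.HashMap;
import java.util.Map;

public class ReplSourceFor {
    // Set repl.source.for only when it is not already present,
    // so an existing replication policy id is never overwritten.
    static void ensureReplSource(Map<String, String> dbParams, String policyId) {
        dbParams.putIfAbsent("repl.source.for", policyId);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        ensureReplSource(params, "policy1");
        ensureReplSource(params, "policy2"); // already set; unchanged
        System.out.println(params.get("repl.source.for")); // policy1
    }
}
```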



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24673) Migrate NegativeCliDriver and NegativeMinimrCliDriver to llap

2021-01-20 Thread Jira
Mustafa İman created HIVE-24673:
---

 Summary: Migrate NegativeCliDriver and NegativeMinimrCliDriver to 
llap
 Key: HIVE-24673
 URL: https://issues.apache.org/jira/browse/HIVE-24673
 Project: Hive
  Issue Type: Improvement
Reporter: Mustafa İman
Assignee: Mustafa İman


These test drivers should run on llap. Otherwise we can run into situations where certain queries correctly fail on MapReduce but not on Tez.

Also, it is better if the negative cli drivers do not mask "Caused by" lines in the test output. Otherwise, a query may start to fail for a reason other than the expected one without us realizing it.
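To illustrate the masking concern, here is a small sketch of output normalization that masks volatile details (line numbers) while keeping "Caused by" lines intact, so a changed failure cause shows up as a diff against the golden file. This is illustrative only, not the actual q-test masking code:

```java
import java.util.List;
import java.util.stream.Collectors;

public class OutputMasker {
    // Mask volatile line numbers in stack frames but never drop the
    // "Caused by" lines themselves, so the failure cause stays visible.
    static List<String> mask(List<String> lines) {
        return lines.stream()
            .map(l -> l.replaceAll(":\\d+\\)", ":###)"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> out = mask(List.of(
            "Caused by: java.lang.RuntimeException: boom",
            "\tat org.example.Foo.bar(Foo.java:42)"));
        out.forEach(System.out::println);
    }
}
```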



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24672) compute_stats_long.q fails for wrong reasons

2021-01-20 Thread Jira
Mustafa İman created HIVE-24672:
---

 Summary: compute_stats_long.q fails for wrong reasons
 Key: HIVE-24672
 URL: https://issues.apache.org/jira/browse/HIVE-24672
 Project: Hive
  Issue Type: Bug
Reporter: Mustafa İman
Assignee: Mustafa İman


TestNegativeCliDriver[compute_stats_long] intends to test that fmsketch has a hard limit on the number of bit vectors (1024). However, the test currently fails for the following wrong reason:
{code:java}
Caused by: java.lang.RuntimeException: Can not recognize 1
	at org.apache.hadoop.hive.common.ndv.NumDistinctValueEstimatorFactory.getEmptyNumDistinctValueEstimator(NumDistinctValueEstimatorFactory.java:71)
{code}
Instead, it should fail with:
{code:java}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: The maximum allowed value for number of bit vectors is 1024, but was passed 1 bit vectors
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeBitVectorFMSketch$NumericStatsEvaluator.iterate(GenericUDAFComputeBitVectorFMSketch.java:125) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
{code}
Since this function is superseded by compute_bit_vector_fm, it is best to add the same test for compute_bit_vector_fm too.
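A minimal sketch, not Hive's actual code, of the bound check that the expected HiveException message describes; the lower bound and helper names here are assumptions for illustration:

```java
public class BitVectorCheck {
    static final int MAX_NUM_BIT_VECTORS = 1024;
    static final int MIN_NUM_BIT_VECTORS = 2; // assumed lower bound; 1 is rejected per the error above

    // Returns an error message in the spirit of the expected HiveException,
    // or null when the value is within bounds.
    static String validate(int numBitVectors) {
        if (numBitVectors < MIN_NUM_BIT_VECTORS || numBitVectors > MAX_NUM_BIT_VECTORS) {
            return "The maximum allowed value for number of bit vectors is "
                    + MAX_NUM_BIT_VECTORS + ", but was passed " + numBitVectors + " bit vectors";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate(1));    // rejected with a message
        System.out.println(validate(1024)); // accepted -> null
    }
}
```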



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24671) Semijoinremoval should not run into an NPE in case the SJ filter contains an UDF

2021-01-20 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24671:
---

 Summary: Semijoinremoval should not run into an NPE in case the SJ 
filter contains an UDF
 Key: HIVE-24671
 URL: https://issues.apache.org/jira/browse/HIVE-24671
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


{code}
set hive.optimize.index.filter=true;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.vectorized.execution.enabled=true;



drop table if exists t1;
drop table if exists t2;

create table t1 (
v1 string
);

create table t2 (
v2 string
);

insert into t1 values ('e123456789'),('x123456789');
insert into t2 values
('123'),
 ('e123456789');


-- alter table t1 update statistics set 
('numRows'='9348843574','rawDataSize'='0');

alter table t1 update statistics set ('numRows'='934884357','rawDataSize'='0');
alter table t2 update statistics set ('numRows'='9348','rawDataSize'='0');

alter table t1 update statistics for column v1 set 
('numNulls'='0','numDVs'='15541355','avgColLen'='10.0','maxColLen'='10');
alter table t2 update statistics for column v2 set 
('numNulls'='0','numDVs'='155','avgColLen'='5.0','maxColLen'='10');
-- alter table t2 update statistics for column k set 
('numNulls'='0','numDVs'='13876472','avgColLen'='15.9836','maxColLen'='16');

explain
select v1,v2 from t1 join t2 on (substr(v1,1,3) = v2);
{code}

results in:
{code}
 java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1944)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:544)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:240)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:161)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:12467)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12672)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:455)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
[...]
{code}
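The NPE suggests a missing null guard in the benefit computation. A hypothetical sketch of the kind of guard needed: when the semijoin filter side is a UDF expression (e.g. substr(v1,1,3)) rather than a bare column, a column-statistics lookup can return null and must be checked before use. All names and types below are illustrative, not Hive's:

```java
import java.util.HashMap;
import java.util.Map;

public class SemijoinBenefit {
    // Stand-in for per-column NDV statistics.
    static final class ColStats {
        final long numDistinct;
        ColStats(long numDistinct) { this.numDistinct = numDistinct; }
    }

    // Estimate the filtering benefit; UDF-wrapped expressions have no direct
    // column stats, so fall back instead of dereferencing null (the reported NPE).
    static double benefit(String exprColumn, Map<String, ColStats> statsByColumn, long bigTableRows) {
        ColStats stats = statsByColumn.get(exprColumn);
        if (stats == null) {
            return 0.0; // unknown benefit; do not remove the semijoin based on it
        }
        return (double) bigTableRows / stats.numDistinct;
    }

    public static void main(String[] args) {
        Map<String, ColStats> stats = new HashMap<>();
        stats.put("v2", new ColStats(155));
        System.out.println(benefit("v2", stats, 934_884_357L));
        System.out.println(benefit("substr(v1,1,3)", stats, 934_884_357L)); // 0.0, no NPE
    }
}
```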



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24670) DeleteReaderValue should not allocate empty vectors for delete delta files

2021-01-20 Thread Jira
Ádám Szita created HIVE-24670:
-

 Summary: DeleteReaderValue should not allocate empty vectors for 
delete delta files
 Key: HIVE-24670
 URL: https://issues.apache.org/jira/browse/HIVE-24670
 Project: Hive
  Issue Type: Improvement
Reporter: Ádám Szita
Assignee: Ádám Szita


If delete delta caching is turned off, the plain record reader inside 
DeleteReaderValue allocates a batch with a schema that is equivalent to that of 
an insert delta.

This is unnecessary as the struct part in a delete delta file is always empty. 
In cases where we have many delete delta files (e.g. due to compaction 
failures) and a wide table definition (e.g. 200+ cols) this puts a significant 
amount of memory pressure on the executor, while these empty structures will 
never be filled or otherwise utilized.

I propose we specify an ACID schema with an empty struct part to this record 
reader to counter this.
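A back-of-envelope sketch of why the empty struct part matters, assuming long-sized vectors and the five ACID metadata columns; the numbers are illustrative, not measurements:

```java
public class DeleteDeltaAlloc {
    static final int BATCH_SIZE = 1024;
    static final int BYTES_PER_LONG_VECTOR = BATCH_SIZE * 8;

    // Rough per-batch allocation: ACID metadata columns plus the "row" struct columns.
    static long batchBytes(int numAcidMetaCols, int numRowStructCols) {
        return (long) (numAcidMetaCols + numRowStructCols) * BYTES_PER_LONG_VECTOR;
    }

    public static void main(String[] args) {
        long insertShaped = batchBytes(5, 200); // schema equivalent to an insert delta, 200-col table
        long emptyStruct = batchBytes(5, 0);    // proposed: empty struct part
        System.out.println(insertShaped + " bytes vs " + emptyStruct + " bytes per batch");
    }
}
```

Multiplied across many delete delta files and readers, the difference accounts for the memory pressure described above.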



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24669) Improve Filesystem usage in Hive::loadPartitionInternal

2021-01-20 Thread Peter Varga (Jira)
Peter Varga created HIVE-24669:
--

 Summary: Improve Filesystem usage in Hive::loadPartitionInternal
 Key: HIVE-24669
 URL: https://issues.apache.org/jira/browse/HIVE-24669
 Project: Hive
  Issue Type: Sub-task
Reporter: Peter Varga
Assignee: Peter Varga


* Use the native recursive listing instead of doing it on the Hive side
* Reuse the file list determined for writeNotificationlogs in quickstat generation
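The first bullet can be sketched with java.nio as a stand-in for Hadoop's FileSystem: one native recursive-listing call (analogous to FileSystem.listFiles(path, true)) replaces a per-directory loop on the Hive side:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveListing {
    // Single recursive walk delegated to the filesystem layer,
    // rather than Hive issuing one listing call per directory.
    public static List<Path> listAllFiles(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("table");
        Path partition = Files.createDirectories(root.resolve("p=1"));
        Files.createFile(partition.resolve("000000_0"));
        System.out.println(listAllFiles(root).size()); // 1
    }
}
```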



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24668) Improve FileSystem usage in dynamic partition handling

2021-01-20 Thread Peter Varga (Jira)
Peter Varga created HIVE-24668:
--

 Summary: Improve FileSystem usage in dynamic partition handling
 Key: HIVE-24668
 URL: https://issues.apache.org/jira/browse/HIVE-24668
 Project: Hive
  Issue Type: Improvement
Reporter: Peter Varga
Assignee: Peter Varga


Possible improvements:
* In the MoveTask process, both getFullDPSpecs and, later, Hive::getValidPartitionsInPath do a listing for the dynamic partitions in the table; the result of the first can be reused
* Hive::listFilesCreatedByQuery does the recursive listing on the Hive side; the native recursive listing should be used instead
* When we add a new partition we populate the quickstats, which does another listing for the new partition; the files are already collected for writeNotificationlogs and can be reused
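The reuse idea in the bullets above can be sketched as a listing computed once and handed to each consumer (partition validation, notification logs, quickstats) instead of re-listing; all names here are illustrative, not Hive's:

```java
import java.util.List;
import java.util.function.Supplier;

public class ListingOnce {
    static final class CachedListing {
        private final Supplier<List<String>> lister;
        private List<String> cached;
        private int listCalls = 0;

        CachedListing(Supplier<List<String>> lister) { this.lister = lister; }

        // Hit the filesystem only on first use; later consumers reuse the result.
        List<String> files() {
            if (cached == null) {
                cached = lister.get();
                listCalls++;
            }
            return cached;
        }

        int listCalls() { return listCalls; }
    }

    public static void main(String[] args) {
        CachedListing listing = new CachedListing(() -> List.of("p=1/000000_0"));
        listing.files(); // partition validation
        listing.files(); // notification logs
        listing.files(); // quickstats
        System.out.println(listing.listCalls()); // 1
    }
}
```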



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24667) Truncate optimization to avoid unnecessary per partition DB get operations

2021-01-20 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-24667:
-

 Summary: Truncate optimization to avoid unnecessary per partition 
DB get operations
 Key: HIVE-24667
 URL: https://issues.apache.org/jira/browse/HIVE-24667
 Project: Hive
  Issue Type: Sub-task
Reporter: Denys Kuzmenko






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-20 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-24666:
--

 Summary: Vectorized UDFToBoolean may unable to filter rows if 
input is string
 Key: HIVE-24666
 URL: https://issues.apache.org/jira/browse/HIVE-24666
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Zhihua Deng
Assignee: Zhihua Deng


If we use a cast to boolean in where conditions to filter rows, in vectorized execution the filter is unable to filter the rows. Steps to reproduce:
{code:java}
create table vtb (key string, value string);
insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 
'valoff'),('no','valno'),('vk', 'valvk');
select distinct value from vtb where cast(key as boolean); {code}
It seems we do not generate a SelectColumnIsTrue to filter the rows if the cast type is string:
 
https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996
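A minimal sketch (not Hive's vectorization code) of what a SelectColumnIsTrue-style filter does: shrink the batch's selected rows to those whose boolean column value is true. If this step is missing after the string-to-boolean cast, every row passes through, which matches the reported behavior:

```java
public class SelectColumnIsTrueSketch {
    // vector holds 1 (true) or 0 (false) per row; selected holds the row
    // indices currently in play. Returns the new selected-row count.
    static int filter(long[] vector, int[] selected, int size) {
        int newSize = 0;
        for (int i = 0; i < size; i++) {
            int row = selected[i];
            if (vector[row] == 1) {
                selected[newSize++] = row; // keep only rows evaluating to true
            }
        }
        return newSize;
    }

    public static void main(String[] args) {
        long[] boolCol = {0, 0, 0, 0, 1}; // only the last row is true
        int[] selected = {0, 1, 2, 3, 4};
        int n = filter(boolCol, selected, 5);
        System.out.println(n); // 1
    }
}
```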



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24665) Add commitAlterTable method to the HiveMetaHook interface

2021-01-20 Thread Marton Bod (Jira)
Marton Bod created HIVE-24665:
-

 Summary: Add commitAlterTable method to the HiveMetaHook interface
 Key: HIVE-24665
 URL: https://issues.apache.org/jira/browse/HIVE-24665
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


Currently we have pre and post hooks for create table and drop table commands, 
but only a pre hook for alter table commands. We should add a post hook as well 
(with a default implementation).
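A hedged sketch of the proposed shape: a default no-op commitAlterTable on the hook interface keeps existing implementations source-compatible. The method name comes from the issue title; the parameter types here are placeholders, not Hive's actual signatures:

```java
public class MetaHookSketch {
    interface MetaHook {
        void preAlterTable(String tableName);

        // Default implementation so existing hooks compile unchanged;
        // storage handlers can override to commit after the alter succeeds.
        default void commitAlterTable(String tableName) {
            // no-op by default
        }
    }

    public static void main(String[] args) {
        MetaHook legacy = table -> System.out.println("pre: " + table);
        legacy.preAlterTable("t");
        legacy.commitAlterTable("t"); // runs without the hook overriding it
    }
}
```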



--
This message was sent by Atlassian Jira
(v8.3.4#803005)