[jira] [Created] (HIVE-25803) URL Mapping appends hdfs:// even for LOCAL DIRECTORY ops

2021-12-13 Thread Soumitra Sulav (Jira)
Soumitra Sulav created HIVE-25803:
-

 Summary: URL Mapping appends hdfs:// even for LOCAL DIRECTORY ops
 Key: HIVE-25803
 URL: https://issues.apache.org/jira/browse/HIVE-25803
 Project: Hive
  Issue Type: Bug
  Components: Authorization, HiveServer2
Affects Versions: 4.0.0
Reporter: Soumitra Sulav
Assignee: Sai Hemanth Gantasala


Repro steps:

Connect to beeline

{code:java}
beeline -u "jdbc:hive2://quasar-pxlypi-2.quasar-pxlypi.root.hwx.site:10001/;principal=hive/_h...@root.hwx.site;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=VOAnRk5l4oXsg0upJ1ApscSuNksirOKgyhJvoPv2o4j;transportMode=http;httpPath=cliservice;"
{code}

 

Create a test table and run an insert into a local directory

{code:java}
> create table dual (id int); 
> insert overwrite local directory "/tmp/" select * from dual;
{code}


{code:java}
Error: Error while compiling statement: FAILED: HiveAccessControlException 
Permission denied: user [hrt_qa] does not have [ALL] privilege on 
[hdfs://ns1/tmp] (state=42000,code=4)
{code}


The URL mapping always prepends hdfs:// to the path, even when the operation
targets a local directory.
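
For illustration, the expected behaviour can be sketched as follows. The names DirectoryUriSketch and targetUri are hypothetical and are not Hive's authorization code. The sketch only assumes that a LOCAL DIRECTORY target should be checked under the file scheme, while a non-local target is qualified with the default filesystem (hdfs://ns1 in the error above).

```java
import java.net.URI;
import java.net.URISyntaxException;

class DirectoryUriSketch {
    /**
     * Hypothetical helper: builds the URI that a privilege check would see
     * for an INSERT OVERWRITE [LOCAL] DIRECTORY target.
     */
    static URI targetUri(String path, boolean isLocal, URI defaultFs) {
        try {
            if (isLocal) {
                // Local targets keep the file scheme and get no HDFS authority.
                return new URI("file", null, path, null, null);
            }
            // Non-local targets are qualified with the default filesystem.
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(), path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid directory path: " + path, e);
        }
    }
}
```

With this shape, the repro statement would be authorized against a file URI for /tmp/ rather than hdfs://ns1/tmp.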



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25802) Log4j2 Vulnerability in Hive Storage API

2021-12-13 Thread Nikhil Gupta (Jira)
Nikhil Gupta created HIVE-25802:
---

 Summary: Log4j2 Vulnerability in Hive Storage API
 Key: HIVE-25802
 URL: https://issues.apache.org/jira/browse/HIVE-25802
 Project: Hive
  Issue Type: Bug
  Components: storage-api
Affects Versions: 4.0.0
Reporter: Nikhil Gupta
 Fix For: 4.0.0


The Storage API branch also brings in a log4j2 dependency at version <= 2.14.1,
which can still expose Hive to the Log4j2 vulnerability.





[jira] [Created] (HIVE-25801) Custom queue settings is not honoured by Query based compaction StatsUpdater

2021-12-13 Thread Jira
László Végh created HIVE-25801:
--

 Summary: Custom queue settings is not honoured by Query based 
compaction StatsUpdater
 Key: HIVE-25801
 URL: https://issues.apache.org/jira/browse/HIVE-25801
 Project: Hive
  Issue Type: Bug
Reporter: László Végh


The {{hive.compactor.job.queue}} config limits the resources available for
compaction, so users can limit the effect of compaction on the cluster. However,
this setting does not affect stats collection, which uses Driver.

HIVE-25595 addresses the above issue for MR-based compaction. We need to
incorporate the same change for query-based compaction.
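
As a rough sketch of the idea, the queue configured for compaction could be copied into the configuration used for stats collection. Everything here is an assumption for illustration: java.util.Properties stands in for HiveConf, CompactionQueueSketch and statsConf are hypothetical names, and mapping the value onto mapreduce.job.queuename and tez.queue.name is a guess at what the execution engines would read, not the actual change in HIVE-25595.

```java
import java.util.Properties;

class CompactionQueueSketch {
    static final String COMPACTOR_QUEUE = "hive.compactor.job.queue";

    /**
     * Returns a copy of conf for the stats-collection query, with the
     * compaction queue applied when one is configured.
     */
    static Properties statsConf(Properties conf) {
        Properties copy = new Properties();
        copy.putAll(conf);
        String queue = conf.getProperty(COMPACTOR_QUEUE, "");
        if (!queue.isEmpty()) {
            // Assumed engine-level queue keys; which ones apply depends on the engine.
            copy.setProperty("mapreduce.job.queuename", queue);
            copy.setProperty("tez.queue.name", queue);
        }
        return copy;
    }
}
```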





[jira] [Created] (HIVE-25800) loadDynamicPartitions in Hive.java should not load all partitions of a table from HMS

2021-12-13 Thread Sourabh Goyal (Jira)
Sourabh Goyal created HIVE-25800:


 Summary: loadDynamicPartitions in Hive.java should not load all 
partitions of a table from HMS 
 Key: HIVE-25800
 URL: https://issues.apache.org/jira/browse/HIVE-25800
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Sourabh Goyal
Assignee: Sourabh Goyal


HIVE-20661 improved the loadDynamicPartitions() API in Hive.java so that it no
longer adds partitions to HMS one by one. As part of that improvement, the
following code was introduced:
{code:java}
// fetch all the partitions matching the part spec using the partition iterable
// this way the maximum batch size configuration parameter is considered
PartitionIterable partitionIterable = new PartitionIterable(Hive.get(), tbl, partSpec,
    conf.getInt(MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX.getVarname(), 300));
Iterator<Partition> iterator = partitionIterable.iterator();

// Match valid partition path to partitions
while (iterator.hasNext()) {
  Partition partition = iterator.next();
  partitionDetailsMap.entrySet().stream()
      .filter(entry -> entry.getValue().fullSpec.equals(partition.getSpec()))
      .findAny().ifPresent(entry -> {
        entry.getValue().partition = partition;
        entry.getValue().hasOldPartition = true;
      });
}
{code}
The above code fetches all existing partitions of the table from HMS and
compares them with the dynamic partition list to decide which partitions are
old and which are new before adding them to HMS (in batches). The call to fetch
all partitions has introduced a performance regression for tables with a large
number of partitions (on the order of 100K).

 

The fix is to skip fetching all partitions. Instead, in the thread pool that
loads each partition individually, call get_partition() to check whether the
partition already exists in HMS.

This introduces an additional getPartition() call for every partition to be
loaded dynamically, but removes the need to fetch all existing partitions of
the table.

I believe this is fine: for tables with a small number of existing partitions
in HMS, the extra getPartition() calls won't add much overhead, and for tables
with a large number of existing partitions it avoids fetching all partitions
from HMS.
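
A minimal sketch of the proposed per-partition check, under stated assumptions: PartitionStore is a hypothetical stand-in for the metastore client (not Hive's real API), PartitionDetails mirrors only the fullSpec and hasOldPartition fields from the snippet above, and markExisting is an illustrative name. Each partition to be loaded triggers one lookup on the pool instead of one scan over every partition of the table.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PartitionLookupSketch {
    /** Hypothetical stand-in for the metastore client. */
    interface PartitionStore {
        /** Returns the partition location, or empty if HMS has no such partition. */
        Optional<String> getPartition(Map<String, String> spec);
    }

    /** Per-partition state, loosely modelled on the fields used in the snippet above. */
    static final class PartitionDetails {
        final Map<String, String> fullSpec;
        volatile boolean hasOldPartition;

        PartitionDetails(Map<String, String> fullSpec) {
            this.fullSpec = fullSpec;
        }
    }

    /** Issues one existence lookup per partition to load, in parallel. */
    static void markExisting(Collection<PartitionDetails> toLoad, PartitionStore store, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletableFuture<?>[] lookups = toLoad.stream()
                .map(d -> CompletableFuture.runAsync(
                    () -> d.hasOldPartition = store.getPartition(d.fullSpec).isPresent(), pool))
                .toArray(CompletableFuture[]::new);
            // Wait for every lookup; a failed lookup surfaces as a CompletionException.
            CompletableFuture.allOf(lookups).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

The cost scales with the number of partitions being loaded rather than with the number of partitions already in the table, which is the trade-off described above.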

cc - [~lpinter] [~ngangam] 


