[jira] [Created] (HIVE-25803) URL Mapping appends hdfs:// even for LOCAL DIRECTORY ops
Soumitra Sulav created HIVE-25803:
-------------------------------------

             Summary: URL Mapping appends hdfs:// even for LOCAL DIRECTORY ops
                 Key: HIVE-25803
                 URL: https://issues.apache.org/jira/browse/HIVE-25803
             Project: Hive
          Issue Type: Bug
          Components: Authorization, HiveServer2
    Affects Versions: 4.0.0
            Reporter: Soumitra Sulav
            Assignee: Sai Hemanth Gantasala

Repro steps:

Connect with beeline:
{code:java}
beeline -u "jdbc:hive2://quasar-pxlypi-2.quasar-pxlypi.root.hwx.site:10001/;principal=hive/_h...@root.hwx.site;ssl=true;sslTrustStore=/var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks;trustStorePassword=VOAnRk5l4oXsg0upJ1ApscSuNksirOKgyhJvoPv2o4j;transportMode=http;httpPath=cliservice;"
{code}
Create a test table and run an insert into a local directory:
{code:java}
> create table dual (id int);
> insert overwrite local directory "/tmp/" select * from dual;
{code}
The statement fails with:
{code:java}
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hrt_qa] does not have [ALL] privilege on [hdfs://ns1/tmp] (state=42000,code=4)
{code}
Authorization always prepends hdfs:// to the path, even when the operation targets a local directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
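The error suggests the authorization path translation unconditionally qualifies unqualified paths with the default (HDFS) filesystem. A minimal sketch of the behavior the report implies is missing; this is not Hive's actual authorization code, and all names here are illustrative:

```java
import java.net.URI;

// Illustrative sketch only -- not Hive's actual URL-mapping code. The scheme
// used for the privilege check should depend on whether the target of
// INSERT OVERWRITE [LOCAL] DIRECTORY is the local file system.
public class AuthPathSketch {

    public static String resolveForAuth(String path, boolean isLocalDirectory,
                                        String defaultFsAuthority) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri.toString(); // already fully qualified, leave it alone
        }
        if (isLocalDirectory) {
            // LOCAL DIRECTORY refers to the local file system, not HDFS
            return "file://" + path;
        }
        // non-LOCAL writes go to the default (HDFS) file system
        return "hdfs://" + defaultFsAuthority + path;
    }

    public static void main(String[] args) {
        System.out.println(resolveForAuth("/tmp/", true, "ns1"));
        System.out.println(resolveForAuth("/tmp/", false, "ns1"));
    }
}
```

With the reported bug, the first case behaves like the second: "/tmp/" is checked as hdfs://ns1/tmp even though the query says LOCAL.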
[jira] [Created] (HIVE-25802) Log4j2 Vulnerability in Hive Storage API
Nikhil Gupta created HIVE-25802:
-----------------------------------

             Summary: Log4j2 Vulnerability in Hive Storage API
                 Key: HIVE-25802
                 URL: https://issues.apache.org/jira/browse/HIVE-25802
             Project: Hive
          Issue Type: Bug
          Components: storage-api
    Affects Versions: 4.0.0
            Reporter: Nikhil Gupta
             Fix For: 4.0.0

The storage-api branch also brings in a log4j2 dependency <= 2.14.1, which can still expose Hive to the known Log4j2 vulnerability.
[jira] [Created] (HIVE-25801) Custom queue settings is not honoured by Query based compaction StatsUpdater
László Végh created HIVE-25801:
----------------------------------

             Summary: Custom queue settings is not honoured by Query based compaction StatsUpdater
                 Key: HIVE-25801
                 URL: https://issues.apache.org/jira/browse/HIVE-25801
             Project: Hive
          Issue Type: Bug
            Reporter: László Végh

The {{hive.compactor.job.queue}} config limits the resources available to compaction, so users can limit the effect of compaction on the cluster. However, this setting does not affect stats collection, which runs through the Driver. HIVE-25595 addresses this issue for MR-based compaction; the same fix is needed for query-based compaction.
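The intended fix can be sketched as follows, using plain maps in place of HiveConf. The property names {{hive.compactor.job.queue}} and {{tez.queue.name}} are real, but the helper and the choice of target property are illustrative assumptions about how the queue would be propagated to the stats query's session:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: before the StatsUpdater runs its stats query through
// the Driver, copy the compactor's custom queue into the session config so
// the stats query is throttled by the same queue as the compaction job.
public class StatsUpdaterQueueSketch {
    static final String COMPACTOR_QUEUE = "hive.compactor.job.queue";
    static final String EXEC_QUEUE = "tez.queue.name"; // illustrative target

    public static Map<String, String> statsSessionConf(Map<String, String> conf) {
        Map<String, String> session = new HashMap<>(conf);
        String queue = conf.get(COMPACTOR_QUEUE);
        if (queue != null && !queue.isEmpty()) {
            session.put(EXEC_QUEUE, queue);
        }
        return session;
    }
}
```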
[jira] [Created] (HIVE-25800) loadDynamicPartitions in Hive.java should not load all partitions of a table from HMS
Sourabh Goyal created HIVE-25800:
------------------------------------

             Summary: loadDynamicPartitions in Hive.java should not load all partitions of a table from HMS
                 Key: HIVE-25800
                 URL: https://issues.apache.org/jira/browse/HIVE-25800
             Project: Hive
          Issue Type: Improvement
          Components: Hive
            Reporter: Sourabh Goyal
            Assignee: Sourabh Goyal

HIVE-20661 improved the loadDynamicPartitions() API in Hive.java so that partitions are no longer added to HMS one by one. As part of that improvement, the following code was introduced:
{code:java}
// fetch all the partitions matching the part spec using the partition iterable
// this way the maximum batch size configuration parameter is considered
PartitionIterable partitionIterable = new PartitionIterable(Hive.get(), tbl, partSpec,
    conf.getInt(MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX.getVarname(), 300));
Iterator<Partition> iterator = partitionIterable.iterator();

// Match valid partition path to partitions
while (iterator.hasNext()) {
  Partition partition = iterator.next();
  partitionDetailsMap.entrySet().stream()
      .filter(entry -> entry.getValue().fullSpec.equals(partition.getSpec()))
      .findAny().ifPresent(entry -> {
        entry.getValue().partition = partition;
        entry.getValue().hasOldPartition = true;
      });
}
{code}
This code fetches all existing partitions of the table from HMS and compares them against the dynamic partition list to decide which partitions are old and which are new and need to be added to HMS (in batches). Fetching all partitions introduces a performance regression for tables with a large number of partitions (on the order of 100K).

The fix is to skip fetching all partitions. Instead, in the thread pool that loads each partition individually, call get_partition() to check whether the partition already exists in HMS. This adds one getPartition() call per partition being loaded dynamically, but avoids fetching all existing partitions of the table.

I believe this is fine: for tables with a small number of existing partitions, the extra getPartition() calls add little overhead, while for tables with a large number of existing partitions it avoids fetching them all from HMS.

cc - [~lpinter] [~ngangam]
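The proposed replacement can be sketched as below. The Function stands in for the HMS client's get_partition() call, returning null where the real client would throw NoSuchObjectException; all names here are illustrative, not the actual patch:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Illustrative sketch of the proposed fix: probe HMS once per partition being
// loaded instead of iterating over every existing partition of the table.
public class DynamicPartitionProbeSketch {

    // getPartition stands in for the HMS client's get_partition() call; here
    // it returns null when the partition does not exist (the real client
    // throws NoSuchObjectException instead).
    public static Set<String> findExistingPartitions(
            Collection<String> partitionsToLoad,
            Function<String, Object> getPartition) {
        Set<String> existing = new HashSet<>();
        for (String spec : partitionsToLoad) {
            if (getPartition.apply(spec) != null) {
                existing.add(spec); // maps to hasOldPartition = true in Hive.java
            }
        }
        return existing;
    }
}
```

The number of HMS calls is now proportional to the partitions being loaded, not to the partitions the table already has.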