[jira] [Created] (HIVE-22996) BasicStats parsing should check proactively for null or empty string
Jesus Camacho Rodriguez created HIVE-22996:

Summary: BasicStats parsing should check proactively for null or empty string
Key: HIVE-22996
URL: https://issues.apache.org/jira/browse/HIVE-22996
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez

BasicStats parsing should check for a null or empty string up front. Throwing an exception for control flow creates unnecessary overhead.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
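The sketch below contrasts the two approaches. It is a minimal illustration, not Hive's BasicStats code, and it makes these assumptions:

- Stats values arrive as strings in a parameter map.
- `StatsValueParser` and its methods are hypothetical names.
- The `numRows` key is used only as an example.

```java
import java.util.Map;

// Hypothetical helper illustrating HIVE-22996; not the actual BasicStats code.
final class StatsValueParser {
    private StatsValueParser() {}

    // Exception-driven parsing: a null or empty value still reaches Long.parseLong,
    // which throws NumberFormatException and pays for building a stack trace.
    static long parseWithException(String value, long fallback) {
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Proactive parsing: the common "no value" cases return early without throwing.
    static long parseProactively(String value, long fallback) {
        if (value == null || value.isEmpty()) {
            return fallback;
        }
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            // Genuinely malformed values are still handled, but should be rare.
            return fallback;
        }
    }

    // Example caller reading a row count from a (hypothetical) parameter map.
    static long numRows(Map<String, String> params) {
        return parseProactively(params.get("numRows"), -1L);
    }
}
```

Both methods return the same results. The proactive version simply avoids constructing an exception for the expected "missing value" case.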
[jira] [Created] (HIVE-22995) Add support for location for managed tables on database
Naveen Gangam created HIVE-22995:

Summary: Add support for location for managed tables on database
Key: HIVE-22995
URL: https://issues.apache.org/jira/browse/HIVE-22995
Project: Hive
Issue Type: Improvement
Components: Hive
Affects Versions: 3.1.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
Attachments: Hive Metastore Support for Tenant-based storage heirarchy.pdf

I have attached the initial spec to this JIRA.
[jira] [Created] (HIVE-22994) Add total file size to explain
Prasanth Jayachandran created HIVE-22994:

Summary: Add total file size to explain
Key: HIVE-22994
URL: https://issues.apache.org/jira/browse/HIVE-22994
Project: Hive
Issue Type: Improvement
Reporter: Prasanth Jayachandran

HIVE-22979 added the total file size to the Statistics object for the table scan operator. Showing it in the EXPLAIN output would make debugging easier. The actual on-disk file size would then be visible directly, without having to run DESCRIBE FORMATTED.
[jira] [Created] (HIVE-22993) Include Bloom Filter in Column Statistics to Better Estimate nDV
David Mollitor created HIVE-22993:

Summary: Include Bloom Filter in Column Statistics to Better Estimate nDV
Key: HIVE-22993
URL: https://issues.apache.org/jira/browse/HIVE-22993
Project: Hive
Issue Type: Improvement
Components: CBO, Statistics
Reporter: David Mollitor

When performing an INSERT, Hive cannot determine the number of distinct values, because the distinct values themselves are not recorded.

{code:sql}
create table test_mm(`id` int, `my_dt` date);

insert into test_mm values
  (1, "2018-10-01"), (2, "2018-10-01"), (3, "2018-10-01"),
  (4, "2017-10-01"), (5, "2017-10-01"), (6, "2017-10-01"),
  (7, "2010-10-01"), (8, "2010-10-01"), (9, "2010-10-01"),
  (10, "1998-10-01"), (11, "1998-10-01"), (12, "1998-10-01");

DESCRIBE FORMATTED test_mm my_dt;
-- distinct_count: 4

insert into test_mm values
  (13, "2030-10-01"), (14, "2030-10-01"), (15, "2030-10-01");

DESCRIBE FORMATTED test_mm my_dt;
-- distinct_count: 4
{code}

The first INSERT sees that the table has 0 records, so it makes sense that all of its distinct values are recorded in the statistics. For the second INSERT, however, Hive has no way of knowing whether "2030-10-01" is a new value, so distinct_count stays unchanged.

With a Bloom filter in the column statistics, the second INSERT could determine that "2030-10-01" is indeed new and update distinct_count accordingly.
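The idea can be sketched with a fixed-size Bloom filter over a column's string values, kept alongside the column statistics. This is an illustration under stated assumptions, not the issue's design:

- `NdvBloomFilter` is a hypothetical class.
- The double-hashing scheme (derived from `String.hashCode()`) is a generic choice the issue does not specify.

A Bloom filter has no false negatives. So a value that finds at least one bit still clear is definitely new and can safely increment the distinct count. A false positive can only cause an undercount, never an overcount.

```java
import java.util.BitSet;

// Hypothetical per-column Bloom filter for estimating nDV (idea from HIVE-22993).
final class NdvBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private long distinctCount;

    NdvBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Murmur3-style finalizer, used to derive a second hash from String.hashCode().
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Double hashing: bit position i is (h1 + i * h2) mod numBits.
    private int position(String value, int i) {
        int h1 = value.hashCode();
        int h2 = mix(h1) | 1; // odd, hence nonzero, step
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Adds one value. Returns true if it was definitely unseen and was counted as distinct.
    boolean add(String value) {
        boolean definitelyNew = false;
        for (int i = 0; i < numHashes; i++) {
            int pos = position(value, i);
            if (!bits.get(pos)) {
                definitelyNew = true; // a clear bit proves the value was never added
                bits.set(pos);
            }
        }
        if (definitelyNew) {
            distinctCount++;
        }
        return definitelyNew;
    }

    long distinctCount() {
        return distinctCount;
    }
}
```

Re-adding a value that is already present never changes the count. For a filter much larger than the number of values, a genuinely new date such as "2030-10-01" is very likely to be counted. For this to work across INSERTs, the filter would have to be stored with the statistics and updated by each INSERT.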
[jira] [Created] (HIVE-22992) ZkRegistryBase caching mechanism only caches the first instance
Antal Sinkovits created HIVE-22992:

Summary: ZkRegistryBase caching mechanism only caches the first instance
Key: HIVE-22992
URL: https://issues.apache.org/jira/browse/HIVE-22992
Project: Hive
Issue Type: Bug
Components: llap
Affects Versions: 4.0.0
Reporter: Antal Sinkovits
Assignee: Antal Sinkovits

The ZkRegistryBase caching mechanism caches only the first LLAP instance running on a given host. Any other LLAP instance on the same host is not cached.
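The symptom can be sketched as follows. This is not ZkRegistryBase's real code; it assumes:

- The cache is keyed by host name.
- `HostInstanceCache` and the string instance IDs are hypothetical.

A single-valued host-to-instance map keeps only the first instance per host. A host-to-set map keeps all of them.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the HIVE-22992 symptom; not ZkRegistryBase itself.
final class HostInstanceCache {
    // Buggy shape: one slot per host, so later instances on the same host are dropped.
    private final Map<String, String> firstOnly = new HashMap<>();
    // Fixed shape: every instance registered on a host is retained.
    private final Map<String, Set<String>> all = new HashMap<>();

    void register(String host, String instanceId) {
        firstOnly.putIfAbsent(host, instanceId);
        all.computeIfAbsent(host, h -> new HashSet<>()).add(instanceId);
    }

    void unregister(String host, String instanceId) {
        firstOnly.remove(host, instanceId);
        Set<String> ids = all.get(host);
        if (ids != null) {
            ids.remove(instanceId);
            if (ids.isEmpty()) {
                all.remove(host);
            }
        }
    }

    // Number of instances the single-valued cache knows about for this host (0 or 1).
    int firstOnlyCount(String host) {
        return firstOnly.containsKey(host) ? 1 : 0;
    }

    Set<String> instancesOn(String host) {
        return Collections.unmodifiableSet(all.getOrDefault(host, Collections.emptySet()));
    }
}
```

The buggy shape has a further consequence. After the first instance unregisters, the single-valued cache has nothing for that host, even though a second instance is still running there.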
[jira] [Created] (HIVE-22991) INSERT INTO with named field list and CBO=false fails
David Doran created HIVE-22991:

Summary: INSERT INTO with named field list and CBO=false fails
Key: HIVE-22991
URL: https://issues.apache.org/jira/browse/HIVE-22991
Project: Hive
Issue Type: Bug
Components: Query Planning
Affects Versions: 3.0.0
Reporter: David Doran
Attachments: INSERT INTO with named field list and CBO=false bug.txt

Queries that insert into tables using a named field list fail when CBO is disabled. For example:

INSERT INTO dd_insert_values_test(field1, field2) VALUES ('hive.cbo.enable=false', 'With named fields');

The statement names the fields (field1, field2) into which the values should be inserted. With CBO disabled, it fails with this error:

Error: Error while compiling statement: FAILED: SemanticException 0:0 Expected 2 columns for insclause-0/daved@dd_insert_values_test; select produces 1 columns. Error encountered near token ''With named fields'' (state=42000,code=4)

With CBO enabled, the statement works. Without the named field list, it works whether CBO is enabled or disabled.

Please see the attached repro script: [^INSERT INTO with named field list and CBO=false bug.txt]
Re: Review Request 72200: TopN Key efficiency check might disable filter too soon
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72200/
-----------------------------------------------------------

(Updated March 6, 2020, 12:44 p.m.)

Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and Rajesh Balamohan.

Bugs: HIVE-22982
    https://issues.apache.org/jira/browse/HIVE-22982

Repository: hive-git

Description
-------

The efficiency check is triggered after every n batches, but there can be multiple filters, one for each partition. Some filters might have seen less data than others when the check runs, so a filter can be disabled too soon.

Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 12f4822e381
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java f09867bb4e8
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 0f8eb173c66
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java b487480b938
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java 06ac661028f
  ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java ddd657e5552
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java a91bc7354a7

Diff: https://reviews.apache.org/r/72200/diff/2/
Changes: https://reviews.apache.org/r/72200/diff/1-2/

Testing
-------

Manually.

Thanks,

Attila Magyar
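A per-filter check can be sketched as follows. The patch's actual logic and configuration may differ (the diff touches HiveConf.java); this sketch assumes:

- Each partition's filter tracks how many rows it saw and how many it forwarded.
- A filter is judged only once it has seen a minimum number of rows.
- The operator is disabled only if every judged filter is inefficient.
- `EfficiencyChecker`, `TopNFilterStats`, `minRows`, and `minEfficiency` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical counters for one TopN key filter (one filter per partition).
final class TopNFilterStats {
    private long total;
    private long forwarded;

    void record(boolean passed) {
        total++;
        if (passed) {
            forwarded++;
        }
    }

    long total() {
        return total;
    }

    // Fraction of rows this filter dropped; higher means the filter is more useful.
    double efficiency() {
        return total == 0 ? 0.0 : 1.0 - (double) forwarded / total;
    }
}

// Hypothetical checker that judges each filter only after it has seen enough rows.
final class EfficiencyChecker {
    private final Map<String, TopNFilterStats> perPartition = new HashMap<>();
    private final long minRows;          // rows a filter must see before it is judged
    private final double minEfficiency;  // fraction of rows a filter must drop to be kept

    EfficiencyChecker(long minRows, double minEfficiency) {
        this.minRows = minRows;
        this.minEfficiency = minEfficiency;
    }

    void record(String partitionKey, boolean passed) {
        perPartition.computeIfAbsent(partitionKey, k -> new TopNFilterStats()).record(passed);
    }

    // Disable only if at least one filter has enough data and none of those filters is efficient.
    boolean shouldDisable() {
        boolean judgedAny = false;
        for (TopNFilterStats stats : perPartition.values()) {
            if (stats.total() < minRows) {
                continue; // too little data to judge this filter yet
            }
            judgedAny = true;
            if (stats.efficiency() >= minEfficiency) {
                return false;
            }
        }
        return judgedAny;
    }
}
```

The design choice here is that a filter with little data neither triggers nor blocks disabling. It simply waits until it has seen `minRows` rows.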
[jira] [Created] (HIVE-22990) Build acknowledgement mechanism for repl dump and load
Aasha Medhi created HIVE-22990:

Summary: Build acknowledgement mechanism for repl dump and load
Key: HIVE-22990
URL: https://issues.apache.org/jira/browse/HIVE-22990
Project: Hive
Issue Type: Bug
Reporter: Aasha Medhi
Assignee: Aasha Medhi
[jira] [Created] (HIVE-22989) Don't close parent classloader when session being closed
Zhihua Deng created HIVE-22989:

Summary: Don't close parent classloader when session being closed
Key: HIVE-22989
URL: https://issues.apache.org/jira/browse/HIVE-22989
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Zhihua Deng

When HiveServer2 loads UDFs, the Registry uses the session-specific classloader to load them and caches that classloader. If the user does not set aux jars, the cached classloader is the session's parent classloader.

In our case, we do not set aux jars, and we periodically update the session's parent classloader so that user jars are reloaded dynamically. The Registry should therefore do a sanity check when closing the cached classloaders, so that it does not close the session's parent classloader.
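The requested sanity check can be sketched as follows. This is not Hive's Registry API; it assumes:

- The Registry keeps a list of cached classloaders and knows the session's parent classloader when closing them.
- `UdfClassLoaderCache` and `closeAll` are hypothetical names.

The cache skips the parent by identity and closes only loaders that implement `Closeable`, such as `URLClassLoader`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical cache of UDF classloaders (HIVE-22989); not Hive's Registry API.
final class UdfClassLoaderCache {
    private final List<ClassLoader> cached = new ArrayList<>();

    void cache(ClassLoader loader) {
        cached.add(loader);
    }

    // Closes every cached loader except the session's parent, which is still in use.
    // Returns the number of loaders actually closed, then empties the cache.
    int closeAll(ClassLoader sessionParent) {
        int closed = 0;
        for (ClassLoader loader : cached) {
            if (loader == sessionParent) {
                continue; // sanity check: never close the session's parent classloader
            }
            if (loader instanceof Closeable) {
                try {
                    ((Closeable) loader).close();
                    closed++;
                } catch (IOException e) {
                    System.err.println("Failed to close classloader: " + e);
                }
            }
        }
        cached.clear();
        return closed;
    }
}
```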
[jira] [Created] (HIVE-22988) LLAP: If consistent splits is disabled ordering instances is not required
Prasanth Jayachandran created HIVE-22988:

Summary: LLAP: If consistent splits is disabled ordering instances is not required
Key: HIVE-22988
URL: https://issues.apache.org/jira/browse/HIVE-22988
Project: Hive
Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran

LlapTaskSchedulerService always gets a consistently ordered list of all LLAP instances, even when consistent splits are disabled. With consistent splits disabled, the ordering is not useful, because there is no cache locality to preserve.
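The proposed shortcut can be sketched as follows. LlapTaskSchedulerService's real code differs; this sketch assumes:

- Instances are represented as host-name strings.
- "Consistent order" means natural string order.
- `InstanceOrdering.instancesForScheduling` is a hypothetical name.

The sort and the defensive copy happen only when consistent splits need a stable order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch for HIVE-22988; not LlapTaskSchedulerService's actual code.
final class InstanceOrdering {
    private InstanceOrdering() {}

    static List<String> instancesForScheduling(List<String> instances, boolean consistentSplits) {
        if (!consistentSplits) {
            // No cache locality to preserve, so skip both the copy and the sort.
            return instances;
        }
        // A stable order lets the same split map to the same instance across queries.
        List<String> ordered = new ArrayList<>(instances);
        ordered.sort(Comparator.naturalOrder());
        return ordered;
    }
}
```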