[jira] [Created] (HIVE-22988) LLAP: If consistent splits is disabled ordering instances is not required

2020-03-06 Thread Prasanth Jayachandran (Jira)
Prasanth Jayachandran created HIVE-22988:


 Summary: LLAP: If consistent splits is disabled ordering instances 
is not required
 Key: HIVE-22988
 URL: https://issues.apache.org/jira/browse/HIVE-22988
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


LlapTaskSchedulerService always fetches a consistently ordered list of all LLAP 
instances, even if consistent splits is disabled. When consistent splits is 
disabled, the ordering isn't really useful, as there is no cache locality to 
preserve. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22990) Build acknowledgement mechanism for repl dump and load

2020-03-06 Thread Aasha Medhi (Jira)
Aasha Medhi created HIVE-22990:
--

 Summary: Build acknowledgement mechanism for repl dump and load
 Key: HIVE-22990
 URL: https://issues.apache.org/jira/browse/HIVE-22990
 Project: Hive
  Issue Type: Bug
Reporter: Aasha Medhi
Assignee: Aasha Medhi








[jira] [Created] (HIVE-22989) Don't close parent classloader when session being closed

2020-03-06 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-22989:
--

 Summary: Don't close parent classloader when session being closed
 Key: HIVE-22989
 URL: https://issues.apache.org/jira/browse/HIVE-22989
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Zhihua Deng


When HiveServer2 loads UDFs, Registry uses the session-specific classloader to 
load them and caches that classloader. When the user doesn't set the aux jars, 
the cached classloader is the same object as the session's parent classloader. 
In our case, we don't set the aux jars, but we periodically update the 
session's parent classloader to pick up user jars dynamically. Registry should 
do a sanity check when it closes the cached classloaders, so that it does not 
close the parent classloader.
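
A minimal sketch of the kind of sanity check described above, written in Python for brevity (Hive itself is Java). `FakeLoader` and `close_cached_loaders` are hypothetical names used only for illustration; they are not Hive's Registry API. The idea is simply to skip any cached loader that is the very same object as the session's parent classloader.

```python
class FakeLoader:
    """Stand-in for a closeable classloader (hypothetical, illustration only)."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


def close_cached_loaders(cached_loaders, session_parent):
    """Close every cached loader except the session's parent loader.

    Returns the list of loaders that were actually closed.
    """
    closed = []
    for loader in cached_loaders:
        # Identity check: the parent loader is still in use by the session,
        # so it must not be closed here.
        if loader is session_parent:
            continue
        loader.close()
        closed.append(loader)
    return closed


parent = FakeLoader("session-parent")
udf_loader = FakeLoader("udf-jars")
close_cached_loaders([parent, udf_loader], parent)
print(parent.closed, udf_loader.closed)
```

The check compares by identity rather than by name or equality, since the problem described is that the cached entry and the parent are one and the same object.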





[jira] [Created] (HIVE-22993) Include Bloom Filter in Column Statistics to Better Estimate nDV

2020-03-06 Thread David Mollitor (Jira)
David Mollitor created HIVE-22993:
-

 Summary: Include Bloom Filter in Column Statistics to Better 
Estimate nDV
 Key: HIVE-22993
 URL: https://issues.apache.org/jira/browse/HIVE-22993
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Statistics
Reporter: David Mollitor


When performing an INSERT statement, Hive has no way to determine the number of 
distinct values since the distinct values themselves are not recorded.

{code:sql}
create table test_mm(`id` int, `my_dt` date);

insert into test_mm values (1, "2018-10-01"), (2, "2018-10-01"), (3, "2018-10-01"),
(4, "2017-10-01"), (5, "2017-10-01"), (6, "2017-10-01"),
(7, "2010-10-01"), (8, "2010-10-01"), (9, "2010-10-01"),
(10, "1998-10-01"), (11, "1998-10-01"), (12, "1998-10-01");

DESCRIBE FORMATTED test_mm my_dt;
-- distinct_count: 4

insert into test_mm values (13, "2030-10-01"), (14, "2030-10-01"), (15, "2030-10-01");

DESCRIBE FORMATTED test_mm my_dt;
-- distinct_count: 4
{code}

The first INSERT statement sees that the table has 0 records, so it makes sense 
that every distinct value it inserts is counted in the statistics.  However, 
for the second INSERT, Hive has no idea whether "2030-10-01" is already present 
in the column, so the distinct_count is unchanged.  By introducing a bloom 
filter for column statistics, the second INSERT may be able to determine that 
"2030-10-01" is indeed a new value and update the distinct_count accordingly.
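
The proposal can be illustrated with a small sketch, in Python for brevity (Hive itself is Java). Everything here is an assumption made for illustration: `BloomFilter`, `ColumnStats`, the filter size, and the hashing scheme are hypothetical and do not reflect Hive's statistics code. The property relied on is that a bloom filter never reports a false negative, so a "not present" answer means the value is definitely new; a "maybe present" answer leaves the count unchanged and can therefore undercount.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter backed by a Python int used as a bitset."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos


class ColumnStats:
    """Hypothetical per-column statistics carrying a bloom filter."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.distinct_count = 0

    def record_insert(self, values):
        for value in values:
            if not self.bloom.might_contain(value):
                # Definitely not seen before: safe to bump the count.
                self.distinct_count += 1
                self.bloom.add(value)
            # Otherwise the value may already exist; leave the count alone.


stats = ColumnStats()
stats.record_insert(["2018-10-01"] * 3 + ["2017-10-01"] * 3
                    + ["2010-10-01"] * 3 + ["1998-10-01"] * 3)
print(stats.distinct_count)
stats.record_insert(["2030-10-01"] * 3)
print(stats.distinct_count)
```

Because false positives are possible, a count maintained this way is a lower bound on the true number of distinct values; how the filter would be sized and stored alongside the column statistics is left open here.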





Re: Review Request 72200: TopN Key efficiency check might disable filter too soon

2020-03-06 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72200/
---

(Updated March 6, 2020, 12:44 p.m.)


Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, and 
Rajesh Balamohan.


Bugs: HIVE-22982
https://issues.apache.org/jira/browse/HIVE-22982


Repository: hive-git


Description
---

The check is triggered after every n batches, but there can be multiple filters, 
one for each partition. Some filters might have seen less data than the others, 
so a filter can be judged inefficient before it has processed enough rows.
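
One possible way to avoid judging a sparsely fed filter too early is sketched below, in Python for brevity (Hive itself is Java). This is not the patch under review: `FilterStats`, `filters_to_disable`, `min_rows`, and `min_efficiency` are hypothetical names, and the sketch assumes each per-partition filter tracks its own row counts so that it is only evaluated once it has seen enough rows itself.

```python
class FilterStats:
    """Row counters for one per-partition filter (hypothetical)."""

    def __init__(self):
        self.total = 0      # rows offered to the filter
        self.forwarded = 0  # rows the filter let through

    def record(self, total, forwarded):
        self.total += total
        self.forwarded += forwarded

    def efficiency(self):
        # Fraction of rows the filter removed; 0.0 when nothing was seen.
        if self.total == 0:
            return 0.0
        return 1.0 - self.forwarded / self.total


def filters_to_disable(filters, min_rows, min_efficiency):
    """Return keys of filters that have enough data and still perform poorly."""
    return [
        key
        for key, stats in filters.items()
        if stats.total >= min_rows and stats.efficiency() < min_efficiency
    ]


filters = {"partition_a": FilterStats(), "partition_b": FilterStats()}
filters["partition_a"].record(total=10000, forwarded=9900)
filters["partition_b"].record(total=50, forwarded=50)
print(filters_to_disable(filters, min_rows=1000, min_efficiency=0.1))
```

In this sketch the second filter is left alone even though it has removed nothing so far, because 50 rows are too few to draw a conclusion from.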


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 12f4822e381 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java f09867bb4e8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 0f8eb173c66 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperBatch.java b487480b938 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneralComparator.java 06ac661028f 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java ddd657e5552 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java a91bc7354a7 


Diff: https://reviews.apache.org/r/72200/diff/2/

Changes: https://reviews.apache.org/r/72200/diff/1-2/


Testing
---

manually


Thanks,

Attila Magyar



[jira] [Created] (HIVE-22992) ZkRegistryBase caching mechanism only caches the first instance

2020-03-06 Thread Antal Sinkovits (Jira)
Antal Sinkovits created HIVE-22992:
--

 Summary: ZkRegistryBase caching mechanism only caches the first 
instance
 Key: HIVE-22992
 URL: https://issues.apache.org/jira/browse/HIVE-22992
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 4.0.0
Reporter: Antal Sinkovits
Assignee: Antal Sinkovits


When several LLAP instances run on the same host, the ZkRegistryBase caching 
mechanism caches only the first of them.





[jira] [Created] (HIVE-22991) INSERT INTO with named field list and CBO=false fails

2020-03-06 Thread David Doran (Jira)
David Doran created HIVE-22991:
--

 Summary: INSERT INTO with named field list and CBO=false fails
 Key: HIVE-22991
 URL: https://issues.apache.org/jira/browse/HIVE-22991
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 3.0.0
Reporter: David Doran
 Attachments: INSERT INTO with named field list and CBO=false bug.txt

Queries that insert into tables using a named field list fail when CBO is 
disabled.

Here's an example to illustrate what I mean:

INSERT INTO dd_insert_values_test(field1, field2) VALUES 
('hive.cbo.enable=false', 'With named fields');

Notice that it names the fields (field1, field2) into which the values should 
be inserted.

With CBO disabled this query fails with error:

-- Error: Error while compiling statement: FAILED: SemanticException 0:0 
Expected 2 columns for insclause-0/daved@dd_insert_values_test; select produces 
1 columns. Error encountered near token ''With named fields'' 
(state=42000,code=4)

With CBO enabled it works.

And without the named field list, it works with either CBO enabled or disabled. 
Please see the attached repro script: [^INSERT INTO with named field list and 
CBO=false bug.txt]





[jira] [Created] (HIVE-22994) Add total file size to explain

2020-03-06 Thread Prasanth Jayachandran (Jira)
Prasanth Jayachandran created HIVE-22994:


 Summary: Add total file size to explain
 Key: HIVE-22994
 URL: https://issues.apache.org/jira/browse/HIVE-22994
 Project: Hive
  Issue Type: Improvement
Reporter: Prasanth Jayachandran


HIVE-22979 added total file size to the Statistics object for the table scan 
operator. Showing the actual on-disk file size in the explain output would be 
very useful for debugging, since it avoids having to get it from describe 
formatted output. 





[jira] [Created] (HIVE-22995) Add support for location for managed tables on database

2020-03-06 Thread Naveen Gangam (Jira)
Naveen Gangam created HIVE-22995:


 Summary: Add support for location for managed tables on database
 Key: HIVE-22995
 URL: https://issues.apache.org/jira/browse/HIVE-22995
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 3.1.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
 Attachments: Hive Metastore Support for Tenant-based storage 
heirarchy.pdf

I have attached the initial spec to this jira.





[jira] [Created] (HIVE-22996) BasicStats parsing should check proactively for null or empty string

2020-03-06 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-22996:
--

 Summary: BasicStats parsing should check proactively for null or 
empty string
 Key: HIVE-22996
 URL: https://issues.apache.org/jira/browse/HIVE-22996
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


BasicStats parsing should check proactively for a null or empty string rather 
than throwing an Exception for control flow, which creates unnecessary 
overhead.
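
A minimal sketch of the check-first pattern, in Python for brevity (Hive itself is Java, where a call such as `Long.parseLong` throws `NumberFormatException` for null or empty input). `parse_stat` and its `default` parameter are hypothetical and are not the BasicStats API; the point is only that the common "missing value" case is handled by a cheap test instead of by raising and catching an exception.

```python
def parse_stat(raw, default=0):
    """Parse a numeric statistic, returning `default` for missing values.

    The null/empty check runs before any parsing is attempted, so the
    expected "no value recorded" case never goes through exception handling.
    """
    if raw is None or raw.strip() == "":
        return default
    # Malformed non-empty input still raises ValueError, which remains a
    # genuine error rather than ordinary control flow.
    return int(raw)


print(parse_stat(None), parse_stat(""), parse_stat("42"))
```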


