[jira] [Created] (HIVE-22528) Bloom Filter not showing up in Explain plan

2019-11-21 Thread Subhajit Sinha (Jira)
Subhajit Sinha created HIVE-22528:
-

 Summary: Bloom Filter not showing up in Explain plan
 Key: HIVE-22528
 URL: https://issues.apache.org/jira/browse/HIVE-22528
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.0
 Environment: Test Environment.
Reporter: Subhajit Sinha


Hi Team,

We are using Apache Hive (version 3.1.0.3.1.0.0-78) and trying to implement a 
Bloom filter in it. I have created a managed table with the table properties

'orc.bloom.filter.columns'='***', 'orc.bloom.filter.fpp'='0.05', 
'orc.stripe.size'='268435456',

and stored it as an ORC file. While checking the explain plan (running: explain 
select count(1) from the_table where <...>) in the current Hive version, I 
couldn't see anything like "Bloom_Filter" in the plan provided by the CBO. The 
table I'm querying data in has <...> records.
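
For what it's worth, here is a minimal sketch of the same three settings 
expressed through the ORC writer API, assuming the orc-core dependency is on 
the classpath; the path and schema below are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class BloomFilterOrcSketch {
  public static void main(String[] args) throws java.io.IOException {
    Configuration conf = new Configuration();
    TypeDescription schema = TypeDescription.fromString("struct<id:bigint,name:string>");
    Writer writer = OrcFile.createWriter(new Path("/tmp/bloom_demo.orc"),
        OrcFile.writerOptions(conf)
            .setSchema(schema)
            .bloomFilterColumns("name")   // counterpart of 'orc.bloom.filter.columns'
            .bloomFilterFpp(0.05)         // counterpart of 'orc.bloom.filter.fpp'
            .stripeSize(268435456L));     // counterpart of 'orc.stripe.size'
    writer.close();                       // the file footer now carries the bloom filter streams
  }
}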

 

I have a few doubts:
 # Is the Hive 3.1 version not using the Bloom filter? I have run the same 
query and condition against a normal table and seen that it takes more time 
than against a table that has a Bloom filter defined on the column used in the 
condition.
 # Is there any parameter that needs to be set for the Bloom filter to be used 
on the table?
 # I have come across three parameters; please let me know what they signify 
(a hedged example of setting them follows the list): 
hive.tez.max.bloom.filter.entries, hive.tez.min.bloom.filter.entries, hive.tez.bloom.filter.factor
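
For reference, a minimal sketch of setting these three parameters 
programmatically, assuming only that they are ordinary Hive/Hadoop 
configuration keys (the values below are arbitrary illustrations, not 
recommendations):

import org.apache.hadoop.conf.Configuration;

public class BloomFilterConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Arbitrary illustrative values; consult the HiveConf defaults before tuning.
    conf.set("hive.tez.min.bloom.filter.entries", "1000000");
    conf.set("hive.tez.max.bloom.filter.entries", "100000000");
    conf.set("hive.tez.bloom.filter.factor", "1.0");
    System.out.println(conf.get("hive.tez.bloom.filter.factor"));
  }
}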

Please let me know if anyone has used Bloom filters, and if so, what the process is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22527) Hive on Tez : Job of merging small files will be submitted into another queue (default queue)

2019-11-21 Thread zhangbutao (Jira)
zhangbutao created HIVE-22527:
-

 Summary: Hive on Tez : Job of merging small files will be 
submitted into another queue (default queue)
 Key: HIVE-22527
 URL: https://issues.apache.org/jira/browse/HIVE-22527
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.1, 3.1.0
Reporter: zhangbutao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22526) Extract Compiler from Driver

2019-11-21 Thread Miklos Gergely (Jira)
Miklos Gergely created HIVE-22526:
-

 Summary: Extract Compiler from Driver
 Key: HIVE-22526
 URL: https://issues.apache.org/jira/browse/HIVE-22526
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Reporter: Miklos Gergely
Assignee: Miklos Gergely
 Fix For: 4.0.0


The Driver class contains ~600 lines of code responsible for compiling the 
command: from the command String a Plan needs to be created, and in most cases 
a transaction needs to be started as well. This is done by the compile 
function, which has a lot of sub-functions helping with the task, while being 
really big itself. All this code should be put into a separate class, where it 
can do its job without getting mixed with the rest of the code in the Driver.
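
A rough sketch of the proposed split; all names here (Compiler, Plan) are 
hypothetical placeholders, not the actual Hive classes:

class Plan {
  final String query;
  Plan(String query) { this.query = query; }
}

class Compiler {
  // Builds a Plan from the command string and, in most cases, starts a transaction.
  Plan compile(String command, boolean startTransaction) {
    if (startTransaction) {
      // open the transaction here, before planning
    }
    return new Plan(command);
  }
}

class Driver {
  private final Compiler compiler = new Compiler();

  void run(String command) {
    // The Driver delegates instead of carrying ~600 lines of compilation code.
    Plan plan = compiler.compile(command, true);
    // ... execute the plan ...
  }
}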



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22525) Refactor HiveOpConverter

2019-11-21 Thread Miklos Gergely (Jira)
Miklos Gergely created HIVE-22525:
-

 Summary: Refactor HiveOpConverter
 Key: HIVE-22525
 URL: https://issues.apache.org/jira/browse/HIVE-22525
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Miklos Gergely
Assignee: Miklos Gergely
 Fix For: 4.0.0


HiveOpConverter is on its way to becoming a monster class. It is already ~1300 
lines long and expected to grow. It should be refactored and cut into multiple 
classes in a reasonable way. A natural way to do this is to create separate 
visitor classes for the different RelNodes, which are already handled in 
different functions within HiveOpConverter. That way HiveOpConverter can be 
the dispatcher among those visitor classes, while each of them handles some 
specific work, potentially requesting sub-nodes to be dispatched by 
HiveOpConverter. The functions used by multiple visitors should be put into 
some utility class.
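
A minimal sketch of the dispatcher/visitor shape described above; the class 
names are illustrative, not the actual Hive or Calcite types:

abstract class RelNode {}
class JoinNode extends RelNode {}
class ProjectNode extends RelNode {}
class OpTree {}

interface RelNodeVisitor<N extends RelNode> {
  OpTree visit(N node, OpConverterDispatcher dispatcher);
}

class JoinVisitor implements RelNodeVisitor<JoinNode> {
  public OpTree visit(JoinNode node, OpConverterDispatcher dispatcher) {
    // Translate the join itself, asking the dispatcher to handle any sub-nodes.
    return new OpTree();
  }
}

class OpConverterDispatcher {
  OpTree dispatch(RelNode node) {
    // One branch (or a registry lookup) per RelNode type.
    if (node instanceof JoinNode) {
      return new JoinVisitor().visit((JoinNode) node, this);
    }
    throw new IllegalArgumentException("unhandled node: " + node.getClass());
  }
}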



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-21 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/
---

(Updated Nov. 21, 2019, 5:35 p.m.)


Review request for hive, Laszlo Pinter and Peter Vary.


Bugs: HIVE-21917
https://issues.apache.org/jira/browse/HIVE-21917


Repository: hive-git


Description
---

The Initiator thread in the metastore repeatedly loops over entries in the 
COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
need to be compacted. However, entries are never removed from this table except 
by a completed Compactor run.

In a cluster where most tables / partitions are write-once read-many, this 
results in stale entries in this table never being cleaned up. In a small test 
cluster, we have observed approximately 45k entries in this table (virtually 
equal to the number of partitions in the cluster) while < 100 of these tables 
have delta files at all. Since most of the tables will never get enough writes 
to trigger a compaction (and in fact have only ever been written to once), the 
initiator thread keeps trying to evaluate them on every loop.
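
As a purely hypothetical illustration of the loop described above (none of 
these names are the actual Initiator internals):

import java.util.List;

class InitiatorLoopSketch {
  void runLoop() throws InterruptedException {
    while (true) {
      // Every COMPLETED_TXN_COMPONENTS row comes back here, including the
      // stale write-once entries that will never qualify for compaction.
      for (String candidate : findPotentialCompactions()) {
        if (needsCompaction(candidate)) {
          scheduleCompaction(candidate);
        }
      }
      Thread.sleep(60_000); // the same stale rows get re-checked on the next cycle
    }
  }

  List<String> findPotentialCompactions() { return List.of(); } // placeholder
  boolean needsCompaction(String candidate) { return false; }   // placeholder
  void scheduleCompaction(String candidate) {}                  // placeholder
}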


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 610cf05204 
  
ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java 
b28b57779b 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
 8253ccb9c9 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
 268038795b 
  
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
 e840758c9d 


Diff: https://reviews.apache.org/r/71792/diff/2/

Changes: https://reviews.apache.org/r/71792/diff/1-2/


Testing
---

Unit tests


Thanks,

Denys Kuzmenko



Review Request 71801: The error handler in LlapRecordReader might block if its queue is full

2019-11-21 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71801/
---

Review request for hive, Laszlo Bodor, Panos Garefalakis, and Slim Bouguerra.


Bugs: HIVE-22523
https://issues.apache.org/jira/browse/HIVE-22523


Repository: hive-git


Description
---

In setError() we set the value of an atomic reference (pendingError) and we 
also put the error in a queue. The latter seems not just unnecessary; it might 
also block the caller of the handler if the queue is full. In addition, 
closing the reader might not be handled properly, as some of the flags are not 
volatile.
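
A minimal sketch of the hazard, assuming nothing about the real 
LlapRecordReader beyond the description above (names and queue size are 
illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

class ErrorHandlerSketch {
  private final AtomicReference<Throwable> pendingError = new AtomicReference<>();
  private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(4);

  // Current shape: put() blocks the caller indefinitely once the queue is full.
  void setError(Throwable t) throws InterruptedException {
    pendingError.compareAndSet(null, t);
    queue.put(t);
  }

  // The atomic reference alone is enough to signal the error without blocking.
  void setErrorNonBlocking(Throwable t) {
    pendingError.compareAndSet(null, t);
  }
}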


Diffs
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java
 77966aa9650 


Diff: https://reviews.apache.org/r/71801/diff/1/


Testing
---

q tests


Thanks,

Attila Magyar



Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-21 Thread Denys Kuzmenko via Review Board


> On Nov. 20, 2019, 3:19 p.m., Denys Kuzmenko wrote:
> > Not ready. Need to handle aborted and currently active compactions.
> 
> Denys Kuzmenko wrote:
> Handling the above cases would complicate the Initiator logic and make the 
> preliminary check longer. Not sure how critical it is that, in case of an 
> unsuccessful compaction attempt, we won't retry on the next run unless there 
> is some change to the selected table/partition. Any thoughts on this?

Changed findPotentialCompactions query to: 

select distinct ctc_database, ctc_table, ctc_partition
from COMPLETED_TXN_COMPONENTS
where (select CC_STATE from COMPLETED_COMPACTIONS
       where ctc_database = CC_DATABASE
         and ctc_table = CC_TABLE
         and (ctc_partition is null or ctc_partition = cc_partition)
       order by cc_id desc limit 1) IN ('a', 'f')
   || ctc_timestamp < current_timestamp

However, this still won't cover compactions that were skipped because another one was already running.


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/#review218723
---


On Nov. 20, 2019, 12:20 p.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71792/
> ---
> 
> (Updated Nov. 20, 2019, 12:20 p.m.)
> 
> 
> Review request for hive, Laszlo Pinter and Peter Vary.
> 
> 
> Bugs: HIVE-21917
> https://issues.apache.org/jira/browse/HIVE-21917
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The Initiator thread in the metastore repeatedly loops over entries in the 
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
> need to be compacted. However, entries are never removed from this table 
> except by a completed Compactor run.
> 
> In a cluster where most tables / partitions are write-once read-many, this 
> results in stale entries in this table never being cleaned up. In a small 
> test cluster, we have observed approximately 45k entries in this table 
> (virtually equal to the number of partitions in the cluster) while < 100 of 
> these tables have delta files at all. Since most of the tables will never get 
> enough writes to trigger a compaction (and in fact have only ever been 
> written to once), the initiator thread keeps trying to evaluate them on every 
> loop.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 610cf05204 
>   
> ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java
>  b28b57779b 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
>  8253ccb9c9 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  6281208247 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
>  e840758c9d 
> 
> 
> Diff: https://reviews.apache.org/r/71792/diff/1/
> 
> 
> Testing
> ---
> 
> Unit tests
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



Re: Review Request 71792: COMPLETED_TXN_COMPONENTS table is never cleaned up unless Compactor runs

2019-11-21 Thread Denys Kuzmenko via Review Board


> On Nov. 20, 2019, 3:19 p.m., Denys Kuzmenko wrote:
> > Not ready. Need to handle aborted and currently active compactions.

Handling the above cases would complicate the Initiator logic and make the 
preliminary check longer. Not sure how critical it is that, in case of an 
unsuccessful compaction attempt, we won't retry on the next run unless there 
is some change to the selected table/partition. Any thoughts on this?


- Denys


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71792/#review218723
---


On Nov. 20, 2019, 12:20 p.m., Denys Kuzmenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71792/
> ---
> 
> (Updated Nov. 20, 2019, 12:20 p.m.)
> 
> 
> Review request for hive, Laszlo Pinter and Peter Vary.
> 
> 
> Bugs: HIVE-21917
> https://issues.apache.org/jira/browse/HIVE-21917
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The Initiator thread in the metastore repeatedly loops over entries in the 
> COMPLETED_TXN_COMPONENTS table to determine which partitions / tables might 
> need to be compacted. However, entries are never removed from this table 
> except by a completed Compactor run.
> 
> In a cluster where most tables / partitions are write-once read-many, this 
> results in stale entries in this table never being cleaned up. In a small 
> test cluster, we have observed approximately 45k entries in this table 
> (virtually equal to the number of partitions in the cluster) while < 100 of 
> these tables have delta files at all. Since most of the tables will never get 
> enough writes to trigger a compaction (and in fact have only ever been 
> written to once), the initiator thread keeps trying to evaluate them on every 
> loop.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> 610cf05204 
>   
> ql/src/test/org/apache/hadoop/hive/metastore/txn/TestCompactionTxnHandler.java
>  b28b57779b 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
>  8253ccb9c9 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  6281208247 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
>  e840758c9d 
> 
> 
> Diff: https://reviews.apache.org/r/71792/diff/1/
> 
> 
> Testing
> ---
> 
> Unit tests
> 
> 
> Thanks,
> 
> Denys Kuzmenko
> 
>



[jira] [Created] (HIVE-22524) CommandProcessorException should utilize standard Exception fields

2019-11-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-22524:
---

 Summary: CommandProcessorException should utilize standard 
Exception fields
 Key: HIVE-22524
 URL: https://issues.apache.org/jira/browse/HIVE-22524
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


CommandProcessorException right now has:
* getCause() inherited from Exception
* getException() local implementation
* getMessage() inherited from Exception
* getErrorMessage() local implementation
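
A hedged sketch of what folding these onto the standard fields could look 
like, assuming nothing about the real class beyond the list above:

class CommandProcessorExceptionSketch extends Exception {
  CommandProcessorExceptionSketch(String message, Throwable cause) {
    super(message, cause); // getMessage() and getCause() now carry the data
  }

  // The local variants become plain aliases (or can be removed outright):
  String getErrorMessage() { return getMessage(); }
  Throwable getException() { return getCause(); }
}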




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22523) The error handler in LlapRecordReader might block if its queue is full

2019-11-21 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-22523:


 Summary: The error handler in LlapRecordReader might block if its 
queue is full
 Key: HIVE-22523
 URL: https://issues.apache.org/jira/browse/HIVE-22523
 Project: Hive
  Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar
 Fix For: 4.0.0


In setError() we set the value of an atomic reference (pendingError) and we 
also put the error in a queue. The latter seems not just unnecessary; it might 
also block the caller of the handler if the queue is full. In addition, 
closing of the reader might not be handled properly, as some of the flags are 
not volatile.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 71784: HiveProtoLoggingHook might consume lots of memory

2019-11-21 Thread Attila Magyar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71784/
---

(Updated Nov. 21, 2019, 9:40 a.m.)


Review request for hive, Laszlo Bodor, Harish Jaiprakash, Mustafa Iman, and 
Panos Garefalakis.


Changes
---

Test might be flaky; ignore it until we find a better solution.


Bugs: HIVE-22514
https://issues.apache.org/jira/browse/HIVE-22514


Repository: hive-git


Description
---

HiveProtoLoggingHook uses a ScheduledThreadPoolExecutor to submit writer tasks 
and to periodically handle rollover. The builtin ScheduledThreadPoolExecutor 
uses an unbounded queue which cannot be replaced from the outside. If log 
events are generated at a very fast rate, this queue can grow large.

Since ScheduledThreadPoolExecutor does not support changing the default 
unbounded queue to a bounded one, the queue capacity is checked manually by the 
patch.
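
A minimal sketch of such a manual capacity check; the class name and the 
limit are illustrative, not the patch itself:

import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;

class BoundedSchedulerSketch {
  private static final int MAX_QUEUED_TASKS = 10_000; // illustrative limit
  private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

  void submit(Runnable writeTask) {
    // The executor's internal queue is unbounded and cannot be swapped out,
    // so the bound is enforced before submitting.
    if (executor.getQueue().size() >= MAX_QUEUED_TASKS) {
      throw new RejectedExecutionException("event queue full, dropping task");
    }
    executor.execute(writeTask);
  }
}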


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a7687d59004 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
8eab54859bf 
  ql/src/test/org/apache/hadoop/hive/ql/hooks/TestHiveProtoLoggingHook.java 
450a0b544d6 


Diff: https://reviews.apache.org/r/71784/diff/2/

Changes: https://reviews.apache.org/r/71784/diff/1-2/


Testing
---

unittest


Thanks,

Attila Magyar