[jira] [Created] (HIVE-23168) Implement MJ HashTable contains key functionality

2020-04-09 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23168:
-

 Summary: Implement MJ HashTable contains key functionality
 Key: HIVE-23168
 URL: https://issues.apache.org/jira/browse/HIVE-23168
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23169) Probe runtime support for LLAP

2020-04-09 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23169:
-

 Summary: Probe runtime support for LLAP
 Key: HIVE-23169
 URL: https://issues.apache.org/jira/browse/HIVE-23169
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis








[jira] [Created] (HIVE-23171) Create Tool To Visualize Hive Parser Tree

2020-04-09 Thread David Mollitor (Jira)
David Mollitor created HIVE-23171:
-

 Summary: Create Tool To Visualize Hive Parser Tree
 Key: HIVE-23171
 URL: https://issues.apache.org/jira/browse/HIVE-23171
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-23172) Quoted Backtick Columns Are Not Parsing Correctly

2020-04-09 Thread David Mollitor (Jira)
David Mollitor created HIVE-23172:
-

 Summary: Quoted Backtick Columns Are Not Parsing Correctly
 Key: HIVE-23172
 URL: https://issues.apache.org/jira/browse/HIVE-23172
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


I recently came across some weird behavior while examining failures of 
{{special_character_in_tabnames_2.q}} while working on HIVE-23150. I was 
surprised to see it fail because I couldn't think of any reason why it should... 
it's doing pretty standard SQL statements just like every other test, but for 
some reason this test is just a *little bit* different from most others and 
it brought this issue to light.

It turns out the parsing of table names is pretty much wrong across the board.

The statement that caught my attention was this:
{code:sql}
DROP TABLE IF EXISTS `s/c`;
{code}
And here is the relevant grammar:
{code:none}
fragment
RegexComponent
: 'a'..'z' | 'A'..'Z' | '0'..'9' | '_'
| PLUS | STAR | QUESTION | MINUS | DOT
| LPAREN | RPAREN | LSQUARE | RSQUARE | LCURLY | RCURLY
| BITWISEXOR | BITWISEOR | DOLLAR | '!'
;

Identifier
:
(Letter | Digit) (Letter | Digit | '_')*
| {allowQuotedId()}? QuotedIdentifier  /* though at the language level we allow all Identifiers to be QuotedIdentifiers;
  at the API level only columns are allowed to be of this form */
| '`' RegexComponent+ '`'
;

fragment
QuotedIdentifier 
:
'`'  ( '``' | ~('`') )* '`' { setText(StringUtils.replace(getText().substring(1, getText().length() -1 ), "``", "`")); }
;
{code}
The mystery for me was that, for some reason, the string {{`s/c`}} was being 
stripped of its back ticks. Every other test I investigated did not show this 
behavior; the back ticks were always preserved around the table name, and the 
main Hive Java code base would see the back ticks and deal with them 
internally. For HIVE-23150, I introduced some sanity checks, and they were 
failing because they expected the back ticks to be present.

With the help of HIVE-23171 I finally figured it out. What I discovered is 
that pretty much every table name is hitting the {{RegexComponent}} rule and 
the back ticks are carried forward. However, in {{`s/c`}} the forward slash 
{{/}} is not allowed in {{RegexComponent}}, so it hits the 
{{QuotedIdentifier}} rule, which trims the back ticks.

I validated this by disabling {{QuotedIdentifier}}. When I did this, {{`s/c`}} 
failed with an error but {{`sc`}} parsed successfully... because {{`sc`}} is 
being treated as a {{RegexComponent}}.

So, if you have {{allowQuotedId}} disabled, table names can only use the 
characters defined in the {{RegexComponent}} rule (otherwise parsing fails), 
and the back ticks are *not* stripped. If you have {{allowQuotedId}} enabled, 
there are two cases: if the table name contains a character not listed in 
{{RegexComponent}}, it is matched as a {{QuotedIdentifier}} and the back ticks 
*are* stripped; if all of its characters are part of {{RegexComponent}}, the 
back ticks are *not* stripped.
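
To make the cases above concrete, here is a rough Java sketch that mimics the observed behavior with two regular expressions. It is not the ANTLR lexer and does not reproduce how ANTLR picks an alternative; the class name, the {{lex}} helper, and the mapping of grammar tokens such as PLUS or BITWISEXOR to literal characters are assumptions made purely for illustration.

```java
import java.util.regex.Pattern;

public class BacktickIdentifierSketch {
    // Assumption: RegexComponent covers letters, digits, '_' and the symbols
    // + * ? - . ( ) [ ] { } ^ | $ ! (token names mapped to characters by guess).
    private static final Pattern REGEX_COMPONENT_ID =
        Pattern.compile("`[a-zA-Z0-9_+*?\\-.()\\[\\]{}^|$!]+`");

    // QuotedIdentifier: a back tick, any run of `` or non-back-tick characters, a back tick.
    private static final Pattern QUOTED_ID = Pattern.compile("`(``|[^`])*`");

    /** Returns the token text as described in the issue, or null if nothing matches. */
    static String lex(String input, boolean allowQuotedId) {
        if (REGEX_COMPONENT_ID.matcher(input).matches()) {
            // RegexComponent alternative: no action, so the back ticks are carried forward.
            return input;
        }
        if (allowQuotedId && QUOTED_ID.matcher(input).matches()) {
            // QuotedIdentifier action: strip the outer back ticks and unescape ``.
            return input.substring(1, input.length() - 1).replace("``", "`");
        }
        return null; // stands in for a parse error
    }

    public static void main(String[] args) {
        System.out.println(lex("`sc`", true));   // `sc`  (back ticks kept)
        System.out.println(lex("`s/c`", true));  // s/c   (back ticks stripped)
        System.out.println(lex("`s/c`", false)); // null  (no rule matches)
    }
}
```

The point of the sketch is only that the same quoted name can come out with or without its back ticks depending on which characters it contains, which is what the sanity checks in HIVE-23150 tripped over.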





[jira] [Created] (HIVE-23170) Probe support for ORC DataConsumer

2020-04-09 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23170:
-

 Summary: Probe support for ORC DataConsumer
 Key: HIVE-23170
 URL: https://issues.apache.org/jira/browse/HIVE-23170
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis








[jira] [Created] (HIVE-23167) Extend compiler support for Probe static filters

2020-04-09 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23167:
-

 Summary: Extend compiler support for Probe static filters
 Key: HIVE-23167
 URL: https://issues.apache.org/jira/browse/HIVE-23167
 Project: Hive
  Issue Type: Sub-task
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis








[jira] [Created] (HIVE-23173) User login success/failed attempts should be logged

2020-04-09 Thread Naresh P R (Jira)
Naresh P R created HIVE-23173:
-

 Summary: User login success/failed attempts should be logged
 Key: HIVE-23173
 URL: https://issues.apache.org/jira/browse/HIVE-23173
 Project: Hive
  Issue Type: Improvement
Reporter: Naresh P R


User login success & failure attempts should be logged in server logs





Many ANTLR Tokens

2020-04-09 Thread David Mollitor
Hello Gang,

I am investigating HIVE-23172 and I am having a problem addressing this
because I am getting the following error from compiling the grammar:

hive-parser: Compilation failure
[ERROR]
/home/apache/hive/hive/parser/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java:[40,38]
code too large

I traced it down to the fact that there are too many tokens defined.  In
HiveParser.java, it has the following:

 public static final String[] tokenNames = new String[] { ... };

That list is so long that it's breaking Java compilation.  Someone else came
across this a while ago: HIVE-15577.

I observed that the parser defines two tokens for most elements, for example:

KW_TRUNCATE / TOK_TRUNCATETABLE

What is the value of having both?  Can we consolidate this down to one and
conserve some space?  I would propose just using TOK_TRUNCATE and getting rid
of the KW version.

Does anyone have any insight into why things are set up the way they are?


[jira] [Created] (HIVE-23174) Remove TOK_TRUNCATETABLE

2020-04-09 Thread David Mollitor (Jira)
David Mollitor created HIVE-23174:
-

 Summary: Remove TOK_TRUNCATETABLE
 Key: HIVE-23174
 URL: https://issues.apache.org/jira/browse/HIVE-23174
 Project: Hive
  Issue Type: Sub-task
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-23175) Skip serializing hadoop and tez config on HS side

2020-04-09 Thread Mustafa Iman (Jira)
Mustafa Iman created HIVE-23175:
---

 Summary: Skip serializing hadoop and tez config on HS side
 Key: HIVE-23175
 URL: https://issues.apache.org/jira/browse/HIVE-23175
 Project: Hive
  Issue Type: Improvement
Reporter: Mustafa Iman
Assignee: Mustafa Iman


HiveServer spends a lot of time serializing configuration objects. We can skip 
putting the hadoop and tez config xml files in the payload, assuming that the 
configs are the same on both the HS and AM sides. This depends on Tez loading 
local xml configs when creating config objects: 
https://issues.apache.org/jira/browse/TEZ-4141





[jira] [Created] (HIVE-23163) Class TrustDomainAuthenticationTest should be abstract

2020-04-09 Thread Zhenyu Zheng (Jira)
Zhenyu Zheng created HIVE-23163:
---

 Summary: Class TrustDomainAuthenticationTest should be abstract
 Key: HIVE-23163
 URL: https://issues.apache.org/jira/browse/HIVE-23163
 Project: Hive
  Issue Type: Bug
Reporter: Zhenyu Zheng








Re: Review Request 72324: HIVE-22750: Consolidate LockType naming

2020-04-09 Thread Denys Kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72324/#review220265
---


Ship it!




Ship It!

- Denys Kuzmenko


On April 8, 2020, 3:09 p.m., Marton Bod wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72324/
> ---
> 
> (Updated April 8, 2020, 3:09 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko, Peter Vary, and Zoltan Chovan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-22750: Consolidate LockType naming
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/HiveEndPoint.java
>  e249b7775e 
>   
> hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/mutate/client/lock/Lock.java
>  52eb6133e7 
>   metastore/scripts/upgrade/hive/hive-schema-4.0.0.hive.sql 03540bba4d 
>   metastore/scripts/upgrade/hive/upgrade-3.1.0-to-4.0.0.hive.sql fa518747de 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 17e6cdf162 
>   ql/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 
> 72f095d264 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 80fb1aff78 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/LockType.java
>  8ae4351129 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  db4cfb996a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  cf3137928f 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  849970eb56 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/LockComponentBuilder.java
>  c739d4d196 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> 098ddec5dc 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  7d0db0c3a0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java
>  da38a6bbd3 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtil.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
>  1dfc105958 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/utils/LockTypeUtilTest.java
>  PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 
> d44065018f 
> 
> 
> Diff: https://reviews.apache.org/r/72324/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marton Bod
> 
>



Re: Review Request 72276: HIVE-23084: Implement kill query in multiple HS2 environment

2020-04-09 Thread Adam Szita via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72276/#review220266
---


Ship it!




Ship It!

- Adam Szita


On April 6, 2020, 10:04 a.m., Peter Varga wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72276/
> ---
> 
> (Updated April 6, 2020, 10:04 a.m.)
> 
> 
> Review request for hive and Adam Szita.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> KILL  command was implemented in:
> 
> https://issues.apache.org/jira/browse/HIVE-17483
> https://issues.apache.org/jira/browse/HIVE-20549
> But it does not work in an environment where service discovery is enabled 
> and more than one HS2 instance is running (except by manually sending the 
> kill query to all HS2 instances).
> 
> Solution:
> 
> If an HS2 instance can't kill a query locally, it should post a kill query 
> request to ZooKeeper.
> Every HS2 should watch ZooKeeper for kill query requests and, if the query 
> is running on that instance, kill it.
> Authorization of kill query should work the same.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 73f185a1f3 
>   
> itests/hive-unit/src/test/java/org/apache/hive/jdbc/BaseJdbcWithMiniLlap.java 
> 3973ec9270 
>   
> itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapArrow.java
>  68a515ccbe 
>   
> itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithServiceDiscovery.java
>  PRE-CREATION 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/thrift/TestMiniHS2StateWithNoZookeeper.java
>  99e681e5b2 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/server/TestKillQueryZookeeperManager.java
>  PRE-CREATION 
>   itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java 
> 1b60a51ebd 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java db965e7a22 
>   
> ql/src/java/org/apache/hadoop/hive/ql/ddl/process/kill/KillQueriesOperation.java
>  afde1a4762 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 
> 8becef1cd3 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> 9e497545b5 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> 277519cba5 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 181ea5d6d5 
>   service/src/java/org/apache/hive/service/server/KillQueryImpl.java 
> 883e32bd2e 
>   
> service/src/java/org/apache/hive/service/server/KillQueryZookeeperManager.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/common/ZooKeeperHiveHelper.java
>  71d8651712 
> 
> 
> Diff: https://reviews.apache.org/r/72276/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Peter Varga
> 
>



Re: Review Request 72333: HIVE-23154: Fix race condition in Utilities::mvFileToFinalPath

2020-04-09 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72333/
---

(Updated April 9, 2020, 7:44 a.m.)


Review request for hive, Ashutosh Chauhan and Thejas Nair.


Bugs: HIVE-23154
https://issues.apache.org/jira/browse/HIVE-23154


Repository: hive-git


Description
---

With rename(), we could run into a race condition between taking the snapshot 
of files to be moved ("filesKept") and the moment fs.rename happens. It is 
possible that a runaway task has added more files in the meantime. 

1. The patch fixes the problem by relying on a local thread pool to move the 
files instead of fs.rename (where S3AFileSystem's rename is inherently parallel).

2. The same race condition exists in "insert into" mode as well, which was 
relying on "fs.rename". The patch fixes this issue as well.
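
For illustration only, here is a minimal sketch of the idea in point 1: move exactly the files captured in the snapshot, in parallel, instead of renaming the whole directory. It uses java.nio on the local file system as a stand-in for the Hadoop FileSystem API, so it is not the code in Utilities.java; the class name and the moveSnapshot and demo helpers are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SnapshotMoveSketch {
    /** Moves only the files listed in the snapshot; files added later stay where they are. */
    static void moveSnapshot(List<Path> filesKept, Path targetDir, int threads) throws Exception {
        Files.createDirectories(targetDir);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : filesKept) {
                Callable<Path> move = () -> Files.move(src, targetDir.resolve(src.getFileName()));
                pending.add(pool.submit(move));
            }
            for (Future<Path> f : pending) {
                f.get(); // surfaces any failure from an individual move
            }
        } finally {
            pool.shutdown();
        }
    }

    /** Sorted file names in a directory. */
    static List<String> names(Path dir) throws Exception {
        try (Stream<Path> s = Files.list(dir)) {
            return s.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
        }
    }

    /** Takes a snapshot, simulates a late writer, then moves only the snapshot. */
    static String demo() throws Exception {
        Path src = Files.createTempDirectory("src");
        Path dst = Files.createTempDirectory("dst");
        Files.createFile(src.resolve("a.txt"));
        Files.createFile(src.resolve("b.txt"));
        List<Path> filesKept;
        try (Stream<Path> s = Files.list(src)) {
            filesKept = s.collect(Collectors.toList());
        }
        Files.createFile(src.resolve("late.txt")); // a runaway task writes after the snapshot
        moveSnapshot(filesKept, dst, 2);
        return names(dst) + "|" + names(src);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // [a.txt, b.txt]|[late.txt]
    }
}
```

A directory-level rename would have carried late.txt along with the snapshot; moving file by file keeps the result tied to what was actually checked.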


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e25dc54e7d 


Diff: https://reviews.apache.org/r/72333/diff/2/

Changes: https://reviews.apache.org/r/72333/diff/1-2/


Testing
---


File Attachments (updated)


HIVE-23154.3.patch
  
https://reviews.apache.org/media/uploaded/files/2020/04/09/38ab8dfe-18c2-4174-84da-e1028ad4133c__HIVE-23154.3.patch


Thanks,

Rajesh Balamohan



[jira] [Created] (HIVE-23164) server is not properly terminated because of non-daemon threads

2020-04-09 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-23164:
---

 Summary: server is not properly terminated because of non-daemon 
threads
 Key: HIVE-23164
 URL: https://issues.apache.org/jira/browse/HIVE-23164
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Eugene Chung
 Attachments: thread_dump_hiveserver2_is_not_terminated.txt

As you know, a HiveServer2 instance that receives the deregister command first 
prepares for shutdown. If there are no remaining sessions, HiveServer2.stop() 
is called to shut down. But I found a case where the HiveServer2 JVM is not 
terminated even though HiveServer2.stop() is called and properly processed. 
This always occurs when the local (embedded) metastore is used.

I've attached the full thread dump describing the situation.

[^thread_dump_hiveserver2_is_not_terminated.txt]

In this thread dump, you can see a bunch of 'daemon' threads, NO main 
thread, and some non-daemon threads (or user threads). As specified by 
[https://www.baeldung.com/java-daemon-thread], if at least one user thread is 
alive, the JVM does not terminate. (Note that the DestroyJavaVM thread is 
non-daemon, but it's special.)

 
{code:java}
"pool-8-thread-1" #24 prio=5 os_prio=0 tid=0x7f52ad1fc000 nid=0x821c 
waiting on condition [0x7f525c50]
 java.lang.Thread.State: TIMED_WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x0003cfa057c0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
 - None
{code}
 

 

The thread above is created by ScheduledThreadPoolExecutor(int coreSize) with 
the default ThreadFactory, which always creates non-daemon threads. If such a 
thread pool is not shut down with the ScheduledThreadPoolExecutor.shutdown() 
method, the JVM cannot terminate! The only way to kill it is a TERM signal; 
when the JVM receives a TERM signal, it ignores non-daemon threads.

So I have been digging through the modules that create a 
ScheduledThreadPoolExecutor with non-daemon threads, and now I have found it. 
As you may guess, it's the local (embedded) metastore. The pool is created by 
org.apache.hadoop.hive.metastore.HiveMetaStore.HMSHandler#startAlwaysTaskThreads()
 and ScheduledThreadPoolExecutor.shutdown() is never called.

Plus, I found another place that creates such a ScheduledThreadPoolExecutor 
without calling its shutdown. 
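
As a sketch of one possible remedy (not necessarily what the eventual patch does), the pool can be given a ThreadFactory that marks its threads as daemon, so that a forgotten shutdown() no longer keeps the JVM alive. The class and method names below are made up for illustration; only the java.util.concurrent calls are real.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class DaemonSchedulerSketch {
    /** Builds a scheduler whose worker threads are daemon threads. */
    static ScheduledThreadPoolExecutor newDaemonScheduler(int coreSize, String namePrefix) {
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, namePrefix + "-worker");
            t.setDaemon(true); // daemon threads do not block JVM exit
            return t;
        };
        return new ScheduledThreadPoolExecutor(coreSize, factory);
    }

    /** Asks a worker thread of the pool whether it is a daemon thread. */
    static boolean runsOnDaemonThread(ScheduledThreadPoolExecutor pool) throws Exception {
        Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
        return pool.submit(probe).get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledThreadPoolExecutor pool = newDaemonScheduler(1, "sketch");
        try {
            System.out.println(runsOnDaemonThread(pool)); // true
        } finally {
            pool.shutdown(); // still the right thing to do on an orderly stop
        }
    }
}
```

Calling shutdown() from the stop path remains the cleaner fix; daemon threads are only a safety net, since they can be cut off in the middle of a task when the JVM exits.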





[jira] [Created] (HIVE-23165) Hive On Spark left join and right join generated inconsistent data

2020-04-09 Thread qingfa zhou (Jira)
qingfa zhou created HIVE-23165:
--

 Summary:  Hive On Spark left join and right join generated 
inconsistent data
 Key: HIVE-23165
 URL: https://issues.apache.org/jira/browse/HIVE-23165
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.2.0
 Environment: hive :2.3.0

spark:2.2.0

hadoop:2.7.3
Reporter: qingfa zhou
Assignee: Xuefu Zhang


*1)This is my sql.*

with delivery_day as (
 select * from (
 select dt,warehouse_code,b.sku_main_code,b.out_warehouse_code,b.is_pici_order
 from data_smartorder.dm_ordering_information_system_order_detail_parse t
 lateral view 
json_tuple(t.information_info,'warehouse_code','sku_main_code','调出仓','是否预付商品')b 
as warehouse_code,sku_main_code,out_warehouse_code,is_pici_order
 where dt=date_format(date_sub(current_date,1),'MMdd')
 and l1_category_name='策略配置'
 and l2_category_name='pb仓库补货仓品维度新'
 and b.is_pici_order='1'
 )t
),

avg_sale_7 as (
 select *,sku_sale_quantity+first_dilivery_quantity as avg_sale_7
 from (
 select t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name 
sku_main_name,
 sum(t1.warehouse_dispatch_quantity) as warehouse_dispatch_quantity,
 sum(t1.sku_sale_quantity) as sku_sale_quantity,
 sum(t1.first_dilivery_quantity) as first_dilivery_quantity
 from data_smartorder.dw_ordering_warehouse_sku_cargo_delivery_data_di t1
 where t1.dt=date_format(date_sub(current_date,1),'MMdd')
 group by t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name
 )t
)

 select t1.warehouse_code,t1.sku_main_code,t1.out_warehouse_code,
 t2.avg_sale_7
 from delivery_day t1
 left join avg_sale_7 t2
 on t1.warehouse_code=t2.warehouse_code
 and t1.sku_main_code=t2.sku_main_code
 where t1.sku_main_code='37010832'
 and t1.out_warehouse_code='1011';

left join and right join generated inconsistent data.

2) result in the left join 
7001  37010832  1011  26.8572
1011  37010832  1011  130.2858
2002  37010832  1011  40
1701  37010832  1011  NULL

3) result in the right join 
1011  37010832  1011  65.1429
2002  37010832  1011  20
7001  37010832  1011  13.4286

The results are inconsistent in the last column; the result of the 'right 
join' is correct. However, the results of Hive on Tez and Spark SQL are 
consistent with each other and correct.





[jira] [Created] (HIVE-23166) Protect VGB from flushing too often

2020-04-09 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23166:
-

 Summary: Protect VGB from flushing too often
 Key: HIVE-23166
 URL: https://issues.apache.org/jira/browse/HIVE-23166
 Project: Hive
  Issue Type: Improvement
  Components: llap
Affects Versions: 4.0.0
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


The existing flush logic in our VectorGroupByOperator is completely static.
 It depends on the number of HtEntries (*hive.vectorized.groupby.maxentries*) 
and the MAX memory threshold (by default 90% of available memory).
 
Assuming that we are not memory constrained, the periodicity of flushing is 
currently dictated by the static number of entries (1M by default), which can 
also be misconfigured to a very low value.

I am proposing, along with maxHtEntries, to also take into account current 
memory usage, to avoid flushing too often, as it can hurt operator throughput 
for particular workloads.
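
One way to read the proposal, sketched below purely as an assumption: memory pressure always triggers a flush, while the entry-count limit triggers one only if memory usage is also above some lower bound. The class, the method, and the minUsageToFlushOnEntries parameter are hypothetical and do not come from VectorGroupByOperator.

```java
public class FlushPolicySketch {
    /**
     * Decides whether the hash table should be flushed.
     *
     * @param htEntries                current number of hash table entries
     * @param maxHtEntries             static entry limit (e.g. 1M)
     * @param usedMemoryBytes          memory currently in use
     * @param maxMemoryBytes           memory available to the operator
     * @param memoryThreshold          usage fraction that always forces a flush (e.g. 0.9)
     * @param minUsageToFlushOnEntries hypothetical lower bound below which the entry limit is ignored
     */
    static boolean shouldFlush(long htEntries, long maxHtEntries,
                               long usedMemoryBytes, long maxMemoryBytes,
                               double memoryThreshold, double minUsageToFlushOnEntries) {
        double usage = (double) usedMemoryBytes / maxMemoryBytes;
        if (usage >= memoryThreshold) {
            return true; // memory constrained: flush regardless of entry count
        }
        // Entry limit reached, but only flush if memory usage is not trivially low.
        return htEntries >= maxHtEntries && usage >= minUsageToFlushOnEntries;
    }

    public static void main(String[] args) {
        // Entry limit hit at 10% memory usage: skip the flush.
        System.out.println(shouldFlush(1_000_000, 1_000_000, 10, 100, 0.9, 0.5)); // false
        // Entry limit hit at 60% memory usage: flush.
        System.out.println(shouldFlush(1_000_000, 1_000_000, 60, 100, 0.9, 0.5)); // true
        // Few entries but 95% memory usage: flush.
        System.out.println(shouldFlush(10, 1_000_000, 95, 100, 0.9, 0.5));        // true
    }
}
```

With a policy of this shape, a misconfigured low entry limit would no longer cause constant flushing while plenty of memory is still free.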





Re: Review Request 72112: HIVE-22869 - Add locking benchmark to metastore-tools/metastore-benchmarks

2020-04-09 Thread Zoltan Chovan via Review Board


> On April 3, 2020, 11:28 a.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSBenchmarks.java
> > Lines 410 (patched)
> > 
> >
> > Do you really need to have physical tables?

no we don't :) removed this part


> On April 3, 2020, 11:28 a.m., Denys Kuzmenko wrote:
> > standalone-metastore/metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSClient.java
> > Lines 455 (patched)
> > 
> >
> > Should we care about replPolicy here?

probably not, removed it


- Zoltan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72112/#review220213
---


On April 2, 2020, 2:13 p.m., Zoltan Chovan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72112/
> ---
> 
> (Updated April 2, 2020, 2:13 p.m.)
> 
> 
> Review request for hive, Denys Kuzmenko, Aron Hamvas, Marton Bod, and Peter 
> Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add the possibility to run benchmarks on opening lock in the HMS. Currently 
> this change only introduces single-threaded/single client testing. I'm 
> planning to add multi-client support in a separate change.
> 
> Example parametrisation is as follows:
> hbench -M "lock" -N 10 -d hive_test -W 0 -L 100
> hbench -M ".*Lock.*" -N 10 -d hive_test -W 0 -L 100 -T 8 --params 100
> 
> This will create N (10) tables to lock and execute lock() L (100) times on 
> T (8) threads, where each thread will start an HMS client.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/BenchmarkTool.java
>  041cd76234 
>   
> standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSBenchmarks.java
>  f53f2ef43b 
>   
> standalone-metastore/metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSClient.java
>  7cc1e42a8b 
>   
> standalone-metastore/metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/Util.java
>  101d6759c5 
> 
> 
> Diff: https://reviews.apache.org/r/72112/diff/1/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> HIVE-22869.2.patch
>   
> https://reviews.apache.org/media/uploaded/files/2020/04/02/5e35e835-f383-495f-9964-e66773fd6a90__HIVE-22869.2.patch
> 
> 
> Thanks,
> 
> Zoltan Chovan
> 
>



Re: Review Request 72112: HIVE-22869 - Add locking benchmark to metastore-tools/metastore-benchmarks

2020-04-09 Thread Zoltan Chovan via Review Board


> On April 3, 2020, 9:59 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSClient.java
> > Lines 341 (patched)
> > 
> >
> > I put this in the code with HIVE-23042:
> >   boolean openTxn(int numTxns) throws TException {
> > client.open_txns(new OpenTxnRequest(numTxns, "Test", "Host"));
> > return true;
> >   }
> >   
> > Maybe merge those?

The main difference between our two implementations of openTxn is that mine 
automatically returns the opened txn's id; in your version an additional 
getOpenTxns() call has to be made to get the id. 
I'm not sure whether getOpenTxns would return other ids that belong to another 
client when multiple threads are used, so I might be misunderstanding the 
getOpenTxns() call.
What do you think?


- Zoltan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72112/#review220212
---





Re: Review Request 72112: HIVE-22869 - Add locking benchmark to metastore-tools/metastore-benchmarks

2020-04-09 Thread Zoltan Chovan via Review Board


> On April 3, 2020, 9:59 a.m., Peter Vary wrote:
> > standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSBenchmarks.java
> > Lines 430 (patched)
> > 
> >
> > Do we need this? Shouldn't the lock be unlocked by commitTxn?

you're right


- Zoltan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72112/#review220212
---





Re: Review Request 72112: HIVE-22869 - Add locking benchmark to metastore-tools/metastore-benchmarks

2020-04-09 Thread Zoltan Chovan via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72112/
---

(Updated April 9, 2020, 2:58 p.m.)


Review request for hive, Denys Kuzmenko, Aron Hamvas, Marton Bod, and Peter 
Vary.


Repository: hive-git


Description
---

Add the possibility to run benchmarks on opening lock in the HMS. Currently 
this change only introduces single-threaded/single client testing. I'm planning 
to add multi-client support in a separate change.

Example parametrisation is as follows:
hbench -M "lock" -N 10 -d hive_test -W 0 -L 100
hbench -M ".*Lock.*" -N 10 -d hive_test -W 0 -L 100 -T 8 --params 100

This will create N (10) tables to lock and execute lock() L (100) times on 
T (8) threads, where each thread will start an HMS client.


Diffs
-

  
standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/BenchmarkTool.java
 041cd76234 
  
standalone-metastore/metastore-tools/metastore-benchmarks/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSBenchmarks.java
 f53f2ef43b 
  
standalone-metastore/metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/HMSClient.java
 7cc1e42a8b 
  
standalone-metastore/metastore-tools/tools-common/src/main/java/org/apache/hadoop/hive/metastore/tools/Util.java
 101d6759c5 


Diff: https://reviews.apache.org/r/72112/diff/1/


Testing
---


File Attachments (updated)


HIVE-22869.2.patch
  
https://reviews.apache.org/media/uploaded/files/2020/04/02/5e35e835-f383-495f-9964-e66773fd6a90__HIVE-22869.2.patch
HIVE-22869.3.patch
  
https://reviews.apache.org/media/uploaded/files/2020/04/09/458beaa7-4743-40fb-a213-1ae4527be823__HIVE-22869.3.patch


Thanks,

Zoltan Chovan