[jira] [Created] (HIVE-25663) Need to modify table/partition lock acquisition retry for Zookeeper option

2021-10-29 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-25663:
---

 Summary: Need to modify table/partition lock acquisition retry for 
Zookeeper option
 Key: HIVE-25663
 URL: https://issues.apache.org/jira/browse/HIVE-25663
 Project: Hive
  Issue Type: Improvement
  Components: Locking
Reporter: Eugene Chung
Assignee: Eugene Chung
 Attachments: image-2021-10-30-11-54-42-164.png

 
{code:java}
LOCK TABLE default.my_table PARTITION (log_date='2021-10-30') EXCLUSIVE;
SET hive.query.timeout.seconds=5;
SELECT * FROM default.my_table WHERE log_date='2021-10-30' LIMIT 10;
{code}
 

If you execute the three SQL statements above in the same session, the last SELECT is 
cancelled with a timeout error. The problem is that, if you are using 
ZooKeeperHiveLockManager, 'show locks' shows that a SHARED lock on 
default.my_table remains for 100 minutes.

!image-2021-10-30-11-54-42-164.png|width=873,height=411!

Let me explain the problem step by step.

 

The SELECT SQL below, which reads some data from a partitioned table,

 
{code:java}
SELECT * FROM my_table WHERE log_date='2021-10-30' LIMIT 10{code}
 

needs two SHARED locks, acquired in this order:
 * default.my_table
 * default.my_table@log_date=2021-10-30

Before the SQL is executed, an EXCLUSIVE lock on the partition already exists. I can 
simulate it easily with a DDL like the one below:

 
{code:java}
LOCK TABLE default.my_table PARTITION (log_date='2021-10-30') EXCLUSIVE{code}
 

The SELECT SQL can't acquire the SHARED lock on the partition, so it retries as 
specified by two configurations. With the default values, it retries for 100 
minutes:
 * hive.lock.sleep.between.retries=60s
 * hive.lock.numretries=100

 

If the query timeout is set to 5 seconds, the SELECT SQL is cancelled 5 seconds 
later and the client returns with a timeout error. But the SHARED lock on 
my_table still remains. This is because [the current ZooKeeperHiveLockManager 
just logs the 
InterruptedException|https://github.com/apache/hive/blob/8a8e03d02003aa3543f46f595b4425fd8c156ad9/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java#L326]
 and keeps retrying the lock. This also means that the SQL processing thread 
keeps running for 100 minutes (by default) even though the SQL is cancelled. If 
the same SQL is executed 3 times, you can see 3 threads whose thread dumps look 
like the one below:

 
{code:java}
"HiveServer2-Background-Pool: Thread-154" #154 prio=5 os_prio=0 
tid=0x7f0ac91cb000 nid=0x13d25 waiting on condition [0x000
07f0aa2ce2000]
 java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at 
org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.lock(ZooKeeperHiveLockManager.java:303)
 at 
org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.lock(ZooKeeperHiveLockManager.java:207)
 at 
org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:199)
 at org.apache.hadoop.hive.ql.Driver.acquireLocks(Driver.java:1610)
 at org.apache.hadoop.hive.ql.Driver.lockAndRespond(Driver.java:1796)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1966)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1710)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1704)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
 at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:217)
 at 
org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:87)
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:309)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:322)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748){code}
 

 

I think ZooKeeperHiveLockManager should not swallow the unexpected exception. It 
should retry only for expected ones.
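
Below is a minimal, self-contained sketch of the intended behaviour. It is an 
illustration only, not the actual ZooKeeperHiveLockManager code; LockAttempt and 
tryAcquire() are hypothetical placeholders for a single ZooKeeper lock attempt.
{code:java}
public final class InterruptibleLockRetry {

  /** Hypothetical placeholder for a single ZooKeeper lock attempt. */
  interface LockAttempt {
    boolean tryAcquire() throws Exception;
  }

  static boolean lockWithRetries(LockAttempt attempt, int numRetries, long sleepMs)
      throws InterruptedException {
    for (int i = 0; i < numRetries; i++) {            // hive.lock.numretries
      try {
        if (attempt.tryAcquire()) {
          return true;                                // lock acquired
        }
        Thread.sleep(sleepMs);                        // hive.lock.sleep.between.retries
      } catch (InterruptedException ie) {
        // Do NOT swallow the interrupt: restore the flag and give up, so a
        // cancelled query does not keep a background thread retrying for
        // 100 minutes.
        Thread.currentThread().interrupt();
        throw ie;
      } catch (Exception e) {
        // expected, retriable failure: fall through and retry
      }
    }
    return false;                                     // retries exhausted
  }
}
{code}
The caller (i.e. the lock manager) would then release the already-acquired SHARED 
locks when the InterruptedException propagates.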



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24948) Reducing overhead of OrcInputFormat getSplits with bucket pruning

2021-03-26 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-24948:
---

 Summary: Reducing overhead of OrcInputFormat getSplits with bucket 
pruning
 Key: HIVE-24948
 URL: https://issues.apache.org/jira/browse/HIVE-24948
 Project: Hive
  Issue Type: Bug
  Components: ORC, Query Processor, Tez
Reporter: Eugene Chung
Assignee: Eugene Chung
 Fix For: 4.0.0


The summarized flow of generating input splits at the Tez AM (by calling 
HiveSplitGenerator.initialize()) is as follows:
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L260-L260
 # Perform bucket pruning with the list above if it's possible
https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L299-L301

But I observed that step 2, getting the list of InputSplit, can cause a big 
overhead when the inputs are ORC files in HDFS.


For example, there is an ORC table T partitioned by 'log_date', and each 
partition is bucketed by a column 'q'. There are 240 buckets in each partition, 
and the size of each bucket (ORC file) is, let's say, 100MB.

The SQL looks like this:

 
{noformat}
set hive.tez.bucket.pruning=true;
select q, count(*) from T
where log_date between '2020-01-01' and '2020-06-30'
and q = 'foobar'
group by q;{noformat}
It means there are 240 * 182 (days) = 43680 buckets, or ORC files, in the input 
path, but thanks to bucket pruning, only 182 buckets will be processed.

 

In my company's environment, the whole processing time of the SQL was roughly 5 
minutes. However, I've checked that it took more than 3 minutes to build the 
list of OrcSplit for the 43680 ORC files! The logs with tez.am.log.level=DEBUG 
looked like below:

 
{noformat}
2021-03-25 01:21:31,850 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits started
...
2021-03-25 01:24:51,435 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits finished
2021-03-25 01:24:51,444 [INFO] [InputInitializer {Map 1} #0] 
|io.HiveInputFormat|: number of splits 43680
2021-03-25 01:24:51,444 [DEBUG] [InputInitializer {Map 1} #0] |log.PerfLogger|: 

...
2021-03-25 01:26:03,385 [INFO] [Dispatcher thread {Central}] 
|app.DAGAppMaster|: DAG completed, dagId=dag_1615862187190_731117_1, 
dagState=SUCCEEDED{noformat}
 

With bucket pruning, I think building the whole list of ORC files is not 
necessary. Therefore, I suggest the flow be like this (see the sketch after the 
list):
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 ## OrcInputFormat.getSplits() returns the bucket-pruned list if BitSet from 
FixedBucketPruningOptimizer exists
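
As a rough sketch of the idea (not the actual OrcInputFormat change), the 
candidate files could be filtered by the bucket BitSet before split generation. 
It assumes the usual "000000_0"-style bucket file naming; class and method names 
here are illustrative.
{code:java}
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public final class BucketPruneSketch {

  /** Keep only the files whose bucket id survives bucket pruning. */
  static List<String> pruneByBucket(List<String> fileNames, BitSet includedBuckets) {
    List<String> kept = new ArrayList<>();
    for (String name : fileNames) {
      int bucketId = parseBucketId(name);
      if (bucketId < 0 || includedBuckets.get(bucketId)) {
        kept.add(name);                 // unknown layout or included bucket
      }
    }
    return kept;
  }

  private static int parseBucketId(String fileName) {
    int underscore = fileName.indexOf('_');
    String prefix = underscore < 0 ? fileName : fileName.substring(0, underscore);
    try {
      return Integer.parseInt(prefix);  // e.g. "000042_0" -> 42
    } catch (NumberFormatException e) {
      return -1;                        // not a bucket file name: do not prune
    }
  }
}
{code}
With 43680 candidate files and only 182 surviving buckets, this would avoid 
generating OrcSplits for files that are dropped right afterwards anyway.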



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24713) HS2 never knows the deletion of znode on some cases

2021-02-01 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-24713:
---

 Summary: HS2 never knows the deletion of znode on some cases
 Key: HIVE-24713
 URL: https://issues.apache.org/jira/browse/HIVE-24713
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Eugene Chung
Assignee: Eugene Chung
 Fix For: 4.0.0


While using ZooKeeper discovery mode, the problem that HS2 never notices it has 
been deregistered from ZooKeeper can always happen.

Reproduction is simple.
 # Find one of the ZK servers which holds the DeRegisterWatcher watches of HS2. 
If the ZK server is 3.5.0 or above, it's easily found with 
[http://zk-server:8080/commands/watches] (the ZK AdminServer feature)
 # Check which HS2 is watching via the ZK server found in step 1; say it's _hs2-of-2_
 # Restart the ZK server found in step 1
 # Deregister HS2 with the command
{noformat}
hive --service hiveserver2 -deregister hs2-of-2{noformat}

 # _hs2-of-2_ never learns that it must shut down, because the watch event of 
DeregisterWatcher was already fired at the time of step 3.

The reason for the problem is explained at 
[https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]

I added logging to DeRegisterWatcher and checked which events occurred at the 
time of step 3 (restarting the ZK server):
 # WatchedEvent state:Disconnected type:None path:null
 # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
 # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
 # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
 path:/hiveserver2/serverUri=hs2-of-2:1;version=3.1.2;sequence=000711]

As the ZK manual says, watches are one-time triggers. When the connection to the 
ZK server was re-established, state:SyncConnected type:NodeDataChanged 
path:hs2-of-2 was fired and that's all. *DeregisterWatcher must be registered 
again for the same znode to get a future NodeDeleted event.*
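
A minimal sketch of that re-registration (illustrative only, not the actual 
DeRegisterWatcher code):
{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

class ReRegisteringDeleteWatcher implements Watcher {
  private final ZooKeeper zk;
  private final String znodePath;

  ReRegisteringDeleteWatcher(ZooKeeper zk, String znodePath) {
    this.zk = zk;
    this.znodePath = znodePath;
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeDeleted) {
      // The znode is gone: trigger the HS2 shutdown handling here.
      return;
    }
    try {
      // Any other event (reconnect, NodeDataChanged, ...) has consumed the
      // one-time watch, so set it again to still see a future NodeDeleted.
      zk.exists(znodePath, this);
    } catch (Exception e) {
      // log and retry setting the watch
    }
  }
}
{code}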



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-01-06 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-24590:
---

 Summary: Operation Logging still leaks the log4j Appenders
 Key: HIVE-24590
 URL: https://issues.apache.org/jira/browse/HIVE-24590
 Project: Hive
  Issue Type: Bug
  Components: Logging
Reporter: Eugene Chung
 Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png

I'm using Hive 3.1.2 with options below.
 * hive.server2.logging.operation.enabled=true
 * hive.server2.logging.operation.level=VERBOSE
 * hive.async.log.enabled=false

I already know about the ticket https://issues.apache.org/jira/browse/HIVE-17128, 
but it still leaks log4j RandomAccessFileManager instances.

!Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!

I checked the operation log file, which is not closed.

!Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!

Then there's this log:
{code:java}
client.TezClient: Shutting down Tez Session, sessionName= {code}
!Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23954) count(*) with count(distinct) gives wrong results with hive.optimize.countdistinct=true

2020-07-29 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-23954:
---

 Summary: count(*) with count(distinct) gives wrong results with 
hive.optimize.countdistinct=true
 Key: HIVE-23954
 URL: https://issues.apache.org/jira/browse/HIVE-23954
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 3.1.0, 3.0.0
Reporter: Eugene Chung


select count(*), count(distinct mycol) from db1.table1 where partitioned_column 
= '...'

 

does not work properly when hive.optimize.countdistinct is true. By default, it 
is true in all 3.x versions.

In the two plans below, the aggregations of the Group By Operator in Map 1 are 
different.

 

- hive.optimize.countdistinct=false
{code:java}
++
|  Explain   |
++
| Plan optimized by CBO. |
||
| Vertex dependency in root stage|
| Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
||
| Stage-0|
|   Fetch Operator   |
| limit:-1   |
| Stage-1|
|   Reducer 2|
|   File Output Operator [FS_7]  |
| Group By Operator [GBY_5] (rows=1 width=24) |
|   
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT 
KEY._col0:0._col0)"] |
| <-Map 1 [SIMPLE_EDGE]  |
|   SHUFFLE [RS_4]   |
| Group By Operator [GBY_3] (rows=343640771 width=4160) |
|   
Output:["_col0","_col1","_col2"],aggregations:["count()","count(DISTINCT 
mid)"],keys:mid |
|   Select Operator [SEL_2] (rows=343640771 width=4160) |
| Output:["mid"] |
| TableScan [TS_0] (rows=343640771 width=4160) |
|   db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
||
++{code}
 

- hive.optimize.countdistinct=true
{code:java}
++
|  Explain   |
++
| Plan optimized by CBO. |
||
| Vertex dependency in root stage|
| Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
||
| Stage-0|
|   Fetch Operator   |
| limit:-1   |
| Stage-1|
|   Reducer 2|
|   File Output Operator [FS_7]  |
| Group By Operator [GBY_14] (rows=1 width=16) |
|   
Output:["_col0","_col1"],aggregations:["count(_col1)","count(_col0)"] |
|   Group By Operator [GBY_11] (rows=343640771 width=4160) |
| 
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0 |
|   <-Map 1 [SIMPLE_EDGE]|
| SHUFFLE [RS_10]|
|   PartitionCols:_col0  |
|   Group By Operator [GBY_9] (rows=343640771 width=4160) |
| Output:["_col0","_col1"],aggregations:["count()"],keys:mid |
| Select Operator [SEL_2] (rows=343640771 width=4160) |
|   Output:["mid"]   |
|   TableScan [TS_0] (rows=343640771 width=4160) |
| db1@table1,table1,Tbl:COMPLETE,Col:NONE,Output:["mid"] |
||
++
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23623) Not ignoring antlr RecognitionException from ParseDriver

2020-06-05 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-23623:
---

 Summary: Not ignoring antlr RecognitionException from ParseDriver
 Key: HIVE-23623
 URL: https://issues.apache.org/jira/browse/HIVE-23623
 Project: Hive
  Issue Type: Improvement
  Components: Parser
Reporter: Eugene Chung
Assignee: Eugene Chung


With the java.lang.Throwable.initCause(Throwable) API, we can attach the antlr 
RecognitionException as the cause when throwing ParseException, instead of 
ignoring it.
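
A tiny, self-contained sketch of the idea (the ParseException here is an 
illustrative stand-in for org.apache.hadoop.hive.ql.parse.ParseException, not 
the real class):
{code:java}
public final class InitCauseSketch {

  /** Illustrative stand-in for the real ParseException. */
  static class ParseException extends Exception {
    ParseException(String message) {
      super(message);
    }
  }

  static void rethrow(Exception recognitionError) throws ParseException {
    ParseException pe = new ParseException("cannot recognize input");
    // Keep the original antlr RecognitionException as the cause so that logs
    // and callers can see where parsing actually failed.
    pe.initCause(recognitionError);
    throw pe;
  }
}
{code}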



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23458) Introduce unified thread pool for scheduled jobs

2020-05-12 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-23458:
---

 Summary: Introduce unified thread pool for scheduled jobs
 Key: HIVE-23458
 URL: https://issues.apache.org/jira/browse/HIVE-23458
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Eugene Chung
Assignee: Eugene Chung


As I mentioned in [the comment of 
HIVE-23164|https://issues.apache.org/jira/browse/HIVE-23164?focusedCommentId=17089506&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17089506],
 I've made a unified scheduled executor service like 
org.apache.hadoop.hive.metastore.ThreadPool (a rough sketch follows the list 
below).

I think it could help
1. minimize the possibility of creating non-daemon threads when developers need 
a ScheduledExecutorService,
2. improve the utilization of server resources, because currently each module 
creates its own ScheduledExecutorService and each of those threads is used for 
just one job, and
3. Hive server administrators, by providing a hive.exec.scheduler.num.threads 
configuration.
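
Here is a rough sketch of such a unified pool. The class and method names are my 
assumptions, modelled loosely on org.apache.hadoop.hive.metastore.ThreadPool, and 
hive.exec.scheduler.num.threads is the proposed, not yet existing, configuration.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public final class UnifiedScheduler {
  private static ScheduledExecutorService pool;

  /** numThreads would come from hive.exec.scheduler.num.threads. */
  public static synchronized ScheduledExecutorService getInstance(int numThreads) {
    if (pool == null) {
      ThreadFactory daemonFactory = new ThreadFactory() {
        private final AtomicInteger count = new AtomicInteger();
        @Override
        public Thread newThread(Runnable r) {
          Thread t = new Thread(r, "Hive-Scheduler-" + count.incrementAndGet());
          t.setDaemon(true);   // scheduled jobs never block JVM shutdown
          return t;
        }
      };
      pool = Executors.newScheduledThreadPool(numThreads, daemonFactory);
    }
    return pool;
  }

  public static synchronized void shutdown() {
    if (pool != null) {
      pool.shutdown();
      pool = null;
    }
  }
}
{code}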



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23296) Setting Tez caller ID with the Hive session user

2020-04-24 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-23296:
---

 Summary: Setting Tez caller ID with the Hive session user
 Key: HIVE-23296
 URL: https://issues.apache.org/jira/browse/HIVE-23296
 Project: Hive
  Issue Type: Improvement
  Components: Tez
Reporter: Eugene Chung
Assignee: Eugene Chung
 Attachments: Screen Shot 2020-04-24 at 17.20.34.png

In a kerberized Hadoop environment, the submitter of a YARN job is the name 
part of the Hive server principal, and the caller ID of the job is built from 
the OS user of the Hive server process.

The view and modify ACLs of the Hive server admin for all Tez tasks are set by 
org.apache.hadoop.hive.ql.exec.tez.TezTask#setAccessControlsForCurrentUser() so 
that the admin can see all tasks in the Tez UI. But the admin can hardly tell 
who executed each query.

I suggest changing the caller ID to include the actual Hive user. If the user 
is not known, the OS user of the Hive server process is included as it is 
today (see the sketch below the screenshot).

!Screen Shot 2020-04-24 at 17.20.34.png|width=683,height=29!
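
A minimal sketch of the suggested fallback. This is illustrative only; the helper 
name and the caller ID format are my assumptions, not the actual TezTask code.
{code:java}
public final class CallerIdExample {

  static String buildCallerId(String hiveSessionUser, String queryId) {
    // Use the actual Hive user when it is known; otherwise keep the OS user
    // of the HiveServer2 process, as today.
    String user = (hiveSessionUser == null || hiveSessionUser.isEmpty())
        ? System.getProperty("user.name")
        : hiveSessionUser;
    return "HIVE_" + user + "_" + queryId;   // assumed format
  }

  public static void main(String[] args) {
    System.out.println(buildCallerId("alice", "hive_20200424172034_..."));
    System.out.println(buildCallerId(null, "hive_20200424172034_..."));
  }
}
{code}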



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23276) remove HttpException NPE possibility

2020-04-22 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-23276:
---

 Summary: remove HttpException NPE possibility 
 Key: HIVE-23276
 URL: https://issues.apache.org/jira/browse/HIVE-23276
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: Eugene Chung
Assignee: Eugene Chung


{code:java}
java.lang.NullPointerException: null
  at org.apache.http.HttpException.clean(HttpException.java:48) 
~[httpcore-4.4.10.jar:4.4.10]
  at org.apache.http.HttpException.(HttpException.java:105) 
~[httpcore-4.4.10.jar:4.4.10]
  at 
org.apache.hive.jdbc.HttpKerberosRequestInterceptor.addHttpAuthHeader(HttpKerberosRequestInterceptor.java:67)
 ~[hive-jdbc-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at 
org.apache.hive.jdbc.HttpRequestInterceptorBase.process(HttpRequestInterceptorBase.java:74)
 ~[hive-jdbc-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at 
org.apache.http.protocol.ImmutableHttpProcessor.process(ImmutableHttpProcessor.java:133)
 ~[httpcore-4.4.10.jar:4.4.10]
  at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:183) 
~[httpclient-4.5.6.jar:4.5.6]
  at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) 
~[httpclient-4.5.6.jar:4.5.6]
  at 
org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
 ~[httpclient-4.5.6.jar:4.5.6]
  at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) 
~[httpclient-4.5.6.jar:4.5.6]
  at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
 ~[httpclient-4.5.6.jar:4.5.6]
  at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:118)
 ~[httpclient-4.5.6.jar:4.5.6]
  at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
 ~[httpclient-4.5.6.jar:4.5.6]
  at 
org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:251)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:313) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at 
org.apache.hive.service.rpc.thrift.TCLIService$Client.send_OpenSession(TCLIService.java:170)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at 
org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:162)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:862) 
~[hive-jdbc-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:320) 
[hive-jdbc-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:94) 
[hive-jdbc-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at java.sql.DriverManager.getConnection(DriverManager.java:664) [?:1.8.0_191]
  at java.sql.DriverManager.getConnection(DriverManager.java:208) [?:1.8.0_191]
  at 
org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145) 
[hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at 
org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:209)
 [hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.beeline.Commands.connect(Commands.java:1679) 
[hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.beeline.Commands.connect(Commands.java:1574) 
[hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_191]
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_191]
  at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
  at 
org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:57)
 [hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1494) 
[hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1533) 
[hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1393) 
[hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1163) 
[hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:) 
[hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:555) 
[hive-beeline-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:537) 
[hive-beeline-4.0.0-SNAP

[jira] [Created] (HIVE-23164) server is not properly terminated because of non-daemon threads

2020-04-09 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-23164:
---

 Summary: server is not properly terminated because of non-daemon 
threads
 Key: HIVE-23164
 URL: https://issues.apache.org/jira/browse/HIVE-23164
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Eugene Chung
 Attachments: thread_dump_hiveserver2_is_not_terminated.txt

As you know, a HiveServer2 that receives the deregister command first prepares 
for shutdown. If there are no remaining sessions, HiveServer2.stop() is called 
to shut down. But I found a case where the HiveServer2 JVM is not terminated 
even though HiveServer2.stop() is called and properly processed. The case 
always occurs when the local (embedded) metastore is used.

I've attached the full thread dump describing the situation.

[^thread_dump_hiveserver2_is_not_terminated.txt]

In this thread dump, you can see a bunch of daemon threads, NO main thread, and 
some non-daemon (user) threads. As explained at 
[https://www.baeldung.com/java-daemon-thread], if at least one user thread is 
alive, the JVM does not terminate. (Note that the DestroyJavaVM thread is 
non-daemon, but it's special.)

 
{code:java}
"pool-8-thread-1" #24 prio=5 os_prio=0 tid=0x7f52ad1fc000 nid=0x821c 
waiting on condition [0x7f525c50]
 java.lang.Thread.State: TIMED_WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x0003cfa057c0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
 at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
 - None
{code}
 

 

The thread above is created by ScheduledThreadPoolExecutor(int corePoolSize) 
with the default ThreadFactory, which always creates non-daemon threads. If such 
a thread pool is not shut down with the ScheduledThreadPoolExecutor.shutdown() 
method, the JVM cannot terminate! The only way to kill it is a TERM signal; when 
the JVM receives a TERM signal, it ignores non-daemon threads.

So I have been digging through the modules which create a 
ScheduledThreadPoolExecutor with non-daemon threads, and now I've found it. As 
you may guess, it's the local (embedded) metastore. The pool is created by 
org.apache.hadoop.hive.metastore.HiveMetaStore.HMSHandler#startAlwaysTaskThreads(),
 and ScheduledThreadPoolExecutor.shutdown() is never called.
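
A minimal sketch of the problem and the fix (names are illustrative, not the 
actual HMSHandler code): either shut the pool down when the service stops, or 
give it a daemon ThreadFactory in the first place.
{code:java}
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class AlwaysTasksSketch {
  // Default ThreadFactory => the pool's worker threads are non-daemon.
  private static final ScheduledThreadPoolExecutor POOL = new ScheduledThreadPoolExecutor(4);

  static void start() {
    POOL.scheduleAtFixedRate(() -> { /* periodic metastore task */ }, 0, 60, TimeUnit.SECONDS);
  }

  static void stop() {
    // Without this call (or a daemon ThreadFactory), the worker threads stay
    // alive after HiveServer2.stop() and the JVM never exits on its own.
    POOL.shutdown();
  }
}
{code}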

Plus, I found another place that creates such a ScheduledThreadPoolExecutor and 
never calls its shutdown().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23153) deregister from zookeeper is not properly worked on kerberized environment

2020-04-07 Thread Eugene Chung (Jira)
Eugene Chung created HIVE-23153:
---

 Summary: deregister from zookeeper is not properly worked on 
kerberized environment
 Key: HIVE-23153
 URL: https://issues.apache.org/jira/browse/HIVE-23153
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Eugene Chung
Assignee: Eugene Chung
 Attachments: Screen Shot 2020-04-08 at 5.00.40.png

Deregistering from ZooKeeper, initiated by the command 'hive --service 
hiveserver2 -deregister ', does not work properly when HiveServer2 
and ZooKeeper are kerberized. Even though hive-site.xml has the configuration 
for the ZooKeeper Kerberos login (hive.server2.authentication.kerberos.principal 
and keytab), it isn't used. I know that running kinit with the hiveserver2 
keytab before executing the command would work. But, as I said, hive-site.xml 
already has the configuration.

 

 
{code:java}
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: 
-78.jar:/usr/hdp/3.1.0.0-78/tez/lib/hadoop-azure-3.1.1.3.1.0.0-78.jar:/usr/hdp/3.1.0.0-78/tez/lib/hadoop-azure-datalake-3.1.1.3.1.0.0-78.jar:/usr/hdp/3.1.0.0-78/tez/lib/hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar:/usr/hdp/3.1.0.0-78/tez/lib/hadoop-mapreduce-client-common-3.1.1.3.1.0.0-78.jar:/usr/hdp/3.1.0.0-78/tez/lib/hadoop-mapreduce-client-core-3.1.1.3.1.0.0-78.jar:/usr/hdp/3.1.0.0-78/tez/lib/hadoop-yarn-server-timeline-pluginstorage-3.1.1.3.1.0.0-78.jar:/usr/hdp/3.1.0.0-78/tez/lib/jersey-client-1.19.jar:/usr/hdp/3.1.0.0-78/tez/lib/jersey-json-1.19.jar:/usr/hdp/3.1.0.0-78/tez/lib/jettison-1.3.4.jar:/usr/hdp/3.1.0.0-78/tez/lib/jetty-server-9.3.22.v20171030.jar:/usr/hdp/3.1.0.0-78/tez/lib/jetty-util-9.3.22.v20171030.jar:/usr/hdp/3.1.0.0-78/tez/lib/jsr305-3.0.0.jar:/usr/hdp/3.1.0.0-78/tez/lib/metrics-core-3.1.0.jar:/usr/hdp/3.1.0.0-78/tez/lib/protobuf-java-2.5.0.jar:/usr/hdp/3.1.0.0-78/tez/lib/servlet-api-2.5.jar:/usr/hdp/3.1.0.0-78/tez/lib/slf4j-api-1.7.10.jar:/usr/hdp/3.1.0.0-78/tez/conf
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: Client 
environment:java.library.path=:
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: Client 
environment:java.io.tmpdir=/tmp
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: Client 
environment:java.compiler=
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: Client 
environment:os.name=Linux
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: Client 
environment:os.arch=amd64
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: Client 
environment:os.version=...
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: Client 
environment:user.name=...
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: Client 
environment:user.home=...
2020-04-08 04:45:44,698 INFO [main] zookeeper.ZooKeeper: Client 
environment:user.dir=...
2020-04-08 04:45:44,699 INFO [main] zookeeper.ZooKeeper: Initiating client 
connection, connectString=... sessionTimeout=6 
watcher=org.apache.curator.ConnectionState@706eab5d
2020-04-08 04:45:44,725 INFO [main-SendThread(...)] zookeeper.ClientCnxn: 
Opening socket connection to server ...:2181. Will not attempt to authenticate 
using SASL (unknown error)
2020-04-08 04:45:44,731 INFO [main-SendThread(...:2181)] zookeeper.ClientCnxn: 
Socket connection established to ...:2181, initiating session
2020-04-08 04:45:44,743 INFO [main-SendThread(...:2181)] zookeeper.ClientCnxn: 
Session establishment complete on server ...:2181, sessionid = 
0x27148fd2ab1002e, negotiated timeout = 6
2020-04-08 04:45:44,751 INFO [main-EventThread] state.ConnectionStateManager: 
State change: CONNECTED
2020-04-08 04:45:44,760 WARN [main] server.HiveServer2: Will attempt to remove 
the znode: /hiveserver2/serverUri=...;version=3.1.2;sequence=49 from 
ZooKeeper
Will attempt to remove the znode: 
/hiveserver2/serverUri=...;version=3.1.2;sequence=49 from ZooKeeper
2020-04-08 04:45:44,768 INFO [Curator-Framework-0] imps.CuratorFrameworkImpl: 
backgroundOperationsLoop exiting
2020-04-08 04:45:44,771 INFO [main] zookeeper.ZooKeeper: Session: 
0x27148fd2ab1002e closed
2020-04-08 04:45:44,771 INFO [main-EventThread] zookeeper.ClientCnxn: 
EventThread shut down
2020-04-08 04:45:44,794 INFO [shutdown-hook-0] server.HiveServer2: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down HiveServer2 at ...
/{code}
 

Plus, the result code of the CuratorEvent delivered to the DeleteCallback is not 
checked. In the case that the Kerberos login is not done, the code is -102 
([KeeperException.Code.NOAUTH|https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/KeeperException.Code.html#NOAUTH]).

!Screen Shot 2020-04-08 at 5.00.40.png|width=742,height=201!
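
A minimal sketch of checking that result code (illustrative only, not the actual 
HiveServer2 DeleteCallback):
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.api.BackgroundCallback;
import org.apache.curator.framework.api.CuratorEvent;
import org.apache.zookeeper.KeeperException;

public final class DeleteCallbackSketch implements BackgroundCallback {
  @Override
  public void processResult(CuratorFramework client, CuratorEvent event) throws Exception {
    int rc = event.getResultCode();
    if (rc != KeeperException.Code.OK.intValue()) {
      // e.g. NOAUTH (-102) when the Kerberos login was not performed
      throw new IllegalStateException("Failed to remove znode " + event.getPath()
          + ", ZooKeeper result code = " + KeeperException.Code.get(rc));
    }
  }
}
{code}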

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)