[jira] [Created] (HIVE-20989) JDBC: The GetOperationStatus + log can block query progress via sleep()

2018-11-29 Thread Gopal V (JIRA)
Gopal V created HIVE-20989:
--

 Summary: JDBC: The GetOperationStatus + log can block query 
progress via sleep()
 Key: HIVE-20989
 URL: https://issues.apache.org/jira/browse/HIVE-20989
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


There is an exponential sleep operation inside the CLIService which can end up 
adding tens of seconds to a query which has already completed.

{code}
"HiveServer2-Handler-Pool: Thread-9373" #9373 prio=5 os_prio=0 tid=0x7f4d5e72d800 nid=0xb634a waiting on condition [0x7f28d06a5000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hive.service.cli.CLIService.progressUpdateLog(CLIService.java:506)
    at org.apache.hive.service.cli.CLIService.getOperationStatus(CLIService.java:480)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.GetOperationStatus(ThriftCLIService.java:695)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1757)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetOperationStatus.getResult(TCLIService.java:1742)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}

The sleep loop is on the server side.

{code}
private static final long PROGRESS_MAX_WAIT_NS = 30 * 1000000000l;

private JobProgressUpdate progressUpdateLog(boolean isProgressLogRequested,
    Operation operation, HiveConf conf) {
...
    long startTime = System.nanoTime();
    int timeOutMs = 8;
    try {
      while (sessionState.getProgressMonitor() == null && !operation.isDone()) {
        long remainingMs = (PROGRESS_MAX_WAIT_NS - (System.nanoTime() - startTime)) / 1000000l;
        if (remainingMs <= 0) {
          LOG.debug("timed out and hence returning progress log as NULL");
          return new JobProgressUpdate(ProgressMonitor.NULL);
        }
        Thread.sleep(Math.min(remainingMs, timeOutMs));
        timeOutMs <<= 1;
      }
{code}

After 11 sleeps (about 16.4 seconds into the query), timeOutMs has grown to 
16384 ms, so the next sleep lasts min(30000 - 16376, 16384) = 13624 ms, 
roughly 13.6 seconds.

If the query finishes on the 17th second, the JDBC server will only respond 
after the 30th second when it will check for operation.isDone() and return.
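
The arithmetic above can be checked with a small standalone simulation. This is 
only a sketch, not Hive code: the class and method names are hypothetical, it 
assumes the 8 ms starting value and the 30-second budget from the snippet, and 
it ignores wake-up overhead and the isDone() check between sleeps.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffSchedule {
    /**
     * Returns the successive sleep durations (ms) produced by doubling the
     * timeout on every iteration until the total wait budget is used up.
     */
    static List<Long> sleepSchedule(long initialTimeoutMs, long maxWaitMs) {
        List<Long> sleeps = new ArrayList<>();
        long elapsedMs = 0;
        long timeOutMs = initialTimeoutMs;
        while (elapsedMs < maxWaitMs) {
            // Same rule as the loop in the report: min(remaining, current timeout).
            long sleepMs = Math.min(maxWaitMs - elapsedMs, timeOutMs);
            sleeps.add(sleepMs);
            elapsedMs += sleepMs;
            timeOutMs <<= 1;
        }
        return sleeps;
    }

    public static void main(String[] args) {
        long elapsedMs = 0;
        for (long sleepMs : sleepSchedule(8, 30_000L)) {
            elapsedMs += sleepMs;
            System.out.printf("sleep %6d ms, wakes at %6d ms%n", sleepMs, elapsedMs);
        }
    }
}
```

Under these assumptions the last entry is a single 13624 ms sleep that starts at 
16376 ms, which is the window in which a finished query goes unnoticed.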



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20988) Wrong results for group by queries with primary key on multiple columns

2018-11-29 Thread Vineet Garg (JIRA)
Vineet Garg created HIVE-20988:
--

 Summary: Wrong results for group by queries with primary key on 
multiple columns
 Key: HIVE-20988
 URL: https://issues.apache.org/jira/browse/HIVE-20988
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Vineet Garg
Assignee: Vineet Garg








[GitHub] hive pull request #498: HIVE-20987: split Druid Tests and start nodes on ran...

2018-11-29 Thread b-slim
GitHub user b-slim opened a pull request:

https://github.com/apache/hive/pull/498

HIVE-20987: split Druid Tests and start nodes on random ports

Change-Id: I89009fd8a79a85b26bcc080c34a07577125f0110

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/b-slim/hive HIVE-20987

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/498.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #498


commit 959438cf82a522f6cf573b8df2b7850762ea6c7b
Author: Slim Bouguerra 
Date:   2018-11-30T03:35:55Z

HIVE-20987: split Druid Tests and start nodes on random ports

Change-Id: I89009fd8a79a85b26bcc080c34a07577125f0110




---


[jira] [Created] (HIVE-20987) Split Druid Tests to avoid Timeouts

2018-11-29 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-20987:
-

 Summary: Split Druid Tests to avoid Timeouts
 Key: HIVE-20987
 URL: https://issues.apache.org/jira/browse/HIVE-20987
 Project: Hive
  Issue Type: Test
Reporter: slim bouguerra
Assignee: slim bouguerra


Currently the Druid tests fail with timeouts.

I am planning to split the tests into at least 2 batches to avoid the timeouts.

I will also tweak the test code to pick random ports for the Druid nodes, to 
minimize the port collisions that we saw before.
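
One common way to get a collision-free port in test code is to let the OS pick 
it by binding to port 0. The sketch below shows that technique; it is an 
assumption about how the ports could be chosen, not the code in the pull 
request, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

final class FreePorts {
    /**
     * Asks the OS for a currently unused TCP port by binding to port 0,
     * then releases it so the caller can hand it to the node under test.
     */
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("free port: " + findFreePort());
    }
}
```

The port is released before the caller uses it, so another process can still 
grab it in between; this reduces collisions but does not rule them out.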

 

 





Re: Review Request 69054: HIVE-20740 : Remove global lock in ObjectStore.setConf method

2018-11-29 Thread Naveen Gangam via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69054/#review210953
---


Ship it!

- Naveen Gangam


On Nov. 27, 2018, 7:18 a.m., Vihang Karajgaonkar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69054/
> ---
> 
> (Updated Nov. 27, 2018, 7:18 a.m.)
> 
> 
> Review request for hive, Andrew Sherman, Alan Gates, and Peter Vary.
> 
> 
> Bugs: HIVE-20740
> https://issues.apache.org/jira/browse/HIVE-20740
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20740 : Remove global lock in ObjectStore.setConf method
> 
> 
> Diffs
> -
> 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java 5a88550f0625a7ec1890df7f54e7fa579f58fff4 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniHS2.java 5cb0a887e672f49739f5b648e608fba66de06326 
>   ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 455ffc3887e62fa503cc3fa28255702ea9da3cc0 
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 570281b54fa236d5bb568b4ded9b166ef367f613 
>   standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java PRE-CREATION 
>   standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestObjectStore.java af9efd98ea210335c6ac1d3da8624e02aadc2706 
> 
> 
> Diff: https://reviews.apache.org/r/69054/diff/6/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>



[jira] [Created] (HIVE-20986) Add TransactionalValidationListener to HMS preListeners only when ACID support is enabled

2018-11-29 Thread Karthik Manamcheri (JIRA)
Karthik Manamcheri created HIVE-20986:
-

 Summary: Add TransactionalValidationListener to HMS preListeners 
only when ACID support is enabled
 Key: HIVE-20986
 URL: https://issues.apache.org/jira/browse/HIVE-20986
 Project: Hive
  Issue Type: Improvement
Reporter: Karthik Manamcheri
Assignee: Adam Holley


We add the TransactionalValidationListener to the preListeners in HMS 
unconditionally.
{code:java}
public void init() throws MetaException {
  ..
  preListeners.add(0, new TransactionalValidationListener(conf));
  ..
}{code}

This causes some performance issues because the listener is called even when it 
is not needed. Let's add a condition around this and register the listener only 
if transactional support is enabled.
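
A minimal sketch of the proposed guard is shown below. The types are local 
stand-ins so that the example compiles on its own, and the acidEnabled flag is 
hypothetical: the issue does not say which configuration setting HMS would 
consult to decide whether transactional support is on.

```java
import java.util.ArrayList;
import java.util.List;

final class PreListenerSetup {
    /** Stand-in for the HMS pre-event listener type (not the real class). */
    interface PreEventListener {}

    /** Stand-in for TransactionalValidationListener (not the real class). */
    static final class TransactionalValidationListener implements PreEventListener {}

    /**
     * Builds the pre-listener list, adding the transactional validation
     * listener only when ACID support is enabled (hypothetical flag).
     */
    static List<PreEventListener> buildPreListeners(boolean acidEnabled) {
        List<PreEventListener> preListeners = new ArrayList<>();
        if (acidEnabled) {
            // Keep it first in the list, as the unconditional code does.
            preListeners.add(0, new TransactionalValidationListener());
        }
        return preListeners;
    }

    public static void main(String[] args) {
        System.out.println("ACID off: " + buildPreListeners(false).size() + " listener(s)");
        System.out.println("ACID on:  " + buildPreListeners(true).size() + " listener(s)");
    }
}
```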

 





[jira] [Created] (HIVE-20985) If select operator inputs are temporary columns vectorization may reuse some of them as output

2018-11-29 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-20985:
---

 Summary: If select operator inputs are temporary columns 
vectorization may reuse some of them as output
 Key: HIVE-20985
 URL: https://issues.apache.org/jira/browse/HIVE-20985
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich








[jira] [Created] (HIVE-20984) Hive cannot call MapReduce, please tell me where there is a configuration problem

2018-11-29 Thread yuxuqi (JIRA)
yuxuqi created HIVE-20984:
-

 Summary: Hive cannot call MapReduce, please tell me where there is 
a configuration problem
 Key: HIVE-20984
 URL: https://issues.apache.org/jira/browse/HIVE-20984
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 1.1.0
 Environment: CDH 5.14.2 hive 1.1.0
Reporter: yuxuqi
 Attachments: image-2018-11-29-19-58-56-026.png

yarn.acl.enable = true
yarn.admin.acl = *
yarn.resourcemanager.address = master:8032
yarn.resourcemanager.admin.address = master:8033
yarn.resourcemanager.scheduler.address = master:8030
yarn.resourcemanager.resource-tracker.address = master:8031
yarn.resourcemanager.webapp.address = master:8088
yarn.resourcemanager.webapp.https.address = master:8090
yarn.resourcemanager.client.thread-count = 50
yarn.resourcemanager.scheduler.client.thread-count = 50
yarn.resourcemanager.admin.client.thread-count = 1
yarn.scheduler.minimum-allocation-mb = 1024
yarn.scheduler.increment-allocation-mb = 512
yarn.scheduler.maximum-allocation-mb = 31647
yarn.scheduler.minimum-allocation-vcores = 1
yarn.scheduler.increment-allocation-vcores = 1
yarn.scheduler.maximum-allocation-vcores = 48
yarn.resourcemanager.amliveliness-monitor.interval-ms = 1000
yarn.am.liveness-monitor.expiry-interval-ms = 60
yarn.resourcemanager.am.max-attempts = 2
yarn.resourcemanager.container.liveness-monitor.interval-ms = 60
yarn.resourcemanager.nm.liveness-monitor.interval-ms = 1000
yarn.nm.liveness-monitor.expiry-interval-ms = 60
yarn.resourcemanager.resource-tracker.client.thread-count = 50
yarn.application.classpath = $HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.resourcemanager.max-completed-applications = 1
yarn.nodemanager.remote-app-log-dir = /tmp/logs
yarn.nodemanager.remote-app-log-dir-suffix = logs






[jira] [Created] (HIVE-20983) Vectorization: Scale up small hashtables, when collisions are detected

2018-11-29 Thread Gopal V (JIRA)
Gopal V created HIVE-20983:
--

 Summary: Vectorization: Scale up small hashtables, when collisions 
are detected
 Key: HIVE-20983
 URL: https://issues.apache.org/jira/browse/HIVE-20983
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V


Hive's hashtable estimates are getting better with HyperLogLog stats in place, 
but an accurate estimate does not always result in a low number of collisions.

Hashtables that contain a very small number of items tend to lose their O(1) 
lookup performance when there are collisions. Since collisions are easy to 
detect within the fast hashtable implementation, rehashing to a larger size 
will help these small hashtables avoid collisions and go back to O(1) perf.
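
The idea can be illustrated with a toy open-addressing set that doubles its 
capacity whenever an insert finds its home slot occupied, as long as the table 
is still small. This is only a sketch under stated assumptions, not Hive's fast 
hashtable: the class, the hash mixing, and the MAX_SCALE_UP_CAPACITY threshold 
are all hypothetical.

```java
final class CollisionAwareSet {
    // Hypothetical cap: above this capacity, collisions no longer trigger a scale-up.
    static final int MAX_SCALE_UP_CAPACITY = 1 << 12;

    private long[] keys;
    private boolean[] used;
    private int size;

    CollisionAwareSet(int initialCapacity) {
        int cap = 4;
        while (cap < initialCapacity) {
            cap <<= 1; // keep the capacity a power of two so masking works
        }
        keys = new long[cap];
        used = new boolean[cap];
    }

    int size() { return size; }

    int capacity() { return keys.length; }

    private static int homeSlot(long key, int capacity) {
        long h = key * 0x9E3779B97F4A7C15L; // simple multiplicative mixing
        return (int) (h >>> 32) & (capacity - 1);
    }

    boolean contains(long key) {
        int mask = keys.length - 1;
        for (int i = homeSlot(key, keys.length); used[i]; i = (i + 1) & mask) {
            if (keys[i] == key) {
                return true;
            }
        }
        return false;
    }

    void add(long key) {
        if (contains(key)) {
            return;
        }
        if ((size + 1) * 2 > keys.length) {
            resize(keys.length * 2); // ordinary growth: load factor stays <= 0.5
        } else if (used[homeSlot(key, keys.length)] && keys.length < MAX_SCALE_UP_CAPACITY) {
            resize(keys.length * 2); // collision detected on a small table: scale up
        }
        place(keys, used, key);
        size++;
    }

    // Linear probing; always terminates because at least half the slots are free.
    private static void place(long[] k, boolean[] u, long key) {
        int mask = k.length - 1;
        int i = homeSlot(key, k.length);
        while (u[i]) {
            i = (i + 1) & mask;
        }
        u[i] = true;
        k[i] = key;
    }

    private void resize(int newCapacity) {
        long[] newKeys = new long[newCapacity];
        boolean[] newUsed = new boolean[newCapacity];
        for (int i = 0; i < keys.length; i++) {
            if (used[i]) {
                place(newKeys, newUsed, keys[i]);
            }
        }
        keys = newKeys;
        used = newUsed;
    }
}
```

The trade-off is extra memory for small tables in exchange for shorter probe 
chains; the cap keeps larger tables from growing on every collision.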


