[jira] [Created] (HIVE-18099) Hive shouldn't pickup mapreduce conf for Tez

2017-11-17 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-18099:
---

 Summary: Hive shouldn't pickup mapreduce conf for Tez
 Key: HIVE-18099
 URL: https://issues.apache.org/jira/browse/HIVE-18099
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang


Right now Hive reads some MapReduce configuration when running on the Tez engine:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L720
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L796
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L860



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18064) Hive on Tez parallel order by

2017-11-14 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-18064:
---

 Summary: Hive on Tez parallel order by
 Key: HIVE-18064
 URL: https://issues.apache.org/jira/browse/HIVE-18064
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang


We've built parallel sorting in TEZ-3837. It samples the output as it is
generated and works out a range partitioner for the shuffle edge. Each reducer
outputs a sorted span. This is mainly for external consumption, since the output
files need to be read in a certain order.
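
As a rough illustration of the idea (not the TEZ-3837 implementation), a range
partitioner can be derived from a sample of keys: sort the sample, pick evenly
spaced split points, and route each key to the range it falls in. The class and
method names below are hypothetical, and keys are plain ints for brevity.

```java
import java.util.Arrays;

/** Hypothetical sketch: builds range split points from a key sample. */
class RangePartitionerSketch {

    /** Picks numPartitions - 1 evenly spaced split points from a non-empty sample. */
    static int[] splitPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Index of the i-th quantile boundary in the sorted sample.
            int idx = (int) ((long) i * sorted.length / numPartitions);
            splits[i - 1] = sorted[idx];
        }
        return splits;
    }

    /** Maps a key to a partition so that smaller keys never land in a later partition. */
    static int partition(int key, int[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        return pos >= 0 ? pos : -(pos + 1);
    }
}
```

Because partition i only receives keys no larger than those in partition i + 1,
reading the reducers' sorted outputs in partition order yields a globally
sorted result.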





Re: Review Request 62706: HIVE-17473 implement workload management pools

2017-10-11 Thread Zhiyuan Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62706/#review187692
---




ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
Lines 250 (patched)
<https://reviews.apache.org/r/62706/#comment264747>

Why add the parent node's parallelism to that of its children? Shouldn't the
parent's parallelism be the sum of its children's?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
Lines 275 (patched)
<https://reviews.apache.org/r/62706/#comment264753>

This piece is getting really complicated. I think there is a good chance
it can be made prettier. Are you going to rewrite this (as you mentioned in the jira)?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
Lines 322 (patched)
<https://reviews.apache.org/r/62706/#comment264739>

unreachable statement?



ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java
Lines 243 (patched)
<https://reviews.apache.org/r/62706/#comment264751>

Why can a user use a non-leaf queue? The fact that the sum of the sub-queues can
be less than the parent queue's resources looks weird. Is this by design?


- Zhiyuan Yang


On Sept. 30, 2017, 12:57 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62706/
> ---
> 
> (Updated Sept. 30, 2017, 12:57 a.m.)
> 
> 
> Review request for hive, Zhiyuan Yang and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 
> 4f2997b95b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/UserPoolMapping.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 
> 3f621271cc 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java 
> 7adf895077 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 5cb973ca95 
> 
> 
> Diff: https://reviews.apache.org/r/62706/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Created] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution

2017-09-28 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-17641:
---

 Summary: Visibility issue of Task.done cause Driver skip stages in 
parallel execution
 Key: HIVE-17641
 URL: https://issues.apache.org/jira/browse/HIVE-17641
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang


Task.done is not volatile. In parallel execution, the TaskRunner thread sets
this value, and the Driver thread reads it when it determines whether a
child task is runnable:

DriverContext.java
{code}
public static boolean isLaunchable(Task tsk) {
  return !tsk.getQueued() && !tsk.getInitialized() && tsk.isRunnable();
{code}
Task.java
{code}
public boolean isRunnable() {
  boolean isrunnable = true;
  if (parentTasks != null) {
    for (Task parent : parentTasks) {
      if (!parent.done()) {
{code}

This happens without any synchronization, so the Driver thread may see a child
as not runnable even after all of its parents have finished.

To make it worse, the Driver considers the query successful when there is no
running or runnable task, so a query may finish without executing some stages.
Driver.java
{code}
while (!destroyed && driverCxt.isRunning()) {
{code}
DriverContext.java
{code}
public synchronized boolean isRunning() {
  return !shutdown && (!running.isEmpty() || !runnable.isEmpty());
{code}
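
One possible direction, sketched here under assumptions rather than taken from
an actual patch, is to make the flag volatile so that a write by the runner
thread is guaranteed to be visible to later reads by the driver thread.
SketchTask is a hypothetical stand-in for Task that models only the done flag
and the parent check.

```java
import java.util.List;

/** Hypothetical stand-in for Task; only the visibility-relevant parts are modelled. */
class SketchTask {
    // volatile: a write from one thread happens-before every later read
    // of the same field from another thread, so the reader cannot see a stale value.
    private volatile boolean done = false;
    private final List<SketchTask> parentTasks;

    SketchTask(List<SketchTask> parentTasks) {
        this.parentTasks = parentTasks;
    }

    /** Called by the thread that ran the task. */
    void setDone() {
        done = true;
    }

    boolean done() {
        return done;
    }

    /** Called by the scheduling thread: runnable only when every parent is done. */
    boolean isRunnable() {
        for (SketchTask parent : parentTasks) {
            if (!parent.done()) {
                return false;
            }
        }
        return true;
    }
}
```

An AtomicBoolean or synchronized accessors would give the same visibility
guarantee; volatile is simply the smallest change for a single boolean flag.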





Re: Review Request 62091: HIVE-17386 support LLAP workload management in HS2 (low level only)

2017-09-25 Thread Zhiyuan Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62091/#review186173
---


Ship it!




Ship It!

- Zhiyuan Yang


On Sept. 13, 2017, 1:04 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62091/
> ---
> 
> (Updated Sept. 13, 2017, 1:04 a.m.)
> 
> 
> Review request for hive, Zhiyuan Yang, Gunther Hagleitner, and Siddharth Seth.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 24c5db0e47 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> b3677322ca 
>   
> llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java
>  b6501842e8 
>   
> llap-client/src/java/org/apache/hadoop/hive/registry/impl/TezAmInstance.java 
> a71904cf34 
>   llap-client/src/test/org/apache/hadoop/hive/llap/TestAsyncPbRpcProxy.java 
> 1c4f0e7a09 
>   llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java 
> 7726794fea 
>   
> llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapPluginProtocolClientImpl.java
>  19e81e6fa5 
>   llap-common/src/java/org/apache/hadoop/hive/llap/impl/ProtobufProxy.java 
> fa99536bea 
>   llap-common/src/protobuf/LlapPluginProtocol.proto 39349b119d 
>   
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
>  26747fc5ca 
>   
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java
>  4d5333f995 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 93a36c612d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/AmPluginNode.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClient.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/QueryAllocationManager.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 
> 6e8122dc85 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
> 9f721553d6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 
> 8ecdbbf999 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 
> 170de2143d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java e6e236de6e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WmTezSession.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 
> 9e2846ca6c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LlapClusterStateForCompile.java
>  7a02a563e9 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 
> 4e5d99134b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestGuaranteedTaskAllocator.java
>  PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 
> 5e1e68cfa8 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 9b9eead0af 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java 
> PRE-CREATION 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java e5f449122b 
> 
> 
> Diff: https://reviews.apache.org/r/62091/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Re: Review Request 62091: HIVE-17386 support LLAP workload management in HS2 (low level only)

2017-09-12 Thread Zhiyuan Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62091/#review185018
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2385-2386 (patched)
<https://reviews.apache.org/r/62091/#comment261351>

The description should mention that setting this conf enables workload management.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 101-106 (patched)
<https://reviews.apache.org/r/62091/#comment261379>

Why is this here, given that it's already a daemon thread?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 147 (patched)
<https://reviews.apache.org/r/62091/#comment261433>

Additional define statement will be better.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 191 (patched)
<https://reviews.apache.org/r/62091/#comment261361>

How would the AM registry help with AM recovery? If it doesn't, this
piece means any update during AM failure & recovery will fail the session,
which makes AM recovery pointless.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 201-215 (patched)
<https://reviews.apache.org/r/62091/#comment261362>

You are really determined to knock out that field...



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java
Lines 61 (patched)
<https://reviews.apache.org/r/62091/#comment261251>

git apply complains

HIVE-17386.02.patch:1162: trailing whitespace.
  }
warning: 1 line adds whitespace errors.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
Lines 220 (patched)
<https://reviews.apache.org/r/62091/#comment261507>

Wrong log message



service/src/java/org/apache/hive/service/server/HiveServer2.java
Lines 169 (patched)
<https://reviews.apache.org/r/62091/#comment261514>

Where is the code that actually puts this WM instance to use? In an additional jira?


- Zhiyuan Yang


On Sept. 5, 2017, 6:52 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62091/
> ---
> 
> (Updated Sept. 5, 2017, 6:52 p.m.)
> 
> 
> Review request for hive, Zhiyuan Yang, Gunther Hagleitner, and Siddharth Seth.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6de07d2e76 
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
> b3677322ca 
>   
> llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java
>  b6501842e8 
>   llap-client/src/test/org/apache/hadoop/hive/llap/TestAsyncPbRpcProxy.java 
> 1c4f0e7a09 
>   llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java 
> 7726794fea 
>   
> llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapPluginProtocolClientImpl.java
>  19e81e6fa5 
>   llap-common/src/java/org/apache/hadoop/hive/llap/impl/ProtobufProxy.java 
> fa99536bea 
>   llap-common/src/protobuf/LlapPluginProtocol.proto 39349b119d 
>   
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java
>  cf8bd469dc 
>   
> llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java
>  f3c0d5213f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 93a36c612d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClient.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/QueryAllocationManager.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 
> 4f58565a4c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
> 1f4705c083 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 
> 005eeedc02 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 
> fe5c6a1e45 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java f1f10286a3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WmTezSession.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 

[jira] [Created] (HIVE-17393) AMReporter need hearbeat every external 'AM'

2017-08-25 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-17393:
---

 Summary: AMReporter need hearbeat every external 'AM'
 Key: HIVE-17393
 URL: https://issues.apache.org/jira/browse/HIVE-17393
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang


AMReporter only remembers the first AM that submits the query and heartbeats to
it. With external clients, there might be multiple 'AM's, and every one of them
needs the node heartbeat.
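
A minimal sketch of the intended behavior, assuming each AM can be identified
by an address string: track every known AM in a set and send the heartbeat to
all of them. MultiAmHeartbeatSketch and its methods are hypothetical and do not
mirror the real AMReporter API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical sketch: heartbeats to every registered AM, not just the first one. */
class MultiAmHeartbeatSketch {
    // Assumption: one "host:port" string per AM; a set makes re-registration a no-op.
    private final Set<String> knownAms = ConcurrentHashMap.newKeySet();

    void registerAm(String amAddress) {
        knownAms.add(amAddress);
    }

    void unregisterAm(String amAddress) {
        knownAms.remove(amAddress);
    }

    /** Invokes sendHeartbeat once per registered AM and returns how many were sent. */
    int heartbeatAll(Consumer<String> sendHeartbeat) {
        int sent = 0;
        for (String am : knownAms) {
            sendHeartbeat.accept(am);
            sent++;
        }
        return sent;
    }
}
```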





[jira] [Created] (HIVE-17228) Bump tez version to 0.9.0

2017-08-02 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-17228:
---

 Summary: Bump tez version to 0.9.0
 Key: HIVE-17228
 URL: https://issues.apache.org/jira/browse/HIVE-17228
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang








[jira] [Created] (HIVE-17047) Allow table property to be populated to jobConf to make FixedLengthInputFormat work

2017-07-05 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-17047:
---

 Summary: Allow table property to be populated to jobConf to make 
FixedLengthInputFormat work
 Key: HIVE-17047
 URL: https://issues.apache.org/jira/browse/HIVE-17047
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang
 Fix For: 1.2.1


To make FixedLengthInputFormat work in Hive, we need a table-specific value for
the configuration "fixedlengthinputformat.record.length". Right now the best
place for it would be a table property. Unfortunately, table properties are not
always populated into InputFormat configurations because of this in HiveInputFormat:
{code}
PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
if ((part != null) && (part.getTableDesc() != null)) {
{code}





[jira] [Created] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable

2017-05-18 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-16710:
---

 Summary: Make MAX_MS_TYPENAME_LENGTH configurable
 Key: HIVE-16710
 URL: https://issues.apache.org/jira/browse/HIVE-16710
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang
 Fix For: 2.2.0


HIVE-11985 introduced a type name length check in 2.0.0. Before 2.3 (HIVE-12274),
users have no way to work around this check if they do hit a very long type name.
We should make the max type name length configurable before 2.3.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16690) Configure Tez cartesian product edge based on LLAP cluster size

2017-05-16 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-16690:
---

 Summary: Configure Tez cartesian product edge based on LLAP 
cluster size
 Key: HIVE-16690
 URL: https://issues.apache.org/jira/browse/HIVE-16690
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang


In HIVE-14731 we use the default value for the target parallelism of the fair
cartesian product edge. Ideally this should be set according to the cluster size.
With LLAP it's pretty easy to get the cluster size, i.e., the number of executors.





[jira] [Created] (HIVE-16596) CrossProductCheck failed to detect cross product between two unions

2017-05-05 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-16596:
---

 Summary: CrossProductCheck failed to detect cross product between 
two unions
 Key: HIVE-16596
 URL: https://issues.apache.org/jira/browse/HIVE-16596
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang


To reproduce:
{code}
create table f (a int, b string);
set hive.auto.convert.join=false;
explain select * from (select * from f union all select * from f) a join 
(select * from f union all select * from f) b;
{code}

No cross product warning is given.





[jira] [Created] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator

2017-03-01 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-16082:
---

 Summary: Allow user to change number of listener thread in 
LlapTaskCommunicator
 Key: HIVE-16082
 URL: https://issues.apache.org/jira/browse/HIVE-16082
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang


Currently LlapTaskCommunicator always has the same number of RPC listener threads
as TezTaskCommunicatorImpl. There are scenarios where we want them to differ: for
example, in LLAP-only mode, we want fewer listener threads for
TezTaskCommunicatorImpl to reduce off-heap memory usage.





[jira] [Created] (HIVE-14951) ArrayIndexOutOfBoundsException in GroupByOperator

2016-10-13 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-14951:
---

 Summary: ArrayIndexOutOfBoundsException in GroupByOperator
 Key: HIVE-14951
 URL: https://issues.apache.org/jira/browse/HIVE-14951
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang


Query:
select * from (select distinct a from f16) as f16, (select distinct a from f1) 
as fprime where f16.a = fprime.a;

Table: 
create table f1 (a int, b string);
create table f16 (a int, b string);

Config:
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.join=false;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)

2016-09-09 Thread Zhiyuan Yang (JIRA)
Zhiyuan Yang created HIVE-14731:
---

 Summary: Use Tez cartesian product edge in Hive (unpartitioned 
case only)
 Key: HIVE-14731
 URL: https://issues.apache.org/jira/browse/HIVE-14731
 Project: Hive
  Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang


Given that the cartesian product edge is now available in Tez (see TEZ-3230),
let's integrate it into Hive on Tez. This allows us to have more than one
reducer in cross product queries.


