[jira] [Created] (HIVE-18099) Hive shouldn't pickup mapreduce conf for Tez
Zhiyuan Yang created HIVE-18099:
---
Summary: Hive shouldn't pickup mapreduce conf for Tez
Key: HIVE-18099
URL: https://issues.apache.org/jira/browse/HIVE-18099
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang

Currently, Hive reads some MapReduce configuration properties when running on the Tez engine:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?utf8=%E2%9C%93#L720
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L796
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java?#L860

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
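The issue does not list which settings are read at the linked lines. As an illustration only, assume one of them is container memory: a minimal sketch of resolving it purely from Hive/Tez keys (`hive.tez.container.size`, then `tez.task.resource.memory.mb`) while ignoring `mapreduce.map.memory.mb` entirely. The class name, the plain `Map` standing in for Hadoop's `Configuration`, and the fallback value are assumptions, not Hive's code.

```java
import java.util.Map;

/** Hypothetical helper: resolves Tez task memory without consulting any mapreduce.* key. */
final class TezMemoryResolver {
    static final String HIVE_TEZ_CONTAINER_SIZE = "hive.tez.container.size";
    static final String TEZ_TASK_MEMORY_MB = "tez.task.resource.memory.mb";
    // Illustrative fallback only; not taken from Hive or Tez sources.
    static final int FALLBACK_MB = 1024;

    private TezMemoryResolver() {}

    /** Returns the container size in MB; mapreduce.map.memory.mb is deliberately never read. */
    static int containerMemoryMb(Map<String, String> conf) {
        int hiveSize = parseOrDefault(conf.get(HIVE_TEZ_CONTAINER_SIZE), -1);
        if (hiveSize > 0) {
            return hiveSize; // an explicit Hive-on-Tez setting wins
        }
        // Non-positive or missing Hive value: fall back to the Tez setting, not to MapReduce.
        return parseOrDefault(conf.get(TEZ_TASK_MEMORY_MB), FALLBACK_MB);
    }

    private static int parseOrDefault(String value, int dflt) {
        if (value == null || value.trim().isEmpty()) {
            return dflt;
        }
        return Integer.parseInt(value.trim());
    }
}
```

With only `mapreduce.map.memory.mb=8192` set, this returns the fallback rather than 8192, which is the behavior change the summary asks for.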
[jira] [Created] (HIVE-18064) Hive on Tez parallel order by
Zhiyuan Yang created HIVE-18064:
---
Summary: Hive on Tez parallel order by
Key: HIVE-18064
URL: https://issues.apache.org/jira/browse/HIVE-18064
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang

We've built parallel sorting in TEZ-3837. It samples the output as it is generated and derives a range partitioner for the shuffle edge. Each reducer outputs one sorted span. This is mainly for external consumers, since the output files need to be read in a certain order.
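To illustrate the idea, a minimal sketch of deriving a range partitioner from a sample of sort keys: sort the sample, take `numPartitions - 1` evenly spaced quantiles as split points, and send each key to the partition whose range contains it. Since every key in partition i sorts at or before every key in partition i+1, reading the reducers' sorted outputs in partition order gives a globally sorted result. This is not TEZ-3837's implementation; the class name is hypothetical and keys are simplified to `long`.

```java
import java.util.Arrays;

/** Hypothetical sample-based range partitioner (keys simplified to long). */
final class SampledRangePartitioner {
    private final long[] splitPoints; // sorted, length numPartitions - 1

    SampledRangePartitioner(long[] sample, int numPartitions) {
        if (numPartitions < 1 || sample.length == 0) {
            throw new IllegalArgumentException("need a non-empty sample and at least 1 partition");
        }
        long[] sorted = sample.clone();
        Arrays.sort(sorted);
        splitPoints = new long[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Evenly spaced quantiles of the sample become the range boundaries.
            splitPoints[i - 1] = sorted[(int) ((long) i * sorted.length / numPartitions)];
        }
    }

    /** Returns the number of split points <= key, so partition p holds [split[p-1], split[p]). */
    int getPartition(long key) {
        int lo = 0;
        int hi = splitPoints.length;
        while (lo < hi) { // upper-bound binary search
            int mid = (lo + hi) >>> 1;
            if (splitPoints[mid] <= key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

The partition index is non-decreasing in the key, which is the property that lets each reducer's output be a contiguous sorted span.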
Re: Review Request 62706: HIVE-17473 implement workload management pools
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62706/#review187692
---

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
Lines 250 (patched)
<https://reviews.apache.org/r/62706/#comment264747>

Why add the parent node's parallelism to its children's? Shouldn't the parent's parallelism be the sum of its children's?

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
Lines 275 (patched)
<https://reviews.apache.org/r/62706/#comment264753>

This piece has gotten really complicated. I think there is a good chance it can be made cleaner. Are you going to rewrite it (as you mentioned in the JIRA)?

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
Lines 322 (patched)
<https://reviews.apache.org/r/62706/#comment264739>

Unreachable statement?

ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java
Lines 243 (patched)
<https://reviews.apache.org/r/62706/#comment264751>

Why can a user use a non-leaf queue? The fact that the sum of the sub-queues' resources can be less than the parent queue's resources looks weird. Is this by design?


- Zhiyuan Yang


On Sept. 30, 2017, 12:57 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62706/
> ---
> 
> (Updated Sept. 30, 2017, 12:57 a.m.)
> 
> Review request for hive, Zhiyuan Yang and Prasanth_J.
> 
> Repository: hive-git
> 
> Description
> ---
> see jira
> 
> Diffs
> -
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 4f2997b95b
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/UserPoolMapping.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java 3f621271cc
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java 7adf895077
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 5cb973ca95
> 
> Diff: https://reviews.apache.org/r/62706/diff/1/
> 
> Testing
> ---
> 
> Thanks,
> 
> Sergey Shelukhin
[jira] [Created] (HIVE-17641) Visibility issue of Task.done cause Driver skip stages in parallel execution
Zhiyuan Yang created HIVE-17641:
---
Summary: Visibility issue of Task.done cause Driver skip stages in parallel execution
Key: HIVE-17641
URL: https://issues.apache.org/jira/browse/HIVE-17641
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang

Task.done is not volatile. In parallel execution, the TaskRunner thread sets this value, and the Driver thread reads it when determining whether a child task is runnable:

DriverContext.java
{code}
public static boolean isLaunchable(Task tsk) {
  return !tsk.getQueued() && !tsk.getInitialized() && tsk.isRunnable();
{code}

Task.java
{code}
public boolean isRunnable() {
  boolean isrunnable = true;
  if (parentTasks != null) {
    for (Task parent : parentTasks) {
      if (!parent.done()) {
{code}

These reads and writes happen without any synchronization, so the Driver may see a child as not runnable even after all of its parents have finished. To make it worse, the Driver considers the query successful once there are no running or runnable tasks, so the query may finish without executing some stages.

Driver.java
{code}
while (!destroyed && driverCxt.isRunning()) {
{code}

DriverContext.java
{code}
public synchronized boolean isRunning() {
  return !shutdown && (!running.isEmpty() || !runnable.isEmpty());
{code}
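A minimal sketch of the fix the description points to: declaring the completion flag `volatile`, so that under the Java Memory Model a write by the TaskRunner thread happens-before any later read by the Driver thread. `SketchTask` is a hypothetical, simplified stand-in for Hive's `Task` that keeps only the fields involved; it is not the committed patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Hive's Task, reduced to the state involved in HIVE-17641. */
class SketchTask {
    // volatile: a write by the TaskRunner thread is guaranteed visible to the Driver thread.
    private volatile boolean done;
    // Wired up before any task is launched, so it is not mutated concurrently.
    private final List<SketchTask> parentTasks = new ArrayList<>();

    void addParent(SketchTask parent) {
        parentTasks.add(parent);
    }

    /** Called by the TaskRunner thread when the task finishes. */
    void setDone() {
        done = true;
    }

    boolean done() {
        return done;
    }

    /** Called by the Driver thread: a task is runnable only once every parent is done. */
    boolean isRunnable() {
        for (SketchTask parent : parentTasks) {
            if (!parent.done()) {
                return false;
            }
        }
        return true;
    }
}
```

`volatile` is enough here because the flag has a single writer and only transitions from false to true; an `AtomicBoolean` would work equally well.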
Re: Review Request 62091: HIVE-17386 support LLAP workload management in HS2 (low level only)
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62091/#review186173
---

Ship it!


- Zhiyuan Yang


On Sept. 13, 2017, 1:04 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62091/
> ---
> 
> (Updated Sept. 13, 2017, 1:04 a.m.)
> 
> Review request for hive, Zhiyuan Yang, Gunther Hagleitner, and Siddharth Seth.
> 
> Repository: hive-git
> 
> Description
> ---
> see jira
> 
> Diffs
> -
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 24c5db0e47
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java b3677322ca
>   llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java b6501842e8
>   llap-client/src/java/org/apache/hadoop/hive/registry/impl/TezAmInstance.java a71904cf34
>   llap-client/src/test/org/apache/hadoop/hive/llap/TestAsyncPbRpcProxy.java 1c4f0e7a09
>   llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java 7726794fea
>   llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapPluginProtocolClientImpl.java 19e81e6fa5
>   llap-common/src/java/org/apache/hadoop/hive/llap/impl/ProtobufProxy.java fa99536bea
>   llap-common/src/protobuf/LlapPluginProtocol.proto 39349b119d
>   llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java 26747fc5ca
>   llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java 4d5333f995
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 93a36c612d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/AmPluginNode.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClient.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/QueryAllocationManager.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 6e8122dc85
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 9f721553d6
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 8ecdbbf999
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 170de2143d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java e6e236de6e
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WmTezSession.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 9e2846ca6c
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LlapClusterStateForCompile.java 7a02a563e9
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 4e5d99134b
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestGuaranteedTaskAllocator.java PRE-CREATION
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 5e1e68cfa8
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 9b9eead0af
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java PRE-CREATION
>   service/src/java/org/apache/hive/service/server/HiveServer2.java e5f449122b
> 
> Diff: https://reviews.apache.org/r/62091/diff/4/
> 
> Testing
> ---
> 
> Thanks,
> 
> Sergey Shelukhin
Re: Review Request 62091: HIVE-17386 support LLAP workload management in HS2 (low level only)
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62091/#review185018
---

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2385-2386 (patched)
<https://reviews.apache.org/r/62091/#comment261351>

Should mention that setting this conf enables workload management.

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 101-106 (patched)
<https://reviews.apache.org/r/62091/#comment261379>

Why is this here, given that it's already a daemon thread?

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 147 (patched)
<https://reviews.apache.org/r/62091/#comment261433>

An additional define statement would be better.

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 191 (patched)
<https://reviews.apache.org/r/62091/#comment261361>

How would the AM registry help with AM recovery? If it doesn't, this piece means any update during AM failure and recovery will fail the session, which makes AM recovery pointless.

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java
Lines 201-215 (patched)
<https://reviews.apache.org/r/62091/#comment261362>

You are really determined to knock out that field...

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java
Lines 61 (patched)
<https://reviews.apache.org/r/62091/#comment261251>

git apply complains:
HIVE-17386.02.patch:1162: trailing whitespace.
}
warning: 1 line adds whitespace errors.

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java
Lines 220 (patched)
<https://reviews.apache.org/r/62091/#comment261507>

Wrong log message.

service/src/java/org/apache/hive/service/server/HiveServer2.java
Lines 169 (patched)
<https://reviews.apache.org/r/62091/#comment261514>

Where is the code that actually puts this WM instance to use? In a follow-up JIRA?


- Zhiyuan Yang


On Sept. 5, 2017, 6:52 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62091/
> ---
> 
> (Updated Sept. 5, 2017, 6:52 p.m.)
> 
> Review request for hive, Zhiyuan Yang, Gunther Hagleitner, and Siddharth Seth.
> 
> Repository: hive-git
> 
> Description
> ---
> see jira
> 
> Diffs
> -
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6de07d2e76
>   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java b3677322ca
>   llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java b6501842e8
>   llap-client/src/test/org/apache/hadoop/hive/llap/TestAsyncPbRpcProxy.java 1c4f0e7a09
>   llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java 7726794fea
>   llap-common/src/java/org/apache/hadoop/hive/llap/impl/LlapPluginProtocolClientImpl.java 19e81e6fa5
>   llap-common/src/java/org/apache/hadoop/hive/llap/impl/ProtobufProxy.java fa99536bea
>   llap-common/src/protobuf/LlapPluginProtocol.proto 39349b119d
>   llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java cf8bd469dc
>   llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/endpoint/LlapPluginServerImpl.java f3c0d5213f
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 93a36c612d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/GuaranteedTasksAllocator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClient.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/LlapPluginEndpointClientImpl.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/QueryAllocationManager.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 4f58565a4c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 1f4705c083
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java 005eeedc02
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java fe5c6a1e45
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java f1f10286a3
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WmTezSession.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java
[jira] [Created] (HIVE-17393) AMReporter need hearbeat every external 'AM'
Zhiyuan Yang created HIVE-17393:
---
Summary: AMReporter need hearbeat every external 'AM'
Key: HIVE-17393
URL: https://issues.apache.org/jira/browse/HIVE-17393
Project: Hive
Issue Type: Bug
Components: llap
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang

AMReporter only remembers the first AM that submits the query, and heartbeats only to that AM. With external clients there may be multiple 'AM's, and every one of them needs node heartbeats.
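A minimal sketch of the bookkeeping the issue asks for: instead of remembering only the first AM per query, keep a set of AM addresses per query and heartbeat each distinct AM. `MultiAmRegistry`, the string identifiers, and the `Consumer` standing in for the actual heartbeat RPC are all hypothetical simplifications of LLAP's `AMReporter`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical per-query registry of every AM (including external clients) that submitted work. */
final class MultiAmRegistry {
    private final Map<String, Set<String>> amsByQuery = new ConcurrentHashMap<>();

    /** Records an AM for a query; later submitters are added instead of being ignored. */
    void registerAm(String queryId, String amAddress) {
        amsByQuery.computeIfAbsent(queryId, q -> ConcurrentHashMap.<String>newKeySet()).add(amAddress);
    }

    /** Forgets a query once it completes. */
    void queryComplete(String queryId) {
        amsByQuery.remove(queryId);
    }

    /** Sends one node heartbeat to every distinct AM of any running query; returns how many. */
    int heartbeatAll(Consumer<String> sendHeartbeat) {
        Set<String> targets = new HashSet<>();
        for (Set<String> ams : amsByQuery.values()) {
            targets.addAll(ams);
        }
        targets.forEach(sendHeartbeat);
        return targets.size();
    }

    Set<String> amsFor(String queryId) {
        return Collections.unmodifiableSet(amsByQuery.getOrDefault(queryId, Collections.emptySet()));
    }
}
```

De-duplicating targets means an AM that serves several queries still receives a single heartbeat per round.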
[jira] [Created] (HIVE-17228) Bump tez version to 0.9.0
Zhiyuan Yang created HIVE-17228:
---
Summary: Bump tez version to 0.9.0
Key: HIVE-17228
URL: https://issues.apache.org/jira/browse/HIVE-17228
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang
[jira] [Created] (HIVE-17047) Allow table property to be populated to jobConf to make FixedLengthInputFormat work
Zhiyuan Yang created HIVE-17047:
---
Summary: Allow table property to be populated to jobConf to make FixedLengthInputFormat work
Key: HIVE-17047
URL: https://issues.apache.org/jira/browse/HIVE-17047
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang
Fix For: 1.2.1

To make FixedLengthInputFormat work in Hive, we need a table-specific value for the configuration "fixedlengthinputformat.record.length". Right now the best place for it is a table property. Unfortunately, table properties are not always propagated to the InputFormat configuration, because of this check in HiveInputFormat:
{code}
PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
if ((part != null) && (part.getTableDesc() != null)) {
{code}
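A minimal sketch of the behavior the issue asks for: copy the table's properties into the job configuration before the InputFormat reads it, so `fixedlengthinputformat.record.length` is available even when the `PartitionDesc` lookup above fails. Plain `java.util.Properties` and `Map` stand in for Hive's `TableDesc` and Hadoop's `JobConf`; the helper name and the "never override an explicit job setting" rule are assumptions, not Hive's code.

```java
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper: push table properties into the job configuration used by the InputFormat. */
final class TablePropsToJobConf {
    static final String RECORD_LENGTH = "fixedlengthinputformat.record.length";

    private TablePropsToJobConf() {}

    /** Copies every table property that the job configuration does not already set explicitly. */
    static void copyTableProperties(Properties tableProps, Map<String, String> jobConf) {
        for (String name : tableProps.stringPropertyNames()) {
            jobConf.putIfAbsent(name, tableProps.getProperty(name));
        }
    }

    /** Reads the record length FixedLengthInputFormat needs; fails loudly if it is missing. */
    static int recordLength(Map<String, String> jobConf) {
        String value = jobConf.get(RECORD_LENGTH);
        if (value == null) {
            throw new IllegalStateException(RECORD_LENGTH + " is not set");
        }
        return Integer.parseInt(value.trim());
    }
}
```

Using `putIfAbsent` keeps a session-level override authoritative over the table default; the opposite precedence would be an equally valid design choice.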
[jira] [Created] (HIVE-16710) Make MAX_MS_TYPENAME_LENGTH configurable
Zhiyuan Yang created HIVE-16710:
---
Summary: Make MAX_MS_TYPENAME_LENGTH configurable
Key: HIVE-16710
URL: https://issues.apache.org/jira/browse/HIVE-16710
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang
Fix For: 2.2.0

HIVE-11985 introduced a type name length check in 2.0.0. Before 2.3 (HIVE-12274), users have no way to work around this check if they do have a very long type name. We should make the maximum type name length configurable in versions before 2.3.
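A minimal sketch of a configurable check: the limit is read from configuration, falling back to a built-in default supplied by the caller, instead of the hard-coded `MAX_MS_TYPENAME_LENGTH` constant. The property key and class name are hypothetical (the issue does not name the property), and the constant's actual value is passed in rather than assumed.

```java
import java.util.Map;

/** Hypothetical validator whose limit comes from configuration instead of a hard-coded constant. */
final class TypeNameLengthCheck {
    // Hypothetical key; the issue does not name the property.
    static final String MAX_TYPENAME_LENGTH_KEY = "example.metastore.max.typename.length";

    private TypeNameLengthCheck() {}

    /** Configured limit if present, otherwise the caller's built-in default. */
    static int maxLength(Map<String, String> conf, int builtInDefault) {
        String v = conf.get(MAX_TYPENAME_LENGTH_KEY);
        return v == null ? builtInDefault : Integer.parseInt(v.trim());
    }

    /** Throws if the column type name exceeds the configured limit. */
    static void validate(String typeName, Map<String, String> conf, int builtInDefault) {
        int max = maxLength(conf, builtInDefault);
        if (typeName.length() > max) {
            throw new IllegalArgumentException(
                "Type name is " + typeName.length() + " characters long; the limit is " + max);
        }
    }
}
```

A user with a very long struct type could then raise the limit instead of being blocked, as long as the backing metastore column can hold the longer value.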
[jira] [Created] (HIVE-16690) Configure Tez cartesian product edge based on LLAP cluster size
Zhiyuan Yang created HIVE-16690:
---
Summary: Configure Tez cartesian product edge based on LLAP cluster size
Key: HIVE-16690
URL: https://issues.apache.org/jira/browse/HIVE-16690
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang

In HIVE-14731 we use the default value for the target parallelism of the fair cartesian product edge. Ideally this should be set according to the cluster size. With LLAP it's easy to get the cluster size, i.e., the number of executors.
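As a rough illustration, assume the target parallelism is set to the number of LLAP executors and, for simplicity, every source is weighted equally: each source then gets the largest partition count k with k^n at most the target, so the product of partition counts (the number of destination tasks) never exceeds the executor count. The real Tez fair cartesian product logic is more involved; this helper and its names are hypothetical.

```java
/** Hypothetical helper: derive an equal per-source partition count from a target parallelism. */
final class CartesianParallelism {
    private CartesianParallelism() {}

    /** Largest k >= 1 with k^numSources <= targetParallelism. */
    static int partitionsPerSource(int targetParallelism, int numSources) {
        if (targetParallelism < 1 || numSources < 1) {
            throw new IllegalArgumentException("target and source count must be positive");
        }
        int k = 1;
        while (pow(k + 1, numSources) <= targetParallelism) {
            k++;
        }
        return k;
    }

    /** Integer power that saturates once the result exceeds any int target. */
    private static long pow(long base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
            if (result > Integer.MAX_VALUE) {
                return Long.MAX_VALUE;
            }
        }
        return result;
    }
}
```

For example, with 100 executors and two sources each source gets 10 partitions, so the edge spawns 10 × 10 = 100 tasks; with 99 executors it drops to 9 × 9 = 81.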
[jira] [Created] (HIVE-16596) CrossProductCheck failed to detect cross product between two unions
Zhiyuan Yang created HIVE-16596:
---
Summary: CrossProductCheck failed to detect cross product between two unions
Key: HIVE-16596
URL: https://issues.apache.org/jira/browse/HIVE-16596
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang

To reproduce:
{code}
create table f (a int, b string);
set hive.auto.convert.join=false;
explain select * from (select * from f union all select * from f) a join (select * from f union all select * from f) b;
{code}
No cross product warning is given.
[jira] [Created] (HIVE-16082) Allow user to change number of listener thread in LlapTaskCommunicator
Zhiyuan Yang created HIVE-16082:
---
Summary: Allow user to change number of listener thread in LlapTaskCommunicator
Key: HIVE-16082
URL: https://issues.apache.org/jira/browse/HIVE-16082
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang

Currently LlapTaskCommunicator always has the same number of RPC listener threads as TezTaskCommunicatorImpl. There are scenarios where we want them to differ: for example, in LLAP-only mode we want fewer TezTaskCommunicatorImpl listener threads to reduce off-heap memory usage.
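A minimal sketch of the requested knob: a dedicated LLAP setting that, when unset, falls back to the shared Tez value (today's behavior). `tez.am.task.listener.thread-count` is assumed here to be the shared Tez setting; the LLAP key and the class name are hypothetical, not existing Hive names.

```java
import java.util.Map;

/** Hypothetical resolution of listener thread counts for the two task communicators. */
final class ListenerThreadConfig {
    // Assumed to be the shared Tez task-listener setting.
    static final String TEZ_KEY = "tez.am.task.listener.thread-count";
    // Hypothetical LLAP-specific override; not an existing Hive property.
    static final String LLAP_KEY = "example.llap.task.communicator.listener.thread-count";

    private ListenerThreadConfig() {}

    static int tezListenerThreads(Map<String, String> conf, int dflt) {
        return positiveInt(conf.get(TEZ_KEY), dflt);
    }

    /** LLAP uses its own value when set, otherwise the shared Tez value. */
    static int llapListenerThreads(Map<String, String> conf, int dflt) {
        return positiveInt(conf.get(LLAP_KEY), tezListenerThreads(conf, dflt));
    }

    private static int positiveInt(String value, int fallback) {
        if (value == null) {
            return fallback;
        }
        int parsed = Integer.parseInt(value.trim());
        if (parsed < 1) {
            throw new IllegalArgumentException("thread count must be positive: " + parsed);
        }
        return parsed;
    }
}
```

In the LLAP-only scenario from the issue, one could then lower the Tez value to save off-heap memory while keeping the LLAP communicator at its current size.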
[jira] [Created] (HIVE-14951) ArrayIndexOutOfBoundsException in GroupByOperator
Zhiyuan Yang created HIVE-14951:
---
Summary: ArrayIndexOutOfBoundsException in GroupByOperator
Key: HIVE-14951
URL: https://issues.apache.org/jira/browse/HIVE-14951
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang

Query:
select * from (select distinct a from f16) as f16, (select distinct a from f1) as fprime where f16.a = fprime.a;

Table:
create table f1 (a int, b string);
create table f16 (a int, b string);

Config:
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.join=false;
[jira] [Created] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
Zhiyuan Yang created HIVE-14731:
---
Summary: Use Tez cartesian product edge in Hive (unpartitioned case only)
Key: HIVE-14731
URL: https://issues.apache.org/jira/browse/HIVE-14731
Project: Hive
Issue Type: Bug
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang

Now that the cartesian product edge is available in Tez (see TEZ-3230), let's integrate it into Hive on Tez. This allows cross product queries to use more than one reducer.
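To make the benefit concrete: in the unpartitioned case each source's output can be split into chunks, and each destination task joins one chunk from each source, so two sources with m and n chunks give m × n reducers instead of one. The sketch shows that index arithmetic in row-major order; it is a hypothetical illustration of the idea, not the TEZ-3230 API.

```java
/** Hypothetical index arithmetic for a two-source cartesian product. */
final class CartesianTaskIndex {
    private CartesianTaskIndex() {}

    /** Number of destination tasks: one per (left chunk, right chunk) combination. */
    static int numTasks(int leftChunks, int rightChunks) {
        return Math.multiplyExact(leftChunks, rightChunks);
    }

    /** Maps a destination task index to the (left chunk, right chunk) pair it joins. */
    static int[] chunksForTask(int taskIndex, int leftChunks, int rightChunks) {
        if (taskIndex < 0 || taskIndex >= numTasks(leftChunks, rightChunks)) {
            throw new IndexOutOfBoundsException("task " + taskIndex);
        }
        return new int[] {taskIndex / rightChunks, taskIndex % rightChunks};
    }

    /** Inverse mapping: which destination task receives a given pair of chunks. */
    static int taskFor(int leftChunk, int rightChunk, int rightChunks) {
        return leftChunk * rightChunks + rightChunk;
    }
}
```

Because every chunk pair maps to exactly one task, the union of all reducers' outputs covers every row pair of the cross product exactly once.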