RE: Parallel queries to HS2/Tez
Hi Siddharth,

Thank you for the reply. Please note that I am running apache-hive-1.2.1; I assume LLAP is available from 2.x onwards? The settings certainly work for me, as I have a setup that reproduces the reported error quite easily.

> In terms of Tez local mode - there's a jira open to support concurrent DAGs.

I guess the one I reported, TEZ-3420, should be marked as a duplicate. I am happy to test when a patch is available.

Regards,
Uday

From: Siddharth Seth [mailto:ss...@apache.org]
Sent: 07 October 2016 17:44
To: user@tez.apache.org
Subject: Re: Parallel queries to HS2/Tez

Uday,

Are you running this with LLAP? Without LLAP, as far as I know, Hive will launch additional sessions even with the settings that you mentioned. If this is working for you - great. You may want to send a note to the Hive community, or open a JIRA requesting control over concurrent queries via a simpler option (instead of having to configure 3 parameters, and potentially 2 more for thread pool sizes).

In terms of Tez local mode - there's a jira open to support concurrent DAGs.
Re: Parallel queries to HS2/Tez
Uday,

Are you running this with LLAP? Without LLAP, as far as I know, Hive will launch additional sessions even with the settings that you mentioned. If this is working for you - great. You may want to send a note to the Hive community, or open a JIRA requesting control over concurrent queries via a simpler option (instead of having to configure 3 parameters, and potentially 2 more for thread pool sizes).

In terms of Tez local mode - there's a jira open to support concurrent DAGs.

On Thu, Oct 6, 2016 at 3:36 AM, Chitragar, Uday (KMLWG) <uday.chitra...@kantarmedia.com> wrote:

> Just FYI:
>
> As a workaround, the following settings throttle the requests so that they run sequentially.
>
> hive.server2.tez.initialize.default.sessions=true
> hive.server2.tez.default.queues=default
> hive.server2.tez.sessions.per.default.queue=1
>
> Regards,
> Uday
RE: Parallel queries to HS2/Tez
Just FYI:

As a workaround, the following settings throttle the requests so that they run sequentially.

hive.server2.tez.initialize.default.sessions=true
hive.server2.tez.default.queues=default
hive.server2.tez.sessions.per.default.queue=1

Regards,
Uday

-----Original Message-----
From: Hitesh Shah [mailto:hit...@apache.org]
Sent: 29 August 2016 21:42
To: user@tez.apache.org
Subject: Re: Parallel queries to HS2/Tez

I think there are some thread pool related settings in HiveServer2 which could be used to throttle the number of concurrent queries down to 1. One quick search led me to https://issues.apache.org/jira/browse/HIVE-5229, but you may wish to ask the same question on the Hive mailing lists for a definitive answer.

thanks
- Hitesh
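For readers who want to try the three settings listed in the message above, here is a minimal sketch that renders them as a hive-site.xml fragment. It assumes the settings are applied through the hive-site.xml that HiveServer2 reads at startup; the thread does not say where they were actually set, and the helper name `to_hive_site_xml` is purely illustrative. Note that elsewhere in this thread it is suggested that, without LLAP, Hive may still launch additional sessions, so treat this as a workaround to verify in your own environment.

```python
from xml.sax.saxutils import escape

# The three properties quoted in the message above.
SETTINGS = {
    "hive.server2.tez.initialize.default.sessions": "true",
    "hive.server2.tez.default.queues": "default",
    "hive.server2.tez.sessions.per.default.queue": "1",
}


def to_hive_site_xml(settings):
    """Render a dict of properties in the Hadoop-style <configuration> layout.

    Hypothetical helper, not part of Hive or Tez.
    """
    lines = ["<configuration>"]
    for name, value in settings.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)
```

Calling `print(to_hive_site_xml(SETTINGS))` produces a fragment whose `<property>` elements can be merged into an existing hive-site.xml.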
Re: Parallel queries to HS2/Tez
I think there are some thread pool related settings in HiveServer2 which could be used to throttle the number of concurrent queries down to 1. One quick search led me to https://issues.apache.org/jira/browse/HIVE-5229, but you may wish to ask the same question on the Hive mailing lists for a definitive answer.

thanks
- Hitesh

> On Aug 27, 2016, at 1:02 AM, Chitragar, Uday (KMLWG) wrote:
>
> Hi Hitesh,
>
> Thank you for the advice. While I get dev help on TEZ-3420, are there any recommendations for configuring Hive/HS2 to run the DAGs sequentially? Interestingly, this is not a problem with an HDP deployment, which obviously has a 'fuller' setup. Local mode really helps with testing.
>
> Thank you,
> Uday
Re: Parallel queries to HS2/Tez
Hi Hitesh,

Thank you for the advice. While I get dev help on TEZ-3420 <https://issues.apache.org/jira/browse/TEZ-3420>, are there any recommendations for configuring Hive/HS2 to run the DAGs sequentially? Interestingly, this is not a problem with an HDP deployment, which obviously has a 'fuller' setup. Local mode really helps with testing.

Thank you,
Uday

From: Hitesh Shah
Sent: 25 August 2016 20:06:30
To: user@tez.apache.org
Subject: Re: Parallel queries to HS2/Tez

Hello Uday,

I don't believe anyone has tried running 2 DAGs in parallel in local mode within the same TezClient (and definitely not for HiveServer2). If this is with 2 instances of TezClient, this could likely be a bug either in how Hive is setting up the TezClient for local mode with the same directories, or somewhere in Tez where clashing directories for intermediate data might be causing an issue. FWIW, the Tez AM does not support running 2 DAGs in parallel, and quite a bit of this code path is used in local mode.

It would be great if you could file a JIRA for this with more detailed logs and then work with the dev community to come up with a patch that addresses the issue in your environment.

thanks
- Hitesh
Re: Parallel queries to HS2/Tez
Hello Uday,

I don't believe anyone has tried running 2 DAGs in parallel in local mode within the same TezClient (and definitely not for HiveServer2). If this is with 2 instances of TezClient, this could likely be a bug either in how Hive is setting up the TezClient for local mode with the same directories, or somewhere in Tez where clashing directories for intermediate data might be causing an issue. FWIW, the Tez AM does not support running 2 DAGs in parallel, and quite a bit of this code path is used in local mode.

It would be great if you could file a JIRA for this with more detailed logs and then work with the dev community to come up with a patch that addresses the issue in your environment.

thanks
- Hitesh

> On Aug 25, 2016, at 8:34 AM, Chitragar, Uday (KMLWG) wrote:
>
> Hello,
>
> When running parallel queries (simultaneous connections by two beeline clients to HS2), I get the following exception (full debug attached). Interestingly, running the queries one after the other completes without any problem.
>
> The setup is Hive (1.2.1) and Tez (0.8.4) running in local mode.
> Apologies in advance if this forum is not the right place for this question. Thank you.
>
> 2016-08-25 15:45:41,333 DEBUG [TezTaskEventRouter{attempt_1472136335089_0001_1_01_00_0}]: impl.ShuffleInputEventHandlerImpl (ShuffleInputEventHandlerImpl.java:processDataMovementEvent(127)) - DME srcIdx: 0, targetIndex: 9, attemptNum: 0, payload: [hasEmptyPartitions: true, host: , port: 0, pathComponent: , runDuration: 0]
> 2016-08-25 15:45:41,557 ERROR [TezChild]: tez.MapRecordSource (MapRecordSource.java:processRow(90)) - java.lang.IllegalStateException: Invalid input path file:/acorn/QC/OraExtract/20160131/Devices/Devices_extract_20160229T080613_3
>     at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:415)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:457)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1069)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:501)
>
> 2016-08-25 15:45:41,817 INFO [TezChild]: io.HiveContextAwareRecordReader (HiveContextAwareRecordReader.java:doNext(326)) - Cannot get partition description from file:/acorn/QC/reportlib/VM_ValEdit.24656because cannot find dir = file:/acorn/QC/reportlib/VM_ValEdit.24656 in pathToPartitionInfo: [file:/acorn/QC/OraExtract/20160131/Devices]
>
> Regards,
> Uday
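The reproduction described in the original report (two beeline clients connected to HS2 at the same time) could be scripted roughly as follows. This is only a sketch under stated assumptions: the JDBC URL and the queries are placeholders, not values from the thread, and the helper names are hypothetical. It relies only on beeline's `-u` (connection URL) and `-e` (query string) options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder values, not taken from the thread: adjust for your environment.
JDBC_URL = "jdbc:hive2://localhost:10000"
QUERIES = [
    "SELECT COUNT(*) FROM table_a",  # hypothetical table
    "SELECT COUNT(*) FROM table_b",  # hypothetical table
]


def beeline_command(jdbc_url, query):
    """Build the argument list for one beeline invocation."""
    return ["beeline", "-u", jdbc_url, "-e", query]


def run_parallel(commands, runner=subprocess.run):
    """Submit every command at once, one worker thread per command.

    Results are returned in the same order as the commands. The runner is
    injectable so the helper can be exercised without a real HS2 instance.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = [pool.submit(runner, cmd) for cmd in commands]
        return [future.result() for future in futures]
```

With a HiveServer2 instance running, `run_parallel([beeline_command(JDBC_URL, q) for q in QUERIES])` starts both clients together; each returned `CompletedProcess` carries the exit code in `returncode`. Running the same commands one after the other gives the sequential baseline that the report says completes without problems.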