Thanks guys, always helpful.

On Mon, Mar 9, 2015 at 7:37 PM, Hitesh Shah <[email protected]> wrote:
> A clarification for (2): you can share an AM across multiple users by
> using a form of proxy users and passing in the required delegation tokens to
> talk to various services such as HDFS. Also, when its doAs mode is set to
> false, HiveServer2 runs all AMs as user hive but can effectively run queries
> for various different users by doing its security check at the “perimeter”.
>
> -- Hitesh
>
> On Mar 9, 2015, at 10:30 AM, Bikas Saha <[email protected]> wrote:
>
> > >>(1)- For every Tez AM it is possible to launch just a single query/DAG
> > >>at a time. So within a given AM several DAGs can be executed only in
> > >>sequential order (a.k.a. a session), not in parallel. To execute DAGs in
> > >>parallel we always need several AMs.
> >
> > Correct. Today a single AM will accept new DAGs when the AM is idle and
> > run them. An AM is idle when no DAG is running.
> >
> > >>(2)- The AM is user-specific, and each user is expected to run queries
> > >>through its own AM (or on multiple AMs if there is a need for parallelism).
> >
> > Correct in a secure cluster. In a non-secure cluster an AM runs as the
> > yarn user, which is common to all AMs. In a secure cluster, any entity that
> > has been given a client token (for that app attempt) by the RM can
> > communicate with the AM. In a non-secure cluster, any entity that has
> > obtained the AM's connection information from the RM can communicate with
> > the AM. The AM has an additional set of ACLs that determine who can
> > submit, view, and modify DAGs.
> >
> > >>(3)- Several users can submit their DAGs as the same user (e.g., through
> > >>HiveServer2), but in this case we will still have several AMs.
> >
> > Correct. However, the number of AMs will be determined by the policy of
> > the mediating server. It may choose to launch a new AM for every new DAG,
> > or queue up and round robin through a limited set of AMs, etc.
> >
> > Bikas
> >
> > From: Fabio C. [mailto:[email protected]]
> > Sent: Monday, March 09, 2015 4:31 AM
> > To: [email protected]; [email protected]
> > Subject: Parallel queries/dags running in same AM?
> >
> > Hi all,
> > I've been using Tez on Hive, and I overheard a conversation that doesn't
> > match my present understanding. Can anyone confirm the following statements?
> > (1)- For every Tez AM it is possible to launch just a single query/DAG
> > at a time. So within a given AM several DAGs can be executed only in
> > sequential order (a.k.a. a session), not in parallel. To execute DAGs in
> > parallel we always need several AMs.
> > (2)- The AM is user-specific, and each user is expected to run queries
> > through its own AM (or on multiple AMs if there is a need for parallelism).
> > (3)- Several users can submit their DAGs as the same user (e.g., through
> > HiveServer2), but in this case we will still have several AMs.
> >
> > Thanks in advance
> >
> > Fabio
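To make Bikas's answers to (1) and (3) concrete, here is a toy Java model. It is not the Tez client API. `SessionAm` and `RoundRobinAmPool` are hypothetical names invented for illustration. The model assumes, per the reply above, that an AM accepts a new DAG only while it is idle. It then sketches one of the mediating-server policies he mentions: round robin through a limited set of AMs, queueing DAGs when every AM is busy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model (hypothetical, not Tez code): one AM session runs at most one DAG at a time. */
class SessionAm {
    private final String name;
    private String runningDag; // null means the AM is idle

    SessionAm(String name) { this.name = name; }

    String name() { return name; }

    boolean isIdle() { return runningDag == null; }

    /** Accepts the DAG only if no DAG is currently running on this AM. */
    boolean trySubmit(String dag) {
        if (!isIdle()) {
            return false;
        }
        runningDag = dag;
        return true;
    }

    /** Marks the running DAG as finished, so the AM is idle again. */
    String finishDag() {
        String done = runningDag;
        runningDag = null;
        return done;
    }
}

/** Hypothetical mediating-server policy: rotate over a fixed pool, queue when all AMs are busy. */
class RoundRobinAmPool {
    private final List<SessionAm> ams = new ArrayList<>();
    private final Queue<String> pending = new ArrayDeque<>();
    private int next = 0; // index of the AM to try first on the next submission

    RoundRobinAmPool(int size) {
        for (int i = 0; i < size; i++) {
            ams.add(new SessionAm("am-" + i));
        }
    }

    /** Returns the AM that accepted the DAG, or null if every AM was busy and the DAG was queued. */
    SessionAm submit(String dag) {
        for (int tried = 0; tried < ams.size(); tried++) {
            SessionAm am = ams.get(next);
            next = (next + 1) % ams.size();
            if (am.trySubmit(dag)) {
                return am;
            }
        }
        pending.add(dag);
        return null;
    }

    /** Finishes the DAG on the given AM and hands that AM the oldest queued DAG, if any. */
    void onDagFinished(SessionAm am) {
        am.finishDag();
        String queued = pending.poll();
        if (queued != null) {
            am.trySubmit(queued);
        }
    }

    int pendingCount() { return pending.size(); }
}
```

With a pool of two AMs, the first two DAGs land on `am-0` and `am-1`. A third DAG is queued and runs on whichever AM finishes first. Parallelism is therefore bounded by the number of AMs, as in point (1). This is only one possible policy. As Bikas notes, a mediating server such as HiveServer2 could instead launch a new AM for every DAG.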
