On Fri, Mar 17, 2017 at 4:15 PM, Subru Krishnan <[email protected]> wrote:
> Thanks Arun for the heads-up.
>
> Hi Sergiy,
>
> We do run a UAM pool under one process (AMRMProxyService in NM), as
> that's the mechanism we use to span a single job across multiple
> clusters that are under federation. This is achieved by using the doAs
> method in UserGroupInformation, exactly as Jason pointed out.
>
> The e2e *prototype* code (and docs/slides) is available in the
> Federation umbrella jira:
> https://issues.apache.org/jira/browse/YARN-2915
>
> I have created a utility class that's used throughout YARN Federation to
> create RMProxies per UGI - FederationProxyProviderUtil
> <https://github.com/apache/hadoop/blob/YARN-2915/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/failover/FederationProxyProviderUtil.java>
> (as part of YARN-3673 <https://issues.apache.org/jira/browse/YARN-3673>),
> which should provide a good starting point for you.
>
> You should also keep an eye on the UAM pool JIRA which Botong is working
> on right now:
> https://issues.apache.org/jira/browse/YARN-5531

Hi YARN devs,

*Huge* thanks for your help! If I understand you correctly, that means I
do not need any changes to the YARN client API to run multiple AMs in one
process - excellent news! I will study the federation code and try that
technique in REEF. I'll let you know how it goes.

Again, thanks a lot Subru, Arun, and Jason -- you guys are awesome :)

Cheers,
Sergiy.

> On Thu, Mar 16, 2017 at 2:49 PM, Arun Suresh <[email protected]> wrote:
>
> > Hey Sergiy,
> >
> > I think YARN Federation uses a similar approach, IIUC, where an AM for
> > an app running on one cluster acts as an unmanaged AM on another
> > cluster. I believe they use a separate UGI for each sub-cluster and
> > wrap it in a doAs before the actual allocate call.
> >
> > Subru might be able to give more details.
> >
> > Cheers
> > -Arun
> >
> > On Thu, Mar 16, 2017 at 2:34 PM, Jason Lowe <[email protected]> wrote:
> >
> >> The doAs method in UserGroupInformation is what you want when dealing
> >> with multiple UGIs. It determines what UGI instance the code within
> >> the doAs scope gets when that code tries to look up the current user.
> >>
> >> Each AM is designed to run in a separate JVM, so each has some
> >> main()-like entry point that does everything to set up the AM.
> >> Theoretically, all you need to do is create two separate UGIs, then
> >> use each instance to perform a doAs wrapping the invocation of the
> >> corresponding AM's entry point. After that, everything that AM does
> >> will get the UGI of the doAs invocation as the current user. Since the
> >> AMs are running in separate doAs instances, they will get separate
> >> UGIs for the current user and thus separate credentials.
> >>
> >> Jason
> >>
> >> On Thursday, March 16, 2017 4:03 PM, Sergiy Matusevych <
> >> [email protected]> wrote:
> >>
> >> Hi Jason,
> >>
> >> Thanks a lot for your help again! Having two separate
> >> UserGroupInformation instances is exactly what I had in mind. What I
> >> do not understand, though, is how to make sure that our second call to
> >> .registerApplicationMaster() will pick the right UserGroupInformation
> >> object. I would love to find a way that does not involve any changes
> >> to the YARN client, but if we have to patch it, of course, I agree
> >> that we need to have a generic yet minimally invasive solution.
> >>
> >> Thank you!
> >> Sergiy.
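For the registration question above, a minimal sketch of what this could
look like with a dedicated UGI for the unmanaged AM - assuming the AMRM
token has already been fetched (e.g. via YarnClient#getAMRMToken), and
with the user name, host, port, and tracking URL below as placeholders:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.security.AMRMTokenIdentifier;

public final class UnmanagedAmRegistration {

  // Register the second (unmanaged) AM under its own UGI so the managed
  // AM's credentials are never presented to the RM.
  static void registerUnderSeparateUgi(Token<AMRMTokenIdentifier> amRMToken)
      throws Exception {
    final Configuration conf = new YarnConfiguration();

    // A fresh UGI that carries ONLY the unmanaged AM's AMRM token.
    final UserGroupInformation reefUgi =
        UserGroupInformation.createRemoteUser("reef-unmanaged-am");
    reefUgi.addToken(amRMToken);

    reefUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      // Created and started inside doAs, the client resolves the
      // "current user" to reefUgi and registers with its token only.
      AMRMClient<AMRMClient.ContainerRequest> amRMClient =
          AMRMClient.createAMRMClient();
      amRMClient.init(conf);
      amRMClient.start();
      amRMClient.registerApplicationMaster("localhost", -1, "");
      return null;
    });
  }
}

The point of the sketch is that the credentials presented to the RM come
from the UGI active inside the doAs scope, not from the process-global
current user, which is what keeps the two AMs' tokens apart.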
> >> On Thu, Mar 16, 2017 at 8:03 AM, Jason Lowe <[email protected]> wrote:
> >>
> >> > I believe a cleaner way to solve this problem is to create two,
> >> > _separate_ UserGroupInformation objects and wrap each AM instance in
> >> > a UGI doAs so they aren't trying to share the same credentials. This
> >> > is one example of a token bleeding over and causing problems. I
> >> > suspect trying to fix these one-by-one as they pop up is going to be
> >> > frustrating compared to just ensuring the credentials remain
> >> > separate as if they really were running in separate JVMs.
> >> >
> >> > Adding Daryn who knows a lot more about the UGI stuff so he can
> >> > correct any misunderstandings on my part.
> >> >
> >> > Jason
> >> >
> >> > On Wednesday, March 15, 2017 1:11 AM, Sergiy Matusevych <
> >> > [email protected]> wrote:
> >> >
> >> > Hi YARN developers,
> >> >
> >> > I have an interesting problem that I think is related to the YARN
> >> > Java client. I am trying to launch *two* application masters in one
> >> > container. To be more specific, I am starting a Spark job on YARN,
> >> > and launching an Apache REEF Unmanaged AM from the Spark Driver.
> >> >
> >> > Technically, the YARN Resource Manager should not care which process
> >> > each AM runs in. However, there is a problem with the YARN Java
> >> > client implementation: there is a global UserGroupInformation object
> >> > that holds the user credentials of the current RM session. This data
> >> > structure is shared by all AMs, and when the REEF application tries
> >> > to register the second (unmanaged) AM, the client library presents
> >> > all credentials to the YARN RM, including the security token of the
> >> > first (managed) AM. YARN rejects such a registration request,
> >> > throwing InvalidApplicationMasterRequestException: "Application
> >> > Master is already registered".
> >> >
> >> > I feel like this issue can be resolved by a relatively small update
> >> > to the YARN Java client - e.g. by introducing a new variant of
> >> > AMRMClientAsync.registerApplicationMaster() that would take the
> >> > required security token (instead of getting it implicitly from
> >> > UserGroupInformation.getCurrentUser().getCredentials(), etc.), or by
> >> > having some sort of RM session class that would wrap all the data
> >> > that is currently global. I need to think about an elegant API for
> >> > it.
> >> >
> >> > What do you guys think? I would love to work on this problem and
> >> > send you a pull request for the upcoming 2.9 release.
> >> >
> >> > Cheers,
> >> > Sergiy.
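To make Jason's two-UGI suggestion concrete, here is a minimal sketch of
wrapping each AM's entry point in its own doAs. The user names and the
runSparkDriver()/runReefUnmanagedAm() entry points are placeholders; in
the actual setup the second doAs would be invoked from inside the Spark
driver when it launches the REEF unmanaged AM:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

public final class TwoAmsOneJvm {

  public static void main(String[] args) throws Exception {
    // One UGI per AM, so each AM sees its own "current user" and its own
    // credentials - as if the two AMs were running in separate JVMs.
    final UserGroupInformation sparkUgi =
        UserGroupInformation.createRemoteUser("spark-am");
    final UserGroupInformation reefUgi =
        UserGroupInformation.createRemoteUser("reef-am");

    sparkUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      // Anything invoked from here resolves
      // UserGroupInformation.getCurrentUser() to sparkUgi, so tokens
      // added under sparkUgi never leak into the other AM's registration.
      runSparkDriver();
      return null;
    });

    reefUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      runReefUnmanagedAm();
      return null;
    });
  }

  // Placeholder entry points standing in for the real AM main() logic.
  private static void runSparkDriver() { }
  private static void runReefUnmanagedAm() { }
}

Since each entry point runs under its own UGI, any tokens the YARN client
picks up through the current user stay scoped to that AM, which is what
avoids the InvalidApplicationMasterRequestException described above.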
