[ https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300289#comment-15300289 ]

Sangjin Lee commented on YARN-1151:
-----------------------------------

I don't think a separate configuration is necessary to handle non-local paths
here. It may help clarify the intent (that the user of that aux service wants
to use non-local paths), but that is probably all. I think it would be far
simpler to allow the current classpath config to contain non-local paths and
handle them in {{AuxServices}}. I don't think handling a mix of local and
non-local paths should add much complexity.
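To illustrate how little extra logic a mixed classpath would need, here is a
minimal sketch that splits entries into local and non-local ones by URI scheme.
It assumes (hypothetically) that entries are comma-separated and are valid URIs,
and treats an entry with no scheme or a {{file}} scheme as local. The class and
method names are made up for illustration and are not part of {{AuxServices}}.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of AuxServices: splits a classpath setting into
// local entries (key true) and non-local entries (key false).
class ClasspathEntries {
    /** Assumes each entry is a valid URI; no scheme or "file" counts as local. */
    static boolean isLocal(String entry) {
        String scheme = URI.create(entry).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    /** Assumes (hypothetically) that entries are comma-separated; order is kept. */
    static Map<Boolean, List<String>> partition(String classpath) {
        return Arrays.stream(classpath.split(","))
                .map(String::trim)
                .filter(entry -> !entry.isEmpty())
                .collect(Collectors.partitioningBy(ClasspathEntries::isLocal));
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> parts =
                partition("/opt/aux/a.jar, hdfs://nn:8020/aux/b.jar, file:///opt/aux/c.jar");
        System.out.println("local:     " + parts.get(true));
        System.out.println("non-local: " + parts.get(false));
    }
}
```

Local entries could keep their current handling; only the non-local list would
need any extra work.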

Having said that, I am still wary of using
{{URL.setURLStreamHandlerFactory()}}. This method can be invoked at most once
for a given *JVM*; any further call throws a {{java.lang.Error}}.
See [javadoc|http://docs.oracle.com/javase/8/docs/api/java/net/URL.html]. I do
see that the patch recognizes this and tries to initialize only once.
But that is safe only if this code is the only place in the NM that
calls this method. The moment we add another call to this method anywhere
within the NM code in the future, it will blow up. If we want to create a
classloader that handles non-local paths, it might be better to implement that
feature directly in the classloader implementation itself.
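A minimal sketch (plain JDK, not NM code) contrasting the two options: the
JVM-wide factory refuses a second install, while a handler passed to the
three-argument {{URL}} constructor applies only to that one URL and touches no
global state. That per-URL binding is one way a custom classloader could resolve
non-local paths without the factory. The in-memory {{mem:}} protocol and all
class/method names are hypothetical stand-ins for an HDFS-backed handler.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class HandlerScopeDemo {
    /** Tries a JVM-wide install; returns false if a factory was already set. */
    static boolean tryInstallGlobal(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error e) {
            // The JDK signals a repeated call with java.lang.Error, not an Exception.
            return false;
        }
    }

    /** Hypothetical handler serving bytes from a map keyed by URL path. */
    static URLStreamHandler inMemoryHandler(Map<String, byte[]> files) {
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) {
                return new URLConnection(u) {
                    @Override
                    public void connect() {
                        connected = true;
                    }

                    @Override
                    public InputStream getInputStream() throws IOException {
                        byte[] data = files.get(getURL().getPath());
                        if (data == null) {
                            throw new IOException("not found: " + getURL());
                        }
                        return new ByteArrayInputStream(data);
                    }
                };
            }
        };
    }

    /** Binds the handler to this single URL; the global factory is not involved. */
    static URL scopedUrl(String spec, URLStreamHandler handler) {
        try {
            // This constructor is deprecated in recent JDKs but still available.
            return new URL((URL) null, spec, handler);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String readAll(URL url) {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Declines every protocol, so the JDK's built-in handlers still apply.
        URLStreamHandlerFactory noop = protocol -> null;
        System.out.println("first global install:  " + tryInstallGlobal(noop));
        System.out.println("second global install: " + tryInstallGlobal(noop));

        Map<String, byte[]> files =
                Map.of("/greeting.txt", "hello".getBytes(StandardCharsets.UTF_8));
        URL url = scopedUrl("mem:/greeting.txt", inMemoryHandler(files));
        System.out.println(readAll(url));
    }
}
```

Run as-is, the second global install reports {{false}}, while the scoped URL
still reads its content because its handler never went through the factory.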

Then again, I'd like us to go back to the problem statement. Is it truly a good
idea to have a JVM process/service that is backed by an HDFS jar? I can see
many reasons why its performance could be terrible. Note that the JVM opens a
file the moment the file is needed for a class search, and keeps it open.
Whenever it needs to load a class, it scans the classpath entries in order,
reading the files in the classpath until it finds the class. It seems to me
that such a process would perform much worse than one with the files available
locally. It might be a better idea to localize the jars from HDFS and start the
process on top of the localized files.
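The localize-first alternative can be sketched as follows: copy the jars to a
local directory once, then build the service classloader over the local
{{file:}} URLs only. This is a minimal sketch under stated assumptions: the
source jars are modelled as plain {{java.nio.file.Path}}s, whereas in the NM the
copy would presumably go through Hadoop's {{FileSystem}} API (for example
{{copyToLocalFile}}), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: "remote" jars are plain Paths standing in for HDFS files.
class JarLocalizer {
    /** Copies each jar into localDir (assumes distinct file names) and returns file: URLs. */
    static List<URL> localize(List<Path> remoteJars, Path localDir) {
        try {
            Files.createDirectories(localDir);
            List<URL> urls = new ArrayList<>();
            for (Path jar : remoteJars) {
                // Resolve by name string, since the source may live on another FileSystem provider.
                Path target = localDir.resolve(jar.getFileName().toString());
                Files.copy(jar, target, StandardCopyOption.REPLACE_EXISTING);
                urls.add(target.toUri().toURL());
            }
            return urls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the service classloader on the local copies only. */
    static URLClassLoader loaderFor(List<URL> localJars, ClassLoader parent) {
        return new URLClassLoader(localJars.toArray(new URL[0]), parent);
    }
}
```

Class lookups then read only local files, and the copy cost is paid once at
startup rather than on every class search.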

For this reason, and for others mentioned in earlier comments, it seems to me
that a container-based approach would be better. It would offer better
isolation, lifecycle management, logging, localization, and so on. We could
extend the current NM auxiliary service to start it using HDFS jars, but I
don't think that would be the best solution for the problem. My 2 cents.

> Ability to configure auxiliary services from HDFS-based JAR files
> -----------------------------------------------------------------
>
>                 Key: YARN-1151
>                 URL: https://issues.apache.org/jira/browse/YARN-1151
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.1.0-beta, 2.9.0
>            Reporter: john lilley
>            Assignee: Xuan Gong
>              Labels: auxiliary-service, yarn
>         Attachments: YARN-1151.1.patch
>
>
> I would like to install an auxiliary service in Hadoop YARN without actually 
> installing files/services on every node in the system.  Discussions on the 
> user@ list indicate that this is not easily done.  The reason we want an 
> auxiliary service is that our application has some persistent-data components 
> that are not appropriate for HDFS.  In fact, they are somewhat analogous to 
> the mapper output of MapReduce's shuffle, which is what led me to 
> auxiliary-services in the first place.  It would be much easier if we could 
> just place our service's JARs in HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
