[
https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759388#comment-13759388
]
john lilley commented on YARN-1151:
-----------------------------------
Discussion thread from user@ list:
Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do let the
thread know the ID as well, in spirit of http://xkcd.com/979/) :)
On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <[email protected]> wrote:
> Harsh,
>
> Thanks as usual for your sage advice. I was hoping to avoid actually
> installing anything on individual Hadoop nodes and finessing the service by
> spawning it from a task using LocalResources, but this is probably fraught
> with trouble.
> FWIW, I would vote to be able to load YARN services from HDFS. What is the
> appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:[email protected]]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <[email protected]>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification. I would find it very convenient in this case
>> to have my custom jars available in HDFS, but I can see the added complexity
>> needed for YARN to cache those to local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent
>> service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time,
> and/or implementing a way to spawn/end a dedicated service temporarily? I'd
> rather try to implement such a thing than have my containers implement more
> logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <[email protected]>
> wrote:
>> Harsh,
>>
>> Thanks for the clarification. I would find it very convenient in this case
>> to have my custom jars available in HDFS, but I can see the added complexity
>> needed for YARN to cache those to local disk.
>>
>> What about having the tasks themselves start the per-node service as a child
>> process? I've been told that the NM kills the process group, but won't
>> setpgrp() circumvent that?
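
The process-group question above can be made concrete with a small sketch. It assumes a POSIX system and uses Python's standard `subprocess` module; the helper name `spawn_in_new_session` is hypothetical. It only shows that a child can be moved out of its parent's process group. It says nothing about whether the NodeManager would still find and kill such a process, which this thread does not establish.

```python
import os
import subprocess
import sys


def spawn_in_new_session(argv):
    """Start argv as a child that leads its own session and process group.

    POSIX only. start_new_session=True makes the child call setsid()
    before exec, so its process group ID becomes its own PID.
    """
    return subprocess.Popen(argv, start_new_session=True)


if __name__ == "__main__":
    # A short-lived placeholder for the "persistent service" child.
    child = spawn_in_new_session(
        [sys.executable, "-c", "import time; time.sleep(5)"]
    )
    try:
        print("parent process group:", os.getpgrp())
        print("child process group: ", os.getpgid(child.pid))
    finally:
        child.terminate()
        child.wait()
```

A signal sent to the parent's process group would not reach this child, but any cleanup that tracks processes by other means (for example by session or by resource-control groups) could still do so.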
>>
>> Even given that, would the child process of one task have proper environment
>> and permission to act on behalf of other tasks? Consider a scenario
>> analogous to the MR shuffle, where the persistent service serves up mapper
>> output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service"
>> child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the
>> service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data,
>> and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files
>> is controlled to extend beyond the mapper-like task lifetime but still be
>> cleaned up on AM exit, and how the reducer-like tasks are informed of which
>> nodes have data.
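
The bookkeeping in steps 3 and 5, plus the cleanup question raised in the last paragraph, can be sketched as an in-memory registry. This is only an illustration under assumptions: the class `NodeOutputRegistry`, its method names, and the example IDs and paths are all hypothetical, and no networking or YARN API is involved.

```python
from collections import defaultdict


class NodeOutputRegistry:
    """In-memory stand-in for the per-node "persistent service" (hypothetical)."""

    def __init__(self):
        # app_id -> task_id -> list of output file paths on this node
        self._outputs = defaultdict(dict)

    def register(self, app_id, task_id, paths):
        """Step 3: a mapper-like task reports the files it wrote."""
        self._outputs[app_id].setdefault(task_id, []).extend(paths)

    def lookup(self, app_id):
        """Step 5: a reducer-like task asks which files exist for its app."""
        tasks = self._outputs.get(app_id, {})
        return {task_id: list(files) for task_id, files in tasks.items()}

    def release(self, app_id):
        """On AM exit: forget the app and return the files to delete."""
        tasks = self._outputs.pop(app_id, {})
        return [path for files in tasks.values() for path in files]


if __name__ == "__main__":
    registry = NodeOutputRegistry()
    registry.register("app_1", "map_0", ["/tmp/app_1/map_0.out"])
    registry.register("app_1", "map_1", ["/tmp/app_1/map_1.out"])
    print(registry.lookup("app_1"))
    print(registry.release("app_1"))
```

Keying everything by application ID is one way to let the files outlive the mapper-like task while still giving the service a single point at which to clean up when the AM exits.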
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:[email protected]]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <[email protected]>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as
>> /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also
>> configuring the classes under the aux-services list. You need to take care
>> of deploying the jar versions into /opt/john-jars/ across the cluster,
>> though.
>>
>> I think it may be a neat idea to have jars be placed on HDFS or any other
>> DFS, and the yarn-site.xml indicating the location plus class to load.
>> Similar to HBase co-processors. But I'll defer to Vinod on whether this
>> would be a good thing to do.
>>
>> (I know the very next thing people will ask for with such an ability
>> is hot code upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <[email protected]>
>> wrote:
>>> Are there recommended conventions for adding additional code to a
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:[email protected]]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>>
>>> To: [email protected]
>>> Subject: Re: yarn-site.xml and aux-services
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time - before NM is started.
>>> +Vinod
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>>> <[email protected]>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for
>>> an auxiliary service? Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is
>>> just sitting there in the right place, but if one wanted to make a
>>> whole new aux-service that belonged with an AM, how would one do it?
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:[email protected]]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: [email protected]
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks. Is this documented anywhere other than the code? I
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:[email protected]]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <[email protected]>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is
>>> the ID the applications may look up in their container responses map
>>> we discussed over another thread (around shuffle handler).
>>>
>>> <property>
>>> <name>yarn.nodemanager.aux-services</name>
>>> <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>> <name>yarn.nodemanager.aux-services.foo.class</name>
>>> <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>> <name>yarn.nodemanager.aux-services</name>
>>> <value>foo,bar</value>
>>> </property>
>>> <property>
>>> <name>yarn.nodemanager.aux-services.foo.class</name>
>>> <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>> <name>yarn.nodemanager.aux-services.bar.class</name>
>>> <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
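
The ID-plus-sub-config scheme described above can be illustrated with a short sketch that resolves each service ID to its class name. It is not the NodeManager's own code: the function names are hypothetical, and the sample wraps the properties in the `<configuration>` root element that Hadoop configuration files use.

```python
import xml.etree.ElementTree as ET

AUX_SERVICES_KEY = "yarn.nodemanager.aux-services"

# The two-service example from the message, wrapped in a root element.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo,bar</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.bar.class</name>
    <value>com.mypack.MyAuxServiceClassForBar</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Return {name: value} for every <property> element in the config."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resolve_aux_services(props):
    """Map each comma-separated service ID to its configured class name."""
    raw_ids = props.get(AUX_SERVICES_KEY, "")
    resolved = {}
    for service_id in (s.strip() for s in raw_ids.split(",")):
        if not service_id:
            continue
        class_key = f"{AUX_SERVICES_KEY}.{service_id}.class"
        if class_key not in props:
            raise KeyError(f"no class configured for aux-service {service_id!r}")
        resolved[service_id] = props[class_key]
    return resolved


if __name__ == "__main__":
    print(resolve_aux_services(load_properties(SAMPLE)))
```

The sketch raises an error when an ID is listed without a matching `.class` property, which is the kind of mismatch worth checking before restarting the NodeManager.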
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley
>>> <[email protected]>
>>> wrote:
>>>> Good, I was hoping that would be the case. But what are the
>>>> mechanics of it? Do I just add another entry? And what exactly is
>>>> "mapreduce.shuffle"?
>>>> A scoped class name? Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>> <name>yarn.nodemanager.aux-services</name>
>>>> <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>> <name>yarn.nodemanager.aux-services</name>
>>>> <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with
>>>> NodeManager via RPC? Is there an interface to implement? How are
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:[email protected]]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <[email protected]>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use
>>>> your own AuxService. It needs to be on the NodeManager CLASSPATH to
>>>> run (and NM has to be restarted to apply).
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley
>>>> <[email protected]>
>>>> wrote:
>>>>> I notice this in yarn-site.xml:
>>>>>
>>>>> <property>
>>>>> <name>yarn.nodemanager.aux-services</name>
>>>>> <value>mapreduce.shuffle</value>
>>>>> <description>shuffle service that needs to be set for Map
>>>>> Reduce to run </description>
>>>>> </property>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>> Can I tell yarn to run *my* per-node service?
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>> John Lilley
> Ability to configure auxiliary services from HDFS-based JAR files
> -----------------------------------------------------------------
>
> Key: YARN-1151
> URL: https://issues.apache.org/jira/browse/YARN-1151
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.1.0-beta
> Reporter: john lilley
> Priority: Minor
> Labels: auxiliary-service, yarn
>
> I would like to install an auxiliary service in Hadoop YARN without actually
> installing files/services on every node in the system. Discussions on the
> user@ list indicate that this is not easily done. The reason we want an
> auxiliary service is that our application has some persistent-data components
> that are not appropriate for HDFS. In fact, they are somewhat analogous to
> the mapper output of MapReduce's shuffle, which is what led me to
> auxiliary-services in the first place. It would be much easier if we could
> just place our service's JARs in HDFS.