Hitesh,

Yes, that is it exactly.  We want to implement distributed algorithms where 
some data should persist in a scope beyond "task".  We could write it to HDFS, 
but that is high overhead for non-persistent data; at least that is what I am 
told on this forum.

Is it possible/desirable to get the "mapreduce shuffle service" to serve up 
data files for me, or is it bound too tightly to MR?

john

-----Original Message-----
From: Hitesh Shah [mailto:[email protected]] 
Sent: Friday, May 24, 2013 3:35 PM
To: [email protected]
Subject: Re: Custom ApplicationMaster development

Hi John,

Yes - you probably could. 

I don't know of anyone who has written any other auxiliary service to date, 
so if you come across anything lacking in the handling/support of aux services, 
please do file feature-request/bug jiras.
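
As a rough sketch of the lifecycle such a service would have to handle, here 
is a plain-Java model with no Hadoop dependencies. The class and method names 
(NodeFileCache, register, lookup) are hypothetical; they only mirror the idea 
of per-application start/stop notifications and are not the NodeManager's 
actual aux-service interface, whose signatures should be checked in the 
source tree.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-node service: tracks files per application and
// deletes them when the application is reported as finished.
class NodeFileCache {
    private final Map<String, Set<Path>> filesByApp = new ConcurrentHashMap<>();

    // Called when an application first becomes known on this node.
    void initializeApplication(String appId) {
        filesByApp.putIfAbsent(appId, ConcurrentHashMap.newKeySet());
    }

    // A task registers a local file it wants served to later tasks.
    void register(String appId, Path file) {
        Set<Path> files = filesByApp.get(appId);
        if (files == null) {
            throw new IllegalStateException("unknown application: " + appId);
        }
        files.add(file);
    }

    // Look up a registered file by name; empty if the app or file is unknown.
    Optional<Path> lookup(String appId, String fileName) {
        Set<Path> files = filesByApp.get(appId);
        if (files == null) {
            return Optional.empty();
        }
        return files.stream()
                .filter(p -> p.getFileName().toString().equals(fileName))
                .findFirst();
    }

    // Called when the application completes: forget it and delete its files.
    // Returns the number of files actually removed from disk.
    int stopApplication(String appId) throws IOException {
        Set<Path> files = filesByApp.remove(appId);
        if (files == null) {
            return 0;
        }
        int deleted = 0;
        for (Path f : files) {
            if (Files.deleteIfExists(f)) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

The point of the sketch is only that the data's lifetime is tied to the 
application rather than to any single task; how the files are actually served 
(HTTP or otherwise) is left out.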

For the application that you mentioned, I am assuming you are looking to build 
some form of a data 'caching' service that can store a job's output to be used 
by subsequent jobs? 

-- Hitesh

On May 24, 2013, at 1:33 PM, John Lilley wrote:

> Hitesh,
> 
> Regarding your comments:
>  - the files are served by an auxiliary service ( mapreduce shuffle service ) 
> running within the NodeManager. 
>  - The NM needs to be configured to tell it which aux services to start up.
> 
> Does this mean that I could in theory write an auxiliary service, perhaps 
> modeled after the mapreduce shuffle service, to handle such node-level tasks 
> as serving up files?  What I am trying to understand is whether my 
> application can perform similar actions to MapReduce.  I am not trying to 
> replace MapReduce, however the ability to perform equivalent operations would 
> be very useful to our application.  For example, there are transitive closure 
> algorithms that can be written by iterative MapReduce jobs, but which can 
> potentially be much more efficient if they are able to avoid landing 
> intermediate results on HDFS.
> 
> Thanks
> John
> 
> 
> -----Original Message-----
> From: Hitesh Shah [mailto:[email protected]]
> Sent: Thursday, May 23, 2013 5:10 PM
> To: [email protected]
> Subject: Re: Custom ApplicationMaster development
> 
> Hello John
> 
> To add to Chris' email:
> 
> Do take a look at 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
>   - this is probably a bit out of date. 
>   - the actual source code of distributed-shell in the source tree would be 
> the best guideline to follow after taking a brief look at the link above.
> 
> Compatibility
>  - 0.23 and 2.0 are similar to a large extent but there are differences - not 
> sure if it is possible to code for compatibility.
>  - To get apis into a relatively stable state, a lot of changes have 
> gone in since 2.0.4 was released
> 
> Task output files
>  - the files are served by an auxiliary service ( mapreduce shuffle service ) 
> running within the NodeManager. 
>  - The NM needs to be configured to tell it which aux services to start up.
>  - The protocols support some level of information passing via the service 
> data constructs. 
>  - the service is notified when an application completes so that it can 
> delete that application's data if needed
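> 
> As a rough illustration of the NM configuration point above, a yarn-site.xml 
> fragment along these lines registers the shuffle service. The service name 
> shown is the one used in the 2.0.x line and has differed in other releases, 
> so treat the exact names as assumptions to verify against your version:
> 
> ```xml
> <!-- Comma-separated list of aux services the NodeManager should start -->
> <property>
>   <name>yarn.nodemanager.aux-services</name>
>   <value>mapreduce.shuffle</value>
> </property>
> <!-- Implementation class for the service named above -->
> <property>
>   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
>   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
> </property>
> ```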
> 
> -- Hitesh
> 
> 
> On May 23, 2013, at 3:45 PM, John Lilley wrote:
> 
>> I am getting started with development of a custom ApplicationMaster and I 
>> didn't think that the user@ list was quite the right place for it.  
>> Apologies if this list isn't the right place either.  Some of my questions 
>> are really newbie, like:
>> 
>> * Is there an FAQ for non-MR YARN development?
>> 
>> * Is there an FAQ for configuring/building/running Hadoop from source, 
>> preferably in Eclipse?
>> 
>> * What is the recommended configuration/environment for development of a 
>> YARN app?  I would like to use Eclipse under Windows if that even makes 
>> any sense.
>> 
>> * Would you start with a Hadoop release or build from version control?
>> 
>> * Is it possible to code for compatibility between 2.0 and 0.23?
>> 
>> * Is there an ApplicationMaster example that can be used as a starting 
>> point?
>> 
>> I also have some more in-depth questions:
>> 
>> * When a MapReduce task creates its output files and makes them available 
>> over HTTP, is it the NodeManager that serves them up?  If my YARN task 
>> wants to do something similar, how does it tell the NodeManager?  How are 
>> the files removed later?
>> 
>> * Is it possible to install objects or services that run as peers of the 
>> NodeManager as opposed to tasks?  Are there any recommended per-node 
>> patterns as opposed to per-task patterns?
>> 
>> Thanks
>> John
>> 
> 
