[ 
https://issues.apache.org/jira/browse/S4-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484379#comment-13484379
 ] 

Matthieu Morel commented on S4-25:
----------------------------------

I uploaded a patch to branch S4-25 (see 
https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=shortlog;h=refs/heads/S4-25)
and added some documentation here: 
https://cwiki.apache.org/confluence/display/S4/Deploying+S4+applications+with+YARN

The approach is to preserve the S4 deployment model (coordination through 
ZooKeeper, application loading logic in the S4 nodes) and to map it onto 
YARN in order to start the S4 nodes.

The patch depends on hadoop-2.0.2-alpha, the latest release.

The patch adds a new subproject, s4-yarn, and provides the s4-yarn command to 
deploy S4 applications. You can combine S4 parameters with YARN-specific 
parameters (num_containers, queue, user, etc.).

I also added a regression test that uses MiniYARNCluster and MiniDFSCluster.

Pending issues:
* It's not clear to me how to stop an application. The 
{{YarnClientImpl#killApplication}} method seems to kill the application master 
but not the processes launched by this application master.
* I could not figure out how to add the YARN test dependencies. That may be a 
Gradle issue, or a consequence of the way the hadoop-2.0.2-alpha packages are 
distributed on Maven; I am not sure. In the meantime, I added them to a local 
lib/ directory of the S4 distribution.
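
One possible workaround for the first pending issue, as a minimal and untested
sketch: the application master keeps track of the containers it launched and
stops them explicitly before it exits. This is not what the patch does. The
names ContainerStopper and LaunchedContainerTracker are hypothetical, and
ContainerStopper only stands in for whatever YARN call actually stops a
container. The sketch also assumes the application master gets a chance to run
its cleanup, which does not hold if its JVM is killed forcibly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for whatever YARN call actually stops a container. */
interface ContainerStopper {
    void stop(String containerId) throws Exception;
}

/** Hypothetical helper: tracks containers started by the application master. */
class LaunchedContainerTracker {
    private final Set<String> running = ConcurrentHashMap.newKeySet();
    private final ContainerStopper stopper;

    LaunchedContainerTracker(ContainerStopper stopper) {
        this.stopper = stopper;
    }

    /** Record a container right after it has been launched. */
    void launched(String containerId) {
        running.add(containerId);
    }

    /** Record that a container has exited on its own. */
    void completed(String containerId) {
        running.remove(containerId);
    }

    /** Try to stop every tracked container; returns the ids that could not be stopped. */
    List<String> stopAll() {
        List<String> failed = new ArrayList<>();
        for (String id : running) {
            try {
                stopper.stop(id);
                running.remove(id);
            } catch (Exception e) {
                failed.add(id); // stays tracked so that a later retry is possible
            }
        }
        return failed;
    }

    /** Number of containers still believed to be running. */
    int runningCount() {
        return running.size();
    }

    /** Best effort only: the hook does not run if the JVM is killed with SIGKILL. */
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::stopAll));
    }
}
```

The application master would call launched() after each successful container
start and completed() when a container reports its exit, so that stopAll() only
touches the S4 nodes that are still alive.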

Arun: because we used a released version of YARN, we used the raw API, not 
YARN-103.
                
> Write S4 Application Master to deploy S4 in Yarn
> ------------------------------------------------
>
>                 Key: S4-25
>                 URL: https://issues.apache.org/jira/browse/S4-25
>             Project: Apache S4
>          Issue Type: New Feature
>            Reporter: J Mohamed Zahoor
>             Fix For: 0.6
>
>         Attachments: S4-ApplicationMaster.diff, S4-Client.diff, 
> S4-Constants.diff, S4-YARN-1.patch
>
>
> Along the lines of s4PigWrapper, write an S4 application master to host s4 piper 
> inside Hadoop YARN. This could be useful not only for reading data stored in 
> Hadoop (to build or train a model); we could also use the resource 
> manager to deploy S4 instances on remote machines and monitor them. In short, 
> we could make use of most of the resource management, scheduling and other 
> good stuff in YARN.
> - YARN is useful to deploy and launch S4 instances.
> - It still requires deploying node managers on each box, which means it will
> be useful if one is running more than one S4 process on a node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
