[
https://issues.apache.org/jira/browse/S4-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484379#comment-13484379
]
Matthieu Morel commented on S4-25:
----------------------------------
I uploaded a patch to branch S4-25 (here:
https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git;a=shortlog;h=refs/heads/S4-25),
and added some documentation here:
https://cwiki.apache.org/confluence/display/S4/Deploying+S4+applications+with+YARN
The approach is to preserve the S4 deployment model (coordination through
ZooKeeper, application loading logic in the S4 nodes) and to project it onto
YARN, which is used to start the S4 nodes.
The patch depends on hadoop-2.0.2-alpha, the latest release.
The patch adds a new subproject, s4-yarn, and provides the s4-yarn command to
deploy S4 applications. You can combine S4 parameters with YARN-specific
parameters (num_containers, queue, user, etc.).
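As a rough illustration of combining the two kinds of parameters, the sketch
below splits one mixed set of settings into a YARN part and an S4 part. It is
not taken from the patch: the class DeployParameters, the split method, and the
"cluster" key are hypothetical, and the real s4-yarn option names may differ
from the three YARN names quoted above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the S4-25 patch.
class DeployParameters {
    // Names taken from the comment above; the actual option names may differ.
    static final Set<String> YARN_KEYS =
            new HashSet<>(Arrays.asList("num_containers", "queue", "user"));

    final Map<String, String> yarn = new LinkedHashMap<>();
    final Map<String, String> s4 = new LinkedHashMap<>();

    /** Route each setting to the YARN map or the S4 map, keeping input order. */
    static DeployParameters split(Map<String, String> all) {
        DeployParameters p = new DeployParameters();
        for (Map.Entry<String, String> e : all.entrySet()) {
            Map<String, String> target = YARN_KEYS.contains(e.getKey()) ? p.yarn : p.s4;
            target.put(e.getKey(), e.getValue());
        }
        return p;
    }
}
```

Anything not recognized as YARN-specific is treated as an S4 parameter here,
which is only one possible convention.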
I also added a regression test that uses MiniYARNCluster and MiniDFSCluster.
Pending issues:
* It's not clear to me how to stop an application. The
{{YarnClientImpl#killApplication}} method seems to kill the application master,
but not the processes launched by this application master.
* I could not figure out how to add the YARN test dependencies. That may be a
Gradle issue, or it may come from the way hadoop-2.0.2-alpha packages are
distributed on Maven; I am not sure. In the meantime, I added them to a local
lib/ directory of the S4 distribution.
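On the first pending issue, one possible direction is for the application
master to keep track of the containers it launched and to stop them itself when
it is terminated. The sketch below is only an illustration of that idea under
stated assumptions: ContainerTracker and ContainerStopper are hypothetical
names, they are not part of the patch or of the YARN API, and the actual call
that stops a container is left behind the ContainerStopper interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these types come from the S4 patch or from YARN.
class ContainerTracker {

    /** Hypothetical wrapper around whatever call actually stops a container. */
    interface ContainerStopper {
        void stop(String containerId) throws Exception;
    }

    private final Map<String, ContainerStopper> running = new ConcurrentHashMap<>();

    /** Record a container right after the application master launches it. */
    void register(String containerId, ContainerStopper stopper) {
        running.put(containerId, stopper);
    }

    /** Forget a container that has already completed on its own. */
    void completed(String containerId) {
        running.remove(containerId);
    }

    /** Try to stop every tracked container; return the ids that could not be stopped. */
    List<String> stopAll() {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, ContainerStopper> e : running.entrySet()) {
            try {
                e.getValue().stop(e.getKey());
                running.remove(e.getKey()); // safe while iterating a ConcurrentHashMap
            } catch (Exception ex) {
                failed.add(e.getKey()); // keep it tracked so a retry remains possible
            }
        }
        return failed;
    }

    /** Run stopAll() when the JVM shuts down. */
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> stopAll()));
    }
}
```

A JVM shutdown hook only runs when the JVM exits normally or receives a signal
it can handle, so this would not help if the application master process is
killed forcibly.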
Arun: because we used a released version of YARN, we used the raw API, not
YARN-103.
> Write S4 Application Master to deploy S4 in Yarn
> ------------------------------------------------
>
> Key: S4-25
> URL: https://issues.apache.org/jira/browse/S4-25
> Project: Apache S4
> Issue Type: New Feature
> Reporter: J Mohamed Zahoor
> Fix For: 0.6
>
> Attachments: S4-ApplicationMaster.diff, S4-Client.diff,
> S4-Constants.diff, S4-YARN-1.patch
>
>
> On the lines of s4PigWrapper, write a s4 application master to host s4 piper
> inside Hadoop Yarn. This could be useful not only for reading data stored in
> hadoop (to build or train a model)... But we could make use of the resource
> manager to deploy s4 instances in remote machine and monitor them. In short,
> we could make use of most of the resource management, scheduling and other
> good stuff in Yarn.
> - Yarn is useful to deploy and launch s4 instances.
> - It still requires deploying node managers on each box which means it will
> be useful if one is running more than one s4 process on a node.