You can try wrapping these steps into an Oozie Java action. A Java action is executed as a map-only job with a single mapper, on whichever node the cluster happens to pick.
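As a rough sketch, assuming a hypothetical main class name, data paths, and an Rscript install on the node (none of these come from the thread), the Java action's main class could look like this; it covers steps 1.1 and 1.2 of the first option below:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RLauncher {
        public static void main(String[] args) throws Exception {
            // args[0]: HDFS directory with the prepared data, args[1]: local working dir
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // 1.1. download the data from HDFS to the local filesystem of
            //      whichever node happens to run this single mapper
            fs.copyToLocalFile(new Path(args[0]), new Path(args[1]));

            // 1.2. run the R application against the downloaded data
            //      (Rscript must be installed on every node that may run this action)
            Process p = new ProcessBuilder("Rscript", "/opt/scripts/train.R", args[1])
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0) {
                // throwing makes Oozie mark the Java action as failed
                throw new RuntimeException("R script failed");
            }
        }
    }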
1. Quick and dirty:
   1.1. Download the data from HDFS to the local filesystem (use the Java API or a shell script; see the launcher sketch above).
   1.2. Run your R app, feeding it the downloaded data.
   Problem: you have to install R on every TaskTracker node, because with a Java action you cannot know in advance which node will run the mapper.

2. More complicated: split your workflow into two coordinators.
   2.1. The first coordinator prepares the data.
   2.2. It then runs a Java action that uses ssh to reach a single, known node and runs a script there (or pushes an event, or notifies it by URL). That script/app should:
        2.2.1. copy the data locally, and
        2.2.2. run the R application.
   The second coordinator waits for a special flag. The R app from #2.2.2 should produce that flag at the end if it finishes successfully. Once the second coordinator sees the flag in the agreed folder, it continues with its work (a sketch of such a flag writer is in the P.S. below).

Hope it helps. In any case, none of these solutions are "Hadoop patterns", which is why they are ugly.

2013/11/27 ZORAIDA HIDALGO SANCHEZ <[email protected]>

> Hi all,
>
> has anybody experimented with R and Oozie? We have a customized ETL
> running on Hadoop that is orchestrated by Oozie. Once the data is loaded
> into HDFS, we download it and apply some heuristics locally. We want to
> continue using R (we are not considering Mahout or RMR for now), but we
> need to integrate the training step into our workflow (once the main
> tuning has been done, just for new data).
>
> We would appreciate it if someone could share their experience.
>
> Regards,
>
> Zoraida.-
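P.S. For the done-flag in #2.2.2, a minimal sketch of the last step the script on the known node could run (the class name and flag path are assumptions): it creates an empty flag file on HDFS only if R exited successfully, and the second coordinator can use that file as the done-flag of its input dataset.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlagWriter {
        public static void main(String[] args) throws Exception {
            // args[0]: exit code of the R run
            // args[1]: HDFS path the second coordinator polls, e.g. /data/r-output/_SUCCESS
            int rExitCode = Integer.parseInt(args[0]);
            if (rExitCode == 0) {
                // create an empty flag file; the second coordinator only
                // starts its workflow once this file exists
                FileSystem fs = FileSystem.get(new Configuration());
                fs.create(new Path(args[1])).close();
            }
        }
    }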
