Hi Ananth,

Please see my answers inline.

Regards,
Sandeep

On Thu, Oct 27, 2016 at 11:52 PM, ananth <ananthg.a...@gmail.com> wrote:

> Hello all,
>
> I want to use Apex for executing R scripts wherein the parameters for the
> script are coming in as tuples. In this regard, I have a few questions:
>
>   * I am presuming that the R dependencies are to be installed on all of
>     the Hadoop nodes and the R script is to be put in the classpath?
>     The R script will be referring to a few R libraries as part of its code.
>

[Sandeep] Yes, all R dependencies, including R libraries, should be installed
on all Hadoop nodes, and the R script should be in the classpath.

>   * Is it fair to say that the YARN container allocation does not
>     work exactly as the script operator (named Rscript in Malhar)
>     uses the REngine, which is present locally as a binary? Especially
>     if the R script itself uses parallelism in terms of its code etc. I
>     am asking this to plan out the resources required for such an
>     implementation.
>

[Sandeep] I might be wrong here, but I think Rscript would be run inside the
YARN container.

>   * Is there good documentation or a pointer to best practices to be
>     followed when developing applications which use the ScriptOperator
>     equivalent constructs wherein there are external code constructs
>     that might be executed?
>

[Sandeep] As far as I know, there isn't any documentation as of now.

> Regards,
>
> Ananth