Hi Anant,

The blog post about R-on-Dataflow should work for R-on-Beam -- it just predates Beam; there is no longer any Dataflow Python that isn't based on Beam :)
What have you tried?

Thanks,
Dan

On Mon, Apr 10, 2017 at 11:18 PM, Anant Bhandarkar <[email protected]> wrote:

> Hi,
> We are using BigQuery for our querying needs.
> We are also looking to use Dataflow with some of the statistical
> libraries. We are using R libraries to build these statistical models.
>
> We are looking to run our data through statistical models such as ELM,
> GAM, ARIMA etc. We see that Python doesn't have all these libraries, which
> we get as CRAN packages in R.
>
> We have seen this example where there is a possibility to run R on
> Dataflow.
>
> https://medium.com/google-cloud/cloud-dataflow-can-autoscale-r-programs-for-massively-parallel-data-processing-492b57bd732d
> https://github.com/gregmcinnes/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/wordlength/wordlength_R/wordlength_R.py
>
> If we are able to use the parallelization provided by Dataflow along with R
> libraries, this would be great for us as a team and also for the whole data
> science community, which relies on R packages.
>
> We would need some help from Beam to achieve this.
>
> I see that it will be a very good use case for the whole data science
> community that will enable usage of both Python and R on Beam and Dataflow.
>
> Regards,
> Anant
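
For illustration, the general pattern discussed in this thread (a Python pipeline step handing each element to R) might be sketched as below. This is only a minimal sketch under stated assumptions, not the code from the blog post or the linked example: the helper name `run_external`, the `Rscript -e` command line, and the choice to pass each element as a command-line argument are all hypothetical. A Python stand-in command is included so the sketch can be tried on a machine without R.

```python
import subprocess
import sys


def run_external(command, element):
    """Run `command` with `element` appended as its last argument.

    Returns the process's stdout with surrounding whitespace stripped.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        list(command) + [str(element)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical R invocation (assumes `Rscript` is installed and on PATH):
# prints the number of characters in the first trailing argument.
R_WORD_LENGTH = [
    "Rscript",
    "-e",
    "cat(nchar(commandArgs(trailingOnly = TRUE)[1]))",
]

# Stand-in with the same contract, so the sketch runs without R.
PY_WORD_LENGTH = [
    sys.executable,
    "-c",
    "import sys; print(len(sys.argv[1]))",
]

if __name__ == "__main__":
    print(run_external(PY_WORD_LENGTH, "dataflow"))
```

In a Beam Python pipeline, a function like this could be applied per element with `beam.Map`, provided R and the needed CRAN packages are installed on every worker. Starting one process per element is slow, so batching elements per call would likely be preferable for real workloads.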
