Hi,

We are using BigQuery for our querying needs. We are also looking to use Dataflow with statistical models that we currently build with R libraries.

We would like to run our data through statistical models such as ELM, GAM, and ARIMA. As far as we can tell, Python does not offer all of the libraries that are available as CRAN packages in R. We have seen these examples, which suggest that it is possible to run R on Dataflow:

https://medium.com/google-cloud/cloud-dataflow-can-autoscale-r-programs-for-massively-parallel-data-processing-492b57bd732d
https://github.com/gregmcinnes/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/wordlength/wordlength_R/wordlength_R.py

If we can combine the parallelization provided by Dataflow with R libraries, it would be great for us as a team and for the wider data science community that relies on R packages. We would need some help from the Beam community to achieve this. I think it is a very good use case, since it would enable the use of both Python and R on Beam and Dataflow.

Regards,
Anant
