Re: Using R in Data Flow

2017-04-12 Thread Dan Halperin
Hi Anant,

The blog post about R-on-Dataflow should work for R-on-Beam -- it just
predates Beam; there is no longer any Dataflow Python that isn't based on
Beam :)
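
For illustration, here is a minimal sketch of the general pattern (not the
code from the blog post): a Python helper that pipes a batch of elements to
an external process and reads the results back. The helper name
run_external, the R one-liner, and the batching shown in the comments are
all assumptions for this example, and it only works against R if Rscript is
installed on every worker.

```python
import subprocess
import sys


def run_external(command, lines, timeout=60):
    """Pipe `lines` to `command` on stdin and return its stdout as a list of lines.

    Hypothetical helper for illustration; not part of the Beam SDK.
    """
    result = subprocess.run(
        command,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the process exits non-zero
    )
    return result.stdout.splitlines()


# Assumed R one-liner: read words from stdin and print the length of each.
# Requires Rscript on the PATH of whatever machine runs this.
R_COMMAND = [
    "Rscript", "--vanilla", "-e",
    'cat(nchar(readLines(file("stdin"))), sep="\\n")',
]

# In a Beam pipeline this could be wired in roughly as follows (untested
# sketch; batching amortizes the cost of starting R for each call):
#
#   words | beam.BatchElements()
#         | beam.FlatMap(lambda batch: run_external(R_COMMAND, batch))

if __name__ == "__main__":
    # Stand-in command so the sketch runs on a machine without R installed.
    stand_in = [
        sys.executable, "-c",
        "import sys\nfor line in sys.stdin: print(len(line.strip()))",
    ]
    print(run_external(stand_in, ["beam", "dataflow"]))
```

Swapping stand_in for R_COMMAND runs the same round trip through R. Keeping
the command as a parameter also makes the helper easy to test locally before
submitting a job.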

What have you tried?

Thanks,
Dan

On Mon, Apr 10, 2017 at 11:18 PM, Anant Bhandarkar <
anant.bhandar...@impactanalytics.co> wrote:

> Hi,
> We are using BigQuery for our querying needs.
> We are also looking to use Dataflow with some of the statistical
> libraries. We are using R libraries to build these statistical models.
>
> We would like to run our data through statistical models such as ELM,
> GAM, and ARIMA. Python does not seem to have all of these libraries, which
> we get as CRAN packages in R.
>
> We have seen this example, which shows that it is possible to run R on
> Dataflow:
>
>
> https://medium.com/google-cloud/cloud-dataflow-can-autoscale-r-programs-for-massively-parallel-data-processing-492b57bd732d
> https://github.com/gregmcinnes/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/wordlength/wordlength_R/wordlength_R.py
>
> If we could combine the parallelization provided by Dataflow with R
> libraries, it would be great for us as a team and for the wider data
> science community that relies on R packages.
>
> We would need some help from the Beam community to achieve this.
>
> I think this would be a very good use case for the whole data science
> community, since it would enable the use of both Python and R on Beam and
> Dataflow.
>
> Regards,
> Anant
>
>


Using R in Data Flow

2017-04-11 Thread Anant Bhandarkar
Hi,
We are using BigQuery for our querying needs.
We are also looking to use Dataflow with some of the statistical libraries.
We are using R libraries to build these statistical models.

We would like to run our data through statistical models such as ELM,
GAM, and ARIMA. Python does not seem to have all of these libraries, which
we get as CRAN packages in R.

We have seen this example, which shows that it is possible to run R on
Dataflow:


https://medium.com/google-cloud/cloud-dataflow-can-autoscale-r-programs-for-massively-parallel-data-processing-492b57bd732d
https://github.com/gregmcinnes/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/wordlength/wordlength_R/wordlength_R.py

If we could combine the parallelization provided by Dataflow with R
libraries, it would be great for us as a team and for the wider data
science community that relies on R packages.

We would need some help from the Beam community to achieve this.

I think this would be a very good use case for the whole data science
community, since it would enable the use of both Python and R on Beam and
Dataflow.

Regards,
Anant