What do you need in SparkR that MLlib / ML don't have? Most of the basic
analysis that you need on a stream can be done through MLlib components.

On Jul 13, 2015 2:35 PM, "Feynman Liang" <fli...@databricks.com> wrote:
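[A minimal sketch of the MLlib-on-a-stream suggestion above, using
`StreamingKMeans` from `spark.mllib`. The socket source, host/port, and
three-feature input layout are assumptions for illustration, not part of the
original discussion.]

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingKMeansSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-kmeans-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source; any DStream[String] of "1.0,2.0,3.0" lines works.
    val lines = ssc.socketTextStream("localhost", 9999)
    val points = lines.map(s => Vectors.dense(s.split(',').map(_.toDouble)))

    // A continuously updated k-means model -- basic analysis on the
    // stream itself, with no R involved.
    val model = new StreamingKMeans()
      .setK(3)
      .setDecayFactor(1.0)
      .setRandomCenters(3, 0.0)

    model.trainOn(points)
    model.predictOn(points).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Other `spark.mllib` streaming components (e.g. streaming linear regression)
follow the same trainOn/predictOn pattern.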
> Sorry; I think I may have used poor wording. SparkR will let you use R to
> analyze the data, but it has to be loaded into memory using SparkR (see
> SparkR DataSources
> <http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html>).
> You will still have to write a Java receiver to store the data into some
> tabular datastore (e.g. Hive) before loading them as SparkR DataFrames
> and performing the analysis.
>
> R-specific questions such as windowing in R should go to R-help@; you
> won't be able to use window since that is a Spark Streaming method.
>
> On Mon, Jul 13, 2015 at 2:23 PM, Oded Maimon <o...@scene53.com> wrote:
>
>> You are helping me understand stuff here a lot.
>>
>> I believe I have 3 last questions:
>>
>> If I use a Java receiver to get the data, how should I save it in
>> memory? Using the store command or some other command?
>>
>> Once stored, how can R read that data?
>>
>> Can I use the window command in R? I guess not, because it is a
>> streaming command, right? Any other way to window the data?
>>
>> Sent from iPhone
>>
>> On Mon, Jul 13, 2015 at 2:07 PM -0700, "Feynman Liang" <fli...@databricks.com> wrote:
>>
>>> If you use SparkR then you can analyze the data that's currently in
>>> memory with R; otherwise you will have to write to disk (e.g. HDFS).
>>>
>>> On Mon, Jul 13, 2015 at 1:45 PM, Oded Maimon <o...@scene53.com> wrote:
>>>
>>>> Thanks again.
>>>> What I'm missing is where I can store the data. Can I store it in
>>>> Spark memory and then use R to analyze it? Or should I use HDFS? Any
>>>> other places that I can save the data?
>>>>
>>>> What would you suggest?
>>>>
>>>> Thanks...
>>>>
>>>> Sent from iPhone
>>>>
>>>> On Mon, Jul 13, 2015 at 1:41 PM -0700, "Feynman Liang" <fli...@databricks.com> wrote:
>>>>
>>>>> If you don't require true streaming processing and need to use R for
>>>>> analysis, SparkR on a custom data source seems to fit your use case.
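[On the "how should I save it in memory / store command" question above: a
custom receiver calls `store()`, and Spark keeps the resulting blocks in
executor memory according to the receiver's `StorageLevel`. A sketch follows;
the RabbitMQ consumption itself is a hypothetical placeholder
(`fetchNextMessage`), not a real client call.]

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch of a custom receiver. Everything RabbitMQ-specific is a placeholder;
// a real implementation would use an AMQP client such as com.rabbitmq.client.
class RabbitMqReceiver(queueUrl: String)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // onStart() must not block; consume on a background thread.
    new Thread("rabbitmq-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {
    // Cleanup (closing the RabbitMQ connection/channel) would go here.
  }

  private def receive(): Unit = {
    while (!isStopped) {
      val message: String = fetchNextMessage()
      // store() hands the record to Spark, which keeps it in memory per the
      // StorageLevel above -- this is the "store command" discussed in the
      // thread.
      store(message)
    }
  }

  // Hypothetical helper standing in for an actual AMQP consumer call.
  private def fetchNextMessage(): String = ???
}
```

Wiring it up is then `ssc.receiverStream(new RabbitMqReceiver("amqp://..."))`,
which yields a `DStream[String]`.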
>>>>>
>>>>> On Mon, Jul 13, 2015 at 1:06 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>>
>>>>>> Hi, thanks for replying!
>>>>>> I want to do the entire process in stages: get the data using Java
>>>>>> or Scala, because they are the only languages that support custom
>>>>>> receivers; keep the data <somewhere>; use R to analyze it; keep the
>>>>>> results <somewhere>; output the data to different systems.
>>>>>>
>>>>>> I thought that <somewhere> could be Spark memory, using RDDs or
>>>>>> DStreams... but could it be that I need to keep it in HDFS to make
>>>>>> the entire process work in stages?
>>>>>>
>>>>>> Sent from iPhone
>>>>>>
>>>>>> On Mon, Jul 13, 2015 at 12:07 PM -0700, "Feynman Liang" <fli...@databricks.com> wrote:
>>>>>>
>>>>>>> Hi Oded,
>>>>>>>
>>>>>>> I'm not sure I completely understand your question, but it sounds
>>>>>>> like you could have the READER receiver produce a DStream which is
>>>>>>> windowed/processed in Spark Streaming, and foreachRDD to do the
>>>>>>> OUTPUT. However, streaming in SparkR is not currently supported
>>>>>>> (SPARK-6803 <https://issues.apache.org/jira/browse/SPARK-6803>),
>>>>>>> so I'm not too sure how ANALYZER would fit in.
>>>>>>>
>>>>>>> Feynman
>>>>>>>
>>>>>>> On Sun, Jul 12, 2015 at 11:23 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>>>>
>>>>>>>> any help / idea will be appreciated :)
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Oded Maimon
>>>>>>>> Scene53.
>>>>>>>>
>>>>>>>> On Sun, Jul 12, 2015 at 4:49 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>> we are evaluating Spark for real-time analytics.
>>>>>>>>> What we are trying to do is the following:
>>>>>>>>>
>>>>>>>>> - READER APP - use a custom receiver to get data from RabbitMQ
>>>>>>>>>   (written in Scala)
>>>>>>>>> - ANALYZER APP - use a SparkR application to read the data
>>>>>>>>>   (windowed), analyze it every minute, and save the results
>>>>>>>>>   inside Spark
>>>>>>>>> - OUTPUT APP - use a Spark application (Scala/Java/Python) to
>>>>>>>>>   read the results from R every X minutes and send the data to a
>>>>>>>>>   few external systems
>>>>>>>>>
>>>>>>>>> Basically, at the end I would like to have the READER COMPONENT
>>>>>>>>> as an app that always consumes the data and keeps it in Spark,
>>>>>>>>> have as many ANALYZER COMPONENTS as my data scientists want, and
>>>>>>>>> have one OUTPUT APP that will read the ANALYZER results and send
>>>>>>>>> them to any relevant system.
>>>>>>>>>
>>>>>>>>> What is the right way to do it?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Oded.
>>>>>>>>
>>>>>>>> *This email and any files transmitted with it are confidential and
>>>>>>>> intended solely for the use of the individual or entity to whom
>>>>>>>> they are addressed. Please note that any disclosure, copying or
>>>>>>>> distribution of the content of this information is strictly
>>>>>>>> forbidden. If you have received this email message in error,
>>>>>>>> please destroy it immediately and notify its sender.*
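[One way to wire the READER stage described above so that a SparkR ANALYZER
can pick the data up, following the advice in the thread: window the DStream
in Spark Streaming, then use foreachRDD to persist each window to a tabular
store. This is a sketch, not a definitive implementation -- the socket
source, the Event record shape, and the HDFS path are all assumptions.]

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object ReaderAppSketch {
  // Hypothetical record shape; replace with whatever the messages contain.
  case class Event(ts: Long, value: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("reader-app-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    val sqlContext = new SQLContext(ssc.sparkContext)
    import sqlContext.implicits._

    // Placeholder source; a custom receiver stream would slot in here.
    val events = ssc.socketTextStream("localhost", 9999).map { line =>
      val parts = line.split(',')
      Event(parts(0).toLong, parts(1).toDouble)
    }

    // One-minute windows, each persisted as Parquet so any number of
    // ANALYZER processes can read it as a DataFrame.
    events.window(Minutes(1), Minutes(1)).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.toDF().write.mode("append").parquet("hdfs:///data/minute_batches")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

An ANALYZER could then load the batches in SparkR with
`read.df(sqlContext, "hdfs:///data/minute_batches", source = "parquet")` and
do the per-minute analysis there, since DStream windowing itself is not
available from R (per SPARK-6803 above).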