What do you need in SparkR that MLlib / ML doesn't have? Most of the basic
analysis that you need on a stream can be done through MLlib components...
On Jul 13, 2015 2:35 PM, "Feynman Liang" <fli...@databricks.com> wrote:

> Sorry; I think I may have used poor wording. SparkR will let you use R to
> analyze the data, but it has to be loaded into memory using SparkR (see SparkR
> DataSources
> <http://people.apache.org/~pwendell/spark-releases/latest/sparkr.html>).
> You will still have to write a Java receiver to store the data into some
> tabular datastore (e.g. Hive) before loading them as SparkR DataFrames and
> performing the analysis.
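>
> As a rough sketch of that hand-off (a sketch only: hypothetical receiver
> class, made-up HDFS path and column name, with Parquet standing in for the
> tabular store):
>
> ```scala
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.storage.StorageLevel
> import org.apache.spark.streaming.StreamingContext
> import org.apache.spark.streaming.receiver.Receiver
>
> // Hypothetical custom receiver; replace the loop body with your real source.
> class QueueReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
>   def onStart(): Unit = new Thread("queue-receiver") {
>     override def run(): Unit =
>       while (!isStopped()) store("one record from the queue") // store() hands data to Spark
>   }.start()
>   def onStop(): Unit = {}
> }
>
> // Persist each micro-batch in tabular form so SparkR can pick it up later.
> def persistBatches(ssc: StreamingContext): Unit = {
>   val lines = ssc.receiverStream(new QueueReceiver)
>   lines.foreachRDD { (rdd, time) =>
>     val sqlContext = new SQLContext(rdd.sparkContext)
>     import sqlContext.implicits._
>     rdd.map(Tuple1(_)).toDF("value")
>       .write.parquet(s"hdfs:///events/batch-${time.milliseconds}")
>   }
> }
> ```
>
> On the R side the same files could then be loaded with SparkR's read.df,
> e.g. read.df(sqlContext, "hdfs:///events/batch-...", source = "parquet").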
>
> R-specific questions such as windowing in R should go to R-help@; you
> won't be able to use window there, since that is a Spark Streaming method.
>
> On Mon, Jul 13, 2015 at 2:23 PM, Oded Maimon <o...@scene53.com> wrote:
>
>> You are helping me understand a lot here.
>>
>> I believe I have 3 last questions:
>>
>> If I use a Java receiver to get the data, how should I save it in memory?
>> Using the store command or some other command?
>>
>> Once stored, how can R read that data?
>>
>> Can I use the window command in R? I guess not, because it is a streaming
>> command, right? Is there any other way to window the data?
>>
>> Sent from iPhone
>>
>>
>>
>>
>> On Mon, Jul 13, 2015 at 2:07 PM -0700, "Feynman Liang" <
>> fli...@databricks.com> wrote:
>>
>>> If you use SparkR then you can analyze the data that's currently in
>>> memory with R; otherwise you will have to write to disk (e.g. HDFS).
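>>>
>>> For the write-to-disk route, a minimal sketch (the HDFS path is made up):
>>>
>>> ```scala
>>> // assuming `lines` is the DStream[String] produced by the custom receiver;
>>> // each batch is written out under hdfs:///streaming/events-<batch time>.txt
>>> lines.saveAsTextFiles("hdfs:///streaming/events", "txt")
>>> ```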
>>>
>>> On Mon, Jul 13, 2015 at 1:45 PM, Oded Maimon <o...@scene53.com> wrote:
>>>
>>>> Thanks again.
>>>> What I'm missing is where I can store the data. Can I store it in Spark
>>>> memory and then use R to analyze it? Or should I use HDFS? Are there any
>>>> other places where I can save the data?
>>>>
>>>> What would you suggest?
>>>>
>>>> Thanks...
>>>>
>>>> Sent from iPhone
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jul 13, 2015 at 1:41 PM -0700, "Feynman Liang" <
>>>> fli...@databricks.com> wrote:
>>>>
>>>>> If you don't require true streaming processing and need to use R for
>>>>> analysis, SparkR on a custom data source seems to fit your use case.
>>>>>
>>>>> On Mon, Jul 13, 2015 at 1:06 PM, Oded Maimon <o...@scene53.com> wrote:
>>>>>
>>>>>> Hi, thanks for replying!
>>>>>> I want to do the entire process in stages: get the data using Java or
>>>>>> Scala, because they are the only languages that support custom
>>>>>> receivers; keep the data <somewhere>; use R to analyze it; keep the
>>>>>> results <somewhere>; output the data to different systems.
>>>>>>
>>>>>> I thought that <somewhere> could be Spark memory, using RDDs or
>>>>>> DStreams... But could it be that I need to keep it in HDFS to run the
>>>>>> entire process in stages?
>>>>>>
>>>>>> Sent from iPhone
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 13, 2015 at 12:07 PM -0700, "Feynman Liang" <
>>>>>> fli...@databricks.com> wrote:
>>>>>>
>>>>>>> Hi Oded,
>>>>>>>
>>>>>>> I'm not sure I completely understand your question, but it sounds
>>>>>>> like you could have the READER receiver produce a DStream which is
>>>>>>> windowed/processed in Spark Streaming, using foreachRDD to do the OUTPUT.
>>>>>>> However, streaming in SparkR is not currently supported (SPARK-6803
>>>>>>> <https://issues.apache.org/jira/browse/SPARK-6803>) so I'm not too
>>>>>>> sure how ANALYZER would fit in.
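>>>>>>>
>>>>>>> A sketch of that windowed DStream + foreachRDD pattern (the window
>>>>>>> length and the output step are placeholders):
>>>>>>>
>>>>>>> ```scala
>>>>>>> import org.apache.spark.streaming.Seconds
>>>>>>>
>>>>>>> // assuming `lines` is the DStream produced by the READER receiver
>>>>>>> val windowed = lines.window(Seconds(60), Seconds(60)) // 1-minute tumbling windows
>>>>>>> windowed.foreachRDD { rdd =>
>>>>>>>   // OUTPUT step: push each window's results to the external systems
>>>>>>>   rdd.foreachPartition(records => records.foreach(sendToExternalSystem))
>>>>>>> }
>>>>>>> ```
>>>>>>>
>>>>>>> (sendToExternalSystem is a made-up placeholder for the real sink.)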
>>>>>>>
>>>>>>> Feynman
>>>>>>>
>>>>>>> On Sun, Jul 12, 2015 at 11:23 PM, Oded Maimon <o...@scene53.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> any help / idea will be appreciated :)
>>>>>>>> thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Oded Maimon
>>>>>>>> Scene53.
>>>>>>>>
>>>>>>>> On Sun, Jul 12, 2015 at 4:49 PM, Oded Maimon <o...@scene53.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>> We are evaluating Spark for real-time analytics. What we are trying
>>>>>>>>> to do is the following:
>>>>>>>>>
>>>>>>>>>    - READER APP - use a custom receiver to get data from RabbitMQ
>>>>>>>>>    (written in Scala)
>>>>>>>>>    - ANALYZER APP - use a SparkR application to read the data
>>>>>>>>>    (windowed), analyze it every minute, and save the results inside Spark
>>>>>>>>>    - OUTPUT APP - use a Spark application (Scala/Java/Python) to read
>>>>>>>>>    the results from R every X minutes and send the data to a few
>>>>>>>>>    external systems
>>>>>>>>>
>>>>>>>>> Basically, at the end I would like to have the READER COMPONENT as
>>>>>>>>> an app that always consumes the data and keeps it in Spark,
>>>>>>>>> have as many ANALYZER COMPONENTS as my data scientists want, and
>>>>>>>>> have one OUTPUT APP that will read the ANALYZER results and send them
>>>>>>>>> to any relevant system.
>>>>>>>>>
>>>>>>>>> what is the right way to do it?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Oded.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> *This email and any files transmitted with it are confidential and
>>>>>>>> intended solely for the use of the individual or entity to whom they are
>>>>>>>> addressed. Please note that any disclosure, copying or distribution of the
>>>>>>>> content of this information is strictly forbidden. If you have received
>>>>>>>> this email message in error, please destroy it immediately and notify its
>>>>>>>> sender.*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
