Re: Using GraphX with Spark Streaming?

2014-10-06 Thread Jayant Shekhar
Arko,

It would be useful to know more details about the use case you are trying to
solve. As Tobias wrote, Spark Streaming works on DStreams; a DStream is a
continuous series of RDDs.

Do check out the performance tuning section:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#performance-tuning
It is important to reduce the processing time of each batch of data.
Ideally, data processing should keep up with data ingestion.
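
To make the "keep up" point concrete, here is a minimal plain-Python sketch
(not Spark code) of how delay builds up once a batch takes longer to process
than the batch interval. The function name scheduling_delays and the numbers
are hypothetical, chosen only for illustration, and the sketch assumes that
batches are processed one at a time.

```python
def scheduling_delays(batch_interval, processing_times):
    """Return how long each batch waits before its processing can start.

    Assumption: batch i is ready at (i + 1) * batch_interval and batches
    are processed strictly one after another.
    """
    delays = []
    free_at = 0.0  # time at which the processor becomes idle
    for i, proc in enumerate(processing_times):
        ready_at = (i + 1) * batch_interval
        start = max(ready_at, free_at)  # wait for the previous batch if needed
        delays.append(start - ready_at)
        free_at = start + proc
    return delays


# Processing (1.5 s) is faster than the 2 s interval: no backlog.
print(scheduling_delays(2.0, [1.5, 1.5, 1.5, 1.5]))  # [0.0, 0.0, 0.0, 0.0]
# Processing (3 s) is slower than the 2 s interval: delay grows every batch.
print(scheduling_delays(2.0, [3.0, 3.0, 3.0, 3.0]))  # [0.0, 1.0, 2.0, 3.0]
```

In the second case the delay never recovers, which is why per-batch
processing time should stay below the batch interval.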

Thanks,
Jayant


On Sun, Oct 5, 2014 at 6:45 PM, Tobias Pfeiffer wrote:

> Arko,
>
> On Sat, Oct 4, 2014 at 1:40 AM, Arko Provo Mukherjee <
> arkoprovomukher...@gmail.com> wrote:
>>
>> Apologies if this is a stupid question, but I am trying to understand
>> why this can or cannot be done. As far as I understand, streaming
>> algorithms need to be different from batch algorithms because streaming
>> algorithms are generally incremental. Hence the question of whether the
>> RDD transformations can be extended to streaming or not.
>>
>
> I don't think that streaming algorithms are "generally incremental" in
> Spark Streaming. Data is collected and, every N seconds (or minutes, ...),
> the data collected during that interval is batch-processed with the normal
> batch operations. In fact, using data obtained from the stream in previous
> intervals is a bit more complicated than plain batch processing. If the
> graph you want to create only uses data from one interval/batch, that
> should be dead simple. You might want to have a look at
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams
>
> Tobias
>
>


Re: Using GraphX with Spark Streaming?

2014-10-05 Thread Tobias Pfeiffer
Arko,

On Sat, Oct 4, 2014 at 1:40 AM, Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:
>
> Apologies if this is a stupid question, but I am trying to understand
> why this can or cannot be done. As far as I understand, streaming
> algorithms need to be different from batch algorithms because streaming
> algorithms are generally incremental. Hence the question of whether the
> RDD transformations can be extended to streaming or not.
>

I don't think that streaming algorithms are "generally incremental" in
Spark Streaming. Data is collected and, every N seconds (or minutes, ...),
the data collected during that interval is batch-processed with the normal
batch operations. In fact, using data obtained from the stream in previous
intervals is a bit more complicated than plain batch processing. If the
graph you want to create only uses data from one interval/batch, that
should be dead simple. You might want to have a look at
https://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams
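
The per-interval model described above can be sketched in plain Python
(this is not Spark or GraphX code). The names micro_batches and degrees and
the sample events are hypothetical. The idea is only that each interval's
data is handed to an ordinary batch computation, here a degree count over
that interval's edges. In Spark itself the per-batch hook would be
DStream.foreachRDD, with a graph built from each RDD.

```python
from collections import defaultdict


def micro_batches(events, interval):
    """Group (timestamp, payload) events into consecutive time intervals.

    Simplification: intervals that received no events are skipped.
    """
    batches = defaultdict(list)
    for ts, payload in events:
        batches[int(ts // interval)].append(payload)
    return [batches[k] for k in sorted(batches)]


def degrees(edges):
    """Ordinary batch computation: vertex degrees of an undirected edge list."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return dict(deg)


# Hypothetical stream of timestamped edges, cut into 2-second batches.
events = [(0.5, ("a", "b")), (1.2, ("b", "c")),
          (2.1, ("a", "c")), (3.9, ("c", "d"))]

for batch in micro_batches(events, interval=2.0):
    # Each batch is processed independently of the previous ones.
    print(degrees(batch))
```

Each printed result only reflects the edges of its own interval. Combining
it with results from earlier intervals is the part that takes extra work.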

Tobias


Using GraphX with Spark Streaming?

2014-10-03 Thread Arko Provo Mukherjee
Hello Spark Gurus,

I am trying to learn Spark. I am especially interested in GraphX.

Since Spark can be used in a streaming context as well, I wanted to know
whether it is possible to use the Spark toolkits like GraphX or MLlib
in the streaming context.

Apologies if this is a stupid question, but I am trying to understand
why this can or cannot be done. As far as I understand, streaming
algorithms need to be different from batch algorithms because streaming
algorithms are generally incremental. Hence the question of whether the
RDD transformations can be extended to streaming or not.

Thanks much in advance for all the help!
Warm regards
Arko
