Re: Using GraphX with Spark Streaming?
Arko,

It would be useful to know more about the use case you are trying to solve. As Tobias wrote, Spark Streaming works on DStreams; a DStream is a continuous series of RDDs.

Do check out the performance tuning section:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#performance-tuning

It is important to reduce the processing time of each batch of data. Ideally, data processing should keep up with data ingestion.

Thanks,
Jayant

On Sun, Oct 5, 2014 at 6:45 PM, Tobias Pfeiffer wrote:
> Arko,
>
> On Sat, Oct 4, 2014 at 1:40 AM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote:
>>
>> Apologies if this is a stupid question but I am trying to understand
>> why this can or cannot be done. As far as I understand that streaming
>> algorithms need to be different from batch algorithms as the streaming
>> algorithms are generally incremental. Hence the question whether the
>> RDD transformations can be extended to streaming or not.
>>
>
> I don't think that streaming algorithms are "generally incremental" in
> Spark Streaming. In fact, data is collected and every N seconds
> (minutes/...), the data collected during that interval is batch-processed
> as with normal batch operations. In fact, using data previously obtained
> from the stream (in previous intervals) is a bit more complicated than
> plain batch processing. If the graph you want to create only uses data from
> one interval/batch, that should be dead simple. You might want to have a
> look at
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams
>
> Tobias
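To illustrate the point about processing keeping up with ingestion, here is a rough plain-Python model (not Spark code). The function name `scheduling_delays`, the 2-second interval, and the processing times are made up for illustration, and the model assumes batches are processed strictly one at a time.

```python
def scheduling_delays(batch_interval, processing_times):
    """Return how long each batch waits before its processing starts.

    Assumption of this toy model: batch i becomes available at
    (i + 1) * batch_interval and is processed only after all earlier
    batches have finished (one batch at a time).
    """
    delays = []
    free_at = 0.0  # time at which the processor is next idle
    for i, cost in enumerate(processing_times):
        ready_at = (i + 1) * batch_interval
        start = max(ready_at, free_at)
        delays.append(start - ready_at)
        free_at = start + cost
    return delays


# Processing time (1.5 s) below the batch interval (2 s): no backlog builds up.
keeping_up = scheduling_delays(batch_interval=2.0, processing_times=[1.5] * 5)

# Processing time (3 s) above the batch interval: every batch waits longer
# than the one before it.
falling_behind = scheduling_delays(batch_interval=2.0, processing_times=[3.0] * 5)

print(keeping_up)
print(falling_behind)
```

In this model the delay stays at zero while each batch finishes within the interval, and grows by the shortfall on every batch otherwise, which is why reducing per-batch processing time matters.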
Re: Using GraphX with Spark Streaming?
Arko,

On Sat, Oct 4, 2014 at 1:40 AM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote:
>
> Apologies if this is a stupid question but I am trying to understand
> why this can or cannot be done. As far as I understand that streaming
> algorithms need to be different from batch algorithms as the streaming
> algorithms are generally incremental. Hence the question whether the
> RDD transformations can be extended to streaming or not.
>

I don't think that streaming algorithms are "generally incremental" in Spark Streaming. In fact, data is collected and every N seconds (minutes/...), the data collected during that interval is batch-processed as with normal batch operations. In fact, using data previously obtained from the stream (in previous intervals) is a bit more complicated than plain batch processing.

If the graph you want to create only uses data from one interval/batch, that should be dead simple. You might want to have a look at
https://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams

Tobias
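The difference between the two cases described above can be sketched in plain Python (this is a toy model of micro-batching, not the Spark API). The helper names `micro_batches` and `degrees`, the sample events, and the 2-second interval are all hypothetical. In Spark Streaming itself, the per-interval RDD is what you get inside `DStream.foreachRDD` or `transform`, and that is where an ordinary batch computation such as building a graph would go.

```python
from collections import defaultdict


def micro_batches(events, interval):
    """Group (timestamp, src, dst) events into consecutive intervals.

    Returns one edge list per interval of `interval` seconds, including
    empty intervals, up to the last interval that contains an event.
    """
    batches = defaultdict(list)
    for ts, src, dst in events:
        batches[int(ts // interval)].append((src, dst))
    if not batches:
        return []
    return [batches.get(i, []) for i in range(max(batches) + 1)]


def degrees(edges):
    """Ordinary batch computation: vertex degree for one edge list."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return dict(deg)


# Made-up edge events: (timestamp in seconds, source vertex, destination vertex)
events = [(0.5, "a", "b"), (1.2, "b", "c"), (2.1, "a", "c"), (3.7, "c", "d")]

# Case 1: the graph uses only one interval's data, so each batch is
# processed on its own, exactly like a normal batch job.
per_batch = [degrees(batch) for batch in micro_batches(events, interval=2)]

# Case 2: the graph should reflect earlier intervals too, so state
# (here, the list of all edges seen so far) is carried across batches.
seen_edges = []
cumulative = []
for batch in micro_batches(events, interval=2):
    seen_edges.extend(batch)
    cumulative.append(degrees(seen_edges))

print(per_batch)
print(cumulative[-1])
```

The first case needs nothing beyond the batch function; the second is where the extra complexity comes from, because something has to hold and update the data from previous intervals.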
Using GraphX with Spark Streaming?
Hello Spark Gurus,

I am trying to learn Spark, and I am especially interested in GraphX. Since Spark can be used in a streaming context as well, I wanted to know whether it is possible to use Spark toolkits like GraphX or MLlib in the streaming context.

Apologies if this is a stupid question, but I am trying to understand why this can or cannot be done. As far as I understand, streaming algorithms need to be different from batch algorithms because streaming algorithms are generally incremental. Hence the question of whether the RDD transformations can be extended to streaming or not.

Thanks much in advance for all the help!

Warm regards,
Arko