[Spark Streaming] Help with updateStateByKey()

2015-04-23 Thread allonsy
Hi everybody, I think I could use some help with the /updateStateByKey()/ Java method in Spark Streaming. *Context:* I have a /JavaReceiverInputDStream<DataUpdate> du/ DStream, where object /DataUpdate/ mainly has 2 fields of interest (in my case), namely du.personId (an Integer
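For the case described above, the core of updateStateByKey is a per-key, per-batch function from (new values, previous state) to new state. A minimal sketch follows, keeping a running count of updates per personId. It is an assumption about what the poster wants; the DataUpdate class here is a hypothetical stand-in (only personId is known from the question), and plain java.util.Optional is used so the sketch runs without Spark (the real Java API wraps the function in Spark's Function2 with its own Optional type).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the poster's DataUpdate class; only the
// personId field is known from the question.
class DataUpdate {
    final int personId;
    DataUpdate(int personId) { this.personId = personId; }
}

public class UpdateStateSketch {
    // Shape of the function updateStateByKey applies for each key on
    // each batch: (values for this key in this batch, previous state)
    // -> new state. Here the state is a running count of updates.
    static Optional<Integer> updateCount(List<DataUpdate> newValues,
                                         Optional<Integer> state) {
        return Optional.of(state.orElse(0) + newValues.size());
    }

    public static void main(String[] args) {
        // Batch 1 for personId 7: two updates, no previous state.
        Optional<Integer> s1 = updateCount(
            Arrays.asList(new DataUpdate(7), new DataUpdate(7)),
            Optional.empty());
        // Batch 2: one more update, prior state carried forward.
        Optional<Integer> s2 = updateCount(
            Arrays.asList(new DataUpdate(7)), s1);
        System.out.println(s2.get()); // running count: 3
    }
}
```

In the actual streaming job, the DStream would first be mapped to (personId, DataUpdate) pairs so updateStateByKey can partition the values by key.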

Re: Help with updateStateByKey

2014-12-18 Thread Tathagata Das
Another point to start playing with updateStateByKey is the example StatefulNetworkWordCount. See the streaming examples directory in the Spark repository. TD On Thu, Dec 18, 2014 at 6:07 AM, Pierce Lamb richard.pierce.l...@gmail.com wrote: I am trying to run stateful Spark Streaming
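The StatefulNetworkWordCount example TD points to centers on one small update function: add this batch's counts for a word to its running total. A sketch of that logic, using plain java.util types rather than Spark's own Function2/Optional wrappers so it runs standalone:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class StatefulWordCountSketch {
    // Core of StatefulNetworkWordCount: for each word (the key), add
    // the counts seen in this batch to the running total kept as state.
    static Optional<Integer> updateFunction(List<Integer> newCounts,
                                            Optional<Integer> runningCount) {
        int sum = runningCount.orElse(0);
        for (int c : newCounts) sum += c;
        return Optional.of(sum);
    }

    public static void main(String[] args) {
        // A word appears twice in batch 1, once in batch 2.
        Optional<Integer> afterBatch1 =
            updateFunction(Arrays.asList(1, 1), Optional.empty());
        Optional<Integer> afterBatch2 =
            updateFunction(Arrays.asList(1), afterBatch1);
        System.out.println(afterBatch2.get()); // seen 3 times so far
    }
}
```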

Re: Help with updateStateByKey

2014-12-18 Thread Silvio Fiorito
Hi Pierce, You shouldn’t have to use groupByKey because updateStateByKey will get a Seq of all the values for that key already. I used that for realtime sessionization as well. What I did was key my incoming events, then send them to updateStateByKey. The updateStateByKey function then
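Silvio's point is that updateStateByKey already hands the update function every value for a key in the current batch, so a preceding groupByKey is redundant. A hedged sketch of what such a sessionization update function might look like (the session key and event-timestamp state are assumptions, not code from the thread; plain java.util types stand in for Spark's wrappers):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SessionStateSketch {
    // updateStateByKey delivers ALL of this batch's values for the key
    // at once (the "Seq of values"), so no groupByKey is needed first.
    // Here the state is the accumulated list of event timestamps for a
    // hypothetical session key.
    static Optional<List<Long>> updateSession(List<Long> newEvents,
                                              Optional<List<Long>> state) {
        List<Long> session = new ArrayList<>(state.orElse(new ArrayList<>()));
        session.addAll(newEvents);
        return Optional.of(session);
    }

    public static void main(String[] args) {
        // Batch 1 brings two events for the key, batch 2 one more.
        Optional<List<Long>> s1 =
            updateSession(Arrays.asList(100L, 105L), Optional.empty());
        Optional<List<Long>> s2 = updateSession(Arrays.asList(110L), s1);
        System.out.println(s2.get().size()); // 3 events accumulated
    }
}
```

A real job would also return Optional.empty() for sessions that have timed out, so Spark drops their state.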

Re: Help with updateStateByKey

2014-12-18 Thread Pierce Lamb
Hi Silvio, This is a great suggestion (I wanted to get rid of groupByKey), I have been trying to implement it this morning, but having some trouble. I would love to see your code for the function that goes

Re: Help with updateStateByKey

2014-12-18 Thread Silvio Fiorito

Help with updateStateByKey

2014-12-17 Thread Pierce Lamb
I am trying to run stateful Spark Streaming computations over (fake) Apache web server logs read from Kafka. The goal is to sessionize the web traffic similar to this blog post: http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
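The sessionization rule in the linked blog post boils down to: events for one visitor, ordered by time, start a new session whenever the gap since the previous event exceeds an inactivity timeout. A batch-style illustration of just that rule (the 30-minute timeout is an assumption for the demo, not taken from the thread):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SessionizeSketch {
    // Split one visitor's time-sorted event timestamps into sessions:
    // a gap larger than timeoutMs between consecutive events closes the
    // current session and opens a new one.
    static List<List<Long>> sessionize(List<Long> sortedTimestamps,
                                       long timeoutMs) {
        List<List<Long>> sessions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        Long prev = null;
        for (long t : sortedTimestamps) {
            if (prev != null && t - prev > timeoutMs) {
                sessions.add(current);
                current = new ArrayList<>();
            }
            current.add(t);
            prev = t;
        }
        if (!current.isEmpty()) sessions.add(current);
        return sessions;
    }

    public static void main(String[] args) {
        long min = 60_000L;
        // Three hits close together, then one 40 minutes later.
        List<List<Long>> s = sessionize(
            Arrays.asList(0L, 5 * min, 10 * min, 50 * min), 30 * min);
        System.out.println(s.size()); // 2 sessions
    }
}
```

In the streaming version this boundary logic would live inside the updateStateByKey update function, applied incrementally per batch instead of over a pre-sorted list.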