Hi everybody,

I think I could use some help with the /updateStateByKey()/ Java method in
Spark Streaming.

*Context:*

I have a /JavaReceiverInputDStream<DataUpdate> du/ DStream, where the object
/DataUpdate/ mainly has 2 fields of interest (in my case), namely
du.personId (an Integer
Another good place to start playing with updateStateByKey is the
StatefulNetworkWordCount example; see the streaming examples directory in
the Spark repository.
TD
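The update function at the heart of StatefulNetworkWordCount boils down to "fold this batch's counts for a key into the running total". Here is a minimal standalone sketch of that logic; note that the real example passes it as a Function2 to JavaPairDStream.updateStateByKey using the Optional type of the Spark Java API, while this sketch uses java.util.Optional so it can be read and run on its own:

```java
import java.util.List;
import java.util.Optional;

public class StatefulWordCountSketch {
    // Sketch of the StatefulNetworkWordCount update function: Spark calls
    // it once per key per batch with all of that batch's new values and the
    // previous state, and the returned Optional becomes the new state.
    public static Optional<Integer> updateCount(List<Integer> newCounts,
                                                Optional<Integer> runningCount) {
        int sum = runningCount.orElse(0);  // 0 if the key has no state yet
        for (int c : newCounts) {
            sum += c;
        }
        return Optional.of(sum);
    }
}
```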
On Thu, Dec 18, 2014 at 6:07 AM, Pierce Lamb
<richard.pierce.l...@gmail.com> wrote:
I am trying to run stateful Spark Streaming
Hi Pierce,
You shouldn’t have to use groupByKey because updateStateByKey will get a
Seq of all the values for that key already.
I used that for realtime sessionization as well. What I did was key my
incoming events, then send them to updateStateByKey. The updateStateByKey
function then
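The pattern Silvio describes, where the update function receives all of the batch's values for a key plus the previous state (so no groupByKey is needed), might look roughly like the following sketch. The state type (a list of raw events per key) is just an illustrative choice, and java.util.Optional stands in for the Optional type used by the Spark Java API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class SessionizeSketch {
    // Rough sketch of a sessionization update function: append this batch's
    // events for the key to its accumulated session. Spark invokes this per
    // key per batch via updateStateByKey.
    public static Optional<List<String>> updateSession(List<String> newEvents,
                                                       Optional<List<String>> state) {
        List<String> session = new ArrayList<>(state.orElse(List.of()));
        session.addAll(newEvents);     // append this batch's events
        return Optional.of(session);   // returning Optional.empty() would drop the key
    }
}
```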
To: user@spark.apache.org
Subject: Re: Help with updateStateByKey
Hi Silvio,
This is a great suggestion (I wanted to get rid of groupByKey); I have been
trying to implement it this morning but am having some trouble. I would
love to see your code for the function that goes inside
I am trying to run stateful Spark Streaming computations over (fake)
Apache web server logs read from Kafka. The goal is to sessionize
the web traffic, similar to this blog post:
http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
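Before updateStateByKey can accumulate session state, the incoming log lines have to be keyed. A minimal sketch of that step, assuming Common Log Format (where each line begins with the client IP) and keying each event by that IP; in Spark this would run inside lines.mapToPair(...), and the String[] pair here is just a stand-in for scala.Tuple2:

```java
public class LogKeying {
    // Turn one web-server log line into a (key, value) pair for
    // sessionization: key = client IP, value = the full log line.
    public static String[] keyByClientIp(String logLine) {
        String ip = logLine.split(" ", 2)[0];  // CLF: the line starts with the IP
        return new String[] { ip, logLine };
    }
}
```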