Just a guess: updateStateByKey has overloaded variants that take a Partitioner as a parameter. Could that help?
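For instance, something like the following sketch (untested; MyPartitioner, updateFunction, and the state type are stand-ins taken from the code quoted below):

```scala
// Sketch only: build ONE partitioner instance and pass the same instance
// both to partitionBy on the input stream and to updateStateByKey, so the
// state RDDs are co-partitioned with the input.
// MyPartitioner and updateFunction are placeholders from the original post.
val partitioner = new MyPartitioner(...)

val partitionedDStream =
  keyedDStream.transform(rdd => rdd.partitionBy(partitioner))

// The updateStateByKey(updateFunc, partitioner) overload partitions the
// state with the supplied partitioner, which should keep the
// sensor-to-partition assignment stable across batches.
val stateDStream =
  partitionedDStream.updateStateByKey[StateType](updateFunction, partitioner)
```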
-Original Message-
From: qihong [mailto:qc...@pivotal.io]
Sent: Tuesday, September 09, 2014 9:13 PM
To: u...@spark.incubator.apache.org
Subject: Re: how to setup steady state stream partitions
Thanks for your response. I do have something like:
val inputDStream = ...
val keyedDStream = inputDStream.map(...) // use sensorId as key
val partitionedDStream = keyedDStream.transform(rdd => rdd.partitionBy(new MyPartitioner(...)))
val stateDStream = partitionedDStream.updateStateByKey[...](updateFunction)
The partitionedDStream does have steady partitions, but the stateDStream does not. For example, partition 0 of partitionedDStream contains only data for sensors 0 to 999, but partition 0 of stateDStream contains data for only some of the sensors in the 0-999 range, plus data for many sensors that belong to other partitions of partitionedDStream.
I'd like partition 0 of stateDStream to contain only the data from partition 0 of partitionedDStream, partition 1 of stateDStream only the data from partition 1 of partitionedDStream, and so on. Does anyone know how to implement that?
Thanks!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-setup-steady-state-stream-partitions-tp13850p13853.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org