This is a *fantastic* question. How we identify individual elements
across multiple DStreams is worth looking at.

The reason is that you can then fine-tune your streaming job based on
the RDD identifiers (i.e., do the timestamps from the producer correlate
closely with the order in which RDD elements are being produced?). If *NO*,
then you need to either (1) dial up throughput on the producer sources or (2)
increase the cluster size so that Spark is capable of handling the load evenly.

You can't decide between (1) and (2) unless you can track when the streaming
elements are being converted to RDDs by Spark itself.
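One way to do that kind of tracking is a rough sketch like the following, using DStream's `foreachRDD` variant that passes in the batch `Time`. This assumes a `StreamingContext` and a DStream (e.g. from KafkaUtils) already exist, and the record field `producerTs` is hypothetical -- substitute however your producer embeds its timestamp.

```scala
import org.apache.spark.streaming.Time

// Hypothetical record type: your producer would embed its own timestamp.
case class Record(producerTs: Long, payload: String)

// `stream` is assumed to be a DStream[Record], e.g. built from a Kafka receiver.
def trackBatchLag(stream: org.apache.spark.streaming.dstream.DStream[Record]): Unit = {
  stream.foreachRDD { (rdd, batchTime: Time) =>
    // batchTime identifies the micro-batch on the Spark side; comparing it
    // against the producer-side timestamps shows how far behind Spark is.
    val lagsMs = rdd.map(r => batchTime.milliseconds - r.producerTs)
    if (lagsMs.count() > 0) {
      println(s"batch at $batchTime: max producer-to-batch lag = ${lagsMs.max()} ms")
    }
  }
}
```

If the reported lag keeps growing, that's the signal to look at (1) or (2) above. This is only a sketch and needs a running Spark Streaming context to execute.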



On Wed, Feb 18, 2015 at 6:54 PM, Neelesh <neele...@gmail.com> wrote:

> There does not seem to be a definitive answer on this. Every time I google
> for message ordering,the only relevant thing that comes up is this  -
> http://samza.apache.org/learn/documentation/0.8/comparisons/spark-streaming.html
> .
>
> With a kafka receiver that pulls data from a single kafka partition of a
> kafka topic, are individual messages in the microbatch in same the order as
> kafka partition? Are successive microbatches originating from a kafka
> partition executed in order?
>
>
> Thanks!
>
>



-- 
jay vyas
