I don't think you can expect any ordering guarantee beyond the order of records within a single partition.

On Jul 4, 2015 7:43 AM, "khaledh" <khal...@gmail.com> wrote:
> I'm writing a Spark Streaming application that uses RabbitMQ to consume
> events. One feature of RabbitMQ that I intend to make use of is bulk ack of
> messages, i.e. no need to ack one-by-one, but only ack the last event in a
> batch and that would ack the entire batch.
>
> Before I commit to doing so, I'd like to know if Spark Streaming always
> processes RDDs in the same order they arrive in, i.e. if RDD1 arrives
> before RDD2, is it true that RDD2 will never be scheduled/processed
> before RDD1 is finished?
>
> This is crucial to the ack logic, since if RDD2 can potentially be
> processed while RDD1 is still being processed, then if I ack the last
> event in RDD2 that would also ack all events in RDD1, even though they
> may not have been completely processed yet.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Are-Spark-Streaming-RDDs-always-processed-in-order-tp23616.html
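
If batches can finish out of order, one way to keep bulk acks safe is to ack only up to the last event of the newest batch for which every earlier batch has also finished. Below is a minimal sketch of that bookkeeping in plain Python, with no Spark or RabbitMQ calls. The class and method names are hypothetical, and it assumes each batch is registered in arrival order under the delivery tag of its last event.

```python
class AckTracker:
    """Tracks batch completion and reports the highest tag safe to bulk-ack.

    Hypothetical helper for illustration; not part of Spark or any
    RabbitMQ client library.
    """

    def __init__(self):
        # Each entry is [last_delivery_tag, done], kept in arrival order.
        self._pending = []

    def register(self, last_tag):
        """Record a newly arrived batch by the delivery tag of its last event."""
        self._pending.append([last_tag, False])

    def complete(self, last_tag):
        """Mark a batch as processed; return a tag safe to bulk-ack, or None."""
        for entry in self._pending:
            if entry[0] == last_tag:
                entry[1] = True
                break
        safe = None
        # Release only the contiguous prefix of finished batches, so a
        # later batch never acks an earlier one that is still running.
        while self._pending and self._pending[0][1]:
            safe = self._pending.pop(0)[0]
        return safe


tracker = AckTracker()
tracker.register(10)   # RDD1 ends at delivery tag 10
tracker.register(20)   # RDD2 ends at delivery tag 20
first = tracker.complete(20)    # RDD2 finishes first: nothing to ack yet
second = tracker.complete(10)   # RDD1 finishes: both batches can be acked
print(first, second)
```

When `complete` returns a tag, that tag could be passed to the client's bulk ack (in RabbitMQ, `basic.ack` with the `multiple` flag set). This assumes the tracker lives in a single process that observes every batch completion.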