I agree. This issue should be fixed in Spark itself rather than relying on replaying Kafka messages.
Dib

On Aug 28, 2014 6:45 AM, "RodrigoB" <rodrigo.boav...@aspect.com> wrote:

> Dibyendu,
>
> Thanks for getting back.
>
> I believe you are absolutely right. We were under the assumption that the
> raw data was being computed again, but that is not happening after further
> tests. This applies to Kafka as well.
>
> Unfortunately, the issue is of major priority.
>
> Regarding your suggestion, I would prefer to have the problem resolved
> within Spark's internals: once the data is replicated, we should be able
> to access it again rather than having to pull it back from Kafka or any
> other stream affected by this issue. If, for example, there is a large
> number of batches to be recomputed, I would rather have them processed in
> a distributed fashion than overload the batch interval with a huge volume
> of Kafka messages.
>
> I do not yet have enough know-how about where the issue lies or about
> Spark's internals, so I can't really say how difficult the implementation
> will be.
>
> Thanks,
> Rod
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p12966.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
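To make the trade-off being discussed concrete, here is a minimal sketch of the "replay from Kafka" recovery approach: a consumer checkpoints the last committed offset, and after a crash a restarted consumer resumes from that offset, re-reading any uncommitted messages (at-least-once semantics). This is plain Python with hypothetical names (`ReplayingConsumer`, a list standing in for a Kafka partition), not Spark's or Kafka's actual API; it only illustrates the mechanism the thread is weighing against in-Spark recovery of replicated blocks.

```python
import json
import os
import tempfile

class ReplayingConsumer:
    """Consumes from an append-only log, checkpointing the last committed
    offset so a restarted consumer re-reads unacknowledged messages
    (at-least-once delivery). Hypothetical sketch, not a Spark/Kafka API."""

    def __init__(self, log, checkpoint_path):
        self.log = log                      # stands in for a Kafka partition
        self.checkpoint_path = checkpoint_path

    def _load_offset(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                return json.load(f)["offset"]
        return 0                            # no checkpoint: start from the beginning

    def _commit(self, offset):
        with open(self.checkpoint_path, "w") as f:
            json.dump({"offset": offset}, f)

    def run(self, process, crash_after=None):
        offset = self._load_offset()
        for i in range(offset, len(self.log)):
            if crash_after is not None and i >= crash_after:
                return                      # simulate a driver failure mid-stream
            process(self.log[i])
            self._commit(i + 1)             # commit only after processing succeeds

log = ["m0", "m1", "m2", "m3"]
seen = []
ckpt = os.path.join(tempfile.mkdtemp(), "offset.json")

consumer = ReplayingConsumer(log, ckpt)
consumer.run(seen.append, crash_after=2)    # processes m0, m1, then "crashes"
consumer.run(seen.append)                   # restart resumes at offset 2
print(seen)                                 # → ['m0', 'm1', 'm2', 'm3']
```

The downside Rodrigo points out is visible here: after a failure, all unprocessed messages must be pulled from the source again and worked through on restart, rather than being recovered from already-replicated data inside the processing engine.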