>>>>> ...Stream(new KafkaReceiver(_props, i)).
>>>>>
>>>>> I have found, in your code, all the messages are retrieved correctly,
>>>>> but _receiver.store(_dataBuffer.iterator()), which is the Spark
>>>>> Streaming Receiver abstract class's method, does not seem to work
>>>>> correctly.
>>
>> Have you tried running your Spark Streaming Kafka consumer with Kafka
>> 0.8.1.1 and Spark 1.2.0?
>>
>> - Kidong.
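One plausible explanation for a store(_dataBuffer.iterator()) call that appears to do nothing is the buffer being cleared or reused while the handed-off iterator is still pending. The stdlib-only sketch below illustrates the buffer hand-off pattern; the class name, the store stand-in, and the snapshot step are illustrative assumptions, not Spark's actual Receiver implementation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the hand-off behind _receiver.store(_dataBuffer.iterator()).
// "store" here just drains an iterator into a block list, standing in for the
// Spark Receiver method; it is not the real API.
public class BufferHandoff {
    static final List<List<String>> storedBlocks = new ArrayList<>();

    // Stand-in for Receiver.store(Iterator): consumes the iterator it is given.
    static void store(Iterator<String> it) {
        List<String> block = new ArrayList<>();
        it.forEachRemaining(block::add);
        storedBlocks.add(block);
    }

    public static void main(String[] args) {
        List<String> dataBuffer = new ArrayList<>();
        dataBuffer.add("msg-1");
        dataBuffer.add("msg-2");

        // Safe hand-off: snapshot the buffer first, then clear it. If the
        // receiver instead cleared dataBuffer while store() was still
        // iterating, the stored block could come out empty - one plausible
        // reason a store(_dataBuffer.iterator()) call "does not work".
        List<String> snapshot = new ArrayList<>(dataBuffer);
        dataBuffer.clear();
        store(snapshot.iterator());

        System.out.println(storedBlocks.get(0).size()); // prints 2
    }
}
```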
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p21180.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Subject: Re: Low Level Kafka Consumer for Spark
>>
>> Dibyendu,
>>
>> Just to make sure I will not be misunderstood - my concerns refer to the
>> upcoming Spark solution, not yours. I would like to gather the perspective
>> of someone who implemented recovery with Kafka a different way.
> ...to improve the reliable Kafka receiver like what you mentioned is
> on our schedule.
>
> Thanks
> Jerry
>
> -----Original Message-----
> From: RodrigoB [mailto:rodrigo.boav...@aspect.com]
> Sent: Wednesday, December 3, 2014 5:44 AM
> To: u...@spark.incubator.apache.org
http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p20196.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Comments will be greatly appreciated.

Tnks,
Rod

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p20181.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> ...messages are assigned to a
>> single RDD, instead of arbitrarily repartitioning the messages across
>> different RDDs.
>>
>> Does your Receiver guarantee this behavior, until the problem is fixed in
>> Spark 1.2?
>>
>> Regards,
>> Alon
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p14233.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>> ...-checkpoint-recovery-causes-IO-re-execution-td12568.html#a13205
>>>>
>>>> Re-computations do occur, but the only RDDs that are recovered are the
>>>> ones from the data checkpoint. This is what we've seen. It is not enough
>>>> by...
>>>
>>> ...the computation lineage is checkpointed, but if we have big chunks of
>>> data being consumed to the Receiver node on, let's say, a per-second
>>> basis, then having it persisted to HDFS every second could be a big
>>> challenge for keeping JVM performance - maybe that could be the reason
>>> why it...
>>
>> ...state-consistent recovery feels to me like another big issue to address.
>>
>> I plan on having a dive into the Streaming code and try to at least
>> contribute with some ideas. Some more insight from anyone on the dev team
>> will be very appreciated.
>>
>> tnks,
>> Rod
>>> Dib
>>>> On Aug 28, 2014 6:45 AM, "RodrigoB" wrote:
>>>>
>>>>> Dibyendu,
>>>>>
>>>>> Tnks for getting back.
>>>>>
>>>>> I believe you are absolutely right. We were under the assumpt...
>>>>
>>>> ...tests. This applies to Kafka as well.
>>>>
>>>> The issue is of major priority, fortunately.
>>>>
>>>> Regarding your suggestion, I would maybe prefer to have the problem
>>>> resolved within Spark's inter...
I'm no expert. But as I understand, yes, you create multiple streams to
consume multiple partitions in parallel. If they're all in the same
Kafka consumer group, you'll get exactly one copy of the message, so
yes, if you have 10 consumers and 3 Kafka partitions, I believe only 3
will be getting messages.
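The ownership rule described above - each partition is consumed by exactly one member of the group, so extra consumers sit idle - can be sketched with plain-Java bookkeeping. The round-robin assignment below is illustrative only, not Kafka's actual rebalance algorithm:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative consumer-group assignment: one owner per partition, so with
// more consumers than partitions the surplus consumers receive nothing.
public class GroupAssignment {
    // Returns, for each consumer index, the list of partitions it owns.
    static List<List<Integer>> assign(int consumers, int partitions) {
        List<List<Integer>> owned = new ArrayList<>();
        for (int c = 0; c < consumers; c++) owned.add(new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            owned.get(p % consumers).add(p); // exactly one owner per partition
        }
        return owned;
    }

    public static void main(String[] args) {
        List<List<Integer>> owned = assign(10, 3);
        long active = owned.stream().filter(l -> !l.isEmpty()).count();
        // 10 consumers, 3 partitions: only 3 consumers get messages.
        System.out.println(active); // prints 3
    }
}
```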
>> partitionSize=30
>>
>> JavaDStream<String> lines = newMessages.map(new
>> Function<Tuple2<...>, String>() {
>> ...
>> public String call(Tuple2<...> tuple2) {
>> ret...
>>
>> JavaDStream<String> words = lines.flatMap(new
>> MetricsComputeFunction()
>> );
>>
>> JavaPairDStream<String, ...> wordCounts = words.mapToPair(
>> new PairFunction<...>() {
>> ...
>> new Function<...,
>> Void>() {...});
>
> Thanks,
> Bharat
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p13131.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> ...affected by this issue. If, for example, there
>> is a big amount of batches to be recomputed, I would rather have them done
>> distributed than overloading the batch interval with a huge amount of
>> Kafka messages.
>>
>> I do not have yet enough know-how on where the issue is and about the
>> internal Spark code, so I can't really say how difficult the
>> implementation will be.
>>
>> tnks,
>> Rod

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p12966.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>> ...way to specify multiple worker processes (on different machines) to
>>>> read from Kafka? Maybe one worker process for each partition?
>>>>
>>>> If there is no such option, what happens when the single machine hosting
>>>> the "Kafka Reader" worker process dies and is replaced by a different
>>>> machine (like in the cloud)?
>>>>
>>>> Thanks,
>>>> Bharat
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p12788.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> ...checkpoints are done every
> batch interval.
>
> Was it on purpose to solely depend on the Kafka commit to recover data and
> recomputations between data checkpoints? If so, how to make this work?
>
> tnks
> Rod
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p12757.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>> ...streams/receivers, adding a Java API for receivers was something we
>>> did specifically to allow this :)
>>>
>>> - Patrick
>>>
>>> On Sat, Aug 2, 2014 at 10:09 AM, Dibyendu Bhattacharya <
>>> dibyendu.bhattach...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I have implemented a Low Level Kafka Consumer for Spark Streaming using
>>> the Kafka Simple Consumer API. This API will give better control over
>>> Kafka offset management and recovery from failures. As...
...simple consumer and
managing the offsets explicitly. I think this will be a great addition to
Spark Streaming, but curious what others think.

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p11281.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi,

I have implemented a Low Level Kafka Consumer for Spark Streaming using the
Kafka Simple Consumer API. This API gives better control over Kafka offset
management and recovery from failures. As the present Spark KafkaUtils uses
the high-level Kafka Consumer API, I wanted to have a better...
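The "explicit offset management" idea can be sketched without any Kafka client at all: commit an offset only after the message is safely stored, so a restart resumes from the last commit and replays at most the uncommitted tail, giving at-least-once delivery instead of silent loss under auto-commit. The class and method names below are hypothetical, stdlib-only stand-ins, not the receiver's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative offset bookkeeping: store first, commit second. On failure
// the consumer restarts from committedOffset and replays the gap, producing
// duplicates rather than losing data (at-least-once delivery).
public class OffsetRecovery {
    long committedOffset = 0;               // durable "checkpoint"
    final List<String> stored = new ArrayList<>();

    // Consume from the last committed offset; optionally crash after storing
    // the final message but before committing it, simulating a failure.
    void run(List<String> log, boolean crashBeforeLastCommit) {
        for (long off = committedOffset; off < log.size(); off++) {
            stored.add(log.get((int) off));             // store first...
            if (crashBeforeLastCommit && off == log.size() - 1) return;
            committedOffset = off + 1;                  // ...then commit
        }
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c");
        OffsetRecovery r = new OffsetRecovery();
        r.run(log, true);    // crashes after storing "c", before committing it
        r.run(log, false);   // restart: replays "c" from committed offset 2
        System.out.println(r.stored);          // prints [a, b, c, c]
        System.out.println(r.committedOffset); // prints 3
    }
}
```

The duplicate "c" is the price of at-least-once semantics; deduplication downstream (or idempotent writes) turns this into effectively-once processing.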