Thanks, Mich, for your reply.
I agree that it is not very scalable or efficient, but it works correctly
for Kafka transactions, and committing offsets to Kafka asynchronously has
not caused any problems so far.
Let me give you some more details about my streaming job.
The CustomReceiver does not receive anything.
Interesting
My concern is the infinite loop in foreachRDD: the while(true) loop within
foreachRDD creates an infinite loop within each Spark executor. This might
not be the most efficient approach, especially since offsets are committed
asynchronously.
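To illustrate the concern in plain Java (an analogy in standard-library code, not Spark code): a task body containing a bare while(true) never returns, so whatever framework scheduled it can never mark the work as finished. A loop gated on a shutdown flag, by contrast, lets the task complete cleanly:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedLoopSketch {
    public static void main(String[] args) throws Exception {
        AtomicBoolean running = new AtomicBoolean(true);
        AtomicInteger processed = new AtomicInteger();

        ExecutorService pool = Executors.newSingleThreadExecutor();
        // The loop checks a flag instead of spinning on while(true),
        // so the submitted task can actually terminate.
        pool.submit(() -> {
            while (running.get()) {
                processed.incrementAndGet();  // stand-in for record processing
            }
        });

        Thread.sleep(100);   // let the worker run briefly
        running.set(false);  // signal shutdown; a while(true) loop has no such exit
        pool.shutdown();
        boolean finished = pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("worker finished: " + finished);
        System.out.println("processed > 0: " + (processed.get() > 0));
    }
}
```

In the Spark case the same reasoning applies inside each executor: a batch whose foreachRDD body never returns never completes.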
HTH
Mich Talebzadeh,
Technologist |
Because Spark Streaming does not handle Kafka transactions correctly enough
to suit my needs, I moved to another approach using a raw Kafka consumer,
which handles read_committed messages from Kafka correctly.
My code looks like the following:
JavaDStream stream = ssc.receiverStream(new CustomReceiver());
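For reference, the raw-consumer approach described above would typically be configured with isolation.level=read_committed and manual offset commits. A minimal sketch (the broker address, group id, and topic name are placeholders I am assuming, and the kafka-clients calls are shown in comments since they need a running broker):

```java
import java.util.Properties;

public class ReadCommittedConsumerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("group.id", "my-streaming-group");       // placeholder group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Only deliver messages from committed transactions:
        props.put("isolation.level", "read_committed");
        // Commit offsets manually (e.g. commitAsync) after processing:
        props.put("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println(props.getProperty("isolation.level"));
        // With kafka-clients on the classpath, the consumer loop would be roughly:
        //   KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        //   consumer.subscribe(Collections.singletonList("my-topic"));
        //   while (isRunning) {
        //       ConsumerRecords<String, String> records =
        //               consumer.poll(Duration.ofMillis(500));
        //       ... process records, e.g. hand them to the CustomReceiver ...
        //       consumer.commitAsync();
        //   }
    }
}
```

The key point is that read_committed filters out records from aborted or still-open transactions, which is the behavior the custom-receiver approach relies on.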
Thank you, Mich, for your reply.
Actually, I tried most of your advice.
When spark.streaming.kafka.allowNonConsecutiveOffsets=false, I got the
following error.
Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most
recent failure: Lost task 0.0 in stage 1.0 (TID 3)
Hi Kidong,
There may be a few potential reasons why the message counts from your Kafka
producer and your Spark Streaming consumer might not match, especially with
transactional messages and the read_committed isolation level.
1) Just ensure that both your Spark Streaming job and the Kafka consumer
written
Hi,
I have a Kafka producer which sends messages transactionally to Kafka, and
a Spark Streaming job which should consume the read_committed messages from
Kafka. But there is a problem with Spark Streaming consuming the
read_committed messages.
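For context, a transactional producer of the kind described here is configured with a transactional.id and idempotence enabled. A minimal sketch (the broker address and transactional id are placeholders I am assuming; the kafka-clients send path is shown in comments since it needs a running broker):

```java
import java.util.Properties;

public class TransactionalProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Required for transactions; must be unique per producer instance:
        props.put("transactional.id", "my-tx-producer");   // placeholder id
        props.put("enable.idempotence", "true");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("transactional.id"));
        // With kafka-clients available, the transactional send would be roughly:
        //   KafkaProducer<String, String> producer = new KafkaProducer<>(build());
        //   producer.initTransactions();
        //   producer.beginTransaction();
        //   producer.send(new ProducerRecord<>("my-topic", key, value));
        //   producer.commitTransaction();  // or abortTransaction() on failure
    }
}
```

Only messages from transactions that reach commitTransaction() become visible to a read_committed consumer, which is why the producer-side and consumer-side counts should match when everything works.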
The count of messages sent by kafka producer transactionally is