Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-14 Thread Kidong Lee
Thanks, Mich for your reply. I agree, it is not so scalable and efficient. But it works correctly for kafka transaction, and there is no problem with committing offset to kafka async for now. I try to tell you some more details about my streaming job. CustomReceiver does not receive anything

Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-14 Thread Mich Talebzadeh
Interesting My concern is infinite Loop in* foreachRDD*: The *while(true)* loop within foreachRDD creates an infinite loop within each Spark executor. This might not be the most efficient approach, especially since offsets are committed asynchronously.? HTH Mich Talebzadeh, Technologist |

Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-14 Thread Kidong Lee
Because spark streaming for kafk transaction does not work correctly to suit my need, I moved to another approach using raw kafka consumer which handles read_committed messages from kafka correctly. My codes look like the following. JavaDStream stream = ssc.receiverStream(new CustomReceiver());

Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-13 Thread Kidong Lee
Thank you Mich for your reply. Actually, I tried to do most of your advice. When spark.streaming.kafka.allowNonConsecutiveOffsets=false, I got the following error. Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 3)

Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-13 Thread Mich Talebzadeh
Hi Kidong, There may be few potential reasons why the message counts from your Kafka producer and Spark Streaming consumer might not match, especially with transactional messages and read_committed isolation level. 1) Just ensure that both your Spark Streaming job and the Kafka consumer written

Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-12 Thread Kidong Lee
Hi, I have a kafka producer which sends messages transactionally to kafka and spark streaming job which should consume read_committed messages from kafka. But there is a problem for spark streaming to consume read_committed messages. The count of messages sent by kafka producer transactionally is