If you’re using the WAL with Kafka, Spark Streaming will ignore this 
configuration (autocommit.enable) and explicitly call commitOffset to update 
the offset in Kafka AFTER the WAL write is done.
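
As a rough illustration of that ordering, here is a toy model. It is not
Spark's receiver code, and the class WalThenCommit, its fields, and the
crashBeforeCommit flag are hypothetical names. Each block is written to an
in-memory stand-in for the WAL first, and the committed offset is advanced
only afterwards. If a crash happens between the two steps, the block is read
again on restart, so data is duplicated rather than lost:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical names, not Spark or Kafka APIs.
final class WalThenCommit {
    final List<String> wal = new ArrayList<>(); // stands in for the write-ahead log
    long committedOffset = 0;                   // stands in for the offset kept in ZooKeeper

    // Persist a block first; advance the committed offset only after the write.
    void storeBlock(List<String> source, int count, boolean crashBeforeCommit) {
        int from = (int) committedOffset;
        int to = Math.min(from + count, source.size());
        wal.addAll(source.subList(from, to));   // step 1: write to the WAL
        if (crashBeforeCommit) {
            return;                             // simulated failure: offset not advanced
        }
        committedOffset = to;                   // step 2: commit the offset
    }

    public static void main(String[] args) {
        List<String> topic = Arrays.asList("m0", "m1", "m2", "m3");
        WalThenCommit receiver = new WalThenCommit();
        receiver.storeBlock(topic, 2, true);    // crash after WAL write, before commit
        receiver.storeBlock(topic, 2, false);   // restart: reads again from offset 0
        receiver.storeBlock(topic, 2, false);
        System.out.println(receiver.wal + " committed=" + receiver.committedOffset);
    }
}
```

In this toy run, m0 and m1 appear twice in the log and no message is missing,
which is the at-least-once behaviour you would expect from committing after
the write.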

No matter what you set for autocommit.enable, Spark Streaming will internally 
set it to false to turn off the autocommit mechanism.
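
A minimal sketch of what that override amounts to is below. It is not the
actual receiver source. The class EffectiveConsumerConfig and the method
effectiveProps are hypothetical names, and I am assuming the key is spelled
"auto.commit.enable", which is how the Kafka 0.8 high-level consumer
configuration names it:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative only: a hypothetical helper, not Spark's actual receiver code.
final class EffectiveConsumerConfig {
    // Assumed key: the Kafka 0.8 high-level consumer spells it "auto.commit.enable".
    static final String AUTO_COMMIT = "auto.commit.enable";

    // Copy the user's parameters, then force auto-commit off whatever was passed in.
    static Properties effectiveProps(Map<String, String> kafkaParams) {
        Properties props = new Properties();
        for (Map.Entry<String, String> entry : kafkaParams.entrySet()) {
            props.setProperty(entry.getKey(), entry.getValue());
        }
        props.setProperty(AUTO_COMMIT, "false");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("group.id", "testgroup");
        kafkaParams.put(AUTO_COMMIT, "true");   // the user asks for auto-commit
        // The effective value is "false" regardless of the user's setting.
        System.out.println(effectiveProps(kafkaParams).getProperty(AUTO_COMMIT));
    }
}
```

The other parameters pass through unchanged; only the auto-commit flag is
overwritten in this sketch.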

Thanks
Jerry

From: Shushant Arora [mailto:shushantaror...@gmail.com]
Sent: Monday, July 6, 2015 8:11 PM
To: user
Subject: kafka offset commit in spark streaming 1.2

In Spark Streaming 1.2, are the offsets of consumed Kafka messages updated in 
ZooKeeper only after the write to the WAL, if the WAL and checkpointing are 
enabled, or does it depend on the kafkaParams passed when initializing the 
Kafka DStream?


Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("zookeeper.connect", "ip:2181");
kafkaParams.put("group.id", "testgroup");
kafkaParams.put("zookeeper.session.timeout.ms", "10000");
kafkaParams.put("autocommit.enable", "true");
kafkaParams.put("zookeeper.sync.time.ms", "250");

kafkaStreams.add(KafkaUtils.createStream(jssc, byte[].class, byte[].class,
        kafka.serializer.DefaultDecoder.class,
        kafka.serializer.DefaultDecoder.class,
        kafkaParams, topicsMap, StorageLevel.MEMORY_ONLY()));


Here, since I have set autocommit.enable to true, will Spark Streaming ignore 
this and always call an explicit commitOffset on the high-level consumer 
connector, or does it depend on the parameter passed?

If it depends on the parameter, and the receiver calls an explicit commit only 
when autocommit is false, then I should override the default autocommit from 
true to false when enabling the WAL, since it may produce duplicates on 
failure if the WAL is enabled and autocommit is true.
