Hi everyone!
I'm stuck on duplicate tuples.
I have a kafka-spout with 4 workers for 4 partitions. Some tuples are
duplicated: I see them in the database two or three times. The duplicates do
not appear immediately, but after a delay, usually about 30 seconds and
sometimes a few minutes. I can't figure out what is happening; everything looks
OK in the worker log files.
My topology config:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("event", kafkaSpout, 4).setNumTasks(4);
builder.setBolt("events", new EventsBatchBolt(topologyConfig), 1)
        .shuffleGrouping("event")
        .setNumTasks(1);
builder.setBolt("properties", new EventPropertiesBolt(topologyConfig), 1)
        .fieldsGrouping("vertica", new Fields("event_id", "fieldname",
                "repr_type", "repr_value"))
        .setNumTasks(1);
Storm config:
Config config = new Config();
config.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 60);
config.put(Config.TOPOLOGY_ACKER_EXECUTORS, 20);
config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2);
config.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);
config.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);
config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 131072);
config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 131072);
config.setNumAckers(2);
config.setDebug(false);
My guess is that I am doing something wrong in the topology config, or that
something is wrong with storm-kafka-plus 0.4.
Any ideas why I get duplicates?
Thanks
— With best regards, Irek Khasyanov