Marco,

Our first bolt emits a summarized record of the information we receive from 
the spouts. It is time based: every 30 seconds we emit one record that 
summarizes all the records we received from the spouts. We don't re-emit the 
source records that we received from the spouts; they are persisted on cold 
path storage, though, so we can access them offline for detailed analysis.
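
To make the time-based summarizing idea concrete, here is a minimal sketch in 
plain Java with no Storm dependency. The class name WindowSummarizer, the 
count/sum/mean summary, and the method names are all hypothetical and are not 
taken from the topology described above; what a real summary contains will 
depend on the data. In a bolt, add() would presumably be called for each 
incoming tuple and flush() every 30 seconds, for example when a tick tuple 
arrives.

```java
// Hypothetical helper: accumulates values for one time window and
// produces a single summary record when the window closes.
final class WindowSummarizer {
    private long count = 0;
    private double sum = 0.0;

    // Call once per incoming record (e.g. from a bolt's execute method).
    void add(double value) {
        count++;
        sum += value;
    }

    // Call when the window closes (e.g. every 30 seconds). Returns the
    // summary for the window and resets the state for the next one.
    Summary flush() {
        Summary summary = new Summary(count, sum);
        count = 0;
        sum = 0.0;
        return summary;
    }

    // Immutable summary record; only this is emitted downstream,
    // never the individual source records.
    static final class Summary {
        final long count;
        final double sum;

        Summary(long count, double sum) {
            this.count = count;
            this.sum = sum;
        }

        // Mean of the window, or 0.0 for an empty window.
        double mean() {
            return count == 0 ? 0.0 : sum / count;
        }
    }
}
```

Because only one small summary object leaves the bolt per window, the volume 
of data sent downstream does not grow with the input rate.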

Is this similar to what you are trying to do?

Thx,
Mauro.

From: Marco Costantini [mailto:[email protected]]
Sent: Monday, November 20, 2017 1:01 AM
To: [email protected]
Subject: A Batching Bolt

Hello,
I need to group/batch tuples. I've seen an excellent tutorial which does this. 
It handles timeouts and batch size breaches. Great. However, there, all of the 
logic takes place in the final bolt. That means it does not have the problem of 
"emitting batched information".

Sadly for me, I want to create a distinct bolt in the middle of a topology for 
batching. This means I have to worry about emitting batches of information.

I tried it out, both with the batching done in the final bolt and with the 
batching done in a separate bolt. When it's done in the final bolt, all is 
well. When it's done in a separate bolt, performance suffers greatly. By this I 
mean the indexing rate of Elasticsearch drops (probably not a good measure of 
performance, I know). The batching method is the same in both cases.

Question: Is it bad to emit a Map or a List of objects? What are the best 
practices for batching in a distinct batching bolt?
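
For reference, the batching logic I mean (flush on a size breach or on a 
timeout) looks roughly like the sketch below. It is plain Java with no Storm 
dependency; the name TupleBatcher and its methods are hypothetical, and the 
wiring into a bolt is only assumed: add() would be called from execute(), 
onTick() from a periodic tick, and a non-empty result would be emitted as a 
single value holding the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: buffers items and releases them as a batch when
// either the size limit or the age limit is reached.
final class TupleBatcher<T> {
    private final int maxSize;
    private final long maxAgeMillis;
    private List<T> buffer = new ArrayList<>();
    private long firstArrivalMillis = -1;

    TupleBatcher(int maxSize, long maxAgeMillis) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Adds an item. Returns the full batch if the size limit was reached,
    // otherwise an empty list.
    List<T> add(T item, long nowMillis) {
        if (buffer.isEmpty()) {
            firstArrivalMillis = nowMillis; // age is measured from the oldest item
        }
        buffer.add(item);
        return buffer.size() >= maxSize ? drain() : Collections.<T>emptyList();
    }

    // Call periodically. Returns the pending batch if its oldest item has
    // waited at least maxAgeMillis, otherwise an empty list.
    List<T> onTick(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - firstArrivalMillis >= maxAgeMillis) {
            return drain();
        }
        return Collections.emptyList();
    }

    // Hands over the current buffer and starts a fresh one.
    private List<T> drain() {
        List<T> out = buffer;
        buffer = new ArrayList<>();
        firstArrivalMillis = -1;
        return out;
    }
}
```

Passing the clock value in as a parameter keeps the class easy to test; in the 
bolt it would simply be System.currentTimeMillis().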

Please and thank you,
Marco.
