What is the best-practice approach for sharing a collection across bolts? The 
collection will be used by many bolts, each of which performs a specific 
summarization or statistics calculation.
The objective is to retrieve the collection only once, instead of retrieving 
it separately for each bolt.

Should I just emit the collection from the intermediary bolt, or is there a 
better way, such as an internal cache?

The overall topology, using fieldsGrouping, is:
---
1) KafkaSpout
Receives the identifier (UUID) that drives the retrieval of a collection of 
retail transactions, for example List<Transaction>.

2) Bolt
Retrieves and emits (collector.emit) the collection of transactions that will 
be subject to multiple calculations. (Is this correct, or could it cause a 
memory issue as the number of bolts grows?)

3) Around 6 other bolts should use that same collection of transactions to 
execute different types of summarization and statistics calculations and write 
the metrics to Cassandra.
---
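
To make the "internal cache" option concrete, here is a minimal sketch in plain 
Java of what such a cache might look like. It is only an illustration under 
assumptions: TransactionCache, its loader function, and loadCount are 
hypothetical names, not part of Storm, and the loader stands in for whatever 
call actually retrieves the transactions for a UUID. Note also that a cache 
like this lives in one JVM, so it would only be shared by bolts running in the 
same worker process; bolts in other workers would each load the collection 
once for themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical per-JVM cache (not a Storm API): each collection is loaded
 * at most once per identifier and then shared read-only by all callers.
 */
final class TransactionCache<T> {
    private final Map<UUID, List<T>> cache = new ConcurrentHashMap<>();
    // Assumption: wraps the real retrieval of the transactions for a UUID.
    private final Function<UUID, List<T>> loader;
    // Counts how often the loader actually ran (for illustration only).
    private final AtomicInteger loads = new AtomicInteger();

    TransactionCache(Function<UUID, List<T>> loader) {
        this.loader = loader;
    }

    /** Returns the cached collection, invoking the loader only on a miss. */
    List<T> get(UUID id) {
        return cache.computeIfAbsent(id, key -> {
            loads.incrementAndGet();
            // Defensive copy wrapped as unmodifiable, so that one consumer
            // cannot change what the other consumers see.
            return Collections.unmodifiableList(new ArrayList<>(loader.apply(key)));
        });
    }

    /** Drops an entry once every calculation for this id has finished. */
    void evict(UUID id) {
        cache.remove(id);
    }

    int loadCount() {
        return loads.get();
    }
}
```

With something like this, the intermediary bolt would emit only the UUID, and 
each calculation bolt would call get(uuid) on a shared instance. Entries have 
to be evicted at some point, otherwise the cache itself becomes the memory 
issue.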

Thanks
IPVP
