Good morning, I have a Storm project that aggregates Entities into Groups. Each Group is considered "ready to process" if it hasn't received an Event for the past 10 minutes, which means I need to keep those Groups somewhere in the meantime. I planned two scenarios: use external storage (MySQL, Cassandra, Redis...), or use Storm itself as the storage (for example, a distributed HashMap held by some kind of bolt). Both solutions have their benefits and weak points:
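For the second scenario, a minimal sketch of the bookkeeping such a bolt could do (the Storm wiring itself, e.g. emitting on tick tuples, is omitted, and the class and method names here are my own invention, not part of any Storm API): the bolt records the last event time per group and, on each tick, drains the groups that have been idle for the full window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper a bolt could own: call markEvent() for every incoming
// tuple, and drainReady() whenever a tick tuple arrives.
class GroupIdleTracker {
    private final long idleMillis;                        // e.g. 10 minutes
    private final Map<String, Long> lastSeen = new HashMap<>();

    GroupIdleTracker(long idleMillis) {
        this.idleMillis = idleMillis;
    }

    // Record that this group received an event at the given time.
    void markEvent(String groupId, long nowMillis) {
        lastSeen.put(groupId, nowMillis);
    }

    // Return (and forget) every group that has been idle for at least
    // idleMillis, i.e. the groups that are now "ready to process".
    List<String> drainReady(long nowMillis) {
        List<String> ready = new ArrayList<>();
        lastSeen.entrySet().removeIf(e -> {
            if (nowMillis - e.getValue() >= idleMillis) {
                ready.add(e.getKey());
                return true;
            }
            return false;
        });
        return ready;
    }
}
```

The obvious weak point, as noted below, is that this state lives only in memory, so a worker restart loses it unless the spout can replay the tuples, which is exactly where the long timeout comes in.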
In the first case, we need to access an external system, which means we have to batch the tuples to minimize round trips, but the data is kept in a more or less reliable place (how reliable depends on the database). In the second case, we need to increase the tuple timeout (TTL) of the spout to make sure the group of tuples has enough time to be considered ready to process. On the other hand, we avoid the round trips and can scale just by sizing the Storm cluster.

That said, which solution is better? And, more importantly, is there any red line that shouldn't be crossed in terms of TTL if the second approach is to be applied? To give you an idea, I'm planning on a TTL of nearly 2 hours.

Regards,
Ivan Garcia Maya
