We have a topology whose throughput we're trying to push as high as possible. While profiling it, we found that we hold onto a lot of memory in our list of tuples prior to acking them. Most of this memory appears to come from holding onto the original message payload in its raw format (char[] in our case). Our topology performs online aggregation, so our internal tracking memory is typically quite small: we aggregate thousands of messages into a single bucket. However, keeping the list of all raw tuple payloads that went into an aggregation bucket for the length of our checkpointing interval can consume a significant amount of memory.
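A minimal, self-contained sketch of the retention pattern described above. It assumes a hypothetical `PendingTuple` type standing in for Storm's `Tuple` and a hypothetical `ack` callback standing in for the collector; no Storm API is used, and the names `AggregatingBucket`, `PendingTuple`, `bucketCount`, `retainedPayloadChars`, and `checkpoint` are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: an aggregating bolt that buffers tuples until a
// checkpoint so it can ack them afterwards. None of these names are Storm APIs.
class AggregatingBucket {

    // Hypothetical tuple stand-in: an id used for acking plus the raw payload.
    static final class PendingTuple {
        final long messageId;
        final String bucket;
        final char[] payload;

        PendingTuple(long messageId, String bucket, char[] payload) {
            this.messageId = messageId;
            this.bucket = bucket;
            this.payload = payload;
        }
    }

    // Aggregate state: one counter per bucket, small regardless of message count.
    private final Map<String, Long> counts = new HashMap<>();
    // Tuples awaiting ack; each reference keeps its payload reachable.
    private final List<PendingTuple> pending = new ArrayList<>();

    void execute(PendingTuple t) {
        counts.merge(t.bucket, 1L, Long::sum);
        pending.add(t);
    }

    int bucketCount() {
        return counts.size();
    }

    // Total payload chars still reachable through the pending list.
    long retainedPayloadChars() {
        long total = 0;
        for (PendingTuple t : pending) {
            total += t.payload.length;
        }
        return total;
    }

    // At checkpoint (persisting counts is omitted here): ack every buffered
    // tuple, then drop the references so the payloads become collectable.
    int checkpoint(Consumer<PendingTuple> ack) {
        int acked = pending.size();
        for (PendingTuple t : pending) {
            ack.accept(t);
        }
        pending.clear();
        counts.clear();
        return acked;
    }
}
```

In this model, `bucketCount()` stays at 1 for a single bucket while `retainedPayloadChars()` grows linearly with the number of buffered tuples until `checkpoint` runs, which is the memory profile described above.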
Is there a way to clear a tuple's Values() after it has been processed but before acking it? Our alternative is to try a different serialization format with a smaller payload. While that could reduce our footprint by a good factor, it would still have limits. Ideally, we could strip each tuple in the list down to only the message ID bits required for proper Storm message acking.

Any ideas? We are on version 0.9.0.1.

Thanks,

Mike

--
Mike Heffner <[email protected]>
Librato, Inc.
