We have a topology whose throughput we're trying to push as high as
possible. While profiling it, we found that we hold onto a lot of memory in
our list of tuples awaiting acknowledgment. Most of this memory appears to
come from the original message payload in its raw format (char[] in our
case). Our topology performs online aggregation, so our internal tracking
state is typically quite small, as we aggregate thousands of messages into a
single bucket. However, keeping every raw tuple payload that went into an
aggregation bucket for the duration of our checkpoint interval can consume a
significant amount of memory.

Is there a way to clear a tuple's Values() after it has been processed, but
before acking it? Our alternative is to try a different serialization format
that produces a smaller payload. While this could reduce our footprint by a
good factor, it would still have limits. Ideally we could strip the tuple
list down to only the message ID bits required for proper Storm message
acking.
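
To illustrate the behavior we are after, here is a minimal sketch in plain
Java. It is not our actual bolt code, and every type and method in it
(PendingTuple, AggregatingBuffer, clearPayload, checkpoint) is a hypothetical
stand-in rather than a Storm class or API. It assumes the payload could be
dropped once a message is folded into its bucket, which is exactly what we
do not know how to do with a real Storm tuple.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a received tuple; NOT Storm's Tuple class.
final class PendingTuple {
    final long messageId; // the only part needed to ack later
    char[] payload;       // raw message body, only needed while aggregating

    PendingTuple(long messageId, char[] payload) {
        this.messageId = messageId;
        this.payload = payload;
    }

    // Drop the payload reference so the raw body can be garbage collected.
    void clearPayload() {
        payload = null;
    }
}

// Hypothetical aggregator: small per-bucket state plus a list of unacked tuples.
final class AggregatingBuffer {
    private final Map<String, Long> buckets = new HashMap<String, Long>();
    private final List<PendingTuple> pending = new ArrayList<PendingTuple>();

    // Fold one message into its bucket, then keep only what acking needs.
    void process(PendingTuple tuple, String bucketKey, long value) {
        Long current = buckets.get(bucketKey);
        buckets.put(bucketKey, current == null ? value : current + value);
        tuple.clearPayload();
        pending.add(tuple);
    }

    // Current aggregate for a bucket, or 0 if the bucket does not exist.
    long bucketValue(String bucketKey) {
        Long current = buckets.get(bucketKey);
        return current == null ? 0L : current;
    }

    // Number of payload characters still reachable from the pending list.
    long retainedPayloadChars() {
        long total = 0;
        for (PendingTuple tuple : pending) {
            if (tuple.payload != null) {
                total += tuple.payload.length;
            }
        }
        return total;
    }

    // At checkpoint time: return the IDs to ack and reset all state.
    List<Long> checkpoint() {
        List<Long> toAck = new ArrayList<Long>();
        for (PendingTuple tuple : pending) {
            toAck.add(tuple.messageId);
        }
        pending.clear();
        buckets.clear();
        return toAck;
    }
}
```

With something like this, the pending list would cost one ID per message
between checkpoints instead of one full raw payload per message.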

Any ideas? We are on version 0.9.0.1.

Thanks,

Mike

-- 

  Mike Heffner <[email protected]>
  Librato, Inc.
