Jon,

We actually took that exact approach in our testing:

tuple.getValues().clear()

Good to hear that others have recognized this as a pain point and that there's room for improvement.

Cheers,

Mike

On Thu, Apr 17, 2014 at 8:16 PM, Jon Logan <[email protected]> wrote:

> I've run into a similar issue. There's been talk in the past about fixing
> this, but it hasn't been fixed. As a workaround, you can use reflection
> to get hold of the private "values" variable and just call clear() on it.
>
> https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/tuple/TupleImpl.java
>
> On Thu, Apr 17, 2014 at 11:00 AM, Mike Heffner <[email protected]> wrote:
>
>> We have a topology whose throughput we're trying to push as far as
>> possible. While profiling the topology, we found that we are holding onto
>> a lot of memory in our list of tuples prior to acking them. It appears
>> that most of this memory comes from holding onto the original message
>> payload in its raw format (char[] in our case). Our topology performs
>> online aggregation, so our internal tracking memory is typically quite
>> small, as we aggregate thousands of messages into a single bucket.
>> However, maintaining the list of all raw tuple payloads that went into
>> the aggregation bucket for the duration of our checkpointing interval can
>> use up a significant amount of memory.
>>
>> Is there a way to clear the tuple Values() after it has been processed,
>> but before acking it? Our alternative is to try a different serialization
>> format that produces a smaller payload. While this could reduce our
>> footprint by a good factor, it would still have limits. Ideally we could
>> strip the tuple list down to only the message ID bits required for proper
>> Storm message acking.
>>
>> Any ideas? We are on version 0.9.0.1.
>>
>> Thanks,
>>
>> Mike
>>
>> --
>>
>> Mike Heffner <[email protected]>
>> Librato, Inc.

--
Mike Heffner <[email protected]>
Librato, Inc.
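The two workarounds discussed in this thread (calling clear() on the list returned by getValues(), or reaching the private "values" field by reflection) can be sketched in plain Java. This is a minimal sketch, assuming a hypothetical `StandInTuple` class that mimics only the relevant part of Storm's `TupleImpl` (a private `List<Object>` field named `values`, exposed through `getValues()`) so the example compiles without Storm on the classpath. Clearing through the getter only helps if it returns the tuple's own mutable list, and the reflective variant depends on a private field name that may change between Storm versions.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClearTupleValues {

    // Hypothetical stand-in for Storm's TupleImpl: the payload lives in a
    // private field named "values", exposed through getValues().
    public static class StandInTuple {
        private final List<Object> values;
        private final long messageId;

        public StandInTuple(long messageId, List<Object> values) {
            this.messageId = messageId;
            this.values = values;
        }

        public List<Object> getValues() { return values; }

        public long getMessageId() { return messageId; }
    }

    // Workaround 1: clear through the public getter. This frees the payload
    // only if the getter returns the tuple's own mutable list.
    public static void clearViaGetter(StandInTuple tuple) {
        tuple.getValues().clear();
    }

    // Workaround 2: reach the private "values" field by reflection. The field
    // name is an implementation detail and may differ between versions.
    public static void clearViaReflection(Object tuple) {
        try {
            Field field = tuple.getClass().getDeclaredField("values");
            field.setAccessible(true);
            ((List<?>) field.get(tuple)).clear();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot clear tuple values", e);
        }
    }

    public static void main(String[] args) {
        StandInTuple a = new StandInTuple(1L,
                new ArrayList<Object>(Arrays.<Object>asList("raw payload".toCharArray(), 42)));
        StandInTuple b = new StandInTuple(2L,
                new ArrayList<Object>(Arrays.<Object>asList("raw payload".toCharArray(), 43)));

        clearViaGetter(a);
        clearViaReflection(b);

        // The payloads are gone, but the ids needed for acking remain.
        System.out.println(a.getMessageId() + " " + a.getValues()); // prints "1 []"
        System.out.println(b.getMessageId() + " " + b.getValues()); // prints "2 []"
    }
}
```

With a real Storm tuple you would pass the `Tuple` object itself; the reflective helper takes `Object` so it does not need to know the concrete class.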

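The aggregate-then-ack pattern described in the original question (fold each tuple into a bucket, hold the tuple until the next checkpoint, then ack it) can be combined with the clear() workaround so that only the small tuple shell stays on the pending list. The sketch below is a hedged illustration, not Storm code: `AggregateThenAck`, `PendingTuple`, the (key, count) payload layout, and the recorded ack ids are all hypothetical stand-ins. In a real bolt, the loop in `checkpoint()` would call `collector.ack(tuple)` on Storm's `OutputCollector` after the aggregates have been persisted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AggregateThenAck {

    // Hypothetical stand-in for a Storm tuple: a (key, count) payload plus
    // the id needed to ack it once the aggregate is checkpointed.
    public static class PendingTuple {
        private final List<Object> values;
        private final long messageId;

        public PendingTuple(long messageId, List<Object> values) {
            this.messageId = messageId;
            this.values = values;
        }

        public List<Object> getValues() { return values; }

        public long getMessageId() { return messageId; }
    }

    private final Map<String, Long> buckets = new HashMap<>();
    private final List<PendingTuple> pending = new ArrayList<>();
    private final List<Long> acked = new ArrayList<>();

    // Fold the tuple into its bucket, then drop the raw payload. The tuple
    // itself stays on the pending list because acking needs it later.
    public void execute(PendingTuple tuple) {
        String key = (String) tuple.getValues().get(0);
        long count = ((Number) tuple.getValues().get(1)).longValue();
        buckets.merge(key, count, Long::sum);
        tuple.getValues().clear();
        pending.add(tuple);
    }

    // On checkpoint, hand back the aggregates (persisting them is omitted)
    // and ack every tuple that contributed to them.
    public Map<String, Long> checkpoint() {
        Map<String, Long> snapshot = new HashMap<>(buckets);
        buckets.clear();
        for (PendingTuple tuple : pending) {
            acked.add(tuple.getMessageId()); // a real bolt would call collector.ack(tuple)
        }
        pending.clear();
        return snapshot;
    }

    public int pendingCount() { return pending.size(); }

    public List<Long> ackedIds() { return acked; }

    public static void main(String[] args) {
        AggregateThenAck bolt = new AggregateThenAck();
        bolt.execute(new PendingTuple(1L, new ArrayList<Object>(Arrays.<Object>asList("cpu", 3L))));
        bolt.execute(new PendingTuple(2L, new ArrayList<Object>(Arrays.<Object>asList("cpu", 4L))));
        System.out.println(bolt.checkpoint()); // prints "{cpu=7}"
        System.out.println(bolt.ackedIds());   // prints "[1, 2]"
    }
}
```

Keeping the tuple objects rather than copying out just their ids reflects that Storm's ack call takes the tuple itself; clearing the payload is what shrinks each pending entry.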