Hi Nikolaos,

Maybe try experimenting with max.spout.pending. A high max.spout.pending allows a large number of tuples to be in flight at once, so tuples can build up inside the workers. I would also check the capacity of each bolt in the Storm UI, find which one(s) are close to 1.0, add more executors for those, and see how things look then.
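To make the "capacity ~ 1" check concrete: the capacity number shown in the Storm UI is, roughly, the fraction of the measurement window (10 minutes by default) that a bolt spent executing tuples. This is a toy sketch of that calculation, not Storm code; the executed count and latency below are hypothetical numbers you would read off the UI.

```python
def capacity(executed, execute_latency_ms, window_ms=10 * 60 * 1000):
    """Approximate the Storm UI 'capacity' metric: fraction of the
    measurement window spent executing tuples. Near 1.0 means the
    bolt is saturated and likely needs more executors."""
    return executed * execute_latency_ms / window_ms

# Hypothetical values read off the Storm UI for one bolt:
# 1.2M tuples executed in the last 10 minutes at 0.45 ms each.
print(round(capacity(executed=1_200_000, execute_latency_ms=0.45), 2))
```

A bolt sitting at 0.9 like this one is close to saturation; adding executors (or reducing per-tuple work) is the usual fix.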
--John

On Wed, Jan 13, 2016 at 3:06 PM, Nikolaos Pavlakis <[email protected]> wrote:
> Hello,
>
> I am implementing a distributed algorithm for pagerank estimation using
> Storm. I have been having memory problems, so I decided to create a dummy
> implementation that does not explicitly save anything in memory, to
> determine whether the problem lies in my algorithm or my Storm structure.
>
> Indeed, while the only thing the dummy implementation does is
> message-passing (a lot of it), the memory of each worker process keeps
> rising until the pipeline is clogged. I do not understand why this might
> be happening.
>
> My cluster has 18 machines (some with 8g, some 16g and some 32g of
> memory). I have set the worker heap size to 6g (-Xmx6g).
>
> My topology is very simple:
> One spout.
> One bolt (with parallelism).
>
> The bolt receives data from the spout (fieldsGrouping) and also from
> other tasks of itself.
>
> My message-passing pattern is based on random walks with a certain
> stopping probability. More specifically:
> The spout generates a tuple.
> One specific task from the bolt receives this tuple.
> Based on a certain probability, this task generates another tuple and
> emits it again to another task of the same bolt.
>
> I have been stuck on this problem for quite a while, so it would be very
> helpful if someone could help.
>
> Best Regards,
> Nick
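One thing worth noting about the random-walk pattern described above: each spout tuple fans out into a whole walk of bolt-to-bolt tuples, so the number of in-flight tuples per spout emission is 1/p on average for a stopping probability p. If the spout emits faster than walks terminate, queues grow without bound, which looks exactly like a memory leak. A toy simulation (plain Python, not Storm code; the stopping probability of 0.1 is an assumed value):

```python
import random

def walk_length(stop_prob, rng):
    # Tuples generated by one spout emission: the original tuple,
    # plus one more bolt-to-bolt tuple per hop until the walk stops.
    hops = 1
    while rng.random() > stop_prob:
        hops += 1
    return hops

rng = random.Random(42)       # fixed seed for reproducibility
stop_prob = 0.1               # assumed stopping probability
n = 100_000                   # number of simulated spout tuples
avg = sum(walk_length(stop_prob, rng) for _ in range(n)) / n
print(avg)                    # expected value is 1 / stop_prob = 10
```

This is why bounding max.spout.pending matters here: it is the only backpressure limiting how many of these walks can be alive at once.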
