Hello,

I am implementing a distributed algorithm for PageRank estimation using
Storm. I have been having memory problems, so I created a dummy
implementation that does not explicitly keep anything in memory, to
determine whether the problem lies in my algorithm or in my Storm topology
structure.

Indeed, even though the only thing the dummy implementation does is
message passing (a lot of it), the memory usage of each worker process
keeps rising until the pipeline clogs. I do not understand why this is
happening.

My cluster has 18 machines (some with 8 GB, some with 16 GB and some with
32 GB of memory). I have set the worker heap size to 6 GB (-Xmx6g).

My topology is very simple:
One spout.
One bolt (with parallelism).

The bolt receives tuples from the spout (via fieldsGrouping) and also from
other tasks of the same bolt.
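In Storm terms, the wiring looks roughly like this (the component names,
the "node" field and the parallelism value are just placeholders, not my
actual code):

```java
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

// Hypothetical wiring: one spout, one bolt that also consumes its own output.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("walk-spout", new WalkSpout());
builder.setBolt("walk-bolt", new WalkBolt(), 16)         // parallelism hint: example value
       .fieldsGrouping("walk-spout", new Fields("node")) // tuples from the spout
       .fieldsGrouping("walk-bolt", new Fields("node")); // tuples from other tasks of itself
```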

My message-passing pattern is based on random walks with a certain stopping
probability. More specifically:
The spout generates a tuple.
One specific task of the bolt receives this tuple.
With a certain probability, this task generates another tuple and emits it
to another task of the same bolt.
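For reference, the walk length is geometrically distributed: with continue
probability p, each spout tuple leads to 1/(1-p) messages on average, which
gives a rough estimate of the in-flight tuple volume. A quick standalone
simulation of this (plain Java, no Storm; p = 0.85 is just an example
value, not my actual parameter):

```java
import java.util.Random;

public class WalkLengthEstimate {
    // Simulate one random walk: keep emitting with probability p, stop otherwise.
    static int walkLength(Random rng, double p) {
        int hops = 1; // the initial tuple from the spout
        while (rng.nextDouble() < p) {
            hops++;   // one more bolt-to-bolt emit
        }
        return hops;
    }

    public static void main(String[] args) {
        double p = 0.85;           // continue probability (example value)
        Random rng = new Random(42);
        int walks = 100_000;
        long total = 0;
        for (int i = 0; i < walks; i++) {
            total += walkLength(rng, p);
        }
        double avg = (double) total / walks;
        // Mean of a geometric walk is 1 / (1 - p), about 6.67 for p = 0.85.
        System.out.printf("average walk length: %.2f (expected %.2f)%n",
                          avg, 1.0 / (1.0 - p));
    }
}
```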


I have been stuck on this problem for quite a while, so it would be very
helpful if someone could help.

Best Regards,
Nick
