Hello, I am implementing a distributed algorithm for PageRank estimation using Storm. I have been having memory problems, so I built a dummy implementation that does not explicitly keep anything in memory, to determine whether the problem lies in my algorithm or in my Storm topology structure.
Indeed, even though the only thing the dummy implementation does is message passing (a lot of it), the memory of each worker process keeps rising until the pipeline clogs. I do not understand why this might be happening.

My cluster has 18 machines (some with 8 GB, some with 16 GB, and some with 32 GB of memory), and I have set the worker heap size to 6 GB (-Xmx6g). My topology is very simple:

- one spout
- one bolt (with parallelism)

The bolt receives data from the spout (fieldsGrouping) and also from other tasks of itself. My message-passing pattern is based on random walks with a certain stopping probability. More specifically:

- The spout generates a tuple.
- One specific task of the bolt receives this tuple.
- With a certain probability, this task generates another tuple and emits it to another task of the same bolt.

I have been stuck on this problem for quite a while, so any help would be greatly appreciated.

Best Regards,
Nick
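P.S. To make the message-passing pattern concrete, here is a small, self-contained simulation of it (in plain Python, outside Storm). The function names and the stopping probability of 0.15 are made up for illustration; the point is just that each spout tuple triggers a geometrically distributed number of extra bolt-to-bolt emits, so the in-flight message volume per spout tuple averages (1 - p) / p.

```python
import random

def walk_length(stop_prob, rng):
    """Number of hops a single random walk takes before stopping.

    Each hop, the walk continues with probability (1 - stop_prob);
    this mirrors the bolt re-emitting the tuple to another bolt task.
    """
    hops = 0
    while rng.random() >= stop_prob:
        hops += 1
    return hops

def expected_messages_per_spout_tuple(stop_prob, trials=100_000, seed=42):
    """Monte Carlo estimate of the average number of extra bolt emits
    generated by one spout tuple."""
    rng = random.Random(seed)
    total = sum(walk_length(stop_prob, rng) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    p = 0.15  # hypothetical stopping probability, for illustration only
    print(expected_messages_per_spout_tuple(p))  # theory: (1 - p) / p ~= 5.67
```

So even in the dummy version, every spout tuple fans out into several bolt-to-bolt tuples, which is why I call it "a lot" of message passing.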
