+1 to Andrew. Profiling with jvisualvm (or any other JVM profiler) is definitely worth doing if you have not done so already.
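
If attaching a profiler to the cluster workers is awkward, a cheap first step might be to log heap usage from inside the worker JVM and watch how it grows. This is only a minimal sketch using the standard java.lang.management API. The class name HeapLogger and the one-second sampling interval are my own placeholders, not anything from your topology, and it does not replace a real profiler since it only reports totals, not what is holding the memory.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: samples the heap usage of the JVM it runs in.
public class HeapLogger {
    // Returns the currently used heap of this JVM, in megabytes.
    static long usedHeapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / (1024 * 1024);
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples, one per second (interval chosen arbitrarily).
        for (int i = 0; i < 5; i++) {
            System.out.println("used heap (MB): " + usedHeapMb());
            Thread.sleep(1000);
        }
    }
}
```

In a topology you would presumably call something like usedHeapMb() periodically from a bolt and send the value to your logger rather than run main.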
On Wed, Jan 13, 2016 at 3:30 PM, Andrew Xor <[email protected]> wrote:

> Hey,
>
> Care to give version of storm/jvm? Does this happen on cluster execution
> only or when also running the topology in local mode? Unfortunately,
> probably the best way to find what's really going on is to profile your
> topology... if you can run the topology locally this will make things quite
> a bit easier as profiling storm topologies on a live cluster can be quite
> time consuming.
>
> Regards.
>
> On Wed, Jan 13, 2016 at 10:06 PM, Nikolaos Pavlakis <
> [email protected]> wrote:
>
>> Hello,
>>
>> I am implementing a distributed algorithm for pagerank estimation using
>> Storm. I have been having memory problems, so I decided to create a dummy
>> implementation that does not explicitly save anything in memory, to
>> determine whether the problem lies in my algorithm or my Storm structure.
>>
>> Indeed, while the only thing the dummy implementation does is
>> message-passing (a lot of it), the memory of each worker process keeps
>> rising until the pipeline is clogged. I do not understand why this might be
>> happening.
>>
>> My cluster has 18 machines (some with 8g, some 16g and some 32g of
>> memory). I have set the worker heap size to 6g (-Xmx6g).
>>
>> My topology is very very simple:
>> One spout
>> One bolt (with parallelism).
>>
>> The bolt receives data from the spout (fieldsGrouping) and also from
>> other tasks of itself.
>>
>> My message-passing pattern is based on random walks with a certain
>> stopping probability. More specifically:
>> The spout generates a tuple.
>> One specific task from the bolt receives this tuple.
>> Based on a certain probability, this task generates another tuple and
>> emits it again to another task of the same bolt.
>>
>>
>> I am stuck at this problem for quite a while, so it would be very helpful
>> if someone could help.
>>
>> Best Regards,
>> Nick
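
One thing that may be worth checking alongside the profiling is how many bolt-to-bolt tuples each spout tuple fans out into under the random-walk pattern described above. The sketch below is plain Java with no Storm dependency. It assumes every hop stops independently with the same probability, and the value 0.15 is a placeholder of mine, not a number from this thread. Under that assumption the walk length is geometric, so the average should come out near 1 / stopProbability.

```java
import java.util.Random;

// Hypothetical simulation of the random-walk message pattern (not Storm code).
public class RandomWalkSketch {
    // Number of tuples produced by one walk, counting the spout's initial tuple.
    static long walkLength(Random rng, double stopProbability) {
        long tuples = 1; // the tuple emitted by the spout
        // Each receiving task forwards a new tuple unless the walk stops here.
        while (rng.nextDouble() >= stopProbability) {
            tuples++;
        }
        return tuples;
    }

    // Average number of tuples per spout tuple over many simulated walks.
    static double averageTuplesPerWalk(long walks, double stopProbability, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (long i = 0; i < walks; i++) {
            total += walkLength(rng, stopProbability);
        }
        return (double) total / walks;
    }

    public static void main(String[] args) {
        double stopProbability = 0.15; // assumed value, replace with your own
        double avg = averageTuplesPerWalk(1_000_000, stopProbability, 42L);
        System.out.printf("avg tuples per spout tuple: %.2f (theory: %.2f)%n",
                avg, 1.0 / stopProbability);
    }
}
```

If the spout emits faster than the bolt tasks can work through that multiplied load, tuples could pile up in the worker queues, which would look like steadily rising memory. That is only a guess on my part, and the profiler should confirm or rule it out.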
