Well... detecting memory leaks in Java is a bit tricky, as Java does a lot for you. Generally, though, as long as you avoid using the "new" operator excessively and close any resources you no longer use, you should be fine... but a profiler such as the ones mentioned by Nathan will tell you the whole truth. YourKit is awesome and has a free trial; go ahead and test-drive it. I am pretty sure that you need a working jar (or compilable code with a main function in it) in order to profile it, although profiling your bolts and spouts is a bit trickier. Hopefully your algorithm (or portions of it) can be put into a sample test program that can be executed locally for you to profile.
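To make the "close your resources" point concrete, here is a minimal, hypothetical sketch (the class, method, and file names are made up for illustration). The first method leaks the file handle whenever readLine() throws; the second uses try-with-resources so the reader is closed on every exit path:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.File;
import java.io.IOException;

public class ResourceExample {

    // Leak-prone: if readLine() throws, close() is never reached and the
    // underlying file handle (plus its buffers) stays open.
    static String firstLineLeaky(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        String line = reader.readLine();
        reader.close(); // skipped entirely when an exception is thrown above
        return line;
    }

    // Safe: try-with-resources closes the reader on both the normal
    // and the exceptional path.
    static String firstLineSafe(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Small self-contained demo against a temp file.
        File tmp = File.createTempFile("demo", ".txt");
        tmp.deleteOnExit();
        try (FileWriter w = new FileWriter(tmp)) {
            w.write("hello\nworld\n");
        }
        System.out.println(firstLineSafe(tmp.getPath()));
    }
}
```

In a long-running Storm worker, leaks like the first variant accumulate until the heap fills and GC starts thrashing, which is exactly the symptom in this thread.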
Hope this helped.

Regards,
A.

On Thu, Mar 5, 2015 at 8:33 PM, Sa Li <[email protected]> wrote:
>
> On Thu, Mar 5, 2015 at 10:26 AM, Andrew Xor <[email protected]> wrote:
>
>> Unfortunately that is not fixed; it depends on the computations and
>> data structures you have. In my case, for example, I use more than 2 GB
>> since I need to keep a large matrix in memory. Having said that, in most
>> cases it should be relatively easy to estimate how much memory you are
>> going to need and use that... or, if that's not possible, you can just
>> increase it and try the "set and see" approach. Check for memory leaks
>> as well (unclosed resources and so on!).
>>
>> Regards,
>> A.
>>
>> On Thu, Mar 5, 2015 at 8:21 PM, Sa Li <[email protected]> wrote:
>>
>>> Thanks, Nathan. How much should it be in general?
>>>
>>> On Thu, Mar 5, 2015 at 10:15 AM, Nathan Leung <[email protected]> wrote:
>>>
>>>> Your worker is allocated a maximum of 768 MB of heap. It's quite
>>>> possible that this is not enough. Try increasing Xmx in worker.childopts.
>>>>
>>>> On Mar 5, 2015 1:10 PM, "Sa Li" <[email protected]> wrote:
>>>>
>>>>> Hi, all,
>>>>>
>>>>> I have been running a Trident topology on a production server; the
>>>>> code is like this:
>>>>>
>>>>> topology.newStream("spoutInit", kafkaSpout)
>>>>>         .each(new Fields("str"),
>>>>>               new JsonObjectParse(),
>>>>>               new Fields("eventType", "event"))
>>>>>         .parallelismHint(pHint)
>>>>>         .groupBy(new Fields("event"))
>>>>>         .persistentAggregate(PostgresqlState.newFactory(config),
>>>>>               new Fields("eventType"), new EventUpdater(),
>>>>>               new Fields("eventWord"));
>>>>>
>>>>> Config conf = new Config();
>>>>> conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
>>>>>
>>>>> Basically, it does something simple: get data from Kafka, parse it
>>>>> into different fields, and write it into a Postgres DB. But in the
>>>>> Storm UI I do see this error: "java.lang.OutOfMemoryError: GC overhead
>>>>> limit exceeded". It always happens in the same worker of each
>>>>> node - 6703. I understand this is because, by default, the JVM is
>>>>> configured to throw this error if you are spending more than *98% of
>>>>> the total time in GC and after the GC less than 2% of the heap is
>>>>> recovered*.
>>>>>
>>>>> I am not sure what the exact cause of the memory leak is; is it OK to
>>>>> simply increase the heap? Here is my storm.yaml:
>>>>>
>>>>> supervisor.slots.ports:
>>>>>     - 6700
>>>>>     - 6701
>>>>>     - 6702
>>>>>     - 6703
>>>>>
>>>>> nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
>>>>> ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
>>>>> supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
>>>>> worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
>>>>>
>>>>> Does anyone have similar issues, and what would be the best way to
>>>>> overcome this?
>>>>>
>>>>> Thanks in advance,
>>>>> AL
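Following Nathan's suggestion of increasing Xmx, one possible next step is to raise the worker heap in storm.yaml and, while you are at it, have the JVM write a heap dump if it still runs out, so a profiler like YourKit can inspect what filled the heap. A sketch only; the 2048m figure and the dump path are illustrative assumptions, not recommendations, so pick values that fit your machines:

```yaml
# Give each worker more heap; -Xmx2048m is an example value, not a recommendation.
# HeapDumpOnOutOfMemoryError writes an .hprof file (here to /tmp, adjust the path)
# that YourKit or Eclipse MAT can open to show what was holding the memory.
worker.childopts: "-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Djava.net.preferIPv4Stack=true"
```

Note that if the topology genuinely leaks (e.g. unclosed DB connections in the Postgres state updater), a bigger heap only delays the error; the heap dump is what tells you whether growth is unbounded.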
