Thank you very much for the reply. Here is the error I saw on the production server, in worker-6703.log:


On Thu, Mar 5, 2015 at 11:31 AM, Nathan Leung <[email protected]> wrote:

> Yeah, then in this case maybe you can install JDK / Yourkit in the remote
> machines and run the tools over X or something.  I'm assuming this is a
> development cluster (not live / production) and that installing debugging
> tools and running remote UIs etc is not a problem.  :)
>
> On Thu, Mar 5, 2015 at 1:52 PM, Andrew Xor <[email protected]>
> wrote:
>
>> Nathan, I think that if he wants to profile a bolt that runs in a worker
>> on a different cluster node than the one the profiling tool runs on, he
>> won't be able to attach to the process, since it resides on a different
>> physical machine (well, on second thought it can be done via remote
>> debugging, but that's a real pain).
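An alternative to running the tools over X is to expose a JMX port in the worker JVM and point jvisualvm or YourKit at it from your own machine. A minimal sketch of the worker line in storm.yaml, assuming a single worker per supervisor (the port 9010 is just an example, and disabling auth/SSL is only acceptable on a private development network):

```yaml
worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
```

Note that with several workers per supervisor a fixed JMX port will clash, so this sketch only works for one worker per node.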
>>
>> Regards,
>>
>> A.
>>
>> On Thu, Mar 5, 2015 at 8:46 PM, Nathan Leung <[email protected]> wrote:
>>
>>> You don't need to change your code. As Andrew mentioned you can get a
>>> lot of mileage by profiling your logic in a standalone program. For
>>> jvisualvm, you can just run your program (a loop that runs for a long time
>>> is best) then attach to the running process with jvisualvm.  It's pretty
>>> straightforward to use and you can also find good guides with a Google
>>> search.
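A minimal standalone target for this could look like the following; the class name and the "parsing" body are made up for illustration, the point being a process that stays busy long enough for jvisualvm to attach:

```java
// ProfileTarget.java — hypothetical standalone harness for profiling bolt
// logic: start it with a large iteration count, then attach jvisualvm to
// the running PID and inspect CPU/memory sampling.
public class ProfileTarget {

    // Stand-in for the per-tuple work you want to profile (hypothetical).
    static int parseEvent(String json) {
        // Cheap fake "parsing": hash the payload so the JIT can't drop it.
        return json.hashCode();
    }

    public static void main(String[] args) {
        // Default is small so the program terminates quickly; pass a large
        // count (e.g. 10000000000) to keep it alive while you attach.
        long iterations = args.length > 0 ? Long.parseLong(args[0]) : 1_000_000L;
        long checksum = 0;
        for (long i = 0; i < iterations; i++) {
            checksum += parseEvent("{\"eventType\":\"click\",\"event\":" + i + "}");
            if (i % 100_000 == 0) {
                System.out.println("iteration " + i); // heartbeat while you attach
            }
        }
        System.out.println("checksum " + checksum);
    }
}
```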
>>> On Mar 5, 2015 1:43 PM, "Andrew Xor" <[email protected]>
>>> wrote:
>>>
>>>> Well... detecting memory leaks in Java is a bit tricky, since Java does
>>>> a lot for you. Generally, though, as long as you avoid holding on to
>>>> objects you no longer need and close any resources you are done with,
>>>> you should be fine... but a profiler such as the ones Nathan mentioned
>>>> will tell you the whole truth. YourKit is awesome and has a free trial;
>>>> go ahead and test-drive it. I am pretty sure you need a working jar (or
>>>> compilable code with a main function) in order to profile, although
>>>> profiling your bolts and spouts in place is a bit trickier. Hopefully
>>>> your algorithm (or portions of it) can be put into a sample test program
>>>> that you can run and profile locally.
>>>>
>>>> Hope this helped. Regards,
>>>>
>>>> A.
>>>>
>>>> On Thu, Mar 5, 2015 at 8:33 PM, Sa Li <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Thu, Mar 5, 2015 at 10:26 AM, Andrew Xor <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Unfortunately that is not fixed; it depends on the computations and
>>>>>> data structures you have. In my case, for example, I use more than 2GB
>>>>>> since I need to keep a large matrix in memory. Having said that, in
>>>>>> most cases it should be relatively easy to estimate how much memory
>>>>>> you are going to need and use that; if that's not possible, you can
>>>>>> just increase it and try a "set and see" approach. Check for memory
>>>>>> leaks as well (unclosed resources and so on!).
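The "unclosed resources" point is worth making concrete: any resource opened per tuple and never closed accumulates until the heap fills. A small hypothetical sketch (the class and counter are made up purely to demonstrate the mechanism) showing why try-with-resources prevents this, even when the body throws:

```java
// Hypothetical demo: try-with-resources calls close() on every path,
// including exceptions, so the resource count always returns to zero.
public class ResourceDemo {
    static int openCount = 0;

    static class TrackedResource implements AutoCloseable {
        TrackedResource() { openCount++; }
        @Override public void close() { openCount--; }
    }

    static void process(boolean fail) {
        try (TrackedResource r = new TrackedResource()) {
            if (fail) throw new RuntimeException("boom");
        } catch (RuntimeException e) {
            // swallowed for the demo; close() has already run by this point
        }
    }
}
```

The same pattern applies to JDBC connections and statements in a bolt writing to PostgreSQL: wrap them in try-with-resources rather than closing manually.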
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> ​A.​
>>>>>>
>>>>>> On Thu, Mar 5, 2015 at 8:21 PM, Sa Li <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks, Nathan. How much should it be in general?
>>>>>>>
>>>>>>> On Thu, Mar 5, 2015 at 10:15 AM, Nathan Leung <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Your worker is allocated a maximum of 768MB of heap. It's quite
>>>>>>>> possible that this is not enough. Try increasing -Xmx in
>>>>>>>> worker.childopts.
>>>>>>>> On Mar 5, 2015 1:10 PM, "Sa Li" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi, All
>>>>>>>>>
>>>>>>>>> I have been running a trident topology on production server, code
>>>>>>>>> is like this:
>>>>>>>>>
>>>>>>>>> topology.newStream("spoutInit", kafkaSpout)
>>>>>>>>>         .each(new Fields("str"),
>>>>>>>>>               new JsonObjectParse(),
>>>>>>>>>               new Fields("eventType", "event"))
>>>>>>>>>         .parallelismHint(pHint)
>>>>>>>>>         .groupBy(new Fields("event"))
>>>>>>>>>         .persistentAggregate(PostgresqlState.newFactory(config),
>>>>>>>>>                              new Fields("eventType"),
>>>>>>>>>                              new EventUpdater(),
>>>>>>>>>                              new Fields("eventWord"));
>>>>>>>>>
>>>>>>>>> Config conf = new Config();
>>>>>>>>> conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
>>>>>>>>>
>>>>>>>>> Basically, it does something simple: it gets data from Kafka, parses
>>>>>>>>> it into fields, and writes them into a PostgreSQL DB. But in the
>>>>>>>>> Storm UI I saw this error: "java.lang.OutOfMemoryError: GC overhead
>>>>>>>>> limit exceeded". It always happens in the same worker on each node -
>>>>>>>>> 6703. I understand this occurs because, by default, the JVM throws
>>>>>>>>> this error when *more than 98% of the total time is spent in GC and
>>>>>>>>> less than 2% of the heap is recovered by each collection*.
>>>>>>>>>
>>>>>>>>> I am not sure what the exact cause of the memory leak is; is it OK
>>>>>>>>> to simply increase the heap? Here is my storm.yaml:
>>>>>>>>>
>>>>>>>>> supervisor.slots.ports:
>>>>>>>>>     - 6700
>>>>>>>>>     - 6701
>>>>>>>>>     - 6702
>>>>>>>>>     - 6703
>>>>>>>>>
>>>>>>>>> nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
>>>>>>>>>
>>>>>>>>> ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
>>>>>>>>>
>>>>>>>>> supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
>>>>>>>>>
>>>>>>>>> worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
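If you do raise the heap, only the worker line needs to change. A sketch of what that might look like, where 2048m is just an illustrative starting point, not a recommendation; the GC-logging flags (valid for the Java 7/8 era) help distinguish a genuine leak from a heap that is simply too small, and note that some Storm versions substitute %ID% in worker.childopts with the worker port, so if yours does not, use a fixed log path instead:

```yaml
worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/storm/gc-worker-%ID%.log"
```

If the GC log shows usage climbing steadily across collections rather than plateauing, a larger heap will only delay the OutOfMemoryError, and profiling for a leak is the right next step.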
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Has anyone seen similar issues, and what would be the best way to
>>>>>>>>> overcome them?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> thanks in advance
>>>>>>>>>
>>>>>>>>> AL
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>
>
