Yeah, in that case maybe you can install the JDK / YourKit on the remote
machines and run the tools over X or something.  I'm assuming this is a
development cluster (not live / production) and that installing debugging
tools, running remote UIs, etc. is not a problem.  :)
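Alternatively, if running UIs over X is painful, one option is to expose JMX on the workers and attach jvisualvm remotely. A sketch of what that might look like in storm.yaml (the port is arbitrary, and the unauthenticated setup below is for development clusters only; also note that if a node runs several workers, each one needs a distinct port):

```yaml
# Example only: enable unauthenticated remote JMX so jvisualvm on another
# machine can connect to the worker JVM. Never do this in production.
worker.childopts: "-Xmx768m -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=9999
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false"
```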

On Thu, Mar 5, 2015 at 1:52 PM, Andrew Xor <[email protected]>
wrote:

> Nathan, I think that if he wants to profile a bolt that runs in a worker
> residing on a different cluster node than the one the profiling tool runs
> on, he won't be able to attach to the process, since it lives on a
> different physical machine, methinks (well, now that I think about it, it
> can be done... via remote debugging, but that's just a pain in the ***).
>
> Regards,
>
> A.
>
> On Thu, Mar 5, 2015 at 8:46 PM, Nathan Leung <[email protected]> wrote:
>
>> You don't need to change your code. As Andrew mentioned you can get a lot
>> of mileage by profiling your logic in a standalone program. For jvisualvm,
>> you can just run your program (a loop that runs for a long time is best)
>> then attach to the running process with jvisualvm.  It's pretty
>> straightforward to use and you can also find good guides with a Google
>> search.
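For instance, a minimal standalone program to attach to might look like this (the workload here is a made-up stand-in for your real logic):

```java
// Minimal busy program to profile with jvisualvm: start it, then pick the
// HotLoop process in jvisualvm and open the Sampler or Profiler tab.
public class HotLoop {

    // Stand-in "work": fold the numbers 0..n-1 into an accumulator.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc = 31 * acc + Integer.hashCode(i);
        }
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        // Loop indefinitely so there is something running to attach to.
        while (true) {
            work(10_000_000);
            Thread.sleep(10);  // keep the JVM responsive
        }
    }
}
```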
>> On Mar 5, 2015 1:43 PM, "Andrew Xor" <[email protected]> wrote:
>>
>>> Well... detecting memory leaks in Java is a bit tricky, as Java does a
>>> lot for you. Generally though, as long as you avoid unnecessary object
>>> allocation and close any resources you are no longer using, you should
>>> be fine... but a profiler such as the ones mentioned by Nathan will tell
>>> you the whole truth. YourKit is awesome and has a free trial; go ahead
>>> and test drive it. I am pretty sure that you need a working jar (or
>>> compilable code that has a main function in it) in order to profile it,
>>> although profiling your bolts and spouts directly is a bit trickier.
>>> Hopefully your algorithm (or portions of it) can be put in a sample test
>>> program that can be executed locally for you to profile.
>>>
>>> Hope this helped. Regards,
>>>
>>> A.
>>>
>>> On Thu, Mar 5, 2015 at 8:33 PM, Sa Li <[email protected]> wrote:
>>>
>>>>
>>>> On Thu, Mar 5, 2015 at 10:26 AM, Andrew Xor <
>>>> [email protected]> wrote:
>>>>
>>>>> Unfortunately that is not fixed; it depends on the computations and
>>>>> data structures you have. In my case, for example, I use more than
>>>>> 2GB, since I need to keep a large matrix in memory... having said
>>>>> that, in most cases it should be relatively easy to estimate how much
>>>>> memory you are going to need and use that... or, if that's not
>>>>> possible, you can just increase it and try the "set and see" approach.
>>>>> Check for memory leaks as well... (unclosed resources and so on!)
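As a back-of-the-envelope example of that kind of estimate (the matrix dimensions below are made up): a dense double matrix costs roughly rows × cols × 8 bytes, ignoring object overhead.

```java
// Rough heap estimate for a dense double[rows][cols] matrix:
// 8 bytes per element, ignoring per-row array object overhead.
public class MatrixMemoryEstimate {

    static long estimateBytes(int rows, int cols) {
        return (long) rows * cols * 8L;
    }

    public static void main(String[] args) {
        // e.g. a 20000 x 20000 matrix needs about 3 GiB -- already more
        // than a 2GB heap before counting anything else on the heap.
        long bytes = estimateBytes(20_000, 20_000);
        System.out.println(bytes / (1024.0 * 1024 * 1024) + " GiB");
    }
}
```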
>>>>>
>>>>> Regards.
>>>>>
>>>>> A.
>>>>>
>>>>> On Thu, Mar 5, 2015 at 8:21 PM, Sa Li <[email protected]> wrote:
>>>>>
>>>>>> Thanks, Nathan. How much should it be in general?
>>>>>>
>>>>>> On Thu, Mar 5, 2015 at 10:15 AM, Nathan Leung <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Your worker is allocated a maximum of 768MB of heap. It's quite
>>>>>>> possible that this is not enough. Try increasing -Xmx in
>>>>>>> worker.childopts.
>>>>>>> On Mar 5, 2015 1:10 PM, "Sa Li" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi, All
>>>>>>>>
>>>>>>>> I have been running a trident topology on production server, code
>>>>>>>> is like this:
>>>>>>>>
>>>>>>>> topology.newStream("spoutInit", kafkaSpout)
>>>>>>>>         .each(new Fields("str"),
>>>>>>>>               new JsonObjectParse(),
>>>>>>>>               new Fields("eventType", "event"))
>>>>>>>>         .parallelismHint(pHint)
>>>>>>>>         .groupBy(new Fields("event"))
>>>>>>>>         .persistentAggregate(PostgresqlState.newFactory(config),
>>>>>>>>                              new Fields("eventType"),
>>>>>>>>                              new EventUpdater(),
>>>>>>>>                              new Fields("eventWord"));
>>>>>>>>
>>>>>>>> Config conf = new Config();
>>>>>>>> conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
>>>>>>>>
>>>>>>>> Basically, it does something simple: get data from Kafka, parse it
>>>>>>>> into different fields, and write into a Postgres DB. But in the
>>>>>>>> Storm UI I do see this error: "java.lang.OutOfMemoryError: GC
>>>>>>>> overhead limit exceeded". It always happens in the same worker of
>>>>>>>> each node - 6703. I understand this is because, by default, the JVM
>>>>>>>> is configured to throw this error if you are spending more than
>>>>>>>> *98% of the total time in GC and after the GC less than 2% of the
>>>>>>>> heap is recovered*.
>>>>>>>>
>>>>>>>> I am not sure what the exact cause of the memory leak is; is it OK
>>>>>>>> to simply increase the heap? Here is my storm.yaml:
>>>>>>>>
>>>>>>>> supervisor.slots.ports:
>>>>>>>>     - 6700
>>>>>>>>     - 6701
>>>>>>>>     - 6702
>>>>>>>>     - 6703
>>>>>>>>
>>>>>>>> nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
>>>>>>>> ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
>>>>>>>> supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
>>>>>>>> worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
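For what it's worth, a sketch of the tweak Nathan suggests (the 2048m figure is just an example, not a recommendation; -XX:+HeapDumpOnOutOfMemoryError is a standard HotSpot flag that writes a heap dump you can open in a profiler when the OOM actually fires, which helps tell a real leak from an undersized heap):

```yaml
# Example only: raise the worker heap and capture a heap dump on OOM.
worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
```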
>>>>>>>>
>>>>>>>>
>>>>>>>> Has anyone seen similar issues, and what would be the best way to
>>>>>>>> overcome this?
>>>>>>>>
>>>>>>>>
>>>>>>>> thanks in advance
>>>>>>>>
>>>>>>>> AL
>>>>>>>>
>>>>>>
>>>>>
>>>>
>
