Thanks Jonathan,

I've tried to produce an example script which exhibits the slowdown and
posted it on Pastebin: http://pastebin.com/kTSsDUr3

The slowdown seems to occur when we use a lot of UDFs to parse our
input data. Variant A in the script is noticeably slower than variant B in
Pig 0.10, while performance is similar in Pig 0.9.1.

I've pasted the exec() method of the GFV UDF on Pastebin as well:
http://pastebin.com/FVnkQCJ5

Please let us know if you need more details.
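
To compare the two Pig versions, the -XX:+PrintGCDetails output could be
summarized with something like the sketch below. It is only a rough
illustration: it assumes the older HotSpot log format (lines containing
"[GC" or "[Full GC" and ending in "<n> secs]"), the function name
summarize_gc_log is made up for this example, and the sample lines are
invented rather than taken from the Pastebin output.

```python
import re

# Matches a reported duration such as "0.0100000 secs]".
PAUSE = re.compile(r"(\d+\.\d+) secs\]")


def summarize_gc_log(lines):
    """Count minor and full collections and sum their reported pause times.

    Assumes pre-Java 9 HotSpot -XX:+PrintGCDetails output; other log
    formats would need a different parser.
    """
    summary = {"minor": 0, "full": 0, "pause_secs": 0.0}
    for line in lines:
        if "[Full GC" in line:
            kind = "full"
        elif "[GC" in line:
            kind = "minor"
        else:
            continue  # not a collection line (e.g. Pig's own log output)
        summary[kind] += 1
        # Drop the trailing "[Times: ...]" block, then take the last
        # "<n> secs]" value, which is the pause for the whole collection.
        body = line.split("[Times:")[0]
        pauses = PAUSE.findall(body)
        if pauses:
            summary["pause_secs"] += float(pauses[-1])
    return summary


# Invented sample lines, for illustration only.
sample = [
    "[GC [PSYoungGen: 65536K->10720K(76288K)] 65536K->20000K(251392K), "
    "0.0100000 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]",
    "[Full GC [PSYoungGen: 10720K->0K(76288K)] "
    "[ParOldGen: 9280K->15000K(175104K)] 20000K->15000K(251392K), "
    "0.2500000 secs]",
    "INFO: a non-GC log line that should be ignored",
]

result = summarize_gc_log(sample)
print(result)
```

Running it once on the log from each Pig version and comparing the counts
and total pause time should show whether the extra time really is spent in
GC.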

Thanks,
Chun

On 8/7/12 10:07 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:

> Can you guys give a script that has the issue? My tactic would be to use
> some sort of profiler (we have access to YourKit for open source Pig
> contribution work) and try and isolate what is triggering GC.
> 
> 2012/8/7 Prashant Kommireddi <prash1...@gmail.com>
> 
>> Hi All,
>> 
>> Just wanted to follow-up on Chun's question. Several of our Pig users have
>> been experiencing slow start-ups with Pig 0.10.0, when the same script runs
>> fine with 0.9.1. Anyone else facing similar issues?
>> 
>> Thanks,
>> Prashant
>> 
>> Hi all,
>> 
>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I run the same
>> script with the two Pig versions, 0.9.1 starts off fast and almost
>> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
>> takes forever to submit the job. When I use the Java option
>> -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
>> before and after the job is submitted to the cluster.
>> 
>> Does anyone know what is causing this and/or how I might be able to
>> troubleshoot it?
>> 
>> I've uploaded truncated output showing when GC happens to
>> Pastebin: http://pastebin.com/B8WTHW9r
>> 
>> Thanks,
>> Chun
>> 
