Thanks, Jonathan. I've tried to produce an example script that exhibits the slowdown and posted it on Pastebin: http://pastebin.com/kTSsDUr3

The slowdown seems to occur when we use a lot of UDFs to parse our input data. In Pig 0.10, variant A in the script is noticeably slower than variant B, while in Pig 0.9.1 the two perform similarly. I've also pasted the exec() function of the GFV function on Pastebin: http://pastebin.com/FVnkQCJ5

Please let us know if you need more details.

Thanks,
Chun

On 8/7/12 10:07 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:

> Can you guys give a script that has the issue? My tactic would be to use
> some sort of profiler (we have access to YourKit for open source Pig
> contribution work) and try to isolate what is triggering GC.
>
> 2012/8/7 Prashant Kommireddi <prash1...@gmail.com>
>
>> Hi All,
>>
>> Just wanted to follow up on Chun's question. Several of our Pig users have
>> been experiencing slow start-ups with Pig 0.10.0, while the same script runs
>> fine with 0.9.1. Is anyone else facing similar issues?
>>
>> Thanks,
>> Prashant
>>
>> Hi all,
>>
>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I run the same
>> script with the two Pig versions, 0.9.1 starts off fast and submits the
>> job to the cluster almost immediately. Pig 0.10.0, on the other hand,
>> takes forever to submit the job. When I use the Java option
>> -XX:+PrintGCDetails, I see that with 0.10.0 GC runs many times
>> before and after the job is submitted to the cluster.
>>
>> Does anyone know what is causing this and/or how I might be able to
>> troubleshoot it?
>>
>> I've uploaded truncated output showing when GC happens to
>> Pastebin: http://pastebin.com/B8WTHW9r
>>
>> Thanks,
>> Chun