There is an option in the Pig properties:

# Use this option to turn on UDF timers. This will cause two
# counters to be tracked for every UDF and LoadFunc in your script:
# approx_microsecs measures approximate time spent inside a UDF
# approx_invocations reports the approximate number of times the UDF was invoked
pig.udf.profile=false
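
To make the two counters concrete, here is a rough sketch of the same idea applied to a plain Python function off-cluster: count the invocations and accumulate the time spent inside the function. It is only an illustration, not how Pig implements its counters. The names `profiled` and `normalize` are made up, and the code assumes a standalone CPython 3 interpreter (it uses `time.perf_counter`), not the Jython runtime inside Pig.

```python
import time
from functools import wraps


def profiled(func):
    """Wrap func so each call updates an invocation count and a time total."""
    # Counter names mirror the Pig counters for readability only.
    stats = {"approx_invocations": 0, "approx_microsecs": 0.0}

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Update counters even if the wrapped function raises.
            stats["approx_invocations"] += 1
            stats["approx_microsecs"] += (time.perf_counter() - start) * 1e6

    wrapper.stats = stats
    return wrapper


@profiled
def normalize(value):
    # Hypothetical UDF body: stands in for whatever your exec logic does.
    return value.strip().lower()


# Feed a few sample rows through the function, as a load test would.
for row in ["  Foo ", "BAR", " baz"]:
    normalize(row)

print(normalize.stats)
```

Running a UDF body like this against a sample of real rows gives a first estimate of per-call cost before you look at the cluster counters.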

Set pig.udf.profile to true and you will see the new counters in the job detail view on your nodes (they are not visible in local mode). This should give you a hint about the number of invocations and the time spent.

The Pig properties also contain some performance options. I have never touched them, but maybe there is something in there for you.

I got my biggest performance gain from profiling and benchmarking my UDFs to find the slow code. Your UDF should not waste any time, especially in the "exec" method.

Next I would increase the number of reducers and the available memory. In Pig this is possible with:

SET default_parallel 20;
SET mapred.child.java.opts '-Xmx8196m';

These are just some quick tips. Definitely check out the performance chapters.

Marco

2013/7/15 Serega Sheypak <[email protected]>

> I'm using CDH 4.3 with pig 0.11
> I don't see anything. That's the problem. ^)
> Script is just running. It has been tested locally on small dataset. We've
> created special small testing framework for testing pig scripts.
>
> It's my assumption that invocation is not optimal. I have no idea how to
> measure execution time or make some kind of performance metrics. Do you have
> any experience in it?
>
>
> 2013/7/15 Duckworth, Will <[email protected]>
>
> > What exactly are you seeing? Which version of PIG are you using? We saw
> > similar issues registering the UDFs in older versions of PIG.
> >
> >
> >
> > Will Duckworth Senior Vice President, Software Engineering | comScore,
> > Inc.(NASDAQ:SCOR)
> > o +1 (703) 438-2108 | m +1 (301) 606-2977 | mailto:
> [email protected]
> >
> >
> > comScore Media Metrix(r) Multi-Platform: Audience Analytics for the Brave
> > New Digital World
> >
> > www.comscore.com/multiplatform
> > -----Original Message-----
> > From: Serega Sheypak [mailto:[email protected]]
> > Sent: Monday, July 15, 2013 7:48 AM
> > To: [email protected]
> > Subject: Jython UDF invokation
> >
> > Hi dear pig users.
> > Looks like I have significant performance problems with Jython UDF. I have
> > an assumption that UDF call cost is very high.
> > How can I prove my assumption?
> > Are there any solutions for such problem?
> > What are best practices in implementing UDFs?
> >
>
