There is an option in the pig properties:

# Use this option to turn on UDF timers. This will cause two
# counters to be tracked for every UDF and LoadFunc in your script:
# approx_microsecs measures approximate time spent inside a UDF
# approx_invocations reports the approximate number of times the UDF was
# invoked
pig.udf.profile=false

Set this option to true and the new counters show up in the job detail
view on your cluster (they are not visible in local mode). They should give
you a hint about the number of invocations and the time spent in each UDF.

The pig properties also contain some performance options. I have never
touched them, but maybe there is something in there for you.

I got my biggest performance gain from profiling/benchmarking my UDFs
and finding the slow code. Your UDF should not waste any time, especially in
the "exec" method. Next I would increase the number of reducers and the
available memory. In Pig this is possible with:

SET default_parallel 20;
SET mapred.child.java.opts '-Xmx8196m';

These are just some quick tips, though. Definitely check out the performance
chapters of the Pig documentation.
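
To make the benchmarking advice a bit more concrete, here is a minimal
sketch of how a Python UDF body could be timed outside of Pig. Everything in
it is an assumption for illustration: the function names, the URL-parsing
task and the sample data are hypothetical, and it is meant to run under a
standalone Python interpreter, not inside a Pig job. It only compares two
variants of the same function, so it says nothing about the Pig-to-Jython
call overhead itself.

```python
import re
import timeit

# Hypothetical UDF logic: pull the host name out of a URL string.
# In a real Jython UDF the function would also carry an @outputSchema
# decorator; it is left out here so the file runs under plain Python.

# Compiled once, when the module is loaded.
_HOST_RE = re.compile(r'^[a-z]+://([^/:]+)')


def extract_host_slow(url):
    # Calls re.compile on every invocation (a cache lookup at best).
    pattern = re.compile(r'^[a-z]+://([^/:]+)')
    match = pattern.match(url)
    return match.group(1) if match else None


def extract_host_fast(url):
    # Reuses the module-level compiled pattern.
    match = _HOST_RE.match(url)
    return match.group(1) if match else None


def benchmark(func, rows, repeat=3):
    """Return the best wall-clock time in seconds for one pass over rows."""
    def run():
        for row in rows:
            func(row)
    return min(timeit.repeat(run, number=1, repeat=repeat))


if __name__ == '__main__':
    # Made-up sample input; replace it with rows from your own data.
    rows = ['http://example.com/path/%d' % i for i in range(100000)]
    for func in (extract_host_slow, extract_host_fast):
        print('%s: %.3f s' % (func.__name__, benchmark(func, rows)))
```

The absolute numbers will differ from what you see on the cluster, but the
relative difference between two variants is usually enough to spot code
that does avoidable work on every call.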

Marco




2013/7/15 Serega Sheypak <[email protected]>

> I'm using CDH 4.3 with pig 0.11
> I don't see anything. That's the problem. ^)
> The script is just running. It has been tested locally on a small dataset.
> We've created a special small testing framework for testing pig scripts.
>
> It's my assumption that the invocation is not optimal. I have no idea how
> to measure execution time or collect some kind of performance metrics. Do
> you have any experience with it?
>
>
> 2013/7/15 Duckworth, Will <[email protected]>
>
> > What exactly are you seeing?  Which version of PIG are you using?  We saw
> > similar issues registering the UDFs in older versions of PIG.
> >
> >
> >
> > Will Duckworth  Senior Vice President, Software Engineering  | comScore,
> > Inc.(NASDAQ:SCOR)
> > o +1 (703) 438-2108 | m +1 (301) 606-2977 | mailto:
> [email protected]
> >
> >
> > -----Original Message-----
> > From: Serega Sheypak [mailto:[email protected]]
> > Sent: Monday, July 15, 2013 7:48 AM
> > To: [email protected]
> > Subject: Jython UDF invokation
> >
> > Hi dear pig users.
> > Looks like I have significant performance problems with a Jython UDF. I
> > have an assumption that the UDF call cost is very high.
> > How can I prove my assumption?
> > Are there any solutions for such problem?
> > What are best practices in implementing UDFs?
> >
>
