Run it in local mode after doing

  export PIG_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof"

Then you should be able to look into the heap dump and see where you are leaking memory in your UDF.
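For example, something like this (a minimal sketch; yourscript.pig and the local sample input are placeholders for your actual script and a small slice of your data):

  # enable heap dumps for the JVM, then run the script in local mode
  export PIG_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof"
  pig -x local yourscript.pig

  # if the JVM hits an OutOfMemoryError it writes /tmp/heapdump.hprof;
  # open it with jhat (or Eclipse MAT) to see which objects are retained
  jhat /tmp/heapdump.hprof

Local mode runs the whole job in a single JVM, so the dump should cover the same reduce-side path where the failure shows up below.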
On Thu, Dec 10, 2015 at 9:23 AM, <william.dowl...@thomsonreuters.com> wrote:
> Hi Pig community,
>
> I am running a pig process using a python UDF, and getting a failure that
> is hard to debug. The relevant parts of the script are:
>
> REGISTER [...]clustercentroid_udfs.py using jython as UDFS ;
>
> [... definition of cluster_vals ...]
> grouped = group cluster_vals by (clusters::cluster_id, tfidf::att,
>     clusters::block_size);
> cluster_tfidf = foreach grouped {
>     generate
>         group.clusters::cluster_id as cluster_id,
>         group.clusters::block_size as block_size,
>         group.tfidf::att as att,
>         UDFS.normalize_avg_words(cluster_vals.tfidf::pairs) as centroid;
> }
> store cluster_tfidf into [...]
>
> I can remove essentially all the logic from UDFS.normalize_avg_words
> and still get the failure; for example, I get the failure with this
> definition of normalize_avg_words():
>
> @outputSchema('words: {wvpairs: (word: chararray, normvalue: double)}')
> def normalize_avg_words(line):
>     return []
>
> The log for the failing task has:
>
> 2015-12-09 16:18:47,510 INFO [main] org.apache.pig.data.SchemaTupleBackend: Key [pig.schematuple] was not set... will not generate code.
> 2015-12-09 16:18:47,534 INFO [main] org.apache.pig.scripting.jython.JythonScriptEngine: created tmp python.cachedir=/data/3/yarn/nm/usercache/sesadmin/appcache/application_1444666458457_553099/container_e17_1444666458457_553099_01_685857/tmp/pig_jython_6256288828533965407
> 2015-12-09 16:18:49,443 INFO [main] org.apache.pig.scripting.jython.JythonFunction: Schema 'words: {wvpairs: (word: chararray, normvalue: double)}' defined for func normalize_avg_words
> 2015-12-09 16:18:49,498 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce: Aliases being processed per job phase (AliasName[line,offset]): M: grouped[87,10] C: R: cluster_tfidf[99,16]
> 2015-12-09 16:18:49,511 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>     at java.util.ArrayList.rangeCheck(ArrayList.java:638)
>     at java.util.ArrayList.get(ArrayList.java:414)
>     at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:118)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:348)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNextTuple(POPackage.java:269)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:421)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> I do not get the failure if I use just a 5GB segment of my full 300GB data
> set.
>
> Also, I do not get the failure if I comment out the call to the UDF:
>
> cluster_tfidf = foreach grouped {
>     generate
>         group.clusters::cluster_id as cluster_id,
>         group.clusters::block_size as block_size,
>         group.tfidf::att as att;
>         -- UDFS.normalize_avg_words(cluster_vals.tfidf::pairs) as centroid;
> }
>
> I wonder if the failure is ultimately caused by an out of memory
> someplace, but I haven't seen anything in the log that indicates that
> directly. (I have tried using a large number of reducers in the definition
> of grouped but the result is the same). What should I look for in the log
> that would be a telltale for out of memory? How would I address it?
>
> Since I don't get the failure when the UDF call is commented out, I wonder
> if the problem is in the call itself, but don't know how to diagnose or
> debug that.
>
> Any help would be much appreciated!
>
> Apache Pig version 0.12.0-cdh5.3.3 (rexported)
> Hadoop 2.5.0-cdh5.3.3
>
> Thanks,
> Will
>
> William F Dowling
> Senior Technologist
> Thomson Reuters