Hi Houssam,
What's the error in your Pig log file? I was trying to reproduce it with
1000 rows and 500 columns:
A = load 'random.txt' using PigStorage(':') as
(f1:double,f2:double,.........,f500:double);
B = group A all;
D = foreach B generate group,COR(A.$0,A.$1,A.$2,A.$3,.......A.$499);
dump D;
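
For reference, random.txt is just 1000 rows of 500 colon-separated random
doubles. Something like the following rough sketch (untested as written
here; any random values will do) produces a file of that shape:

# Generate 1000 rows of 500 colon-separated random doubles as test input.
awk 'BEGIN {
    srand();
    for (r = 0; r < 1000; r++) {
        line = rand();
        for (c = 1; c < 500; c++) line = line ":" rand();
        print line;
    }
}' > random.txt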

The exceptions in the Pig log file are:
Backend error message
---------------------
Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Double.valueOf(Double.java:492)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:390)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
    at org.apache.pig.builtin.COR.combine(COR.java:258)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:164)
    at org.apache.pig.backend.hadoop.executionengine.physi

Backend error message
---------------------
Error: java.lang.OutOfMemoryError: Java heap space
    at java.lang.Double.valueOf(Double.java:492)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:390)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
    at org.apache.pig.builtin.COR.combine(COR.java:258)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:164)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.ex

Backend error message
---------------------
Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.ArrayList.<init>(ArrayList.java:112)
    at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:67)
    at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:67)
    at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:142)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
    at org.apache.pig.builtin.COR.combine(COR.java:258)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
    at org.apache.pig.builtin.COR$Inte

Backend error message
---------------------
Error: java.lang.OutOfMemoryError: Java heap space
    at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:142)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
    at org.apache.pig.builtin.COR.combine(COR.java:258)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:164)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
    at org.apache.pig.backend.hadoop.executionengin

Error message from task (map) task_201302211102_0561_m_000000
-------------------------------------------------------------
ERROR 6016: Out of memory.

org.apache.pig.backend.executionengine.ExecException: ERROR 6016: Out of memory.
    at java.lang.Double.valueOf(Double.java:492)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:390)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
    at org.apache.pig.builtin.COR.combine(COR.java:258)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:164)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    ... 13 more
================================================================================
Error message from task (map) task_201302211102_0561_m_000000
-------------------------------------------------------------
ERROR 6016: Out of memory.

org.apache.pig.backend.executionengine.ExecException: ERROR 6016: Out of memory.
    at java.lang.Double.valueOf(Double.java:492)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:390)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
    at org.apache.pig.builtin.COR.combine(COR.java:258)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:164)
Caused by: java.lang.OutOfMemoryError: Java heap space
    ... 13 more
================================================================================
Error message from task (map) task_201302211102_0561_m_000000
-------------------------------------------------------------
ERROR 6016: Out of memory.

org.apache.pig.backend.executionengine.ExecException: ERROR 6016: Out of memory.
    at java.util.ArrayList.<init>(ArrayList.java:112)
    at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:67)
    at org.apache.pig.data.BinSedesTuple.<init>(BinSedesTuple.java:67)
    at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:142)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
    at org.apache.pig.builtin.COR.combine(COR.java:258)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    ... 13 more
================================================================================
Error message from task (map) task_201302211102_0561_m_000000
-------------------------------------------------------------
ERROR 6016: Out of memory.

org.apache.pig.backend.executionengine.ExecException: ERROR 6016: Out of memory.
    at org.apache.pig.data.BinSedesTupleFactory.newTuple(BinSedesTupleFactory.java:38)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:142)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
    at org.apache.pig.builtin.COR.combine(COR.java:258)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:164)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
Caused by: java.lang.OutOfMemoryError: Java heap space
    ... 12 more
================================================================================
Pig Stack Trace
---------------
ERROR 6016: Out of memory.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias D. Backend error : Out of memory.
    at org.apache.pig.PigServer.openIterator(PigServer.java:826)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:538)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6016: Out of memory.
    at java.lang.Double.valueOf(Double.java:492)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:390)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.utils.SedesHelper.readGenericTuple(SedesHelper.java:144)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:344)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:313)
    at org.apache.pig.data.InternalCachedBag$CachedBagIterator.hasNext(InternalCachedBag.java:208)
    at org.apache.pig.builtin.COR.combine(COR.java:258)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:171)
    at org.apache.pig.builtin.COR$Intermed.exec(COR.java:164)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    ... 13 more
================================================================================



"GC overhead limit exceeded" means too much percentage of the time is spent
on GC, and too less percentage is recovered. This feature is designed to
prevent applications from running an extended period of time while making
little or no progress because the heap is too small.

I tried to disable this check with "export
PIG_OPTS=-D-XX:-UseGCOverheadLimit" to avoid the "GC overhead limit
exceeded" error. Things got a bit better, but the job still fails in the
end, and I can still see the error thrown in one place. I will see if I can
profile the memory usage. No clue so far.
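
I am also not sure an option set through PIG_OPTS ever reaches the map-task
JVMs, since PIG_OPTS configures the Pig client process. Another thing worth
trying is setting the task JVM options from the script itself; the following
is only a sketch, assuming the classic MapReduce (Hadoop 1.x) property name
mapred.child.java.opts and an arbitrary 2 GB heap, and I have not verified
that it fixes this case:

-- Sketch only: give the map/reduce task JVMs a larger heap and disable the
-- GC overhead limit check (classic MapReduce property name assumed).
set mapred.child.java.opts '-Xmx2048m -XX:-UseGCOverheadLimit';

Raising the heap is probably the more useful half; disabling the limit only
trades the "GC overhead limit exceeded" error for a plain "Java heap space"
OOM, which several of the traces above show anyway.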

Johnny




On Thu, Feb 21, 2013 at 11:39 AM, Houssam H. <[email protected]> wrote:

> Hi,
>
> I have a file with a few hundred columns of doubles, and I am
> interested in creating a correlation matrix for the columns:
>
> A = load 'myData' using PigStorage(':');
> B = group A all;
> D = foreach B generate group,COR(A.$0,A.$1,A.$2);
>
> For N parameters, the COR function will generate N(N-1)/2 correlations.
> This is fine as long as N is less than 100: COR(A.$0,A.$1, .... A.$100);
> however, once N is more than 100 or 200, I get an out of memory error (of
> course this depends on the amount of RAM you have):
>
> 883 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR
> 6016: Out of memory.
> 893 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map
> reduce job(s) failed!
>
> My file is less than 50 MB, so Pig runs the whole job with only one
> mapper.
>
> The behavior was the same whether I ran the script locally (pig -x
> local) or on Amazon Elastic MapReduce with multiple instances assigned
> to the job.
>
> Is there a way to run the correlation function for a large number of
> parameters?
>
> Thank you in advance!
>
> -Houssam
>
