That would be surprising. Which version of Pig are you using?

Daniel

On 06/15/2011 03:10 PM, Shubham Chopra wrote:
Hi Daniel,

Thanks for the reply. I did try that and ran into this issue again when I increased the number of operators. I found out, with hprof, that most sites with high memory usage are schema related. Is that a bug in schema implementation? Are schema related data-structures expected to consume so much memory?
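
A minimal sketch of how such a heap profile can be gathered, by attaching the JDK's HPROF agent to the Pig client JVM via PIG_OPTS (the option values here, heap=sites and depth=10, are illustrative assumptions, not necessarily the exact settings used above):

```shell
# Sketch: attach the JDK HPROF heap profiler to the Pig client JVM.
# The heap=sites,depth=10 values are illustrative, not prescriptive.
export PIG_OPTS="-agentlib:hprof=heap=sites,depth=10"
echo "PIG_OPTS=${PIG_OPTS}"
# pig myscript.pig   # requires a Pig install; HPROF writes java.hprof.txt on exit
```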

~Shubham.

On Wed, Jun 15, 2011 at 2:32 PM, Daniel Dai <[email protected]> wrote:

    Try increasing the heap size. If you are running through bin/pig, set
    PIG_HEAPSIZE (in MB; the default is 1000). You can use the "pig
    -secretDebugCmd" option to see what the command line looks like.
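
    Concretely, that looks roughly like the following (the 4096 MB value
    and the script name are examples, not recommendations):

```shell
# Sketch: raise the Pig client heap before launching bin/pig.
# PIG_HEAPSIZE is read by bin/pig and is given in MB; 4096 is an example value.
export PIG_HEAPSIZE=4096
echo "PIG_HEAPSIZE=${PIG_HEAPSIZE}"
# pig -secretDebugCmd myscript.pig   # prints the full java command line it would run
```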

    Daniel


    On 06/15/2011 10:09 AM, Shubham Chopra wrote:

        Hi,

        I am using Pig for number crunching on data that has a large number
        of columns (~300 or so). The script has around 25 operators, and all
        I am doing in the script is group-bys and SUMs. The script fails with
        the following exception:
        <code>
        Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
                at java.util.HashMap.<init>(HashMap.java:209)
                at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
                at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
                at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
                at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
                at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
                at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
                at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
                at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
                at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
                at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
                at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
                at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
                at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
                at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
                at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
                at org.apache.pig.PigServer.storeEx(PigServer.java:850)
                at org.apache.pig.PigServer.store(PigServer.java:816)
                at org.apache.pig.PigServer.store(PigServer.java:784)
        </code>
        The complete output I see is the following:
        <code>
        $run-script
        11/06/15 09:19:27 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://abcd:9000
        11/06/15 09:19:28 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: abcd:9001

        Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
                (same stack trace as above)
        </code>
        The process uses around 1.2 GB of RAM before crapping out with the
        exception above. Has anyone else faced a similar situation? Any way
        out of this?

        Thanks,
        Shubham.



