Hi Daniel,

Thanks for the reply. I did try that and ran into the same issue again when I
increased the number of operators. Using hprof, I found that most of the
allocation sites with high memory usage are schema-related. Is that a bug in
the schema implementation? Are schema-related data structures expected to
consume this much memory?
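
For reference, this is roughly how such a profile can be collected with the
JVM's built-in hprof agent (the script name here is a placeholder):

<code>
# Record allocation sites with 20-deep stacks; hprof writes its report to
# java.hprof.txt in the working directory when the JVM exits.
$ PIG_OPTS="-agentlib:hprof=heap=sites,depth=20" pig myscript.pig
</code>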

~Shubham.

On Wed, Jun 15, 2011 at 2:32 PM, Daniel Dai <[email protected]> wrote:

> Try increasing the heap size. If you are running through bin/pig, set
> PIG_HEAPSIZE (in MB; the default is 1000). You can use the "pig
> -secretDebugCmd" option to see what the command line looks like.
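>
> For example (the script name here is just a placeholder):
>
> <code>
> # Raise the Pig client JVM heap to 4 GB for this run.
> $ PIG_HEAPSIZE=4096 pig myscript.pig
>
> # Show the java command line bin/pig builds, to confirm the -Xmx value.
> $ pig -secretDebugCmd
> </code>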
>
> Daniel
>
>
> On 06/15/2011 10:09 AM, Shubham Chopra wrote:
>
>> Hi,
>>
>> I am using Pig for number crunching on data that has a large number of
>> columns (~300). The script has around 25 operators, and all it does is
>> GROUP BYs and SUMs (a minimal sketch of the script shape follows the
>> stack trace below). It fails with the following exception:
>> <code>
>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>         at java.util.HashMap.<init>(HashMap.java:209)
>>         at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
>>         at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
>>         at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
>>         at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
>>         at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
>>         at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
>>         at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
>>         at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
>>         at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
>>         at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
>>         at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
>>         at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
>>         at org.apache.pig.PigServer.storeEx(PigServer.java:850)
>>         at org.apache.pig.PigServer.store(PigServer.java:816)
>>         at org.apache.pig.PigServer.store(PigServer.java:784)
>> </code>
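>>
>> For context, the script is essentially repeated statements of the shape
>> below; this is only a minimal sketch (names, paths, and types are
>> placeholders, and the real relation has ~300 columns):
>>
>> <code>
>> $ cat > /tmp/sums.pig <<'EOF'
>> a = LOAD 'input' AS (k:chararray, v1:long, v2:long);
>> b = GROUP a BY k;
>> c = FOREACH b GENERATE group, SUM(a.v1) AS s1, SUM(a.v2) AS s2;
>> STORE c INTO 'output';
>> EOF
>> $ pig /tmp/sums.pig
>> </code>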
>> The complete output I see is the following:
>> <code>
>> $run-script
>> 11/06/15 09:19:27 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://abcd:9000
>> 11/06/15 09:19:28 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: abcd:9001
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>         [stack trace identical to the one above]
>> </code>
>> The process uses around 1.2 GB of RAM before dying with the exception
>> above. Has anyone else faced a similar situation? Is there any way around
>> it?
>>
>> Thanks,
>> Shubham.
>>
>
>
