Hi Daniel,

Thanks for the reply. I did try that, but I ran into the same issue again once I increased the number of operators. Using hprof, I found that most of the allocation sites with high memory usage are schema-related. Is that a bug in the schema implementation? Are schema-related data structures expected to consume this much memory?
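For anyone who wants to reproduce the allocation-site numbers, a run along these lines should work. This is a sketch: it assumes bin/pig forwards PIG_OPTS to the java command line, and the depth value and script name are illustrative.

```shell
# Attach the JVM's built-in hprof agent to the Pig client so that
# java.hprof.txt (written on JVM exit) lists allocation sites ranked
# by live bytes. depth=10 is an illustrative stack-depth choice.
export PIG_OPTS="-agentlib:hprof=heap=sites,depth=10"
echo "PIG_OPTS=$PIG_OPTS"
# pig myscript.pig   # hypothetical script name; writes java.hprof.txt
```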
~Shubham.

On Wed, Jun 15, 2011 at 2:32 PM, Daniel Dai <[email protected]> wrote:

> Try to increase heap size. If you are running through bin/pig, set
> PIG_HEAPSIZE (in MB, default is 1000). You can use the "pig -secretDebugCmd"
> option to see what the command line looks like.
>
> Daniel
>
> On 06/15/2011 10:09 AM, Shubham Chopra wrote:
>
>> Hi,
>>
>> I am using Pig for number crunching on data that has a large number of
>> columns (~300 or so). The script has around 25 operators and all I am
>> doing in the script is group bys and SUMs. The script fails with the
>> following exception:
>>
>> <code>
>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>     at java.util.HashMap.<init>(HashMap.java:209)
>>     at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
>>     at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
>>     at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
>>     at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
>>     at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
>>     at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
>>     at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
>>     at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
>>     at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>     at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
>>     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
>>     at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
>>     at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
>>     at org.apache.pig.PigServer.storeEx(PigServer.java:850)
>>     at org.apache.pig.PigServer.store(PigServer.java:816)
>>     at org.apache.pig.PigServer.store(PigServer.java:784)
>> </code>
>>
>> The complete output I see is the following:
>>
>> <code>
>> $run-script
>> 11/06/15 09:19:27 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://abcd:9000
>> 11/06/15 09:19:28 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: abcd:9001
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>> [remainder of the stack trace is identical to the one above]
>> </code>
>>
>> The process uses around 1.2 gigs of RAM before crapping out with the
>> exception above. Has anyone else faced a similar situation? Any way out
>> of this?
>>
>> Thanks,
>> Shubham.
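In practice, Daniel's suggestion amounts to something like the following. The 4000 MB figure and the script name are illustrative, not values from the thread.

```shell
# Raise the heap of the Pig client JVM before launching the script.
# bin/pig reads PIG_HEAPSIZE (in MB) and turns it into -Xmx on the
# java command line; 4000 here is just an example, tune it to the machine.
export PIG_HEAPSIZE=4000
echo "PIG_HEAPSIZE=$PIG_HEAPSIZE"
# pig -secretDebugCmd myscript.pig   # prints the java command line
#                                    # (including -Xmx) for inspection
```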
