Correction: Schema and FieldSchema are removed in 0.9, but LogicalSchema and LogicalFieldSchema are still there. Even so, this cuts the schema memory usage in half.

Daniel

On 06/15/2011 03:52 PM, Daniel Dai wrote:
Fortunately, this is part of the old logical plan, which is totally
abandoned in Pig 0.9.

Daniel

On 06/15/2011 03:40 PM, Shubham Chopra wrote:
This was with 0.8.1. The following is a part of the output of hprof:

           percent          live          alloc'ed  stack class
  rank   self  accum     bytes objs     bytes  objs trace name
     1  4.99%  4.99%  33572480 419656 100460640 1255758 307176 java.util.HashMap$Entry[]
     2  4.92%  9.91%  33120000 414000  66240000 828000 308200 java.util.HashMap$Entry[]
     3  4.92% 14.83%  33114400 413930  99186000 1239825 308147 java.util.HashMap$Entry[]
     4  4.91% 19.74%  33009600 412620  66019200 825240 308202 java.util.HashMap$Entry[]
     5  4.69% 24.43%  31554048 986064 2720274496 85008578 309807 org.apache.pig.newplan.logical.relational.LogicalSchema$LogicalFieldSchema
     6  3.95% 28.39%  26604720 1108530 2104577160 87690715 309427 java.util.HashMap$Entry
     7  3.74% 32.13%  25187136 1049464  52832592 2201358 307177 java.util.HashMap$Entry
     8  3.65% 35.78%  24577328 852060 127045504 4220197 306673 char[]
     9  3.56% 39.35%  23979536 428206  71194592 1271332 306680 java.lang.Object[]
    10  3.45% 42.80%  23214264 967261 240166320 10006930 300253 java.util.ArrayList
    11  3.45% 46.24%  23179632 413922  69429528 1239813 308150 java.lang.Object[]
    12  3.21% 49.45%  21583080 899295  78141408 3255892 306516 java.util.HashMap$Entry
    13  3.04% 52.49%  20452128 852172 116126520 4838605 306674 java.lang.String
    14  2.58% 55.07%  17385728 1086608 1359995520 84999720 309812 org.apache.pig.impl.util.Pair
    15  2.50% 57.57%  16786640 419666  50231120 1255778 307175 java.util.HashMap
    16  2.50% 60.06%  16786240 419656  50230320 1255758 307172 java.util.HashMap
    17  2.49% 62.55%  16732960 418324  58320080 1458002 307121 java.util.HashMap
    18  2.46% 65.01%  16560000 414000  33120000 828000 308201 java.util.HashMap
    19  1.96% 66.98%  13209056 412783  52836224 1651132 309652 org.apache.pig.newplan.logical.relational.LogicalSchema$LogicalFieldSchema
    20  1.96% 68.94%  13207872 412746  39617568 1238049 308146 org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema
    21  1.96% 70.90%  13203840 412620  26407680 825240 308193 org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema
    22  1.49% 72.39%  10010592 625662 785665632 49104102 309774 java.lang.Integer
    23  1.48% 73.87%   9936648 414027  19872648 828027 308203 java.util.HashMap$Entry
    24  1.48% 75.34%   9934128 413922  29755512 1239813 308149 java.util.HashMap$Entry
    25  1.47% 76.82%   9907872 412828  19811160 825465 307947 java.util.ArrayList
    26  1.47% 78.29%   9907872 412828  19811160 825465 307948 java.util.HashMap$Entry
    27  1.47% 79.76%   9902880 412620  19805760 825240 308206 java.util.HashMap$Entry
    28  1.17% 80.93%   7885080   24   7885080    24 311867 char[]
    29  1.16% 82.09%   7776216    2   9749800     6 313313 char[]
    30  1.13% 83.22%   7595616 3686 1054669088 926722 309644 java.util.HashMap$Entry[]
    31  0.98% 84.20%   6622880 413930  19837200 1239825 308148 org.apache.pig.impl.util.MultiMap
    32  0.98% 85.18%   6601920 412620  13203840 825240 308199 org.apache.pig.impl.util.MultiMap
    33  0.68% 85.86%   4584576  568  13732448  5942 308075 java.util.HashMap$Entry[]
    34  0.44% 86.30%   2956656 123194 234198048 9758252 309678 java.lang.String
    35  0.44% 86.74%   2956656 123194 234198048 9758252 309680 java.util.HashMap$Entry
    36  0.44% 87.18%   2929776 1535  17382848 22404 306766 java.util.HashMap$Entry[]
    37  0.43% 87.61%   2895536 22872  13513088 107528 311646 char[]
    38  0.42% 88.03%   2856864 1386  16783440 20733 308151 java.util.HashMap$Entry[]
    39  0.42% 88.46%   2848496 1386  11172656 13806 308208 java.util.HashMap$Entry[]
    40  0.42% 88.88%   2848320 1380  11172480 13800 308207 java.util.HashMap$Entry[]

All the entries for java.util.HashMap$Entry and java.util.HashMap
trace back to schema-related function calls such as the following:
TRACE 307171:
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:156)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
         org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
         org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:851)
TRACE 307172:
         java.util.AbstractMap.<init>(AbstractMap.java:56)
         java.util.HashMap.<init>(HashMap.java:206)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:161)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
TRACE 307173:
         java.util.HashMap.<init>(HashMap.java:209)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:161)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
         org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
TRACE 307174:
         org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:46)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
         org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
TRACE 307175:
         java.util.AbstractMap.<init>(AbstractMap.java:56)
         java.util.HashMap.<init>(HashMap.java:206)
         org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:47)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
TRACE 307176:
         java.util.HashMap.<init>(HashMap.java:209)
         org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:47)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
TRACE 307177:
         java.util.HashMap$Entry.<init>(HashMap.java:683)
         java.util.HashMap.addEntry(HashMap.java:753)
         java.util.HashMap.put(HashMap.java:385)
         org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.setParent(Schema.java:251)


~Shubham.
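
As a rough cross-check of the hprof table above, dividing live bytes by live objects gives the average size of one object at each allocation site. The sketch below does that for a few rows copied from the table; the short class labels and the helper name `bytes_per_object` are made up for illustration. It then adds up what traces 307172 to 307176 suggest a single FieldSchema constructor allocates: a HashMap with its table, plus a MultiMap wrapping a second HashMap and table. That sum assumes object sizes are the same at every allocation site and ignores aliases and map entries, so it should be read as an estimate only.

```python
# Rows copied from the hprof output: (label, live bytes, live objects).
# The labels are shortened class names chosen for this example.
ROWS = [
    ("HashMap$Entry[]", 33572480, 419656),        # rank 1
    ("LogicalFieldSchema", 31554048, 986064),     # rank 5
    ("HashMap$Entry", 26604720, 1108530),         # rank 6
    ("HashMap", 16786640, 419666),                # rank 15
    ("FieldSchema", 13207872, 412746),            # rank 20
    ("MultiMap", 6622880, 413930),                # rank 31
]


def bytes_per_object(live_bytes, live_objs):
    """Average live size of one object at an allocation site."""
    return live_bytes / live_objs


sizes = {label: bytes_per_object(b, n) for label, b, n in ROWS}

# Estimate for one freshly constructed FieldSchema (assumption: it owns
# one HashMap and one MultiMap, and the MultiMap wraps a second HashMap,
# as traces 307172 to 307176 suggest).
empty_field_schema = (
    sizes["FieldSchema"]
    + sizes["MultiMap"]
    + 2 * (sizes["HashMap"] + sizes["HashMap$Entry[]"])
)

if __name__ == "__main__":
    for label, size in sizes.items():
        print(f"{label:20s} {size:6.1f} bytes/object")
    print(f"estimated empty FieldSchema: {empty_field_schema:.0f} bytes")
    live_field_schemas = 412746  # live objects in the rank 20 row
    total_mb = empty_field_schema * live_field_schemas / 1e6
    print(f"across {live_field_schemas} live objects: about {total_mb:.0f} MB")
```

On these numbers the maps cost far more than the FieldSchema objects that own them, which matches the observation that the HashMap entries dominate the profile.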

On Wed, Jun 15, 2011 at 6:17 PM, Daniel Dai wrote:

     That would be surprising. Which version of Pig are you using?

     Daniel


     On 06/15/2011 03:10 PM, Shubham Chopra wrote:
     Hi Daniel,

     Thanks for the reply. I did try that and ran into this issue
     again when I increased the number of operators. I found out, with
     hprof, that most sites with high memory usage are schema-related.
     Is that a bug in the schema implementation? Are schema-related
     data structures expected to consume so much memory?

     ~Shubham.

     On Wed, Jun 15, 2011 at 2:32 PM, Daniel Dai wrote:

          Try increasing the heap size. If you are running through
          bin/pig, set PIG_HEAPSIZE (in MB; the default is 1000). You
          can use the "pig -secretDebugCmd" option to see what the
          command line looks like.

         Daniel
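
The PIG_HEAPSIZE suggestion above could be scripted roughly as follows. This is only a sketch: the helper name `pig_debug_invocation` and the value 2000 are illustrative, and it assumes a `pig` launcher on PATH that reads PIG_HEAPSIZE and accepts -secretDebugCmd as described in the message.

```python
import os
import shutil
import subprocess


def pig_debug_invocation(heap_mb, pig_binary="pig"):
    """Build the environment and argument list for a bin/pig debug run.

    heap_mb is passed through PIG_HEAPSIZE (in MB, per the message above).
    """
    env = dict(os.environ)
    env["PIG_HEAPSIZE"] = str(heap_mb)
    argv = [pig_binary, "-secretDebugCmd"]
    return env, argv


if __name__ == "__main__":
    env, argv = pig_debug_invocation(2000)  # 2000 MB is an arbitrary example
    if shutil.which(argv[0], path=env.get("PATH")):
        # Shows the command line the launcher would use.
        subprocess.run(argv, env=env, check=False)
    else:
        print("pig not found on PATH; would run:",
              "PIG_HEAPSIZE=" + env["PIG_HEAPSIZE"], *argv)
```

Checking the printed command line for the expected heap option is a quick way to confirm the setting took effect before rerunning the failing script.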


         On 06/15/2011 10:09 AM, Shubham Chopra wrote:

             Hi,

             I am using Pig for number crunching on data that has a
             large number of columns (~300 or so). The script has around
             25 operators, and all I am doing in the script is group-bys
             and SUMs. The script fails with the following exception:
             <code>
             Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
                     at java.util.HashMap.<init>(HashMap.java:209)
                     at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
                     at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                     at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
                     at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                     at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
                     at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
                     at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
                     at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
                     at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
                     at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
                     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
                     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
                     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
                     at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
                     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
                     at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
                     at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
                     at org.apache.pig.PigServer.storeEx(PigServer.java:850)
                     at org.apache.pig.PigServer.store(PigServer.java:816)
                     at org.apache.pig.PigServer.store(PigServer.java:784)
             </code>
             The complete output I see is the following:
             <code>
             $run-script
             11/06/15 09:19:27 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://abcd:9000
             11/06/15 09:19:28 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: abcd:9001

             Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
                     at java.util.HashMap.<init>(HashMap.java:209)
                     at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
                     at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                     at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
                     at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                     at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
                     at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
                     at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
                     at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
                     at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
                     at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
                     at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
                     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
                     at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
                     at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
                     at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
                     at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
                     at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
                     at org.apache.pig.PigServer.storeEx(PigServer.java:850)
                     at org.apache.pig.PigServer.store(PigServer.java:816)
                     at org.apache.pig.PigServer.store(PigServer.java:784)
             </code>
             The process uses around 1.2 GB of RAM before crapping out
             with the exception above. Has anyone else faced a similar
             situation? Is there any way out of this?

             Thanks,
             Shubham.




