Fortunately, this is part of the old logical plan, which is completely abandoned in Pig 0.9.

Daniel

On 06/15/2011 03:40 PM, Shubham Chopra wrote:
This was with 0.8.1. The following is part of the hprof output:

         percent          live               alloc'ed        stack class
 rank   self  accum     bytes    objs       bytes      objs  trace name
    1  4.99%  4.99%  33572480  419656   100460640   1255758 307176 java.util.HashMap$Entry[]
    2  4.92%  9.91%  33120000  414000    66240000    828000 308200 java.util.HashMap$Entry[]
    3  4.92% 14.83%  33114400  413930    99186000   1239825 308147 java.util.HashMap$Entry[]
    4  4.91% 19.74%  33009600  412620    66019200    825240 308202 java.util.HashMap$Entry[]
    5  4.69% 24.43%  31554048  986064  2720274496  85008578 309807 org.apache.pig.newplan.logical.relational.LogicalSchema$LogicalFieldSchema
    6  3.95% 28.39%  26604720 1108530  2104577160  87690715 309427 java.util.HashMap$Entry
    7  3.74% 32.13%  25187136 1049464    52832592   2201358 307177 java.util.HashMap$Entry
    8  3.65% 35.78%  24577328  852060   127045504   4220197 306673 char[]
    9  3.56% 39.35%  23979536  428206    71194592   1271332 306680 java.lang.Object[]
   10  3.45% 42.80%  23214264  967261   240166320  10006930 300253 java.util.ArrayList
   11  3.45% 46.24%  23179632  413922    69429528   1239813 308150 java.lang.Object[]
   12  3.21% 49.45%  21583080  899295    78141408   3255892 306516 java.util.HashMap$Entry
   13  3.04% 52.49%  20452128  852172   116126520   4838605 306674 java.lang.String
   14  2.58% 55.07%  17385728 1086608  1359995520  84999720 309812 org.apache.pig.impl.util.Pair
   15  2.50% 57.57%  16786640  419666    50231120   1255778 307175 java.util.HashMap
   16  2.50% 60.06%  16786240  419656    50230320   1255758 307172 java.util.HashMap
   17  2.49% 62.55%  16732960  418324    58320080   1458002 307121 java.util.HashMap
   18  2.46% 65.01%  16560000  414000    33120000    828000 308201 java.util.HashMap
   19  1.96% 66.98%  13209056  412783    52836224   1651132 309652 org.apache.pig.newplan.logical.relational.LogicalSchema$LogicalFieldSchema
   20  1.96% 68.94%  13207872  412746    39617568   1238049 308146 org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema
   21  1.96% 70.90%  13203840  412620    26407680    825240 308193 org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema
   22  1.49% 72.39%  10010592  625662   785665632  49104102 309774 java.lang.Integer
   23  1.48% 73.87%   9936648  414027    19872648    828027 308203 java.util.HashMap$Entry
   24  1.48% 75.34%   9934128  413922    29755512   1239813 308149 java.util.HashMap$Entry
   25  1.47% 76.82%   9907872  412828    19811160    825465 307947 java.util.ArrayList
   26  1.47% 78.29%   9907872  412828    19811160    825465 307948 java.util.HashMap$Entry
   27  1.47% 79.76%   9902880  412620    19805760    825240 308206 java.util.HashMap$Entry
   28  1.17% 80.93%   7885080      24     7885080        24 311867 char[]
   29  1.16% 82.09%   7776216       2     9749800         6 313313 char[]
   30  1.13% 83.22%   7595616    3686  1054669088    926722 309644 java.util.HashMap$Entry[]
   31  0.98% 84.20%   6622880  413930    19837200   1239825 308148 org.apache.pig.impl.util.MultiMap
   32  0.98% 85.18%   6601920  412620    13203840    825240 308199 org.apache.pig.impl.util.MultiMap
   33  0.68% 85.86%   4584576     568    13732448      5942 308075 java.util.HashMap$Entry[]
   34  0.44% 86.30%   2956656  123194   234198048   9758252 309678 java.lang.String
   35  0.44% 86.74%   2956656  123194   234198048   9758252 309680 java.util.HashMap$Entry
   36  0.44% 87.18%   2929776    1535    17382848     22404 306766 java.util.HashMap$Entry[]
   37  0.43% 87.61%   2895536   22872    13513088    107528 311646 char[]
   38  0.42% 88.03%   2856864    1386    16783440     20733 308151 java.util.HashMap$Entry[]
   39  0.42% 88.46%   2848496    1386    11172656     13806 308208 java.util.HashMap$Entry[]
   40  0.42% 88.88%   2848320    1380    11172480     13800 308207 java.util.HashMap$Entry[]
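The hprof table can also be read per object: dividing a site's live bytes by its live object count gives the average shallow size of one instance at that site. A small sketch of that arithmetic for the four sites whose traces are listed in this message (ranks 1, 7, 15 and 16); the class and method names are made up for illustration, and the numbers are copied from the table.

```java
// Average shallow size per instance at an hprof allocation site:
// live bytes / live objects. Numbers are copied from the hprof SITES table.
public class HprofSiteSizes {

    // Returns the average number of bytes per live object at one site.
    static double bytesPerObject(long liveBytes, long liveObjs) {
        if (liveObjs <= 0) {
            throw new IllegalArgumentException("liveObjs must be positive");
        }
        return (double) liveBytes / liveObjs;
    }

    public static void main(String[] args) {
        String[] sites = {
            "rank 1,  trace 307176, HashMap$Entry[]",
            "rank 7,  trace 307177, HashMap$Entry",
            "rank 15, trace 307175, HashMap",
            "rank 16, trace 307172, HashMap"
        };
        // {live bytes, live objs} for each site, in the same order as 'sites'
        long[][] live = {
            {33572480L, 419656L},
            {25187136L, 1049464L},
            {16786640L, 419666L},
            {16786240L, 419656L}
        };
        for (int i = 0; i < sites.length; i++) {
            System.out.printf("%-40s %5.1f bytes/object%n",
                    sites[i], bytesPerObject(live[i][0], live[i][1]));
        }
    }
}
```

The divisions come out to exactly 80, 24, 40 and 40 bytes. The individual objects are small; the cost comes from the counts, with roughly 420K live instances at each FieldSchema-related site. These are shallow sizes only, and the exact per-object figures depend on the JVM (pointer size, object header layout).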

All of the java.util.HashMap$Entry and java.util.HashMap entries trace back to schema-related calls, such as the following:
TRACE 307171:
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:156)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
        org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
        org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:851)
TRACE 307172:
        java.util.AbstractMap.<init>(AbstractMap.java:56)
        java.util.HashMap.<init>(HashMap.java:206)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:161)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
TRACE 307173:
        java.util.HashMap.<init>(HashMap.java:209)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:161)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
        org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
TRACE 307174:
        org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:46)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
        org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
TRACE 307175:
        java.util.AbstractMap.<init>(AbstractMap.java:56)
        java.util.HashMap.<init>(HashMap.java:206)
        org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:47)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
TRACE 307176:
        java.util.HashMap.<init>(HashMap.java:209)
        org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:47)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
TRACE 307177:
        java.util.HashMap$Entry.<init>(HashMap.java:683)
        java.util.HashMap.addEntry(HashMap.java:753)
        java.util.HashMap.put(HashMap.java:385)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.setParent(Schema.java:251)
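The traces show every FieldSchema.copyAndLink call constructing a new FieldSchema, whose constructor allocates a HashMap (Schema.java:161) and a MultiMap that wraps another HashMap (Schema.java:162), while setParent then puts an entry into a map. Below is a toy model of that pattern only, not Pig's code: ToyFieldSchema, ToyMultiMap, the 'parents' field and simulate are hypothetical names, and the model assumes each operator copies every column's field schema exactly once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Pig's code: ToyFieldSchema and ToyMultiMap are hypothetical
// stand-ins that mimic only what the hprof traces show for Schema$FieldSchema.
public class FieldSchemaCopyModel {

    // Counts every HashMap the toy classes allocate.
    static long hashMapsAllocated = 0;

    // Stand-in for a MultiMap that wraps one HashMap (cf. MultiMap.java:47).
    static final class ToyMultiMap {
        final Map<Object, List<Object>> map = new HashMap<>();

        ToyMultiMap() {
            hashMapsAllocated++;
        }
    }

    // Stand-in for a field schema whose constructor allocates a HashMap
    // (cf. Schema.java:161) and a MultiMap (cf. Schema.java:162).
    static final class ToyFieldSchema {
        final String alias;
        final Map<ToyFieldSchema, Boolean> parents = new HashMap<>();
        final ToyMultiMap canonicalMap = new ToyMultiMap();

        ToyFieldSchema(String alias) {
            this.alias = alias;
            hashMapsAllocated++; // the 'parents' map
        }

        // Like copyAndLink: build a fresh copy and record this one as its parent.
        ToyFieldSchema copyAndLink() {
            ToyFieldSchema copy = new ToyFieldSchema(alias);
            copy.parents.put(this, Boolean.TRUE);
            return copy;
        }
    }

    // Pushes a schema of `columns` fields through `operators` stages, copying
    // every field once per stage, and returns the number of maps allocated.
    static long simulate(int columns, int operators) {
        hashMapsAllocated = 0;
        List<ToyFieldSchema> schema = new ArrayList<>();
        for (int c = 0; c < columns; c++) {
            schema.add(new ToyFieldSchema("c" + c));
        }
        for (int op = 0; op < operators; op++) {
            List<ToyFieldSchema> next = new ArrayList<>(columns);
            for (ToyFieldSchema fs : schema) {
                next.add(fs.copyAndLink());
            }
            schema = next; // older copies stay reachable through 'parents'
        }
        return hashMapsAllocated;
    }

    public static void main(String[] args) {
        System.out.println("maps allocated: " + simulate(300, 25));
    }
}
```

For 300 columns and 25 operators the model allocates 2 × 300 × 26 = 15,600 maps, and because each copy links back to its source, every earlier copy stays reachable in this model. The hprof counts (roughly 413K to 420K instances per site) are far larger than this linear estimate, which hints that field schemas are copied many more times than once per operator; the OutOfMemoryError trace in the original report, for example, shows logical plans being cloned during parsing. The model does not attempt to reproduce that.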


~Shubham.

On Wed, Jun 15, 2011 at 6:17 PM, Daniel Dai wrote:

    That would be surprising. Which version of Pig are you using?

    Daniel


    On 06/15/2011 03:10 PM, Shubham Chopra wrote:
    Hi Daniel,

    Thanks for the reply. I did try that, and ran into this issue
    again when I increased the number of operators. Using hprof, I
    found that most of the allocation sites with high memory usage
    are schema-related. Is that a bug in the schema implementation?
    Are schema-related data structures expected to consume this much
    memory?

    ~Shubham.

    On Wed, Jun 15, 2011 at 2:32 PM, Daniel Dai wrote:

        Try increasing the heap size. If you are running through
        bin/pig, set PIG_HEAPSIZE (in MB; the default is 1000). You
        can use the "pig -secretDebugCmd" option to see what the
        resulting command line looks like.
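One way to confirm that a larger heap actually reaches the JVM is to ask the JVM itself. A minimal sketch, assuming bin/pig turns PIG_HEAPSIZE into an -Xmx flag for the JVM it launches (the -secretDebugCmd output shows the real flags): running this class with that same flag prints the maximum heap reported by Runtime.maxMemory(). HeapCheck and maxHeapMb are illustrative names.

```java
// Prints the maximum heap the running JVM reports via Runtime.maxMemory().
// Launch it with the same -Xmx flag that "pig -secretDebugCmd" shows
// (assumption: bin/pig derives that flag from PIG_HEAPSIZE).
public class HeapCheck {

    // Max heap in whole megabytes. Runtime.maxMemory() returns
    // Long.MAX_VALUE when the JVM has no inherent limit.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```

For example, `java -Xmx1000m HeapCheck` should print a value close to 1000 MB; with some garbage collectors it is a little lower, because part of the young generation is not counted.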

        Daniel


        On 06/15/2011 10:09 AM, Shubham Chopra wrote:

            Hi,

            I am using Pig for number crunching on data that has a
            large number of columns (~300). The script has around 25
            operators, and all it does is GROUP BYs and SUMs. It fails
            with the following exception:
            <code>
            Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
                    at java.util.HashMap.<init>(HashMap.java:209)
                    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
                    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                    at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
                    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                    at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
                    at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
                    at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
                    at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
                    at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
                    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
                    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
                    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
                    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
                    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
                    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
                    at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
                    at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
                    at org.apache.pig.PigServer.storeEx(PigServer.java:850)
                    at org.apache.pig.PigServer.store(PigServer.java:816)
                    at org.apache.pig.PigServer.store(PigServer.java:784)
            </code>
            The complete output I see is the following:
            <code>
            $run-script
            11/06/15 09:19:27 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://abcd:9000
            11/06/15 09:19:28 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: abcd:9001

            Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
                    at java.util.HashMap.<init>(HashMap.java:209)
                    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
                    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                    at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
                    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
                    at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
                    at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
                    at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
                    at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
                    at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
                    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
                    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
                    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
                    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
                    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
                    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
                    at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
                    at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
                    at org.apache.pig.PigServer.storeEx(PigServer.java:850)
                    at org.apache.pig.PigServer.store(PigServer.java:816)
                    at org.apache.pig.PigServer.store(PigServer.java:784)
            </code>
            The process uses around 1.2 GB of RAM before crapping out
            with the exception above. Has anyone else faced a similar
            situation? Is there any way around this?

            Thanks,
            Shubham.





