Correction: Schema and FieldSchema are removed in 0.9, but LogicalSchema and
LogicalFieldSchema are still there. Still, this cuts the memory used for
schemas in half.
Daniel
On 06/15/2011 03:52 PM, Daniel Dai wrote:
Fortunately, this is part of the old logical plan, which is completely
abandoned in Pig 0.9.
Daniel
On 06/15/2011 03:40 PM, Shubham Chopra wrote:
This was with 0.8.1. The following is part of the hprof output:
       percent                live            alloc'ed  stack class
rank  self  accum    bytes    objs      bytes     objs  trace name
   1 4.99%  4.99% 33572480  419656  100460640  1255758 307176 java.util.HashMap$Entry[]
   2 4.92%  9.91% 33120000  414000   66240000   828000 308200 java.util.HashMap$Entry[]
   3 4.92% 14.83% 33114400  413930   99186000  1239825 308147 java.util.HashMap$Entry[]
   4 4.91% 19.74% 33009600  412620   66019200   825240 308202 java.util.HashMap$Entry[]
   5 4.69% 24.43% 31554048  986064 2720274496 85008578 309807 org.apache.pig.newplan.logical.relational.LogicalSchema$LogicalFieldSchema
   6 3.95% 28.39% 26604720 1108530 2104577160 87690715 309427 java.util.HashMap$Entry
   7 3.74% 32.13% 25187136 1049464   52832592  2201358 307177 java.util.HashMap$Entry
   8 3.65% 35.78% 24577328  852060  127045504  4220197 306673 char[]
   9 3.56% 39.35% 23979536  428206   71194592  1271332 306680 java.lang.Object[]
  10 3.45% 42.80% 23214264  967261  240166320 10006930 300253 java.util.ArrayList
  11 3.45% 46.24% 23179632  413922   69429528  1239813 308150 java.lang.Object[]
  12 3.21% 49.45% 21583080  899295   78141408  3255892 306516 java.util.HashMap$Entry
  13 3.04% 52.49% 20452128  852172  116126520  4838605 306674 java.lang.String
  14 2.58% 55.07% 17385728 1086608 1359995520 84999720 309812 org.apache.pig.impl.util.Pair
  15 2.50% 57.57% 16786640  419666   50231120  1255778 307175 java.util.HashMap
  16 2.50% 60.06% 16786240  419656   50230320  1255758 307172 java.util.HashMap
  17 2.49% 62.55% 16732960  418324   58320080  1458002 307121 java.util.HashMap
  18 2.46% 65.01% 16560000  414000   33120000   828000 308201 java.util.HashMap
  19 1.96% 66.98% 13209056  412783   52836224  1651132 309652 org.apache.pig.newplan.logical.relational.LogicalSchema$LogicalFieldSchema
  20 1.96% 68.94% 13207872  412746   39617568  1238049 308146 org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema
  21 1.96% 70.90% 13203840  412620   26407680   825240 308193 org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema
  22 1.49% 72.39% 10010592  625662  785665632 49104102 309774 java.lang.Integer
  23 1.48% 73.87%  9936648  414027   19872648   828027 308203 java.util.HashMap$Entry
  24 1.48% 75.34%  9934128  413922   29755512  1239813 308149 java.util.HashMap$Entry
  25 1.47% 76.82%  9907872  412828   19811160   825465 307947 java.util.ArrayList
  26 1.47% 78.29%  9907872  412828   19811160   825465 307948 java.util.HashMap$Entry
  27 1.47% 79.76%  9902880  412620   19805760   825240 308206 java.util.HashMap$Entry
  28 1.17% 80.93%  7885080      24    7885080       24 311867 char[]
  29 1.16% 82.09%  7776216       2    9749800        6 313313 char[]
  30 1.13% 83.22%  7595616    3686 1054669088   926722 309644 java.util.HashMap$Entry[]
  31 0.98% 84.20%  6622880  413930   19837200  1239825 308148 org.apache.pig.impl.util.MultiMap
  32 0.98% 85.18%  6601920  412620   13203840   825240 308199 org.apache.pig.impl.util.MultiMap
  33 0.68% 85.86%  4584576     568   13732448     5942 308075 java.util.HashMap$Entry[]
  34 0.44% 86.30%  2956656  123194  234198048  9758252 309678 java.lang.String
  35 0.44% 86.74%  2956656  123194  234198048  9758252 309680 java.util.HashMap$Entry
  36 0.44% 87.18%  2929776    1535   17382848    22404 306766 java.util.HashMap$Entry[]
  37 0.43% 87.61%  2895536   22872   13513088   107528 311646 char[]
  38 0.42% 88.03%  2856864    1386   16783440    20733 308151 java.util.HashMap$Entry[]
  39 0.42% 88.46%  2848496    1386   11172656    13806 308208 java.util.HashMap$Entry[]
  40 0.42% 88.88%  2848320    1380   11172480    13800 308207 java.util.HashMap$Entry[]
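Each hprof SITES row reports both live bytes and live objects, so dividing the two gives the average size of one instance allocated at that site. Below is a minimal Java sketch over a few rows copied from the table. The choice of rows and the class and field names (HprofSiteSizes, Site) are illustrative only, not part of Pig or hprof:

```java
// Sketch: average bytes per live object for a few hprof SITES rows.
// The numbers are copied from the hprof table; class/field names are illustrative.
public class HprofSiteSizes {

    // One row of the hprof SITES table (only the columns needed here).
    static final class Site {
        final int rank;
        final long liveBytes;
        final long liveObjs;
        final int trace;
        final String className;

        Site(int rank, long liveBytes, long liveObjs, int trace, String className) {
            this.rank = rank;
            this.liveBytes = liveBytes;
            this.liveObjs = liveObjs;
            this.trace = trace;
            this.className = className;
        }

        // Average size of one live instance allocated at this site.
        long bytesPerObject() {
            return liveBytes / liveObjs;
        }
    }

    static final Site[] SITES = {
        new Site(1, 33572480L, 419656L, 307176, "java.util.HashMap$Entry[]"),
        new Site(5, 31554048L, 986064L, 309807,
                "org.apache.pig.newplan.logical.relational.LogicalSchema$LogicalFieldSchema"),
        new Site(7, 25187136L, 1049464L, 307177, "java.util.HashMap$Entry"),
        new Site(15, 16786640L, 419666L, 307175, "java.util.HashMap"),
        new Site(20, 13207872L, 412746L, 308146,
                "org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema"),
        new Site(31, 6622880L, 413930L, 308148, "org.apache.pig.impl.util.MultiMap"),
    };

    public static void main(String[] args) {
        for (Site s : SITES) {
            System.out.printf("rank %2d  trace %d  %3d bytes/obj  %s%n",
                    s.rank, s.trace, s.bytesPerObject(), s.className);
        }
    }
}
```

The rows divide out to 80, 32, 24, 40, 32 and 16 bytes per object, respectively. Every schema-related site therefore holds hundreds of thousands of small, fixed-size objects. The memory goes to the sheer number of field-schema copies, not to any single large structure. Trace 307176 points at the HashMap constructor, so the 80-byte arrays are presumably the empty bucket arrays of freshly created maps.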
All the java.util.HashMap$Entry and java.util.HashMap sites above trace
back to schema-related calls, such as the following:
TRACE 307171:
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:156)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:851)
TRACE 307172:
java.util.AbstractMap.<init>(AbstractMap.java:56)
java.util.HashMap.<init>(HashMap.java:206)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:161)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
TRACE 307173:
java.util.HashMap.<init>(HashMap.java:209)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:161)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
TRACE 307174:
org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:46)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
TRACE 307175:
java.util.AbstractMap.<init>(AbstractMap.java:56)
java.util.HashMap.<init>(HashMap.java:206)
org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:47)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
TRACE 307176:
java.util.HashMap.<init>(HashMap.java:209)
org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:47)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
TRACE 307177:
java.util.HashMap$Entry.<init>(HashMap.java:683)
java.util.HashMap.addEntry(HashMap.java:753)
java.util.HashMap.put(HashMap.java:385)
org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.setParent(Schema.java:251)
~Shubham.
On Wed, Jun 15, 2011 at 6:17 PM, Daniel Dai <[email protected]> wrote:
That would be surprising. Which version of Pig are you using?
Daniel
On 06/15/2011 03:10 PM, Shubham Chopra wrote:
Hi Daniel,
Thanks for the reply. I did try that, but ran into this issue again
when I increased the number of operators. Using hprof, I found that
most of the allocation sites with high memory usage are schema-related.
Is that a bug in the schema implementation? Are schema-related data
structures expected to consume this much memory?
~Shubham.
On Wed, Jun 15, 2011 at 2:32 PM, Daniel Dai <[email protected]> wrote:
Try increasing the heap size. If you are running through bin/pig, set
PIG_HEAPSIZE (in MB; the default is 1000). You can use the
"pig -secretDebugCmd" option to see what the resulting command line
looks like.
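PIG_HEAPSIZE ends up as the JVM's maximum heap flag. bin/pig presumably passes it as -Xmx<N>m, and "pig -secretDebugCmd" shows the exact flag. To see what a given flag actually gives the JVM, you can run a small standalone check that uses only the standard Runtime.maxMemory() call. The class name HeapCheck is just illustrative:

```java
// Sketch: report the maximum heap size the current JVM is allowed to use.
// Run it with the same -Xmx value bin/pig passes (check with -secretDebugCmd).
public class HeapCheck {

    // Runtime.maxMemory() returns bytes (Long.MAX_VALUE if there is no limit).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```

For example, "java -Xmx2000m HeapCheck" should report a value close to 2000 MB. It may come in slightly below that, since maxMemory() can exclude part of the young generation depending on the collector.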
Daniel
On 06/15/2011 10:09 AM, Shubham Chopra wrote:
Hi,
I am using Pig for number crunching on data that has a large number of
columns (~300). The script has around 25 operators, and all it does is
GROUP BYs and SUMs. The script fails with the following exception:
<code>
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.<init>(HashMap.java:209)
    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
    at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
    at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
    at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
    at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
    at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
    at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
    at org.apache.pig.PigServer.storeEx(PigServer.java:850)
    at org.apache.pig.PigServer.store(PigServer.java:816)
    at org.apache.pig.PigServer.store(PigServer.java:784)
</code>
The complete output I see is the following:
<code>
$run-script
11/06/15 09:19:27 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://abcd:9000
11/06/15 09:19:28 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: abcd:9001
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.<init>(HashMap.java:209)
    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
    at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
    at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
    at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
    at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
    at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
    at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
    at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
    at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
    at org.apache.pig.PigServer.storeEx(PigServer.java:850)
    at org.apache.pig.PigServer.store(PigServer.java:816)
    at org.apache.pig.PigServer.store(PigServer.java:784)
</code>
The process uses around 1.2 GB of RAM before crapping out with the
exception above. Has anyone else faced a similar situation? Is there
any way around this?
Thanks,
Shubham.