[ 
https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698986#action_12698986
 ] 

Alan Gates commented on PIG-766:
--------------------------------

Can you attach your script, and give an idea of the data sizes you are trying 
to join?  Of particular interest is the cardinality of the keys.

Also, if your inputs of very different sizes, you can try reducing the join 
order.  Pig always materializes the first input's keys in memory and streams 
the second.  If one table has many more instances of a given key than the other 
this can resolve the out of memory issue.

> ava.lang.OutOfMemoryError: Java heap space
> ------------------------------------------
>
>                 Key: PIG-766
>                 URL: https://issues.apache.org/jira/browse/PIG-766
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: Hadoop-0.18.3 (cloudera RPMs).
> mapred.child.java.opts=-Xmx1024m
>            Reporter: Vadim Zaliva
>
> My pig script always fails with the following error:
> Java.lang.OutOfMemoryError: Java heap space
>        at java.util.Arrays.copyOf(Arrays.java:2786)
>        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>        at 
> org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:213)
>        at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>        at 
> org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:233)
>        at 
> org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:162)
>        at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
>        at 
> org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
>        at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>        at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:156)
>        at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:857)
>        at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:467)
>        at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
>        at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
>        at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
>        at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>        at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to