[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12871719#action_12871719 ]

Dirk Schmid commented on PIG-766:
---------------------------------

{quote}1. Are you getting the exact same stack trace as mentioned in the jira?{quote}
Yes the same and some similar traces:
{noformat}
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:279)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
    at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:249)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:214)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:209)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:179)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:880)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1201)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:199)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2563)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)

java.lang.OutOfMemoryError: Java heap space
    at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:58)
    at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
    at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:61)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:142)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
    at org.apache.pig.data.DefaultAbstractBag.readFields(DefaultAbstractBag.java:263)
    at org.apache.pig.data.DataReaderWriter.bytesToBag(DataReaderWriter.java:71)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:145)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
    at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:63)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:142)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
    at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:284)
    at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
    at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:155)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:242)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:170)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161)
    at
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12871046#action_12871046 ]

Dirk Schmid commented on PIG-766:
---------------------------------

Many memory changes went in. Please, reopen if this is still a problem.

I found the problem described by Vadim still existing for the following configuration:
- Apache-Hadoop 0.20.2
- Pig 0.7.0 and also for 0.8.0-dev (18/may)

ava.lang.OutOfMemoryError: Java heap space
------------------------------------------

Key: PIG-766
URL: https://issues.apache.org/jira/browse/PIG-766
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.0, 0.7.0
Environment: Hadoop-0.18.3 (cloudera RPMs). mapred.child.java.opts=-Xmx1024m
Reporter: Vadim Zaliva

My pig script always fails with the following error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:213)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
    at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:233)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:162)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:156)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:857)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:467)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12871253#action_12871253 ]

Ashutosh Chauhan commented on PIG-766:
--------------------------------------

Dirk,

1. Are you getting the exact same stack trace as mentioned in the jira?
2. Which operations are you doing in your query: join, group-by, any other?
3. What load/store func are you using to read and write data? PigStorage or your own?
4. What is your data size and memory available to your tasks?
5. Do you have very large records in your dataset, like hundreds of MB for one record?

It would be great if you can paste here the script from which you get this exception.
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699915#action_12699915 ]

Vadim Zaliva commented on PIG-766:
----------------------------------

I have 1 GB now; I could not go any higher.
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699277#action_12699277 ]

Vadim Zaliva commented on PIG-766:
----------------------------------

Increasing the sort buffer to 500 MB did not work for me.

Implementing many basic algorithms in Pig (such as counting the number of records in a relation) requires GROUP BY, which can produce very long records (up to the number of tuples in the relation), so this is a very serious problem. A single record could potentially exceed the available Java heap memory.

What are the strategies for overcoming this limitation? Does Pig plan to address this?
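The concern above is that a grouped record grows with the number of tuples sharing a key. The sketch below is plain Python, not Pig, and is not taken from Pig's implementation. It only illustrates the general contrast between materializing a whole group before counting it and accumulating partial counts over bounded chunks. The function names and the chunk size are hypothetical.

```python
from itertools import islice

def count_materialized(rows):
    """Count by first building the full group in memory (one huge record)."""
    group = list(rows)  # memory use grows with the number of tuples
    return len(group)

def count_partial(rows, chunk_size=1000):
    """Count by summing partial counts over bounded-size chunks."""
    it = iter(rows)
    total = 0
    while True:
        chunk = list(islice(it, chunk_size))  # at most chunk_size tuples held at once
        if not chunk:
            return total
        total += len(chunk)

# Hypothetical input: 10,000 tuples that all share one key.
rows = (("key", i) for i in range(10_000))
print(count_partial(rows))
```

Both functions return the same count; only the second keeps its working set independent of the group size.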
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699376#action_12699376 ]

Vadim Zaliva commented on PIG-766:
----------------------------------

I have increased it to 500 MB and it is still not enough. I see this as a more general problem: at some point the memory I need to allocate for processing a big dataset will exceed all possible VM limits.
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699427#action_12699427 ]

Olga Natkovich commented on PIG-766:
------------------------------------

I did not mean increasing the combiner buffer size. I meant the overall memory that is given to the process.
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12698996#action_12698996 ]

Alan Gates commented on PIG-766:
--------------------------------

It isn't overall data size that matters. It is the size of a given key. So if you have a 2G data set but it has only one key (that is, every row has that key), then you'll hit this problem (assuming you can't fit 2G in memory on your data nodes). Pig does try to spill to avoid this, but has a hard time knowing when and how much to spill, and thus often runs out of memory.

But I think you're right that this isn't in the join. From the stack it looks like it's trying to write data out of the map task. Do you have very large rows in this data?
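The point that the size of a single key matters more than the total data size can be checked on a sample of the data before running the full job. The following is a minimal Python sketch, not Pig code. It assumes the sample fits in memory as a list of (key, value) pairs, and it approximates each value's size by its UTF-8 string length, which is not how Pig serializes tuples. The function names are hypothetical.

```python
from collections import defaultdict

def bytes_per_key(rows):
    """Sum an approximate encoded size of the values seen for each key."""
    sizes = defaultdict(int)
    for key, value in rows:
        sizes[key] += len(str(value).encode("utf-8"))
    return dict(sizes)

def heaviest_key(rows):
    """Return (key, approximate_bytes) for the key with the most data."""
    sizes = bytes_per_key(rows)
    return max(sizes.items(), key=lambda kv: kv[1])

# Hypothetical sample: key "a" carries most of the bytes.
rows = [("a", "xxxx"), ("b", "yy"), ("a", "zzzzzz"), ("c", "w")]
print(heaviest_key(rows))
```

If one key accounts for a large share of the sample, the grouped record for that key is the one most likely to exceed the task's memory.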
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699000#action_12699000 ]

Vadim Zaliva commented on PIG-766:
----------------------------------

I have at most 17m rows in my dataset. At some point I am doing GROUP BY, and the longest row is about 500,000 tuples.
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699017#action_12699017 ]

Olga Natkovich commented on PIG-766:
------------------------------------

I asked a member of the Hadoop team to take a look. A possible problem is that there is a single record that does not fit into the combiner buffer. Hopefully we will get some help with this.
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699019#action_12699019 ]

Olga Natkovich commented on PIG-766:
------------------------------------

I got confirmation from a Hadoop developer that this is a case of one huge record that is larger than the combiner buffer, which means that it is over 90 MB. Does this sound right for your data? Is it possible you have data corruption? Do you have another data set to try this query with?
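Whether a record over 90 MB is plausible can be estimated with simple arithmetic. The sketch below assumes that the oversized record is the 500,000-tuple group reported elsewhere in this thread and that 1 MB means 2^20 bytes; neither assumption is confirmed by the thread. The function name is hypothetical.

```python
MIB = 1024 * 1024  # assumption: 1 MB is taken as 2**20 bytes

def min_avg_bytes_per_tuple(threshold_mb, tuple_count):
    """Average serialized tuple size at which a group reaches the threshold."""
    return threshold_mb * MIB / tuple_count

# 90 MB buffer, 500,000 tuples in the largest group.
print(round(min_avg_bytes_per_tuple(90, 500_000), 1))  # prints 188.7
```

Under these assumptions, an average of roughly 189 serialized bytes per tuple would be enough for the largest group to exceed the buffer without any data corruption.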
[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699026#action_12699026 ]

Santhosh Srinivasan commented on PIG-766:
-----------------------------------------

You can specify the I/O sort buffer size on the command line as:

    java -Dio.sort.mb=200 -cp pig.jar:/path_to_hadoop_site.xml

Reference: http://hadoop.apache.org/core/docs/current/hadoop-default.html