[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2010-05-26 Thread Dirk Schmid (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871719#action_12871719 ]

Dirk Schmid commented on PIG-766:
-

{quote}1. Are you getting the exact same stack trace as mentioned in the jira?{quote}
Yes, the same trace, plus some similar ones:
{noformat}
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:279)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
    at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:249)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:214)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:209)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:179)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:880)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1201)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:199)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2563)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
{noformat}



{noformat}
java.lang.OutOfMemoryError: Java heap space
    at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:58)
    at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
    at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:61)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:142)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
    at org.apache.pig.data.DefaultAbstractBag.readFields(DefaultAbstractBag.java:263)
    at org.apache.pig.data.DataReaderWriter.bytesToBag(DataReaderWriter.java:71)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:145)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
    at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:63)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:142)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
    at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:284)
    at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
    at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:155)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:242)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:170)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161)
    at
{noformat}

[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2010-05-25 Thread Dirk Schmid (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871046#action_12871046 ]

Dirk Schmid commented on PIG-766:
-

{quote}Many memory changes went in. Please, reopen if this is still a problem.{quote}
I found that the problem described by Vadim still exists with the following configuration:

- Apache Hadoop 0.20.2
- Pig 0.7.0 and also 0.8.0-dev (May 18 build)

 ava.lang.OutOfMemoryError: Java heap space
 ------------------------------------------

                  Key: PIG-766
                  URL: https://issues.apache.org/jira/browse/PIG-766
              Project: Pig
           Issue Type: Bug
           Components: impl
     Affects Versions: 0.2.0, 0.7.0
          Environment: Hadoop-0.18.3 (cloudera RPMs).
                       mapred.child.java.opts=-Xmx1024m
             Reporter: Vadim Zaliva

 My pig script always fails with the following error:

 java.lang.OutOfMemoryError: Java heap space
     at java.util.Arrays.copyOf(Arrays.java:2786)
     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
     at java.io.DataOutputStream.write(DataOutputStream.java:90)
     at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
     at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:213)
     at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
     at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:233)
     at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:162)
     at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
     at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
     at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:156)
     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:857)
     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:467)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)




[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2010-05-25 Thread Ashutosh Chauhan (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871253#action_12871253 ]

Ashutosh Chauhan commented on PIG-766:
--

Dirk,

1. Are you getting the exact same stack trace as mentioned in the jira?
2. Which operations are you doing in your query - join, group-by, any other?
3. What load/store func are you using to read and write data? PigStorage or your own?
4. What is your data size, and how much memory is available to your tasks?
5. Do you have very large records in your dataset, like hundreds of MB for one record?

It would be great if you could paste here the script from which you get this exception; a hypothetical script of the general shape in question is sketched below for reference.
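
For illustration only (the actual script was never posted in this thread), a minimal Pig Latin sketch of the pattern the stack traces point at - a GROUP BY whose per-key bags travel through the map output and combiner - could look like this; the file, relation and field names are invented:

{code}
-- Hypothetical example; paths, relations and fields are placeholders.
raw      = LOAD 'input/events' USING PigStorage('\t')
           AS (user:chararray, url:chararray, ts:long);
by_user  = GROUP raw BY user;          -- builds one bag of tuples per user key
counts   = FOREACH by_user GENERATE group AS user, COUNT(raw) AS n;
STORE counts INTO 'output/user_counts';
{code}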




[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2009-04-16 Thread Vadim Zaliva (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699915#action_12699915 ]

Vadim Zaliva commented on PIG-766:
--

I have 1 GB now and could not go any higher.





[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2009-04-15 Thread Vadim Zaliva (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699277#action_12699277 ]

Vadim Zaliva commented on PIG-766:
--

Increasing the sort buffer to 500 MB did not work for me.

Since the implementation of many basic algorithms in Pig (like counting the number of records in a relation) requires GROUP BY, which can produce very long records (up to the number of tuples in the relation), this is a very serious problem: a record could potentially exceed the available Java heap. What are the strategies for overcoming this limitation? Does Pig plan to address this?
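
The counting pattern referred to above is normally expressed with an algebraic aggregate, which lets the combiner fold partial counts on the map side instead of shipping one enormous bag. A sketch, with invented names:

{code}
-- Count all records of a relation; COUNT is algebraic, so partial counts are
-- combined map-side rather than materializing the whole bag at once.
A = LOAD 'input/data' AS (f1:chararray, f2:int);
B = GROUP A ALL;
C = FOREACH B GENERATE COUNT(A) AS total;
STORE C INTO 'output/total_count';
{code}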





[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2009-04-15 Thread Vadim Zaliva (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699376#action_12699376 ]

Vadim Zaliva commented on PIG-766:
--

I have increased it to 500 MB and it is still not enough. I see this as a more general problem: at some point, the memory needed to process a big dataset will exceed any possible VM limit.





[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2009-04-15 Thread Olga Natkovich (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699427#action_12699427 ]

Olga Natkovich commented on PIG-766:


I did not mean increasing the combiner buffer size - I meant the overall memory given to the process.




[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2009-04-14 Thread Alan Gates (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698996#action_12698996 ]

Alan Gates commented on PIG-766:


It isn't the overall data size that matters; it is the size of the data for a given key. So if you have a 2G data set but it has only one key (that is, every row has that key), then you'll hit this problem (assuming you can't fit 2G in memory on your data nodes). Pig does try to spill to avoid this, but it has a hard time knowing when and how much to spill, and thus often runs out of memory.

But I think you're right that this isn't in the join. From the stack it looks like it's trying to write data out of the map task. Do you have very large rows in this data?
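
To make the single-hot-key case concrete, here is a hypothetical sketch (the path, relation and field names are invented); emitting the grouped bag itself, or feeding it to a non-algebraic UDF, forces the whole bag for each key to be built and serialized:

{code}
-- Hypothetical illustration of one key dominating the input.
logs    = LOAD 'input/logs' AS (key:chararray, payload:chararray);
grouped = GROUP logs BY key;
-- The bag 'logs' is kept whole per key here; if one key covers most of a 2 GB
-- input, its bag alone approaches 2 GB and must fit in the task heap or be
-- spilled by Pig.
result  = FOREACH grouped GENERATE group, logs;
STORE result INTO 'output/per_key';
{code}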




[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2009-04-14 Thread Vadim Zaliva (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699000#action_12699000 ]

Vadim Zaliva commented on PIG-766:
--

I have at most 17 million rows in my dataset.
At some point I do a GROUP BY, and the longest row has about 500,000 tuples.
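
One common way to keep such bags manageable is to project away everything the aggregate does not need before grouping, so each of the roughly 500,000 tuples in the largest bag carries as little data as possible. A sketch with invented names:

{code}
-- Project early: drop unused (and possibly large) fields before the GROUP BY.
events  = LOAD 'input/events' AS (id:chararray, attr:chararray, big_text:chararray);
slim    = FOREACH events GENERATE id;
grouped = GROUP slim BY id;
counts  = FOREACH grouped GENERATE group AS id, COUNT(slim) AS n;
STORE counts INTO 'output/id_counts';
{code}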






[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2009-04-14 Thread Olga Natkovich (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699017#action_12699017 ]

Olga Natkovich commented on PIG-766:


I asked a member of the Hadoop team to take a look. A possible problem is that there is a single record that does not fit into the combiner buffer. Hopefully we will get some help with this.




[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2009-04-14 Thread Olga Natkovich (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699019#action_12699019 ]

Olga Natkovich commented on PIG-766:


I got confirmation from the Hadoop developers that this is a case of one huge record that is larger than the combiner buffer, which means it is over 90 MB. Does that sound right for your data? Is it possible you have data corruption? Do you have another data set to try this query with?
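
One possible way to check for such a record is to count tuples per key and look at the largest groups; a sketch with invented names and an arbitrary threshold:

{code}
-- Find suspiciously large groups that could explain a record of over 90 MB.
data    = LOAD 'input/data' AS (key:chararray, rest:chararray);
grouped = GROUP data BY key;
sizes   = FOREACH grouped GENERATE group AS key, COUNT(data) AS n;
suspect = FILTER sizes BY n > 100000L;   -- arbitrary cut-off; adjust for your data
DUMP suspect;
{code}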




[jira] Commented: (PIG-766) ava.lang.OutOfMemoryError: Java heap space

2009-04-14 Thread Santhosh Srinivasan (JIRA)

[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699026#action_12699026 ]

Santhosh Srinivasan commented on PIG-766:
-

You can specify the I/O sort buffer size on the command line as:

java -Dio.sort.mb=200 -cp pig.jar:/path_to_hadoop_site.xml org.apache.pig.Main myscript.pig

Reference: http://hadoop.apache.org/core/docs/current/hadoop-default.html
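
As a related note, more recent Pig releases (around 0.8) can reportedly pass arbitrary job properties from inside the script via the set command; whether this works depends on the Pig version in use, so treat the following as an assumption rather than a guaranteed alternative to the command line above:

{code}
-- Assumes a Pig version whose 'set' command forwards arbitrary Hadoop properties
-- to the job configuration; older releases need the -D form shown above.
set io.sort.mb 200;
A = LOAD 'input/data' AS (key:chararray, val:int);
B = GROUP A BY key;
C = FOREACH B GENERATE group, SUM(A.val) AS total;
STORE C INTO 'output/sums';
{code}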
