cool thanks, will try

2015-09-24 9:32 GMT+01:00 Prasanth Jayachandran <pjayachand...@hortonworks.com>:
> With 650 columns you might need to reduce the compression buffer size to
> 8KB, down from the default of 256KB (maybe try decreasing it further if that
> fails, or increasing it if it succeeds, to find the right size). You can do
> that by setting orc.compress.size in tblproperties.
>
> On Sep 24, 2015, at 3:27 AM, Patrick Duin <patd...@gmail.com> wrote:
>
> Thanks for the reply,
> My first thought was out of memory as well, but the IllegalArgumentException
> happens before; it is a separate entry in the log, and the OOM exception is
> not its cause. So I am not sure where that OOM exception fits in. I've tried
> running it with more memory and got the same problem; it was also
> consistently failing on the same split.
> We have about 650 columns. I don't know how many record writers are open
> (how can I see that?).
> I'll try running it with a reduced stripe size and see if that helps.
> The weird thing is we have a production cluster running the same hadoop/hive
> versions, the same code and the same data, and it processes just fine. I get
> this error only in our QA cluster.
> It's hard to locate the difference :).
> Anyway, thanks for the pointers, I'll do some more digging.
>
> Cheers,
> Patrick
>
> 2015-09-24 0:51 GMT+01:00 Prasanth Jayachandran <pjayachand...@hortonworks.com>:
>
>> Looks like you are running out of memory. Try increasing the heap memory
>> or reducing the stripe size. How many columns are you writing? Any idea
>> how many record writers are open per map task?
>>
>> - Prasanth
>>
>> On Sep 22, 2015, at 4:32 AM, Patrick Duin <patd...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am struggling to understand a stack trace I am getting while trying to
>> write an ORC file. I am using hive-0.13.0/hadoop-2.4.0.
>>
>> 2015-09-21 09:15:44,603 INFO [main] org.apache.hadoop.mapred.MapTask:
>> Ignoring exception during close for
>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@2ce49e21
>> java.lang.IllegalArgumentException: Column has wrong number of index entries
>> found: org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry$Builder@6eeb967b expected: 1
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:578)
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1398)
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
>>   at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
>>   at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
>>   at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>
>> 2015-09-21 09:15:45,988 FATAL [main] org.apache.hadoop.mapred.YarnChild:
>> Error running child : java.lang.OutOfMemoryError: Java heap space
>>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
>>   at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
>>   at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
>>   at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:583)
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
>>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
>>   at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
>>   at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>
>> I've seen https://issues.apache.org/jira/browse/HIVE-9080 and I think that
>> might be related.
>>
>> I am not using Hive directly, though; I am using a map-only job that writes
>> via OrcNewOutputFormat.
>>
>> Any pointers would be appreciated. Has anyone seen this before?
>>
>> Thanks,
>>
>> Patrick
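For a map-only job that writes through OrcNewOutputFormat (rather than through a
Hive table, where Prasanth's orc.compress.size tblproperty would apply), the
compression buffer and stripe sizes would come from the job Configuration. Below
is a minimal sketch of how that could look, assuming the
hive.exec.orc.default.buffer.size and hive.exec.orc.default.stripe.size keys are
what Hive 0.13's OrcFile.writerOptions(conf) reads (worth verifying against that
exact build); the 8KB and 64MB values and the class/job names are illustrative
only, not taken from the thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;

public class OrcWriteJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Assumed default keys picked up when the ORC writer is created from the
    // job Configuration. The writer keeps compression buffers per column
    // stream, so with ~650 columns a smaller buffer cuts heap use sharply.
    conf.setInt("hive.exec.orc.default.buffer.size", 8 * 1024);           // down from 256KB
    conf.setLong("hive.exec.orc.default.stripe.size", 64L * 1024 * 1024); // smaller stripes flush sooner

    Job job = Job.getInstance(conf, "orc-write");
    job.setJarByClass(OrcWriteJob.class);
    job.setOutputFormatClass(OrcNewOutputFormat.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Writable.class);
    job.setNumReduceTasks(0); // map-only, as in the original job

    // ... mapper class, input format and input/output paths as already configured ...

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If the ORC writer is instead created directly, the same knobs should be settable
explicitly on the writer options, e.g. OrcFile.writerOptions(conf).bufferSize(8 * 1024).stripeSize(64L * 1024 * 1024),
though that path bypasses OrcNewOutputFormat.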