Hi Prasanth,

Thanks for your quick reply. As I mentioned in the previous mail, this was the same stack trace in about 60 failed reducers. I am using Hive 1.2.1; I'm not sure which newer version you are referring to.
But exactly as you pointed out, when I tried to reproduce this issue on my local setup by simply writing a large number of columns, the stack trace did vary. Also, from the WriterImpl code, it appears that the stripes have already been flushed before the metadata is written. I may be mistaken, so please correct me if I'm wrong. This is one of the reasons I believe this is more than just a simple memory issue related to the number of columns. (A simplified sketch of the code I used for that local reproduction is at the bottom of this mail.)

On Wed, Aug 31, 2016 at 3:42 AM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> Under memory pressure, the stack trace of the OOM can be different depending
> on who is requesting more memory when the memory is already full. That is
> the reason you are seeing the OOM in writeMetadata (it may happen in other
> places as well). When dealing with thousands of columns it is better to set
> hive.exec.orc.default.buffer.size to a lower value until you can avoid the OOM.
> Depending on the version of Hive you are using, this may be set
> automatically for you. In older Hive versions, if the number of columns is
> >1000, the buffer size will be chosen automatically. In newer versions, this
> limit is removed and the ORC writer will figure out the optimal buffer size
> based on the stripe size, available memory and number of columns.
>
> Thanks
> Prasanth
>
>
> On Aug 30, 2016, at 3:04 PM, Hank baker <hankbake...@gmail.com> wrote:
>
> Hi all,
>
> I'm trying to run a map reduce job to convert CSV data into ORC using
> OrcNewOutputFormat (the reduce phase is required to satisfy some partitioning
> logic), but I am getting an OOM error in the reduce phase (during merge, to be
> exact) with the stack trace attached below, for one particular table which has
> about 800 columns. The error seems common across all reducers (the minimum
> reducer input is about 20 records, the maximum is about 100 million). I am
> trying to figure out the exact cause of the error, since I have used the same
> job to convert tables with 100-10000 columns without any memory or config
> changes.
>
> What concerns me in the stack trace is this line:
>
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.writeMetadata(WriterImpl.java:2327)
>
> Why is it going OOM while trying to write metadata?
>
> I originally believed this was simply due to the number of open buffers
> (as mentioned in
> http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3c543d5eb6.2000...@apache.org%3E).
> So I wrote a bit of code to reproduce the error on my local setup by creating
> an instance of OrcRecordWriter and writing a large number of columns. I did
> get a similar heap space error, however it was going OOM while trying to
> flush the stripes, with this in the stack trace:
>
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2133)
>
> This issue on the dev environment was resolved by setting
>
> hive.exec.orc.default.buffer.size=32k
>
> Will the same setting work for the original error?
>
> For different reasons I cannot change the reducer memory or lower the
> buffer size, even at a job level. For now, I am just trying to understand
> the source of this error. Can anyone please help?
>
> Original OOM stacktrace:
>
> FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
> at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewInputBuffer(OutStream.java:107)
> at org.apache.hadoop.hive.ql.io.orc.OutStream.write(OutStream.java:140)
> at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
> at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.writeMetadata(WriterImpl.java:2327)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2426)
> at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
> at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
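For reference, here is a simplified sketch of the local reproduction I mentioned above. It is not the exact code I ran: the path, column count, row count and values are made up, and it goes through the plain OrcFile writer API rather than OrcNewOutputFormat, but it exercises the same WriterImpl path.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class WideOrcWriteRepro {
  public static void main(String[] args) throws Exception {
    // Simplified sketch, not the exact repro code; column/row counts and path are placeholders.
    int numCols = 800;  // roughly the width of the problem table
    Configuration conf = new Configuration();
    // Lowering the default buffer size is what made the local OOM go away:
    // conf.set("hive.exec.orc.default.buffer.size", "32768");

    // Build a struct inspector with numCols string columns.
    List<String> names = new ArrayList<String>();
    List<ObjectInspector> inspectors = new ArrayList<ObjectInspector>();
    for (int i = 0; i < numCols; i++) {
      names.add("col" + i);
      inspectors.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    }
    ObjectInspector rowInspector =
        ObjectInspectorFactory.getStandardStructObjectInspector(names, inspectors);

    Writer writer = OrcFile.createWriter(new Path("/tmp/wide-repro.orc"),
        OrcFile.writerOptions(conf).inspector(rowInspector));

    // One wide row, written repeatedly until memory pressure builds up.
    List<Object> row = new ArrayList<Object>(numCols);
    for (int i = 0; i < numCols; i++) {
      row.add("some-reasonably-long-value-" + i);
    }
    for (long r = 0; r < 10000000L; r++) {
      writer.addRow(row);
    }
    writer.close();  // close() is where writeMetadata() runs
  }
}

For the real job, the equivalent knob would be setting hive.exec.orc.default.buffer.size on the job Configuration in the driver before OrcNewOutputFormat creates its writer (or via -Dhive.exec.orc.default.buffer.size=32768 if the driver goes through ToolRunner), which is exactly the setting I'm not able to change in this case.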