Hi Prasanth,

Thanks for your quick reply. As I mentioned in the previous mail, the same
stack trace showed up in about 60 failed reducers. I am using Hive 1.2.1, so
I'm not sure which newer version you are referring to.

But exactly as you pointed out, when I tried to reproduce this issue on my
local setup by simply writing a large number of columns, the stack trace did
vary.
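
For reference, the local reproduction was essentially along the lines of the
sketch below (compiled against the 1.2.1 jars). It is a trimmed-down
approximation rather than the exact code: it goes through OrcFile.createWriter
directly instead of OrcNewOutputFormat, but both paths end up in the same
WriterImpl, and the column count, row count and column types are just
placeholders.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class WideOrcRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // On the dev setup the OOM went away after lowering the buffer size, e.g.
    // conf.set("hive.exec.orc.default.buffer.size", "32768");

    int numCols = 800;  // roughly the width of the problem table

    // Struct inspector with numCols string columns.
    List<String> names = new ArrayList<String>();
    List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
    for (int i = 0; i < numCols; i++) {
      names.add("col" + i);
      fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    }
    ObjectInspector inspector =
        ObjectInspectorFactory.getStandardStructObjectInspector(names, fieldOIs);

    Writer writer = OrcFile.createWriter(new Path("/tmp/wide_table.orc"),
        OrcFile.writerOptions(conf).inspector(inspector));

    // Write enough identical rows to force several stripes.
    List<String> row =
        new ArrayList<String>(Collections.nCopies(numCols, "some value"));
    for (long i = 0; i < 1000000L; i++) {
      writer.addRow(row);
    }
    writer.close();  // flushStripe()/writeMetadata() happen inside close()
  }
}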

Also, from the WriterImpl code, it appears that the stripes have already
been flushed by the time the metadata is written. I may be mistaken, so
please correct me if I'm wrong. This is one of the reasons I believe this is
more than just a simple memory issue related to the number of columns.
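
To make the ordering concrete, this is roughly how close() in the 1.2.1
WriterImpl reads to me. It is my paraphrase with simplified signatures, not
the actual source, so please correct it if I have misread the flow:

// rough paraphrase of WriterImpl.close() in Hive 1.2.1 (simplified, not the real code)
public void close() throws IOException {
  // ... callback and memory-manager bookkeeping ...
  flushStripe();      // remaining stripe data is flushed to the file first
  writeMetadata();    // stripe-level metadata; the frame that OOMs in my trace
  writeFooter();      // file footer
  writePostScript();  // postscript, then the underlying stream is closed
  rawWriter.close();
}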


On Wed, Aug 31, 2016 at 3:42 AM, Prasanth Jayachandran <
pjayachand...@hortonworks.com> wrote:

> Under memory pressure, the stack trace of the OOM can be different depending
> on who is requesting more memory when the memory is already full. That is
> the reason you are seeing the OOM in writeMetadata (it may happen in other
> places as well). When dealing with thousands of columns, it's better to set
> hive.exec.orc.default.buffer.size to a lower value until you can avoid the
> OOM. Depending on the version of Hive you are using, this may be set
> automatically for you. In older Hive versions, if the number of columns is
> >1000, the buffer size will be chosen automatically. In newer versions, this
> limit is removed and the ORC writer will figure out the optimal buffer size
> based on stripe size, available memory and number of columns.
>
> Thanks
> Prasanth
>
>
> On Aug 30, 2016, at 3:04 PM, Hank baker <hankbake...@gmail.com> wrote:
>
> Hi all,
>
> I'm trying to run a MapReduce job to convert CSV data into ORC using
> OrcNewOutputFormat (the reduce is required to satisfy some partitioning
> logic), but I'm getting an OOM error in the reduce phase (during the merge,
> to be exact) with the stack trace attached below, for one particular table
> which has about 800 columns. The error seems common across all reducers
> (minimum reducer input is about 20 records, maximum is about 100 million).
> I am trying to figure out the exact cause of the error, since I have used
> the same job to convert tables with 100-10000 columns without any memory or
> config changes.
>
> What concerns me in the stack trace is this line:
>
>       at org.apache.hadoop.hive.ql.io.orc.WriterImpl.writeMetadata(WriterImpl.java:2327)
>
> Why is it going OOM while trying to write metadata?
>
> I originally believed this was simply due to the number of open buffers (as
> mentioned in http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3c543d5eb6.2000...@apache.org%3E).
> So I wrote a bit of code to reproduce the error on my local setup by
> creating an instance of OrcRecordWriter and writing a large number of
> columns. I did get a similar heap space error; however, it was going OOM
> while trying to flush the stripes, with this in the stack trace:
>
>       at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2133)
>
> This issue on the dev environment got resolved by setting
>
> hive.exec.orc.default.buffer.size=32k
>
> Will the same setting work for the original error?
>
> For different reasons I cannot change the reducer memory or lower the
> buffer size even at a job level. For now, I am just trying to understand
> the source of this error. Can anyone please help?
>
> Original OOM stack trace:
>
> FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
>       at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>       at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>       at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewInputBuffer(OutStream.java:107)
>       at org.apache.hadoop.hive.ql.io.orc.OutStream.write(OutStream.java:140)
>       at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
>       at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
>       at org.apache.hadoop.hive.ql.io.orc.WriterImpl.writeMetadata(WriterImpl.java:2327)
>       at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2426)
>       at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
>       at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
>       at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
>
>
