Thanks for the responses, guys. I tried a few different compression sizes and none of them worked. I guess our use case is not a good candidate for ORC or Parquet (which I also tried, and it failed as well). We will use some other file format.
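For reference, what I tried was roughly John's suggestion below applied to the original CTAS, just with different buffer sizes, along the lines of:

    create table orc_table
      stored as orc
      tblproperties ("orc.compress.size"="8192")
      as select * from text_table;

(with the "8192" swapped for smaller values on later attempts).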
Thanks again.

On Fri, May 16, 2014 at 2:26 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> With Hive 0.13 the ORC memory issue is mitigated because of this optimization: https://issues.apache.org/jira/browse/HIVE-6455. This optimization is enabled by default.
> But having 3283 columns is still huge, so I would still recommend reducing the default compression buffer size (256KB) to a lower value, as suggested by John.
>
> Thanks
> Prasanth Jayachandran
>
> On May 16, 2014, at 12:31 PM, John Omernik <j...@omernik.com> wrote:
>
> When I created the table, I had to reduce orc.compress.size quite a bit to make my table with many columns work. This was on Hive 0.12 (I thought it was supposed to be fixed in Hive 0.13, but 3k+ columns is huge). The default orc.compress.size is quite a bit larger (I think in the 268k range). Try moving it smaller and smaller if that level doesn't work. Good luck.
>
> STORED AS orc tblproperties ("orc.compress.size"="8192");
>
> On Thu, May 15, 2014 at 8:11 PM, Premal Shah <premal.j.s...@gmail.com> wrote:
>
>> I have a table in Hive stored as a text file with 3283 columns. All columns are of the string data type.
>>
>> I'm trying to convert that table into an ORC table using this command:
>>
>> create table orc_table stored as orc as select * from text_table;
>>
>> This is the setting under mapred-site.xml:
>>
>> <property>
>>   <name>mapred.child.java.opts</name>
>>   <value>-Xmx4G -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -verbose:gc -Xloggc:/mnt/hadoop/@taskid@.gc</value>
>>   <final>true</final>
>> </property>
>>
>> The tasks die with this error:
>>
>> 2014-05-16 00:53:42,424 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
>>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
>>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>>     at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
>>     at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
>>     at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
>>     at org.apache.hadoop.hive.ql.io.orc.RunLengthByteWriter.flush(RunLengthByteWriter.java:58)
>>     at org.apache.hadoop.hive.ql.io.orc.BitFieldWriter.flush(BitFieldWriter.java:44)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:553)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$ListTreeWriter.writeStripe(WriterImpl.java:1455)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:221)
>>     at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
>>     at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2028)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:86)
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>>
>> This is the GC output for a task that ran out of memory:
>>
>> 0.690: [GC 17024K->768K(83008K), 0.0019170 secs]
>> 0.842: [GC 8488K(83008K), 0.0066800 secs]
>> 1.031: [GC 17792K->1481K(83008K), 0.0015400 secs]
>> 1.352: [GC 17142K(83008K), 0.0041840 secs]
>> 1.371: [GC 18505K->2249K(83008K), 0.0097240 secs]
>> 34.779: [GC 28384K(4177280K), 0.0014050 secs]
>>
>> Anything I can tweak to make it work?
>>
>> --
>> Regards,
>> Premal Shah.

--
Regards,
Premal Shah.
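A rough sense of the numbers discussed above (back-of-the-envelope only; the per-column stream count is an assumption, since the thread does not state it): the ORC writer keeps a compression buffer per stream, and a string column typically has a few streams, so at the default 256 KB buffer size a 3283-column table needs on the order of 3283 x 3 x 256 KB, or about 2.4 GB, of buffer space in a single writer, which is most of a 4 GB task heap before anything else is allocated. Dropping orc.compress.size to 8 KB brings the same estimate down to roughly 80 MB.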