Thanks for the responses, guys. I tried a few different compression sizes and none of them worked. I guess our use case is not a good candidate for ORC or Parquet (which I also tried, and it failed as well). We will use some other file format.
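For reference, what I tried was roughly John's suggestion below applied to the original CTAS, just with different buffer sizes, along the lines of:

    create table orc_table
      stored as orc
      tblproperties ("orc.compress.size"="8192")
      as select * from text_table;

(with the "8192" swapped for smaller values on later attempts).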
Thanks again.

On Fri, May 16, 2014 at 2:26 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> With Hive 0.13 the ORC memory issue is mitigated because of this optimization: https://issues.apache.org/jira/browse/HIVE-6455. This optimization is enabled by default.
> But having 3283 columns is still huge, so I would still recommend reducing the default compression buffer size (256KB) to a lower value, as suggested by John.
>
> Thanks
> Prasanth Jayachandran
>
> On May 16, 2014, at 12:31 PM, John Omernik <j...@omernik.com> wrote:
>
> When I created the table, I had to reduce orc.compress.size quite a bit to make my table with many columns work. This was on Hive 0.12 (I thought it was supposed to be fixed in Hive 0.13, but 3k+ columns is huge). The default orc.compress.size is quite a bit larger (I think in the 268k range). Try moving it smaller and smaller if that level doesn't work. Good luck.
>
> STORED AS orc tblproperties ("orc.compress.size"="8192");
>
> On Thu, May 15, 2014 at 8:11 PM, Premal Shah <premal.j.s...@gmail.com> wrote:
>
>> I have a table in Hive stored as a text file with 3283 columns. All columns are of the string data type.
>>
>> I'm trying to convert that table into an ORC table using this command:
>>
>> create table orc_table stored as orc as select * from text_table;
>>
>> This is the setting under mapred-site.xml:
>>
>> <property>
>>   <name>mapred.child.java.opts</name>
>>   <value>-Xmx4G -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -verbose:gc -Xloggc:/mnt/hadoop/@taskid@.gc</value>
>>   <final>true</final>
>> </property>
>>
>> The tasks die with this error:
>>
>> 2014-05-16 00:53:42,424 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
>>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
>>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>>     at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
>>     at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
>>     at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
>>     at org.apache.hadoop.hive.ql.io.orc.RunLengthByteWriter.flush(RunLengthByteWriter.java:58)
>>     at org.apache.hadoop.hive.ql.io.orc.BitFieldWriter.flush(BitFieldWriter.java:44)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:553)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$ListTreeWriter.writeStripe(WriterImpl.java:1455)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:221)
>>     at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
>>     at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
>>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2028)
>>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:86)
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>>
>> This is the GC output for a task that ran out of memory:
>>
>> 0.690: [GC 17024K->768K(83008K), 0.0019170 secs]
>> 0.842: [GC 8488K(83008K), 0.0066800 secs]
>> 1.031: [GC 17792K->1481K(83008K), 0.0015400 secs]
>> 1.352: [GC 17142K(83008K), 0.0041840 secs]
>> 1.371: [GC 18505K->2249K(83008K), 0.0097240 secs]
>> 34.779: [GC 28384K(4177280K), 0.0014050 secs]
>>
>> Anything I can tweak to make it work?
>>
>> --
>> Regards,
>> Premal Shah.

--
Regards,
Premal Shah.
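A rough sense of the numbers discussed above (back-of-the-envelope only; the per-column stream count is an assumption, since the thread does not state it): the ORC writer keeps a compression buffer per stream, and a string column typically has a few streams, so at the default 256 KB buffer size a 3283-column table needs on the order of 3283 x 3 x 256 KB, or about 2.4 GB, of buffer space in a single writer, which is most of a 4 GB task heap before anything else is allocated. Dropping orc.compress.size to 8 KB brings the same estimate down to roughly 80 MB.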