ORC file block size

2014-07-25 Thread John Zeng
Hi, owner of org.apache.hadoop.hive.ql.io.orc.WriterImpl.java:

When writing an ORC file using the following code snippet:

   Writer writer = OrcFile.createWriter(new Path("/my_file_path"),
       OrcFile.writerOptions(conf)
           .inspector(inspector)
           .stripeSize(my_stripe_size)
           .bufferSize(my_buffer_size)
           .version(OrcFile.Version.V_0_12));

   /* code to prepare tslist */
   for (Timestamp ts : tslist) {
       writer.addRow(ts);
   }

   writer.close();
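
(For reference, here is the snippet as a self-contained repro, with the elided pieces filled in as a hedged sketch: the inspector is built via reflection, the same approach Hive's own ORC tests take, and my_stripe_size is presumably 10 here, given the "20" in the exception below. The path and timestamp values are illustrative placeholders.)

   import java.sql.Timestamp;
   import java.util.Arrays;
   import java.util.List;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.hive.ql.io.orc.OrcFile;
   import org.apache.hadoop.hive.ql.io.orc.Writer;
   import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
   import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

   public class OrcBlockSizeRepro {
     public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();

       // Reflection-based inspector for a single Timestamp column,
       // as in Hive's TestOrcFile.
       ObjectInspector inspector = ObjectInspectorFactory.getReflectionObjectInspector(
           Timestamp.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);

       long my_stripe_size = 10;   // tiny on purpose: 2 * 10 = 20 < 1048576
       int my_buffer_size = 10000;

       Writer writer = OrcFile.createWriter(new Path("/my_file_path"),
           OrcFile.writerOptions(conf)
               .inspector(inspector)
               .stripeSize(my_stripe_size)
               .bufferSize(my_buffer_size)
               .version(OrcFile.Version.V_0_12));

       // Prepare tslist (values illustrative).
       List<Timestamp> tslist = Arrays.asList(
           Timestamp.valueOf("2014-07-25 15:53:00"),
           Timestamp.valueOf("2014-07-25 15:54:00"));
       for (Timestamp ts : tslist) {
         writer.addRow(ts);
       }
       writer.close();
     }
   }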

I got the following error:

org.apache.hadoop.ipc.RemoteException(java.io.IOException): Specified block 
size is less than configured minimum value 
(dfs.namenode.fs-limits.min-block-size): 20 < 1048576
   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2215)
   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2180)
   at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505)
   at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
   at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)

After debugging into the code, I found this line in WriterImpl.java (line 169):

this.blockSize = Math.min(MAX_BLOCK_SIZE, 2 * stripeSize);

Basically, the block size is set to twice the stripe size.  Since there is no 
guarantee that the stripe size is at least half of the minimum block size (i.e. 
1048576, the default of dfs.namenode.fs-limits.min-block-size), this exception 
is inevitable whenever a small stripe size is configured.
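
For example, judging from the "20" in the exception above, my_stripe_size was 
presumably 10 here, so blockSize = Math.min(MAX_BLOCK_SIZE, 2 * 10) = 20 bytes, 
well below the 1048576-byte minimum.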

Do we need to change the code so that blockSize is at least the value of 
dfs.namenode.fs-limits.min-block-size?  It would also be nice to have a separate 
option for blockSize, instead of it always being twice the stripe size.

Thanks

John


Re: ORC file block size

2014-07-25 Thread Prasanth Jayachandran
In hive-0.14, the block size is configurable. It is no longer fixed at twice 
the stripe size.
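
For example (a sketch against the 0.14 writer API, where WriterOptions exposes 
a blockSize setter so the HDFS block size can be set independently of the 
stripe size; the sizes below are illustrative):

   Writer writer = OrcFile.createWriter(new Path("/my_file_path"),
       OrcFile.writerOptions(conf)
           .inspector(inspector)
           .stripeSize(64L * 1024 * 1024)     // 64 MB stripes
           .blockSize(256L * 1024 * 1024)     // HDFS block size, set independently
           .bufferSize(256 * 1024)
           .version(OrcFile.Version.V_0_12));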

Thanks
Prasanth

