I was only du'ing the table dir, and the tmp dirs held only a couple of hundred bytes in my case. The HFile tool reports avgKeyLen=46, which does not include the 4-byte KeyLength and 4-byte ValueLength fields stored with each KV. Adding those, I now do indeed get a total of 54 bytes/KV x 1.5 billion ~= 81GB. There are probably also some leftovers from HDFS blocks not being fully occupied.
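For anyone redoing the math, here is a rough sketch of the arithmetic in Python. It assumes the usual KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type, value). The 33-byte row and 1-byte family are my reading of the earlier (33+1+8) estimate, and the empty value is an assumption too; kv_bytes is just an illustrative helper, not anything from HBase.

```python
def kv_bytes(row_len, family_len, qualifier_len, value_len):
    """Return (key_len, total_len) in bytes for one KeyValue.

    Assumed layout: the key is row length (2) + row + family length (1)
    + family + qualifier + timestamp (8) + type (1); the full KV adds
    the key length (4) and value length (4) fields plus the value.
    """
    key_len = 2 + row_len + 1 + family_len + qualifier_len + 8 + 1
    total_len = 4 + 4 + key_len + value_len
    return key_len, total_len


# Assumptions: 33-byte row, 1-byte family, empty qualifier, empty value.
key_len, per_kv = kv_bytes(row_len=33, family_len=1, qualifier_len=0, value_len=0)

NUM_KVS = 1_500_000_000  # 1.5 billion KVs, as in the thread
total_gb = per_kv * NUM_KVS / 1e9

print(key_len, per_kv, total_gb)  # 46 54 81.0
```

This leaves out HFile block indexes, partially filled HDFS blocks and replication, so it is only a lower bound on what du reports.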
Thanks,
Sever

On Tue, Jul 3, 2012 at 2:29 PM, Stack <[email protected]> wrote:
> On Tue, Jul 3, 2012 at 2:17 PM, Sever Fundatureanu
> <[email protected]> wrote:
> > Right, forgot about the timestamps. These should be a long value each, so 8
> > bytes. The versioning is set to 1 so it shouldn't count.
> > Note the column qualifier is also void on each entry.
> >
> > So now we get (33+1+8)x1.5*10^9 = 63GB, still a 19GB difference...
> >
>
> What about regionserver WAL logs? You including these in your math or
> are you just du'ing the table dir? The table dir can have tmp dirs
> for compaction and split work. And after Michael Segel, the KV has a
> type byte as well as some lengths for finding offsets in KV; take a
> looksee w/ the hfile tool:
> http://hbase.apache.org/book.html#hfile_tool2
>
> St.Ack
>

-- 
Sever Fundatureanu
Vrije Universiteit Amsterdam
E-mail: [email protected]
