You can run ``hbase org.apache.hadoop.hbase.io.hfile.HFile -f "$region" -m'' once per HFile, where $region is the path to an HFile (they are located under /hbase/$table/*/$family). This is rather slow [1], for some reason I don't quite understand, but it's many orders of magnitude faster than MapReducing the entire table. The output includes fields such as "entryCount" (number of cells in this file), "totalBytes" (size of the uncompressed data), "length" (actual size on disk), "avgKeyLen" (average number of bytes in a key), and "avgValueLen" (average number of bytes stored in a cell).
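As a rough sketch of how the per-file output could be rolled up into table-level numbers, something like the Python below might work. It assumes the tool prints each metric as "name=value", which may well differ between HBase versions, so the regex is a guess to adjust against your real output. The function names (read_hfile_meta, parse_metrics, summarize) are made up for this example.

```python
import re
import subprocess
from collections import defaultdict

# Metric names come from the post above. The "name=value" layout is an
# ASSUMPTION about the tool's output; adjust the pattern to what you see.
METRIC_RE = re.compile(
    r"\b(entryCount|totalBytes|length|avgKeyLen|avgValueLen)\s*=\s*(\d+)"
)


def read_hfile_meta(hfile_path):
    """Run the HFile tool with -m on one HFile and return its text output."""
    result = subprocess.run(
        ["hbase", "org.apache.hadoop.hbase.io.hfile.HFile",
         "-f", hfile_path, "-m"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_metrics(output):
    """Return the first value seen for each known metric in the output."""
    metrics = {}
    for name, value in METRIC_RE.findall(output):
        metrics.setdefault(name, int(value))
    return metrics


def summarize(per_file_metrics):
    """Sum the additive metrics (cell count and sizes) across HFiles."""
    totals = defaultdict(int)
    for metrics in per_file_metrics:
        for name in ("entryCount", "totalBytes", "length"):
            totals[name] += metrics.get(name, 0)
    return dict(totals)
```

To use it, you would collect the HFile paths yourself (for example by listing /hbase/$table/*/$family), then call summarize(parse_metrics(read_hfile_meta(p)) for p in paths). Only the counts and sizes are summed; the averages are per-file values and would need weighting by entryCount to combine.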
This way you can get detailed information about your table. The results won't be up-to-date to the second, but they'll be pretty close.

[1] I recently ran this at SU on a table with about 1200 regions and it took 1h 15m to read the metadata of every HFile. I don't understand how this can take so much time.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
