Hi all, I want to calculate the size of the data in an HBase table according to the KeyValue format.
The output of scan 'test' looks like this:

010012010114200  column=s:STATION, timestamp=1378892292800, value=00001
010012010114200  column=s:YEAR, timestamp=1378892292800, value=2010
010012010114210  column=s:DAY, timestamp=1378892292800, value=14
010012010114210  column=s:HOUR, timestamp=1378892292800, value=21
010012010114210  column=s:MINUTE, timestamp=1378892292800, value=0
010012010114210  column=s:MONTH, timestamp=1378892292800, value=1

I want to calculate the record size:

Fixed part needed by the KeyValue format
  = key length + value length + row length + CF length + timestamp + key type
  = 4 + 4 + 2 + 1 + 8 + 1
  = 20 bytes

Variable part needed by the KeyValue format
  = row + column family + column qualifier + value

Total bytes required = fixed part + variable part:

s:STATION  = 20 + (15 + 1 + 7 + 5) = 48 bytes
s:YEAR     = 20 + (15 + 1 + 4 + 4) = 44 bytes
s:DAY      = 20 + (15 + 1 + 3 + 2) = 41 bytes
s:HOUR     = 20 + (15 + 1 + 4 + 2) = 42 bytes
s:MINUTE   = 20 + (15 + 1 + 6 + 1) = 43 bytes
s:MONTH    = 20 + (15 + 1 + 5 + 1) = 42 bytes

So one record needs 48 + 44 + 41 + 42 + 43 + 42 = 260 bytes, and I have 2 million records, so the total size is about 520 MB.

My question is: is this calculation method right? (I also tried to double-check the arithmetic with the small sketch below.)
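To double-check my arithmetic, here is a small sketch (assuming the HBase client jar is on the classpath; the class name KeyValueSize is just something I made up). It builds one org.apache.hadoop.hbase.KeyValue per cell from the scan output above and sums getLength(), which as far as I understand is the full serialized size, i.e. the 20 fixed bytes plus the variable part:

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyValueSize {
    public static void main(String[] args) {
        long ts = 1378892292800L;  // timestamp from the scan output
        byte[] family = Bytes.toBytes("s");  // column family "s"
        // row key, qualifier, value for each of the six cells above
        String[][] cells = {
            {"010012010114200", "STATION", "00001"},
            {"010012010114200", "YEAR",    "2010"},
            {"010012010114210", "DAY",     "14"},
            {"010012010114210", "HOUR",    "21"},
            {"010012010114210", "MINUTE",  "0"},
            {"010012010114210", "MONTH",   "1"},
        };
        int recordBytes = 0;
        for (String[] c : cells) {
            KeyValue kv = new KeyValue(Bytes.toBytes(c[0]), family,
                    Bytes.toBytes(c[1]), ts, Bytes.toBytes(c[2]));
            // getLength() should cover key length + value length + row length
            // + CF length + timestamp + key type (20 bytes) plus the variable part
            System.out.println("s:" + c[1] + " = " + kv.getLength() + " bytes");
            recordBytes += kv.getLength();
        }
        System.out.println("one record = " + recordBytes + " bytes");
        System.out.println("2,000,000 records = " + (2000000L * recordBytes) + " bytes");
    }
}

If getLength() really is the full serialized size, this should print 48, 44, 41, 42, 43 and 42, i.e. 260 bytes per record.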
--
In the Hadoop world I am just a novice, exploring the whole Hadoop ecosystem, and I hope one day I can contribute code of my own.
YanBit
[email protected]