I used the CopyTable command to copy a table from the original cluster A to a new cluster B. I have noticed that the HBase rootdir on cluster B is more than 2x the size of the original, and I am trying to account for such a large difference. The following are some details about the table.
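One way to account for the difference is to compare the rootdir's direct children on both clusters instead of only the total. The sketch below is a hedged example, not a tested tool. It assumes you capture `hadoop fs -du /a_d` on each cluster into a file. Run it without `-s`, so each child is listed, and without `-h`, so sizes are plain bytes. The file names `du_clusterA.txt` / `du_clusterB.txt` and the helper `compare_du` are hypothetical. For every path it prints the size on A, the size on B and the B/A ratio, which shows whether the growth sits in the table directory or in other entries HBase keeps under the rootdir (for example write-ahead log directories).

```shell
#!/usr/bin/env bash
# Hedged sketch: compare per-directory HDFS usage of two HBase rootdirs.
# Capture one listing per cluster first (file names are hypothetical):
#   sudo -u hdfs hadoop fs -du /a_d > du_clusterA.txt   # on cluster A
#   sudo -u hdfs hadoop fs -du /a_d > du_clusterB.txt   # on cluster B
# Leave out -h so sizes are plain byte counts that awk can divide.

# compare_du FILE_A FILE_B  (hypothetical helper)
# Prints: path <TAB> bytes on A <TAB> bytes on B <TAB> B/A ratio
compare_du() {
  awk '
    # Skip header lines such as "Found 5 items"; keep rows that start with a byte count.
    $1 !~ /^[0-9]+$/ { next }
    # First field is the size and last field is the path, so both the
    # two-column and the three-column -du output formats are handled.
    FILENAME == ARGV[1] { a[$NF] = $1; next }
    { b[$NF] = $1 }
    END {
      for (p in a) seen[p] = 1
      for (p in b) seen[p] = 1
      for (p in seen) {
        sa = (p in a) ? a[p] : "0"
        sb = (p in b) ? b[p] : "0"
        ratio = (sa + 0 > 0) ? sprintf("%.2f", (sb + 0) / (sa + 0)) : "n/a"
        printf "%s\t%s\t%s\t%s\n", p, sa, sb, ratio
      }
    }
  ' "$1" "$2" | sort
}
```

Usage: `compare_du du_clusterA.txt du_clusterB.txt`. The same comparison can then be repeated one level down, on the table's own directory, to see whether specific regions account for most of the extra space.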
Could the bloom filter itself account for this?

The guide I used as a reference: http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters

The command that was supposedly used to create the table on cluster A (although describe on cluster A, below, reports BLOOMFILTER => 'NONE', VERSIONS => '3' and COMPRESSION => 'NONE'):

    create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}

How I created the target table on cluster B:

    create 'ADMd5','a',{ BLOOMFILTER => 'ROW', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0',
      SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==', '/zyuFR1VmhJyF4rbWsFnEg==', '0sZYnBd83ul58d1O8I2JnA==',
        '2+03N7IicZH3ltrqZUX6kQ==', '4+/slRQtkBDU7Px6C9MAbg==', '6+1dGCQ/IBrCsrNQXe/9xQ==',
        '7+2pvtpHUQHWkZJoouR9wQ==', '8+4n2deXhzmrpe//2Fo6Fg==', '9+4SKW/BmNzpL68cXwKV1Q==',
        'A+4ajStFkjEMf36cX5D9xg==', 'B+6Zm6Kccb3l6iM2L0epxQ==', 'C+6lKKDiOWl5qrRn72fNCw==',
        'D+6dZMyn7m+NhJ7G07gqaw==', 'E+6BrimmrpAd92gZJ5hyMw==', 'G+5tisu4xWZMOJnDHeYBJg==',
        'I+7fRy4dvqcM/L6dFRQk9g==', 'J+8ECMw1zeOyjfOg/ypXJA==', 'K+7tenLYn6a1aNLniL6tbg==']}

How the tables now appear in the hbase shell:

Cluster A:

    describe 'ADMd5'
    DESCRIPTION                                                            ENABLED
     {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE',   true
     REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
     MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
     IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
    1 row(s) in 0.0370 seconds

Cluster B:

    hbase(main):003:0> describe 'ADMd5'
    DESCRIPTION                                                            ENABLED
     {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'ROW',    true
     REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
     MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
     IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
    1 row(s) in 0.0280 seconds

Size of the containing folder in HDFS:

Cluster A:

    sudo -u hdfs hadoop fs -dus -h /a_d
    dus: DEPRECATED: Please use 'du -s' instead.
    227.4g  /a_d

Cluster B:

    sudo -u hdfs hadoop fs -dus -h /a_d
    dus: DEPRECATED: Please use 'du -s' instead.
    501.0g  /a_d

https://gist.github.com/drocsid/80bba7b6b19d64fde6c2
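To judge whether the ROW bloom filter on cluster B could plausibly explain the 273.6g gap (501.0g minus 227.4g), a back-of-the-envelope estimate helps. This sketch assumes the standard Bloom filter sizing formula, bits per key = -ln(p) / (ln 2)^2, and HBase's default false-positive rate of 1% (`io.storefile.bloom.error.rate` = 0.01). HBase's actual bloom blocks may differ somewhat from this figure. The helper name `bloom_bytes` is hypothetical, and the row count has to come from the real table, for example via `count 'ADMd5'` in the hbase shell.

```shell
#!/usr/bin/env bash
# Hedged sketch: rough size of a ROW bloom filter for a given row count.
# Uses the textbook sizing formula bits_per_key = -ln(p) / (ln 2)^2;
# HBase's real bloom blocks can differ from this estimate.

# bloom_bytes ROWS [ERROR_RATE]  (hypothetical helper; ERROR_RATE defaults to 0.01)
bloom_bytes() {
  awk -v n="$1" -v p="${2:-0.01}" 'BEGIN {
    bits_per_key = -log(p) / (log(2) ^ 2)
    printf "%.2f bits/row, ~%.0f bytes total\n", bits_per_key, n * bits_per_key / 8
  }'
}
```

With p = 0.01 this works out to roughly 9.6 bits, or about 1.2 bytes, per row. A hypothetical table of one billion rows would therefore carry only about 1.2 GB of bloom data.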