Hi Colin,

Have you verified whether the contents of /a_d include WALs, snapshot data, or the HBase archive? Have you tried major compacting table B? Does it make any difference?
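As a rough sketch of how to check this: list the size of each entry directly under the root directory, largest first, and see whether anything other than the table itself (WALs, archived HFiles, snapshots) accounts for the gap. The helper name `rank_by_size` is made up for illustration, and the directory names to look for depend on the HBase version (for example `.logs`, `.oldlogs`, or `.archive` on older releases), so treat those as assumptions. The path /a_d and the table name come from the original message.

```shell
#!/bin/sh
# Hypothetical helper: sort `hadoop fs -du` output largest first.
# It relies on the size in bytes being the first column, so run du
# without -h (human-readable sizes would not sort numerically).
rank_by_size() {
    sort -k1,1nr
}

# Usage on each cluster (needs HDFS access, so shown as comments):
#   sudo -u hdfs hadoop fs -du /a_d | rank_by_size
#
# Then, in the HBase shell on cluster B, trigger a major compaction
# and compare the sizes again once it has finished:
#   major_compact 'ADMd5'
```

If the table directory itself is still much larger on cluster B after the major compaction, the extra space is in the store files rather than in logs or archived data.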
regards,
esteban.

--
Cloudera, Inc.

On Thu, Aug 7, 2014 at 2:00 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:

> I used the copy table command to copy a database between the original
> cluster A and a new cluster B. I have noticed that the rootdir is larger
> than 2X the size of the original. I am trying to account for such a large
> difference. The following are some details about the table.
>
> I'm trying to figure out why my copied table is more than 2X the size of
> the original table. Could the bloomfilter itself account for this?
>
> The guide I used as a reference:
>
> http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
>
> Supposedly the original command used to create the table on cluster A:
>
> create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1',
> COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
>
> How I created the target table on cluster B:
>
> create 'ADMd5','a',{
> BLOOMFILTER => 'ROW',
> VERSIONS => '1',
> COMPRESSION => 'SNAPPY',
> MIN_VERSIONS => '0',
> SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
> '/zyuFR1VmhJyF4rbWsFnEg==',
> '0sZYnBd83ul58d1O8I2JnA==',
> '2+03N7IicZH3ltrqZUX6kQ==',
> '4+/slRQtkBDU7Px6C9MAbg==',
> '6+1dGCQ/IBrCsrNQXe/9xQ==',
> '7+2pvtpHUQHWkZJoouR9wQ==',
> '8+4n2deXhzmrpe//2Fo6Fg==',
> '9+4SKW/BmNzpL68cXwKV1Q==',
> 'A+4ajStFkjEMf36cX5D9xg==',
> 'B+6Zm6Kccb3l6iM2L0epxQ==',
> 'C+6lKKDiOWl5qrRn72fNCw==',
> 'D+6dZMyn7m+NhJ7G07gqaw==',
> 'E+6BrimmrpAd92gZJ5hyMw==',
> 'G+5tisu4xWZMOJnDHeYBJg==',
> 'I+7fRy4dvqcM/L6dFRQk9g==',
> 'J+8ECMw1zeOyjfOg/ypXJA==',
> 'K+7tenLYn6a1aNLniL6tbg==']}
>
> How the tables now appear in hbase shell:
>
> table A:
>
> describe 'ADMd5'
> DESCRIPTION                                                       ENABLED
> {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER =>       true
> 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION
> => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE
> => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> 1 row(s) in 0.0370 seconds
>
> table B:
>
> hbase(main):003:0> describe 'ADMd5'
> DESCRIPTION                                                       ENABLED
> {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER =>       true
> 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION
> => 'SNAPPY', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE
> => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> 1 row(s) in 0.0280 seconds
>
> The containing foldersize in hdfs:
>
> table A:
> sudo -u hdfs hadoop fs -dus -h /a_d
> dus: DEPRECATED: Please use 'du -s' instead.
> 227.4g /a_d
>
> table B:
> sudo -u hdfs hadoop fs -dus -h /a_d
> dus: DEPRECATED: Please use 'du -s' instead.
> 501.0g /a_d
>
> https://gist.github.com/drocsid/80bba7b6b19d64fde6c2