Hi Colin,

Have you verified whether the content of /a_d includes WALs, snapshot
data, or the HBase archive? Have you tried to major compact table B?
Does it make any difference?
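
One way to check is to list the size of each directory directly under the
rootdir and see which ones dominate. The snippet below is only a rough
sketch: the helper name largest_first is made up for illustration, and it
assumes that `hadoop fs -du` (without -h) prints a raw byte count in the
first column, which can differ between Hadoop versions. The directory
names to look for also depend on the HBase version (for example .logs,
.oldlogs and .archive on 0.94; WALs, oldWALs and archive on 0.96 and
later).

```shell
#!/bin/sh
# Sort `hadoop fs -du` output (read from stdin) so the largest entries
# come first.
# Assumption: the first column is a raw byte count (no -h flag). Lines that
# do not start with a number, such as a "Found N items" header, are dropped.
largest_first() {
    grep -E '^[0-9]+[[:space:]]' | sort -k1,1nr
}

# Usage on each cluster (/a_d is the rootdir from this thread):
#   sudo -u hdfs hadoop fs -du /a_d | largest_first
```

If the table directory itself is the largest entry on B, running
major_compact 'ADMd5' in the hbase shell and measuring again once the
compaction has finished should show whether compaction changes the
picture.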

regards,
esteban.



--
Cloudera, Inc.



On Thu, Aug 7, 2014 at 2:00 PM, Colin Kincaid Williams <disc...@uw.edu>
wrote:

> I used the CopyTable command to copy a table from the original cluster A
> to a new cluster B. I have noticed that the rootdir on B is more than 2X
> the size of the original, and I am trying to account for such a large
> difference. Could the bloomfilter itself account for this? The following
> are some details about the table.
>
> The guide I used as a reference:
>
> http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
>
>
>
> Supposedly the original command used to create the table on cluster A:
>
> create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1',
> COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
>
>
> How I created the target table on cluster B:
>
> create 'ADMd5','a',{
> BLOOMFILTER => 'ROW',
> VERSIONS => '1',
> COMPRESSION => 'SNAPPY',
> MIN_VERSIONS => '0',
> SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
> '/zyuFR1VmhJyF4rbWsFnEg==',
> '0sZYnBd83ul58d1O8I2JnA==',
> '2+03N7IicZH3ltrqZUX6kQ==',
> '4+/slRQtkBDU7Px6C9MAbg==',
> '6+1dGCQ/IBrCsrNQXe/9xQ==',
> '7+2pvtpHUQHWkZJoouR9wQ==',
> '8+4n2deXhzmrpe//2Fo6Fg==',
> '9+4SKW/BmNzpL68cXwKV1Q==',
> 'A+4ajStFkjEMf36cX5D9xg==',
> 'B+6Zm6Kccb3l6iM2L0epxQ==',
> 'C+6lKKDiOWl5qrRn72fNCw==',
> 'D+6dZMyn7m+NhJ7G07gqaw==',
> 'E+6BrimmrpAd92gZJ5hyMw==',
> 'G+5tisu4xWZMOJnDHeYBJg==',
> 'I+7fRy4dvqcM/L6dFRQk9g==',
> 'J+8ECMw1zeOyjfOg/ypXJA==',
> 'K+7tenLYn6a1aNLniL6tbg==']}
>
>
> How the tables now appear in hbase shell:
>
> table A:
>
> describe 'ADMd5'
> DESCRIPTION                                                           ENABLED
>  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE',   true
>  REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
>  MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
>  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>
> 1 row(s) in 0.0370 seconds
>
>
> table B:
>
> hbase(main):003:0> describe 'ADMd5'
> DESCRIPTION                                                           ENABLED
>  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'ROW',    true
>  REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
>  MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
>  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>
> 1 row(s) in 0.0280 seconds
>
>
>
> The size of the containing folder in HDFS:
> table A:
> sudo -u hdfs hadoop fs -dus -h /a_d
> dus: DEPRECATED: Please use 'du -s' instead.
> 227.4g  /a_d
>
> table B:
> sudo -u hdfs hadoop fs -dus -h /a_d
> dus: DEPRECATED: Please use 'du -s' instead.
> 501.0g  /a_d
>
>
> https://gist.github.com/drocsid/80bba7b6b19d64fde6c2
>
