Hi Colin,

Just to make sure I have this right:

Is table A on the source cluster and uncompressed, while table B is on the
destination cluster and SNAPPY compressed? If that is correct, the ratio
should be the opposite: the compressed copy should be the smaller one. Can
you run hadoop fs -du -h on the table directory to see whether all regions
are evenly bigger or whether something else is wrong?
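
For example, something along these lines would do. This is only a rough
sketch: /a_d is the path from your mail, and the helper names list_sizes and
size_ratio are made up here for illustration. The first helper lists the size
of each entry under the table directory, and the second computes the ratio
between the two totals you reported (501.0g and 227.4g).

```shell
#!/bin/sh
# Sketch only. list_sizes and size_ratio are hypothetical helper names.

# List the size of every entry directly under an HDFS directory,
# e.g. list_sizes /a_d (run it on both clusters and compare).
list_sizes() {
  sudo -u hdfs hadoop fs -du -h "$1"
}

# Ratio between two totals, rounded to two decimals.
size_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Totals quoted in this thread: destination 501.0g, source 227.4g.
size_ratio 501.0 227.4   # prints 2.20
```

If every entry on the destination is roughly 2.2 times its counterpart on the
source, the growth is uniform; if only a few entries stand out, those are the
ones to look at first.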


2014-08-07 20:44 GMT-04:00 Colin Kincaid Williams <disc...@uw.edu>:

> I haven't yet tried to major compact table B. I will look up some
> documentation on WALs and snapshots to find this information in the hdfs
> filesystem tomorrow. Could it be caused by the bloomfilter existing on
> table B, but not table A? The funny thing is the source table is smaller
> than the destination.
>
>
> On Thu, Aug 7, 2014 at 4:50 PM, Esteban Gutierrez <este...@cloudera.com>
> wrote:
>
> > Hi Colin,
> >
> > Have you verified whether the content of /a_d includes WALs, snapshot
> > contents, or the HBase archive? Have you tried to major compact
> > table B? Does it make any difference?
> >
> > regards,
> > esteban.
> >
> >
> >
> > --
> > Cloudera, Inc.
> >
> >
> >
> > On Thu, Aug 7, 2014 at 2:00 PM, Colin Kincaid Williams <disc...@uw.edu>
> > wrote:
> >
> > > I used the copy table command to copy a database between the original
> > > cluster A and a new cluster B. I have noticed that the rootdir is larger
> > > than 2X the size of the original. I am trying to account for such a large
> > > difference. The following are some details about the table.
> > >
> > >
> > > I'm trying to figure out why my copied table is more than 2X the size of
> > > the original table. Could the bloomfilter itself account for this?
> > >
> > > The guide I used as a reference:
> > >
> > >
> > > http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
> > >
> > >
> > >
> > > Supposedly the original command used to create the table on cluster A:
> > >
> > > create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1',
> > > COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
> > >
> > >
> > > How I created the target table on cluster B:
> > >
> > > create 'ADMd5','a',{
> > > BLOOMFILTER => 'ROW',
> > > VERSIONS => '1',
> > > COMPRESSION => 'SNAPPY',
> > > MIN_VERSIONS => '0',
> > > SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
> > > '/zyuFR1VmhJyF4rbWsFnEg==',
> > > '0sZYnBd83ul58d1O8I2JnA==',
> > > '2+03N7IicZH3ltrqZUX6kQ==',
> > > '4+/slRQtkBDU7Px6C9MAbg==',
> > > '6+1dGCQ/IBrCsrNQXe/9xQ==',
> > > '7+2pvtpHUQHWkZJoouR9wQ==',
> > > '8+4n2deXhzmrpe//2Fo6Fg==',
> > > '9+4SKW/BmNzpL68cXwKV1Q==',
> > > 'A+4ajStFkjEMf36cX5D9xg==',
> > > 'B+6Zm6Kccb3l6iM2L0epxQ==',
> > > 'C+6lKKDiOWl5qrRn72fNCw==',
> > > 'D+6dZMyn7m+NhJ7G07gqaw==',
> > > 'E+6BrimmrpAd92gZJ5hyMw==',
> > > 'G+5tisu4xWZMOJnDHeYBJg==',
> > > 'I+7fRy4dvqcM/L6dFRQk9g==',
> > > 'J+8ECMw1zeOyjfOg/ypXJA==',
> > > 'K+7tenLYn6a1aNLniL6tbg==']}
> > >
> > >
> > > How the tables now appear in hbase shell:
> > >
> > > table A:
> > >
> > > describe 'ADMd5'
> > > DESCRIPTION                                                          ENABLED
> > >  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE', true
> > >  REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
> > >  MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
> > >  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > >
> > >
> > > 1 row(s) in 0.0370 seconds
> > >
> > >
> > > table B:
> > >
> > > hbase(main):003:0> describe 'ADMd5'
> > > DESCRIPTION                                                         ENABLED
> > >  {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'ROW', true
> > >  REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
> > >  MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
> > >  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > >
> > >
> > > 1 row(s) in 0.0280 seconds
> > >
> > >
> > >
> > > The size of the containing folder in HDFS:
> > > table A:
> > > sudo -u hdfs hadoop fs -dus -h /a_d
> > > dus: DEPRECATED: Please use 'du -s' instead.
> > > 227.4g  /a_d
> > >
> > > table B:
> > > sudo -u hdfs hadoop fs -dus -h /a_d
> > > dus: DEPRECATED: Please use 'du -s' instead.
> > > 501.0g  /a_d
> > >
> > >
> > > https://gist.github.com/drocsid/80bba7b6b19d64fde6c2
> > >
> >
>
