The correct syntax is:

create 'ADMd5',
  {NAME => 'a', VERSIONS => '1', COMPRESSION => 'SNAPPY', BLOOMFILTER => 'ROW'},
  {SPLITS => ['/++ASUZm4u7YsTcF/VtK6Q==',
              '/zyuFR1VmhJyF4rbWsFnEg==',
              '0sZYnBd83ul58d1O8I2JnA==',
              '2+03N7IicZH3ltrqZUX6kQ==',
              '4+/slRQtkBDU7Px6C9MAbg==',
              '6+1dGCQ/IBrCsrNQXe/9xQ==',
              '7+2pvtpHUQHWkZJoouR9wQ==',
              '8+4n2deXhzmrpe//2Fo6Fg==',
              '9+4SKW/BmNzpL68cXwKV1Q==',
              'A+4ajStFkjEMf36cX5D9xg==',
              'B+6Zm6Kccb3l6iM2L0epxQ==',
              'C+6lKKDiOWl5qrRn72fNCw==',
              'D+6dZMyn7m+NhJ7G07gqaw==',
              'E+6BrimmrpAd92gZJ5hyMw==',
              'G+5tisu4xWZMOJnDHeYBJg==',
              'I+7fRy4dvqcM/L6dFRQk9g==',
              'J+8ECMw1zeOyjfOg/ypXJA==',
              'K+7tenLYn6a1aNLniL6tbg==']}

The column-family attributes (VERSIONS, COMPRESSION, BLOOMFILTER) must live in the hash that carries the NAME key; SPLITS is a table-level option and goes in its own, separate hash.
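To see why the broken form in the quoted command below succeeds silently instead of erroring, here is a deliberately simplified toy model (in Python, not HBase's actual Ruby implementation) of how the shell's create interprets its arguments: a bare string becomes a column family with default attributes, a hash containing a NAME key is a family descriptor, and a hash without NAME is taken as table-level options, where unrecognized keys like BLOOMFILTER are just ignored.

```python
# Toy model of HBase shell's `create` argument handling -- an illustration
# only, not the real implementation. Default family attributes mirror the
# `describe` output in the thread.
DEFAULT_FAMILY = {'BLOOMFILTER': 'NONE', 'VERSIONS': '3', 'COMPRESSION': 'NONE'}

def create(table, *args):
    families, table_opts = [], {}
    for arg in args:
        if isinstance(arg, str):
            # 'a' -> a family named 'a' with all-default attributes
            families.append(dict(DEFAULT_FAMILY, NAME=arg))
        elif isinstance(arg, dict) and 'NAME' in arg:
            # {NAME => 'a', ...} -> a family descriptor
            families.append(dict(DEFAULT_FAMILY, **arg))
        else:
            # a hash WITHOUT a NAME key -> table-level options (SPLITS, ...);
            # family-only keys placed here are silently ignored
            table_opts.update(arg)
    return {'NAME': table, 'FAMILIES': families, 'OPTS': table_opts}

# Broken form from the thread: 'a' already defined the family, so the hash
# becomes table options and the family keeps BLOOMFILTER 'NONE' etc.
broken = create('ADMd5', 'a',
                {'BLOOMFILTER': 'ROW', 'VERSIONS': '1', 'COMPRESSION': 'SNAPPY'})

# Correct form: family attributes travel with NAME; SPLITS in its own hash.
good = create('ADMd5',
              {'NAME': 'a', 'VERSIONS': '1', 'COMPRESSION': 'SNAPPY',
               'BLOOMFILTER': 'ROW'},
              {'SPLITS': ['/++ASUZm4u7YsTcF/VtK6Q==']})
```

With this model, `broken['FAMILIES'][0]['BLOOMFILTER']` comes back 'NONE' while `good['FAMILIES'][0]['BLOOMFILTER']` is 'ROW' -- exactly the difference visible in the two `describe` outputs quoted below.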
On Fri, Aug 8, 2014 at 12:23 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:

> I have discovered the error. I made the mistake regarding the compression
> and the bloom filter. The new table doesn't have them enabled, and the old
> one does. However, I'm wondering how I can create tables with splits, a
> bloom filter, and compression enabled. Shouldn't the following command
> return an error?
>
> hbase(main):001:0> create 'ADMd5','a',{
> hbase(main):002:1* BLOOMFILTER => 'ROW',
> hbase(main):003:1* VERSIONS => '1',
> hbase(main):004:1* COMPRESSION => 'SNAPPY',
> hbase(main):005:1* MIN_VERSIONS => '0',
> hbase(main):006:1* SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
> hbase(main):007:2* '/zyuFR1VmhJyF4rbWsFnEg==',
> hbase(main):008:2* '0sZYnBd83ul58d1O8I2JnA==',
> hbase(main):009:2* '2+03N7IicZH3ltrqZUX6kQ==',
> hbase(main):010:2* '4+/slRQtkBDU7Px6C9MAbg==',
> hbase(main):011:2* '6+1dGCQ/IBrCsrNQXe/9xQ==',
> hbase(main):012:2* '7+2pvtpHUQHWkZJoouR9wQ==',
> hbase(main):013:2* '8+4n2deXhzmrpe//2Fo6Fg==',
> hbase(main):014:2* '9+4SKW/BmNzpL68cXwKV1Q==',
> hbase(main):015:2* 'A+4ajStFkjEMf36cX5D9xg==',
> hbase(main):016:2* 'B+6Zm6Kccb3l6iM2L0epxQ==',
> hbase(main):017:2* 'C+6lKKDiOWl5qrRn72fNCw==',
> hbase(main):018:2* 'D+6dZMyn7m+NhJ7G07gqaw==',
> hbase(main):019:2* 'E+6BrimmrpAd92gZJ5hyMw==',
> hbase(main):020:2* 'G+5tisu4xWZMOJnDHeYBJg==',
> hbase(main):021:2* 'I+7fRy4dvqcM/L6dFRQk9g==',
> hbase(main):022:2* 'J+8ECMw1zeOyjfOg/ypXJA==',
> hbase(main):023:2* 'K+7tenLYn6a1aNLniL6tbg==',]}
> 0 row(s) in 1.8010 seconds
>
> hbase(main):024:0> describe 'ADMd5'
> DESCRIPTION                                                  ENABLED
> {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER =>  true
> 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>
> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> BLOCKCACHE => 'true'}]}
> 1 row(s) in 0.0420 seconds
>
> On Thu, Aug 7, 2014 at 5:50 PM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> Hi Colin,
>> Just to make sure.
>>
>> Is table A from the source cluster and not compressed, and table B in the
>> destination cluster and SNAPPY compressed? Is that correct? Then the
>> ratio should be the opposite. Are you able to du -h from hadoop to see if
>> all regions are evenly bigger or if anything else is wrong?
>>
>> 2014-08-07 20:44 GMT-04:00 Colin Kincaid Williams <disc...@uw.edu>:
>>
>> > I haven't yet tried to major compact table B. I will look up some
>> > documentation on WALs and snapshots to find this information in the
>> > HDFS filesystem tomorrow. Could it be caused by the bloom filter
>> > existing on table B, but not table A? The funny thing is the source
>> > table is smaller than the destination.
>> >
>> > On Thu, Aug 7, 2014 at 4:50 PM, Esteban Gutierrez <este...@cloudera.com>
>> > wrote:
>> >
>> > > Hi Colin,
>> > >
>> > > Have you verified if the content of /a_d includes WALs and/or the
>> > > content of the snapshots or the HBase archive? Have you tried to
>> > > major compact table B? Does it make any difference?
>> > >
>> > > regards,
>> > > esteban.
>> > >
>> > > --
>> > > Cloudera, Inc.
>> > >
>> > > On Thu, Aug 7, 2014 at 2:00 PM, Colin Kincaid Williams <disc...@uw.edu>
>> > > wrote:
>> > >
>> > > > I used the copy table command to copy a database between the
>> > > > original cluster A and a new cluster B. I have noticed that the
>> > > > rootdir is larger than 2X the size of the original. I am trying to
>> > > > account for such a large difference. The following are some details
>> > > > about the table.
>> > > >
>> > > > I'm trying to figure out why my copied table is more than 2X the
>> > > > size of the original table. Could the bloom filter itself account
>> > > > for this?
>> > > > The guide I used as a reference:
>> > > >
>> > > > http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
>> > > >
>> > > > Supposedly the original command used to create the table on
>> > > > cluster A:
>> > > >
>> > > > create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1',
>> > > > COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
>> > > >
>> > > > How I created the target table on cluster B:
>> > > >
>> > > > create 'ADMd5','a',{
>> > > > BLOOMFILTER => 'ROW',
>> > > > VERSIONS => '1',
>> > > > COMPRESSION => 'SNAPPY',
>> > > > MIN_VERSIONS => '0',
>> > > > SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
>> > > > '/zyuFR1VmhJyF4rbWsFnEg==',
>> > > > '0sZYnBd83ul58d1O8I2JnA==',
>> > > > '2+03N7IicZH3ltrqZUX6kQ==',
>> > > > '4+/slRQtkBDU7Px6C9MAbg==',
>> > > > '6+1dGCQ/IBrCsrNQXe/9xQ==',
>> > > > '7+2pvtpHUQHWkZJoouR9wQ==',
>> > > > '8+4n2deXhzmrpe//2Fo6Fg==',
>> > > > '9+4SKW/BmNzpL68cXwKV1Q==',
>> > > > 'A+4ajStFkjEMf36cX5D9xg==',
>> > > > 'B+6Zm6Kccb3l6iM2L0epxQ==',
>> > > > 'C+6lKKDiOWl5qrRn72fNCw==',
>> > > > 'D+6dZMyn7m+NhJ7G07gqaw==',
>> > > > 'E+6BrimmrpAd92gZJ5hyMw==',
>> > > > 'G+5tisu4xWZMOJnDHeYBJg==',
>> > > > 'I+7fRy4dvqcM/L6dFRQk9g==',
>> > > > 'J+8ECMw1zeOyjfOg/ypXJA==',
>> > > > 'K+7tenLYn6a1aNLniL6tbg==']}
>> > > >
>> > > > How the tables now appear in hbase shell:
>> > > >
>> > > > table A:
>> > > >
>> > > > describe 'ADMd5'
>> > > > DESCRIPTION                                                  ENABLED
>> > > > {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER =>  true
>> > > > 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
>> > > > COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>
>> > > > '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>> > > > BLOCKCACHE => 'true'}]}
>> > > > 1 row(s) in 0.0370 seconds
>> > > >
>> > > > table B:
>> > > >
>> > > > hbase(main):003:0> describe 'ADMd5'
>> > > > DESCRIPTION                                                  ENABLED
>> > > > {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER =>  true
>> > > > 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1',
>> > > > COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL =>
>> > > > '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>> > > > BLOCKCACHE => 'true'}]}
>> > > > 1 row(s) in 0.0280 seconds
>> > > >
>> > > > The containing folder size in HDFS:
>> > > >
>> > > > table A:
>> > > > sudo -u hdfs hadoop fs -dus -h /a_d
>> > > > dus: DEPRECATED: Please use 'du -s' instead.
>> > > > 227.4g /a_d
>> > > >
>> > > > table B:
>> > > > sudo -u hdfs hadoop fs -dus -h /a_d
>> > > > dus: DEPRECATED: Please use 'du -s' instead.
>> > > > 501.0g /a_d
>> > > >
>> > > > https://gist.github.com/drocsid/80bba7b6b19d64fde6c2
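As a footnote to Jean-Marc's suggestion of checking whether all regions are evenly bigger: a small script over the output of `hadoop fs -du <table dir>` (paths and byte counts below are made up for illustration) can separate "everything grew uniformly" from "a few directories such as leftover WALs, snapshots, or the archive are inflating the total". The 501.0g vs 227.4g figures from the thread work out to roughly a 2.2x blow-up.

```python
# Sketch: flag directories whose size is far above the mean, given
# `hadoop fs -du` style output ("<bytes> <path>" per line). The sample
# input below is hypothetical, not taken from the cluster in the thread.
def region_report(du_output, outlier_factor=2.0):
    sizes = {}
    for line in du_output.strip().splitlines():
        size, path = line.split()
        sizes[path] = int(size)
    mean = sum(sizes.values()) / len(sizes)
    # Paths more than outlier_factor times the mean deserve a closer look.
    outliers = sorted(p for p, s in sizes.items() if s > outlier_factor * mean)
    return mean, outliers

sample = """\
1073741824  /a_d/ADMd5/region-1
1181116006  /a_d/ADMd5/region-2
9663676416  /a_d/ADMd5/.oldlogs
"""
mean, outliers = region_report(sample)   # flags only /a_d/ADMd5/.oldlogs

# Overall blow-up reported in the thread:
ratio = 501.0 / 227.4                    # ~2.2x
```

If no single directory stands out and every region is uniformly larger, the schema mismatch found above (destination written without SNAPPY in the earlier attempt, or vice versa) is the more likely explanation than the bloom filter, which normally adds only a small fraction of the store file size.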