The correct syntax is:

create 'ADMd5',
  {NAME => 'a', VERSIONS => '1', COMPRESSION => 'SNAPPY', BLOOMFILTER => 'ROW'},
  {SPLITS => ['/++ASUZm4u7YsTcF/VtK6Q==',
              '/zyuFR1VmhJyF4rbWsFnEg==',
              '0sZYnBd83ul58d1O8I2JnA==',
              '2+03N7IicZH3ltrqZUX6kQ==',
              '4+/slRQtkBDU7Px6C9MAbg==',
              '6+1dGCQ/IBrCsrNQXe/9xQ==',
              '7+2pvtpHUQHWkZJoouR9wQ==',
              '8+4n2deXhzmrpe//2Fo6Fg==',
              '9+4SKW/BmNzpL68cXwKV1Q==',
              'A+4ajStFkjEMf36cX5D9xg==',
              'B+6Zm6Kccb3l6iM2L0epxQ==',
              'C+6lKKDiOWl5qrRn72fNCw==',
              'D+6dZMyn7m+NhJ7G07gqaw==',
              'E+6BrimmrpAd92gZJ5hyMw==',
              'G+5tisu4xWZMOJnDHeYBJg==',
              'I+7fRy4dvqcM/L6dFRQk9g==',
              'J+8ECMw1zeOyjfOg/ypXJA==',
              'K+7tenLYn6a1aNLniL6tbg==']}

The column-family attributes (VERSIONS, COMPRESSION, BLOOMFILTER) must live in the hash that carries the NAME key; SPLITS is a table-level option and goes in its own, separate hash.
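To see why the broken form in the quoted command below succeeds silently instead of erroring, here is a deliberately simplified toy model (in Python, not HBase's actual Ruby implementation) of how the shell's create interprets its arguments: a bare string becomes a column family with default attributes, a hash containing a NAME key is a family descriptor, and a hash without NAME is taken as table-level options, where unrecognized keys like BLOOMFILTER are just ignored.

```python
# Toy model of HBase shell's `create` argument handling -- an illustration
# only, not the real implementation. Default family attributes mirror the
# `describe` output in the thread.
DEFAULT_FAMILY = {'BLOOMFILTER': 'NONE', 'VERSIONS': '3', 'COMPRESSION': 'NONE'}

def create(table, *args):
    families, table_opts = [], {}
    for arg in args:
        if isinstance(arg, str):
            # 'a' -> a family named 'a' with all-default attributes
            families.append(dict(DEFAULT_FAMILY, NAME=arg))
        elif isinstance(arg, dict) and 'NAME' in arg:
            # {NAME => 'a', ...} -> a family descriptor
            families.append(dict(DEFAULT_FAMILY, **arg))
        else:
            # a hash WITHOUT a NAME key -> table-level options (SPLITS, ...);
            # family-only keys placed here are silently ignored
            table_opts.update(arg)
    return {'NAME': table, 'FAMILIES': families, 'OPTS': table_opts}

# Broken form from the thread: 'a' already defined the family, so the hash
# becomes table options and the family keeps BLOOMFILTER 'NONE' etc.
broken = create('ADMd5', 'a',
                {'BLOOMFILTER': 'ROW', 'VERSIONS': '1', 'COMPRESSION': 'SNAPPY'})

# Correct form: family attributes travel with NAME; SPLITS in its own hash.
good = create('ADMd5',
              {'NAME': 'a', 'VERSIONS': '1', 'COMPRESSION': 'SNAPPY',
               'BLOOMFILTER': 'ROW'},
              {'SPLITS': ['/++ASUZm4u7YsTcF/VtK6Q==']})
```

With this model, `broken['FAMILIES'][0]['BLOOMFILTER']` comes back 'NONE' while `good['FAMILIES'][0]['BLOOMFILTER']` is 'ROW' -- exactly the difference visible in the two `describe` outputs quoted below.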
On Fri, Aug 8, 2014 at 12:23 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:

> I have discovered the error. I made the mistake regarding the compression
> and the bloom filter. The new table doesn't have them enabled, and the old
> one does. However, I'm wondering how I can create tables with splits, a
> bloom filter, and compression enabled. Shouldn't the following command
> return an error?
>
> hbase(main):001:0> create 'ADMd5','a',{
> hbase(main):002:1* BLOOMFILTER => 'ROW',
> hbase(main):003:1* VERSIONS => '1',
> hbase(main):004:1* COMPRESSION => 'SNAPPY',
> hbase(main):005:1* MIN_VERSIONS => '0',
> hbase(main):006:1* SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
> hbase(main):007:2* '/zyuFR1VmhJyF4rbWsFnEg==',
> hbase(main):008:2* '0sZYnBd83ul58d1O8I2JnA==',
> hbase(main):009:2* '2+03N7IicZH3ltrqZUX6kQ==',
> hbase(main):010:2* '4+/slRQtkBDU7Px6C9MAbg==',
> hbase(main):011:2* '6+1dGCQ/IBrCsrNQXe/9xQ==',
> hbase(main):012:2* '7+2pvtpHUQHWkZJoouR9wQ==',
> hbase(main):013:2* '8+4n2deXhzmrpe//2Fo6Fg==',
> hbase(main):014:2* '9+4SKW/BmNzpL68cXwKV1Q==',
> hbase(main):015:2* 'A+4ajStFkjEMf36cX5D9xg==',
> hbase(main):016:2* 'B+6Zm6Kccb3l6iM2L0epxQ==',
> hbase(main):017:2* 'C+6lKKDiOWl5qrRn72fNCw==',
> hbase(main):018:2* 'D+6dZMyn7m+NhJ7G07gqaw==',
> hbase(main):019:2* 'E+6BrimmrpAd92gZJ5hyMw==',
> hbase(main):020:2* 'G+5tisu4xWZMOJnDHeYBJg==',
> hbase(main):021:2* 'I+7fRy4dvqcM/L6dFRQk9g==',
> hbase(main):022:2* 'J+8ECMw1zeOyjfOg/ypXJA==',
> hbase(main):023:2* 'K+7tenLYn6a1aNLniL6tbg==',]}
> 0 row(s) in 1.8010 seconds
>
> hbase(main):024:0> describe 'ADMd5'
> DESCRIPTION                                                  ENABLED
> {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER =>  true
> 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>
> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> BLOCKCACHE => 'true'}]}
> 1 row(s) in 0.0420 seconds
>
> On Thu, Aug 7, 2014 at 5:50 PM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> Hi Colin,
>> Just to make sure.
>>
>> Is table A from the source cluster and not compressed, and table B in the
>> destination cluster and SNAPPY compressed? Is that correct? Then the
>> ratio should be the opposite. Are you able to du -h from hadoop to see if
>> all regions are evenly bigger or if anything else is wrong?
>>
>> 2014-08-07 20:44 GMT-04:00 Colin Kincaid Williams <disc...@uw.edu>:
>>
>> > I haven't yet tried to major compact table B. I will look up some
>> > documentation on WALs and snapshots to find this information in the
>> > HDFS filesystem tomorrow. Could it be caused by the bloom filter
>> > existing on table B, but not table A? The funny thing is the source
>> > table is smaller than the destination.
>> >
>> > On Thu, Aug 7, 2014 at 4:50 PM, Esteban Gutierrez <este...@cloudera.com>
>> > wrote:
>> >
>> > > Hi Colin,
>> > >
>> > > Have you verified if the content of /a_d includes WALs and/or the
>> > > content of the snapshots or the HBase archive? Have you tried to
>> > > major compact table B? Does it make any difference?
>> > >
>> > > regards,
>> > > esteban.
>> > >
>> > > --
>> > > Cloudera, Inc.
>> > >
>> > > On Thu, Aug 7, 2014 at 2:00 PM, Colin Kincaid Williams <disc...@uw.edu>
>> > > wrote:
>> > >
>> > > > I used the copy table command to copy a database between the
>> > > > original cluster A and a new cluster B. I have noticed that the
>> > > > rootdir is larger than 2X the size of the original. I am trying to
>> > > > account for such a large difference. The following are some details
>> > > > about the table.
>> > > >
>> > > > I'm trying to figure out why my copied table is more than 2X the
>> > > > size of the original table. Could the bloom filter itself account
>> > > > for this?
>> > > > The guide I used as a reference:
>> > > >
>> > > > http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
>> > > >
>> > > > Supposedly the original command used to create the table on
>> > > > cluster A:
>> > > >
>> > > > create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1',
>> > > > COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
>> > > >
>> > > > How I created the target table on cluster B:
>> > > >
>> > > > create 'ADMd5','a',{
>> > > > BLOOMFILTER => 'ROW',
>> > > > VERSIONS => '1',
>> > > > COMPRESSION => 'SNAPPY',
>> > > > MIN_VERSIONS => '0',
>> > > > SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
>> > > > '/zyuFR1VmhJyF4rbWsFnEg==',
>> > > > '0sZYnBd83ul58d1O8I2JnA==',
>> > > > '2+03N7IicZH3ltrqZUX6kQ==',
>> > > > '4+/slRQtkBDU7Px6C9MAbg==',
>> > > > '6+1dGCQ/IBrCsrNQXe/9xQ==',
>> > > > '7+2pvtpHUQHWkZJoouR9wQ==',
>> > > > '8+4n2deXhzmrpe//2Fo6Fg==',
>> > > > '9+4SKW/BmNzpL68cXwKV1Q==',
>> > > > 'A+4ajStFkjEMf36cX5D9xg==',
>> > > > 'B+6Zm6Kccb3l6iM2L0epxQ==',
>> > > > 'C+6lKKDiOWl5qrRn72fNCw==',
>> > > > 'D+6dZMyn7m+NhJ7G07gqaw==',
>> > > > 'E+6BrimmrpAd92gZJ5hyMw==',
>> > > > 'G+5tisu4xWZMOJnDHeYBJg==',
>> > > > 'I+7fRy4dvqcM/L6dFRQk9g==',
>> > > > 'J+8ECMw1zeOyjfOg/ypXJA==',
>> > > > 'K+7tenLYn6a1aNLniL6tbg==']}
>> > > >
>> > > > How the tables now appear in hbase shell:
>> > > >
>> > > > table A:
>> > > >
>> > > > describe 'ADMd5'
>> > > > DESCRIPTION                                                  ENABLED
>> > > > {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER =>  true
>> > > > 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
>> > > > COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>
>> > > > '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>> > > > BLOCKCACHE => 'true'}]}
>> > > > 1 row(s) in 0.0370 seconds
>> > > >
>> > > > table B:
>> > > >
>> > > > hbase(main):003:0> describe 'ADMd5'
>> > > > DESCRIPTION                                                  ENABLED
>> > > > {NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER =>  true
>> > > > 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1',
>> > > > COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL =>
>> > > > '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>> > > > BLOCKCACHE => 'true'}]}
>> > > > 1 row(s) in 0.0280 seconds
>> > > >
>> > > > The containing folder size in HDFS:
>> > > >
>> > > > table A:
>> > > > sudo -u hdfs hadoop fs -dus -h /a_d
>> > > > dus: DEPRECATED: Please use 'du -s' instead.
>> > > > 227.4g /a_d
>> > > >
>> > > > table B:
>> > > > sudo -u hdfs hadoop fs -dus -h /a_d
>> > > > dus: DEPRECATED: Please use 'du -s' instead.
>> > > > 501.0g /a_d
>> > > >
>> > > > https://gist.github.com/drocsid/80bba7b6b19d64fde6c2
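As a footnote to Jean-Marc's suggestion of checking whether all regions are evenly bigger: a small script over the output of `hadoop fs -du <table dir>` (paths and byte counts below are made up for illustration) can separate "everything grew uniformly" from "a few directories such as leftover WALs, snapshots, or the archive are inflating the total". The 501.0g vs 227.4g figures from the thread work out to roughly a 2.2x blow-up.

```python
# Sketch: flag directories whose size is far above the mean, given
# `hadoop fs -du` style output ("<bytes> <path>" per line). The sample
# input below is hypothetical, not taken from the cluster in the thread.
def region_report(du_output, outlier_factor=2.0):
    sizes = {}
    for line in du_output.strip().splitlines():
        size, path = line.split()
        sizes[path] = int(size)
    mean = sum(sizes.values()) / len(sizes)
    # Paths more than outlier_factor times the mean deserve a closer look.
    outliers = sorted(p for p, s in sizes.items() if s > outlier_factor * mean)
    return mean, outliers

sample = """\
1073741824  /a_d/ADMd5/region-1
1181116006  /a_d/ADMd5/region-2
9663676416  /a_d/ADMd5/.oldlogs
"""
mean, outliers = region_report(sample)   # flags only /a_d/ADMd5/.oldlogs

# Overall blow-up reported in the thread:
ratio = 501.0 / 227.4                    # ~2.2x
```

If no single directory stands out and every region is uniformly larger, the schema mismatch found above (destination written without SNAPPY in the earlier attempt, or vice versa) is the more likely explanation than the bloom filter, which normally adds only a small fraction of the store file size.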