[jira] [Created] (HBASE-10389) Add namespace help info in table related shell commands
Jerry He created HBASE-10389:
--------------------------------

             Summary: Add namespace help info in table related shell commands
                 Key: HBASE-10389
                 URL: https://issues.apache.org/jira/browse/HBASE-10389
             Project: HBase
          Issue Type: Improvement
          Components: shell
    Affects Versions: 0.96.0, 0.96.1
            Reporter: Jerry He
             Fix For: 0.98.0, 0.96.2


Currently, the help info of the table related shell commands does not mention or show a namespace as part of the table name. For example, for create:

{code}
hbase(main):001:0> help 'create'
Creates a table. Pass a table name, and a set of column family
specifications (at least one), and, optionally, table configuration.
Column specification can be a simple string (name), or a dictionary
(dictionaries are described below in main help output), necessarily
including NAME attribute.
Examples:

hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}

Table configuration options can be put at the end.
Examples:

hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
hbase> # Optionally pre-split the table into NUMREGIONS, using
hbase> # SPLITALGO (HexStringSplit, UniformSplit or classname)
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}

You can also keep around a reference to the created table:

hbase> t1 = create 't1', 'f1'

Which gives you a reference to the table named 't1', on which you can
then call methods.
{code}

We should document the usage of namespace in these commands. For example:

{code}
# namespace=foo and table qualifier=bar
create 'foo:bar', 'fam'

# namespace=default and table qualifier=bar
create 'bar', 'fam'
{code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
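The 'namespace:qualifier' convention proposed above can be sketched with a tiny helper. This is a hypothetical illustration, not HBase code: `split_table_name` and its `default_ns` parameter are invented names; the only grounded behavior is that `foo:bar` means table `bar` in namespace `foo`, while a bare name falls into the `default` namespace.

```python
def split_table_name(name, default_ns="default"):
    """Hypothetical helper illustrating HBase's 'namespace:qualifier' naming.

    'foo:bar' -> ('foo', 'bar'); a bare 'bar' lands in the default namespace.
    """
    ns, sep, qualifier = name.partition(":")
    if not sep:
        # No colon: the whole string is the qualifier, namespace defaults.
        return default_ns, name
    return ns, qualifier

print(split_table_name("foo:bar"))  # ('foo', 'bar')
print(split_table_name("bar"))      # ('default', 'bar')
```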
[jira] [Assigned] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry He reassigned HBASE-10389:
--------------------------------

    Assignee: Jerry He
[jira] [Commented] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878319#comment-13878319 ]

Jerry He commented on HBASE-10389:
----------------------------------

I will do an initial survey of those commands, and get comments from there.
[jira] [Created] (HBASE-10448) ZKUtil create and watch methods don't set watch in some cases
Jerry He created HBASE-10448:
--------------------------------

             Summary: ZKUtil create and watch methods don't set watch in some cases
                 Key: HBASE-10448
                 URL: https://issues.apache.org/jira/browse/HBASE-10448
             Project: HBase
          Issue Type: Bug
          Components: Zookeeper
    Affects Versions: 0.96.0, 0.96.1.1
            Reporter: Jerry He
             Fix For: 0.98.1


While using the ZKUtil methods during testing, I found that the watch was not set when it should have been set, based on the method names and comments:

createNodeIfNotExistsAndWatch
createEphemeralNodeAndWatch

For example, in createNodeIfNotExistsAndWatch():
{code}
  public static boolean createNodeIfNotExistsAndWatch(
      ZooKeeperWatcher zkw, String znode, byte[] data)
      throws KeeperException {
    try {
      zkw.getRecoverableZooKeeper().create(znode, data, createACL(zkw, znode),
          CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException nee) {
      try {
        zkw.getRecoverableZooKeeper().exists(znode, zkw);
      } catch (InterruptedException e) {
        zkw.interruptedException(e);
        return false;
      }
      return false;
    } catch (InterruptedException e) {
      zkw.interruptedException(e);
      return false;
    }
    return true;
  }
{code}
The watch is only set, via the exists() call, when the node already exists.

Similarly in createEphemeralNodeAndWatch():
{code}
  public static boolean createEphemeralNodeAndWatch(ZooKeeperWatcher zkw,
      String znode, byte[] data) throws KeeperException {
    try {
      zkw.getRecoverableZooKeeper().create(znode, data, createACL(zkw, znode),
          CreateMode.EPHEMERAL);
    } catch (KeeperException.NodeExistsException nee) {
      if (!watchAndCheckExists(zkw, znode)) {
        // It did exist but now it doesn't, try again
        return createEphemeralNodeAndWatch(zkw, znode, data);
      }
      return false;
    } catch (InterruptedException e) {
      LOG.info("Interrupted", e);
      Thread.currentThread().interrupt();
    }
    return true;
  }
{code}
[jira] [Commented] (HBASE-10448) ZKUtil create and watch methods don't set watch in some cases
[ https://issues.apache.org/jira/browse/HBASE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888016#comment-13888016 ]

Jerry He commented on HBASE-10448:
----------------------------------

I wonder how the callers/users of these two methods actually worked. A guess is that they don't really depend on the watches being set in the cases where the watches are not set.
[jira] [Commented] (HBASE-10448) ZKUtil create and watch methods don't set watch in some cases
[ https://issues.apache.org/jira/browse/HBASE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888441#comment-13888441 ]

Jerry He commented on HBASE-10448:
----------------------------------

Found HBASE-8937, which reported a similar problem.
[jira] [Updated] (HBASE-10448) ZKUtil create and watch methods don't set watch in some cases
[ https://issues.apache.org/jira/browse/HBASE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry He updated HBASE-10448:
-----------------------------

    Attachment: HBASE-10448-trunk.patch
[jira] [Updated] (HBASE-10448) ZKUtil create and watch methods don't set watch in some cases
[ https://issues.apache.org/jira/browse/HBASE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry He updated HBASE-10448:
-----------------------------

    Status: Patch Available  (was: Open)
[jira] [Commented] (HBASE-10448) ZKUtil create and watch methods don't set watch in some cases
[ https://issues.apache.org/jira/browse/HBASE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888455#comment-13888455 ]

Jerry He commented on HBASE-10448:
----------------------------------

Attached a patch that sets the watch whether or not we get NodeExistsException. I tried to avoid any other change in the behavior of these two methods.
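The fix described in the comment above ("set the watch whether or not we get NodeExistsException") can be sketched as follows. This is a hedged Python mock of the control flow only, not the actual ZKUtil Java patch: `FakeZK`, `NodeExistsError`, and `create_node_if_not_exists_and_watch` are invented stand-ins; the one grounded ZooKeeper fact is that exists() registers a watch regardless of whether the znode exists.

```python
class NodeExistsError(Exception):
    """Stand-in for KeeperException.NodeExistsException."""


class FakeZK:
    """Illustrative stand-in for RecoverableZooKeeper, recording watches."""

    def __init__(self, existing=()):
        self.nodes = set(existing)
        self.watched = set()

    def create(self, znode):
        if znode in self.nodes:
            raise NodeExistsError(znode)
        self.nodes.add(znode)

    def exists(self, znode, watcher):
        # In ZooKeeper, exists() sets a watch whether or not the znode exists.
        self.watched.add(znode)
        return znode in self.nodes


def create_node_if_not_exists_and_watch(zk, znode, watcher):
    """Patched control flow: register the watch on BOTH code paths."""
    try:
        zk.create(znode)
        created = True
    except NodeExistsError:
        created = False
    zk.exists(znode, watcher)  # watch is now set whether or not create() succeeded
    return created


zk = FakeZK()
print(create_node_if_not_exists_and_watch(zk, "/a", "zkw"))   # True; "/a" is watched
zk2 = FakeZK(existing=["/a"])
print(create_node_if_not_exists_and_watch(zk2, "/a", "zkw"))  # False; "/a" is still watched
```

The original Java only reached exists() inside the NodeExistsException handler, which is why a successful create left no watch behind.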
[jira] [Commented] (HBASE-10448) ZKUtil create and watch methods don't set watch in some cases
[ https://issues.apache.org/jira/browse/HBASE-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888792#comment-13888792 ]

Jerry He commented on HBASE-10448:
----------------------------------

Thanks, Ted, Andrew. Could you also mark HBASE-8937 as resolved as a duplicate, so that we have closure on that one too?
[jira] [Updated] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry He updated HBASE-10389:
-----------------------------

    Attachment: HBASE-10389-trunk.patch
[jira] [Updated] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry He updated HBASE-10389:
-----------------------------

    Status: Patch Available  (was: Open)
[jira] [Commented] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890267#comment-13890267 ]

Jerry He commented on HBASE-10389:
----------------------------------

I attached a patch that basically adds a table-with-namespace example to each of the relevant commands, without much explanation. Please comment to let me know whether we need to do more, or less.
[jira] [Updated] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry He updated HBASE-10389:
-----------------------------

    Status: Open  (was: Patch Available)
[jira] [Updated] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-10389: - Attachment: (was: HBASE-10389-trunk.patch)
[jira] [Updated] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-10389: - Attachment: HBASE-10389-trunk.patch
[jira] [Updated] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-10389: - Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891017#comment-13891017 ] Jerry He commented on HBASE-10389: -- Re-formatted the patch, and re-attached it.
[jira] [Commented] (HBASE-10389) Add namespace help info in table related shell commands
[ https://issues.apache.org/jira/browse/HBASE-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891488#comment-13891488 ] Jerry He commented on HBASE-10389: -- All the following work as desired:

{code}
hbase> disable_all 't.*'
hbase> disable_all 'ns:t.*'
hbase> disable_all 'ns:.*'
{code}

And this one will include my_namespace1:table1, my_namespace2:table2 and my_table:

{code}
hbase> disable_all 'my.*'
{code}
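[Editor's note] The matching behavior described in the comment above, where the pattern is applied to the full table name including any `namespace:` prefix, can be sketched with a plain anchored regex. This is an illustration only, not the shell's actual implementation:

```ruby
# Illustration of disable_all-style matching: the pattern is anchored
# at the start of the full table name, so a namespace prefix must be
# written out ('ns:t.*') to reach namespaced tables.
def match_tables(pattern, tables)
  regex = Regexp.new("\\A(?:#{pattern})")
  tables.select { |t| regex.match?(t) }
end

tables = ["my_namespace1:table1", "my_namespace2:table2", "my_table", "ns:t1", "t1"]
p match_tables("my.*", tables)    # the three names starting with "my"
p match_tables("ns:t.*", tables)  # ["ns:t1"]
p match_tables("t.*", tables)     # ["t1"]
```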
[jira] [Created] (HBASE-10492) open daughter regions can unpredictably take long time
Jerry He created HBASE-10492: Summary: open daughter regions can unpredictably take long time
Key: HBASE-10492
URL: https://issues.apache.org/jira/browse/HBASE-10492
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.96.0
Reporter: Jerry He

During stress testing I have seen the client getting RetriesExhaustedWithDetailsException: Failed 748 actions: NotServingRegionException. On the master log, 2014-02-08 20:43 is the timestamp from OFFLINE to SPLITTING_NEW, and 2014-02-08 21:41 is the timestamp from SPLITTING_NEW to OPEN. The corresponding time period in the region server log is:

{code}
2014-02-08 20:44:12,662 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 010c1981882d1a59201af5e2dc589d44
2014-02-08 20:44:12,666 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: c2eb9b7971ca7f3fed3da86df5b788e7
{code}

There was no INFO output related to these two regions until the following (note "Split took 57mins, 16sec" at the end):

{code}
2014-02-08 21:41:14,029 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined c2eb9b7971ca7f3fed3da86df5b788e7; next sequenceid=213355
2014-02-08 21:41:14,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined 010c1981882d1a59201af5e2dc589d44; next sequenceid=213354
2014-02-08 21:41:14,032 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. with server=hdtest208.svl.ibm.com,60020,1391887547473
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy task for tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,054 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 10 file(s) in cf of tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c. into 451be6df8c604993ae540b808d9cfa08(size=72.8 M), total size for store is 2.4 G. This selection was in queue for 0sec, and took 1mins, 40sec to execute.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed compaction: Request = regionName=tpch_hb_1000_2.lineitem,^\x01\x8B\xE7(\x80\x01\x80\x93\xFD\x01\x01\x80\x00\x00\x00\xB5\x0E\xCC'\x01\x80\x00\x00\x03,1391918508561.1fbcfc0a792435dfd73ec5b0ef5c953c., storeName=cf, fileCount=10, fileSize=94.1 M, priority=9883, time=1391924373278861000; duration=1mins, 40sec
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on cf in region tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.
2014-02-08 21:41:14,059 INFO org.apache.hadoop.hbase.regionserver.HStore: Starting compaction of 10 file(s) in cf of tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7. into tmpdir=gpfs:/hbase/data/default/tpch_hb_1000_2.lineitem/c2eb9b7971ca7f3fed3da86df5b788e7/.tmp, totalSize=709.7 M
2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44. with server=hdtest208.svl.ibm.com,60020,1391887547473
2014-02-08 21:41:14,066 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Finished post open deploy task for tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44.
2014-02-08 21:41:14,190 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=tpch_hb_1000_2.lineitem,,1391918508561.b576e8db65d56ec08db5ca900587c28d., new regions: tpch_hb_1000_2.lineitem,,1391921037353.010c1981882d1a59201af5e2dc589d44., tpch_hb_1000_2.lineitem,]\x01\x8B\xE9\xF4\x8A\x01\x80p\xA3\xA4\x01\x80\x00\x00\x00\xB6\xB7+\x02\x01\x80\x00\x00\x02,1391921037353.c2eb9b7971ca7f3fed3da86df5b788e7.. Split took 57mins, 16sec
{code}
[jira] [Commented] (HBASE-10492) open daughter regions can unpredictably take long time
[ https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896937#comment-13896937 ] Jerry He commented on HBASE-10492: -- The problem is probably caused by this part of the code in SplitTransaction.openDaughters():

{code}
// Open daughters in parallel.
DaughterOpener aOpener = new DaughterOpener(server, a);
DaughterOpener bOpener = new DaughterOpener(server, b);
aOpener.start();
bOpener.start();
try {
  aOpener.join();
  bOpener.join();
}
{code}

We are opening the daughter regions in separate new threads. It is possible, although rare, that due to issues like thread scheduling the daughter regions are not opened until much later.
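[Editor's note] The join pattern quoted in the comment above means the split transaction blocks until both opener threads have finished, so a single slow opener stalls the entire split. A minimal Ruby sketch of that structure (names are illustrative; HBase itself is Java):

```ruby
# Start both daughter openers in parallel, then join both --
# the split cannot finish before the slower opener does.
opened = []
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

a_opener = Thread.new { sleep 0.01; opened << "daughter_a" }  # fast open
b_opener = Thread.new { sleep 0.05; opened << "daughter_b" }  # slow open
a_opener.join
b_opener.join  # blocks until the slow daughter is open

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
p opened.sort        # ["daughter_a", "daughter_b"]
p(elapsed >= 0.05)   # true: total time is bounded by the slowest opener
```

If an opener thread is delayed (by scheduling or by slow filesystem calls during region initialization), both joins simply wait, which matches the long gaps seen in the logs.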
[jira] [Commented] (HBASE-10492) open daughter regions can unpredictably take long time
[ https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898262#comment-13898262 ] Jerry He commented on HBASE-10492: -- The machines have 24 CPUs and 48G memory, with Red Hat Enterprise Linux Server release 6.4 (Santiago) 2.6.32-358.el6.x86_64 and IBM JDK 6. There are 5 region servers (each with a datanode and task tracker). The load was an MR job loading data. I have been trying to reproduce the long delay in opening the daughter regions. With 'org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 200' I have seen delays up to 6 mins. See the log below (from 2014-02-11 02:35:52 to 2014-02-11 02:41:14 at the end):

{code}
2014-02-11 02:35:52,473 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 10a421ac8075a42cbcb53bdc393c8e8c
2014-02-11 02:35:52,479 WARN org.apache.hadoop.hbase.regionserver.HRegionFileSystem: .regioninfo file not found for region: 5ff07e59d13c99ca14408807a6e61722
2014-02-11 02:35:52,589 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionConfiguration: size [4194304, 9223372036854775807); files [3, 10); ratio 1.20; off-peak ratio 5.00; throttle point 2684354560; delete expired; major period 0, major jitter 0.50
2014-02-11 02:35:52,596 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionConfiguration: size [4194304, 9223372036854775807); files [3, 10); ratio 1.20; off-peak ratio 5.00; throttle point 2684354560; delete expired; major period 0, major jitter 0.50
2014-02-11 02:35:55,458 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=4289924, memsize=256.6 M, hasBloomFilter=true, into tmp file gpfs:/hbase/data/default/TestTable/ed4d9fb392ae52c1a406a221defc6b00/.tmp/9e2cb318b0114248b9c62948cf47ac5b
2014-02-11 02:36:37,894 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=4289926, memsize=153.1 M, hasBloomFilter=true, into tmp file gpfs:/hbase/data/default/TestTable/110cc21c77569d595f7717b8c75fbf66/.tmp/4e55d6ba4b5644838163101f2ba20fdb
2014-02-11 02:36:53,067 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Rolled WAL /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392114789609 with entries=416, filesize=578.7 M; new WAL /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392114958416
2014-02-11 02:36:53,067 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112795409 whose highest sequenceid is 4285071 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112795409
2014-02-11 02:36:53,162 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112818204 whose highest sequenceid is 4285169 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112818204
2014-02-11 02:36:53,210 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112839023 whose highest sequenceid is 4285266 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112839023
2014-02-11 02:37:13,297 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112862511 whose highest sequenceid is 4285362 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112862511
2014-02-11 02:37:13,326 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112871587 whose highest sequenceid is 4285453 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112871587
2014-02-11 02:37:13,383 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112877894 whose highest sequenceid is 4285546 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112877894
2014-02-11 02:37:33,474 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: moving old hlog file /hbase/WALs/hdtest202.svl.ibm.com,60020,1392097223732/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112891408 whose highest sequenceid is 4285641 to /hbase/oldWALs/hdtest202.svl.ibm.com%2C60020%2C1392097223732.1392112891408
2014-02-11 02:37:33,481 INFO org.apache.hadoop.hbase.regionserver.HStore: Added
[jira] [Commented] (HBASE-10492) open daughter regions can unpredictably take long time
[ https://issues.apache.org/jira/browse/HBASE-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899501#comment-13899501 ] Jerry He commented on HBASE-10492: -- I am able to reproduce the problem and get more realtime metrics on the region server. Now it does not seem to be a thread scheduling problem, or does it? We can see from the live metrics "Initializing region ... (since 5mins, 18sec ago)" and "Instantiating store for column family ... (since 5mins, 18sec ago)":

{code}
Wed Feb 12 00:00:59 PST 2014
Initializing region tpch_hb_1000_2.lineitem,`\x01\x85\xF5\xEC\x8D\x01\x80\x00\x0B\x8E\x01\x80\x00\x00\x00\xB3\x9EN\xE3\x01\x80\x00\x00\x03,1392192032936.1d4381bb583f957a9996c1ef0fa3ce68. RUNNING (since 5mins, 18sec ago)
Instantiating store for column family {NAME => 'cf', REPLICATION_SCOPE => '0', KEEP_DELETED_CELLS => 'false', COMPRESSION => 'GZ', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true', MIN_VERSIONS => '0', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false', BLOOMFILTER => 'ROW', TTL => '2147483647', VERSIONS => '2147483647', BLOCKSIZE => '65536'} (since 5mins, 18sec ago)

Wed Feb 12 00:00:59 PST 2014
Initializing region tpch_hb_1000_2.lineitem,`\x01\x80\x01:\x94\x01\x80\x01:\x95\x01\x80\x00\x00\x00\xB5\xA8\x94\x04\x01\x80\x00\x00\x02,1392192032936.2980739184621d45397a972ea89c9411. RUNNING (since 5mins, 18sec ago)
Instantiating store for column family {NAME => 'cf', REPLICATION_SCOPE => '0', KEEP_DELETED_CELLS => 'false', COMPRESSION => 'GZ', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true', MIN_VERSIONS => '0', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false', BLOOMFILTER => 'ROW', TTL => '2147483647', VERSIONS => '2147483647', BLOCKSIZE => '65536'} (since 5mins, 18sec ago)

Wed Feb 12 00:00:59 PST 2014
Initializing region tpch_hb_1000_2.lineitem,`\x01\x85\xF5\xEC\x8D\x01\x80\x00\x0B\x8E\x01\x80\x00\x00\x00\xB3\x9EN\xE3\x01\x80\x00\x00\x03,1392192032936.1d4381bb583f957a9996c1ef0fa3ce68. RUNNING (since 8mins, 18sec ago)
Instantiating store for column family {NAME => 'cf', REPLICATION_SCOPE => '0', KEEP_DELETED_CELLS => 'false', COMPRESSION => 'GZ', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true', MIN_VERSIONS => '0', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false', BLOOMFILTER => 'ROW', TTL => '2147483647', VERSIONS => '2147483647', BLOCKSIZE => '65536'} (since 8mins, 18sec ago)

Wed Feb 12 00:00:59 PST 2014
Initializing region tpch_hb_1000_2.lineitem,`\x01\x80\x01:\x94\x01\x80\x01:\x95\x01\x80\x00\x00\x00\xB5\xA8\x94\x04\x01\x80\x00\x00\x02,1392192032936.2980739184621d45397a972ea89c9411. RUNNING (since 8mins, 18sec ago)
Instantiating store for column family {NAME => 'cf', REPLICATION_SCOPE => '0', KEEP_DELETED_CELLS => 'false', COMPRESSION => 'GZ', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true', MIN_VERSIONS => '0', DATA_BLOCK_ENCODING => 'NONE', IN_MEMORY => 'false', BLOOMFILTER => 'ROW', TTL => '2147483647', VERSIONS => '2147483647', BLOCKSIZE => '65536'} (since 8mins, 18sec ago)
{code}

Is it more like a file system issue? BTW, HBase's new metrics rock!
[jira] [Commented] (HBASE-10549) when there is a hole,LoadIncrementalHFiles will hung up in an infinite loop.
[ https://issues.apache.org/jira/browse/HBASE-10549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902517#comment-13902517 ] Jerry He commented on HBASE-10549: -- Can you give more details, e.g. what is a hole? when there is a hole,LoadIncrementalHFiles will hung up in an infinite loop. Key: HBASE-10549 URL: https://issues.apache.org/jira/browse/HBASE-10549 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.11 Reporter: yuanxinen -- This message was sent by Atlassian JIRA (v6.1.5#6160)
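The question is fair, since the report never defines "hole". In HBase parlance a hole usually means a gap in the table's region chain: one region's end key is smaller than the next region's start key, so no region covers the keys in between (hbck reports these as holes in the region chain). A purely illustrative sketch of why such a gap could hang LoadIncrementalHFiles — an HFile whose rows fall into the gap never matches any region in groupOrSplit() and keeps getting re-queued. This class and its keys are made up for illustration, not HBase source:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionHoleCheck {
    // Each region is [startKey, endKey); "" means unbounded at either edge.
    // A hole exists when a region's end key sorts before the next region's
    // start key, leaving a key range that no region covers.
    static List<String[]> findHoles(List<String[]> sortedRegions) {
        List<String[]> holes = new ArrayList<>();
        for (int i = 0; i + 1 < sortedRegions.size(); i++) {
            String end = sortedRegions.get(i)[1];
            String nextStart = sortedRegions.get(i + 1)[0];
            if (end.compareTo(nextStart) < 0) {
                holes.add(new String[] { end, nextStart }); // uncovered gap
            }
        }
        return holes;
    }

    public static void main(String[] args) {
        List<String[]> regions = Arrays.asList(
            new String[] { "", "bbb" },
            new String[] { "bbb", "ddd" },
            new String[] { "eee", "" });  // gap between "ddd" and "eee"
        List<String[]> holes = findHoles(regions);
        System.out.println(holes.size() + " hole(s): "
            + Arrays.toString(holes.get(0)));
    }
}
```

An HFile with rows in ["ddd", "eee") would belong to none of these regions, which matches the reported symptom of an infinite retry loop.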
[jira] [Commented] (HBASE-10533) commands.rb is giving wrong error messages on exceptions
[ https://issues.apache.org/jira/browse/HBASE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903761#comment-13903761 ] Jerry He commented on HBASE-10533: -- HBASE-8798 should have fixed the first part? commands.rb is giving wrong error messages on exceptions Key: HBASE-10533 URL: https://issues.apache.org/jira/browse/HBASE-10533 Project: HBase Issue Type: Bug Components: shell Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 0.96.2, 0.98.1, 0.99.0, 0.94.18 Attachments: HBASE-10533_trunk.patch 1) Cloning a snapshot to an existing table name prints the snapshot name instead of the table name. {code} hbase(main):004:0> clone_snapshot 'myTableSnapshot-122112','table' ERROR: Table already exists: myTableSnapshot-122112! {code} The reason for this is that we print the first argument instead of the exception message. {code} if cause.kind_of?(org.apache.hadoop.hbase.TableExistsException) then raise "Table already exists: #{args.first}!" end {code} 2) If we give a wrong column family in put or delete, the expectation is to print the actual column families in the table, but instead the raw exception is thrown. {code} hbase(main):002:0> put 't1','r','unkwown_cf','value' 2014-02-14 15:51:10,037 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2014-02-14 15:51:10,640 INFO [main] hdfs.PeerCache: SocketCache disabled. ERROR: Failed 1 action: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family unkwown_cf does not exist in region t1,,1392118273512.c7230b923c58f1af406a6d84930e40c1. 
in table 't1', {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '6', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'} at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4206) at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3441) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3345) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28460) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:662) : 1 time, {code} The reason for this is that the server does not throw NoSuchColumnFamilyException directly; instead a RetriesExhaustedWithDetailsException is thrown.
[jira] [Created] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
Jerry He created HBASE-10615: Summary: Make LoadIncrementalHFiles skip reference files Key: HBASE-10615 URL: https://issues.apache.org/jira/browse/HBASE-10615 Project: HBase Issue Type: Improvement Affects Versions: 0.96.0 Reporter: Jerry He Assignee: Jerry He Priority: Minor There are use cases where the source of hfiles for LoadIncrementalHFiles is a FileSystem copy-out/backup of an HBase table or of archive hfiles. For example: 1. A copy-out of hbase.rootdir, a table dir, a region dir (after disable), or the archive dir. 2. ExportSnapshot. It is possible that there are reference files in the family dir in these cases. We have such use cases where, trying to load back into HBase, we'll get {code} Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://HDFS-AMR/tmp/restoreTemp/117182adfe861c5d2b607da91d60aa8a/info/aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:570) at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:594) at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:636) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:472) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:393) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:391) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314) at java.util.concurrent.FutureTask.run(FutureTask.java:149) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919) at java.lang.Thread.run(Thread.java:738) Caused by: java.lang.IllegalArgumentException: Invalid HFile version: 16715777 (expected to be between 2 and 2) at org.apache.hadoop.hbase.io.hfile.HFile.checkFormatVersion(HFile.java:927) at 
org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:426) at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:568) {code} It is desirable and safe to skip these reference files since they don't contain any real data for bulk load purposes.
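The skip itself can be cheap: a reference file created by a region split is named "<hfile-id>.<parent-region-encoded-name>" — the trailer error above comes from exactly such a file, aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd — and its body is a small serialized pointer, not HFile data, which is why the HFile reader rejects it. A name-based filter along these lines could be applied in discoverLoadQueue(); this is an illustrative sketch under that naming assumption, not the attached patch (the real patch would likely reuse HBase's own reference-file helpers):

```java
import java.util.regex.Pattern;

public class ReferenceFileFilter {
    // Assumed naming convention: hex hfile id, a dot, then the parent
    // region's encoded (hex) name. A plain hfile has no dot-suffix.
    private static final Pattern REF_NAME =
        Pattern.compile("^[0-9a-f]+\\.[0-9a-f]+$");

    static boolean looksLikeReference(String fileName) {
        return REF_NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Reference file from the stack trace above: skip it.
        System.out.println(looksLikeReference(
            "aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd"));
        // A plain hfile name has no parent-region suffix: load it.
        System.out.println(looksLikeReference(
            "aed3d01648384b31b29e5bad4cd80bec"));
    }
}
```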
[jira] [Updated] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-10615: - Attachment: HBASE-10615-trunk.patch
[jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13912533#comment-13912533 ] Jerry He commented on HBASE-10615: -- Attached a patch to skip reference files in discoverLoadQueue().
[jira] [Updated] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-10615: - Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-8073) HFileOutputFormat support for offline operation
[ https://issues.apache.org/jira/browse/HBASE-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13913397#comment-13913397 ] Jerry He commented on HBASE-8073: - A good feature. An example is WALPlayer: when replaying WALs to HFiles, it requires a live table with the same table name as in the WAL, which makes it not useful as an offline tool. HFileOutputFormat support for offline operation --- Key: HBASE-8073 URL: https://issues.apache.org/jira/browse/HBASE-8073 Project: HBase Issue Type: Sub-task Components: mapreduce Reporter: Nick Dimiduk When using HFileOutputFormat to generate HFiles, it inspects the region topology of the target table. The split points from that table are used to guide the TotalOrderPartitioner. If the target table does not exist, it is first created. This imposes an unnecessary dependence on an online HBase and an existing table. If the table exists, it can be used. However, the job can be smarter. For example, if there's far more data going into the HFiles than the table currently contains, the table regions aren't very useful as data split points. Instead, the input data can be sampled to produce split points more meaningful to the dataset. LoadIncrementalHFiles is already capable of handling divergence between HFile boundaries and table regions, so this should not pose any additional burden at load time. The proper method of sampling the data likely requires a custom input format and an additional map-reduce job to perform the sampling. See a relevant implementation: https://github.com/alexholmes/hadoop-book/blob/master/src/main/java/com/manning/hip/ch4/sampler/ReservoirSamplerInputFormat.java
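The sampling idea referenced above can be summarized in a few lines. This is a standalone sketch of reservoir sampling (Algorithm R) over row keys — a sketch of the concept, not the linked ReservoirSamplerInputFormat: take a fixed-size uniform sample of the input keys, sort it, and use the sorted sample as split points for TotalOrderPartitioner instead of consulting a live table's region boundaries:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ReservoirSampler {
    // Algorithm R: after seeing n keys, each key is in the reservoir with
    // probability k/n, so the sample is uniform over the input.
    static List<String> sample(Iterable<String> keys, int k, long seed) {
        Random rnd = new Random(seed);
        List<String> reservoir = new ArrayList<>(k);
        long seen = 0;
        for (String key : keys) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(key);           // fill phase
            } else {
                long j = (long) (rnd.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, key); // replace with prob k/seen
                }
            }
        }
        Collections.sort(reservoir); // sorted sample = candidate split points
        return reservoir;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10000; i++) keys.add(String.format("row%05d", i));
        System.out.println("candidate split points: " + sample(keys, 4, 42L));
    }
}
```

With k split points the sampled boundaries partition the HFile output into k+1 roughly equal ranges, regardless of what the (possibly nonexistent) target table's regions look like.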
[jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13913904#comment-13913904 ] Jerry He commented on HBASE-10615: -- Hi, Matteo. Thanks for the comments! There are two questions here. 1. Should the bulk load throw an error or skip when it sees a reference file? My argument is that we should not throw an error. The existence of a reference file is not an error condition. 2. Is it safe to skip the reference file for the purpose of bulk loading, from the user's perspective? Matteo raised the issue of possible loss of data. My argument is that we are fine, for these reasons: 1) The purpose of LoadIncrementalHFiles is to load the data contained in the hfiles of a given region dir into HBase safely. As long as this is satisfied, we are fine for the data in this scope. 2) If we consider a broader view, the integrity of the entire table data: the user of the bulk load tool controls the bulk loading. For example, the user will not copy out the links in a table cloned from a snapshot and then expect to bulk load these links to have the data. In the reference example, the user will bulk load the parent region too. {quote} you upload the parent region data but not the daughter reference files the CatalogJanitor kicks in and the parent is removed, since there are no references to the parent and your data is lost... {quote} Why would the data be lost? I thought the hfiles in the parent region would be added or sliced into an existing live region. The bulk load tool does not care whether the input hfile's region is a split parent or not, right? Maybe I am missing or misunderstanding something? 
[jira] [Commented] (HBASE-10622) Improve log and Exceptions in Export Snapshot
[ https://issues.apache.org/jira/browse/HBASE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13913963#comment-13913963 ] Jerry He commented on HBASE-10622: -- Would it help to use job.getStatus().getFailureInfo() when the copy job fails? Another area is to somehow intelligently estimate the number of copy mappers needed, based on the size and number of the files, similar to DistCp? Improve log and Exceptions in Export Snapshot -- Key: HBASE-10622 URL: https://issues.apache.org/jira/browse/HBASE-10622 Project: HBase Issue Type: Bug Components: snapshots Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.99.0 Attachments: HBASE-10622-v0.patch From the logs of export snapshot it is not really clear what's going on; this adds some extra information useful for debugging, and in some places the real exception can be thrown.
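The mapper-count suggestion above follows a DistCp-style heuristic: divide the total bytes to copy by a per-mapper byte target and cap the result. This is an illustrative sketch of that sizing rule, not ExportSnapshot code; the method and parameter names are made up:

```java
public class MapperEstimate {
    // Size the copy job from the data, DistCp-style: one mapper per
    // bytesPerMap of input, at least 1, capped by maxMaps, and never
    // more mappers than there are files to copy.
    static int estimateMappers(long totalBytes, int fileCount,
                               long bytesPerMap, int maxMaps) {
        int bySize = (int) Math.min(maxMaps,
            Math.max(1, totalBytes / bytesPerMap));
        return Math.min(bySize, Math.max(1, fileCount));
    }

    public static void main(String[] args) {
        // 100 GB over 40 files, 1 GB per mapper, capped at 20 mappers -> 20
        System.out.println(estimateMappers(100L << 30, 40, 1L << 30, 20));
    }
}
```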
[jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914928#comment-13914928 ] Jerry He commented on HBASE-10615: -- Let me give a practical use case, related to ExportSnapshot. You can help to see if there is any loophole. I take snapshots on cluster A and export them to cluster B, which serves as backup storage. When I want to clone the table on cluster C, I can do the following on cluster C (as an alternative to restore/clone snapshot): 1. Construct the table based on the tableInfo, and possibly pre-split based on the region info stored with the snapshot. 2. Have a program that basically loops through the archive regions to bulk load the region data. The parent region is in the archive, and so are the daughters, if the snapshot happened to capture that moment. I remember you had a JIRA to include both parent and daughters in the snapshot. I don't see any loss of data here. I have been testing it for a while. I had to change LoadIncrementalHFiles to skip the reference files if they exist, to avoid the exception posted in this JIRA.
[jira] [Updated] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-10615: - Attachment: HBASE-10615-trunk-v2.patch
[jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915064#comment-13915064 ] Jerry He commented on HBASE-10615: -- Attached v2 with added LOG warns. There is another place where we walk through the hfiles, in createTable(), when the table does not exist. We read the files twice in that case; only warn once.
[jira] [Updated] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-10615: Attachment: HBASE-10615-trunk-v3.patch

Make LoadIncrementalHFiles skip reference files
Key: HBASE-10615
URL: https://issues.apache.org/jira/browse/HBASE-10615
Project: HBase
Issue Type: Improvement
Affects Versions: 0.96.0
Reporter: Jerry He
Assignee: Jerry He
Priority: Minor
Attachments: HBASE-10615-trunk-v2.patch, HBASE-10615-trunk-v3.patch, HBASE-10615-trunk.patch

There are use cases where the source of hfiles for LoadIncrementalHFiles is a FileSystem copy-out/backup of an HBase table or of archive hfiles. For example:
1. Copy-out of hbase.rootdir, a table dir, a region dir (after disable), or the archive dir.
2. ExportSnapshot.
It is possible that there are reference files in the family dir in these cases. We have such use cases where, trying to load the files back into HBase, we'll get:
{code}
Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://HDFS-AMR/tmp/restoreTemp/117182adfe861c5d2b607da91d60aa8a/info/aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd
	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:570)
	at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:594)
	at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:636)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:472)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:393)
	at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:391)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
	at java.util.concurrent.FutureTask.run(FutureTask.java:149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
	at java.lang.Thread.run(Thread.java:738)
Caused by: java.lang.IllegalArgumentException: Invalid HFile version: 16715777 (expected to be between 2 and 2)
	at org.apache.hadoop.hbase.io.hfile.HFile.checkFormatVersion(HFile.java:927)
	at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:426)
	at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:568)
{code}
It is desirable and safe to skip these reference files since they don't contain any real data for bulk load purposes.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
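The reference files the issue describes are recognizable by name alone: the file in the stack trace above, `aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd`, follows the `<hfile-id>.<encoded-parent-region>` convention. A minimal, self-contained sketch of such a name-based skip check; the class, method, and pattern here are illustrative assumptions, not HBase's actual `StoreFileInfo` API:

```java
import java.util.regex.Pattern;

// Illustrative sketch: detect HBase-style reference files by name so a
// bulk-load pass can skip them before opening any HFile trailer.
// The real check lives inside HBase; this standalone version only mimics
// the <hfile-id>.<encoded-parent-region> naming convention (an assumption).
public class ReferenceFileCheck {
    // A plain hfile name is a single hex id; a reference file appends
    // "." plus the hex-encoded name of the parent region.
    private static final Pattern REF_NAME =
        Pattern.compile("^([0-9a-f]+)\\.([0-9a-f]+)$");

    public static boolean isReference(String fileName) {
        return REF_NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Name taken from the stack trace above: a reference file.
        System.out.println(isReference(
            "aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd")); // true
        // A plain hfile has no parent-region suffix.
        System.out.println(isReference("aed3d01648384b31b29e5bad4cd80bec")); // false
    }
}
```

A bulk-load pass could filter family-dir listings with a predicate like this and never attempt to read a trailer from a reference file, avoiding the CorruptHFileException shown above.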
[jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915320#comment-13915320 ] Jerry He commented on HBASE-10615: Rebased with latest from trunk and attached v3.
[jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915324#comment-13915324 ] Jerry He commented on HBASE-10615: A config parameter will probably be helpful if the bulk load tool itself can or wants to 'resolve' the reference/link. But bulk load is operating on one region dir only.
[jira] [Updated] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-10615: Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-10615: Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915435#comment-13915435 ] Jerry He commented on HBASE-10615: bq. A config parameter will probably be helpful if the bulk load tool itself can or wants to 'resolve' the reference/link.
Then it would be good to give the user an option to 'skip' or 'resolve'. Right now the only option is effectively 'skip'. Or 'error', which doesn't make much sense.
[jira] [Commented] (HBASE-10622) Improve log and Exceptions in Export Snapshot
[ https://issues.apache.org/jira/browse/HBASE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916308#comment-13916308 ] Jerry He commented on HBASE-10622: Hi, Matteo. Are you reversing HBASE-9060 in the code by putting the path inthe
Also, do you want to put a '%' character after: (totalBytesWritten/(float)inputFileSize) * 100.0f

Improve log and Exceptions in Export Snapshot
Key: HBASE-10622
URL: https://issues.apache.org/jira/browse/HBASE-10622
Project: HBase
Issue Type: Bug
Components: snapshots
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Fix For: 0.99.0
Attachments: HBASE-10622-v0.patch, HBASE-10622-v1.patch, HBASE-10622-v2.patch

From the logs of ExportSnapshot it is not really clear what's going on. This adds some extra information useful for debugging, and in some places the real exception can be thrown.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10622) Improve log and Exceptions in Export Snapshot
[ https://issues.apache.org/jira/browse/HBASE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916318#comment-13916318 ] Jerry He commented on HBASE-10622: bq. Are you reversing HBASE-9060 in the code by putting the path inthe
Are you reversing HBASE-9060 in the code by putting the path in the format?
[jira] [Commented] (HBASE-10622) Improve log and Exceptions in Export Snapshot
[ https://issues.apache.org/jira/browse/HBASE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13917556#comment-13917556 ] Jerry He commented on HBASE-10622: A few more comments (since you are doing the improvement ...):
{code}
-// Verify that the written size match
-if (totalBytesWritten != inputFileSize) {
-  String msg = "number of bytes copied not matching copied=" + totalBytesWritten +
-    " expected=" + inputFileSize + " for file=" + inputPath;
-  throw new IOException(msg);
{code}
You think this is unnecessary?
In run(), can we clean up/delete snapshotTmpDir if Step 2 failed, so that we don't ask the user to manually clean what comes from our Step 1 copy?
Can we add a job counter, say 'COPIES_FILES', alongside 'BYTES_COPIED'?
Another issue is probably more involved and does not need to be covered in this JIRA: the overall progress reporting of the ExportSnapshot job. For example:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot1 -copy-to /user/biadmin/mysnapshots -mappers 30
{code}
14/03/02 12:19:54 INFO mapred.JobClient: map 0% reduce 0%
14/03/02 12:20:12 INFO mapred.JobClient: map 6% reduce 0%
14/03/02 12:20:13 INFO mapred.JobClient: map 44% reduce 0%
14/03/02 12:20:19 INFO mapred.JobClient: map 83% reduce 0%
{code}
There is about 130G to export, but it takes just a few seconds to get to 83% after the first round of mappers is launched, and it will stay there for a long time. Similarly, at the end it will show 100% for a long time while mappers are still running. The map progress percentage is quite inaccurate with regard to the overall progress.
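The '%' suggestion above can be illustrated in isolation. A minimal sketch of a bytes-based copy-progress string; only the percentage expression `(totalBytesWritten/(float)inputFileSize) * 100.0f` comes from the review comment, while the class name and message wording are hypothetical, not ExportSnapshot's actual code:

```java
import java.util.Locale;

// Illustrative sketch of a copy-progress message: a bytes-based percentage
// with an explicit '%' character appended, as suggested in the review.
// Class name and message wording are hypothetical.
public class CopyProgress {
    public static String format(long totalBytesWritten, long inputFileSize) {
        // The exact expression quoted in the review comment.
        float pct = (totalBytesWritten / (float) inputFileSize) * 100.0f;
        // "%.1f%%" renders the value followed by a literal '%', e.g. "50.0%".
        return String.format(Locale.ROOT, "copied %d/%d bytes (%.1f%%)",
                totalBytesWritten, inputFileSize, pct);
    }

    public static void main(String[] args) {
        System.out.println(format(65L, 130L)); // copied 65/130 bytes (50.0%)
    }
}
```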
[jira] [Commented] (HBASE-10622) Improve log and Exceptions in Export Snapshot
[ https://issues.apache.org/jira/browse/HBASE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918364#comment-13918364 ] Jerry He commented on HBASE-10622: bq. other jira, it requires a new InputFormat/RecordReader with the progress based on the file size and not on the number of lines in the input file. The only progress that we track is the current file copy
Agree. Not an easy thing to do. It seems that DistCp has the same issue. Looks good. Thanks.
[jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
[ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920074#comment-13920074 ] Jerry He commented on HBASE-10615: The findbugs and TestLogRolling failures do not seem to be caused by the patch. Hi, [~stack], [~mbertozzi], [~yuzhih...@gmail.com] Are you ok with the patch?
[jira] [Commented] (HBASE-8304) Bulkload fail to remove files if fs.default.name / fs.defaultFS is configured without default port.
[ https://issues.apache.org/jira/browse/HBASE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923197#comment-13923197 ] Jerry He commented on HBASE-8304: Maybe I am asking for too much. Does this JIRA/patch cover the following case: source uri: webhdfs://myhost1:14000/ target uri: hdfs://myhost1:9000/ We should not copy in this case since we are really on the same cluster as well.

Bulkload fail to remove files if fs.default.name / fs.defaultFS is configured without default port.
Key: HBASE-8304
URL: https://issues.apache.org/jira/browse/HBASE-8304
Project: HBase
Issue Type: Bug
Components: HFile, regionserver
Affects Versions: 0.94.5
Reporter: Raymond Liu
Assignee: haosdent
Labels: bulkloader
Attachments: HBASE-8304-v2.patch, HBASE-8304-v3.patch, HBASE-8304.patch

When fs.default.name or fs.defaultFS in hadoop core-site.xml is configured as hdfs://ip, and hbase.rootdir is configured as hdfs://ip:port/hbaserootdir where port is the hdfs namenode's default port, the bulkload operation will not remove the files in the bulk output dir. Store::bulkLoadHfile will treat hdfs://ip and hdfs://ip:port as different filesystems and go with the copy approach instead of rename. The root cause is that the hbase master will rewrite fs.default.name/fs.defaultFS according to hbase.rootdir when the regionserver starts; thus, the dest fs uri from the hregion will not match the src fs uri passed from the client. Any suggestion on the best approach to fix this issue? I kind of think we could check for the default port if the src uri comes without port info.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8304) Bulkload fail to remove files if fs.default.name / fs.defaultFS is configured without default port.
[ https://issues.apache.org/jira/browse/HBASE-8304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923220#comment-13923220 ] Jerry He commented on HBASE-8304:
{code}
+String srcServiceName = srcFs.getCanonicalServiceName();
+String desServiceName = desFs.getCanonicalServiceName();
...
+//If one serviceName is in HA format while the other is in non-HA format,
+//maybe they refer to the same FileSystem.
+//For example, srcFs is ha-hdfs://nameservices and desFs is hdfs://activeNamenode:port
+Set<InetSocketAddress> srcAddrs = getNNAddresses((DistributedFileSystem) srcFs, conf);
+Set<InetSocketAddress> desAddrs = getNNAddresses((DistributedFileSystem) desFs, conf);
+if (Sets.intersection(srcAddrs, desAddrs).size() > 0) {
+  return true;
{code}
A little unclear about this. Given your example (srcFs is ha-hdfs://nameservices and desFs is hdfs://activeNamenode:port): if the desFs is HA enabled, then you will get the 'ha-hdfs://' format, right? If it returns hdfs://, does that already tell you they are different FS? It is a good JIRA. I didn't know fs.getCanonicalServiceName() would return ha-hdfs://nameservices in the HA case.
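The intersection test quoted in the patch above can be sketched in isolation: two filesystems are treated as the same cluster when their namenode address sets overlap. The `getNNAddresses` helper and HA service-name handling are HBase internals; this standalone sketch only models the set intersection, using java.util instead of Guava's Sets:

```java
import java.net.InetSocketAddress;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of the same-cluster check discussed above: two
// filesystems are considered the same cluster when their namenode address
// sets intersect. How the address sets are obtained (getNNAddresses in the
// patch) is an HBase internal and is out of scope here.
public class SameClusterCheck {
    public static boolean sameCluster(Set<InetSocketAddress> srcAddrs,
                                      Set<InetSocketAddress> desAddrs) {
        Set<InetSocketAddress> common = new HashSet<>(srcAddrs);
        common.retainAll(desAddrs);   // java.util equivalent of Sets.intersection(...)
        return !common.isEmpty();     // "size() > 0" in the patch
    }

    public static void main(String[] args) {
        Set<InetSocketAddress> src = new HashSet<>();
        src.add(InetSocketAddress.createUnresolved("nn1", 8020));
        Set<InetSocketAddress> des = new HashSet<>();
        des.add(InetSocketAddress.createUnresolved("nn1", 8020));
        des.add(InetSocketAddress.createUnresolved("nn2", 8020));
        // One shared namenode address: same cluster.
        System.out.println(sameCluster(src, des)); // true
    }
}
```

This is what lets an HA service name (ha-hdfs://nameservices) and a direct active-namenode URI resolve to the same cluster even though their canonical service names differ.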
[jira] [Commented] (HBASE-8798) Fix a minor bug in shell command with clone_snapshot table error
[ https://issues.apache.org/jira/browse/HBASE-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697010#comment-13697010 ] Jerry He commented on HBASE-8798: Here is the output after the fix.
hbase(main):001:0> list
TABLE
TestTable
1 row(s) in 1.1350 seconds
=> ["TestTable"]
hbase(main):002:0> list_snapshots
SNAPSHOT    TABLE + CREATION TIME
mysnapshot1 TestTable (Mon Jun 24 13:29:00 -0700 2013)
1 row(s) in 0.2040 seconds
=> ["mysnapshot1"]
hbase(main):003:0> clone_snapshot 'mysnapshot1', 'TestTable'
ERROR: Table already exists: *TestTable!*
Here is some help for this command: Create a new table by cloning the snapshot content. There're no copies of data involved. And writing on the newly created table will not influence the snapshot data. Examples: hbase> clone_snapshot 'snapshotName', 'tableName'

Fix a minor bug in shell command with clone_snapshot table error
Key: HBASE-8798
URL: https://issues.apache.org/jira/browse/HBASE-8798
Project: HBase
Issue Type: Bug
Components: shell, snapshots
Affects Versions: 0.94.8, 0.95.1
Reporter: Jerry He
Assignee: Jerry He
Priority: Minor
Attachments: HBASE-8798-trunk.patch

In HBase shell, the syntax for clone_snapshot is: hbase> clone_snapshot 'snapshotName', 'tableName'
If the target table already exists, we'll get an error. For example:
hbase(main):011:0> clone_snapshot 'mysnapshot1', 'TestTable'
ERROR: Table already exists: mysnapshot1!
Here is some help for this command: Create a new table by cloning the snapshot content. There're no copies of data involved. And writing on the newly created table will not influence the snapshot data. Examples: hbase> clone_snapshot 'snapshotName', 'tableName'
The bug is in the ERROR message: *ERROR: Table already exists: mysnapshot1!* We should output the table name, not the snapshot name.
Currently, in commands.rb, the output is fixed as args.first for TableExistsException:
{code}
def translate_hbase_exceptions(*args)
  yield
rescue org.apache.hadoop.hbase.exceptions.TableNotFoundException
  raise "Unknown table #{args.first}!"
rescue org.apache.hadoop.hbase.exceptions.NoSuchColumnFamilyException
  valid_cols = table(args.first).get_all_columns.map { |c| c + '*' }
  raise "Unknown column family! Valid column names: #{valid_cols.join(", ")}"
rescue org.apache.hadoop.hbase.exceptions.TableExistsException
  raise "Table already exists: #{args.first}!"
end
{code}
This is fine for commands like 'create tableName ...' but not for 'clone_snapshot snapshotName tableName'.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8798) Fix a minor bug in shell command with clone_snapshot table error
[ https://issues.apache.org/jira/browse/HBASE-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697290#comment-13697290 ] Jerry He commented on HBASE-8798: - Hi Ted, Matteo, thanks for the review. commands.rb has some existing logic that tries to 'rescue' certain exceptions. It will get more and more difficult to keep this logic artificially generic as we evolve and add more commands ... Expanding (instead of shrinking) this logic is not a bad or ugly approach ... But your suggested solution is fine. Fix a minor bug in shell command with clone_snapshot table error Key: HBASE-8798 URL: https://issues.apache.org/jira/browse/HBASE-8798 Project: HBase Issue Type: Bug Components: shell, snapshots Affects Versions: 0.94.8, 0.95.1 Reporter: Jerry He Assignee: Jerry He Priority: Minor Attachments: HBASE-8798-trunk.patch In the HBase shell, the syntax for clone_snapshot is: hbase clone_snapshot 'snapshotName', 'tableName' If the target table already exists, we get an error. For example: -- hbase(main):011:0 clone_snapshot 'mysnapshot1', 'TestTable' ERROR: Table already exists: mysnapshot1! Here is some help for this command: Create a new table by cloning the snapshot content. There are no copies of data involved, and writes to the newly created table will not affect the snapshot data. Examples: hbase clone_snapshot 'snapshotName', 'tableName' -- The bug is in the ERROR message: *ERROR: Table already exists: mysnapshot1!* We should output the table name, not the snapshot name. Currently, in commands.rb, the output is fixed as args.first for TableExistsException: {code}
def translate_hbase_exceptions(*args)
  yield
rescue org.apache.hadoop.hbase.exceptions.TableNotFoundException
  raise "Unknown table #{args.first}!"
rescue org.apache.hadoop.hbase.exceptions.NoSuchColumnFamilyException
  valid_cols = table(args.first).get_all_columns.map { |c| c + '*' }
  raise "Unknown column family! Valid column names: #{valid_cols.join(", ")}"
rescue org.apache.hadoop.hbase.exceptions.TableExistsException
  raise "Table already exists: #{args.first}!"
end
{code} This is fine for commands like 'create tableName ...' but not for 'clone_snapshot snapshotName tableName'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
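The shape of the fix can be sketched in plain Ruby. This is an illustrative model, not the actual commands.rb API: the exception classes and the explicit table_name parameter are stand-ins. The idea is that each command passes the name it knows is a table, instead of the rescue block guessing args.first.

```ruby
# Stand-ins for the Java exceptions the shell rescues (illustrative only).
class TableNotFoundError < StandardError; end
class TableExistsError  < StandardError; end

# The caller supplies the actual table name, so clone_snapshot can pass
# its *second* argument rather than the snapshot name.
def translate_hbase_exceptions(table_name)
  yield
rescue TableNotFoundError
  raise "Unknown table #{table_name}!"
rescue TableExistsError
  raise "Table already exists: #{table_name}!"
end

# clone_snapshot 'mysnapshot1', 'TestTable' now reports the table name.
def clone_snapshot(snapshot_name, table_name)
  translate_hbase_exceptions(table_name) do
    raise TableExistsError # simulate the server-side "table exists" failure
  end
end
```

With this shape, `clone_snapshot 'mysnapshot1', 'TestTable'` against an existing table reports "Table already exists: TestTable!" instead of naming the snapshot.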
[jira] [Updated] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8760: Assignee: (was: Jerry He) possible loss of data in snapshot taken after region split -- Key: HBASE-8760 URL: https://issues.apache.org/jira/browse/HBASE-8760 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8 Reporter: Jerry He Fix For: 0.94.10 Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, HBASE-8760-thz-v0.patch Right after a region split, but before the daughter regions are compacted, the two daughter regions contain Reference files pointing to the parent hfiles. If we take a snapshot at that moment, the snapshot will succeed, but it will only contain the daughter Reference files. Since nothing holds a reference to the parent hfiles, the HFile Cleaner will delete them once they are no longer needed by the daughter regions. At a minimum, we need to keep these parent hfiles from being deleted.
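The "keep the parent hfiles" requirement can be modeled in a few lines of Ruby. This is a sketch of the cleaner-side check, not the real HFileCleaner plugin API: the class and the referenced-files set are hypothetical, standing in for "targets of daughter Reference files or snapshot manifest entries".

```ruby
# Illustrative model: a cleaner that refuses to delete a parent hfile
# while something (a daughter Reference file, a snapshot) still points at it.
class HFileCleanerModel
  def initialize(referenced_files)
    # Set of file names that Reference files / snapshots still point to.
    @referenced = referenced_files
  end

  # Returns only the candidates that are safe to delete;
  # referenced parent hfiles are kept.
  def clean(candidates)
    candidates.reject { |f| @referenced.include?(f) }
  end
end
```

Under this model, a parent hfile listed in the referenced set survives a cleaner pass even though the parent region itself is gone.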
[jira] [Updated] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8760: Affects Version/s: 0.95.1 possible loss of data in snapshot taken after region split
[jira] [Updated] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8760: Fix Version/s: 0.95.2 possible loss of data in snapshot taken after region split
[jira] [Created] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
Jerry He created HBASE-8967: --- Summary: Duplicate call to snapshotManager.stop() in HRegionServer Key: HBASE-8967 URL: https://issues.apache.org/jira/browse/HBASE-8967 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.9, 0.95.1 Reporter: Jerry He Assignee: Jerry He Priority: Minor Fix For: 0.95.2 snapshotManager.stop() is called twice in the HRegionServer shutdown process: {code}
2013-07-12 12:06:56,909 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
2013-07-12 12:06:56,909 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver60020.cacheFlusher exiting
2013-07-12 12:06:56,909 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2013-07-12 12:06:56,909 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver60020.compactionChecker exiting
2013-07-12 12:06:56,909 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@1bfd1bfd
...
2013-07-12 12:06:56,911 INFO org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager gracefully.
{code}
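A common remedy for this class of bug, sketched here in Ruby rather than the actual Java RegionServerSnapshotManager code, is to make stop() idempotent with a guard flag, so that whichever of the two shutdown paths calls it second becomes a no-op. The class name and the counter are illustrative.

```ruby
# Illustrative sketch: an idempotent stop() guarded by a flag.
class SnapshotManagerModel
  attr_reader :stop_count

  def initialize
    @stopped = false
    @stop_count = 0 # only here so the sketch's behavior is observable
  end

  def stop
    return if @stopped # second call in the shutdown path does nothing
    @stopped = true
    @stop_count += 1   # real code would shut down the procedure pool here
  end
end
```

Alternatively (and closer to what a patch for this issue would do), the redundant call site can simply be removed so stop() runs once; the guard is a belt-and-braces defense.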
[jira] [Updated] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8967: Attachment: HBASE-8967.patch.patch
[jira] [Updated] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8967: Attachment: (was: HBASE-8967.patch.patch)
[jira] [Updated] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8967: Status: Patch Available (was: Open) Attachments: HBASE-8967.patch
[jira] [Updated] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8967: Attachment: HBASE-8967.patch
[jira] [Commented] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710603#comment-13710603 ] Jerry He commented on HBASE-8967: - Also included in the patch is a minor cleanup related to HBASE-8783
[jira] [Updated] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8967: Attachment: HBASE-8967-v2.patch
[jira] [Commented] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711480#comment-13711480 ] Jerry He commented on HBASE-8967: - Thanks, Ted, Matteo. Attached v2 to add back the comment line.
[jira] [Updated] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8967: Attachment: HBASE-8967-v2-0.94.patch
[jira] [Commented] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711501#comment-13711501 ] Jerry He commented on HBASE-8967: - Attached v2-0.94 for 0.94 consideration.
[jira] [Commented] (HBASE-8967) Duplicate call to snapshotManager.stop() in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711894#comment-13711894 ] Jerry He commented on HBASE-8967: - Thanks for the comment and the good reminder, Stack, Ted. Fix For: 0.98.0, 0.95.2
[jira] [Created] (HBASE-9029) Backport HBASE-8706 Some improvement in snapshot to 0.94
Jerry He created HBASE-9029: --- Summary: Backport HBASE-8706 Some improvement in snapshot to 0.94 Key: HBASE-9029 URL: https://issues.apache.org/jira/browse/HBASE-9029 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.94.9 Reporter: Jerry He Assignee: Jerry He Priority: Minor Fix For: 0.94.11 'HBASE-8706 Some improvement in snapshot' has some good parameter tuning and improvements for snapshot handling, making snapshots more robust. It would be nice to have it in 0.94.
[jira] [Updated] (HBASE-9029) Backport HBASE-8706 Some improvement in snapshot to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-9029: Attachment: HBase-9029-0.94.patch
[jira] [Commented] (HBASE-9029) Backport HBASE-8706 Some improvement in snapshot to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717551#comment-13717551 ] Jerry He commented on HBASE-9029: - It was backported to our 0.94.9 branch a short time ago. Unit tests are clean on a local run: {code}
Running org.apache.hadoop.hbase.TestHServerInfo
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 215.893 sec

Results :

Tests run: 692, Failures: 0, Errors: 0, Skipped: 0

Running org.apache.hadoop.hbase.snapshot.TestSnapshotDescriptionUtils
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.532 sec
Running org.apache.hadoop.hbase.snapshot.TestExportSnapshot
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.308 sec

Results :

Tests run: 888, Failures: 0, Errors: 0, Skipped: 2

[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 31:52.827s
[INFO] Finished at: Fri Jul 19 20:52:08 PDT 2013
[INFO] Final Memory: 26M/226M
[INFO]
{code} Also ran some ad hoc snapshot testing successfully.
[jira] [Created] (HBASE-9060) ExportSnapshot job fails if target path contains percentage character
Jerry He created HBASE-9060: --- Summary: ExportSnapshot job fails if target path contains percentage character Key: HBASE-9060 URL: https://issues.apache.org/jira/browse/HBASE-9060 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.10, 0.95.1 Reporter: Jerry He Assignee: Jerry He Priority: Minor Here is the stack trace: hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot table1_snapshot -copy-to hdfs:///myhbasebackup/table1_snapshot {code}
13/07/26 18:09:50 INFO mapred.JobClient: map 0% reduce 0%
13/07/26 18:09:58 INFO mapred.JobClient: Task Id : attempt_201307261804_0002_m_01_0, Status : FAILED
java.util.MissingFormatArgumentException: Format specifier ') from family1/table1=3567d8ac6cfee83dfe81c346f139fb9c-c5bc120475a54d188f30d4b621d505b1 to hdfs:/myhbase%2C'
	at java.util.Formatter.getArgument(Formatter.java:592)
	at java.util.Formatter.format(Formatter.java:561)
	at java.util.Formatter.format(Formatter.java:510)
	at java.lang.String.format(String.java:1977)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyData(ExportSnapshot.java:274)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:204)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:149)
	at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:98)
{code} The problem is this code in copyData(): {code}
final String statusMessage = "copied %s/" + StringUtils.humanReadableInt(inputFileSize) +
    " (%.3f%%) from " + inputPath + " to " + outputPath;
{code} Since we don't control what the path may contain, anything that could confuse the formatter needs to be pulled out of the format string. Also, the percentage-completion math seems to be wrong in the same code.
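The failure mode and the fix can be demonstrated in a couple of lines, shown here in Ruby rather than Java since printf-style formatting behaves the same way: a % sequence inside a path that has been concatenated into the format string gets parsed as a conversion specifier. The paths below are made up for illustration.

```ruby
input_path  = "family1/hfile-abc"                # hypothetical source path
output_path = "hdfs:/myhbase%2Cbackup/hfile-abc" # note the literal %2C

# Unsafe (mirrors the Java bug): building the format string by concatenation,
#   "copied %s/128.0m (%.3f%%) from " + input_path + " to " + output_path
# lets "%2C" inside output_path be parsed as a format specifier and fail.

# Safe: keep the format string constant and pass every variable as an argument,
# so % characters in the paths are never interpreted by the formatter.
status = format("copied %s/%s (%.3f%%) from %s to %s",
                "64.0m", "128.0m", 50.0, input_path, output_path)
```

With the safe form, the literal `%2C` in the target path passes through untouched and the percentage placeholder still formats correctly.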
[jira] [Updated] (HBASE-9060) ExportSnapshot job fails if target path contains percentage character
[ https://issues.apache.org/jira/browse/HBASE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-9060: Fix Version/s: 0.95.2 Status: Patch Available (was: Open) Attachments: HBase-9060.patch
[jira] [Updated] (HBASE-9060) ExportSnapshot job fails if target path contains percentage character
[ https://issues.apache.org/jira/browse/HBASE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-9060: Attachment: HBase-9060.patch
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9060) ExportSnapshot job fails if target path contains percentage character
[ https://issues.apache.org/jira/browse/HBASE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-9060: Description:

Here is the stack trace:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot table1_snapshot -copy-to hdfs:///myhbase%2Cbackup/table1_snapshot

{code}
13/07/26 18:09:50 INFO mapred.JobClient: map 0% reduce 0%
13/07/26 18:09:58 INFO mapred.JobClient: Task Id : attempt_201307261804_0002_m_01_0, Status : FAILED
java.util.MissingFormatArgumentException: Format specifier ') from family1/table1=3567d8ac6cfee83dfe81c346f139fb9c-c5bc120475a54d188f30d4b621d505b1 to hdfs:/myhbase%2C'
    at java.util.Formatter.getArgument(Formatter.java:592)
    at java.util.Formatter.format(Formatter.java:561)
    at java.util.Formatter.format(Formatter.java:510)
    at java.lang.String.format(String.java:1977)
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyData(ExportSnapshot.java:274)
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:204)
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:149)
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:98)
{code}

The problem is this code in copyData():

{code}
final String statusMessage = "copied %s/" + StringUtils.humanReadableInt(inputFileSize) +
    " (%.3f%%) from " + inputPath + " to " + outputPath;
{code}

Since we don't know what the path may contain that may confuse the formatter, we need to pull that part out of the format string. Also, the percentage completion math seems to be wrong in the same code.
was: (the same description, but with the -copy-to path given as hdfs:///myhbasebackup/table1_snapshot, i.e. without the problem character)
ExportSnapshot job fails if target path contains percentage character - Key: HBASE-9060 URL: https://issues.apache.org/jira/browse/HBASE-9060 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.95.1, 0.94.10 Reporter: Jerry He Assignee: Jerry He Priority: Minor Fix For: 0.95.2 Attachments: HBase-9060.patch
[jira] [Commented] (HBASE-9060) ExportSnapshot job fails if target path contains percentage character
[ https://issues.apache.org/jira/browse/HBASE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721892#comment-13721892 ] Jerry He commented on HBASE-9060: - Edited the description to correct the path in the command (had given the path without the problem).
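The description also flags the percentage completion math as wrong, without saying what the error is. A common culprit in this kind of progress code is integer division; the sketch below (hypothetical names like bytesCopied, not the actual ExportSnapshot fields) shows the truncation pitfall and a correct computation, again keeping the path out of the format string:

```java
public class ProgressDemo {
    // Pitfall: long division truncates, so bytesCopied / fileSize is 0
    // until the copy completes, and the reported percentage sticks at 0.
    static double wrongPercent(long bytesCopied, long fileSize) {
        return 100 * (bytesCopied / fileSize);
    }

    // Promote to double before dividing, and guard against empty files.
    static double percent(long bytesCopied, long fileSize) {
        return fileSize > 0 ? (bytesCopied * 100.0) / fileSize : 100.0;
    }

    public static void main(String[] args) {
        System.out.println(wrongPercent(512, 2048)); // truncated to 0.0
        System.out.println(percent(512, 2048));      // 25.0
        // The output path is passed as a %s argument, never concatenated
        // into the format string.
        System.out.println(String.format("copied %d/%d bytes (%.3f%%) to %s",
                512L, 2048L, percent(512, 2048), "hdfs:/myhbase%2Cbackup"));
    }
}
```

Whether this is the exact mistake in copyData() is an assumption; the JIRA only says the math "seems to be wrong".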
[jira] [Commented] (HBASE-9029) Backport HBASE-8706 Some improvement in snapshot to 0.94
[ https://issues.apache.org/jira/browse/HBASE-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725469#comment-13725469 ] Jerry He commented on HBASE-9029: - Hi, guys. Should we put this in 0.94? Backport HBASE-8706 Some improvement in snapshot to 0.94 Key: HBASE-9029 URL: https://issues.apache.org/jira/browse/HBASE-9029 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.94.9 Reporter: Jerry He Assignee: Jerry He Priority: Minor Fix For: 0.94.11 Attachments: HBase-9029-0.94.patch 'HBASE-8706 Some improvement in snapshot' has some good parameter tuning and improvements for snapshot handling, making snapshots more robust. It would be nice to put it in 0.94.
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727919#comment-13727919 ] Jerry He commented on HBASE-8760: - Hi, Matteo. I agree your patch is probably the best we can do for now. We can probably do more in HBASE-7987 to have a new solution for this problem. For example, include the parent hfiles in the manifest file but add indicators/markers to tell that they are meant to be parent hfiles. possible loss of data in snapshot taken after region split -- Key: HBASE-8760 URL: https://issues.apache.org/jira/browse/HBASE-8760 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.1 Reporter: Jerry He Fix For: 0.98.0, 0.95.2, 0.94.12 Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, HBASE-8760-thz-v0.patch Right after a region split but before the daughter regions are compacted, we have two daughter regions containing Reference files to the parent hfiles. If we take a snapshot right at that moment, the snapshot will succeed, but it will only contain the daughter Reference files. Since there is no hold on the parent hfiles, they will be deleted by the HFile Cleaner soon after, once they are no longer needed by the daughter regions. At a minimum, we need to keep these parent hfiles from being deleted.
[jira] [Updated] (HBASE-8565) stop-hbase.sh clean up: backup master
[ https://issues.apache.org/jira/browse/HBASE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8565: Description:

In stop-hbase.sh:

{code}
# TODO: store backup masters in ZooKeeper and have the primary send them a shutdown message
# stop any backup masters
"$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
    --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup
{code}

After HBASE-5213, stop-hbase.sh - hbase master stop will bring down the backup master too via the cluster status znode. We should not need the above code anymore.

Another issue happens when the current master died and the backup master became the active master.

{code}
nohup nice -n ${HBASE_NICENESS:-0} "$HBASE_HOME"/bin/hbase \
    --config "${HBASE_CONF_DIR}" \
    master stop "$@" > "$logout" 2>&1 < /dev/null &

waitForProcessEnd `cat $pid` 'stop-master-command'
{code}

We can still issue 'stop-hbase.sh' from the old master: stop-hbase.sh - hbase master stop - look for active master - request shutdown. This process still works. But the waitForProcessEnd statement will not work since the local master pid is not relevant anymore. What is the best way in this case?

was: (the same description, except that it said stop-hbase.sh would "bring up" the backup master rather than "bring down")

stop-hbase.sh clean up: backup master - Key: HBASE-8565 URL: https://issues.apache.org/jira/browse/HBASE-8565 Project: HBase Issue Type: Bug Components: master, scripts Affects Versions: 0.94.7, 0.95.0 Reporter: Jerry He Priority: Minor Fix For: 0.98.0, 0.95.2, 0.94.12
[jira] [Commented] (HBASE-8565) stop-hbase.sh clean up: backup master
[ https://issues.apache.org/jira/browse/HBASE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13727970#comment-13727970 ] Jerry He commented on HBASE-8565: - Correct a typo in the description: Old: After HBASE-5213, stop-hbase.sh - hbase master stop will bring up the backup master too via the cluster status znode. New: After HBASE-5213, stop-hbase.sh - hbase master stop will bring down the backup master too via the cluster status znode.
[jira] [Updated] (HBASE-8565) stop-hbase.sh clean up: backup master
[ https://issues.apache.org/jira/browse/HBASE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8565: Status: Patch Available (was: Open)
[jira] [Assigned] (HBASE-8565) stop-hbase.sh clean up: backup master
[ https://issues.apache.org/jira/browse/HBASE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He reassigned HBASE-8565: --- Assignee: Jerry He
[jira] [Commented] (HBASE-8565) stop-hbase.sh clean up: backup master
[ https://issues.apache.org/jira/browse/HBASE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728365#comment-13728365 ] Jerry He commented on HBASE-8565: - Attached an initial patch.
[jira] [Updated] (HBASE-8565) stop-hbase.sh clean up: backup master
[ https://issues.apache.org/jira/browse/HBASE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8565: Attachment: HBASE-8565-v1-0.94.patch HBASE-8565-v1-trunk.patch
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728951#comment-13728951 ] Jerry He commented on HBASE-8760: - This is a meaningful change! Parent region and daughter regions are all included in the snapshot. After restore/clone, all will be included and brought online?

Code comments:

{code}
snapshtoDisabledRegion(snapshotDir, regionInfo);
{code}

Typo in the method name.

{code}
public void verifyRegions(Path snapshotDir) throws IOException
{code}
==>
{code}
private void verifyRegions(final Path snapshotDir) throws IOException
{code}

{code}
private void verifyRegion(final FileSystem fs, final Path snapshotDir, final HRegionInfo region) throws IOException {
  // make sure we have region in the snapshot
{code}

That comment line is not needed anymore.
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728970#comment-13728970 ] Jerry He commented on HBASE-8760: - And the population of .META. is based on the .regioninfo files which were carried over from the original table? That makes sense. Thanks.
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730002#comment-13730002 ] Jerry He commented on HBASE-8760: - I had seen the problem on a 0.94 live cluster. It happened when there was heavy write load on the cluster while the snapshot was taken. Later, to re-create the problem, I had to suspend the compaction thread manually so that right after a region split the new regions would not be compacted right away. I have not had a chance to do testing on this patch yet.
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736454#comment-13736454 ] Jerry He commented on HBASE-8760: - Hi, Matteo. I've just tested the v4 patch against 0.94 and 0.95.2. These are the basic steps:
1. Change the code to disable compaction (similar to what you mentioned).
2. Start hbase.
3. Create and populate a TestTable with 'hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 5'.
4. split 'TestTable'
5. snapshot 'TestTable', 'my_snapshot1' (This snapshot includes parent and daughter references.)
6. Stop hbase.
7. Change the code to enable normal compaction.
8. Start hbase.
9. Wait for normal compactions (and/or additional splits) to go thru their courses, and the hfile cleaners to go thru their courses as well.
10. clone_snapshot 'my_snapshot1', 'TestTable_clone'
11. Count the rows of TestTable_clone to verify the number is the same as TestTable.
12. Verify there are no exceptions in region server logs like 'can not open link' or 'can not open file'.
13. snapshot 'TestTable_clone', 'my_snapshot2'
14. clone_snapshot 'my_snapshot2', 'TestTable_clone_clone'
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736461#comment-13736461 ] Jerry He commented on HBASE-8760: - The patch is working well up to step 12. I've not been able to re-create the problem. But I have seen problems and exceptions in both 0.94 and 0.95.2 during steps 13 and 14 for a second-level snapshot and clone. For example, in 0.94:

{code}
hbase(main):005:0> snapshot 'TestTable_clone', 'my_snapshot2'

ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=my_snapshot2 table=TestTable_clone type=FLUSH } had an error. my_snapshot2 not found in proclist []
	at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:359)
	at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2185)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
	at java.lang.reflect.Method.invoke(Method.java:611)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=my_snapshot2 table=TestTable_clone type=FLUSH } due to exception:Missing parent hfile for: TestTable=83935cdbb327ac84f45a7248f4d58173-048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Missing parent hfile for: TestTable=83935cdbb327ac84f45a7248f4d58173-048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb
	at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:85)
	at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:282)
	at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:349)
	... 7 more
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Missing parent hfile for: TestTable=83935cdbb327ac84f45a7248f4d58173-048d68de11a042e9aba294ab336ddbf3.630c188f55575e0cce497ba342b562bb
	at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyStoreFile(MasterSnapshotVerifier.java:223)
	at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.access$000(MasterSnapshotVerifier.java:85)
	at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier$1.storeFile(MasterSnapshotVerifier.java:209)
	at org.apache.hadoop.hbase.util.FSVisitor.visitRegionStoreFiles(FSVisitor.java:115)
{code}

From the logs, in this failure, 630c188f55575e0cce497ba342b562bb is a region in TestTable_clone that went through its own split. It was gone (not even in .archive) after its split. But somehow there are remaining links/references to it in TestTable_clone. TestTable_clone has 3 million-plus rows, so it could go through compactions and splits on its own. That seems to have confused the snapshot operations. If you need the relevant master/region server logs, I can send them to you or attach them here.

possible loss of data in snapshot taken after region split -- Key: HBASE-8760 URL: https://issues.apache.org/jira/browse/HBASE-8760 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.1 Reporter: Jerry He Fix For: 0.98.0, 0.95.2, 0.94.12 Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, HBASE-8760-0.94-v4.patch, HBASE-8760-thz-v0.patch, HBASE-8760-thz-v1.patch, HBASE-8760-thz-v2.patch, HBASE-8760-thz-v3.patch, HBASE-8760-v4.patch

Right after a region split but before the daughter regions are compacted, we have two daughter regions containing Reference files to the parent hfiles. If we take a snapshot right at that moment, the snapshot will succeed, but it will only contain the daughter Reference files. Since there is no hold on the parent hfiles, they will be deleted by the HFile Cleaner soon after, once they are no longer needed by the daughter regions. At a minimum, we need to keep these parent hfiles from being deleted.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
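The "Missing parent hfile" failure above boils down to a dangling reference: a daughter region's reference file still names a parent hfile that the cleaner has already removed. As a rough sketch (hypothetical names, not HBase's actual MasterSnapshotVerifier code), assuming the common convention that a daughter reference file is named `<parentHFileName>.<parentRegionEncodedName>`, a verification pass could flag such references like this:

```java
import java.util.*;

// Illustrative sketch only. A reference file "<parentHFile>.<parentRegion>"
// is dangling if the named parent hfile no longer exists in that region:
// a snapshot that captured only such references has effectively lost data.
public class SnapshotRefCheck {

    /** Returns the reference files whose parent hfile no longer exists. */
    static List<String> missingParents(Collection<String> snapshotFiles,
                                       Map<String, Set<String>> hfilesByRegion) {
        List<String> missing = new ArrayList<>();
        for (String f : snapshotFiles) {
            int dot = f.indexOf('.');
            if (dot < 0) continue; // plain hfile, not a reference
            String parentHFile = f.substring(0, dot);
            String parentRegion = f.substring(dot + 1);
            Set<String> regionFiles = hfilesByRegion.get(parentRegion);
            if (regionFiles == null || !regionFiles.contains(parentHFile)) {
                missing.add(f); // parent hfile already cleaned up
            }
        }
        return missing;
    }
}
```

The real fix discussed in this issue works on the other side of the race: it keeps the parent hfiles from being deleted while a snapshot still references them, rather than merely detecting the loss afterwards.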
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736468#comment-13736468 ] Jerry He commented on HBASE-8760: - Some exceptions were seen against 0.95.2 during step 14. They are different from 0.94's, but that could just be due to random timing. Step 13 was ok. Step 14 was successful as well, but there were errors in the logs:

{code}
2013-08-10 23:00:16,463 ERROR [RS_OPEN_REGION-hdtest009:60021-1] handler.OpenRegionHandler: Failed open of region=TestTable_clone_clone,,1376197826879.c3ea5fba0fe4a49a9e93102d133b99fd., starting to roll back the global memstore size.
...
Caused by: java.io.IOException: java.io.FileNotFoundException: Unable to open link: org.apache.hadoop.hbase.io.HFileLink locations=[hdfs://hdtest009.svl.ibm.com:9000/hbase95/.data/default/TestTable_clone/9d76f97c231b0ffa4f9ecbe73bfc2acd/info/9af07c31650045d28aa13d8b37251690, hdfs://hdtest009.svl.ibm.com:9000/hbase95/.tmp/.data/default/TestTable_clone/9d76f97c231b0ffa4f9ecbe73bfc2acd/info/9af07c31650045d28aa13d8b37251690, hdfs://hdtest009.svl.ibm.com:9000/hbase95/.archive/.data/default/TestTable_clone/9d76f97c231b0ffa4f9ecbe73bfc2acd/info/9af07c31650045d28aa13d8b37251690]
	at org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:448)
	at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:241)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3122)
{code}

Based on the logs, the failed region was a parent region. The daughter regions were ok; therefore the end row count was good. Again, if you need the relevant logs, I can send them to you or attach them here.
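The "Unable to open link" error above shows the lookup an HFileLink performs: the link carries several candidate locations (the live data directory, .tmp, and the archive), and the file must exist under one of them. A minimal sketch of that resolution pattern, with hypothetical names rather than the real `org.apache.hadoop.hbase.io.HFileLink` API:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative sketch: a link resolves to the first candidate location that
// exists. The "Unable to open link" failure in the log above means the hfile
// was found in none of them, e.g. after an unexpected cleanup of the parent.
public class LinkResolver {

    static Path resolve(List<Path> candidates) throws IOException {
        for (Path p : candidates) {
            if (Files.exists(p)) {
                return p; // first location that still holds the hfile wins
            }
        }
        throw new FileNotFoundException("Unable to open link: " + candidates);
    }
}
```

The multi-location probe is what lets HBase move hfiles to the archive without breaking clones and snapshots; the bug here was that the file had vanished from all three places.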
[jira] [Updated] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8760: Attachment: v4-patch-testing-0.95.2.zip v4-patch-testing-0.94.zip possible loss of data in snapshot taken after region split -- Key: HBASE-8760 URL: https://issues.apache.org/jira/browse/HBASE-8760 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.1 Reporter: Jerry He Fix For: 0.98.0, 0.95.2, 0.94.12 Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, HBASE-8760-0.94-v4.patch, HBASE-8760-thz-v0.patch, HBASE-8760-thz-v1.patch, HBASE-8760-thz-v2.patch, HBASE-8760-thz-v3.patch, HBASE-8760-v4.patch, v4-patch-testing-0.94.zip, v4-patch-testing-0.95.2.zip Right after a region split but before the daughter regions are compacted, we have two daughter regions containing Reference files to the parent hfiles. If we take snapshot right at the moment, the snapshot will succeed, but it will only contain the daughter Reference files. Since there is no hold on the parent hfiles, they will be deleted by the HFile Cleaner after they are no longer needed by the daughter regions soon after. A minimum we need to do is the keep these parent hfiles from being deleted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736557#comment-13736557 ] Jerry He commented on HBASE-8760: - Attached zip files from the testing. Each contains: 1. master/region server logs 2. file listings for the snapshots and tables during the testing.

possible loss of data in snapshot taken after region split -- Key: HBASE-8760 URL: https://issues.apache.org/jira/browse/HBASE-8760 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.1 Reporter: Jerry He Fix For: 0.98.0, 0.95.2, 0.94.12 Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, HBASE-8760-0.94-v4.patch, HBASE-8760-thz-v0.patch, HBASE-8760-thz-v1.patch, HBASE-8760-thz-v2.patch, HBASE-8760-thz-v3.patch, HBASE-8760-v4.patch, v4-patch-testing-0.94.zip, v4-patch-testing-0.95.2.zip

Right after a region split but before the daughter regions are compacted, we have two daughter regions containing Reference files to the parent hfiles. If we take a snapshot right at that moment, the snapshot will succeed, but it will only contain the daughter Reference files. Since there is no hold on the parent hfiles, they will be deleted by the HFile Cleaner soon after, once they are no longer needed by the daughter regions. At a minimum, we need to keep these parent hfiles from being deleted.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739972#comment-13739972 ] Jerry He commented on HBASE-8760: - Hi, Matteo. From the master and region server logs, it is clear that attempts were made to bring the offline regions from the snapshot online. I hadn't made the connection between the failures/exceptions and that. I am very hopeful this latest patch will solve it. I will do some quick testing. Thanks.

possible loss of data in snapshot taken after region split -- Key: HBASE-8760 URL: https://issues.apache.org/jira/browse/HBASE-8760 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.1 Reporter: Jerry He Fix For: 0.98.0, 0.95.2, 0.94.12 Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, HBASE-8760-0.94-v4.patch, HBASE-8760-0.94-v5.patch, HBASE-8760-0.94-v6.patch, HBASE-8760-thz-v0.patch, HBASE-8760-thz-v1.patch, HBASE-8760-thz-v2.patch, HBASE-8760-thz-v3.patch, HBASE-8760-v4.patch, v4-patch-testing-0.94.zip, v4-patch-testing-0.95.2.zip

Right after a region split but before the daughter regions are compacted, we have two daughter regions containing Reference files to the parent hfiles. If we take a snapshot right at that moment, the snapshot will succeed, but it will only contain the daughter Reference files. Since there is no hold on the parent hfiles, they will be deleted by the HFile Cleaner soon after, once they are no longer needed by the daughter regions. At a minimum, we need to keep these parent hfiles from being deleted.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-8760: Attachment: HBASE-8760-0.94-v8-addendum.patch possible loss of data in snapshot taken after region split -- Key: HBASE-8760 URL: https://issues.apache.org/jira/browse/HBASE-8760 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.1 Reporter: Jerry He Fix For: 0.98.0, 0.94.12, 0.96.0 Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, HBASE-8760-0.94-v4.patch, HBASE-8760-0.94-v5.patch, HBASE-8760-0.94-v6.patch, HBASE-8760-0.94-v7.patch, HBASE-8760-0.94-v8-addendum.patch, HBASE-8760-0.94-v8.patch, HBASE-8760-thz-v0.patch, HBASE-8760-trunk-v8.patch, HBASE-8760-v4.patch, v4-patch-testing-0.94.zip, v4-patch-testing-0.95.2.zip Right after a region split but before the daughter regions are compacted, we have two daughter regions containing Reference files to the parent hfiles. If we take snapshot right at the moment, the snapshot will succeed, but it will only contain the daughter Reference files. Since there is no hold on the parent hfiles, they will be deleted by the HFile Cleaner after they are no longer needed by the daughter regions soon after. A minimum we need to do is the keep these parent hfiles from being deleted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13743522#comment-13743522 ] Jerry He commented on HBASE-8760: - Hi, Matteo. Thank you for the time and effort you spent on this JIRA! There has been more complexity and more problems than anticipated. I applied HBASE-9207, HBASE-9233, and then the HBASE-8760-0.94-v8.patch on my 0.94 cluster. I went through the test steps outlined in my previous comment a few times, sometimes with minor changes to the steps.

There is one more issue. (Hopefully this is the last one!) We should not include the offline regions' ServerName in the online snapshot procedure. Otherwise the snapshot procedure will time out while waiting for the obsolete ServerName if the ServerName has changed, e.g. after a restart. Attached a 0.94-v8-addendum; it goes on top of HBASE-8760-0.94-v8.patch. After this, I have not seen any failures or exceptions during the testing. The row counts always match, and the logs are clean, without errors or exceptions, too.

possible loss of data in snapshot taken after region split -- Key: HBASE-8760 URL: https://issues.apache.org/jira/browse/HBASE-8760 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.1 Reporter: Jerry He Fix For: 0.98.0, 0.94.12, 0.96.0 Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, HBASE-8760-0.94-v4.patch, HBASE-8760-0.94-v5.patch, HBASE-8760-0.94-v6.patch, HBASE-8760-0.94-v7.patch, HBASE-8760-0.94-v8-addendum.patch, HBASE-8760-0.94-v8.patch, HBASE-8760-thz-v0.patch, HBASE-8760-trunk-v8.patch, HBASE-8760-v4.patch, v4-patch-testing-0.94.zip, v4-patch-testing-0.95.2.zip

Right after a region split but before the daughter regions are compacted, we have two daughter regions containing Reference files to the parent hfiles. If we take a snapshot right at that moment, the snapshot will succeed, but it will only contain the daughter Reference files.
Since there is no hold on the parent hfiles, they will be deleted by the HFile Cleaner after they are no longer needed by the daughter regions soon after. A minimum we need to do is the keep these parent hfiles from being deleted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
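The v8-addendum's point above can be reduced to a small filtering step. A minimal sketch of the idea, with hypothetical names (this is not the actual SnapshotManager patch): when assembling the region servers that must participate in an online snapshot procedure, skip offline regions such as split parents, because their recorded ServerName may be stale after a restart and waiting on it would time the procedure out.

```java
import java.util.*;

// Illustrative sketch of excluding offline regions' ServerNames from the
// set of participants in an online snapshot procedure.
public class SnapshotParticipants {

    static Set<String> serversToNotify(Map<String, String> regionToServer,
                                       Set<String> offlineRegions) {
        Set<String> servers = new TreeSet<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            if (offlineRegions.contains(e.getKey())) {
                continue; // offline region: its ServerName may be obsolete
            }
            servers.add(e.getValue());
        }
        return servers;
    }
}
```

With the split-parent region excluded, the procedure coordinator only waits on servers that actually host online regions, which matches the clean test runs reported after the addendum.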
[jira] [Commented] (HBASE-8760) possible loss of data in snapshot taken after region split
[ https://issues.apache.org/jira/browse/HBASE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745565#comment-13745565 ] Jerry He commented on HBASE-8760: - Thanks Matteo and everyone! possible loss of data in snapshot taken after region split -- Key: HBASE-8760 URL: https://issues.apache.org/jira/browse/HBASE-8760 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.8, 0.95.1 Reporter: Jerry He Assignee: Matteo Bertozzi Fix For: 0.98.0, 0.94.12, 0.96.0 Attachments: HBase-8760-0.94.8.patch, HBase-8760-0.94.8-v1.patch, HBASE-8760-0.94-v10.patch, HBASE-8760-0.94-v4.patch, HBASE-8760-0.94-v5.patch, HBASE-8760-0.94-v6.patch, HBASE-8760-0.94-v7.patch, HBASE-8760-0.94-v8-addendum.patch, HBASE-8760-0.94-v8.patch, HBASE-8760-0.94-v9.patch, HBASE-8760-thz-v0.patch, HBASE-8760-trunk-v10.patch, HBASE-8760-trunk-v8.patch, HBASE-8760-trunk-v9.patch, HBASE-8760-v4.patch, v4-patch-testing-0.94.zip, v4-patch-testing-0.95.2.zip Right after a region split but before the daughter regions are compacted, we have two daughter regions containing Reference files to the parent hfiles. If we take snapshot right at the moment, the snapshot will succeed, but it will only contain the daughter Reference files. Since there is no hold on the parent hfiles, they will be deleted by the HFile Cleaner after they are no longer needed by the daughter regions soon after. A minimum we need to do is the keep these parent hfiles from being deleted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8565) stop-hbase.sh clean up: backup master
[ https://issues.apache.org/jira/browse/HBASE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747703#comment-13747703 ] Jerry He commented on HBASE-8565: - Hi, Stack. Thank you for the comment.

bq. On the second issue, test for presence of the process before waiting on it?

We could do a check on the local master pid. But to make it work even if the master is not local anymore, instead of waiting for the local master pid, can we borrow the idea of using the master node on ZK to wait for:

{code}
zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
if [ "$zmaster" == "null" ]; then zmaster="master"; fi
zmaster=$zparent/$zmaster
echo -n "Waiting for Master ZNode ${zmaster} to expire"
while ! $bin/hbase zkcli stat $zmaster 2>&1 | grep "Node does not exist"; do
  echo -n "."
  sleep 1
done
{code}

stop-hbase.sh clean up: backup master - Key: HBASE-8565 URL: https://issues.apache.org/jira/browse/HBASE-8565 Project: HBase Issue Type: Bug Components: master, scripts Affects Versions: 0.94.7, 0.95.0 Reporter: Jerry He Assignee: Jerry He Priority: Minor Fix For: 0.98.0, 0.94.12, 0.96.0 Attachments: HBASE-8565-v1-0.94.patch, HBASE-8565-v1-trunk.patch

In stop-hbase.sh:

{code}
# TODO: store backup masters in ZooKeeper and have the primary send them a shutdown message
# stop any backup masters
"$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
  --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup
{code}

After HBASE-5213, stop-hbase.sh -> hbase master stop will bring down the backup master too, via the cluster status znode. We should not need the above code anymore.

Another issue happens when the current master has died and the backup master has become the active master:

{code}
nohup nice -n ${HBASE_NICENESS:-0} "$HBASE_HOME"/bin/hbase \
  --config "${HBASE_CONF_DIR}" \
  master stop "$@" > "$logout" 2>&1 < /dev/null &

waitForProcessEnd `cat $pid` 'stop-master-command'
{code}

We can still issue 'stop-hbase.sh' from the old master: stop-hbase.sh -> hbase master stop -> look for active master -> request shutdown. This process still works. But the waitForProcessEnd statement will not work since the local master pid is not relevant anymore. What is the best way in this case?

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
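The proposed znode wait above is a poll-until-gone loop with no built-in deadline. A generic skeleton of that pattern (only the polling structure; the shell version shells out to "hbase zkcli stat" to decide the condition) shows where a timeout would fit so the script cannot hang forever:

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch of a bounded poll loop: wait until a condition (here,
// "the master znode no longer exists") holds, or give up at a deadline.
public class WaitFor {

    static boolean until(BooleanSupplier done, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (done.getAsBoolean()) {
                return true; // condition met, e.g. znode expired
            }
            Thread.sleep(pollMillis);
        }
        return false; // timed out; caller can warn instead of hanging forever
    }
}
```

The same structure applies whether the condition is a ZK stat, a pid check, or any other liveness probe, which is why it generalizes past the "master is not local anymore" problem raised in the comment.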
[jira] [Commented] (HBASE-8565) stop-hbase.sh clean up: backup master
[ https://issues.apache.org/jira/browse/HBASE-8565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748317#comment-13748317 ] Jerry He commented on HBASE-8565: - Hi, Lars, Stack. The extra code that was removed doesn't break anything; it is just redundant. But keeping it the way it currently is for 0.94, to reduce risk, is prudent. Feel free to close this one. As you suggested, any additional polishing of stop-hbase.sh will be addressed in another JIRA. Thanks!

stop-hbase.sh clean up: backup master - Key: HBASE-8565 URL: https://issues.apache.org/jira/browse/HBASE-8565 Project: HBase Issue Type: Bug Components: master, scripts Affects Versions: 0.94.7, 0.95.0 Reporter: Jerry He Assignee: Jerry He Priority: Minor Fix For: 0.98.0, 0.96.0 Attachments: HBASE-8565-v1-0.94.patch, HBASE-8565-v1-trunk.patch

In stop-hbase.sh:

{code}
# TODO: store backup masters in ZooKeeper and have the primary send them a shutdown message
# stop any backup masters
"$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
  --hosts "${HBASE_BACKUP_MASTERS}" stop master-backup
{code}

After HBASE-5213, stop-hbase.sh -> hbase master stop will bring down the backup master too, via the cluster status znode. We should not need the above code anymore.

Another issue happens when the current master has died and the backup master has become the active master:

{code}
nohup nice -n ${HBASE_NICENESS:-0} "$HBASE_HOME"/bin/hbase \
  --config "${HBASE_CONF_DIR}" \
  master stop "$@" > "$logout" 2>&1 < /dev/null &

waitForProcessEnd `cat $pid` 'stop-master-command'
{code}

We can still issue 'stop-hbase.sh' from the old master: stop-hbase.sh -> hbase master stop -> look for active master -> request shutdown. This process still works. But the waitForProcessEnd statement will not work since the local master pid is not relevant anymore. What is the best way in this case?

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9397) Snapshots with the same name are allowed to proceed concurrently
Jerry He created HBASE-9397: --- Summary: Snapshots with the same name are allowed to proceed concurrently Key: HBASE-9397 URL: https://issues.apache.org/jira/browse/HBASE-9397 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.11, 0.95.2 Reporter: Jerry He Assignee: Jerry He Fix For: 0.94.12, 0.96.0

Snapshots with the same name (but on different tables) are allowed to proceed concurrently. This seems to be a loophole created by allowing multiple snapshots (on different tables) to run concurrently. There are two checks in SnapshotManager, but they fail to catch this particular case: in isSnapshotCompleted(), we only check the completed snapshot directory; in isTakingSnapshot(), we only check for the same table name. The end result is that the concurrently running snapshots with the same name overlap and mess up each other, for example by cleaning up the other's snapshot working directory in .hbase-snapshot/.tmp/snapshot-name.

{code}
2013-08-29 18:25:13,443 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=mysnapshot table=TestTable type=FLUSH } due to exception:Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
	at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:321)
	at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshotDescription(MasterSnapshotVerifier.java:123)
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
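The intent of the fix can be sketched in a few lines (hypothetical names, not the actual SnapshotManager patch): track in-flight snapshots by snapshot name, not only by table, so two concurrent snapshots with the same name can never share the working directory .hbase-snapshot/.tmp/<name>.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of rejecting a second in-flight snapshot that reuses
// an existing snapshot name, regardless of which table it targets.
public class InFlightSnapshots {

    private final Map<String, String> tableByName = new ConcurrentHashMap<>();

    /** Atomically registers a snapshot; false means the name is already taken. */
    boolean begin(String snapshotName, String tableName) {
        return tableByName.putIfAbsent(snapshotName, tableName) == null;
    }

    void finish(String snapshotName) {
        tableByName.remove(snapshotName);
    }
}
```

The atomic putIfAbsent is the important part: a check-then-register done as two separate steps would leave the same race window the issue describes.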
[jira] [Updated] (HBASE-9397) Snapshots with the same name are allowed to proceed concurrently
[ https://issues.apache.org/jira/browse/HBASE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-9397: Status: Patch Available (was: Open) Snapshots with the same name are allowed to proceed concurrently Key: HBASE-9397 URL: https://issues.apache.org/jira/browse/HBASE-9397 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.94.11, 0.95.2 Reporter: Jerry He Assignee: Jerry He Fix For: 0.94.12, 0.96.0 Attachments: HBASE-9397-0.94.patch, HBASE-9397-trunk.patch Snapshots with the same name (but on different tables) are allowed to proceed concurrently. This seems to be loop hole created by allowing multiple snapshots (on different tables) to run concurrently. There are two checks in SnapshotManager, but fail to catch this particular case. In isSnapshotCompleted(), we only check the completed snapshot directory. In isTakingSnapshot(), we only check for the same table name. The end result is the concurrently running snapshots with the same name are overlapping and messing up each other. For example, cleaning up the other's snapshot working directory in .hbase-snapshot/.tmp/snapshot-name. {code} 2013-08-29 18:25:13,443 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=mysnapshot table=TestTable type=FLUSH } due to exception:Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:321) at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshotDescription(MasterSnapshotVerifier.java:123) {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9397) Snapshots with the same name are allowed to proceed concurrently
[ https://issues.apache.org/jira/browse/HBASE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-9397: Attachment: HBASE-9397-trunk.patch HBASE-9397-0.94.patch Snapshots with the same name are allowed to proceed concurrently Key: HBASE-9397 URL: https://issues.apache.org/jira/browse/HBASE-9397 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.95.2, 0.94.11 Reporter: Jerry He Assignee: Jerry He Fix For: 0.94.12, 0.96.0 Attachments: HBASE-9397-0.94.patch, HBASE-9397-trunk.patch Snapshots with the same name (but on different tables) are allowed to proceed concurrently. This seems to be loop hole created by allowing multiple snapshots (on different tables) to run concurrently. There are two checks in SnapshotManager, but fail to catch this particular case. In isSnapshotCompleted(), we only check the completed snapshot directory. In isTakingSnapshot(), we only check for the same table name. The end result is the concurrently running snapshots with the same name are overlapping and messing up each other. For example, cleaning up the other's snapshot working directory in .hbase-snapshot/.tmp/snapshot-name. {code} 2013-08-29 18:25:13,443 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=mysnapshot table=TestTable type=FLUSH } due to exception:Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:321) at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshotDescription(MasterSnapshotVerifier.java:123) {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9397) Snapshots with the same name are allowed to proceed concurrently
[ https://issues.apache.org/jira/browse/HBASE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755113#comment-13755113 ] Jerry He commented on HBASE-9397: - Attached a straightforward fix. Any comments or other suggestions are welcome.

Snapshots with the same name are allowed to proceed concurrently Key: HBASE-9397 URL: https://issues.apache.org/jira/browse/HBASE-9397 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.95.2, 0.94.11 Reporter: Jerry He Assignee: Jerry He Fix For: 0.94.12, 0.96.0 Attachments: HBASE-9397-0.94.patch, HBASE-9397-trunk.patch

Snapshots with the same name (but on different tables) are allowed to proceed concurrently. This seems to be a loophole created by allowing multiple snapshots (on different tables) to run concurrently. There are two checks in SnapshotManager, but they fail to catch this particular case: in isSnapshotCompleted(), we only check the completed snapshot directory; in isTakingSnapshot(), we only check for the same table name. The end result is that the concurrently running snapshots with the same name overlap and mess up each other, for example by cleaning up the other's snapshot working directory in .hbase-snapshot/.tmp/snapshot-name.
{code} 2013-08-29 18:25:13,443 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=mysnapshot table=TestTable type=FLUSH } due to exception:Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:321) at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshotDescription(MasterSnapshotVerifier.java:123) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-9397) Snapshots with the same name are allowed to proceed concurrently
[ https://issues.apache.org/jira/browse/HBASE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755615#comment-13755615 ] Jerry He commented on HBASE-9397: - Matteo, thanks for the comment. Yes, the 'restoreHandlers' part was a last-minute copy-paste error. I corrected it and followed your second suggestion too. An easy test is with one relatively big table (100G): the snapshot takes a few seconds, which leaves time for another snapshot with the same name (or table) to sneak in.

Snapshots with the same name are allowed to proceed concurrently Key: HBASE-9397 URL: https://issues.apache.org/jira/browse/HBASE-9397 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 0.95.2, 0.94.11 Reporter: Jerry He Assignee: Jerry He Fix For: 0.94.12, 0.96.0 Attachments: HBASE-9397-0.94.patch, HBASE-9397-0.94-v2.patch, HBASE-9397-trunk.patch, HBASE-9397-trunk-v2.patch

Snapshots with the same name (but on different tables) are allowed to proceed concurrently. This seems to be a loophole created by allowing multiple snapshots (on different tables) to run concurrently. There are two checks in SnapshotManager, but they fail to catch this particular case: in isSnapshotCompleted(), we only check the completed snapshot directory; in isTakingSnapshot(), we only check for the same table name. The end result is that the concurrently running snapshots with the same name overlap and mess up each other, for example by cleaning up the other's snapshot working directory in .hbase-snapshot/.tmp/snapshot-name.
{code} 2013-08-29 18:25:13,443 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=mysnapshot table=TestTable type=FLUSH } due to exception:Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:321) at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshotDescription(MasterSnapshotVerifier.java:123) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9397) Snapshots with the same name are allowed to proceed concurrently
[ https://issues.apache.org/jira/browse/HBASE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry He updated HBASE-9397: Attachment: HBASE-9397-trunk-v2.patch, HBASE-9397-0.94-v2.patch