[jira] [Updated] (HBASE-4815) Disable online altering by default, create a config for it
[ https://issues.apache.org/jira/browse/HBASE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4815: -- Attachment: HBASE-4796.patch Disable online altering by default, create a config for it -- Key: HBASE-4815 URL: https://issues.apache.org/jira/browse/HBASE-4815 Project: HBase Issue Type: Task Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.92.0 Attachments: 4815-v2.txt, 4815.addendum, 4815.patch There's a whole class of bugs that we've been uncovering by trying out online altering in conjunction with other operations like splitting. HBASE-4729, HBASE-4794, and HBASE-4814 are examples. It's not so much that the online altering code is buggy, but that it wasn't tested in an environment that permits splitting. I think we should mark online altering as experimental in 0.92 and add a config to enable it (so it would be disabled by default, and people would have to enable it explicitly to alter a table schema online). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
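The issue asks for a switch that keeps online altering off unless an operator opts in. A minimal sketch of such a guard in plain Java, assuming a `java.util.Properties`-style configuration; the property name `hbase.online.schema.update.enable` and the class and method names are illustrative assumptions, not taken from the attached patches.

```java
import java.util.Properties;

/** Hypothetical guard that keeps online schema altering off unless explicitly enabled. */
class OnlineAlterGuard {
    // Assumed property name, used here for illustration only.
    static final String ONLINE_ALTER_KEY = "hbase.online.schema.update.enable";

    /** Returns true only when the property is explicitly set to "true" (case-insensitive). */
    static boolean isOnlineAlterEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(ONLINE_ALTER_KEY, "false"));
    }

    /**
     * Rejects a schema change on an enabled table unless online altering was turned on.
     * Altering a disabled table is always allowed.
     */
    static void checkCanAlter(Properties conf, boolean tableEnabled) {
        if (tableEnabled && !isOnlineAlterEnabled(conf)) {
            throw new IllegalStateException(
                "Disable the table first, or set " + ONLINE_ALTER_KEY
                + "=true to use the experimental online altering");
        }
    }
}
```

Defaulting to `false` means an upgrade leaves the experimental path off; only clusters that set the property take on the split-related risks listed in the issue.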
[jira] [Updated] (HBASE-4815) Disable online altering by default, create a config for it
[ https://issues.apache.org/jira/browse/HBASE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4815: -- Attachment: (was: HBASE-4796.patch)
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154088#comment-13154088 ] Mikhail Bautin commented on HBASE-2418: --- I just saw this regionserver crash in my five-node, three-RS cluster test. Since this is a ZK-related patch that went in recently, I am attaching the stack trace here just in case.
2011-11-21 01:30:15,188 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server machine_name,60020,1321867814890: Initialization of RS failed. Hence aborting RS.
java.util.ConcurrentModificationException
    at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
    at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042)
    at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75)
    at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:144)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:124)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:559)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.<init>(CatalogTracker.java:177)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642)
    at java.lang.Thread.run(Thread.java:619)
add support for ZooKeeper authentication Key: HBASE-2418 URL: https://issues.apache.org/jira/browse/HBASE-2418 Project: HBase Issue Type: Improvement Components: master, regionserver Reporter: Patrick Hunt Assignee: Eugene Koontz Priority: Critical Labels: security, zookeeper Fix For: 0.92.0, 0.94.0 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch Some users may run a ZooKeeper cluster in multi-tenant mode, meaning that more than one client service would like to share a single ZooKeeper service instance (cluster). In this case the client services typically want to protect their data (ZK znodes) from access by other services (tenants) on the cluster. Say you are running HBase, Solr and Neo4j, or multiple HBase instances: having authentication/authorization on the znodes is important both for security and for helping to ensure that services don't interact negatively (touch each other's data). Today HBase does not have support for authentication or authorization. This should be added to the HBase clients that are accessing the ZK cluster. In general it means calling addAuthInfo once after a session is established: http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooKeeper.html#addAuthInfo(java.lang.String, byte[]) with a user-specific credential, often a shared secret or certificate. You may be able to configure this statically in some cases (a config string or a file to read from); however, in my case in particular you may need to access it programmatically, which adds complexity, as the end user may need to load code into HBase for accessing the credential. Second, you need to specify a non-world ACL when interacting with znodes (create primarily): http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/data/ACL.html http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooDefs.html Feel free to ping the ZooKeeper team if you have questions. It might also be good to discuss with some potential end users, in particular regarding how the end user can specify the credential.
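The trace shows `Configuration.iterator()` walking a `Hashtable` (via `Hashtable$Enumerator.next`) while something else modified it, which makes the fail-fast iterator throw `ConcurrentModificationException`. One common mitigation is to copy the entries while holding the table's own lock and iterate only the copy. The sketch below uses a plain `Hashtable` rather than Hadoop's `Configuration`, whose internal properties object is not exposed this way; the class name `ConfigSnapshot` and the prefix used in the example are illustrative, and the comment does not say how the crash was actually fixed.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: read a shared Hashtable without racing its writers. */
final class ConfigSnapshot {
    private ConfigSnapshot() {}

    /**
     * Copies all entries while holding the table's monitor. Hashtable's mutators are
     * synchronized on the table itself, so writers block until the copy is finished.
     */
    static Map<String, String> snapshot(Hashtable<String, String> props) {
        synchronized (props) {
            return new TreeMap<>(props);
        }
    }

    /** Collects entries whose key starts with prefix, with the prefix stripped. */
    static Map<String, String> withPrefix(Hashtable<String, String> props, String prefix) {
        Map<String, String> result = new TreeMap<>();
        // Iterate the private copy, never the shared table.
        for (Map.Entry<String, String> e : snapshot(props).entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return result;
    }
}
```

The copy costs one pass over the table under the lock, but the filtering loop afterwards runs without holding it, so writers are only blocked for the copy itself.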
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154100#comment-13154100 ] ramkrishna.s.vasudevan commented on HBASE-2418: --- Not able to build from the maven repository for zookeeper 3.4.0 SNAPSHOT. Correct me if I am wrong.
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154104#comment-13154104 ] ramkrishna.s.vasudevan commented on HBASE-2418: --- I resolved it by adding this:
{code}
<repository>
  <id>ghelmling.testing</id>
  <name>Gary Helmling test repo</name>
  <url>http://people.apache.org/~garyh/mvn/</url>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
  <releases>
    <enabled>true</enabled>
  </releases>
</repository>
{code}
This was present in HBASE-2418-3.patch.
[jira] [Commented] (HBASE-4833) HRegionServer stops could be 0,5s faster
[ https://issues.apache.org/jira/browse/HBASE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154129#comment-13154129 ] nkeywal commented on HBASE-4833: I created HBASE-4832 to follow the fix on TestRegionServerCoprocessorExceptionWithAbort and added a link. HRegionServer stops could be 0,5s faster Key: HBASE-4833 URL: https://issues.apache.org/jira/browse/HBASE-4833 Project: HBase Issue Type: Improvement Components: regionserver, test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4833_trunk_hregionserver.patch The current implementation of HRegionServer#stop is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
Then the region server stops immediately. This makes the region server stop 0,5s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work, likely because the code does not expect the region server to stop that fast. See HBASE-4832.
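The bug is a monitor mismatch: `notifyAll()` only wakes threads blocked in `wait()` on that same object, and the sleeping loop is not waiting on the region server instance. The proposed fix calls `sleeper.skipSleepCycle()` instead. A minimal sketch of a sleeper that can be woken early, assuming it waits on a private lock and records a pending wake-up so that a `skipSleepCycle()` arriving before `sleep()` is not lost; it is modeled loosely on the idea behind HBase's `Sleeper`, not copied from it, and the class name `InterruptibleSleeper` is hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sleeper whose sleep can be cut short from another thread. */
class InterruptibleSleeper {
    private final Object sleepLock = new Object();
    private final long periodMillis;
    private boolean triggerWake = false; // guarded by sleepLock

    InterruptibleSleeper(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    /** Sleeps for up to periodMillis, returning early once skipSleepCycle() is called. */
    void sleep() {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(periodMillis);
        synchronized (sleepLock) {
            try {
                while (!triggerWake) {
                    long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMillis <= 0) {
                        break; // period elapsed without a wake-up request
                    }
                    sleepLock.wait(remainingMillis); // may return spuriously; the loop re-checks
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            } finally {
                triggerWake = false; // consume the request so the next cycle sleeps normally
            }
        }
    }

    /** Wakes a thread blocked in sleep() by notifying the same monitor sleep() waits on. */
    void skipSleepCycle() {
        synchronized (sleepLock) {
            triggerWake = true;
            sleepLock.notifyAll();
        }
    }
}
```

Because the flag is checked under the same lock that `skipSleepCycle()` takes, the wake-up is not lost even if `stop()` runs just before the loop starts sleeping.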
[jira] [Assigned] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4308: - Assignee: ramkrishna.s.vasudevan Race between RegionOpenedHandler and AssignmentManager -- Key: HBASE-4308 URL: https://issues.apache.org/jira/browse/HBASE-4308 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0 When the master is processing a ZK event for REGION_OPENED, it calls delete() on the znode before it removes the node from RegionsInTransition. If the notification of that delete comes back into AssignmentManager before the region is removed from RIT, you see an error like: 2011-08-30 17:43:29,537 WARN [main-EventThread] master.AssignmentManager(861): Node deleted but still in RIT: .META.,,1.1028785192 state=OPEN, ts=1314751409532, server=todd-w510,55655,1314751396840 Not certain if it causes issues, but it's a concerning log message.
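The warning comes from ordering: the znode is deleted while the region is still in RegionsInTransition, so a delete notification that is handled quickly observes stale state. A toy model of both orderings, assuming the deletion callback runs synchronously (the worst case for this race); `RitOrderingDemo` and its methods are hypothetical stand-ins, not AssignmentManager code. Removing from RIT first is only one possible ordering and has trade-offs the sketch ignores, for example a delete that fails after the region has already left RIT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the REGION_OPENED handling order; not HBase code. */
class RitOrderingDemo {
    final ConcurrentHashMap<String, String> regionsInTransition = new ConcurrentHashMap<>();
    final List<String> warnings = new ArrayList<>();

    /** Stand-in for the master's reaction to a "znode deleted" event. */
    void onNodeDeleted(String region) {
        if (regionsInTransition.containsKey(region)) {
            warnings.add("Node deleted but still in RIT: " + region);
        }
    }

    /** Simulated znode delete whose watcher callback fires immediately. */
    void deleteZnode(String region) {
        onNodeDeleted(region);
    }

    /** Order described in the issue: delete the znode, then clear RIT. */
    void openedDeleteThenRemove(String region) {
        deleteZnode(region);
        regionsInTransition.remove(region);
    }

    /** Alternative order: clear RIT, then delete the znode. */
    void openedRemoveThenDelete(String region) {
        regionsInTransition.remove(region);
        deleteZnode(region);
    }
}
```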
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154146#comment-13154146 ] Hudson commented on HBASE-2418: --- Integrated in HBase-TRUNK-security #2 (See [https://builds.apache.org/job/HBase-TRUNK-security/2/]) HBASE-2418 Support for ZooKeeper authentication apurtell : Files : * /hbase/trunk/pom.xml * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: (was: 4213-trunk-v7.txt) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column and Modify Column operations: 1. Without enabling/disabling the table. 2. Without bulk unassign/assign of regions.
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: 4213-trunk-v7.txt
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154249#comment-13154249 ] Jonathan Hsieh commented on HBASE-2856: --- @lars the 0.92 version of TestAcidGuarantees ran for about 12 hours without problems. TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt TestAcidGuarantee has a test whereby it attempts to read a number of columns from a row, and every so often the first column of N is different, when it should be the same. This is a bug deep inside the scanner whereby the first peek() of a row is done at time T, then the rest of the read is done at T+1 after a flush; thus the memstoreTS data is lost, and previously 'uncommitted' data becomes committed and flushed to disk. One possible solution is to introduce the memstoreTS (or an equivalent value) to the HFile, thus allowing us to preserve read consistency past flushes. Another solution involves fixing the scanners so that peek() is not destructive (and thus might return different things at different times, alas).
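The first proposed solution relies on each cell keeping the write number (memstoreTS) of the transaction that produced it, so a scanner can skip writes newer than its read point even after a flush. A much-simplified model, assuming visibility is a single `memstoreTs <= readPoint` check and that a flush which loses the number stores 0; version selection, deletes and the real HFile layout are ignored, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of read-point filtering across a flush; not HBase's scanner code. */
class ReadPointDemo {
    static final class Cell {
        final String column;
        final String value;
        final long memstoreTs; // write number of the transaction that wrote this cell

        Cell(String column, String value, long memstoreTs) {
            this.column = column;
            this.value = value;
            this.memstoreTs = memstoreTs;
        }
    }

    /** Returns "column=value" for every cell written at or before the scanner's read point. */
    static List<String> visible(List<Cell> cells, long readPoint) {
        List<String> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.memstoreTs <= readPoint) {
                out.add(c.column + "=" + c.value);
            }
        }
        return out;
    }

    /** Flush that drops write numbers: every flushed cell now looks committed (ts 0). */
    static List<Cell> flushDroppingTs(List<Cell> cells) {
        List<Cell> flushed = new ArrayList<>();
        for (Cell c : cells) {
            flushed.add(new Cell(c.column, c.value, 0L));
        }
        return flushed;
    }

    /** Flush that persists write numbers next to each cell, as the first proposal suggests. */
    static List<Cell> flushKeepingTs(List<Cell> cells) {
        return new ArrayList<>(cells);
    }
}
```

In the model, a scanner with read point 1 sees only the first transaction's cells in the memstore; after a flush that drops write numbers it also sees a transaction-2 cell, which mirrors the mixed-column rows the test detects.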
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154262#comment-13154262 ] Nicolas Spiegelberg commented on HBASE-2856: Something to keep in mind: we have a version of this for our prod branch running on some smaller test clusters, but not yet on our actual prod clusters (since we committed it at the same time you did). Also, note that between HFileV2 and this, there is no easy downgrade strategy after moving from 90 to 92. I think that putting this in a 92 RC definitely means an extra testing effort. However, it's been the last massive outstanding caveat for ACID semantics, so it makes sense for 92 inclusion. I'm sure that other companies consider this a critical issue for their customers, so they would be up for accelerating this testing effort ahead of our schedule. :)
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4798: --- Attachment: 4798_trunk_all.v10.patch Sleeps and synchronisation improvements for tests - Key: HBASE-4798 URL: https://issues.apache.org/jira/browse/HBASE-4798 Project: HBase Issue Type: Improvement Components: master, regionserver, test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 4798_trunk_all.v6.patch, 4798_trunk_all.v7.patch Multiple small changes:
@committers: Removing some sleeps exposed a bug in JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchronization point. You may want to review this.
JVMClusterUtil#HMaster#waitForServerOnline: removed; the condition was never met (test on !c && !!c). Added a new synchronization point.
AssignmentManager#waitForAssignment: add a timeout on the wait, so it does not get stuck if the notification is received before the wait.
HMaster#loop: use a notification instead of a 1s sleep.
HRegionServer#waitForServerOnline: new method used by JVMClusterUtil#waitForServerOnline() to replace a 1s sleep with a notification.
HRegionServer#getMaster(): 1s sleeps replaced by one 0.1s sleep and one 0.2s sleep.
HRegionServer#stop: use a notification on the sleeper to shorten shutdown by 0.5s.
ZooKeeperNodeTracker#start: replace a recursive call with a loop.
ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait, so it does not get stuck if the notification is received before the wait.
HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s.
TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s; together with the change to HBaseTestingUtility, we are 60s faster.
TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0.2s instead of 1s.
TestRestartCluster#testClusterRestart: send all the table creation requests together, then check creation; this should be faster.
TestHLog: shut down the whole cluster instead of DFS only (more standard).
JVMClusterUtil#startup: lower the sleep from 1s to 0.1s.
HConnectionManager#close: the ZooKeeper name in the debug message logged by HConnectionManager after connection close was always null, because it was set to null in the delete.
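Several of the changes above (AssignmentManager#waitForAssignment, ZooKeeperNodeTracker#blockUntilAvailable, ZooKeeperNodeTracker#start) revolve around waiting for a notification without getting stuck. The sketch below is a generic Java illustration of that pattern, not the patch itself; `TimedWaitSketch`, `markOnline` and `waitForOnline` are hypothetical names, and the 100 ms cap on each wait is an arbitrary choice. The condition is checked under the lock inside a loop (rather than by recursion), so a notification delivered before the wait begins is not lost, and every individual wait is bounded.

```java
// Hypothetical sketch of a bounded, loop-based wait on a condition; not HBase's actual code.
final class TimedWaitSketch {
    private final Object lock = new Object();
    private boolean online = false; // guarded by lock

    // Called by the thread that brings the server online.
    void markOnline() {
        synchronized (lock) {
            online = true;
            lock.notifyAll();
        }
    }

    // Waits until online or until timeoutMs elapses; returns whether the server is online.
    boolean waitForOnline(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (lock) {
            // Re-check the condition in a loop: if markOnline() already ran, return at once
            // instead of waiting for a notification that has already happened.
            while (!online) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    // Cap each wait so the condition is re-checked periodically.
                    lock.wait(Math.min(remaining, 100L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    return online;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TimedWaitSketch s = new TimedWaitSketch();
        Thread starter = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                // ignored in this demo
            }
            s.markOnline();
        });
        starter.start();
        System.out.println("online: " + s.waitForOnline(5000));
        starter.join();
    }
}
```

Checking the guarded flag inside the loop is what removes the lost-notification problem; the per-wait cap means that even a state change not followed by notifyAll() would be noticed within one wait slice.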
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4798: --- Status: Open (was: Patch Available) Sleeps and synchronisation improvements for tests - Key: HBASE-4798 URL: https://issues.apache.org/jira/browse/HBASE-4798 Project: HBase Issue Type: Improvement Components: master, regionserver, test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4798: --- Release Note: last try on the same patch, but I think it's ok to commit it. Status: Patch Available (was: Open) Sleeps and synchronisation improvements for tests - Key: HBASE-4798 URL: https://issues.apache.org/jira/browse/HBASE-4798 Project: HBase Issue Type: Improvement Components: master, regionserver, test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154276#comment-13154276 ] Andrew Purtell commented on HBASE-2418: --- @Mikhail: Thanks, that doesn't have a clear, direct relation to this patch. If it were a test failure, I'd say otherwise. This patch modified the MiniZKCluster to take a Configuration in its constructor and use it. This patch did not touch ZKConfig, which is HBase-side code. add support for ZooKeeper authentication Key: HBASE-2418 URL: https://issues.apache.org/jira/browse/HBASE-2418 Project: HBase Issue Type: Improvement Components: master, regionserver Reporter: Patrick Hunt Assignee: Eugene Koontz Priority: Critical Labels: security, zookeeper Fix For: 0.92.0, 0.94.0 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch Some users may run a ZooKeeper cluster in multi-tenant mode, meaning that more than one client service would like to share a single ZooKeeper service instance (cluster). In this case the client services typically want to protect their data (ZK znodes) from access by other services (tenants) on the cluster. Say you are running HBase, Solr and Neo4j, or multiple HBase instances, etc.: having authentication/authorization on the znodes is important both for security and for helping to ensure that services don't interact negatively (touch each other's data). Today HBase does not support authentication or authorization. This should be added to the HBase clients that access the ZK cluster. In general it means calling addAuthInfo once after a session is established: http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooKeeper.html#addAuthInfo(java.lang.String, byte[]) with a user-specific credential, oftentimes a shared secret or certificate. 
You may be able to statically configure this in some cases (config string or file to read from); however, in my case in particular, you may need to access it programmatically, which adds complexity, as the end user may need to load code into HBase to access the credential. Secondly, you need to specify a non-world ACL when interacting with znodes (create, primarily): http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/data/ACL.html http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooDefs.html Feel free to ping the ZooKeeper team if you have questions. It might also be good to discuss this with some potential end users, in particular regarding how the end user can specify the credential.
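A non-world ACL is commonly built with ZooKeeper's `digest` scheme. The sketch below assumes the derivation used by ZooKeeper's `DigestAuthenticationProvider.generateDigest`, namely `user:` followed by Base64(SHA-1(`user:password`)), and reproduces it with the JDK only; treat it as illustrative and prefer ZooKeeper's own helper in real code. The resulting id would be placed in an ACL such as `new ACL(ZooDefs.Perms.ALL, new Id("digest", id))`, while the client session calls `addAuthInfo("digest", "user:password".getBytes())`. The class name and the sample credential are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper mirroring the assumed "digest" ACL id derivation:
//   "user:password" -> "user:" + Base64(SHA-1("user:password"))
final class DigestAclIdSketch {
    static String digestId(String userColonPassword) {
        int colon = userColonPassword.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected user:password");
        }
        try {
            // The hash covers the whole "user:password" string, not just the password.
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                    .digest(userColonPassword.getBytes(StandardCharsets.UTF_8));
            return userColonPassword.substring(0, colon) + ":"
                    + Base64.getEncoder().encodeToString(sha1);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Sample credential for illustration only.
        System.out.println(digestId("hbase:secret"));
    }
}
```

Only the derived id (never the password) ends up stored in the znode ACL; the plaintext credential is what each client presents via addAuthInfo.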
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154280#comment-13154280 ] Andrew Purtell commented on HBASE-2418: --- @Ram I'm looking at the 0.92 pom right now and it includes the repository entry for ghelmling.testing. add support for ZooKeeper authentication Key: HBASE-2418 URL: https://issues.apache.org/jira/browse/HBASE-2418 Project: HBase Issue Type: Improvement Components: master, regionserver Reporter: Patrick Hunt Assignee: Eugene Koontz Priority: Critical Labels: security, zookeeper Fix For: 0.92.0, 0.94.0
[jira] [Created] (HBASE-4835) CME out of ZKConfig.makeZKProps
CME out of ZKConfig.makeZKProps --- Key: HBASE-4835 URL: https://issues.apache.org/jira/browse/HBASE-4835 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Mikhail reported this from a five-node, three-RS cluster test:
{code}
2011-11-21 01:30:15,188 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server machine_name,60020,1321867814890: Initialization of RS failed. Hence aborting RS.
java.util.ConcurrentModificationException
    at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
    at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042)
    at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75)
    at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:144)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:124)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:559)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.<init>(CatalogTracker.java:177)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642)
    at java.lang.Thread.run(Thread.java:619)
{code}
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication
[ https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154281#comment-13154281 ] Andrew Purtell commented on HBASE-2418: --- I opened HBASE-4835 for the CME. add support for ZooKeeper authentication Key: HBASE-2418 URL: https://issues.apache.org/jira/browse/HBASE-2418 Project: HBase Issue Type: Improvement Components: master, regionserver Reporter: Patrick Hunt Assignee: Eugene Koontz Priority: Critical Labels: security, zookeeper Fix For: 0.92.0, 0.94.0
[jira] [Commented] (HBASE-4809) Per-CF set RPC metrics
[ https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154285#comment-13154285 ] Phabricator commented on HBASE-4809: nspiegelberg has accepted the revision [jira] [HBASE-4809] Per-CF set RPC metrics. Could you put the test under TestHeapSize? I'll commit. INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/regionserver/metrics/TestSchemaMetrics.java:220-222 I think it's better to be consistent than optimal. Right now, heapsize is easy to refactor because it's done the same way for all classes. I'm +1 on a heapsize refactor, but I say we put that in another JIRA. It's easier to review 2 JIRAs, feature + refactor, than it is to combine the two. REVISION DETAIL https://reviews.facebook.net/D483 Per-CF set RPC metrics -- Key: HBASE-4809 URL: https://issues.apache.org/jira/browse/HBASE-4809 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D483.1.patch, D483.2.patch, D483.3.patch, HBASE-4809_Per_CF_set_RPC_metrics.patch Porting per-CF set metrics for RPC times and response sizes from 0.89-fb to trunk. For each mutation signature (a set of column families involved in an RPC request) we increment several metrics, allowing us to monitor access patterns. We deal with guarding against an explosion in the number of metrics in HBASE-4638 (which might even be implemented as part of this JIRA).
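The mutation signature described above needs a canonical form, so that requests touching the same families in a different order land on the same metric. A minimal Java sketch, assuming a sorted, de-duplicated family list as the key; the class name, the `cfset.` prefix and the `~` separator are hypothetical and not taken from the 0.89-fb code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical per-CF-set metrics registry; metric names and layout are invented for illustration.
final class CfSetMetricsSketch {
    // signature -> {request count, total RPC time in ms}
    private final Map<String, long[]> metrics = new TreeMap<>();

    // Canonical signature: sorted, de-duplicated family names, so {b, a} and {a, b} share one metric.
    static String signature(Collection<String> families) {
        return "cfset." + String.join("~", new TreeSet<>(families));
    }

    synchronized void record(Collection<String> families, long rpcMillis) {
        long[] m = metrics.computeIfAbsent(signature(families), k -> new long[2]);
        m[0]++;
        m[1] += rpcMillis;
    }

    synchronized long count(Collection<String> families) {
        long[] m = metrics.get(signature(families));
        return m == null ? 0 : m[0];
    }

    synchronized long totalMillis(Collection<String> families) {
        long[] m = metrics.get(signature(families));
        return m == null ? 0 : m[1];
    }

    // The number of distinct signatures is what HBASE-4638 is meant to keep from exploding.
    synchronized int distinctSignatures() {
        return metrics.size();
    }

    public static void main(String[] args) {
        CfSetMetricsSketch m = new CfSetMetricsSketch();
        m.record(Arrays.asList("info", "data"), 12);
        m.record(Arrays.asList("data", "info"), 8);
        m.record(Collections.singletonList("info"), 3);
        System.out.println(signature(Arrays.asList("info", "data"))
                + " count=" + m.count(Arrays.asList("data", "info")));
    }
}
```

Because every distinct family combination creates a new entry, a real implementation would also need the cap discussed in HBASE-4638; the sketch only exposes `distinctSignatures()` to show where such a check would go.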
[jira] [Updated] (HBASE-4835) CME out of ZKConfig.makeZKProps
[ https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-4835: -- Attachment: HBASE-4835.patch I think the simplest course of action is to make a shallow copy of the Configuration in the ZKW constructor. CME out of ZKConfig.makeZKProps --- Key: HBASE-4835 URL: https://issues.apache.org/jira/browse/HBASE-4835 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Attachments: HBASE-4835.patch
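The reported stack trace shows the CME thrown by `Hashtable$Enumerator.next` while `Configuration.iterator()` walks the properties. The hypothetical sketch below uses a plain `java.util.Hashtable` as a stand-in for what the Configuration iterates (an assumption for illustration, not Hadoop's code) and contrasts iterating the live table while it is modified with iterating a shallow copy, which is the idea behind copying the Configuration in the ZKW constructor. A single thread plays the concurrent writer so that the failure is deterministic.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical illustration; a Hashtable stands in for the properties behind a Configuration.
final class SnapshotIterationSketch {
    static Hashtable<String, String> sample() {
        Hashtable<String, String> props = new Hashtable<>();
        props.put("hbase.zookeeper.quorum", "zk1,zk2,zk3");
        props.put("hbase.zookeeper.property.clientPort", "2181");
        props.put("hbase.rootdir", "hdfs://namenode/hbase");
        return props;
    }

    // Iterates the live table while a "writer" adds keys: Hashtable's iterator is fail-fast.
    static boolean iterateLiveThrowsCme(Hashtable<String, String> props) {
        try {
            for (Map.Entry<String, String> e : props.entrySet()) {
                props.put("copy." + e.getKey(), e.getValue()); // stands in for a concurrent writer
            }
            return false;
        } catch (ConcurrentModificationException cme) {
            return true;
        }
    }

    // Takes a shallow copy first (Hashtable.clone() is synchronized), then iterates the copy.
    @SuppressWarnings("unchecked")
    static int iterateSnapshot(Hashtable<String, String> props) {
        Hashtable<String, String> snapshot = (Hashtable<String, String>) props.clone();
        int seen = 0;
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            props.put("copy." + e.getKey(), e.getValue()); // modifies the original, not the copy
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("live iteration threw CME: " + iterateLiveThrowsCme(sample()));
        System.out.println("entries seen via snapshot: " + iterateSnapshot(sample()));
    }
}
```

Taking the copy still reads the source, so it only helps if the copy itself is made atomically (as Hashtable.clone() is) or before other threads can write. The same fail-fast hazard applies to any caller that iterates a shared instance while another thread modifies it.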
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154297#comment-13154297 ] Hadoop QA commented on HBASE-4213: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504508/4213-trunk-v7.txt against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 18 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -162 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestInstantSchemaChange, org.apache.hadoop.hbase.client.TestAdmin, org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/318//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/318//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/318//console
This message is automatically generated.
Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730
Support instant schema updates, such as Modify Table, Add Column and Modify Column operations:
1. Without enabling/disabling the table.
2. Without bulk unassign/assign of regions.
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154298#comment-13154298 ] stack commented on HBASE-2856: -- Let's get it in. @Lars TestHCM failed recently for me when building 0.92 locally. Maybe it's not related to this. @Jon Thanks for running the 12-hour proofing. TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154300#comment-13154300 ] Ted Yu commented on HBASE-4213: --- 3 out of the 4 test failures were due to 'Too many open files'. Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154306#comment-13154306 ] Ted Yu commented on HBASE-4213: --- Integrated to TRUNK. Thanks for the patch Subbu. Thanks for the review, Lars, Todd and Andy. Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0
[jira] [Commented] (HBASE-4835) CME out of ZKConfig.makeZKProps
[ https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154316#comment-13154316 ] Ted Yu commented on HBASE-4835: --- ZKConfig.makeZKProps() is used by ZKUtil.connect(), ZKConfig.getZKQuorumServersString(), etc. Could a ConcurrentModificationException also come out of the other callers? CME out of ZKConfig.makeZKProps --- Key: HBASE-4835 URL: https://issues.apache.org/jira/browse/HBASE-4835 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Attachments: HBASE-4835.patch Mikhail reported this from a five-node, three-RS cluster test:
{code}
2011-11-21 01:30:15,188 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server machine_name,60020,1321867814890: Initialization of RS failed. Hence aborting RS.
java.util.ConcurrentModificationException
	at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
	at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042)
	at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75)
	at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:144)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:124)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:559)
	at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.init(CatalogTracker.java:177)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642)
	at java.lang.Thread.run(Thread.java:619)
{code}
[jira] [Commented] (HBASE-4835) CME out of ZKConfig.makeZKProps
[ https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154323#comment-13154323 ] stack commented on HBASE-4835: -- Andrew, would it be cleaner to make the clone down in makeZKProps rather than globally per ZKW instance? (Let's get this fix into 0.92.) CME out of ZKConfig.makeZKProps --- Key: HBASE-4835 URL: https://issues.apache.org/jira/browse/HBASE-4835 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Attachments: HBASE-4835.patch Mikhail reported this from a five-node, three-RS cluster test:
{code}
2011-11-21 01:30:15,188 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server machine_name,60020,1321867814890: Initialization of RS failed. Hence aborting RS.
java.util.ConcurrentModificationException
	at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
	at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042)
	at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75)
	at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:144)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:124)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:559)
	at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.init(CatalogTracker.java:177)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642)
	at java.lang.Thread.run(Thread.java:619)
{code}
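stack's suggestion (clone inside makeZKProps rather than per ZooKeeperWatcher instance) would also cover Ted's question about the other callers, since every caller goes through that one method. A minimal sketch of the idea follows. It assumes a plain `java.util.Hashtable<String, String>` as a stand-in for the Hashtable that the trace shows behind `Configuration.iterator()`; the class `ZkPropsSketch`, the method `makeZKPropsFromSnapshot`, and the `hbase.zookeeper.property.` prefix are assumptions of this sketch, not code from the patch.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch: a Hashtable stands in for the table behind Hadoop's
// Configuration; names here are illustrative, not HBase's ZKConfig API.
class ZkPropsSketch {
  // Assumed prefix under which ZooKeeper settings are stored.
  static final String ZK_PREFIX = "hbase.zookeeper.property.";

  @SuppressWarnings("unchecked")
  static Map<String, String> makeZKPropsFromSnapshot(Hashtable<String, String> shared) {
    // Hashtable.clone() is synchronized on the source table, so writers are
    // blocked while the copy is taken and the snapshot is internally consistent.
    Hashtable<String, String> snapshot = (Hashtable<String, String>) shared.clone();

    // Only this thread can see the snapshot, so iterating it cannot throw
    // ConcurrentModificationException even if `shared` keeps changing.
    Map<String, String> zkProps = new HashMap<>();
    for (Map.Entry<String, String> e : snapshot.entrySet()) {
      String key = e.getKey();
      if (key.startsWith(ZK_PREFIX)) {
        zkProps.put(key.substring(ZK_PREFIX.length()), e.getValue());
      }
    }
    return zkProps;
  }
}
```

Iterating the shared table directly while another thread calls `put` is the fail-fast path seen at `Hashtable$Enumerator.next` in the trace. Cloning costs one copy per call, which seems acceptable for a method that, per the trace, runs during connection setup.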
[jira] [Created] (HBASE-4836) Master stuck with ServerShutdownHandler doing a waitForMeta
Master stuck with ServerShutdownHandler doing a waitForMeta --- Key: HBASE-4836 URL: https://issues.apache.org/jira/browse/HBASE-4836 Project: HBase Issue Type: Bug Reporter: stack Messing around with 0.92 on a cluster, I got into a situation where the master would not go down because it was hung, as follows, in an infinite wait for meta to come up:
{code}
MASTER_SERVER_OPERATIONS-sv4r11s38,7001,1321897362552-2 prio=10 tid=0x4205d800 nid=0x19f6 waiting for monitor entry [0x7fe4eb3f1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:457)
	- waiting to lock 0xca199190 (a java.util.concurrent.atomic.AtomicBoolean)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:426)
	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:253)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

MASTER_SERVER_OPERATIONS-sv4r11s38,7001,1321897362552-1 prio=10 tid=0x4237b000 nid=0x19f4 waiting for monitor entry [0x7fe4ebefc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:457)
	- waiting to lock 0xca199190 (a java.util.concurrent.atomic.AtomicBoolean)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:426)
	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:253)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

MASTER_SERVER_OPERATIONS-sv4r11s38,7001,1321897362552-0 prio=10 tid=0x7fe4ec610800 nid=0x18e1 waiting on condition [0x7fe4eb4f2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1295)
	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:655)
	at org.apache.hadoop.hbase.catalog.MetaReader.get(MetaReader.java:245)
	at org.apache.hadoop.hbase.catalog.MetaReader.getRegion(MetaReader.java:347)
	at org.apache.hadoop.hbase.catalog.MetaReader.readRegionLocation(MetaReader.java:287)
	at org.apache.hadoop.hbase.catalog.MetaReader.getMetaRegionLocation(MetaReader.java:274)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:399)
	- locked 0xca199190 (a java.util.concurrent.atomic.AtomicBoolean)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:458)
	- locked 0xca199190 (a java.util.concurrent.atomic.AtomicBoolean)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:426)
	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:253)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}
This bit of code needs some refactoring so that we can get at the state of the hosting server, i.e. whether it is stopped/stopping or not.
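In the dump, thread -0 holds the monitor on the AtomicBoolean while it sleeps inside getRegionServerWithRetries (Thread.sleep does not release a monitor), so threads -1 and -2 stay BLOCKED, and nothing in the wait consults whether the master is stopping. Below is a minimal sketch of the kind of refactor described, with hypothetical names (`MetaWaiter`, `StopFlag`) that are not CatalogTracker's API. The lock is held only for short state checks, `Object.wait()` releases it, and every wake-up rechecks a stop flag.

```java
// Hypothetical sketch, not CatalogTracker's code: wait for the catalog region
// while honouring a "server is stopping" flag.
class MetaWaiter {
  /** Minimal stand-in for "is the hosting server stopped or stopping?". */
  interface StopFlag {
    boolean isStopped();
  }

  private final Object lock = new Object();
  private boolean metaAvailable = false; // guarded by lock

  /** Called by whichever component learns that meta has been assigned. */
  void setMetaAvailable() {
    synchronized (lock) {
      metaAvailable = true;
      lock.notifyAll();
    }
  }

  /**
   * Returns true once meta is available, or false if the server stops, the
   * timeout expires, or the thread is interrupted. No RPC is made while the
   * lock is held, and wait() releases it, so other waiters are not left
   * BLOCKED behind a sleeping holder.
   */
  boolean waitForMeta(StopFlag server, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (lock) {
      while (!metaAvailable) {
        if (server.isStopped()) {
          return false;
        }
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false;
        }
        try {
          // Wake at least every 100 ms to re-check the stop flag.
          lock.wait(Math.min(remaining, 100L));
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
      return true;
    }
  }
}
```

The 100 ms recheck interval is an arbitrary choice for the sketch; a real fix might instead notify waiters directly from the stop path.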
[jira] [Updated] (HBASE-4835) ConcurrentModificationException out of ZKConfig.makeZKProps
[ https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4835: -- Summary: ConcurrentModificationException out of ZKConfig.makeZKProps (was: CME out of ZKConfig.makeZKProps) ConcurrentModificationException out of ZKConfig.makeZKProps --- Key: HBASE-4835 URL: https://issues.apache.org/jira/browse/HBASE-4835 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Attachments: HBASE-4835.patch Mikhail reported this from a five-node, three-RS cluster test:
{code}
2011-11-21 01:30:15,188 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server machine_name,60020,1321867814890: Initialization of RS failed. Hence aborting RS.
java.util.ConcurrentModificationException
	at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
	at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042)
	at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75)
	at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:144)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:124)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:559)
	at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.init(CatalogTracker.java:177)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642)
	at java.lang.Thread.run(Thread.java:619)
{code}
[jira] [Created] (HBASE-4837) [book] book.xml - schema design, comment on new storefile creation
[book] book.xml - schema design, comment on new storefile creation -- Key: HBASE-4837 URL: https://issues.apache.org/jira/browse/HBASE-4837 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor book.xml
* Schema design chapter: added a sub-section noting that table and column family (CF) changes won't take effect until new StoreFiles are written.
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154328#comment-13154328 ] Lars Hofhansl commented on HBASE-4213: -- Awesome. Thanks for all the work on this, Subbu and Ted. This will be incredibly useful (once online schema changes stabilize in general)! Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730
Support instant schema updates such as Modify Table, Add Column, and Modify Column operations:
1. Without enabling/disabling the table.
2. Without bulk unassign/assign of regions.
[jira] [Updated] (HBASE-4814) Starting an online alter when regions are splitting can leave their daughters unaltered
[ https://issues.apache.org/jira/browse/HBASE-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4814: -- Fix Version/s: (was: 0.92.0) Fix this in 0.94 Starting an online alter when regions are splitting can leave their daughters unaltered --- Key: HBASE-4814 URL: https://issues.apache.org/jira/browse/HBASE-4814 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Fix For: 0.94.0 I've seen a situation where regions were splitting almost exactly at the same time as an alter command was issued and those regions' daughters were left unaltered. It would even seem that the daughters' daughters also share this situation. Reopening all the regions fixes the problem.
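A toy model may make the reported race easier to reason about. It assumes, for illustration only, that each region carries a schema version and that a split copies the parent's descriptor into both daughters; if an alter walks a list of regions taken before the split completes, the daughters are never visited. `AlterSplitRace` and its methods are hypothetical, and the final check simply mirrors the workaround in the report (reopen every region still behind the table's schema).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not HBase code) of an online alter racing a split.
class AlterSplitRace {
  static class Region {
    final String name;
    int schemaVersion;

    Region(String name, int schemaVersion) {
      this.name = name;
      this.schemaVersion = schemaVersion;
    }
  }

  /** A split replaces the parent with two daughters that copy its schema. */
  static void split(List<Region> online, Region parent) {
    online.remove(parent);
    online.add(new Region(parent.name + "-a", parent.schemaVersion));
    online.add(new Region(parent.name + "-b", parent.schemaVersion));
  }

  /** Names of online regions whose schema lags the table's; these need a reopen. */
  static List<String> regionsToReopen(List<Region> online, int tableVersion) {
    List<String> stale = new ArrayList<>();
    for (Region r : online) {
      if (r.schemaVersion < tableVersion) {
        stale.add(r.name);
      }
    }
    return stale;
  }
}
```

In this model each further split of a stale daughter copies the old version again, which matches the observation that the daughters' daughters are affected too.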
[jira] [Updated] (HBASE-4837) [book] book.xml - schema design, comment on new storefile creation
[ https://issues.apache.org/jira/browse/HBASE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-4837: - Attachment: book_HBASE_4837.xml.patch [book] book.xml - schema design, comment on new storefile creation -- Key: HBASE-4837 URL: https://issues.apache.org/jira/browse/HBASE-4837 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book_HBASE_4837.xml.patch book.xml
* Schema design chapter: added a sub-section noting that table and column family (CF) changes won't take effect until new StoreFiles are written.
[jira] [Updated] (HBASE-4837) [book] book.xml - schema design, comment on new storefile creation
[ https://issues.apache.org/jira/browse/HBASE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-4837: - Status: Patch Available (was: Open) [book] book.xml - schema design, comment on new storefile creation -- Key: HBASE-4837 URL: https://issues.apache.org/jira/browse/HBASE-4837 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book_HBASE_4837.xml.patch book.xml
* Schema design chapter: added a sub-section noting that table and column family (CF) changes won't take effect until new StoreFiles are written.
[jira] [Updated] (HBASE-4837) [book] book.xml - schema design, comment on new storefile creation
[ https://issues.apache.org/jira/browse/HBASE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-4837: - Resolution: Fixed Status: Resolved (was: Patch Available) [book] book.xml - schema design, comment on new storefile creation -- Key: HBASE-4837 URL: https://issues.apache.org/jira/browse/HBASE-4837 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book_HBASE_4837.xml.patch book.xml
* Schema design chapter: added a sub-section noting that table and column family (CF) changes won't take effect until new StoreFiles are written.
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154332#comment-13154332 ] Lars Hofhansl commented on HBASE-2856: -- testClosing is something I added as part of HBASE-4805; I'll take a look. Some of the other failing tests in there scare me. :) TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt TestAcidGuarantee has a test that reads a number of columns from a row, and every so often the first of the N columns is different when it should be the same as the others. This is a bug deep inside the scanner: the first peek() of a row is done at time T, and the rest of the read is done at T+1 after a flush. The flush loses the memstoreTS data, so previously 'uncommitted' data becomes committed and flushed to disk. One possible solution is to add the memstoreTS (or an equivalent value) to the HFile, allowing us to preserve read consistency across flushes. Another solution involves fixing the scanners so that peek() is not destructive (and thus might return different things at different times, alas).
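The read-consistency argument can be illustrated with a toy model. It assumes, as a simplification, that every cell carries a write number (memstoreTS), that a scanner fixes a read point when it starts, and that a cell flushed without its memstoreTS is treated as 0, i.e. visible to every reader. `ReadPointSketch` is hypothetical and is not HBase's scanner code; the `persistMemstoreTS` flag corresponds to the first proposed solution of keeping the value in the HFile.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not HBase's scanner code) of MVCC visibility.
class ReadPointSketch {
  static class Cell {
    final String value;
    long memstoreTS; // write number; 0 means "visible to every reader"

    Cell(String value, long memstoreTS) {
      this.value = value;
      this.memstoreTS = memstoreTS;
    }
  }

  /** A scanner with a fixed read point sees only cells written at or before it. */
  static List<String> visible(List<Cell> cells, long readPoint) {
    List<String> out = new ArrayList<>();
    for (Cell c : cells) {
      if (c.memstoreTS <= readPoint) {
        out.add(c.value);
      }
    }
    return out;
  }

  /**
   * Flushing to a file that does not persist memstoreTS resets it to 0, so
   * cells newer than an open scanner's read point suddenly become visible.
   * Persisting the value keeps the scanner's view unchanged.
   */
  static void flush(List<Cell> cells, boolean persistMemstoreTS) {
    if (!persistMemstoreTS) {
      for (Cell c : cells) {
        c.memstoreTS = 0;
      }
    }
  }
}
```

With a read point of 6, a row holding cells at write numbers 5 and 7 shows only the first cell before a lossy flush and both cells after it, which is the kind of mid-row change TestAcidGuarantee detects.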
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154333#comment-13154333 ] Ted Yu commented on HBASE-4213: --- Compared to the implementation from HBASE-1730, Subbu's code would take less effort to stabilize. Thanks for the support, Lars. Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730
Support instant schema updates such as Modify Table, Add Column, and Modify Column operations:
1. Without enabling/disabling the table.
2. Without bulk unassign/assign of regions.
[jira] [Assigned] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz reassigned HBASE-4832: Assignee: Eugene Koontz TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast --- Key: HBASE-4832 URL: https://issues.apache.org/jira/browse/HBASE-4832 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: Eugene Koontz Priority: Minor Attachments: 4832_trunk_hregionserver.patch The current implementation of HRegionServer#stop is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and therefore does nothing. As a consequence, the region server keeps sleeping instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
With this change, the region server stops immediately. This makes the region server stop 0.5 s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort fails, likely because the code does not expect the region server to stop that fast. The exception is:
{noformat}
testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)  Time elapsed: 30.06 sec  ERROR!
java.lang.Exception: test timed out after 30000 milliseconds
	at java.lang.Throwable.fillInStackTrace(Native Method)
	at java.lang.Throwable.init(Throwable.java:196)
	at java.lang.Exception.init(Exception.java:41)
	at java.lang.InterruptedException.init(InterruptedException.java:48)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
	at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
	at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
	at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
	at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
{noformat}
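Why `notifyAll()` on the region server does nothing can be shown with a small stand-in for the sleeper. It assumes, for illustration, that the run loop waits on a monitor owned by the sleeper object rather than on the region server itself. Under that assumption, only a notification on that same monitor, which is what the proposed `sleeper.skipSleepCycle()` call provides, can cut the sleep short. `SleeperSketch` is hypothetical and is not HBase's Sleeper class.

```java
// Hypothetical stand-in for the region server's sleeper (not HBase's Sleeper):
// the run loop waits on sleepLock, so a notify on any other object is ignored.
class SleeperSketch {
  private final Object sleepLock = new Object();
  private final long periodMs;
  private boolean skipRequested = false; // guarded by sleepLock

  SleeperSketch(long periodMs) {
    this.periodMs = periodMs;
  }

  /** Sleeps for one period unless skipSleepCycle() is (or already was) called. */
  void sleep() {
    long deadline = System.currentTimeMillis() + periodMs;
    synchronized (sleepLock) {
      while (!skipRequested) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          break;
        }
        try {
          sleepLock.wait(remaining); // releases sleepLock while waiting
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
      skipRequested = false; // consume any pending request
    }
  }

  /** The equivalent of the proposed fix: notify the monitor the run loop waits on. */
  void skipSleepCycle() {
    synchronized (sleepLock) {
      skipRequested = true;
      sleepLock.notifyAll();
    }
  }
}
```

The boolean flag is a design choice of this sketch: it keeps a `skipSleepCycle()` that arrives just before `sleep()` starts from being lost, which a bare `notifyAll()` would not guarantee.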
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154341#comment-13154341 ] jirapos...@reviews.apache.org commented on HBASE-4820: --
bq. On 2011-11-21 04:54:00, ramkrishna vasudevan wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 1530
bq. https://reviews.apache.org/r/2895/diff/1/?file=59536#file59536line1530
bq.
bq. Can we make this message clearer?
bq. Something like:
bq. Unexpected state: statename. Cannot transition znode state from currentState to OFFLINE.

You got it.

- Jimmy

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2895/#review3388
---

On 2011-11-21 02:06:29, Jimmy Xiang wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2895/
bq. ---
bq.
bq. (Updated 2011-11-21 02:06:29)
bq.
bq. Review request for hbase, Todd Lipcon and Jonathan Robie.
bq.
bq. Summary
bq. ---
bq.
bq. Distributed log splitting coding enhancement to make it easier to understand, no semantics change.
bq. These are issues raised during the code review when back-porting this feature to CDH.
bq.
bq. This addresses bug HBASE-4820.
bq. https://issues.apache.org/jira/browse/HBASE-4820
bq.
bq. Diffs
bq. -
bq.
bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java f7ef653
bq. src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9a3a2c
bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9
bq. src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0
bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1
bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f
bq. src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 84d76e8
bq.
bq. Diff: https://reviews.apache.org/r/2895/diff
bq.
bq. Testing
bq. ---
bq.
bq. Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, which are not related to this change.
bq.
bq. Thanks,
bq.
bq. Jimmy
bq.
Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch While reviewing the distributed log splitting feature, we found some cosmetic issues that make the code hard to understand. It would be great to fix them. For this issue, there should be no semantic change.
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154342#comment-13154342 ] jirapos...@reviews.apache.org commented on HBASE-4820: -- bq. On 2011-11-21 03:29:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 271 bq. https://reviews.apache.org/r/2895/diff/1/?file=59537#file59537line271 bq. bq. handleDeadWorkers would be a better method name. Yes, that's right. bq. On 2011-11-21 03:29:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 431 bq. https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line431 bq. bq. retry_count is the remaining count. This log message should be clearer. That's right bq. On 2011-11-21 03:29:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 480 bq. https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line480 bq. bq. We should say 'remaining retries=' Fixed. bq. On 2011-11-21 03:29:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 453 bq. https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line453 bq. bq. Can we implement this item now ? We can do it in another jira. bq. On 2011-11-21 03:29:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 671 bq. https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line671 bq. bq. Please adjust indentation. That's right. bq. On 2011-11-21 03:29:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 210 bq. https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line210 bq. bq. Please remove white space. I assume you suggest we should not use tab. Please correct me if I am wrong. bq. On 2011-11-21 03:29:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 952 bq. 
https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line952 bq. bq. Please adjust indentation for these 4 lines. Fixed. Replaced tabs with spaces. bq. On 2011-11-21 03:29:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 965 bq. https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line965 bq. bq. Should read 'splitlog workers'. Fixed. bq. On 2011-11-21 03:29:17, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java, line 648 bq. https://reviews.apache.org/r/2895/diff/1/?file=59540#file59540line648 bq. bq. Adjust indentation, please. Fixed. - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2895/#review3385 --- On 2011-11-21 02:06:29, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2895/ bq. --- bq. bq. (Updated 2011-11-21 02:06:29) bq. bq. bq. Review request for hbase, Todd Lipcon and Jonathan Robie. bq. bq. bq. Summary bq. --- bq. bq. Distributed log splitting coding enhancement to make it easier to understand, no semantics change. bq. These issues were raised during the code review when backporting this feature to CDH. bq. bq. bq. This addresses bug HBASE-4820. bq. https://issues.apache.org/jira/browse/HBASE-4820 bq. bq. bq. Diffs bq. - bq. bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java f7ef653 bq. src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9a3a2c bq. src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9 bq. src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0 bq. src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1 bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f bq. src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec bq. 
src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 84d76e8 bq. bq. Diff: https://reviews.apache.org/r/2895/diff bq. bq. bq. Testing bq. --- bq. bq. Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, which are not related to this change. bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154340#comment-13154340 ] jirapos...@reviews.apache.org commented on HBASE-4820: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2895/ --- (Updated 2011-11-21 18:22:03.402105) Review request for hbase, Todd Lipcon and Jonathan Robie. Changes --- Updated patch diff after changes per review. Summary --- Distributed log splitting coding enhancement to make it easier to understand, no semantics change. These issues were raised during the code review when backporting this feature to CDH. This addresses bug HBASE-4820. https://issues.apache.org/jira/browse/HBASE-4820 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java f7ef653 src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9a3a2c src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9 src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 1d329b0 src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 21747b1 src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 51daa1f src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 84d76e8 Diff: https://reviews.apache.org/r/2895/diff Testing --- Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, which are not related to this change. 
Thanks, Jimmy Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch In reviewing the distributed log splitting feature, we found some cosmetic issues that make the code hard to understand. It would be great to fix them. For this issue, there should be no semantic change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154343#comment-13154343 ] Hadoop QA commented on HBASE-4798: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504512/4798_trunk_all.v10.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -166 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 59 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/319//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/319//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/319//console This message is automatically generated. Sleeps and synchronisation improvements for tests - Key: HBASE-4798 URL: https://issues.apache.org/jira/browse/HBASE-4798 Project: HBase Issue Type: Improvement Components: master, regionserver, test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 4798_trunk_all.v6.patch, 4798_trunk_all.v7.patch Multiple small changes: @committers: Removing some sleeps exposed a bug in JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchronization point. 
You may want to review this. JVMClusterUtil#HMaster#waitForServerOnline: removed; the condition was never met (test on !c !!c). Added a new synchronization point. AssignmentManager#waitForAssignment: add a timeout on the wait so that it does not get stuck if the notification is received before the wait. HMaster#loop: use a notification instead of a 1s sleep. HRegionServer#waitForServerOnline: new method used by JVMClusterUtil#waitForServerOnline() to replace a 1s sleep with a notification. HRegionServer#getMaster(): 1s sleeps replaced by one 0.1s sleep and one 0.2s sleep. HRegionServer#stop: use a notification on the sleeper to shorten shutdown by 0.5s. ZooKeeperNodeTracker#start: replace a recursive call with a loop. ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait so that it does not get stuck if the notification is received before the wait. HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s. TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s; with the change to HBaseTestingUtility, we are 60s faster. TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0.2s instead of 1s. TestRestartCluster#testClusterRestart: send all the table creations together, then check creation; this should be faster. TestHLog: shut down the whole cluster instead of DFS only (more standard). JVMClusterUtil#startup: lower the sleep from 1s to 0.1s. HConnectionManager#close: the ZooKeeper name in the debug message from HConnectionManager after connection close was always null because it was set to null in the delete.
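The waitForAssignment and blockUntilAvailable items above add a timeout to a wait so a thread cannot stay blocked forever when the notification fires before it starts waiting. A minimal sketch of that guarded-wait pattern, assuming a plain monitor and a boolean flag; the class AssignmentWaiterSketch and its methods are hypothetical and are not the patch's actual code:

```java
/**
 * Hypothetical sketch of a guarded wait with a timeout (not HBase's code).
 * The flag is checked under the lock before every wait and each wait is
 * bounded, so a notification sent before the waiter arrives cannot strand it.
 */
final class AssignmentWaiterSketch {
  private final Object lock = new Object();
  private boolean assigned = false;

  /** Called by the thread that completes the assignment. */
  void markAssigned() {
    synchronized (lock) {
      assigned = true;
      lock.notifyAll();
    }
  }

  /** Returns true once markAssigned() has run, or false after timeoutMs. */
  boolean waitForAssignment(long timeoutMs) throws InterruptedException {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    synchronized (lock) {
      while (!assigned) {
        long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
        if (remainingMs <= 0) {
          return false; // give up instead of blocking forever
        }
        // Bounded wait: wake at least every 100 ms and re-check the flag,
        // which also covers spurious wakeups.
        lock.wait(Math.min(remainingMs, 100L));
      }
      return true;
    }
  }
}
```

Because the flag is read before waiting, calling markAssigned() first makes waitForAssignment(...) return true immediately; with no call at all it returns false after roughly the timeout.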
[jira] [Assigned] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region ha
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-4797: -- Assignee: Jimmy Xiang [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region has -- Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Testing 0.92, I crashed all the servers. Another bug means WALs are not getting cleaned up, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly some hot regions ended up with loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file takes about a second to process (though there seem to be only about 30 odd edits per file). The region is unavailable during this time.
[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154356#comment-13154356 ] stack commented on HBASE-4798: -- The TestAdmin fails because of too many open files. Let me commit.
[jira] [Commented] (HBASE-4835) ConcurrentModificationException out of ZKConfig.makeZKProps
[ https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154360#comment-13154360 ] Andrew Purtell commented on HBASE-4835: --- Or synchronize access to the Configuration object in makeZKProps. ConcurrentModificationException out of ZKConfig.makeZKProps --- Key: HBASE-4835 URL: https://issues.apache.org/jira/browse/HBASE-4835 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Attachments: HBASE-4835.patch Mikhail reported this from a five-node, three-RS cluster test: {code} 2011-11-21 01:30:15,188 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server machine_name,60020,1321867814890: Initialization of RS failed. Hence aborting RS. java.util.ConcurrentModificationException at java.util.Hashtable$Enumerator.next(Hashtable.java:1031) at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042) at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75) at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:144) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:124) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:559) at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183) at org.apache.hadoop.hbase.catalog.CatalogTracker.<init>(CatalogTracker.java:177) at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575) at 
org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642) at java.lang.Thread.run(Thread.java:619) {code}
[jira] [Issue Comment Edited] (HBASE-4835) ConcurrentModificationException out of ZKConfig.makeZKProps
[ https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154360#comment-13154360 ] Andrew Purtell edited comment on HBASE-4835 at 11/21/11 6:48 PM: - Or synchronize access to the Configuration object in makeZKProps, in addition to cloning the Configuration in the constructor. We could also consider heavy-handed synchronization of every read or mutation of it in o.a.h.h.zookeeper, just to be sure. My concern about cloning in makeZKProps is that the same hashmap iteration will happen during the clone. was (Author: apurtell): Or synchronize access to the Configuration object in makeZKProps
[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154362#comment-13154362 ] stack commented on HBASE-4797: -- Thanks Jimmy for taking this on. Looks like you don't have to rename the files; just sort them and figure out which set to apply (and do what Todd suggests: rewrite the znode less often -- or asynchronously).
[jira] [Commented] (HBASE-4835) ConcurrentModificationException out of ZKConfig.makeZKProps
[ https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154366#comment-13154366 ] Mikhail Bautin commented on HBASE-4835: --- @Andrew: thanks for the fix! The simple approach of copying the configuration sounds good -- I presume we don't create too many unique ZooKeeperWatchers in a single JVM. Alternatively, we could somehow get an immutable snapshot of the configuration's key set and iterate over that instead of the configuration itself.
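Both suggestions in this thread come down to not iterating the live, shared table while other threads may mutate it; the trace above shows the failure surfacing from Hashtable$Enumerator.next. A minimal sketch of the snapshot idea, assuming the settings live in a plain java.util.Hashtable<String, String>. Hadoop's Configuration is deliberately not used, and nothing here assumes its own copy path is equally safe. The class name and the "hbase.zookeeper.property." prefix handling are illustrative assumptions, not ZKConfig's actual code:

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch, not HBase's ZKConfig: build ZooKeeper properties from
 * a shared settings table without iterating the live table.
 */
final class ZkPropsSnapshotSketch {
  // Assumed prefix for ZooKeeper settings embedded in the HBase configuration.
  static final String ZK_PREFIX = "hbase.zookeeper.property.";

  @SuppressWarnings("unchecked")
  static Properties makeZkProps(Hashtable<String, String> sharedConf) {
    // Hashtable.clone() is synchronized on the table, so the copy cannot
    // interleave with put()/remove() from other threads.
    Hashtable<String, String> snapshot = (Hashtable<String, String>) sharedConf.clone();
    Properties zkProps = new Properties();
    // Iterate the private snapshot; concurrent writers only touch sharedConf.
    for (Map.Entry<String, String> entry : snapshot.entrySet()) {
      String key = entry.getKey();
      if (key.startsWith(ZK_PREFIX)) {
        // "hbase.zookeeper.property.clientPort" becomes "clientPort".
        zkProps.setProperty(key.substring(ZK_PREFIX.length()), entry.getValue());
      }
    }
    return zkProps;
  }
}
```

The snapshot costs one copy per call, which, as Mikhail notes, should be cheap if few watchers are created per JVM. Holding the table's lock around the whole loop would be the synchronization alternative, at the cost of blocking writers for the duration of the iteration.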
[jira] [Updated] (HBASE-4823) long running scans lose benefit of bloomfilters and timerange hints
[ https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kannan Muthukkaruppan updated HBASE-4823: - Assignee: Amitanand Aiyer (was: Kannan Muthukkaruppan) Amitanand will be helping on this issue. long running scans lose benefit of bloomfilters and timerange hints --- Key: HBASE-4823 URL: https://issues.apache.org/jira/browse/HBASE-4823 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Amitanand Aiyer When you have a long-running scan, due to, say, an MR job, you can lose the benefit of timerange hints and bloom filters midway if your scanner gets reset. [Note: The scanners can get reset due to, say, a flush or compaction.] In one of our workloads, we periodically want to do rollups on the most recent 15 minutes of data in a column family... but the timerange hint benefit is lost midway when this resetScannerStack (shown below) happens. The end result: we end up reading all the old HFiles rather than just the recent ones. {code} private void resetScannerStack(KeyValue lastTopKey) throws IOException { if (heap != null) { throw new RuntimeException("StoreScanner.reseek run on an existing heap!"); } /* When we have the scan object, should we not pass it to getScanners() * to get a limited set of scanners? We did so in the constructor and we * could have done it now by storing the scan object from the constructor */ List<KeyValueScanner> scanners = getScanners(); {code} The comment in the code seems to be aware of this issue and even suggests the fix!
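The fix suggested by the code comment is to keep the scan from the constructor and reuse it whenever the scanner stack is rebuilt. A minimal sketch of that idea, with hypothetical names (TimeRangeScannerSketch, FileInfo) standing in for StoreScanner and HFile metadata, and assuming a half-open scan range [min, max) and inclusive per-file minimum and maximum timestamps:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not HBase's StoreScanner: the scan's time range is
 * stored once, so the initial open and every later reset apply the same
 * file selection instead of falling back to every file.
 */
final class TimeRangeScannerSketch {
  /** Stand-in for per-file metadata: inclusive min/max cell timestamps. */
  static final class FileInfo {
    final String name;
    final long minTs;
    final long maxTs;

    FileInfo(String name, long minTs, long maxTs) {
      this.name = name;
      this.minTs = minTs;
      this.maxTs = maxTs;
    }
  }

  private final long scanMin; // inclusive
  private final long scanMax; // exclusive

  TimeRangeScannerSketch(long scanMin, long scanMax) {
    this.scanMin = scanMin;
    this.scanMax = scanMax;
  }

  /**
   * Called on open and again on every reset (e.g. after a flush or compaction
   * changed the file set): keeps only files that can hold cells in range.
   */
  List<String> selectFiles(List<FileInfo> currentFiles) {
    List<String> selected = new ArrayList<>();
    for (FileInfo f : currentFiles) {
      if (f.maxTs >= scanMin && f.minTs < scanMax) {
        selected.add(f.name);
      }
    }
    return selected;
  }
}
```

For the 15-minute rollup case, a reset after a flush would then pick up the newly flushed file while still skipping the old ones.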
[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154380#comment-13154380 ] Jimmy Xiang commented on HBASE-4797: Yes, that's what I was thinking. Each file name has the start seq id, so multiple files give multiple start seq ids. Once they are sorted, each file's start seq id bounds the max seq id of the file before it. I can use this information to filter out some files safely. On Mon, Nov 21, 2011 at 10:52 AM, stack (Commented) (JIRA)
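The filtering idea in the comment above can be sketched as follows. It assumes, as the comment does, that each recovered.edits file is identified by the sequence id of its first edit (names are reduced to long values here) and that every edit in one file precedes the next file's first edit. The class and method names are hypothetical and this is not HBase code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: decide which recovered.edits files still need replay,
 * given only their start sequence ids and the region's max persisted seq id.
 */
final class RecoveredEditsFilterSketch {
  static List<Long> filesToReplay(List<Long> startSeqIds, long regionMaxSeqId) {
    List<Long> sorted = new ArrayList<>(startSeqIds);
    Collections.sort(sorted);
    List<Long> toReplay = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i++) {
      boolean isLast = (i == sorted.size() - 1);
      if (!isLast) {
        // Assumed ordering: the next file starts at sorted.get(i + 1), so
        // every edit in this file has a smaller sequence id.
        long maxSeqIdBound = sorted.get(i + 1) - 1;
        if (maxSeqIdBound <= regionMaxSeqId) {
          continue; // all edits are already in the region: skip the file
        }
      }
      // The last file has no upper bound from the names, so always replay it.
      toReplay.add(sorted.get(i));
    }
    return toReplay;
  }
}
```

For files starting at 100, 200 and 300 and a region already at seq id 250, only the files starting at 200 and 300 would be replayed.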
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154385#comment-13154385 ] Kannan Muthukkaruppan commented on HBASE-4820: -- Refactoring changes of this type will make it harder for 89-fb/our internal branch to stay in sync with trunk, and to push/pull patches between the two revs. But I agree that this can't be the reason to block all refactoring and improvements, so such changes have to be evaluated on a case-by-case basis. Do we think these changes and code moves are worth it? [I have only looked at the specific changes superficially, but wanted to at least express the concern so that someone who has reviewed them in detail can comment on whether this change is a must.]
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154396#comment-13154396 ] Hudson commented on HBASE-4213: --- Integrated in HBase-TRUNK #2468 (See [https://builds.apache.org/job/HBase-TRUNK/2468/]) HBASE-4213 Support for fault tolerant, instant schema updates with out master's intervention through ZK tedyu : Files : * /hbase/trunk/pom.xml * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ModifyTableHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java * 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * /hbase/trunk/src/main/resources/hbase-default.xml * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/InstantSchemaChangeTestBase.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChange.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChangeFailover.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChangeSplit.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, 
schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730
Support instant schema updates such as Modify Table, Add Column, and Modify Column operations:
1. Without enabling/disabling the table.
2. Without bulk unassign/assign of regions.
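The description leaves the coordination protocol implicit. As a rough illustration only, the Java sketch below models the bookkeeping such a scheme could use: the master records a pending change per table together with the region servers that must apply it, each server acknowledges once its online regions carry the new descriptor, and the master treats the change as finished when no acknowledgements are outstanding. ZooKeeper is replaced by an in-memory map, and every name here (`SchemaChangeSketch`, `requestChange`, `acknowledge`, `serverDied`, `isComplete`) is hypothetical; this is not the `SchemaChangeTracker`/`MasterSchemaChangeTracker` code listed in the commit.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * In-memory model of a ZooKeeper-coordinated schema change.
 * All names are hypothetical; the map stands in for znodes.
 */
public class SchemaChangeSketch {
    // table name -> region servers that have not yet acknowledged the change
    private final Map<String, Set<String>> pendingAcks = new ConcurrentHashMap<>();

    /** Master: publish a change for a table to the servers hosting its regions. */
    public void requestChange(String table, Set<String> hostingServers) {
        Set<String> remaining = ConcurrentHashMap.newKeySet();
        remaining.addAll(hostingServers);
        pendingAcks.put(table, remaining);
    }

    /** Region server: report that its online regions now use the new descriptor. */
    public void acknowledge(String table, String server) {
        Set<String> remaining = pendingAcks.get(table);
        if (remaining != null) {
            remaining.remove(server);
        }
    }

    /** Master: stop waiting on a dead server (assumes its regions reopen with the new schema). */
    public void serverDied(String server) {
        for (Set<String> remaining : pendingAcks.values()) {
            remaining.remove(server);
        }
    }

    /** Master: true once no acknowledgement is outstanding (or no request exists). */
    public boolean isComplete(String table) {
        Set<String> remaining = pendingAcks.get(table);
        return remaining == null || remaining.isEmpty();
    }

    public static void main(String[] args) {
        SchemaChangeSketch tracker = new SchemaChangeSketch();
        tracker.requestChange("t1", Set.of("rs1", "rs2", "rs3"));
        tracker.acknowledge("t1", "rs1");
        tracker.acknowledge("t1", "rs2");
        System.out.println(tracker.isComplete("t1")); // false: rs3 has not acknowledged
        tracker.serverDied("rs3");
        System.out.println(tracker.isComplete("t1")); // true
    }
}
```

Dropping a dead server from the pending set is what lets the sketch tolerate a failure, but it rests on the assumption that the dead server's regions are reopened elsewhere with the updated descriptor. A fuller version would also delete the completed request; the thread refers to a "schema janitor" doing that cleanup.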
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154399#comment-13154399 ] Lars Hofhansl commented on HBASE-2856: -- This one looks bad:
{noformat}
testFilterAcrossMultipleRegions(org.apache.hadoop.hbase.client.TestFromClientSide)  Time elapsed: 12.233 sec  FAILURE!
java.lang.AssertionError: expected:<17576> but was:<28064>
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.junit.Assert.assertEquals(Assert.java:456)
    at org.apache.hadoop.hbase.client.TestFromClientSide.assertRowCount(TestFromClientSide.java:528)
    at org.apache.hadoop.hbase.client.TestFromClientSide.testFilterAcrossMultipleRegions(TestFromClientSide.java:436)
{noformat}
This happens only with the 0.92 patch applied. It seems the scanner now finds too many cells. TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt TestAcidGuarantee has a test whereby it attempts to read a number of columns from a row, and every so often the first column of N is different, when it should be the same. This is a bug deep inside the scanner whereby the first peek() of a row is done at time T and the rest of the read is done at T+1 after a flush; the memstoreTS data is thus lost, and previously 'uncommitted' data becomes committed and flushed to disk. One possible solution is to introduce the memstoreTS (or an equivalent value) to the HFile, allowing us to preserve read consistency past flushes. Another solution involves fixing the scanners so that peek() is not destructive (and thus might return different things at different times, alas).
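To make the memstoreTS argument concrete, here is a small self-contained model (not HBase code): each cell carries the write number (memstoreTS) that produced it, a reader fixes a read point when it starts, and it only sees cells whose write number is at or below that point. The helper `flushDroppingTS` is a hypothetical stand-in for a flush that does not persist memstoreTS to the HFile. Once every write number becomes 0, a half-applied write (write 6, which so far has only reached column `a`) suddenly looks committed, and the same row read at the same read point is no longer consistent. Keeping the write number in the file, the first solution proposed in the description, would keep the two reads identical. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReadPointSketch {
    /** A cell value tagged with the write number (memstoreTS) that produced it. */
    record Cell(String column, String value, long memstoreTS) {}

    /** Reads a row as a scanner opened at readPoint would; cells are ordered oldest write first. */
    static Map<String, String> read(List<Cell> cells, long readPoint) {
        Map<String, String> row = new TreeMap<>();
        for (Cell c : cells) {
            if (c.memstoreTS() <= readPoint) {
                row.put(c.column(), c.value()); // a newer visible version replaces an older one
            }
        }
        return row;
    }

    /** Hypothetical flush that discards write numbers, so every cell looks committed. */
    static List<Cell> flushDroppingTS(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            out.add(new Cell(c.column(), c.value(), 0L));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("a", "v5", 5), new Cell("b", "v5", 5), // write 5: committed
            new Cell("a", "v6", 6));                        // write 6: only column a so far
        long readPoint = 5;
        System.out.println(read(cells, readPoint));                  // {a=v5, b=v5}
        System.out.println(read(flushDroppingTS(cells), readPoint)); // {a=v6, b=v5}
    }
}
```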
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: 4213.addendum Addendum to temporarily disable testInstantSchemaOperationsInZKForMasterFailover, which relies on a predetermined sleep interval for the schema janitor to clean up the schema change request. Subbu will come up with a better test. Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730
Support instant schema updates such as Modify Table, Add Column, and Modify Column operations:
1. Without enabling/disabling the table.
2. Without bulk unassign/assign of regions.
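A fixed sleep makes a test depend on how quickly the janitor happens to run. One common alternative, sketched here in plain Java and not taken from the HBase test suite, is to poll for the expected state until a deadline passes. The helper name `waitFor` and its parameters are hypothetical; in the disabled test the condition would presumably be something like "the schema change request has been cleaned up".

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public final class WaitUtil {
    private WaitUtil() {}

    /**
     * Re-checks condition every pollMs until it holds or timeoutMs has elapsed.
     * Returns true as soon as the condition holds, false on timeout.
     */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller can fail with a descriptive message
            }
            Thread.sleep(pollMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        // Hypothetical stand-in for "the janitor has removed the schema change request".
        BooleanSupplier requestCleanedUp =
            () -> System.nanoTime() - start >= TimeUnit.MILLISECONDS.toNanos(200);
        System.out.println(waitFor(requestCleanedUp, 5_000, 50)); // true, after roughly 200 ms
    }
}
```

The test then finishes as soon as the state is reached and only fails after a generous timeout, instead of guessing a single sleep long enough for every machine.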
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154410#comment-13154410 ] Lars Hofhansl commented on HBASE-2856: -- I looked through the entire patch again manually, but I can't figure out what would cause this failure. TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt TestAcidGuarantee has a test whereby it attempts to read a number of columns from a row, and every so often the first column of N is different, when it should be the same. This is a bug deep inside the scanner whereby the first peek() of a row is done at time T and the rest of the read is done at T+1 after a flush; the memstoreTS data is thus lost, and previously 'uncommitted' data becomes committed and flushed to disk. One possible solution is to introduce the memstoreTS (or an equivalent value) to the HFile, allowing us to preserve read consistency past flushes. Another solution involves fixing the scanners so that peek() is not destructive (and thus might return different things at different times, alas).
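One piece of arithmetic that may help narrow down the testFilterAcrossMultipleRegions failure (expected 17576 but was 28064). Assuming the test table is loaded with every three-letter lowercase row key from `aaa` to `zzz` (an assumption about the test fixture, not something stated in this thread), the expected value is exactly 26^3, so the scan returned 28064 - 17576 = 10488 more results than rows loaded. The class name below is hypothetical.

```java
public class RowCountSketch {
    /** Counts every three-letter lowercase key from "aaa" to "zzz" (assumed test data layout). */
    static int countThreeLetterKeys() {
        int count = 0;
        for (char c1 = 'a'; c1 <= 'z'; c1++) {
            for (char c2 = 'a'; c2 <= 'z'; c2++) {
                for (char c3 = 'a'; c3 <= 'z'; c3++) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int expected = countThreeLetterKeys();
        System.out.println(expected);         // 17576 = 26^3
        System.out.println(28064 - expected); // 10488 extra results in the failing run
    }
}
```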
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154417#comment-13154417 ] Hadoop QA commented on HBASE-4213: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12504535/4213.addendum against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/320//console
This message is automatically generated.
Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730
Support instant schema updates such as Modify Table, Add Column, and Modify Column operations:
1. Without enabling/disabling the table.
2. Without bulk unassign/assign of regions.
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154438#comment-13154438 ] Ted Yu commented on HBASE-4820: --- I agree with Kannan that we should reduce the number of places where refactoring is done, to make porting easier. Currently SplitLogManager.splitLogDistributed(final List<Path> logDirs) creates a MonitoredTask to display status. This results in a repetitive display of the following form on 60010/master-status:
{code}
Doing distributed log split in [hdfs://...-splitting]
{code}
We should make the log splitting status display cleaner. Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch In reviewing the distributed log splitting feature, we found some cosmetic issues that make the code hard to understand. It would be great to fix them. For this issue, there should be no semantic change.
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154443#comment-13154443 ] Jonathan Hsieh commented on HBASE-4820: --- @Kannan, I'm looking at this from the point of view of someone who recently spent many hours reviewing the distributed log splitting patches in aggregate and may be responsible for fixing issues if it has problems. I had a harder time than I'd have liked, and will likely have the same problem again if there are problems in the future. Making a few semantics-preserving changes, such as making var/method/class names more descriptive and encapsulating pieces, would go a long way toward making the code easier and quicker to understand for more people. Are you suggesting splitting these changes into smaller pieces such as:
* add better exception error messages.
* consolidate calls only used once. Ex: async callbacks submethods; inline finishInitailize into SLM's constructor
* rename vague methods. Ex: installTask(String taskName) might be better as enqueueSplitLog(String logPath); handleDeadWorker might be better as blacklistDeadWorker; 'exec(String name, Progressable)' might be better as 'split(String logfilename, Progressable)'
* rename vague classes. Ex: Task to SplitTask, TaskBatch to SplitTaskState/SplitTaskContext
* correct comments to be consistent with code (comments in SplitLogWorker talk about a SUCCESS state, which is actually the DONE state).
Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch In reviewing the distributed log splitting feature, we found some cosmetic issues that make the code hard to understand. It would be great to fix them. For this issue, there should be no semantic change.
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154453#comment-13154453 ] Jonathan Hsieh commented on HBASE-2856: --- On the bulkload operation, the error has something to do with the split point: in the test I force a split, and the resulting error has something to do with the start of the second daughter. @Lars -- since the original issue is resolved, and since this seems non-trivial, maybe this should be moved into a new issue? TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt TestAcidGuarantee has a test whereby it attempts to read a number of columns from a row, and every so often the first column of N is different, when it should be the same. This is a bug deep inside the scanner whereby the first peek() of a row is done at time T and the rest of the read is done at T+1 after a flush; the memstoreTS data is thus lost, and previously 'uncommitted' data becomes committed and flushed to disk. One possible solution is to introduce the memstoreTS (or an equivalent value) to the HFile, allowing us to preserve read consistency past flushes. Another solution involves fixing the scanners so that peek() is not destructive (and thus might return different things at different times, alas).
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: (was: 4213-trunk-v4.txt) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. With out enable/disabling the table. 2. With out bulk unassign/assign of regions.
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: (was: 4213-trunk-v3.txt) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. With out enable/disabling the table. 2. With out bulk unassign/assign of regions.
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: (was: 4213-trunk-v5.txt) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. With out enable/disabling the table. 2. With out bulk unassign/assign of regions.
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: (was: 4213-trunk-v6.txt) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. With out enable/disabling the table. 2. With out bulk unassign/assign of regions.
[jira] [Updated] (HBASE-4830) Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+
[ https://issues.apache.org/jira/browse/HBASE-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4830: - Attachment: 4830.txt Todd's suggestion. Testing that it actually works. Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+ --- Key: HBASE-4830 URL: https://issues.apache.org/jira/browse/HBASE-4830 Project: HBase Issue Type: Bug Reporter: stack Attachments: 4830.txt, hbase-stack-regionserver-sv4r9s38.out Running 0.20.205.1 (I was not at the tip of the branch), I ran into the following hung regionserver:
{code}
regionserver7003.logRoller daemon prio=10 tid=0x7fd98028f800 nid=0x61af in Object.wait() [0x7fd987bfa000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3606)
    - locked 0xf8656788 (a java.util.LinkedList)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3595)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3687)
    - locked 0xf8656458 (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
    - locked 0xf8655998 (a org.apache.hadoop.io.SequenceFile$Writer)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:791)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:578)
    - locked 0xc443deb0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
    at java.lang.Thread.run(Thread.java:662)
{code}
Other threads look like this (here's a sample):
{code}
regionserver7003.logSyncer daemon prio=10 tid=0x7fd98025e000 nid=0x61ae waiting for monitor entry [0x7fd987cfb000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1074)
    - waiting to lock 0xc443deb0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
    at java.lang.Thread.run(Thread.java:662)

IPC Server handler 0 on 7003 daemon prio=10 tid=0x7fd98049b800 nid=0x61b8 waiting for monitor entry [0x7fd9872f1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1007)
    - waiting to lock 0xc443deb0 (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1798)
    at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1668)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2980)
    at sun.reflect.GeneratedMethodAccessor636.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1325)
{code}
Looks like HDFS-1529? (Todd?)
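The dump shows a familiar shape: `logRoller` holds the monitor `0xc443deb0` (taken in `HLog.rollWriter`) while it blocks in `waitForAckedSeqno`, so `logSyncer` and the IPC handlers, which need the same monitor in `HLog.syncer` and `HLog.append`, pile up as BLOCKED. The plain-Java sketch below reproduces only that shape; a `CountDownLatch` stands in for the datanode ack, and all names are hypothetical. It does not reflect Todd's suggested fix in 4830.txt, whose contents are not described here.

```java
import java.util.concurrent.CountDownLatch;

public class LockHeldDuringCloseSketch {
    // Plays the role of the monitor 0xc443deb0 taken in HLog.rollWriter.
    private final Object updateLock = new Object();
    // Stands in for the pipeline ack that waitForAckedSeqno waits on.
    private final CountDownLatch ackArrived = new CountDownLatch(1);

    /** Like rollWriter(): takes the lock, then blocks on I/O while still holding it. */
    void rollWriter() throws InterruptedException {
        synchronized (updateLock) {
            ackArrived.await();
        }
    }

    /** Like append()/syncer(): needs the same lock to make progress. */
    void append() {
        synchronized (updateLock) {
            // the edit would be written here
        }
    }

    /** Returns the handler's state observed while the roller is stuck holding the lock. */
    static Thread.State observeHandlerState() throws InterruptedException {
        LockHeldDuringCloseSketch log = new LockHeldDuringCloseSketch();
        Thread roller = new Thread(() -> {
            try {
                log.rollWriter();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "logRoller");
        roller.start();
        // Wait until the roller is parked inside await(), i.e. while holding updateLock.
        while (roller.getState() != Thread.State.WAITING) {
            Thread.sleep(5);
        }
        Thread handler = new Thread(log::append, "IPC handler");
        handler.start();
        // The handler cannot enter synchronized(updateLock), so it must end up BLOCKED.
        while (handler.getState() != Thread.State.BLOCKED) {
            Thread.sleep(5);
        }
        Thread.State observed = handler.getState();
        log.ackArrived.countDown(); // the ack finally arrives and both threads finish
        roller.join();
        handler.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observeHandlerState()); // BLOCKED
    }
}
```

A general way out of this shape is to avoid holding a lock that the write path needs while waiting on remote I/O, for example by swapping writers under the lock and closing the old one outside it; whether the attached patch does this is not stated in this thread.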
[jira] [Created] (HBASE-4838) Port 2856 (TestAcidGuarantees is failing) to 0.92
Port 2856 (TestAcidGuarantees is failing) to 0.92 - Key: HBASE-4838 URL: https://issues.apache.org/jira/browse/HBASE-4838 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Moving the backport into a separate issue (as suggested by JonH), because this is not trivial.
[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154491#comment-13154491 ] Lars Hofhansl commented on HBASE-2856: -- Created HBASE-4838 TestAcidGuarantee broken on trunk -- Key: HBASE-2856 URL: https://issues.apache.org/jira/browse/HBASE-2856 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621 Reporter: ryan rawson Assignee: Amitanand Aiyer Priority: Blocker Fix For: 0.94.0 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 2856-v9-all-inclusive.txt, acid.txt TestAcidGuarantee has a test whereby it attempts to read a number of columns from a row, and every so often the first column of N is different, when it should be the same. This is a bug deep inside the scanner whereby the first peek() of a row is done at time T and the rest of the read is done at T+1 after a flush; the memstoreTS data is thus lost, and previously 'uncommitted' data becomes committed and flushed to disk. One possible solution is to introduce the memstoreTS (or an equivalent value) to the HFile, allowing us to preserve read consistency past flushes. Another solution involves fixing the scanners so that peek() is not destructive (and thus might return different things at different times, alas).
[jira] [Updated] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4838: - Summary: Port 2856 (TestAcidGuarantee is failing) to 0.92 (was: Port 2856 (TestAcidGuarantees is failing) to 0.92) Port 2856 (TestAcidGuarantee is failing) to 0.92 Key: HBASE-4838 URL: https://issues.apache.org/jira/browse/HBASE-4838 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Moving the backport into a separate issue (as suggested by JonH), because this is not trivial.
[jira] [Updated] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4838: - Attachment: 4838-v1.txt Patch identical to the 0.92 patch in HBASE-2856. This has issues with failing tests. Port 2856 (TestAcidGuarantee is failing) to 0.92 Key: HBASE-4838 URL: https://issues.apache.org/jira/browse/HBASE-4838 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4838-v1.txt Moving the backport into a separate issue (as suggested by JonH), because this is not trivial.
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154503#comment-13154503 ] stack commented on HBASE-4213: -- Want to do the test fix in another issue, Ted and Subbu? Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch, schema-update.png This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. Without enabling/disabling the table. 2. Without bulk unassign/assign of regions.
[jira] [Created] (HBASE-4839) Re-enable TestInstantSchemaChangeFailover#testInstantSchemaOperationsInZKForMasterFailover
Re-enable TestInstantSchemaChangeFailover#testInstantSchemaOperationsInZKForMasterFailover -- Key: HBASE-4839 URL: https://issues.apache.org/jira/browse/HBASE-4839 Project: HBase Issue Type: Test Reporter: Ted Yu TestInstantSchemaChangeFailover#testInstantSchemaOperationsInZKForMasterFailover was disabled for instant schema change (HBASE-4213) after it failed on Jenkins. We should re-enable it and make it pass on Jenkins and in dev environments.
[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154538#comment-13154538 ] Jimmy Xiang commented on HBASE-4797: The region opening is retried periodically. The waiting interval is about 1/3 of the assignment timeout. I think that's fine. [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region has -- Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly, some hot regions ended up with loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there are only about 30-odd edits per file, it seems). The region is unavailable during this time.
[jira] [Created] (HBASE-4840) If I call split fast enough, while inserting, rows disappear.
If I call split fast enough, while inserting, rows disappear. -- Key: HBASE-4840 URL: https://issues.apache.org/jira/browse/HBASE-4840 Project: HBase Issue Type: Bug Reporter: Alex Newman I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear.
[jira] [Created] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.
If I call split fast enough, while inserting, rows disappear. -- Key: HBASE-4841 URL: https://issues.apache.org/jira/browse/HBASE-4841 Project: HBase Issue Type: Bug Reporter: Alex Newman I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear.
[jira] [Commented] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.
[ https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154581#comment-13154581 ] Alex Newman commented on HBASE-4841: Since this can cause data loss, it may make sense to increase the priority. If I call split fast enough, while inserting, rows disappear. -- Key: HBASE-4841 URL: https://issues.apache.org/jira/browse/HBASE-4841 Project: HBase Issue Type: Bug Reporter: Alex Newman Attachments: 1 I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear.
[jira] [Updated] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.
[ https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-4841: --- Attachment: 1 If I call split fast enough, while inserting, rows disappear. -- Key: HBASE-4841 URL: https://issues.apache.org/jira/browse/HBASE-4841 Project: HBase Issue Type: Bug Reporter: Alex Newman Attachments: 1 I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear.
[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated HBASE-4832: - Release Note: This incorporates nkeywal's earlier patch to this JIRA, and allows TestRegionServerCoprocessortWithAbort() to work with it. It changes the test to use a ZooKeeper watcher in a separate thread to watch for the regionserver to abort. (This is also what is currently done with TestMasterCoprocessorWithAbort().) In my testing, repeated iterations (30+) of TestRegionServerCoprocessortWithAbort() succeed. Status: Patch Available (was: Open) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast --- Key: HBASE-4832 URL: https://issues.apache.org/jira/browse/HBASE-4832 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: Eugene Koontz Priority: Minor Attachments: 4832_trunk_hregionserver.patch The current implementation of HRegionServer#stop is
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
Then the region server stops immediately. This makes the region server stop 0.5 s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work. This is likely because the code does not expect the region server to stop that fast.
The exception is:
{noformat}
testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort) Time elapsed: 30.06 sec ERROR!
java.lang.Exception: test timed out after 3 milliseconds
    at java.lang.Throwable.fillInStackTrace(Native Method)
    at java.lang.Throwable.<init>(Throwable.java:196)
    at java.lang.Exception.<init>(Exception.java:41)
    at java.lang.InterruptedException.<init>(InterruptedException.java:48)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
    at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
    at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
    at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
    at
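The difference between the two `stop()` implementations comes down to which monitor gets notified: `Object.wait()` only returns early when `notify()`/`notifyAll()` is called on the same object the waiting thread is blocked on. Below is a minimal sketch with a hypothetical `WakeableSleeper` class, assumed to behave in the spirit of the `sleeper.skipSleepCycle()` call above but not taken from HBase. The sleeping thread waits on a private lock, and `skipSleepCycle()` notifies that same lock, so the sleep ends at once instead of running its full period. Calling `notifyAll()` on some other object, as the original `stop()` does with `synchronized (this)`, would leave this waiter asleep.

```java
/**
 * Hypothetical sleeper (not HBase's class): a thread sleeps on a private lock,
 * and skipSleepCycle() wakes it by notifying that same lock.
 */
class WakeableSleeper {
    private final Object sleepLock = new Object();
    private boolean skipRequested = false; // guarded by sleepLock

    /** Sleeps for up to periodMs, returning early if skipSleepCycle() is called. */
    void sleep(long periodMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + periodMs;
        synchronized (sleepLock) {
            // Loop to tolerate spurious wake-ups; exit on a skip request or at the deadline.
            while (!skipRequested) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break;
                }
                sleepLock.wait(remaining);
            }
            skipRequested = false; // consume the request so the next cycle sleeps normally
        }
    }

    /** Wakes a sleeping thread: the notify must target the monitor it waits on. */
    void skipSleepCycle() {
        synchronized (sleepLock) {
            skipRequested = true;
            sleepLock.notifyAll();
        }
    }

    /** Starts a thread that sleeps for 10 s, wakes it after ~100 ms, and returns how long it slept. */
    static long demoWakeLatencyMs() {
        WakeableSleeper sleeper = new WakeableSleeper();
        long[] sleptMs = new long[1];
        Thread sleeperThread = new Thread(() -> {
            long start = System.nanoTime();
            try {
                sleeper.sleep(10_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sleptMs[0] = (System.nanoTime() - start) / 1_000_000;
        });
        sleeperThread.start();
        try {
            Thread.sleep(100);
            sleeper.skipSleepCycle();
            sleeperThread.join(); // join() makes sleptMs[0] visible to this thread
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sleptMs[0];
    }
}
```

`demoWakeLatencyMs()` should return roughly 100 ms rather than 10 s. The pending-request flag also covers the case where the wake-up arrives before the other thread has started waiting, which a bare `notifyAll()` would miss.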
[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated HBASE-4832: - Attachment: HBASE-4832.patch TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast --- Key: HBASE-4832 URL: https://issues.apache.org/jira/browse/HBASE-4832 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: Eugene Koontz Priority: Minor Attachments: 4832_trunk_hregionserver.patch, HBASE-4832.patch The current implementation of HRegionServer#stop is
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
Then the region server stops immediately. This makes the region server stop 0.5 s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work. This is likely because the code does not expect the region server to stop that fast. The exception is:
{noformat}
testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort) Time elapsed: 30.06 sec ERROR!
java.lang.Exception: test timed out after 3 milliseconds
    at java.lang.Throwable.fillInStackTrace(Native Method)
    at java.lang.Throwable.<init>(Throwable.java:196)
    at java.lang.Exception.<init>(Exception.java:41)
    at java.lang.InterruptedException.<init>(InterruptedException.java:48)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
    at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
    at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
    at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
    at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
[jira] [Commented] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.
[ https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154682#comment-13154682 ] Alex Newman commented on HBASE-4841: I realized it may be easier if I post the log for the unit test, rather than requiring you to run it. It's on the way. If I call split fast enough, while inserting, rows disappear. -- Key: HBASE-4841 URL: https://issues.apache.org/jira/browse/HBASE-4841 Project: HBase Issue Type: Bug Reporter: Alex Newman Attachments: 1 I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear.
[jira] [Created] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck --- Key: HBASE-4842 URL: https://issues.apache.org/jira/browse/HBASE-4842 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is failing intermittently. In the test, a region's assignment is changed in META but not in ZK. After the equivalent of 'hbck -fix', a subsequent check that should be clean comes up with a new ZK assignment but with META still inconsistent with ZK. Sometimes the RS in ZK points to the same RS, but sometimes it moves to another RS.
[jira] [Updated] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4842: -- Description: It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is failing intermittently. In the test, a region's assignment is purposely changed in META but not in ZK. After the equivalent of 'hbck -fix', a subsequent check that should be clean comes up with a new ZK assignment but with META still inconsistent with ZK. Sometimes the RS in ZK points to the same RS, but sometimes it moves to another RS. was: It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is failing intermittently. In the test, a region's assignment is changed in META but not in ZK. After the equivalent of 'hbck -fix', a subsequent check that should be clean comes up with a new ZK assignment but with META still inconsistent with ZK. Sometimes the RS in ZK points to the same RS, but sometimes it moves to another RS. [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck --- Key: HBASE-4842 URL: https://issues.apache.org/jira/browse/HBASE-4842 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is failing intermittently. In the test, a region's assignment is purposely changed in META but not in ZK. After the equivalent of 'hbck -fix', a subsequent check that should be clean comes up with a new ZK assignment but with META still inconsistent with ZK. Sometimes the RS in ZK points to the same RS, but sometimes it moves to another RS.
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154686#comment-13154686 ] Jonathan Hsieh commented on HBASE-4842: --- Output Examples: Note that the ZK assignment and the META assignment did not change.
{code}
// hbck -fix call
ERROR: Region tableBadMetaAssign,,1321733234211.35120fc878802e3b6829e6d7b597b44c. listed in META on region server ubuntu64-build01.sf.cloudera.com,51134,1321733229687 but found on region server ubuntu64-build01.sf.cloudera.com,38112,1321733229583
Trying to fix assignment error...
...
// hbck after fix
ERROR: Region tableBadMetaAssign,,1321733234211.35120fc878802e3b6829e6d7b597b44c. listed in META on region server ubuntu64-build01.sf.cloudera.com,51134,1321733229687 but found on region server ubuntu64-build01.sf.cloudera.com,38112,1321733229583
{code}
Note that the ZK assignment changed but META had not yet changed.
{code}
// hbck -fix
ERROR: Region tableBadMetaAssign,,1321719700727.af24fbbe3e1df676b8e31e3ff5765fb6. listed in META on region server p0123.sf.cloudera.com,36067,1321719696277 but found on region server p0123.sf.cloudera.com,54221,1321719696237
Trying to fix assignment error...
...
// hbck after fix
ERROR: Region tableBadMetaAssign,,1321719700727.af24fbbe3e1df676b8e31e3ff5765fb6. listed in META on region server p0123.sf.cloudera.com,36067,1321719696277 but found on region server p0123.sf.cloudera.com,59522,1321719696305
{code}
[hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck --- Key: HBASE-4842 URL: https://issues.apache.org/jira/browse/HBASE-4842 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is failing intermittently. In the test, a region's assignment is purposely changed in META but not in ZK. After the equivalent of 'hbck -fix', a subsequent check that should be clean comes up with a new ZK assignment but with META still inconsistent with ZK. Sometimes the RS in ZK points to the same RS, but sometimes it moves to another RS.
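The error lines above come from comparing, for each region, the server recorded in META with the server the region is actually found on. Below is a minimal sketch of that comparison. It assumes both views have already been collected into maps from region name to server name; the `AssignmentCheck` class and its inputs are hypothetical, not hbck's actual data structures. The real hbck gathers these views itself and does considerably more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an hbck-style META-versus-deployment consistency check. */
class AssignmentCheck {

    /**
     * Reports each region whose server in META differs from the server it is deployed on.
     * Regions missing from either map are ignored in this simplified version.
     */
    static List<String> findMismatches(Map<String, String> metaServer,
                                       Map<String, String> deployedServer) {
        List<String> errors = new ArrayList<>();
        // Iterate in sorted order so the report is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(metaServer).entrySet()) {
            String deployed = deployedServer.get(e.getKey());
            if (deployed != null && !deployed.equals(e.getValue())) {
                errors.add("ERROR: Region " + e.getKey()
                        + " listed in META on region server " + e.getValue()
                        + " but found on region server " + deployed);
            }
        }
        return errors;
    }
}
```

Because a check like this is a point-in-time comparison, running it again right after a fix can still report a mismatch if META has not yet caught up with the new assignment, which is consistent with the second output example.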
[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154689#comment-13154689 ] jirapos...@reviews.apache.org commented on HBASE-4797: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2906/ --- Review request for hbase, Todd Lipcon and Michael Stack. Summary --- If there are multiple recovered edits files, I used the file name to find the initial sequence id. After these files are sorted, we can find a file's possible maximum sequence id based on the next file's initial sequence id. If the maximum sequence id is smaller than the current sequence id, the whole recovered edits file is old and ignored. This addresses bug HBASE-4797. https://issues.apache.org/jira/browse/HBASE-4797 Diffs - src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8b89661 src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 5daa02b Diff: https://reviews.apache.org/r/2906/diff Testing --- Added a test case to TestHRegion, and all the tests in this test class pass. Thanks, Jimmy [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region has -- Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly, some hot regions ended up with loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there are only about 30-odd edits per file, it seems). The region is unavailable during this time.
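The skipping rule in the review summary can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the patch itself: it assumes each recovered.edits file is named with the first sequence id it contains (as a decimal number, possibly zero-padded), and `RecoveredEditsSkipper` is a hypothetical helper. After sorting, a file's highest possible id is one less than the next file's first id. A file whose bound is at or below the region's current sequence id holds only edits the region already has, so it can be skipped. The last file has no successor and is therefore always replayed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical helper illustrating the "skip files that are entirely old" rule. */
class RecoveredEditsSkipper {

    /** Returns the starting sequence ids of the files that still need replaying, in order. */
    static List<Long> filesToReplay(List<String> fileNames, long currentSeqId) {
        // Assumption: each file name is the first sequence id in that file.
        NavigableSet<Long> starts = new TreeSet<>();
        for (String name : fileNames) {
            starts.add(Long.parseLong(name));
        }
        List<Long> replay = new ArrayList<>();
        for (long start : starts) {
            Long next = starts.higher(start);
            // The last file has no successor, so its maximum id is unknown; always replay it.
            if (next != null && next - 1 <= currentSeqId) {
                continue; // every edit in this file is already in the region
            }
            replay.add(start);
        }
        return replay;
    }
}
```

For files starting at 100, 200 and 300 and a current sequence id of 250, only the files starting at 200 and 300 are replayed. Skipping when the bound *equals* the current id is a choice of this sketch (the summary says "smaller than"); it is safe here because the region already contains every edit up to its current sequence id.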
[jira] [Updated] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.
[ https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-4841: --- Description: I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear. The unit test gives you some flexibility in: - How many rows - How wide the rows are - The frequency of the splits. The default settings crash the unit tests or cause them to fail on my laptop. On my MacBook Air, I could actually turn down the number of total rows and the frequency of the splits, which is surprising. I think this is because the MacBook Air has much better IO than my backup Acer. was: I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear. If I call split fast enough, while inserting, rows disappear. -- Key: HBASE-4841 URL: https://issues.apache.org/jira/browse/HBASE-4841 Project: HBase Issue Type: Bug Reporter: Alex Newman Attachments: 1 I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear. The unit test gives you some flexibility in: - How many rows - How wide the rows are - The frequency of the splits. The default settings crash the unit tests or cause them to fail on my laptop. On my MacBook Air, I could actually turn down the number of total rows and the frequency of the splits, which is surprising. I think this is because the MacBook Air has much better IO than my backup Acer.
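One way to check for the symptom described (rows disappearing) is to compare the rows written against the rows a full scan returns. The sketch below is hypothetical: it assumes the test writes rows with fixed-width keys such as `row00000000`, `row00000001`, and so on, and that the scan result has already been collected into a set of row keys. The attached test itself is not shown here, so its actual key scheme and checks may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical post-scan check for rows that were written but are no longer returned. */
class MissingRowCheck {

    /** Assumed key scheme: fixed-width, zero-padded, so keys sort in numeric order. */
    static String rowKey(int i) {
        return String.format("row%08d", i);
    }

    /** Returns the keys of rows 0..expectedRows-1 that are absent from the scanned keys. */
    static List<String> missingRows(int expectedRows, Set<String> scannedKeys) {
        List<String> missing = new ArrayList<>();
        for (int i = 0; i < expectedRows; i++) {
            String key = rowKey(i);
            if (!scannedKeys.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

Reporting the exact missing keys, rather than only a row count, makes it easier to line a loss up with the split boundaries in the test log.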
[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154696#comment-13154696 ] jirapos...@reviews.apache.org commented on HBASE-4797: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2906/#review3409 --- Very nice patch. In future, I would suggest you confine your change to just what you are adding. The whitespace cleanup is nice, but it distracts from your patch. It also bloats it and makes it look intimidating to review (smile). Minor fixups only. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/2906/#comment7635 So, are these already sorted in the right order, from oldest edit to newest? src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/2906/#comment7636 Possilbe should be Possible. I'd be more assertive in this message. Maximum possible sequenceid for this log is + + , skipping .. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/2906/#comment7637 Good. src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java https://reviews.apache.org/r/2906/#comment7638 Any more asserts we can do in here? Assert we replayed N of the M files? - Michael On 2011-11-21 22:38:39, Jimmy Xiang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2906/ bq. --- bq. bq. (Updated 2011-11-21 22:38:39) bq. bq. bq. Review request for hbase, Todd Lipcon and Michael Stack. bq. bq. bq. Summary bq. --- bq. bq. If there are multiple recovered edits files, I used the file name to find the initial sequence id. After these files are sorted, we can find a file's possible maximum sequence id based on the next file's initial sequence id. If the maximum sequence id is smaller than the current sequence id, the whole recovered edits file is old and ignored. bq. bq. bq.
This addresses bug HBASE-4797. bq. https://issues.apache.org/jira/browse/HBASE-4797 bq. bq. bq. Diffs bq. - bq. bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8b89661 bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 5daa02b bq. bq. Diff: https://reviews.apache.org/r/2906/diff bq. bq. bq. Testing bq. --- bq. bq. Added a test case to TestHRegion, and all the tests in this test class pass. bq. bq. bq. Thanks, bq. bq. Jimmy bq. bq. [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region has -- Key: HBASE-4797 URL: https://issues.apache.org/jira/browse/HBASE-4797 Project: HBase Issue Type: Bug Components: performance Reporter: stack Assignee: Jimmy Xiang Priority: Critical Labels: noob Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but interestingly, some hot regions ended up with loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there are only about 30-odd edits per file, it seems). The region is unavailable during this time.
[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154699#comment-13154699 ] Kannan Muthukkaruppan commented on HBASE-4797: -- The title for the bug can be updated, given that we are no longer renaming the files in recovered.edits. [That concerned me initially, but reading through the details, it looks like you have come up with a way to avoid a new name format. That's always smoother for upgrades and such.]
[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change
[ https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154697#comment-13154697 ] Jimmy Xiang commented on HBASE-4820: I think Jon has a great point. In porting this feature to CDH, I spent quite some time understanding it. To Kannan's concern: the coding change for this Jira should be easy to backport since you already have the original change. For someone who doesn't have the original change, this Jira is supposed to make it easier to understand and port. Distributed log splitting coding enhancement to make it easier to understand, no semantics change - Key: HBASE-4820 URL: https://issues.apache.org/jira/browse/HBASE-4820 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: newbie Fix For: 0.94.0 Attachments: 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch, 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch In reviewing the distributed log splitting feature, we found some cosmetic issues that make the code hard to understand. It would be great to fix them. This issue should involve no semantic change.
[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4797: - Summary: [availability] Skip recovered.edits files with edits we know older than what region currently has (was: [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region has) Updated JIRA subject.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154704#comment-13154704 ] stack commented on HBASE-4797: -- bq. The region opening is tried periodically. The waiting interval is about 1/3 of the assignment time out. I think that's fine. From the log snippet above, though, Jimmy, it seems we are updating the znode almost every second. That's too much?
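Stack's concern can be made concrete: if znode refreshes were gated by the retry interval quoted above (about one third of the assignment timeout) instead of firing roughly once a second, they would be far less frequent. The sketch below is hypothetical: the class name, the idea of gating refreshes this way, and the 30-second timeout used in `main` are illustrative assumptions, not HBase code.

```java
/** Hypothetical throttle: allows a znode refresh at most once per (assignment timeout / 3). */
public class ZnodeRefreshThrottle {
  private final long minIntervalMs;
  private boolean refreshedBefore = false;
  private long lastRefreshMs;

  public ZnodeRefreshThrottle(long assignmentTimeoutMs) {
    // Follows the "about 1/3 of the assignment time out" interval quoted above.
    this.minIntervalMs = assignmentTimeoutMs / 3;
  }

  /** Returns true (and records nowMs) if a refresh is due; false if the last one was too recent. */
  public boolean shouldRefresh(long nowMs) {
    if (refreshedBefore && nowMs - lastRefreshMs < minIntervalMs) {
      return false;
    }
    refreshedBefore = true;
    lastRefreshMs = nowMs;
    return true;
  }

  public static void main(String[] args) {
    ZnodeRefreshThrottle throttle = new ZnodeRefreshThrottle(30_000); // illustrative timeout
    System.out.println(throttle.shouldRefresh(0));      // true
    System.out.println(throttle.shouldRefresh(1_000));  // false: only 1 s since the last refresh
    System.out.println(throttle.shouldRefresh(10_000)); // true: 30 s / 3 = 10 s has elapsed
  }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real caller would pass something like `System.currentTimeMillis()`.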
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154709#comment-13154709 ] stack commented on HBASE-4842: -- Is this an hbck issue, Jon, or is it in our recovery code? [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck --- Key: HBASE-4842 URL: https://issues.apache.org/jira/browse/HBASE-4842 Project: HBase Issue Type: Bug Reporter: Jonathan Hsieh It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is intermittently failing. In the test, a region's assignment is purposely changed in META but not in ZK. After the equivalent of 'hbck -fix', a subsequent check that should be clean comes up with a new ZK assignment but with META still inconsistent with ZK. The RS in ZK sometimes points to the same RS, but sometimes it moves to another RS.
[jira] [Updated] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.
[ https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-4841: --- Attachment: log2 Here is a log of the wrong number of rows being returned. If I call split fast enough, while inserting, rows disappear. -- Key: HBASE-4841 URL: https://issues.apache.org/jira/browse/HBASE-4841 Project: HBase Issue Type: Bug Reporter: Alex Newman Attachments: 1, log, log2 I'll attach a unit test for this. Basically, if you call split while inserting data, you can reach a point where the cluster becomes unstable or rows disappear. The unit test gives you some flexibility in: - How many rows - How wide the rows are - The frequency of the split. The default settings crash the unit tests or cause them to fail on my laptop. On my MacBook Air, I could actually turn down the total number of rows and the frequency of the splits, which is surprising. I think this is because the MacBook Air has much better IO than my backup Acer.
[jira] [Updated] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.
[ https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-4841: --- Attachment: log Here is a log of this script taking the HBase server out.
[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154719#comment-13154719 ] Ted Yu commented on HBASE-4832: --- +1 on patch. TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast --- Key: HBASE-4832 URL: https://issues.apache.org/jira/browse/HBASE-4832 Project: HBase Issue Type: Bug Components: coprocessors, test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: Eugene Koontz Priority: Minor Attachments: 4832_trunk_hregionserver.patch, HBASE-4832.patch The current implementation of HRegionServer#stop is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  synchronized (this) {
    // Wakes run() if it is sleeping
    notifyAll(); // FindBugs NN_NAKED_NOTIFY
  }
}
{noformat}
The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
{noformat}
public void stop(final String msg) {
  this.stopped = true;
  LOG.info("STOPPED: " + msg);
  // Wakes run() if it is sleeping
  sleeper.skipSleepCycle();
}
{noformat}
Then the region server stops immediately. This makes the region server stop 0.5 s faster on average, which is quite useful for unit tests. However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not work, likely because the code does not expect the region server to stop that fast. The exception is:
{noformat}
testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort) Time elapsed: 30.06 sec ERROR! 
java.lang.Exception: test timed out after 3 milliseconds at java.lang.Throwable.fillInStackTrace(Native Method) at java.lang.Throwable.<init>(Throwable.java:196) at java.lang.Exception.<init>(Exception.java:41) at java.lang.InterruptedException.<init>(InterruptedException.java:48) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697) at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280) at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920) at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725) at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84) at
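The wrong-monitor problem described in HBASE-4832 can be reproduced outside HBase. In this sketch, `Sleeper` is a hypothetical stand-in modeled only on the `skipSleepCycle()` name from the issue: it waits on its own private lock, while a plain `Object` plays the role of the region server. Calling `notifyAll()` on that object wakes nobody, whereas `skipSleepCycle()` notifies the monitor the sleeping thread actually waits on. Timings are approximate and machine-dependent.

```java
import java.util.concurrent.TimeUnit;

public class WrongMonitorDemo {

  /** Hypothetical stand-in for a sleeper that waits on its own private lock. */
  static class Sleeper {
    private final Object sleepLock = new Object();
    private boolean triggerWake = false;

    /** Waits up to periodMs, returning early once skipSleepCycle() is called. */
    void sleep(long periodMs) throws InterruptedException {
      long deadline = System.currentTimeMillis() + periodMs;
      synchronized (sleepLock) {
        // The loop handles spurious wakeups and a wake request made before waiting began.
        while (!triggerWake) {
          long remaining = deadline - System.currentTimeMillis();
          if (remaining <= 0) {
            break;
          }
          sleepLock.wait(remaining);
        }
        triggerWake = false;
      }
    }

    /** Notifies the monitor that sleep() actually waits on. */
    void skipSleepCycle() {
      synchronized (sleepLock) {
        triggerWake = true;
        sleepLock.notifyAll();
      }
    }
  }

  /** Returns roughly how long (ms) a 2 s sleep lasts when "stopped" after 100 ms. */
  public static long timeToWake(boolean useSkipSleepCycle) {
    Sleeper sleeper = new Sleeper();
    Object server = new Object(); // plays the role of "this" inside HRegionServer#stop
    long start = System.nanoTime();
    Thread runLoop = new Thread(() -> {
      try {
        sleeper.sleep(2000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    runLoop.start();
    try {
      Thread.sleep(100); // give the run loop time to start waiting
      if (useSkipSleepCycle) {
        sleeper.skipSleepCycle();
      } else {
        synchronized (server) {
          server.notifyAll(); // wrong monitor: no thread is waiting on it
        }
      }
      runLoop.join();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  public static void main(String[] args) {
    System.out.println("notifyAll on the wrong object: ~" + timeToWake(false) + " ms");
    System.out.println("skipSleepCycle: ~" + timeToWake(true) + " ms");
  }
}
```

The first call runs for the full two seconds; the second returns shortly after the 100 ms delay, which is the behavior change the issue describes.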
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154723#comment-13154723 ] jirapos...@reviews.apache.org commented on HBASE-4797: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2906/#review3413 --- src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java https://reviews.apache.org/r/2906/#comment7642 maxSedId should be named maxSeqId - Ted
[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
[ https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154729#comment-13154729 ] Jonathan Hsieh commented on HBASE-4842: --- Hm... this looks like a race, or the lack of a rendezvous of some sort. Up to HBASE-4378, there was a 15000 ms (yikes, 15 sec!) sleep between the 'hbck -fix' call and the subsequent 'hbck' call that is supposed to be clean. HBASE-4703 removed this. My hunch is that the update that 'hbck -fix' makes to META isn't seen by the second 'hbck' run. https://github.com/apache/hbase/commit/6ca0e79a6ac92190238d5cda56f787ab9702d7fc#L61L138 TestHBaseFsck.java:138
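Rather than restoring a fixed 15-second sleep between the two hbck runs, a test could poll for the condition it needs and give up after a bound. A minimal sketch, assuming a hypothetical `waitUntil` helper; in TestHBaseFsck the condition would be something like "a fresh hbck run reports no inconsistencies", which is not shown here, so `main` uses a time-based stand-in.

```java
import java.util.function.BooleanSupplier;

public class WaitUntil {
  /**
   * Hypothetical helper: polls condition every intervalMs until it holds or
   * timeoutMs elapses. Returns true if the condition became true in time.
   */
  public static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long intervalMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Stand-in condition that becomes true ~300 ms from now, like an update that is not visible immediately.
    long visibleAt = System.currentTimeMillis() + 300;
    boolean consistent = waitUntil(() -> System.currentTimeMillis() >= visibleAt, 15_000, 50);
    System.out.println("consistent: " + consistent);
  }
}
```

Unlike a fixed sleep, this returns as soon as the condition holds, and it still fails the test if the condition never holds within the bound.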
[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
[ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4832: -- Attachment: 4832-timeout.txt Patch which stores timeout value in a static variable.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154747#comment-13154747 ] jirapos...@reviews.apache.org commented on HBASE-4797: -- bq. On 2011-11-21 23:23:07, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2468 bq. https://reviews.apache.org/r/2906/diff/2/?file=59652#file59652line2468 bq. bq. maxSedId should be named maxSeqId Good catch. - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2906/#review3413 ---
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154749#comment-13154749 ] Jean-Daniel Cryans commented on HBASE-4739: --- I prefer the latest patch, although it seems the EventHandler.java bit was only needed with the new znode. Handling RegionAlreadyInTransitionException would be cleaner. Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch I saw this in the aftermath of HBASE-4729 on a 0.92 build refreshed yesterday: when the master died, it had just created the RIT znode for a region but hadn't yet told the RS to close it. When the master restarted, it saw the znode and started printing this: {quote} 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing {quote} It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154758#comment-13154758 ] stack commented on HBASE-4739: -- Is this state misnamed? Is the comment below saying that the master sets CLOSING state?
{code}
-RS_ZK_REGION_CLOSING (1), // RS is in process of closing a region
+RS_ZK_REGION_CLOSING (1), // Master adds this region as closing in ZK
{code}
Why not just print out the state rather than do this in the log message:
{code}
(state.isPendingClose() ? "pending close " : "closing ") +
{code}
I'm not up on what the rest of the patch does, so I can't comment more. Thanks for digging in, Jinchao.
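Stack's second point, printing the state itself instead of choosing a label with a ternary, can be illustrated with a small sketch. The `State` enum and the message text are hypothetical (the message is loosely modeled on the "Region has been CLOSING for too long" log line quoted in the issue); they are not the AssignmentManager code.

```java
public class RegionStateLogSketch {

  /** Hypothetical subset of master-side region states. */
  public enum State {
    PENDING_CLOSE, CLOSING
  }

  // Style in the patch: pick a hand-written label with a ternary.
  public static String viaTernary(String region, State state) {
    return "Region " + region + " has been "
        + (state == State.PENDING_CLOSE ? "pending close" : "closing")
        + " for too long";
  }

  // Suggested style: print the state itself, so new states need no extra branches.
  public static String viaState(String region, State state) {
    return "Region " + region + " has been " + state + " for too long";
  }

  public static void main(String[] args) {
    System.out.println(viaTernary("r1", State.PENDING_CLOSE)); // Region r1 has been pending close for too long
    System.out.println(viaState("r1", State.PENDING_CLOSE));   // Region r1 has been PENDING_CLOSE for too long
  }
}
```

The state-based version stays accurate if more states are added later, at the cost of printing the enum constant's name rather than a hand-written label.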