[jira] [Updated] (HBASE-4815) Disable online altering by default, create a config for it

2011-11-21 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4815:
--

Attachment: HBASE-4796.patch

 Disable online altering by default, create a config for it
 --

 Key: HBASE-4815
 URL: https://issues.apache.org/jira/browse/HBASE-4815
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 4815-v2.txt, 4815.addendum, 4815.patch


 Trying out online altering in conjunction with other operations like 
 splitting has been revealing a whole class of bugs. HBASE-4729, HBASE-4794, 
 and HBASE-4814 are examples.
 It's not so much that the online altering code is buggy, but that it wasn't 
 tested in an environment that permits splitting.
 I think we should mark online altering as experimental in 0.92 and add a 
 config to enable it (so it would be disabled by default, and people would 
 have to enable it explicitly before altering a table schema online).
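
A minimal sketch of the kind of gate proposed above, assuming a boolean setting that defaults to false. The key name, the class, and the method names are assumptions made for illustration (the issue text does not name the config), and java.util.Properties stands in for the HBase configuration object. This is not the committed patch.

```java
import java.util.Properties;

// Hypothetical gate for online schema changes; not the actual HBase patch.
final class OnlineAlterGate {
    // Assumed key name; the issue does not say what the config is called.
    static final String ENABLE_KEY = "hbase.online.schema.update.enable";

    private final boolean onlineAlterEnabled;

    OnlineAlterGate(Properties conf) {
        // Disabled unless the operator explicitly opts in.
        this.onlineAlterEnabled =
            Boolean.parseBoolean(conf.getProperty(ENABLE_KEY, "false"));
    }

    /** Throws unless the table is disabled or online altering was enabled. */
    void checkCanAlter(boolean tableIsDisabled) {
        if (!tableIsDisabled && !onlineAlterEnabled) {
            throw new IllegalStateException(
                "Table must be disabled first; set " + ENABLE_KEY
                + "=true to allow online altering");
        }
    }

    public static void main(String[] args) {
        OnlineAlterGate gate = new OnlineAlterGate(new Properties());
        gate.checkCanAlter(true); // altering a disabled table is always allowed
        try {
            gate.checkCanAlter(false); // online alter with the default config
        } catch (IllegalStateException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

Defaulting to "false" keeps the experimental path off for anyone who has not read about it, which is the point of the proposal.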

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4815) Disable online altering by default, create a config for it

2011-11-21 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4815:
--

Attachment: (was: HBASE-4796.patch)

 Disable online altering by default, create a config for it
 --

 Key: HBASE-4815
 URL: https://issues.apache.org/jira/browse/HBASE-4815
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 4815-v2.txt, 4815.addendum, 4815.patch






[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-21 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154088#comment-13154088
 ] 

Mikhail Bautin commented on HBASE-2418:
---

I just saw this regionserver crash in my five-node, three-RS cluster test. 
Since this is a ZK-related patch that went in recently, I am attaching the 
stack trace here just in case.

2011-11-21 01:30:15,188 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server machine_name,60020,1321867814890: Initialization of RS failed.  Hence aborting RS.
java.util.ConcurrentModificationException
    at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
    at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042)
    at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75)
    at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:144)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:124)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:559)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.<init>(CatalogTracker.java:177)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642)
    at java.lang.Thread.run(Thread.java:619)
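
The top frames show a fail-fast Hashtable iterator being invalidated while Configuration.iterator() walks it, presumably because something else changed the table at the same time. The stand-alone sketch below reproduces that failure mode in a single thread and shows one common workaround, copying the table under its own lock before iterating. It only illustrates the exception; it is not the Hadoop or HBase fix, and the class and method names are made up for the example.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Stand-alone illustration; not taken from Hadoop or HBase.
final class HashtableIterationSketch {

    /** Iterates the live table while adding to it; returns true if that fails fast. */
    static boolean failsWhenModifiedDuringIteration(Hashtable<String, String> table) {
        try {
            for (String key : table.keySet()) {
                // A structural change made while an iterator is open.
                table.put("added-while-iterating-" + key, "x");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Copies the table while holding its lock so writers cannot interleave. */
    static Map<String, String> snapshot(Hashtable<String, String> table) {
        synchronized (table) { // Hashtable's own methods lock on the instance
            return new HashMap<>(table);
        }
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("a", "1");
        table.put("b", "2");
        table.put("c", "3");
        System.out.println("live iteration failed: "
            + failsWhenModifiedDuringIteration(table));
        for (String key : snapshot(table).keySet()) {
            table.put("later-" + key, "y"); // safe: we iterate the copy
        }
        System.out.println("entries after snapshot iteration: " + table.size());
    }
}
```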


 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch


 Some users may run a ZooKeeper cluster in multi-tenant mode, meaning that 
 more than one client service shares a single ZooKeeper service instance 
 (cluster). In this case the client services typically want to protect their 
 data (ZK znodes) from access by other services (tenants) on the cluster. Say 
 you are running HBase and Solr and Neo4j, or multiple HBase instances, etc. 
 Having authentication/authorization on the znodes is important both for 
 security and for helping to ensure that services don't interact negatively 
 (touch each other's data).
 Today HBase does not have support for ZooKeeper authentication or 
 authorization. This should be added to the HBase clients that access the ZK 
 cluster. In general it means calling addAuthInfo once after a session is 
 established:
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooKeeper.html#addAuthInfo(java.lang.String, byte[])
 with a user-specific credential; often this is a shared secret or 
 certificate. You may be able to configure this statically in some cases 
 (config string or file to read from); however, in my case in particular you 
 may need to access it programmatically, which adds complexity, as the end 
 user may need to load code into HBase to access the credential.
 Secondly, you need to specify a non-world ACL when interacting with znodes 
 (primarily on create):
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/data/ACL.html
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooDefs.html
 Feel free to ping the ZooKeeper team if you have questions. It might also be 
 good to discuss with some potential end users, in particular regarding how 
 the end user can specify the credential.
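
To make the "shared secret" case concrete, here is a small stdlib-only sketch, assuming ZooKeeper's built-in "digest" scheme is the one chosen (the description above does not commit to a scheme). With that scheme the client passes the bytes of "user:password" to addAuthInfo, and the id stored in an ACL is the user name followed by the Base64 of the SHA-1 of that same string. The class name, user name, and secret below are invented for the example, and the sketch deliberately does not open a ZooKeeper session.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Illustrative helper; computes values only and does not talk to ZooKeeper.
final class ZkDigestCredential {

    /** Bytes a client would hand to addAuthInfo("digest", ...): "user:password". */
    static byte[] authInfo(String user, String password) {
        return (user + ":" + password).getBytes(StandardCharsets.UTF_8);
    }

    /** Id used in a digest-scheme ACL: user + ":" + base64(sha1("user:password")). */
    static String aclId(String user, String password) {
        try {
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                .digest(authInfo(user, password));
            return user + ":" + Base64.getEncoder().encodeToString(sha1);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required of every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical credential, for illustration only.
        System.out.println(aclId("hbase", "example-secret"));
    }
}
```

Because only the hash is stored in the ACL, other tenants that can read a znode's ACL do not learn the secret itself; how the secret reaches the HBase process is the open question raised above.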





[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-21 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154100#comment-13154100
 ] 

ramkrishna.s.vasudevan commented on HBASE-2418:
---

I am not able to build against ZooKeeper 3.4.0 SNAPSHOT from the Maven 
repository. Correct me if I am wrong.

 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch






[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-21 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154104#comment-13154104
 ] 

ramkrishna.s.vasudevan commented on HBASE-2418:
---

I resolved it by adding this repository:
{code}
<repository>
  <id>ghelmling.testing</id>
  <name>Gary Helmling test repo</name>
  <url>http://people.apache.org/~garyh/mvn/</url>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
  <releases>
    <enabled>true</enabled>
  </releases>
</repository>
{code}

This was present in HBASE-2418-3.patch.

 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch






[jira] [Commented] (HBASE-4833) HRegionServer stops could be 0,5s faster

2011-11-21 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154129#comment-13154129
 ] 

nkeywal commented on HBASE-4833:


I created HBASE-4832 to track the fix for 
TestRegionServerCoprocessorExceptionWithAbort and added a link.

 HRegionServer stops could be 0,5s faster
 

 Key: HBASE-4833
 URL: https://issues.apache.org/jira/browse/HBASE-4833
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4833_trunk_hregionserver.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 This is likely because the code does not expect the region server to stop 
 that fast. See HBASE-4832.
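
The difference between the two versions can be reproduced outside HBase. In the sketch below, Sleeper is a simplified stand-in written for this example (it is not HBase's class, and the 1000 ms period and 100 ms delay are arbitrary): a thread waits on the sleeper's private lock, so notifyAll() on any other object, such as the server instance, cannot wake it, while skipSleepCycle() can.

```java
// Stand-alone sketch; Sleeper here is a simplified stand-in, not HBase's class.
final class StopSketch {

    static final class Sleeper {
        private final Object sleepLock = new Object();
        private final long periodMs;
        private boolean skip = false;

        Sleeper(long periodMs) {
            this.periodMs = periodMs;
        }

        /** Sleeps for up to periodMs and returns the time actually slept, in ms. */
        long sleep() throws InterruptedException {
            long start = System.nanoTime();
            long deadline = start + periodMs * 1_000_000L;
            synchronized (sleepLock) {
                while (!skip) {
                    long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                    if (remainingMs <= 0) {
                        break; // period is over
                    }
                    sleepLock.wait(remainingMs);
                }
                skip = false;
            }
            return (System.nanoTime() - start) / 1_000_000L;
        }

        /** Wakes a thread blocked in sleep() by notifying the monitor it waits on. */
        void skipSleepCycle() {
            synchronized (sleepLock) {
                skip = true;
                sleepLock.notifyAll();
            }
        }
    }

    /** Starts a sleeping thread, "stops" it one of two ways, returns ms slept. */
    static long sleptMs(boolean wakeViaSleeper) {
        final Sleeper sleeper = new Sleeper(1000);
        final Object server = new Object(); // plays the role of "this" in stop()
        final long[] slept = new long[1];
        Thread runner = new Thread(() -> {
            try {
                slept[0] = sleeper.sleep();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        runner.start();
        try {
            Thread.sleep(100);
            if (wakeViaSleeper) {
                sleeper.skipSleepCycle();
            } else {
                synchronized (server) {
                    server.notifyAll(); // nobody waits on this monitor
                }
            }
            runner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return slept[0];
    }

    public static void main(String[] args) {
        System.out.println("notifyAll on the server object: slept "
            + sleptMs(false) + " ms");
        System.out.println("sleeper.skipSleepCycle():       slept "
            + sleptMs(true) + " ms");
    }
}
```

With the naked notify the thread sleeps out the whole period; with skipSleepCycle() it returns shortly after the call.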





[jira] [Assigned] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-21 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4308:
-

Assignee: ramkrishna.s.vasudevan

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.
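
The ordering problem can be shown with a toy model. In the sketch below everything is hypothetical: a map stands in for RegionsInTransition, and the "znode delete" delivers its notification immediately, which is the worst case of the race described above. It contrasts delete-then-remove with remove-then-delete; whether reordering is safe in the real master depends on what else reads RIT, which this issue text does not settle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the ordering issue; not the AssignmentManager code.
final class RitOrderingSketch {
    final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
    final List<String> warnings = new ArrayList<>();

    /** Stand-in for the watcher callback fired when the znode is deleted. */
    void nodeDeleted(String region) {
        String state = regionsInTransition.get(region);
        if (state != null) {
            warnings.add("Node deleted but still in RIT: " + region
                + " state=" + state);
        }
    }

    /** Stand-in for the znode delete; the notification arrives immediately here. */
    void deleteZnode(String region) {
        nodeDeleted(region);
    }

    /** Order described in the issue: delete the znode, then leave RIT. */
    void handleOpenedDeleteFirst(String region) {
        deleteZnode(region);
        regionsInTransition.remove(region);
    }

    /** Alternative order: leave RIT first, then delete the znode. */
    void handleOpenedRemoveFirst(String region) {
        regionsInTransition.remove(region);
        deleteZnode(region);
    }

    public static void main(String[] args) {
        RitOrderingSketch deleteFirst = new RitOrderingSketch();
        deleteFirst.regionsInTransition.put("region-1", "OPEN");
        deleteFirst.handleOpenedDeleteFirst("region-1");
        System.out.println("delete first: " + deleteFirst.warnings);

        RitOrderingSketch removeFirst = new RitOrderingSketch();
        removeFirst.regionsInTransition.put("region-1", "OPEN");
        removeFirst.handleOpenedRemoveFirst("region-1");
        System.out.println("remove first: " + removeFirst.warnings);
    }
}
```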





[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154146#comment-13154146
 ] 

Hudson commented on HBASE-2418:
---

Integrated in HBase-TRUNK-security #2 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/2/])
HBASE-2418 Support for ZooKeeper authentication

apurtell : 
Files : 
* /hbase/trunk/pom.xml
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java


 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch






[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: (was: 4213-trunk-v7.txt)

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Status: Open  (was: Patch Available)

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png






[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: 4213-trunk-v7.txt

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png






[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Status: Patch Available  (was: Open)

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png






[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154249#comment-13154249
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

@lars the 0.92 version of TestAcidGuarantees ran for about 12 hours without 
problems. 


 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).
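
A toy model of the first proposed solution, assuming each cell carries a memstoreTS and a scanner only returns cells at or below the read point it captured when it started. The class, the field names, and the "flush resets the value to 0" behaviour are simplifications invented for this sketch, not HBase code. It shows why dropping the memstoreTS at flush time exposes an in-flight write to an already running read, while persisting it does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of read-point filtering; names and structure are illustrative only.
final class ReadPointSketch {

    static final class Cell {
        final String value;
        final long memstoreTs; // write number assigned when the edit was applied

        Cell(String value, long memstoreTs) {
            this.value = value;
            this.memstoreTs = memstoreTs;
        }
    }

    /** Values a scanner may return: only cells written at or before its read point. */
    static List<String> visible(List<Cell> cells, long readPoint) {
        List<String> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.memstoreTs <= readPoint) {
                out.add(c.value);
            }
        }
        return out;
    }

    /** Simulates a flush that either keeps each cell's memstoreTs or resets it to 0. */
    static List<Cell> flush(List<Cell> memstore, boolean keepMemstoreTs) {
        List<Cell> file = new ArrayList<>();
        for (Cell c : memstore) {
            file.add(new Cell(c.value, keepMemstoreTs ? c.memstoreTs : 0L));
        }
        return file;
    }

    public static void main(String[] args) {
        List<Cell> memstore = Arrays.asList(
            new Cell("committed", 1L),
            new Cell("in-flight", 2L));
        long readPoint = 1L; // captured by the scanner before the flush

        System.out.println("before flush:        " + visible(memstore, readPoint));
        System.out.println("flush drops the ts:  "
            + visible(flush(memstore, false), readPoint));
        System.out.println("flush keeps the ts:  "
            + visible(flush(memstore, true), readPoint));
    }
}
```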





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154262#comment-13154262
 ] 

Nicolas Spiegelberg commented on HBASE-2856:


Something to keep in mind: we have a version of this for our prod branch 
running on some smaller test clusters, but not yet on our actual prod clusters 
(since we committed it at the same time you did).  Also, note that between 
HFileV2 and this, there is no easy downgrade strategy after moving from 0.90 
to 0.92.  I think that putting this in a 0.92 RC definitely means an extra 
testing effort.  However, it's been the last massive outstanding caveat for 
ACID semantics, so it makes sense for 0.92 inclusion.  I'm sure that other 
companies consider this a critical issue for their customers, so they would be 
up for accelerating this testing effort ahead of our schedule. :)

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt







[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Attachment: 4798_trunk_all.v10.patch

 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 4798_trunk_all.v6.patch, 
 4798_trunk_all.v7.patch


 Multiple small changes:
 @committers: Removing some sleeps made visible a bug in 
 JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchronization 
 point. You may want to review this.
 JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never 
 met (test on !c && !!c). Added a new synchronization point.
 AssignmentManager#waitForAssignment: add a timeout on the wait, so it is not 
 stuck if the notification is received before the wait.
 HMaster#loop: use a notification instead of a 1s sleep.
 HRegionServer#waitForServerOnline: new method used by 
 JVMClusterUtil#waitForServerOnline() to replace a 1s sleep with a notification.
 HRegionServer#getMaster(): 1s sleeps replaced by one 0.1s sleep and one 0.2s 
 sleep.
 HRegionServer#stop: use a notification on sleeper to shorten shutdown by 0.5s.
 ZooKeeperNodeTracker#start: replace a recursive call with a loop.
 ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait, so it is 
 not stuck if the notification is received before the wait.
 HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s.
 TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s; 
 with the change on HBaseTestingUtility we are 60s faster.
 TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0.2s instead 
 of 1s.
 TestRestartCluster#testClusterRestart: send all the table creations together, 
 then check creation; should be faster.
 TestHLog: shut down the whole cluster instead of DFS only (more standard).
 JVMClusterUtil#startup: lower the sleep from 1s to 0.1s.
 HConnectionManager#close: the ZooKeeper name in the debug message from 
 HConnectionManager after connection close was always null because it was set 
 to null in the delete.
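The "timeout on the wait" items above guard against a notification that arrives before the waiter starts waiting. A minimal sketch of that pattern follows; it is not the code from the patch. The class and method names are hypothetical. It combines a sticky flag, checked under the lock, with a bounded wait, so an early notification is never lost and a missing one cannot block forever.

```java
// Illustrative only: TimedWaitSketch, markOnline, and waitForOnline are
// hypothetical names, not HBase methods.
final class TimedWaitSketch {
    private final Object lock = new Object();
    private boolean online = false;

    // Records the event and wakes waiters. Safe to call before anyone waits,
    // because the flag stays set.
    void markOnline() {
        synchronized (lock) {
            online = true;
            lock.notifyAll();
        }
    }

    // Waits until the flag is set or timeoutMs has elapsed.
    // Returns true if the flag was set, false on timeout.
    boolean waitForOnline(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        synchronized (lock) {
            while (!online) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return false; // timed out; never call wait(0), which blocks forever
                }
                try {
                    lock.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for callers
                    return online;
                }
            }
            return true;
        }
    }
}
```

Calling `markOnline()` first and `waitForOnline(50)` afterwards returns true immediately, while a waiter that is never notified gives up after its timeout instead of hanging the test.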





[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Status: Open  (was: Patch Available)

 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 4798_trunk_all.v6.patch, 
 4798_trunk_all.v7.patch







[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-21 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Release Note: last try on the same patch, but I think it's ok to commit it.
  Status: Patch Available  (was: Open)

 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 4798_trunk_all.v6.patch, 
 4798_trunk_all.v7.patch







[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-21 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154276#comment-13154276
 ] 

Andrew Purtell commented on HBASE-2418:
---

@Mikhail: Thanks, that doesn't have a clear direct relation. If it were a test 
failure, I'd say otherwise. This patch modified the MiniZKCluster to take a 
Configuration in constructor and use it. This patch did not touch ZKConfig, 
which is HBase side code.

 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch


 Some users may run a ZooKeeper cluster in multi-tenant mode, meaning that 
 more than one client service would like to share a single ZooKeeper service 
 instance (cluster). In this case the client services typically want to 
 protect their data (ZK znodes) from access by other services (tenants) on the 
 cluster. Say you are running HBase and Solr and Neo4j, or multiple HBase 
 instances, etc.; having authentication/authorization on the znodes is 
 important for both security and helping to ensure that services don't 
 interact negatively (touch each other's data).
 Today HBase does not have support for authentication or authorization. This 
 should be added to the HBase clients that are accessing the ZK cluster. In 
 general it means calling addAuthInfo once after a session is established:
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooKeeper.html#addAuthInfo(java.lang.String, byte[])
 with a user-specific credential; oftentimes this is a shared secret or 
 certificate. You may be able to statically configure this in some cases 
 (config string or file to read from); however, in my case in particular you 
 may need to access it programmatically, which adds complexity as the end user 
 may need to load code into HBase for accessing the credential.
 Secondly you need to specify a non-world ACL when interacting with znodes 
 (create primarily):
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/data/ACL.html
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooDefs.html
 Feel free to ping the ZooKeeper team if you have questions. It might also be 
 good to discuss with some potential end users, in particular regarding how 
 the end user can specify the credential.
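As one concrete illustration of a shared-secret credential, the sketch below assumes ZooKeeper's built-in `digest` scheme, in which the id stored in an ACL is the user name followed by the Base64-encoded SHA-1 of `user:password`. The class and method names are hypothetical, and the code uses only the JDK, so it shows how the ACL id is derived rather than how HBase wires it in. The ZooKeeper calls themselves appear only in comments.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Illustrative only: ZkDigestSketch and digestId are hypothetical names.
final class ZkDigestSketch {
    // Returns "user:base64(sha1(user:password))", the id form used by the
    // digest scheme in a non-world ACL.
    static String digestId(String user, String password) {
        String idPassword = user + ":" + password;
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] hash = sha1.digest(idPassword.getBytes(StandardCharsets.UTF_8));
            return user + ":" + Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    // With a ZooKeeper client on the classpath, the pieces would be used
    // roughly like this (not compiled here):
    //   zk.addAuthInfo("digest", "user:password".getBytes());
    //   new ACL(ZooDefs.Perms.ALL, new Id("digest", digestId(user, password)));
}
```

The plain `user:password` pair is what the client presents through `addAuthInfo`, while only the hashed form is stored in the znode's ACL.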





[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-21 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154280#comment-13154280
 ] 

Andrew Purtell commented on HBASE-2418:
---

@Ram I'm looking at the 0.92 pom right now and it includes the repository entry 
for ghelmling.testing.

 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch







[jira] [Created] (HBASE-4835) CME out of ZKConfig.makeZKProps

2011-11-21 Thread Andrew Purtell (Created) (JIRA)
CME out of ZKConfig.makeZKProps
---

 Key: HBASE-4835
 URL: https://issues.apache.org/jira/browse/HBASE-4835
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell


Mikhail reported this from a five-node, three-RS cluster test:

{code}
2011-11-21 01:30:15,188 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server machine_name,60020,1321867814890: Initialization of RS failed. Hence aborting RS.
java.util.ConcurrentModificationException
    at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
    at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042)
    at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75)
    at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:144)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:124)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:559)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.<init>(CatalogTracker.java:177)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642)
    at java.lang.Thread.run(Thread.java:619)
{code}





[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-21 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154281#comment-13154281
 ] 

Andrew Purtell commented on HBASE-2418:
---

I opened HBASE-4835 for the CME.

 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch







[jira] [Commented] (HBASE-4809) Per-CF set RPC metrics

2011-11-21 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154285#comment-13154285
 ] 

Phabricator commented on HBASE-4809:


nspiegelberg has accepted the revision [jira] [HBASE-4809] Per-CF set RPC 
metrics.

  could you put the test under TestHeapSize and I'll commit

INLINE COMMENTS
  
src/test/java/org/apache/hadoop/hbase/regionserver/metrics/TestSchemaMetrics.java:220-222
 I think it's better to be consistent than optimal.  Right now, heapsize is 
easy to refactor because it's done the same way for all classes.  I'm +1 on a 
heapsize refactor, but I say we put that in another JIRA.  It's easier to 
review 2 JIRAs, feature + refactor, than it is to combine the two.

REVISION DETAIL
  https://reviews.facebook.net/D483


 Per-CF set RPC metrics
 --

 Key: HBASE-4809
 URL: https://issues.apache.org/jira/browse/HBASE-4809
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D483.1.patch, D483.2.patch, D483.3.patch, 
 HBASE-4809_Per_CF_set_RPC_metrics.patch


 Porting per-CF set metrics for RPC times and response sizes from 0.89-fb to 
 trunk. For each mutation signature (the set of column families involved in an 
 RPC request) we increment several metrics, which allows us to monitor access 
 patterns.  Guarding against an explosion in the number of metrics is handled 
 in HBASE-4638 (which might even be implemented as part of this JIRA).
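A rough sketch of the idea, under assumptions of my own rather than the patch's actual classes: the column families of a request are sorted into an order-independent signature, and counters are kept per signature and metric name. The key format (`cf.a~b.rpcTimeMs`) and all class and method names are made up for illustration.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: CfSetMetricsSketch and its key format are hypothetical.
final class CfSetMetricsSketch {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Sorting makes {a, b} and {b, a} map to the same signature.
    static String signature(Set<String> families) {
        return "cf." + String.join("~", new TreeSet<>(families));
    }

    // Adds amount to the counter for this CF set and metric name.
    void record(Set<String> families, String metric, long amount) {
        String key = signature(families) + "." + metric;
        counters.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(amount);
    }

    // Returns the current value, or 0 if the counter was never touched.
    long get(String key) {
        AtomicLong value = counters.get(key);
        return value == null ? 0L : value.get();
    }
}
```

Because every distinct CF combination creates its own counters, the number of keys can grow quickly on wide tables, which is the explosion the description defers to HBASE-4638.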





[jira] [Updated] (HBASE-4835) CME out of ZKConfig.makeZKProps

2011-11-21 Thread Andrew Purtell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-4835:
--

Attachment: HBASE-4835.patch

I think the simplest course of action is to make a shallow copy of the 
Configuration in the ZKW constructor.
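To illustrate why a copy helps, here is a small model using a plain `java.util.Hashtable`, the class at the top of the reported stack trace. It is not the HBASE-4835 patch and does not use Hadoop's `Configuration`. The class and method names are hypothetical. Iterating the live table while it is modified fails fast, whereas iterating a snapshot taken with `clone()` is unaffected by later changes to the original.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Map;

// Illustrative only: SnapshotIterationSketch and its methods are hypothetical.
final class SnapshotIterationSketch {
    // Iterates one table while adding a key to another (possibly the same) table.
    // Returns true if the iteration failed with ConcurrentModificationException.
    static boolean failsWhenMutated(Hashtable<String, String> iterated,
                                    Hashtable<String, String> mutated) {
        try {
            for (Map.Entry<String, String> e : iterated.entrySet()) {
                mutated.put("added.during.iteration", e.getValue());
            }
            return false;
        } catch (ConcurrentModificationException cme) {
            return true;
        }
    }

    // Shallow copy: keys and values are shared, the table structure is not.
    // Hashtable.clone() is synchronized on the source table.
    @SuppressWarnings("unchecked")
    static Hashtable<String, String> snapshot(Hashtable<String, String> live) {
        return (Hashtable<String, String>) live.clone();
    }
}
```

The mutation here happens on the iterating thread to keep the sketch deterministic; in the reported failure another thread was presumably changing the shared configuration during region server startup.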

 CME out of ZKConfig.makeZKProps
 ---

 Key: HBASE-4835
 URL: https://issues.apache.org/jira/browse/HBASE-4835
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Attachments: HBASE-4835.patch







[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154297#comment-13154297
 ] 

Hadoop QA commented on HBASE-4213:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504508/4213-trunk-v7.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/318//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/318//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/318//console

This message is automatically generated.

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154298#comment-13154298
 ] 

stack commented on HBASE-2856:
--

Let's get it in.

@Lars TestHCM failed recently for me in 0.92 building locally.   Maybe it's not 
related to this.

@Jon Thanks for running 12 hour proofing.

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt







[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154300#comment-13154300
 ] 

Ted Yu commented on HBASE-4213:
---

3 out of the 4 test failures were due to 'Too many open files'.

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. With out enable/disabling the table.
 2. With out bulk unassign/assign of regions.





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154306#comment-13154306
 ] 

Ted Yu commented on HBASE-4213:
---

Integrated to TRUNK.

Thanks for the patch, Subbu.

Thanks for the review, Lars, Todd and Andy.






[jira] [Commented] (HBASE-4835) CME out of ZKConfig.makeZKProps

2011-11-21 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154316#comment-13154316
 ] 

Ted Yu commented on HBASE-4835:
---

ZKConfig.makeZKProps() is used by ZKUtil.connect(), 
ZKConfig.getZKQuorumServersString(), etc.
Would ConcurrentModificationException also come out of the other callers?

 CME out of ZKConfig.makeZKProps
 ---

 Key: HBASE-4835
 URL: https://issues.apache.org/jira/browse/HBASE-4835
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Attachments: HBASE-4835.patch


 Mikhail reported this from a five-node, three-RS cluster test:
 {code}
 2011-11-21 01:30:15,188 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 machine_name,60020,1321867814890: Initialization of RS failed. Hence 
 aborting RS.
 java.util.ConcurrentModificationException
 at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
 at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042)
 at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:144)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:124)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:559)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.init(CatalogTracker.java:177)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642)
 at java.lang.Thread.run(Thread.java:619)
 {code}





[jira] [Commented] (HBASE-4835) CME out of ZKConfig.makeZKProps

2011-11-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154323#comment-13154323
 ] 

stack commented on HBASE-4835:
--

Would it be cleaner to make the clone down in makeZKProps, Andrew, rather than 
globally per ZKW instance?  (Let's get this fix into 0.92.)
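
To make the suggestion concrete, here is a small standalone Java sketch of why iterating a live Hashtable can fail and how copying it first sidesteps the problem. This is not the HBase patch: SnapshotIterationSketch and its methods are hypothetical, and a single thread that mutates the table in the middle of the loop stands in for the concurrent writer, an assumption made only to keep the example deterministic.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical stand-in for the pattern in the stack trace; not HBase code.
final class SnapshotIterationSketch {

    // Iterates the live table while adding a key, as a concurrent writer might.
    // Returns true if the fail-fast iterator raised ConcurrentModificationException.
    static boolean iterateLive(Hashtable<String, String> table) {
        try {
            for (Map.Entry<String, String> e : table.entrySet()) {
                table.put("added.during.iteration", e.getValue());
            }
            return false;
        } catch (ConcurrentModificationException cme) {
            return true;
        }
    }

    // Iterates a private copy, so writes to the original cannot disturb the loop.
    // Returns the number of entries visited.
    static int iterateSnapshot(Hashtable<String, String> table) {
        Hashtable<String, String> snapshot = new Hashtable<>(table);
        int seen = 0;
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            table.put("added.during.iteration." + seen, e.getValue());
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        Hashtable<String, String> live = new Hashtable<>();
        live.put("hbase.zookeeper.quorum", "localhost");
        live.put("hbase.zookeeper.property.clientPort", "2181");
        System.out.println("live iteration failed: " + iterateLive(live));

        Hashtable<String, String> copied = new Hashtable<>();
        copied.put("hbase.zookeeper.quorum", "localhost");
        copied.put("hbase.zookeeper.property.clientPort", "2181");
        System.out.println("entries seen via snapshot: " + iterateSnapshot(copied));
    }
}
```

Where the copy is taken (inside the method that iterates, or once per watcher instance) only changes how stale the snapshot may be; either way the loop itself no longer touches shared state.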






[jira] [Created] (HBASE-4836) Master stuck with ServerShutdownHandler doing a waitForMeta

2011-11-21 Thread stack (Created) (JIRA)
Master stuck with ServerShutdownHandler doing a waitForMeta
---

 Key: HBASE-4836
 URL: https://issues.apache.org/jira/browse/HBASE-4836
 Project: HBase
  Issue Type: Bug
Reporter: stack


Messing around w/ 0.92 on a cluster, I got myself into a situation where the 
master would not go down because we were hung, as follows, in an infinite wait 
for meta to come up:

{code}
MASTER_SERVER_OPERATIONS-sv4r11s38,7001,1321897362552-2 prio=10 
tid=0x4205d800 nid=0x19f6 waiting for monitor entry [0x7fe4eb3f1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:457)
- waiting to lock 0xca199190 (a 
java.util.concurrent.atomic.AtomicBoolean)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:426)
at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:253)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

MASTER_SERVER_OPERATIONS-sv4r11s38,7001,1321897362552-1 prio=10 
tid=0x4237b000 nid=0x19f4 waiting for monitor entry [0x7fe4ebefc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:457)
- waiting to lock 0xca199190 (a 
java.util.concurrent.atomic.AtomicBoolean)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:426)
at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:253)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

MASTER_SERVER_OPERATIONS-sv4r11s38,7001,1321897362552-0 prio=10 
tid=0x7fe4ec610800 nid=0x18e1 waiting on condition [0x7fe4eb4f2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1295)
at org.apache.hadoop.hbase.client.HTable.get(HTable.java:655)
at org.apache.hadoop.hbase.catalog.MetaReader.get(MetaReader.java:245)
at 
org.apache.hadoop.hbase.catalog.MetaReader.getRegion(MetaReader.java:347)
at 
org.apache.hadoop.hbase.catalog.MetaReader.readRegionLocation(MetaReader.java:287)
at 
org.apache.hadoop.hbase.catalog.MetaReader.getMetaRegionLocation(MetaReader.java:274)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:399)
- locked 0xca199190 (a 
java.util.concurrent.atomic.AtomicBoolean)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:458)
- locked 0xca199190 (a 
java.util.concurrent.atomic.AtomicBoolean)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:426)
at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:253)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code}

This bit of code needs a bit of refactoring so that we can get at the state of 
the hosting server, i.e. whether it's stopped/stopping or not.
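
One possible shape for such a refactor, sketched in plain Java under stated assumptions: the waiter polls with a bounded wait and checks a stop flag on every pass, so a stopping server can break out. StoppableMetaWaitSketch, its lock object and its methods are hypothetical and are not the CatalogTracker API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not the CatalogTracker implementation.
final class StoppableMetaWaitSketch {
    private final Object metaLock = new Object();
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private boolean metaAvailable = false; // guarded by metaLock

    // Called when the meta location becomes known.
    void setMetaAvailable() {
        synchronized (metaLock) {
            metaAvailable = true;
            metaLock.notifyAll();
        }
    }

    // Called when the hosting server is asked to stop.
    void stop() {
        stopped.set(true);
        synchronized (metaLock) {
            metaLock.notifyAll();
        }
    }

    // Returns true once meta is available, false if the server stopped
    // (or the thread was interrupted) first. pollMillis must be positive,
    // because Object.wait(0) would wait without a timeout.
    boolean waitForMeta(long pollMillis) {
        synchronized (metaLock) {
            while (!metaAvailable) {
                if (stopped.get()) {
                    return false;
                }
                try {
                    // wait() releases metaLock, so other waiters are not blocked
                    metaLock.wait(pollMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        StoppableMetaWaitSketch tracker = new StoppableMetaWaitSketch();
        tracker.stop();
        System.out.println("meta found: " + tracker.waitForMeta(100L));
    }
}
```

The point of the sketch is only that the wait condition includes the server's stopped state; how the real handler would learn that state is left open here.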





[jira] [Updated] (HBASE-4835) ConcurrentModificationException out of ZKConfig.makeZKProps

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4835:
--

Summary: ConcurrentModificationException out of ZKConfig.makeZKProps  (was: 
CME out of ZKConfig.makeZKProps)






[jira] [Created] (HBASE-4837) [book] book.xml - schema design, comment on new storefile creation

2011-11-21 Thread Doug Meil (Created) (JIRA)
[book] book.xml - schema design, comment on new storefile creation
--

 Key: HBASE-4837
 URL: https://issues.apache.org/jira/browse/HBASE-4837
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor


book.xml
* schema design chapter.  added sub-section commenting that table and CF 
changes won't take effect until new StoreFiles get written.







[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154328#comment-13154328
 ] 

Lars Hofhansl commented on HBASE-4213:
--

Awesome. Thanks for all the work on this, Subbu and Ted. This will be incredibly 
useful! (Once online schema changes stabilize in general.)






[jira] [Updated] (HBASE-4814) Starting an online alter when regions are splitting can leave their daughters unaltered

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4814:
--

Fix Version/s: (was: 0.92.0)

Fix this in 0.94

 Starting an online alter when regions are splitting can leave their daughters 
 unaltered
 ---

 Key: HBASE-4814
 URL: https://issues.apache.org/jira/browse/HBASE-4814
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.94.0


 I've seen a situation where regions were splitting almost exactly at the same 
 time as an alter command was issued and those regions' daughters were left 
 unaltered. It would even seem that the daughters' daughters also share this 
 situation.
 Reopening all the regions fixes the problem.





[jira] [Updated] (HBASE-4837) [book] book.xml - schema design, comment on new storefile creation

2011-11-21 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4837:
-

Attachment: book_HBASE_4837.xml.patch

 [book] book.xml - schema design, comment on new storefile creation
 --

 Key: HBASE-4837
 URL: https://issues.apache.org/jira/browse/HBASE-4837
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_4837.xml.patch


 book.xml
 * schema design chapter.  added sub-section commenting that table and CF 
 changes won't take effect until new StoreFiles get written.





[jira] [Updated] (HBASE-4837) [book] book.xml - schema design, comment on new storefile creation

2011-11-21 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4837:
-

Status: Patch Available  (was: Open)






[jira] [Updated] (HBASE-4837) [book] book.xml - schema design, comment on new storefile creation

2011-11-21 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4837:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)






[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154332#comment-13154332
 ] 

Lars Hofhansl commented on HBASE-2856:
--

testClosing is something I added as part of HBASE-4805; I'll take a look.
Some of the other failing tests in there scare me. :)






[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154333#comment-13154333
 ] 

Ted Yu commented on HBASE-4213:
---

Compared to the implementation from HBASE-1730, Subbu's code would take less 
effort to stabilize.
Thanks for the support, Lars.






[jira] [Assigned] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-21 Thread Eugene Koontz (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz reassigned HBASE-4832:


Assignee: Eugene Koontz

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832_trunk_hregionserver.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
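
 The difference between the two versions can be pictured with a small standalone sketch, assuming the run loop waits on a monitor owned by the sleeper rather than on the region server itself. SleeperSketch below is a hypothetical stand-in written for this illustration, not the HBase Sleeper class; the point is that the wake-up has to be sent on the same object the sleeping thread waits on.

```java
// Hypothetical stand-in for a sleeper with a skippable sleep cycle.
final class SleeperSketch {
    private final Object sleepLock = new Object();
    private boolean skip = false; // guarded by sleepLock

    // Sleeps for up to periodMillis, returning early if skipSleepCycle()
    // is called. Returns the elapsed time in milliseconds.
    long sleep(long periodMillis) {
        long start = System.nanoTime();
        long deadline = start + periodMillis * 1_000_000L;
        synchronized (sleepLock) {
            while (!skip) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    break; // period is over
                }
                try {
                    sleepLock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            skip = false; // the skip request applies to one cycle only
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Wakes the sleeping thread by notifying on the monitor it waits on.
    void skipSleepCycle() {
        synchronized (sleepLock) {
            skip = true;
            sleepLock.notifyAll();
        }
    }

    public static void main(String[] args) {
        SleeperSketch sleeper = new SleeperSketch();
        sleeper.skipSleepCycle();
        System.out.println("slept ms: " + sleeper.sleep(3000L));
    }
}
```

 A notifyAll() issued on any other object, such as the caller's own this, never reaches a thread parked in sleepLock.wait(), which matches the behaviour described for the original stop().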
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 This is likely because the code does not expect the region server to stop that 
 fast.
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
   Time elapsed: 30.06 sec   ERROR!
 java.lang.Exception: test timed out after 3 milliseconds
   at java.lang.Throwable.fillInStackTrace(Native Method)
   at java.lang.Throwable.<init>(Throwable.java:196)
   at java.lang.Exception.<init>(Exception.java:41)
   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
   at java.lang.Thread.sleep(Native Method)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
   at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
   at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
   at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
   at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 

[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154341#comment-13154341
 ] 

jirapos...@reviews.apache.org commented on HBASE-4820:
--



bq.  On 2011-11-21 04:54:00, ramkrishna vasudevan wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, 
line 1530
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59536#file59536line1530
bq.  
bq.   Can we make this msg more clear.
bq.   Something like
bq.   Unexpected state : statename.. Cannot transit znode state from : 
currentState to OFFLINE.

You got it.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2895/#review3388
---


On 2011-11-21 02:06:29, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2895/
bq.  ---
bq.  
bq.  (Updated 2011-11-21 02:06:29)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon and Jonathan Robie.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Distributed log splitting coding enhancement to make it easier to 
understand, no semantics change.
bq.  It addresses some issues raised during the code review when backporting this 
feature to CDH.
bq.  
bq.  
bq.  This addresses bug HBASE-4820.
bq.  https://issues.apache.org/jira/browse/HBASE-4820
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
f7ef653 
bq.src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 
b9a3a2c 
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
7dd67e9 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 
1d329b0 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 
21747b1 
bq.
src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 
51daa1f 
bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 
c8684ec 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 
84d76e8 
bq.  
bq.  Diff: https://reviews.apache.org/r/2895/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, 
which are not related to this change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch


 In reviewing distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It will be great to fix them.  For this issue, there should be no semantic 
 change.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154342#comment-13154342
 ] 

jirapos...@reviews.apache.org commented on HBASE-4820:
--



bq.  On 2011-11-21 03:29:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 
271
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59537#file59537line271
bq.  
bq.   handleDeadWorkers would be a better method name.

Yes, that's right.


bq.  On 2011-11-21 03:29:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
431
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line431
bq.  
bq.   retry_count is the remaining count. This log message should be 
clearer.

That's right


bq.  On 2011-11-21 03:29:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
480
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line480
bq.  
bq.   We should say 'remaining retries='

Fixed.


bq.  On 2011-11-21 03:29:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
453
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line453
bq.  
bq.   Can we implement this item now ?

We can do it in another jira.


bq.  On 2011-11-21 03:29:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
671
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line671
bq.  
bq.   Please adjust indentation.

That's right.


bq.  On 2011-11-21 03:29:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
210
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line210
bq.  
bq.   Please remove white space.

I assume you suggest we should not use tab.  Please correct me if I am wrong.


bq.  On 2011-11-21 03:29:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
952
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line952
bq.  
bq.   Please adjust indentation for these 4 lines.

Fixed.  Replaced tabs with spaces.


bq.  On 2011-11-21 03:29:17, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java, line 
965
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59538#file59538line965
bq.  
bq.   Should read 'splitlog workers'

fixed.


bq.  On 2011-11-21 03:29:17, Ted Yu wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java, line 
648
bq.   https://reviews.apache.org/r/2895/diff/1/?file=59540#file59540line648
bq.  
bq.   Adjust indentation, please.

fixed.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2895/#review3385
---



[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154340#comment-13154340
 ] 

jirapos...@reviews.apache.org commented on HBASE-4820:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2895/
---

(Updated 2011-11-21 18:22:03.402105)


Review request for hbase, Todd Lipcon and Jonathan Robie.


Changes
---

Updated patch diff after changes per review.


Summary
---

Distributed log splitting coding enhancement to make it easier to understand, 
no semantics change.
It addresses some issues raised during the code review when backporting this feature to 
CDH.


This addresses bug HBASE-4820.
https://issues.apache.org/jira/browse/HBASE-4820


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java f7ef653 
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9a3a2c 
  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 
1d329b0 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 
21747b1 
  src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 
51daa1f 
  src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 
84d76e8 

Diff: https://reviews.apache.org/r/2895/diff


Testing
---

Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, which 
are not related to this change.


Thanks,

Jimmy







[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154343#comment-13154343
 ] 

Hadoop QA commented on HBASE-4798:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504512/4798_trunk_all.v10.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -166 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 59 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/319//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/319//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/319//console

This message is automatically generated.

 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 4798_trunk_all.v6.patch, 
 4798_trunk_all.v7.patch


 Multiple small changes:
 @committers: Removing some sleeps exposed a bug in 
 JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchronization point. 
 You may want to review this.
 JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never 
 met (test on !c && !!c). Added a new synchronization point.
 AssignmentManager#waitForAssignment: add a timeout on the wait => not stuck 
 if the notification is received before the wait.
 HMaster#loop: use a notification instead of a 1s sleep
 HRegionServer#waitForServerOnline: new method used by 
 JVMClusterUtil#waitForServerOnline() to replace a 1s sleep with a notification
 HRegionServer#getMaster(): 1s sleeps replaced by one 0.1s sleep and one 0.2s 
 sleep
 HRegionServer#stop: use a notification on sleeper to shorten shutdown by 0.5s
 ZooKeeperNodeTracker#start: replace a recursive call with a loop
 ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait => not 
 stuck if the notification is received before the wait.
 HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
 TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s; 
 with the change on HBaseTestingUtility we are 60s faster
 TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0.2s instead 
 of 1s
 TestRestartCluster#testClusterRestart: send all the table creations together, 
 then check creation, should be faster
 TestHLog: shut down the whole cluster instead of DFS only (more standard) 
 JVMClusterUtil#startup: lower the sleep from 1s to 0.1s
 HConnectionManager#close: the ZooKeeper name in the debug message from 
 HConnectionManager after connection close was always null because it was set 
 to null in the delete.
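
 The "timeout on the wait" items guard against a lost notification: if the notifier runs before the waiter reaches wait(), an untimed wait() can block forever. Below is a minimal sketch of the pattern, not the code from the patch. The class OnlineFlag and its method names are hypothetical; the idea is simply to re-check the condition in a loop and bound each wait.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a "server is online" synchronization point.
final class OnlineFlag {
  private boolean online = false;

  /** Marks the server online and wakes every waiting thread. */
  synchronized void setOnline() {
    online = true;
    notifyAll();
  }

  /**
   * Waits until the flag is set or the timeout elapses.
   * The flag is checked before the first wait(), so a notification that
   * was sent earlier is not lost, and the bounded wait() caps the delay.
   */
  synchronized boolean waitForOnline(long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!online) {
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMs <= 0) {
        return false; // timed out; never call wait(0), which waits forever
      }
      try {
        wait(remainingMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return online;
      }
    }
    return true;
  }
}
```

 A caller that arrives after setOnline() returns immediately, and a caller that is never notified gives up after the timeout instead of hanging the test.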





[jira] [Assigned] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region ha

2011-11-21 Thread Jimmy Xiang (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HBASE-4797:
--

Assignee: Jimmy Xiang

 [availability] Give recovered.edits files better names, ones that include 
 first and last sequence id so we can skip files with edits we know older than 
 current region has
 --

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob

 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.
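
 To illustrate the idea in the issue title, here is a rough sketch of skipping files whose edits are all older than what the region already has. It assumes a hypothetical file name of the form "firstSeqId-lastSeqId"; the class RecoveredEditsFilter and its methods are invented for this example and are not HBase code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; assumes names like "0000101-0000299".
final class RecoveredEditsFilter {

  /** First sequence id encoded in the assumed "first-last" name. */
  static long firstSeqId(String name) {
    return Long.parseLong(name.substring(0, name.indexOf('-')));
  }

  /** Last sequence id encoded in the assumed "first-last" name. */
  static long lastSeqId(String name) {
    return Long.parseLong(name.substring(name.indexOf('-') + 1));
  }

  /**
   * Returns the files that still hold edits newer than regionSeqId,
   * ordered by their first sequence id. Files whose last sequence id is
   * at or below regionSeqId are skipped without being opened.
   */
  static List<String> filesToReplay(List<String> names, long regionSeqId) {
    List<String> kept = new ArrayList<>();
    for (String name : names) {
      if (lastSeqId(name) > regionSeqId) {
        kept.add(name);
      }
    }
    kept.sort(Comparator.comparingLong(RecoveredEditsFilter::firstSeqId));
    return kept;
  }
}
```

 With this naming, the decision to skip a file needs only the directory listing, which is the point of putting both sequence ids in the name.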





[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154356#comment-13154356
 ] 

stack commented on HBASE-4798:
--

The TestAdmin fails because of too many open files.  Let me commit.





[jira] [Commented] (HBASE-4835) ConcurrentModificationException out of ZKConfig.makeZKProps

2011-11-21 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154360#comment-13154360
 ] 

Andrew Purtell commented on HBASE-4835:
---

Or synchronize access to the Configuration object in makeZKProps

 ConcurrentModificationException out of ZKConfig.makeZKProps
 ---

 Key: HBASE-4835
 URL: https://issues.apache.org/jira/browse/HBASE-4835
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Attachments: HBASE-4835.patch


 Mikhail reported this from a five-node, three-RS cluster test:
 {code}
 2011-11-21 01:30:15,188 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server machine_name,60020,1321867814890: Initialization of RS failed. Hence aborting RS.
 java.util.ConcurrentModificationException
   at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
   at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:1042)
   at org.apache.hadoop.hbase.zookeeper.ZKConfig.makeZKProps(ZKConfig.java:75)
   at org.apache.hadoop.hbase.zookeeper.ZKConfig.getZKQuorumServersString(ZKConfig.java:245)
   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:144)
   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:124)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1262)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:568)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:559)
   at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:183)
   at org.apache.hadoop.hbase.catalog.CatalogTracker.<init>(CatalogTracker.java:177)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:575)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:534)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:642)
   at java.lang.Thread.run(Thread.java:619)
 {code}
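
 The top frame of the trace is a fail-fast Hashtable iterator noticing a structural change made after the iterator was created. In the report the change presumably came from another thread; the single-threaded sketch below triggers the same check deterministically. The class CmeDemo is hypothetical and exists only for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;

// Hypothetical demo class, not part of HBase or Hadoop.
final class CmeDemo {

  /** Returns true if mutating the table mid-iteration raised the exception. */
  static boolean mutateWhileIterating() {
    Hashtable<String, String> table = new Hashtable<>();
    table.put("a", "1");
    table.put("b", "2");
    table.put("c", "3");
    try {
      for (String key : table.keySet()) {
        // Adding a new key is a structural change; the iterator's next
        // call to next() detects it and throws.
        table.put("added-during-iteration", key);
      }
    } catch (ConcurrentModificationException e) {
      return true;
    }
    return false;
  }
}
```

 Note that each individual Hashtable call is synchronized, but a whole iteration is not, which is why an iterating reader can still observe a writer's change.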





[jira] [Issue Comment Edited] (HBASE-4835) ConcurrentModificationException out of ZKConfig.makeZKProps

2011-11-21 Thread Andrew Purtell (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154360#comment-13154360
 ] 

Andrew Purtell edited comment on HBASE-4835 at 11/21/11 6:48 PM:
-

Or synchronize access to the Configuration object in makeZKProps, in addition 
to cloning the Configuration in the constructor. We can also consider heavy-handed 
synchronization of every read or mutation of it in o.a.h.h.zookeeper just to be 
sure.

The concern I have about cloning in makeZKProps is that the clone performs the same 
hashmap iteration. 
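
A rough sketch of the "clone in the constructor, under a lock" idea follows. It is not the attached patch: a plain Hashtable stands in for the shared Configuration, and the class WatcherConfig is hypothetical. Whether locking the real Configuration instance excludes its writers is an assumption that would need checking against the Hadoop source.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical holder that takes a private copy of shared settings.
final class WatcherConfig {
  private final Map<String, String> settings;

  WatcherConfig(Hashtable<String, String> shared) {
    // Hashtable's own methods lock on the instance, so holding that lock
    // keeps other threads from mutating it while the copy iterates.
    synchronized (shared) {
      this.settings = new HashMap<>(shared);
    }
  }

  /** Reads from the private copy; later changes to the source are not seen. */
  String get(String key) {
    return settings.get(key);
  }

  int size() {
    return settings.size();
  }
}
```

The copy itself is an iteration, which is the concern raised above, so the lock has to cover the copy rather than only the later reads.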

  was (Author: apurtell):
Or synchronize access to the Configuration object in makeZKProps
  




[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h

2011-11-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154362#comment-13154362
 ] 

stack commented on HBASE-4797:
--

Thanks Jimmy for taking this on.  Looks like you don't have to rename the 
files; just sort them and figure out which set to apply (and do what Todd suggests: 
rewrite the znode less often, or asynchronously).





[jira] [Commented] (HBASE-4835) ConcurrentModificationException out of ZKConfig.makeZKProps

2011-11-21 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154366#comment-13154366
 ] 

Mikhail Bautin commented on HBASE-4835:
---

@Andrew: thanks for the fix!
The simple approach with copying the configuration sounds good -- I presume we 
don't create too many unique ZooKeeperWatchers in a single JVM.

Alternatively, we could somehow get an immutable snapshot of the 
configuration's key set and iterate that instead of the configuration itself.
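
The key-set alternative might look roughly like the sketch below. It uses a Hashtable as a stand-in for the configuration, and the class ZkPropsSketch, the method makeProps and the prefix constant are illustrative names chosen for this example, not the actual ZKConfig code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Hashtable;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of iterating a key snapshot instead of the live map.
final class ZkPropsSketch {
  // Illustrative prefix for ZooKeeper settings carried in the HBase config.
  static final String PREFIX = "hbase.zookeeper.property.";

  static Properties makeProps(Hashtable<String, String> conf) {
    List<String> keys;
    synchronized (conf) {
      // Short critical section: copy only the keys, then release the lock.
      keys = Collections.unmodifiableList(new ArrayList<>(conf.keySet()));
    }
    Properties props = new Properties();
    for (String key : keys) {
      String value = conf.get(key); // may be null if removed after the snapshot
      if (value != null && key.startsWith(PREFIX)) {
        props.setProperty(key.substring(PREFIX.length()), value);
      }
    }
    return props;
  }
}
```

Compared with copying the whole configuration, this holds the lock only while the keys are copied, at the cost of tolerating values that change or disappear between the snapshot and the lookup.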





[jira] [Updated] (HBASE-4823) long running scans lose benefit of bloomfilters and timerange hints

2011-11-21 Thread Kannan Muthukkaruppan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-4823:
-

Assignee: Amitanand Aiyer  (was: Kannan Muthukkaruppan)

Amitanand will be helping on this issue.

 long running scans lose benefit of bloomfilters and timerange hints
 ---

 Key: HBASE-4823
 URL: https://issues.apache.org/jira/browse/HBASE-4823
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Amitanand Aiyer

 When you have a long-running scan, say due to an MR job, you can lose the 
 benefit of timerange hints and bloom filters midway if your scanner gets reset. 
 [Note: The scanners can get reset due to, say, a flush or compaction.]
 In one of our workloads, we periodically want to do rollups on the most recent 15 
 minutes of data in a column family, but the timerange hint benefit is lost 
 midway when this resetScannerStack (shown below) happens. The end result: we 
 end up reading all the old HFiles rather than just the recent HFiles.
 {code}
   private void resetScannerStack(KeyValue lastTopKey) throws IOException {
     if (heap != null) {
       throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
     }
     /* When we have the scan object, should we not pass it to getScanners()
      * to get a limited set of scanners? We did so in the constructor and we
      * could have done it now by storing the scan object from the constructor
      */
     List<KeyValueScanner> scanners = getScanners();
 {code}
 The comment in the code seems to be aware of this issue and even has the 
 suggested fix!
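
 A minimal sketch of the idea in that code comment: remember the scan's time range and apply it again whenever the scanner stack is rebuilt, so files that cannot contain matching cells are skipped. StoreFileInfo, TimeRangeFilterSketch, and selectFiles are hypothetical stand-ins, not the actual HBase classes, and the scan range is assumed to be half-open, [scanMin, scanMax).

```java
import java.util.ArrayList;
import java.util.List;

class TimeRangeFilterSketch {

  // Hypothetical per-file metadata: the oldest and newest cell timestamps.
  static final class StoreFileInfo {
    final String name;
    final long minTimestamp;
    final long maxTimestamp;

    StoreFileInfo(String name, long minTimestamp, long maxTimestamp) {
      this.name = name;
      this.minTimestamp = minTimestamp;
      this.maxTimestamp = maxTimestamp;
    }
  }

  // Keeps only files whose [minTimestamp, maxTimestamp] overlaps [scanMin, scanMax).
  static List<StoreFileInfo> selectFiles(List<StoreFileInfo> files, long scanMin, long scanMax) {
    List<StoreFileInfo> selected = new ArrayList<>();
    for (StoreFileInfo f : files) {
      if (f.maxTimestamp >= scanMin && f.minTimestamp < scanMax) {
        selected.add(f);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<StoreFileInfo> files = new ArrayList<>();
    files.add(new StoreFileInfo("old-hfile", 0L, 100L));
    files.add(new StoreFileInfo("recent-hfile", 900L, 1000L));

    // The same range would be used at construction time and again after a reset.
    for (StoreFileInfo f : selectFiles(files, 950L, 2000L)) {
      System.out.println("would open scanner on " + f.name);
    }
  }
}
```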





[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h

2011-11-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154380#comment-13154380
 ] 

Jimmy Xiang commented on HBASE-4797:


Yes, that's what I was thinking. The file name has the start seq id.  If
there are multiple files, there should be multiple start seq ids.  Once the
files are sorted, those start seq ids imply the max seq ids of some of the
files.  I can use this information to filter out some files safely.
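
A rough sketch of that filtering, under the assumption (not confirmed in this thread) that the files do not overlap, so a file's edits all have seq ids below the next file's start seq id. RecoveredEditsFilterSketch and filesToReplay are hypothetical names, and files are represented only by their start seq ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RecoveredEditsFilterSketch {

  // Returns the start seq ids of the files that still need replaying, given
  // the highest seq id the region has already persisted.
  static List<Long> filesToReplay(List<Long> startSeqIds, long regionSeqId) {
    List<Long> sorted = new ArrayList<>(startSeqIds);
    Collections.sort(sorted);

    List<Long> toReplay = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i++) {
      boolean isLast = (i == sorted.size() - 1);
      // The next file's start seq id bounds this file's max seq id from above.
      // The last file has no such bound, so it is always kept.
      if (isLast || sorted.get(i + 1) - 1 > regionSeqId) {
        toReplay.add(sorted.get(i));
      }
    }
    return toReplay;
  }

  public static void main(String[] args) {
    List<Long> starts = Arrays.asList(300L, 100L, 200L);
    System.out.println(filesToReplay(starts, 250L));
  }
}
```

With first and last seq ids both in the file name, as the issue title proposes, the last file could be filtered too.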

On Mon, Nov 21, 2011 at 10:52 AM, stack (Commented) (JIRA)



 [availability] Give recovered.edits files better names, ones that include 
 first and last sequence id so we can skip files with edits we know older than 
 current region has
 --

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob

 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-21 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154385#comment-13154385
 ] 

Kannan Muthukkaruppan commented on HBASE-4820:
--

Code refactoring changes of this type will make it harder for the 89-fb/our 
internal branch to stay in sync with trunk, and to push/pull patches between 
the two revs. But I agree that this can't be a reason to block all code 
refactoring/improvements, so those have to be evaluated on a case-by-case 
basis. Do we think these changes and code moves are worth it? [I have only 
looked at the specific changes superficially, but wanted to express the concern 
at least so that someone who has reviewed in detail can comment on whether this 
change is a must.]

 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch


 In reviewing distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It will be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154396#comment-13154396
 ] 

Hudson commented on HBASE-4213:
---

Integrated in HBase-TRUNK #2468 (See 
[https://builds.apache.org/job/HBase-TRUNK/2468/])
HBASE-4213 Support for fault tolerant, instant schema updates with out 
master's intervention through ZK

tedyu : 
Files : 
* /hbase/trunk/pom.xml
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ModifyTableHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/trunk/src/main/resources/hbase-default.xml
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/InstantSchemaChangeTestBase.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChange.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChangeFailover.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChangeSplit.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java


 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. With out enable/disabling the table.
 2. With out bulk unassign/assign of regions.





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154399#comment-13154399
 ] 

Lars Hofhansl commented on HBASE-2856:
--

This one looks bad:

{noformat}
testFilterAcrossMultipleRegions(org.apache.hadoop.hbase.client.TestFromClientSide)  Time elapsed: 12.233 sec  <<< FAILURE!
java.lang.AssertionError: expected:<17576> but was:<28064>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at org.apache.hadoop.hbase.client.TestFromClientSide.assertRowCount(TestFromClientSide.java:528)
at org.apache.hadoop.hbase.client.TestFromClientSide.testFilterAcrossMultipleRegions(TestFromClientSide.java:436)
{noformat}

Happens only with the 0.92 patch applied. It seems the scanner now finds too 
many cells.

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).
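
 A simplified model of the first proposed solution, assuming a reader sees a cell only when the cell's memstoreTS is at or below the reader's read point. ReadPointSketch and its methods are hypothetical and are not the HBase scanner code; the sketch only contrasts a flush that drops memstoreTS with one that keeps it.

```java
class ReadPointSketch {

  // Hypothetical cell carrying the write number assigned in the memstore.
  static final class Cell {
    final String value;
    final long memstoreTS;

    Cell(String value, long memstoreTS) {
      this.value = value;
      this.memstoreTS = memstoreTS;
    }
  }

  // A reader that started at readPoint must not see later writes.
  static boolean isVisible(Cell cell, long readPoint) {
    return cell.memstoreTS <= readPoint;
  }

  // Models a flush that does not persist memstoreTS: the cell looks committed to everyone.
  static Cell flushDroppingTS(Cell cell) {
    return new Cell(cell.value, 0L);
  }

  // Models a flush that writes memstoreTS into the file alongside the cell.
  static Cell flushKeepingTS(Cell cell) {
    return new Cell(cell.value, cell.memstoreTS);
  }

  public static void main(String[] args) {
    long readPoint = 5L;                       // the scan began at time T
    Cell inFlight = new Cell("new-value", 7L); // written after the scan began

    System.out.println("before flush: " + isVisible(inFlight, readPoint));
    System.out.println("flush drops TS: " + isVisible(flushDroppingTS(inFlight), readPoint));
    System.out.println("flush keeps TS: " + isVisible(flushKeepingTS(inFlight), readPoint));
  }
}
```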





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: 4213.addendum

Addendum to temporarily disable 
testInstantSchemaOperationsInZKForMasterFailover, which relies on a 
predetermined sleep interval for the schema janitor to clean up the schema 
change request.

Subbu will come up with better test.
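
One common alternative to a fixed sleep in tests is to poll for the expected condition with a timeout. The sketch below illustrates that general pattern only; WaitForSketch and waitFor are hypothetical names, not existing HBase test utilities, and this is not necessarily what the eventual test will do.

```java
import java.util.function.BooleanSupplier;

class WaitForSketch {

  // Polls the condition until it holds or the timeout elapses.
  // Returns true if the condition became true, false on timeout or interrupt.
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Stand-in for "the janitor has removed the schema change request".
    boolean cleaned = waitFor(() -> System.currentTimeMillis() - start >= 30, 1000, 5);
    System.out.println("cleaned up within timeout: " + cleaned);
  }
}
```

The test then passes as soon as the cleanup is observed and fails only after the full timeout, rather than depending on one guessed interval.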

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. With out enable/disabling the table.
 2. With out bulk unassign/assign of regions.





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154410#comment-13154410
 ] 

Lars Hofhansl commented on HBASE-2856:
--

I looked through the entire patch again manually, but I can't figure out what 
would cause this failure.

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154417#comment-13154417
 ] 

Hadoop QA commented on HBASE-4213:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504535/4213.addendum
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/320//console

This message is automatically generated.

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v3.txt, 4213-trunk-v4.txt, 4213-trunk-v5.txt, 4213-trunk-v6.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. With out enable/disabling the table.
 2. With out bulk unassign/assign of regions.





[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-21 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154438#comment-13154438
 ] 

Ted Yu commented on HBASE-4820:
---

I agree with Kannan that we should reduce the number of places where 
refactoring is done to make porting easier.

Currently SplitLogManager.splitLogDistributed(final List<Path> logDirs) creates 
a MonitoredTask to display status.
This results in repetitive display of the following form on 60010/master-status:
{code}
Doing distributed log split in [hdfs://...-splitting]
{code}
We should make log splitting status display cleaner.

 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch


 In reviewing distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It will be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154443#comment-13154443
 ] 

Jonathan Hsieh commented on HBASE-4820:
---

@Kannan, I'm looking at this from the point of view of someone who recently 
spent many hours reviewing the dist log splitting patches in aggregate and 
may be responsible for fixing issues if it has problems.  I had a harder time 
than I'd prefer, and will likely have the same problem again if there are 
problems in the future.  Making a few semantics-preserving changes, such as 
giving var/method/class names more descriptive names and encapsulating pieces, 
would go a long way toward making the code more easily and quickly 
understandable by more people.

Are you suggesting splitting these changes into smaller pieces such as:

* add better exception error messages.
* consolidate calls only used once. Ex: async callbacks submethods; inline 
finishInitailize into SLM's constructor
* rename vague methods. Ex: installTask(String taskName) might be better as 
enqueueSplitLog(String logPath);  handleDeadWorker might be better as 
blacklistDeadWorker;  'exec(String name, Progressable)' might be better as  
'split(String logfilename, Progressable)'
* rename vague classes. Ex: Task to SplitTask, TaskBatch to 
SplitTaskState/SplitTaskContext
* correct comments to be consistent with code (comments in SplitLogWorker talk 
about a SUCCESS state which actually is the DONE state).








 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch


 In reviewing distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It will be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154453#comment-13154453
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

On the bulkload operation, the error has something to do with the split point: 
in the test I force a split, and the resulting error has something to do with 
the start of the second daughter.

@Lars -- since the original issue is resolved, and since this seems 
non-trivial, maybe this should get moved into a new issue?

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: (was: 4213-trunk-v4.txt)

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. With out enable/disabling the table.
 2. With out bulk unassign/assign of regions.





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: (was: 4213-trunk-v3.txt)

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. With out enable/disabling the table.
 2. With out bulk unassign/assign of regions.





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: (was: 4213-trunk-v5.txt)

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: (was: 4213-trunk-v6.txt)

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.





[jira] [Updated] (HBASE-4830) Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+

2011-11-21 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4830:
-

Attachment: 4830.txt

Todd's suggestion.  Testing that it actually works.

 Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno 
 running 0.20.205.0+
 ---

 Key: HBASE-4830
 URL: https://issues.apache.org/jira/browse/HBASE-4830
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 4830.txt, hbase-stack-regionserver-sv4r9s38.out


 Running 0.20.205.1 (I was not at tip of the branch) I ran into the following 
 hung regionserver:
 {code}
 regionserver7003.logRoller daemon prio=10 tid=0x7fd98028f800 nid=0x61af 
 in Object.wait() [0x7fd987bfa000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:485)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3606)
 - locked 0xf8656788 (a java.util.LinkedList)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3595)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3687)
 - locked 0xf8656458 (a 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626)
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
 at 
 org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
 at 
 org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
 - locked 0xf8655998 (a 
 org.apache.hadoop.io.SequenceFile$Writer)
 at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:791)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:578)
 - locked 0xc443deb0 (a java.lang.Object)
 at 
 org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Other threads are like this (here's a sample):
 {code}
 regionserver7003.logSyncer daemon prio=10 tid=0x7fd98025e000 nid=0x61ae 
 waiting for monitor entry [0x7fd987cfb000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1074)
 - waiting to lock 0xc443deb0 (a java.lang.Object)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
 at java.lang.Thread.run(Thread.java:662)
 
 IPC Server handler 0 on 7003 daemon prio=10 tid=0x7fd98049b800 
 nid=0x61b8 waiting for monitor entry [0x7fd9872f1000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1007)
 - waiting to lock 0xc443deb0 (a java.lang.Object)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1798)
 at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1668)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2980)
 at sun.reflect.GeneratedMethodAccessor636.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1325)
 {code}
 Looks like HDFS-1529?  (Todd?)
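
The shape of the hang in the two dumps above can be reproduced in miniature. The following is only a sketch under assumptions, not HLog code: `updateLock` stands in for the `java.lang.Object` monitor that `rollWriter`, `syncer` and `append` all take, a `CountDownLatch` stands in for the ack that `waitForAckedSeqno` is waiting on, and a `ReentrantLock` replaces the monitor so that the blocked caller can give up after a timeout. All class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockHeldWhileWaiting {
    // Stand-in for the lock shared by the roller, the syncer and the appenders.
    static final ReentrantLock updateLock = new ReentrantLock();

    /** Returns true if an "appender" could take the lock while the "roller" is stuck. */
    static boolean appenderGetsLock(long waitMillis) throws InterruptedException {
        CountDownLatch rollerHoldsLock = new CountDownLatch(1);
        CountDownLatch ack = new CountDownLatch(1); // the ack that has not arrived yet

        Thread roller = new Thread(() -> {
            updateLock.lock();
            try {
                rollerHoldsLock.countDown();
                ack.await(); // blocks while still holding updateLock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                updateLock.unlock();
            }
        });
        roller.setDaemon(true);
        roller.start();
        rollerHoldsLock.await();

        // The appender cannot make progress until the roller lets go.
        boolean acquired = updateLock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            updateLock.unlock();
        }

        ack.countDown(); // deliver the "ack" so the roller can finish
        roller.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("appender acquired lock: " + appenderGetsLock(200));
    }
}
```

With a plain monitor, as in the dumps, the appender has no timeout and simply stays BLOCKED for as long as the ack never arrives.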





[jira] [Created] (HBASE-4838) Port 2856 (TestAcidGuarantees is failing) to 0.92

2011-11-21 Thread Lars Hofhansl (Created) (JIRA)
Port 2856 (TestAcidGuarantees is failing) to 0.92
-

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0


Moving the backport into a separate issue (as suggested by JonH), because this 
is not trivial.





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-21 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154491#comment-13154491
 ] 

Lars Hofhansl commented on HBASE-2856:
--

Created HBASE-4838

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).
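
To make the failure mode concrete, here is a toy model of read-point filtering. It is a sketch under assumptions, not the HBase scanner: the `Cell` class, the `read` and `flush` helpers and the sample write numbers are all hypothetical, and a flush is modelled as simply rewriting the cells with or without their memstoreTS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReadPointSketch {
    /** One cell version, tagged with the write number (memstoreTS) that produced it. */
    static final class Cell {
        final String column;
        final int value;
        final long memstoreTS;

        Cell(String column, int value, long memstoreTS) {
            this.column = column;
            this.value = value;
            this.memstoreTS = memstoreTS;
        }
    }

    /** Latest value per column among cells with memstoreTS <= readPoint (cells are in write order). */
    static Map<String, Integer> read(List<Cell> cells, long readPoint) {
        Map<String, Integer> row = new TreeMap<>();
        for (Cell c : cells) {
            if (c.memstoreTS <= readPoint) {
                row.put(c.column, c.value);
            }
        }
        return row;
    }

    /** Simulated flush; when keepTs is false every cell comes back with memstoreTS 0. */
    static List<Cell> flush(List<Cell> cells, boolean keepTs) {
        List<Cell> flushed = new ArrayList<>();
        for (Cell c : cells) {
            flushed.add(new Cell(c.column, c.value, keepTs ? c.memstoreTS : 0L));
        }
        return flushed;
    }

    /** A row with one committed write (5) and one write still in flight (6). */
    static List<Cell> sampleRow() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("c1", 1, 5L));
        cells.add(new Cell("c2", 1, 5L));
        cells.add(new Cell("c1", 2, 6L));
        cells.add(new Cell("c2", 2, 6L));
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> memstore = sampleRow();
        long readPoint = 5L;
        System.out.println(read(memstore, readPoint));               // {c1=1, c2=1}
        System.out.println(read(flush(memstore, false), readPoint)); // {c1=2, c2=2}
        System.out.println(read(flush(memstore, true), readPoint));  // {c1=1, c2=1}
    }
}
```

A reader that takes c1 from the first line and the remaining columns from the second sees 1 and 2 in the same row, which is the mismatch the test reports. Keeping the memstoreTS through the flush (third line) corresponds to the first solution proposed above.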





[jira] [Updated] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-21 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4838:
-

Summary: Port 2856 (TestAcidGuarantee is failing) to 0.92  (was: Port 2856 
(TestAcidGuarantees is failing) to 0.92)

 Port 2856 (TestAcidGuarantee is failing) to 0.92
 

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0


 Moving the backport into a separate issue (as suggested by JonH), because this 
 is not trivial.





[jira] [Updated] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-21 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4838:
-

Attachment: 4838-v1.txt

Patch identical to 0.92-patch in HBASE-2856.
This has issues with failing tests.

 Port 2856 (TestAcidGuarantee is failing) to 0.92
 

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4838-v1.txt


 Moving the backport into a separate issue (as suggested by JonH), because this 
 is not trivial.





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154503#comment-13154503
 ] 

stack commented on HBASE-4213:
--

Want to do the test fix in another issue, Ted and Subbu?

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk-v2.txt, 
 4213-trunk-v7.txt, 4213-trunk.txt, 4213-v9.txt, 4213.addendum, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch, schema-update.png


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.





[jira] [Created] (HBASE-4839) Re-enable TestInstantSchemaChangeFailover#testInstantSchemaOperationsInZKForMasterFailover

2011-11-21 Thread Ted Yu (Created) (JIRA)
Re-enable 
TestInstantSchemaChangeFailover#testInstantSchemaOperationsInZKForMasterFailover
--

 Key: HBASE-4839
 URL: https://issues.apache.org/jira/browse/HBASE-4839
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu


TestInstantSchemaChangeFailover#testInstantSchemaOperationsInZKForMasterFailover
 was disabled for instant schema change (HBASE-4213) after it failed on Jenkins.

We should re-enable it and make it pass on Jenkins and in dev environments.





[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h

2011-11-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154538#comment-13154538
 ] 

Jimmy Xiang commented on HBASE-4797:


Region opening is retried periodically.  The wait interval is about 1/3 of 
the assignment timeout. I think that's fine.

 [availability] Give recovered.edits files better names, ones that include 
 first and last sequence id so we can skip files with edits we know older than 
 current region has
 --

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob

 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.
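
The idea in the summary, skipping files whose edits are all older than what the region already has, could look roughly like the sketch below. The `<firstSeqId>-<lastSeqId>` zero-padded naming scheme and every name in the class are assumptions made for illustration; the issue does not specify the actual format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecoveredEditsNames {
    /** Hypothetical naming scheme: zero-padded first and last sequence id, joined by '-'. */
    static String fileName(long firstSeqId, long lastSeqId) {
        return String.format("%019d-%019d", firstSeqId, lastSeqId);
    }

    /** Parses the last sequence id back out of a name produced by fileName(). */
    static long lastSeqId(String fileName) {
        return Long.parseLong(fileName.substring(fileName.indexOf('-') + 1));
    }

    /** Keeps only files that may contain edits newer than the region's current sequence id. */
    static List<String> filesToReplay(List<String> fileNames, long regionSeqId) {
        List<String> toReplay = new ArrayList<>();
        for (String name : fileNames) {
            if (lastSeqId(name) > regionSeqId) {
                toReplay.add(name);
            }
        }
        return toReplay;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            fileName(1, 100), fileName(101, 200), fileName(201, 230));
        // The region already has everything up to 150, so the first file can be skipped unopened.
        System.out.println(filesToReplay(files, 150L).size() + " of " + files.size() + " files to replay");
    }
}
```

The point of putting both ids in the name is that the skip decision needs no file I/O at all, which matters when a hot region has tens or hundreds of such files.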





[jira] [Created] (HBASE-4840) If I call split fast enough, while inserting, rows disappear.

2011-11-21 Thread Alex Newman (Created) (JIRA)
If I call split fast enough, while inserting, rows disappear. 
--

 Key: HBASE-4840
 URL: https://issues.apache.org/jira/browse/HBASE-4840
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman


I'll attach a unit test for this. Basically, if you call split while inserting 
data, you can get to the point where the cluster becomes unstable or rows 
disappear.





[jira] [Created] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-21 Thread Alex Newman (Created) (JIRA)
If I call split fast enough, while inserting, rows disappear. 
--

 Key: HBASE-4841
 URL: https://issues.apache.org/jira/browse/HBASE-4841
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman


I'll attach a unit test for this. Basically, if you call split while inserting 
data, you can get to the point where the cluster becomes unstable or rows 
disappear.





[jira] [Commented] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-21 Thread Alex Newman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154581#comment-13154581
 ] 

Alex Newman commented on HBASE-4841:


Since this can cause data loss, it may make sense to increase the priority.

 If I call split fast enough, while inserting, rows disappear. 
 --

 Key: HBASE-4841
 URL: https://issues.apache.org/jira/browse/HBASE-4841
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: 1


 I'll attach a unit test for this. Basically, if you call split while 
 inserting data, you can get to the point where the cluster becomes 
 unstable or rows disappear.





[jira] [Updated] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-21 Thread Alex Newman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-4841:
---

Attachment: 1

 If I call split fast enough, while inserting, rows disappear. 
 --

 Key: HBASE-4841
 URL: https://issues.apache.org/jira/browse/HBASE-4841
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: 1


 I'll attach a unit test for this. Basically, if you call split while 
 inserting data, you can get to the point where the cluster becomes 
 unstable or rows disappear.





[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated HBASE-4832:
-

Release Note: 
This incorporates nkeywal's earlier patch to this JIRA, and allows 
TestRegionServerCoprocessorExceptionWithAbort to work with it. It changes the 
test to use a ZooKeeper watcher in a separate thread to watch for the 
regionserver to abort. (This is also what is currently done in 
TestMasterCoprocessorWithAbort().)

In my testing, repeated iterations (30+) of 
TestRegionServerCoprocessorExceptionWithAbort succeed.
  Status: Patch Available  (was: Open)

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832_trunk_hregionserver.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
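
Why the original notifyAll() has no effect can be shown with a small stand-alone sketch. This is an illustration under assumptions, not the HBase classes: the `Sleeper` and `Server` below are simplified, hypothetical stand-ins in which the sleeping thread waits on the sleeper's private lock, so a notification on the server object reaches nobody.

```java
public class WrongMonitorSketch {
    /** Simplified stand-in for a sleeper that waits on its own private lock. */
    static final class Sleeper {
        private final Object sleepLock = new Object();
        private boolean triggerWake = false;

        /** Sleeps for up to periodMillis unless woken early; returns the elapsed milliseconds. */
        long sleep(long periodMillis) throws InterruptedException {
            long start = System.nanoTime();
            long deadline = start + periodMillis * 1_000_000L;
            synchronized (sleepLock) {
                while (!triggerWake) {
                    long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                    if (remainingMs <= 0) {
                        break;
                    }
                    sleepLock.wait(remainingMs);
                }
                triggerWake = false;
            }
            return (System.nanoTime() - start) / 1_000_000L;
        }

        /** Wakes the sleeping thread by notifying on the lock it actually waits on. */
        void skipSleepCycle() {
            synchronized (sleepLock) {
                triggerWake = true;
                sleepLock.notifyAll();
            }
        }
    }

    static final class Server {
        final Sleeper sleeper = new Sleeper();
        volatile boolean stopped = false;

        /** Notifies on the server's own monitor, which no thread is waiting on. */
        void stopNaked() {
            stopped = true;
            synchronized (this) {
                notifyAll();
            }
        }

        /** Notifies through the sleeper, which owns the monitor the runner waits on. */
        void stopFixed() {
            stopped = true;
            sleeper.skipSleepCycle();
        }
    }

    /** Returns how long the runner thread stayed asleep before noticing the stop. */
    static long measure(boolean fixed, long periodMillis) throws InterruptedException {
        Server server = new Server();
        long[] slept = new long[1];
        Thread runner = new Thread(() -> {
            try {
                slept[0] = server.sleeper.sleep(periodMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        runner.start();
        Thread.sleep(100); // give the runner time to start sleeping
        if (fixed) {
            server.stopFixed();
        } else {
            server.stopNaked();
        }
        runner.join();
        return slept[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("naked notifyAll: slept " + measure(false, 1000) + " ms");
        System.out.println("skipSleepCycle:  slept " + measure(true, 1000) + " ms");
    }
}
```

In this sketch the first call sleeps for the whole period and the second returns shortly after the stop request, which mirrors the speed-up described above.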
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 This is likely because the code does not expect the region server to stop that fast.
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
   Time elapsed: 30.06 sec   ERROR!
 java.lang.Exception: test timed out after 30000 milliseconds
   at java.lang.Throwable.fillInStackTrace(Native Method)
   at java.lang.Throwable.<init>(Throwable.java:196)
   at java.lang.Exception.<init>(Exception.java:41)
   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
   at 

[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-21 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated HBASE-4832:
-

Attachment: HBASE-4832.patch

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832_trunk_hregionserver.patch, HBASE-4832.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 This is likely because the code does not expect the region server to stop that fast.
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
   Time elapsed: 30.06 sec   ERROR!
 java.lang.Exception: test timed out after 30000 milliseconds
   at java.lang.Throwable.fillInStackTrace(Native Method)
   at java.lang.Throwable.<init>(Throwable.java:196)
   at java.lang.Exception.<init>(Exception.java:41)
   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
   at 
 org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 

[jira] [Commented] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-21 Thread Alex Newman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154682#comment-13154682
 ] 

Alex Newman commented on HBASE-4841:


I realized it may be easier if I post the log for the unit test, rather than 
requiring you to run it. It's on the way.

 If I call split fast enough, while inserting, rows disappear. 
 --

 Key: HBASE-4841
 URL: https://issues.apache.org/jira/browse/HBASE-4841
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: 1


 I'll attach a unit test for this. Basically, if you call split while 
 inserting data, you can get to the point where the cluster becomes 
 unstable or rows disappear.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Created) (JIRA)
[hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh


It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is 
intermittently failing.

In the test, a region's assignment is changed in META but not in ZK.  After the 
equivalent of 'hbck -fix', a subsequent check that should be clean comes up 
with a new ZK assignment but with META still being inconsistent with ZK.  The 
assignment in ZK sometimes points to the same RS, but sometimes it moves to 
another RS.





[jira] [Updated] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4842:
--

Description: 
It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is 
intermittently failing.

In the test, a region's assignment is purposely changed in META but not in ZK.  
After the equivalent of 'hbck -fix', a subsequent check that should be clean 
comes up with a new ZK assignment but with META still being inconsistent with 
ZK.  The assignment in ZK sometimes points to the same RS, but sometimes it 
moves to another RS.

  was:
Its seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck is 
intermittently failing.

In the test, a region's assignment is changed in META but not in ZK.  After the 
equivalent of 
'hbck -fix', a subsequent check that should be clean comes up with a new ZK 
assignment but with META still being inconsistent with ZK.  The RS in ZK 
sometimes this points to the same RS, but sometimes it moves to another ZK. 


 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh

 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The assignment in ZK sometimes points to the same RS, 
 but sometimes it moves to another RS.





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154686#comment-13154686
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

Output Examples:

Note that the ZK assignment and the META assignment did not change.
{code}
// hbck -fix call
ERROR: Region 
tableBadMetaAssign,,1321733234211.35120fc878802e3b6829e6d7b597b44c. listed in 
META on region server ubuntu64-build01.sf.cloudera.com,51134,1321733229687 but 
found on region server ubuntu64-build01.sf.cloudera.com,38112,1321733229583
Trying to fix assignment error...
...
// hbck after fix
ERROR: Region 
tableBadMetaAssign,,1321733234211.35120fc878802e3b6829e6d7b597b44c. listed in 
META on region server ubuntu64-build01.sf.cloudera.com,51134,1321733229687 but 
found on region server ubuntu64-build01.sf.cloudera.com,38112,1321733229583
{code}

Note that the ZK assignment changed but META has not changed.
{code}
// hbck -fix
ERROR: Region 
tableBadMetaAssign,,1321719700727.af24fbbe3e1df676b8e31e3ff5765fb6. listed in 
META on region server p0123.sf.cloudera.com,36067,1321719696277 but found on 
region server p0123.sf.cloudera.com,54221,1321719696237
Trying to fix assignment error...
...
// hbck after fix
ERROR: Region 
tableBadMetaAssign,,1321719700727.af24fbbe3e1df676b8e31e3ff5765fb6. listed in 
META on region server p0123.sf.cloudera.com,36067,1321719696277 but found on 
region server p0123.sf.cloudera.com,59522,1321719696305
{code}
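
The two failure modes shown above can be told apart mechanically by comparing the "found on" server before and after the fix. The following is only a minimal sketch: the class name, method names, and result labels are hypothetical, and the regular expression is written against the error text quoted in this comment, not against hbck's actual source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: classifies a pair of hbck error lines.
class HbckAssignmentErrorSketch {
    // Pattern assumed from the quoted output above.
    private static final Pattern ASSIGNMENT_ERROR = Pattern.compile(
        "ERROR: Region (\\S+) listed in META on region server (\\S+) "
        + "but found on region server (\\S+)");

    /** Returns {region, serverInMeta, serverFound}, or null if the text does not match. */
    static String[] parse(String text) {
        // Collapse line wraps so a message split over several lines still matches.
        Matcher m = ASSIGNMENT_ERROR.matcher(text.replaceAll("\\s+", " ").trim());
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    /** Compares the error reported by 'hbck -fix' with the error reported afterwards. */
    static String classify(String beforeFix, String afterFix) {
        String[] before = parse(beforeFix);
        String[] after = parse(afterFix);
        if (before == null) {
            return "no error before fix";
        }
        if (after == null) {
            return "fixed";
        }
        if (!before[2].equals(after[2])) {
            return "region moved, META stale";
        }
        return "nothing changed";
    }
}
```

Fed the first pair of messages, `classify` returns "nothing changed"; fed the second pair, it returns "region moved, META stale".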

 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh

 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The assignment in ZK sometimes points to the same RS, 
 but sometimes it moves to another RS.





[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h

2011-11-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154689#comment-13154689
 ] 

jirapos...@reviews.apache.org commented on HBASE-4797:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/
---

Review request for hbase, Todd Lipcon and Michael Stack.


Summary
---

If there are multiple recovered edits files, I used the file name to find the 
initial sequence id.  After these files are sorted, we can find a file's 
possible maximum sequence id based on the next file's initial sequence id.  If 
the maximum sequence id is smaller than the current sequence id, the whole 
recovered edits file is old and ignored.
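
The skipping rule in this summary can be sketched in a few lines of Java. This is an illustration only, not the code in the patch: the class and method names are hypothetical, file names are assumed to be plain decimal sequence ids, and the last file is always replayed because no next file bounds it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the rule described above; not taken from HRegion.java.
class RecoveredEditsSkipSketch {
    /** Assumption: the file name is the file's initial sequence id in decimal. */
    static long initialSeqId(String fileName) {
        return Long.parseLong(fileName);
    }

    /** Returns the files that still need replaying, ordered oldest first. */
    static List<String> filesToReplay(List<String> fileNames, long currentSeqId) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.comparingLong(RecoveredEditsSkipSketch::initialSeqId));

        List<String> toReplay = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            boolean isLast = (i == sorted.size() - 1);
            if (!isLast) {
                // Upper bound on this file's sequence ids: the next file's initial id.
                long maxPossibleSeqId = initialSeqId(sorted.get(i + 1));
                if (maxPossibleSeqId < currentSeqId) {
                    continue; // every edit in this file is older than the region's state
                }
            }
            toReplay.add(sorted.get(i));
        }
        return toReplay;
    }
}
```

With files starting at sequence ids 100, 200 and 300 and a current sequence id of 250, only the first file is skipped: its bound of 200 is below 250, while the second file may still hold edits up to 300.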


This addresses bug HBASE-4797.
https://issues.apache.org/jira/browse/HBASE-4797


Diffs
-

  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8b89661 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 5daa02b 

Diff: https://reviews.apache.org/r/2906/diff


Testing
---

Added a test case to TestHRegion; all the tests in that class pass.


Thanks,

Jimmy



 [availability] Give recovered.edits files better names, ones that include 
 first and last sequence id so we can skip files with edits we know older than 
 current region has
 --

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob

 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Updated] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-21 Thread Alex Newman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-4841:
---

Description: 
I'll attach a unit test for this. Basically, if you call split while inserting 
data, you can get to the point where the cluster becomes unstable or rows 
disappear. The unit test lets you vary:

- How many rows
- How wide the rows are
- The frequency of the split. 


The default settings crash unit tests or cause the unit tests to fail on my 
laptop. On my MacBook Air, I could actually turn down the total number of rows 
and the frequency of the splits, which is surprising. I think this is because 
the MacBook Air has much better I/O than my backup Acer.

  was:I'll attach a unit test for this. Basically if you call split, while 
inserting data you can get to the point to where the cluster becomes unstable, 
or rows will  disappear.


 If I call split fast enough, while inserting, rows disappear. 
 --

 Key: HBASE-4841
 URL: https://issues.apache.org/jira/browse/HBASE-4841
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: 1


 I'll attach a unit test for this. Basically, if you call split while 
 inserting data, you can get to the point where the cluster becomes 
 unstable or rows disappear. The unit test lets you vary:
 - How many rows
 - How wide the rows are
 - The frequency of the split. 
 The default settings crash unit tests or cause the unit tests to fail on my 
 laptop. On my MacBook Air, I could actually turn down the total number of 
 rows and the frequency of the splits, which is surprising. I think this is 
 because the MacBook Air has much better I/O than my backup Acer.





[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h

2011-11-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154696#comment-13154696
 ] 

jirapos...@reviews.apache.org commented on HBASE-4797:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/#review3409
---


Very nice patch.

In future, I'd suggest you confine your change to just what you are adding.   
The white space cleanup is nice but it distracts from your patch.  It also 
bloats it and makes it look intimidating to review (smile).

Minor fixups only.


src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2906/#comment7635

So, are these already sorted in right order from oldest edit to newest?



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2906/#comment7636

Possilbe should be Possible.

I'd be more assertive in this message.  Maximum possible sequenceid for 
this log is  + + , skipping ..



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2906/#comment7637

Good.



src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
https://reviews.apache.org/r/2906/#comment7638

Any more asserts we can do in here?   Assert we replayed N of the M files?


- Michael


On 2011-11-21 22:38:39, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2906/
bq.  ---
bq.  
bq.  (Updated 2011-11-21 22:38:39)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  If there are multiple recovered edits files, I used the file name to find 
the initial sequence id.  After these files are sorted, we can find a file's 
possible maximum sequence id based on the next file's initial sequence id.  If 
the maximum sequence id is smaller than the current sequence id, the whole 
recovered edits file is old and ignored.
bq.  
bq.  
bq.  This addresses bug HBASE-4797.
bq.  https://issues.apache.org/jira/browse/HBASE-4797
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8b89661 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 
5daa02b 
bq.  
bq.  Diff: https://reviews.apache.org/r/2906/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Added test case to TestHRegion, and all the tests in this test are passed.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 [availability] Give recovered.edits files better names, ones that include 
 first and last sequence id so we can skip files with edits we know older than 
 current region has
 --

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob

 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h

2011-11-21 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154699#comment-13154699
 ] 

Kannan Muthukkaruppan commented on HBASE-4797:
--

The title for the bug can be updated given that we are no longer renaming the 
files in recovered.edits. [That concerned me initially -- but reading through 
the details, it looks like you have come up with a way to avoid a new name 
format. That's always smoother for upgrades and such.]



 [availability] Give recovered.edits files better names, ones that include 
 first and last sequence id so we can skip files with edits we know older than 
 current region has
 --

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob

 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154697#comment-13154697
 ] 

Jimmy Xiang commented on HBASE-4820:


I think Jon has a great point.  In porting this feature to CDH, I spent quite 
some time understanding it.

To Kannan's concern, the coding change for this Jira should be easy to 
backport since you already have the original change.

For someone else who doesn't have the original change, this Jira is supposed to 
make it easier to understand and port.

 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch


 In reviewing the distributed log splitting feature, we found some cosmetic 
 issues.  They make the code hard to understand.
 It would be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-21 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4797:
-

Summary: [availability] Skip recovered.edits files with edits we know older 
than what region currently has  (was: [availability] Give recovered.edits files 
better names, ones that include first and last sequence id so we can skip files 
with edits we know older than current region has)

Updated JIRA subject.

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob

 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154704#comment-13154704
 ] 

stack commented on HBASE-4797:
--

bq. The region opening is tried periodically. The waiting interval is about 1/3 
of the assignment time out. I think that's fine.

From the log snippet above though, Jimmy, it seems like we are updating the 
znode almost every second.  That's too much?

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob

 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154709#comment-13154709
 ] 

stack commented on HBASE-4842:
--

Is this an hbck issue, Jon, or is it in our recovery code?

 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh

 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The assignment in ZK sometimes points to the same RS, 
 but sometimes it moves to another RS.





[jira] [Updated] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-21 Thread Alex Newman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-4841:
---

Attachment: log2

Here is a log of the wrong number of rows being returned.

 If I call split fast enough, while inserting, rows disappear. 
 --

 Key: HBASE-4841
 URL: https://issues.apache.org/jira/browse/HBASE-4841
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: 1, log, log2


 I'll attach a unit test for this. Basically, if you call split while 
 inserting data, you can get to the point where the cluster becomes 
 unstable or rows disappear. The unit test lets you vary:
 - How many rows
 - How wide the rows are
 - The frequency of the split. 
 The default settings crash unit tests or cause the unit tests to fail on my 
 laptop. On my MacBook Air, I could actually turn down the total number of 
 rows and the frequency of the splits, which is surprising. I think this is 
 because the MacBook Air has much better I/O than my backup Acer.





[jira] [Updated] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-21 Thread Alex Newman (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-4841:
---

Attachment: log

Here is a log of this script taking the HBase server out.

 If I call split fast enough, while inserting, rows disappear. 
 --

 Key: HBASE-4841
 URL: https://issues.apache.org/jira/browse/HBASE-4841
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
 Attachments: 1, log, log2


 I'll attach a unit test for this. Basically, if you call split while 
 inserting data, you can get to the point where the cluster becomes 
 unstable or rows disappear. The unit test lets you vary:
 - How many rows
 - How wide the rows are
 - The frequency of the split. 
 The default settings crash unit tests or cause the unit tests to fail on my 
 laptop. On my MacBook Air, I could actually turn down the total number of 
 rows and the frequency of the splits, which is surprising. I think this is 
 because the MacBook Air has much better I/O than my backup Acer.





[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-21 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154719#comment-13154719
 ] 

Ted Yu commented on HBASE-4832:
---

+1 on patch.

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832_trunk_hregionserver.patch, HBASE-4832.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
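
 To illustrate why the notification has to go to the object the sleeping thread actually waits on, here is a minimal stand-alone sketch. It is not the HBase Sleeper class: the class name, the lock field, and the flag are hypothetical, and only the method name skipSleepCycle is borrowed from the snippet above.

```java
// Hypothetical stand-in for a sleeper whose sleep can be cut short.
class SleeperSketch {
    private final Object sleepLock = new Object(); // the monitor sleep() waits on
    private final long periodMillis;
    private boolean skipRequested = false;

    SleeperSketch(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    /** Sleeps for up to one period, returning early if skipSleepCycle() is called. */
    void sleep() throws InterruptedException {
        long deadlineNanos = System.nanoTime() + periodMillis * 1_000_000L;
        synchronized (sleepLock) {
            while (!skipRequested) {
                long remainingMillis = (deadlineNanos - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    break; // the full period elapsed
                }
                sleepLock.wait(remainingMillis);
            }
            skipRequested = false; // consume the request for the next cycle
        }
    }

    /** Wakes a sleeping thread by notifying the same monitor it waits on. */
    void skipSleepCycle() {
        synchronized (sleepLock) {
            skipRequested = true;
            sleepLock.notifyAll();
        }
    }
}
```

 A notifyAll() on any other object, such as the enclosing server instance, would leave the thread blocked in sleepLock.wait() until the period runs out.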
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 It is likely because the code does not expect the region server to stop that fast.
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
   Time elapsed: 30.06 sec   ERROR!
 java.lang.Exception: test timed out after 3 milliseconds
   at java.lang.Throwable.fillInStackTrace(Native Method)
   at java.lang.Throwable.<init>(Throwable.java:196)
   at java.lang.Exception.<init>(Exception.java:41)
   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
   at 
 org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
   at 

[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154723#comment-13154723
 ] 

jirapos...@reviews.apache.org commented on HBASE-4797:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/#review3413
---



src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2906/#comment7642

maxSedId should be named maxSeqId


- Ted


On 2011-11-21 22:38:39, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2906/
bq.  ---
bq.  
bq.  (Updated 2011-11-21 22:38:39)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon and Michael Stack.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  If there are multiple recovered edits files, I used the file name to find 
the initial sequence id.  After these files are sorted, we can find a file's 
possible maximum sequence id based on the next file's initial sequence id.  If 
the maximum sequence id is smaller than the current sequence id, the whole 
recovered edits file is old and ignored.
bq.  
bq.  
bq.  This addresses bug HBASE-4797.
bq.  https://issues.apache.org/jira/browse/HBASE-4797
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8b89661 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 
5daa02b 
bq.  
bq.  Diff: https://reviews.apache.org/r/2906/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Added a test case to TestHRegion; all the tests in that class pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.
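
The skipping rule from the review summary above can be sketched roughly as follows. This is only an illustration, not the HRegion code from the patch: the class and method names are hypothetical, and it assumes each recovered.edits file name is the decimal sequence id of the first edit in that file. The next file's initial sequence id is used as a conservative upper bound for the current file's maximum.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper; names and file-name format are assumptions. */
final class RecoveredEditsFilter {

  /**
   * Returns the files that still need replaying, in sequence-id order.
   * A file is dropped only when the upper bound on its edits (the next
   * file's initial sequence id) is smaller than currentSeqId.
   */
  static List<String> filesToReplay(List<String> fileNames, long currentSeqId) {
    List<String> sorted = new ArrayList<>(fileNames);
    // Sort by the initial sequence id encoded in the file name.
    sorted.sort(Comparator.comparingLong(Long::parseLong));

    List<String> toReplay = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i++) {
      boolean isLast = (i + 1 == sorted.size());
      // The last file has no successor, so its maximum is unknown: keep it.
      long maxPossibleSeqId =
          isLast ? Long.MAX_VALUE : Long.parseLong(sorted.get(i + 1));
      if (maxPossibleSeqId < currentSeqId) {
        continue; // every edit in this file is older than what the region has
      }
      toReplay.add(sorted.get(i));
    }
    return toReplay;
  }
}
```

With files starting at 10, 20 and 30 and a current sequence id of 25, only the first file is skipped; the file starting at 20 may still hold edits up to 30, so it is replayed.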



 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob

 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-21 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154729#comment-13154729
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

Hm, this looks like a race, or the lack of a rendezvous of some sort.  
Up to HBASE-4378, there was a 15000ms (yikes, 15 sec!) sleep between the 'hbck 
-fix' call and the subsequent 'hbck' call that is supposed to be clean.  
HBASE-4703 removed this.  

My hunch is that the update to META made by 'hbck -fix' isn't seen by 
the second 'hbck' run.

https://github.com/apache/hbase/commit/6ca0e79a6ac92190238d5cda56f787ab9702d7fc#L61L138
TestHBaseFsck.java:138 
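
One way to get a rendezvous without going back to a fixed 15 second sleep is to poll for the expected state with a deadline. The sketch below is an assumption about how such a helper could look, not code from TestHBaseFsck; the class name, method name and parameters are all hypothetical, and it uses Java 8 lambdas for brevity.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: poll until a condition holds or a timeout expires. */
final class Rendezvous {

  /**
   * Returns true as soon as condition is true, or false once timeoutMs has
   * elapsed (or the thread is interrupted) without the condition holding.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // gave up: the caller should fail the test here
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        return false;
      }
    }
  }
}
```

In the test, the condition would be whatever check shows that the META update from the fix is visible, so the second run starts only after that point and the common case returns quickly.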


 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Reporter: Jonathan Hsieh

 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
 sometimes it moves to another RS. 





[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-21 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4832:
--

Attachment: 4832-timeout.txt

Patch which stores timeout value in a static variable.

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 This is likely because the code does not expect the region server to stop that fast.
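
 To illustrate why the notification in the first version has no effect, here is a minimal sketch. It is not the HBase code: SimpleSleeper and DemoServer are hypothetical stand-ins, written on the assumption that the sleeping thread waits on a lock owned by the sleeper object rather than on the server's own monitor.

```java
/** Simplified stand-in for a sleeper that waits on its own private lock. */
final class SimpleSleeper {
  private final Object sleepLock = new Object();
  private final long periodMs;
  private boolean triggerWake = false;

  SimpleSleeper(long periodMs) {
    this.periodMs = periodMs;
  }

  /** Waits up to periodMs unless woken early; returns elapsed milliseconds. */
  long sleep() {
    long start = System.nanoTime();
    synchronized (sleepLock) {
      long remaining = periodMs;
      // Loop guards against spurious wake-ups.
      while (!triggerWake && remaining > 0) {
        try {
          sleepLock.wait(remaining);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
        remaining = periodMs - (System.nanoTime() - start) / 1_000_000L;
      }
      triggerWake = false; // consume the wake-up request
    }
    return (System.nanoTime() - start) / 1_000_000L;
  }

  /** Wakes the sleeper by notifying on the lock it actually waits on. */
  void skipSleepCycle() {
    synchronized (sleepLock) {
      triggerWake = true;
      sleepLock.notifyAll();
    }
  }
}

/** Hypothetical server with the two stop() variants discussed above. */
final class DemoServer {
  final SimpleSleeper sleeper = new SimpleSleeper(5000);
  volatile boolean stopped = false;

  void stopBroken() {
    stopped = true;
    synchronized (this) {
      notifyAll(); // no thread waits on this monitor, so nothing wakes up
    }
  }

  void stop() {
    stopped = true;
    sleeper.skipSleepCycle(); // reaches the thread blocked in sleeper.sleep()
  }
}
```

 In this sketch stopBroken() leaves a thread inside sleeper.sleep() blocked for the rest of its period, while stop() releases it right away.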
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
   Time elapsed: 30.06 sec   ERROR!
 java.lang.Exception: test timed out after 3 milliseconds
   at java.lang.Throwable.fillInStackTrace(Native Method)
   at java.lang.Throwable.<init>(Throwable.java:196)
   at java.lang.Exception.<init>(Exception.java:41)
   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
   at 
 org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
   at 

[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154747#comment-13154747
 ] 

jirapos...@reviews.apache.org commented on HBASE-4797:
--



bq.  On 2011-11-21 23:23:07, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 
2468
bq.   https://reviews.apache.org/r/2906/diff/2/?file=59652#file59652line2468
bq.  
bq.   maxSedId should be named maxSeqId

Good catch.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/#review3413
---










[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-21 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154749#comment-13154749
 ] 

Jean-Daniel Cryans commented on HBASE-4739:
---

I prefer the latest patch, although it seems the EventHandler.java bit was only 
needed with the new znode.

Handling RegionAlreadyInTransitionException would be cleaner.

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday. When 
 the master died, it had just created the RIT znode for a region but hadn't 
 told the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-21 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154758#comment-13154758
 ] 

stack commented on HBASE-4739:
--

Is this state misnamed?  Is the comment below saying that the master sets 
CLOSING state?

{code}
-RS_ZK_REGION_CLOSING  (1),   // RS is in process of closing a region
+RS_ZK_REGION_CLOSING  (1),   // Master adds this region as closing in ZK
{code}

Why not just print out the state rather than do this in the log message:

{code}
(state.isPendingClose() ? "pending close " : "closing ") +
{code}
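
A rough sketch of that suggestion, assuming the state is an enum whose name is readable enough to log directly. The class, enum and method here are hypothetical and are not taken from the patch:

```java
/** Hypothetical illustration of logging the state itself. */
final class RegionStateLogDemo {

  enum State { PENDING_CLOSE, CLOSING, CLOSED }

  /** Builds the log message from the enum name instead of a hand-written ternary. */
  static String tooLongMessage(State state) {
    return "Region has been " + state + " for too long";
  }
}
```

This keeps the message correct if more states are added later, since no per-state string has to be maintained.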

I'm not up on what rest of patch does so can't comment more.

Thanks for digging in Jinchao.







