date:20120401


 [ 
https://issues.apache.org/jira/browse/HBASE-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5671:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 hbase.metrics.showTableName should be true by default
 -

 Key: HBASE-5671
 URL: https://issues.apache.org/jira/browse/HBASE-5671
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5671_v1.patch


 HBASE-4768 added per-CF metrics and a new configuration option, 
 hbase.metrics.showTableName. We should switch this option to true by 
 default: it is not intuitive (at least to me) to aggregate per-CF metrics 
 across tables by default, and it is confusing to report on CFs without 
 table names. 
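A minimal Java sketch of why the default matters, assuming a made-up metric-key format (`tbl.<table>.cf.<cf>.<metric>` vs `cf.<cf>.<metric>`). Only the option name hbase.metrics.showTableName comes from the issue; the class, methods and key format are hypothetical and are not HBase's actual per-CF metric naming. It shows that when the table name is dropped, column families with the same name in different tables collapse into one counter.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: the key format is NOT HBase's real
// per-CF metric naming; it just shows what omitting the table does.
final class PerCfMetricNames {

  // Builds a metric key. With showTableName == false the table is omitted,
  // so column families with the same name in different tables share a key.
  static String metricKey(boolean showTableName, String table, String cf, String metric) {
    String prefix = showTableName ? "tbl." + table + "." : "";
    return prefix + "cf." + cf + "." + metric;
  }

  // Sums per-(table, CF) samples; each sample is {table, cf, count}.
  static Map<String, Long> aggregate(boolean showTableName, String[][] samples) {
    Map<String, Long> totals = new TreeMap<>();
    for (String[] s : samples) {
      String key = metricKey(showTableName, s[0], s[1], "readCount");
      totals.merge(key, Long.parseLong(s[2]), Long::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    String[][] samples = {{"users", "d", "5"}, {"orders", "d", "7"}};
    // prints {cf.d.readCount=12}
    System.out.println(aggregate(false, samples));
    // prints {tbl.orders.cf.d.readCount=7, tbl.users.cf.d.readCount=5}
    System.out.println(aggregate(true, samples));
  }
}
```

With the table name included, each table keeps its own counter, which is the behavior the issue argues should be the default.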

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase


 [ 
https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5638:
-

Fix Version/s: (was: 0.94.1)
   0.96.0

 Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
 --

 Key: HBASE-5638
 URL: https://issues.apache.org/jira/browse/HBASE-5638
 Project: HBase
  Issue Type: Sub-task
  Components: zookeeper
Affects Versions: 0.90.6, 0.92.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch, 
 HBASE-5638-0.90-v1.patch, HBASE-5638-0.90-v2.patch, HBASE-5638-0.92-v1.patch, 
 HBASE-5638-0.92-v2.patch, HBASE-5638-trunk-v1.patch, HBASE-5638-trunk-v2.patch





[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-01 Thread xufeng (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243668#comment-13243668
 ] 

xufeng commented on HBASE-5677:
---

We can reproduce this issue on 0.90 with the following steps:

step1: Start a cluster and create a table that has many regions.
step2: Disable the table created in step1 from the shell.
step3: Kill the active master.
step4: The backup master becomes active. While it is still checking in 
region servers, enable the table from the shell.

Result: the duplicate-open problem occurs.


I think the master should not serve requests until it has completed 
initialization.
We can add a method to HMasterInterface 
like:
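{noformat}
// In HMasterInterface:
public boolean isMasterAvailable();

  // In HMaster: the master is running, is the active master, and has
  // finished initialization, so it can provide service.
  public boolean isMasterAvailable() {
    return !isStopped() && isActiveMaster() && isInitialized();
  }
{noformat}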


When the client calls getMaster, it can check this method.

Please give me your suggestions, thanks.
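A minimal sketch of how a client-side check could use the proposed method. `MasterState` and `MasterAvailabilityCheck` are hypothetical names, not HBase types; `MasterState` only mirrors the three checks proposed above, and the polling loop with a timeout is an assumption about how a client might wait rather than what HBase's getMaster does.

```java
// Hypothetical types for illustration: MasterState is not an HBase
// interface; it only mirrors the three checks in the proposal above.
interface MasterState {
  boolean isStopped();
  boolean isActiveMaster();
  boolean isInitialized();

  // Available = running, active, and fully initialized.
  default boolean isMasterAvailable() {
    return !isStopped() && isActiveMaster() && isInitialized();
  }
}

final class MasterAvailabilityCheck {
  // Polls until the master reports itself available or timeoutMs elapses.
  // Returns true if the master became available in time.
  static boolean waitForAvailableMaster(MasterState master, long timeoutMs, long pauseMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!master.isMasterAvailable()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(pauseMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        return false;
      }
    }
    return true;
  }
}
```

A client that gets `false` back would refuse to send the enable request instead of racing with the new master's failover processing.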

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng

 If a region is assigned while the master is still initializing (before 
 processFailover runs), the region's open is handled twice, because the 
 unassigned node in ZooKeeper is handled again in 
 AssignmentManager#processFailover(). This leaves the region in RIT, so 
 the master never runs the balancer.


[jira] [Updated] (HBASE-5671) hbase.metrics.showTableName should be true by default


 [ 
https://issues.apache.org/jira/browse/HBASE-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5671:
-

Fix Version/s: (was: 0.96.0)

 hbase.metrics.showTableName should be true by default
 -

 Key: HBASE-5671
 URL: https://issues.apache.org/jira/browse/HBASE-5671
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: HBASE-5671_v1.patch




[jira] [Updated] (HBASE-5671) hbase.metrics.showTableName should be true by default


 [ 
https://issues.apache.org/jira/browse/HBASE-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5671:
-

Fix Version/s: 0.96.0

 hbase.metrics.showTableName should be true by default
 -

 Key: HBASE-5671
 URL: https://issues.apache.org/jira/browse/HBASE-5671
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5671_v1.patch




[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available


 [ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5666:
---

Attachment: HBASE-5666-v0.patch

Patch attached that retries only in HRegionServer, using 
hbase.basenode.avail.timeout as the configuration key.
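The patch itself is not shown here. As a minimal sketch of the idea, assuming a `BooleanSupplier` that stands in for a ZooKeeper existence check on the base znode (zookeeper.znode.parent), a bounded retry loop could look like the following. `BaseNodeWait` is a hypothetical name, and the exponential backoff is a design choice of this sketch, not necessarily what the patch does; the timeout would come from hbase.basenode.avail.timeout.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: BaseNodeWait and its parameters are illustrative
// names, not HBase code. The timeout is passed in directly here.
final class BaseNodeWait {
  private static final long MAX_PAUSE_MS = 1000;

  // Re-checks baseNodeExists until it returns true or timeoutMs elapses,
  // doubling the pause between checks (capped at MAX_PAUSE_MS).
  static boolean waitForBaseNode(BooleanSupplier baseNodeExists, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    long pauseMs = 10;
    while (!baseNodeExists.getAsBoolean()) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        return false; // give up: the caller can abort with a clear message
      }
      try {
        Thread.sleep(Math.min(pauseMs, remaining));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
      pauseMs = Math.min(pauseMs * 2, MAX_PAUSE_MS);
    }
    return true;
  }
}
```

Compared with a single check, this tolerates the race in the report where region servers start before the master has created the base znode.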

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v0.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log, zk-exists-refactor-v0.patch


 I have a script that starts HBase and a few region servers in distributed 
 mode (hbase.cluster.distributed = true):
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start.
 It seems that during RS startup the znode is not yet available, and 
 HRegionServer.initializeZooKeeper() checks only once whether the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}


[jira] [Commented] (HBASE-5213) hbase master stop does not bring down backup masters

2012-04-01 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243673#comment-13243673
 ] 

Hudson commented on HBASE-5213:
---

Integrated in HBase-0.92 #348 (See 
[https://builds.apache.org/job/HBase-0.92/348/])
HBASE-5213 hbase master stop does not bring down backup masters (Gregory) 
(Revision 1308012)

 Result = SUCCESS
jmhsieh : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java


 hbase master stop does not bring down backup masters
 --

 Key: HBASE-5213
 URL: https://issues.apache.org/jira/browse/HBASE-5213
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5213-v0-trunk.patch, HBASE-5213-v1-trunk.patch, 
 HBASE-5213-v2-90.patch, HBASE-5213-v2-92.patch, HBASE-5213-v2-trunk.patch


 Typing "hbase master stop" produces the following message:
 stop   Start cluster shutdown; Master signals RegionServer shutdown
 It seems like backup masters should be considered part of the cluster, but 
 they are not brought down by "hbase master stop".
 stop-hbase.sh does correctly bring down the backup masters.
 The same behavior is observed when a client app uses the client API 
 HBaseAdmin.shutdown() 
 (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#shutdown()); 
 this isn't too surprising, since I think "hbase master stop" just calls 
 this API.
 It seems like HBASE-1448 addressed this; perhaps there was a regression?
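To make the expected behavior concrete, here is a minimal sketch of a backup master's wait loop that also honors a cluster shutdown. It is an illustration under stated assumptions, not the committed fix: `BackupMasterWait` and its `Outcome` values are hypothetical, the two `BooleanSupplier`s stand in for cluster state that real HBase coordinates through ZooKeeper, and polling is a simplification.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: a backup master waiting to take over should also
// notice that the cluster is shutting down. Suppliers stand in for
// cluster state; polling is a simplification.
final class BackupMasterWait {
  enum Outcome { MAY_BECOME_ACTIVE, CLUSTER_SHUTDOWN, INTERRUPTED }

  static Outcome await(BooleanSupplier clusterUp,
                       BooleanSupplier activeMasterPresent,
                       long pollMs) {
    while (true) {
      // Check shutdown first so a stopping cluster never gains a new master.
      if (!clusterUp.getAsBoolean()) {
        return Outcome.CLUSTER_SHUTDOWN;
      }
      if (!activeMasterPresent.getAsBoolean()) {
        return Outcome.MAY_BECOME_ACTIVE;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return Outcome.INTERRUPTED;
      }
    }
  }
}
```

Checking the shutdown flag before the active-master flag means that when "hbase master stop" brings down the active master, a waiting backup exits rather than taking over.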


[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available


 [ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5666:
---

Status: Patch Available  (was: Open)

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v0.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log, zk-exists-refactor-v0.patch




[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-04-01 Thread chunhui shen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243675#comment-13243675
 ] 

chunhui shen commented on HBASE-5689:
-

bq. Is a TreeMap needed above? We're just remembering the mapping, right?
I first used ConcurrentHashMap, but the same region name mapped to 
different values, because byte[] keys are compared by identity rather than by content.
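The byte[] pitfall can be shown in a few lines of Java. Arrays inherit identity-based `equals()`/`hashCode()` from `Object`, so two arrays holding the same region name are different keys in any hash map, while a sorted map built with a content comparator treats them as one key. This sketch uses the JDK's `Arrays.compareUnsigned` (Java 9+) to stay self-contained; HBase code would typically pass `Bytes.BYTES_COMPARATOR` instead. `ByteArrayKeys` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class ByteArrayKeys {
  // Unsigned lexicographic order over the array contents.
  static final Comparator<byte[]> UNSIGNED_LEX = Arrays::compareUnsigned;

  // Puts two distinct arrays holding the same region name into the map
  // and returns how many entries the map ends up with.
  static int countEntries(Map<byte[], Long> map) {
    byte[] first = "region-a".getBytes(StandardCharsets.UTF_8);
    byte[] second = "region-a".getBytes(StandardCharsets.UTF_8);
    map.put(first, 1L);
    map.put(second, 2L);
    return map.size();
  }

  public static void main(String[] args) {
    System.out.println(countEntries(new HashMap<>()));             // prints 2
    System.out.println(countEntries(new TreeMap<>(UNSIGNED_LEX))); // prints 1
  }
}
```

If the mapping must be shared between threads, which may be why ConcurrentHashMap was tried first, `java.util.concurrent.ConcurrentSkipListMap` accepts the same comparator.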

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5689-simplified.txt, 5689-testcase.patch, 
 HBASE-5689.patch


 Consider the following scenario:
 1. The region is on server A.
 2. Put KV(r1-v1) to the region.
 3. Move the region from server A to server B.
 4. Put KV(r2-v2) to the region.
 5. Move the region from server B to server A.
 6. Put KV(r3-v3) to the region.
 7. kill -9 server B and restart it.
 8. kill -9 server A and restart it.
 9. Scan the region: we get only two KVs (r1-v1, r2-v2); the third, 
 KV(r3-v3), is lost.
 Let's analyze this scenario against the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in ServerShutdownHandler, 
 we create one RecoveredEdits file, f1, for the region.
 3. When we split server A's hlog file in ServerShutdownHandler, 
 we create another RecoveredEdits file, f2, for the region.
 4. However, RecoveredEdits file f2 is skipped when the region is initialized, in 
 HRegion#replayRecoveredEditsIfAny:
 {code}
    for (Path edits: files) {
      if (edits == null || !this.fs.exists(edits)) {
        LOG.warn("Null or non-existent edits file: " + edits);
        continue;
      }
      if (isZeroLengthThenDelete(this.fs, edits)) continue;
      if (checkSafeToSkip) {
        Path higher = files.higher(edits);
        long maxSeqId = Long.MAX_VALUE;
        if (higher != null) {
          // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
          String fileName = higher.getName();
          maxSeqId = Math.abs(Long.parseLong(fileName));
        }
        // The next file's name is taken as an upper bound for the sequence
        // ids in this file, so the whole file is skipped when that bound is
        // not above minSeqId.
        if (maxSeqId <= minSeqId) {
          String msg = "Maximum possible sequenceid for this log is " + maxSeqId
              + ", skipped the whole file, path=" + edits;
          LOG.debug(msg);
          continue;
        } else {
          checkSafeToSkip = false;
        }
      }
      // ... replay of this edits file follows (not shown in this excerpt)
    }
 {code}
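The skip rule can be exercised in isolation. This sketch re-applies the condition from the excerpt to file names only, treating each name as the sequence id the file is named after; no files are read. `RecoveredEditsSkipRule` is a hypothetical name, and the sequence ids below are made-up inputs chosen to mimic the scenario (f2 named 10 but also holding an edit with sequence id 22, f1 named 20, stores flushed up to 21). They are not taken from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class RecoveredEditsSkipRule {
  // Returns the file names the rule would skip without replaying them.
  static List<Long> skippedFiles(NavigableSet<Long> fileNames, long minSeqId) {
    List<Long> skipped = new ArrayList<>();
    boolean checkSafeToSkip = true;
    for (Long name : fileNames) {
      if (!checkSafeToSkip) {
        continue; // once one file is replayed, no later file is skipped
      }
      Long higher = fileNames.higher(name);
      long maxSeqId = (higher == null) ? Long.MAX_VALUE : Math.abs(higher.longValue());
      if (maxSeqId <= minSeqId) {
        skipped.add(name);
      } else {
        checkSafeToSkip = false;
      }
    }
    return skipped;
  }

  public static void main(String[] args) {
    NavigableSet<Long> names = new TreeSet<>(Arrays.asList(10L, 20L));
    // prints [10]: file 10 is skipped, including its hypothetical seqid-22 edit
    System.out.println(skippedFiles(names, 21));
  }
}
```

The rule is only safe if every edit in a file is below the next file's name. That assumption appears to fail when the files come from the logs of two different servers, as in steps 2 and 3 above.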
  


[jira] [Created] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread nkeywal (Created) (JIRA)

When creating a region, the master initializes it and creates a memstore within 
the master server
-

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor


I didn't do a complete analysis, but the attached patch saves more than 0.25s 
per region creation, and all the unit tests pass locally.


[jira] [Updated] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread nkeywal (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5693:
---

Attachment: 5693.v1.patch

 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch




[jira] [Updated] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread nkeywal (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5693:
---

Status: Patch Available  (was: Open)

 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch




[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available


 [ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5666:
---

Attachment: (was: zk-exists-refactor-v0.patch)

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log




[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available


 [ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5666:
---

Attachment: (was: HBASE-5666-v0.patch)

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, 
 hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, 
 hbase-zookeeper.log




[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available


 [ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5666:
---

Attachment: HBASE-5666-v1.patch

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log




[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-04-01 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243743#comment-13243743
 ] 

Ted Yu commented on HBASE-5689:
---

Using a TreeMap is common practice.

Please attach the test suite results; Hadoop QA is not working.

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5689-simplified.txt, 5689-testcase.patch, 
 HBASE-5689.patch




[jira] [Commented] (HBASE-4348) Add metrics for regions in transition


[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243745#comment-13243745
 ] 

jirapos...@reviews.apache.org commented on HBASE-4348:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4402/#review6604
---



src/main/java/org/apache/hadoop/hbase/HConstants.java
https://reviews.apache.org/r/4402/#comment14269

I think the trailing '.time' isn't needed. Take a look at existing config 
parameter names involving threshold:
{code}
src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java:
   this.thresholdIdleConnections = conf.getInt("ipc.client.idlethreshold", 4000);
src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java:
   "putsortreducer.row.threshold", 2L * (1<<30));
{code}




src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
https://reviews.apache.org/r/4402/#comment14270

Please add curly braces around the following line.



src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
https://reviews.apache.org/r/4402/#comment14271

Lift this line to line 2733.



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/4402/#comment14272

'out' isn't needed here. It would be nice to combine this sentence into the 
comment for this method.


- Ted


On 2012-03-30 05:21:12, Himanshu Vashishtha wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4402/
bq.  ---
bq.  
bq.  (Updated 2012-03-30 05:21:12)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch adds region-in-transition metrics to the HMaster 
metrics system. It also shows these metrics in the master UI, in the region-in-
transition section. I have attached the proposed new format to HBASE-4348.
bq.  
bq.  
bq.  This addresses bug HBase-4348.
bq.  https://issues.apache.org/jira/browse/HBase-4348
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon
 0dc0691 
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 21ac4ba 
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
64def15 
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9bd4ace 
bq.src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetrics.java 
83abc52 
bq.src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java 
91dce36 
bq.  
bq.  Diff: https://reviews.apache.org/r/4402/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran on a 5-node cluster, killing region servers randomly to observe the 
changes in the RIT metrics as emitted by the Master's MXBean.
bq.  
bq.  mvn test passes without any failure.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Himanshu
bq.  
bq.



 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob
 Fix For: 0.96.0

 Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, 
 RITs.png, RegionInTransitions2.png, metrics-v2.patch


 The following metrics would be useful for monitoring the master:
 - the number of regions in transition
 - the number of regions in transition that have been in transition for more 
 than a minute
 - how many seconds the oldest region in transition has been in transition
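All three proposed metrics can be derived in one pass over the regions currently in transition. A minimal sketch, assuming the master can supply a map from region name to the time (in ms) each region entered transition; the class `RitMetrics` and its methods are hypothetical and are not the code in the attached patches. The one-minute cutoff is passed in as a parameter rather than hard-coded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not HBase code: computes the three RIT metrics
// described in HBASE-4348 from a snapshot of regions in transition.
final class RitMetrics {
  final int ritCount;           // number of regions in transition
  final int ritOverThreshold;   // RITs that have lasted longer than thresholdMs
  final long oldestRitAgeMs;    // age of the oldest RIT in ms (0 if none)

  private RitMetrics(int ritCount, int ritOverThreshold, long oldestRitAgeMs) {
    this.ritCount = ritCount;
    this.ritOverThreshold = ritOverThreshold;
    this.oldestRitAgeMs = oldestRitAgeMs;
  }

  /**
   * @param ritStartTimes region name -> time (ms) the region entered transition
   * @param nowMs         current time in ms
   * @param thresholdMs   age after which an RIT counts as long-running,
   *                      e.g. 60000 for "more than a minute"
   */
  static RitMetrics compute(Map<String, Long> ritStartTimes, long nowMs, long thresholdMs) {
    int over = 0;
    long oldest = 0;
    for (long startMs : ritStartTimes.values()) {
      long ageMs = nowMs - startMs;
      if (ageMs > thresholdMs) {
        over++;
      }
      oldest = Math.max(oldest, ageMs);
    }
    return new RitMetrics(ritStartTimes.size(), over, oldest);
  }

  // The "how many seconds" form of the third metric.
  long oldestRitAgeSeconds() {
    return oldestRitAgeMs / 1000;
  }

  // Usage example with made-up timestamps (now = 100 s, threshold = 60 s).
  static RitMetrics example() {
    Map<String, Long> rits = new HashMap<>();
    rits.put("regionA", 1_000L);   // in transition for 99 s
    rits.put("regionB", 50_000L);  // 50 s
    rits.put("regionC", 95_000L);  // 5 s
    return compute(rits, 100_000L, 60_000L);
  }
}
```

Taking `nowMs` as a parameter keeps the calculation deterministic and easy to unit test; a running master would presumably pass `System.currentTimeMillis()`.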


[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server


[ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243746#comment-13243746
 ] 

Ted Yu commented on HBASE-5693:
---

CreateTableHandler isn't initializing the regions.
Who will initialize them?

 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch


 I didn't do a complete analysis, but the attached patch saves more than 0.25s 
 for each region creation and locally all the unit tests work.


[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region


[ 
https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243747#comment-13243747
 ] 

Ted Yu commented on HBASE-5677:
---

Interesting.
Chunhui proposed safe mode for Master in HBASE-5270. See 
https://issues.apache.org/jira/browse/HBASE-5270?focusedCommentId=13214394&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13214394

Can you verify that this issue has been fixed in 0.92.2?

Thanks

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng

 If a region is assigned while the master is still initializing (before 
 processFailover() runs), the region's open is handled twice, because the 
 unassigned znode in ZooKeeper is handled again in 
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.


[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread nkeywal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243751#comment-13243751
 ] 

nkeywal commented on HBASE-5693:


I didn't look very far into the code. CreateTableHandler is executed on the
master; it does not need to initialize the memstore and so on.
The underlying method is also called from the region server, and there
the initialization code is called. Maybe there is something more complex
I didn't see, but at least all the unit tests passed.



On Sun, Apr 1, 2012 at 5:28 PM, Ted Yu (Commented) (JIRA)



 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch


 I didn't do a complete analysis, but the attached patch saves more than 0.25s 
 for each region creation and locally all the unit tests work.


[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server


[ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243762#comment-13243762
 ] 

Ted Yu commented on HBASE-5693:
---

It is called from OpenRegionHandler.openRegion()

I once made some threads daemon threads; the change passed the unit tests but 
caused the master and region server to fail to start.

Testing on a real cluster is desirable.

 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch


 I didn't do a complete analysis, but the attached patch saves more than 0.25s 
 for each region creation and locally all the unit tests work.


[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243765#comment-13243765
 ] 

Ted Yu commented on HBASE-5693:
---

@N:
Can you rebase the patch for trunk?
{code}
Hunk #3 FAILED at 3613.
1 out of 3 hunks FAILED -- saving rejects to file 
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java.rej
{code}

 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch


 I didn't do a complete analysis, but the attached patch saves more than 0.25s 
 for each region creation and locally all the unit tests work.


[jira] [Commented] (HBASE-5688) Convert zk root-region-server znode content to pb


[ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243769#comment-13243769
 ] 

jirapos...@reviews.apache.org commented on HBASE-5688:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4600/#review6605
---


Looks good to me.

- Jimmy


On 2012-04-01 00:18:54, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4600/
bq.  ---
bq.  
bq.  (Updated 2012-04-01 00:18:54)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Changes the content of the root location znode, root-region-server, to be
bq.  four magic bytes ('PBUF') followed by a protobuf message that holds the
bq.  ServerName of the server currently hosting root.
bq.  
bq.  D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
bq.Removed. Had two methods, one to add the root-region-server znode and another
bq.to remove it.  Rather, put these methods in RootRegionTracker.  It
bq.tracks the root-region-server znode.  Having everything to do w/ root-region-server
bq.in one place is more cohesive.  It also makes it possible to encapsulate in one class
bq.everything to do w/ creating, deleting, and reading root-region-server.
bq.We also want to purge the catalog package (See note at head of
bq.CatalogTracker).
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.  M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
bq.  M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
bq.Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
bq.  A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
bq.Utility to do w/ protobuf handling.  Has methods to help prefix
bq.serialized protobuf messages with some 'magic' and strip it off again.
bq.  A 
src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
bq.PB generated.
bq.  M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
bq.Use new RootRegionTracker method for getting content of znode rather
bq.than do it all here (going via RootRegionTracker, we can keep how
bq.the znode content is serialized private to the RootRegionTracker class).
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
bq.Has the methods that used to be in RootLocationEditor plus a new
bq.  
bq.  
bq.  This addresses bug hbase-5688.
bq.  https://issues.apache.org/jira/browse/hbase-5688
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java 
c90864a 
bq.src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
b2a5463 
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
64def15 
bq.src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
9c215b4 
bq.src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 2f05005 
bq.src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java 
33e4e71 
bq.src/main/protobuf/ZooKeeper.proto PRE-CREATION 
bq.src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 
533b2bf 
bq.
src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java 
fe37156 
bq.src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java 
2132036 
bq.
src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4600/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.



 Convert zk root-region-server znode content to pb
 -

 Key: HBASE-5688
 URL: https://issues.apache.org/jira/browse/HBASE-5688
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.96.0

 Attachments: 5688.txt, 5688v4.txt


 Move the root-region-server znode content from the versioned bytes that 
 ServerName.getVersionedBytes outputs to instead be pb.
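The review summary describes the new znode content as four magic bytes ('PBUF') followed by a serialized protobuf message. A minimal sketch of that framing, assuming only this layout; the class `PbMagic` and its method names are hypothetical, not the ProtobufUtil API from the patch, and the protobuf payload is treated as opaque bytes so the example needs no generated classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of 'PBUF' magic framing for znode content.
final class PbMagic {
  // The four magic bytes named in the HBASE-5688 review summary.
  static final byte[] MAGIC = "PBUF".getBytes(StandardCharsets.US_ASCII);

  // Returns MAGIC followed by the serialized protobuf message.
  static byte[] prepend(byte[] pbBytes) {
    byte[] out = new byte[MAGIC.length + pbBytes.length];
    System.arraycopy(MAGIC, 0, out, 0, MAGIC.length);
    System.arraycopy(pbBytes, 0, out, MAGIC.length, pbBytes.length);
    return out;
  }

  // True if data starts with the magic bytes.
  static boolean hasMagicPrefix(byte[] data) {
    return data != null
        && data.length >= MAGIC.length
        && Arrays.equals(Arrays.copyOfRange(data, 0, MAGIC.length), MAGIC);
  }

  // Returns the protobuf payload after the magic, or throws if it is absent.
  static byte[] strip(byte[] data) {
    if (!hasMagicPrefix(data)) {
      throw new IllegalArgumentException("missing PBUF magic prefix");
    }
    return Arrays.copyOfRange(data, MAGIC.length, data.length);
  }
}
```

Checking the prefix before parsing would presumably let a reader tell pb-encoded content apart from content written in another format, such as the older versioned bytes.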


[jira] [Commented] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table

2012-04-01 Thread Matteo Bertozzi (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243771#comment-13243771
 ] 

Matteo Bertozzi commented on HBASE-5665:


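Can we also add a couple of methods to the region, like isSplittable() and 
isAvailable()?
{code}
// A region that is closed, or in the middle of closing, should not be used.
boolean isAvailable() {
  return !isClosed() && !isClosing();
}

// Splitting additionally requires that no reference files remain.
boolean isSplittable() {
  return isAvailable() && !hasReferences();
}
{code}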
{code}

Just to avoid similar problems in the future.
For example, in HRegionServer both getMostLoadedRegions() and closeUserRegions() 
do the same isAvailable() check.
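To show how the proposed helpers would remove the duplicated check, here is a minimal sketch with a hypothetical `RegionState` interface standing in for HRegion. Only `isClosed()`, `isClosing()`, `hasReferences()` and the two proposed helpers come from the comment; `StubRegion` and `RegionFilters` are illustrative names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HRegion exposing only the state the proposal uses.
interface RegionState {
  boolean isClosed();
  boolean isClosing();
  boolean hasReferences();

  // Proposed helper: usable only if neither closed nor closing.
  default boolean isAvailable() {
    return !isClosed() && !isClosing();
  }

  // Proposed helper: splitting also requires that no reference files remain.
  default boolean isSplittable() {
    return isAvailable() && !hasReferences();
  }
}

// Simple immutable implementation used for the example.
final class StubRegion implements RegionState {
  private final boolean closed;
  private final boolean closing;
  private final boolean references;

  StubRegion(boolean closed, boolean closing, boolean references) {
    this.closed = closed;
    this.closing = closing;
    this.references = references;
  }

  @Override public boolean isClosed() { return closed; }
  @Override public boolean isClosing() { return closing; }
  @Override public boolean hasReferences() { return references; }
}

final class RegionFilters {
  // One shared availability filter that callers could reuse instead of
  // repeating the closed/closing condition inline.
  static <R extends RegionState> List<R> available(List<R> regions) {
    List<R> result = new ArrayList<>();
    for (R region : regions) {
      if (region.isAvailable()) {
        result.add(region);
      }
    }
    return result;
  }
}
```

Centralizing the condition means a future change (for example, another state that should make a region unavailable) would only need to be made in one place.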

 Repeated split causes HRegionServer failures and breaks table 
 --

 Key: HBASE-5665
 URL: https://issues.apache.org/jira/browse/HBASE-5665
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Priority: Blocker
 Attachments: HBASE-5665-0.92.patch


 Repeated splits on large tables (2 consecutive would suffice) will 
 essentially break the table (and the cluster) unrecoverably.
 The regionserver doing the split dies and the master gets into an 
 infinite loop trying to assign regions whose files seem to be missing 
 from HDFS.
 The table can be disabled once; upon trying to re-enable it, it will remain 
 in an intermediate state forever.
 I was able to reproduce this on a smaller table consistently.
 {code}
 hbase(main):030:0> (0..1).each{|x| put 't1', "#{x}", 'f1:t', 'dd'}
 hbase(main):030:0> (0..1000).each{|x| split 't1', "#{x*10}"}
 {code}
 Running overlapping splits in parallel (e.g. #{x*10+1}, #{x*10+2}... ) 
 will reproduce the issue almost instantly and consistently. 
 {code}
 2012-03-28 10:57:16,320 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Offlined parent region t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1. in 
 META
 2012-03-28 10:57:16,321 DEBUG 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for 
 t1,5,1332957435767.648d30de55a5cec6fc2f56dcb3c7eee1..  
 compaction_queue=(0:1), split_queue=10
 2012-03-28 10:57:16,343 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1.; 
 Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 java.io.IOException: Failed 
 ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:363)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:451)
 at 
 org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.FileNotFoundException: File does not exist: 
 /hbase/t1/589c44cabba419c6ad8c9b427e5894e3.2fb0473f4e71339e88dab0ee0d4dffa1/f1/d62a852c25ad44e09518e102ca557237
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1813)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
 at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:341)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.init(StoreFile.java:1008)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader.init(HalfStoreFileReader.java:65)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:467)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
 at 
 org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:284)
 at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:221)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2511)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:450)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3229)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:504)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:484)
 ... 1 more
 2012-03-28 10:57:16,345 FATAL

[jira] [Commented] (HBASE-5688) Convert zk root-region-server znode content to pb

2012-04-01 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243774#comment-13243774
 ] 

jirapos...@reviews.apache.org commented on HBASE-5688:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4600/#review6606
---



src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
https://reviews.apache.org/r/4600/#comment14273

I think prefixedWithPBMagic would be a better name for this method.



src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java
https://reviews.apache.org/r/4600/#comment14274

Javadoc would be desirable.



src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java
https://reviews.apache.org/r/4600/#comment14275

White space.


- Ted


On 2012-04-01 00:18:54, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4600/
bq.  ---
bq.  
bq.  (Updated 2012-04-01 00:18:54)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Changes the content of the root location znode, root-region-server, to be
bq.  four magic bytes ('PBUF') followed by a protobuf message that holds the
bq.  ServerName of the server currently hosting root.
bq.  
bq.  D src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java
bq.Removed. Had two methods, one to add the root-region-server znode and another
bq.to remove it.  Rather, put these methods in RootRegionTracker.  It
bq.tracks the root-region-server znode.  Having everything to do w/ root-region-server
bq.in one place is more cohesive.  It also makes it possible to encapsulate in one class
bq.everything to do w/ creating, deleting, and reading root-region-server.
bq.We also want to purge the catalog package (See note at head of
bq.CatalogTracker).
bq.  M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
bq.  M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
bq.  M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
bq.Get root region location from RootRegionTracker rather than from 
RootLocationEditor.
bq.  A src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
bq.Utility to do w/ protobuf handling.  Has methods to help prefix
bq.serialized protobuf messages with some 'magic' and strip it off again.
bq.  A 
src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java
bq.PB generated.
bq.  M src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
bq.Use new RootRegionTracker method for getting content of znode rather
bq.than do it all here (going via RootRegionTracker, we can keep how
bq.the znode content is serialized private to the RootRegionTracker class).
bq.  M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
bq.Has the methods that used to be in RootLocationEditor plus a new
bq.  
bq.  
bq.  This addresses bug hbase-5688.
bq.  https://issues.apache.org/jira/browse/hbase-5688
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java 
c90864a 
bq.src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
b2a5463 
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
64def15 
bq.src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java 
PRE-CREATION 
bq.
src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 
PRE-CREATION 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
9c215b4 
bq.src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 2f05005 
bq.src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java 
33e4e71 
bq.src/main/protobuf/ZooKeeper.proto PRE-CREATION 
bq.src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 
533b2bf 
bq.
src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTrackerOnCluster.java 
fe37156 
bq.src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java 
2132036 
bq.
src/test/java/org/apache/hadoop/hbase/zookeeper/TestRootRegionTracker.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4600/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Michael
bq.  
bq.



 Convert zk root-region-server znode content to pb
 -

 Key: HBASE-5688
 URL: https://issues.apache.org/jira/browse/HBASE-5688
 Project: HBase
  Issue Type: Task
Reporter:

[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread nkeywal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243775#comment-13243775
 ] 

nkeywal commented on HBASE-5693:


OK, I will do that, plus a test on a real cluster.

On Sun, Apr 1, 2012 at 6:12 PM, Ted Yu (Commented) (JIRA)



 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch


 I didn't do a complete analysis, but the attached patch saves more than 0.25s 
 for each region creation and locally all the unit tests work.


[jira] [Commented] (HBASE-4348) Add metrics for regions in transition

2012-04-01 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243798#comment-13243798
 ] 

jirapos...@reviews.apache.org commented on HBASE-4348:
--



bq.  On 2012-04-01 15:13:59, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/HConstants.java, line 655
bq.   https://reviews.apache.org/r/4402/diff/6/?file=97548#file97548line655
bq.  
bq.   I think the trailing '.time' isn't needed. Take a look at existing 
config parameter names involving threshold:
bq.   {code}
bq.  this.thresholdIdleConnections = 
conf.getInt("ipc.client.idlethreshold", 4000);
bq.   src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
bq.   "putsortreducer.row.threshold", 2L * (1<<30));
bq.   src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java
bq.   {code}
bq.  

Actually, looking at the metric name without context, 
hbase.metrics.rit.threshold makes me think this is the maximum number of 
regions in transition.  With the .time suffix, it makes me think it is the max 
time for an RIT, which also isn't quite right.  If all things are in millis then 
we probably don't need units, but it doesn't hurt IMO. What do you think of 
something like: hbase.metrics.rit.refresh.millis, 
hbase.metrics.rit.refresh.threshold.millis, or 
hbase.metrics.rit.refresh.threshold?
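A unit suffix also makes the lookup self-documenting at the call site. A minimal sketch, using `java.util.Properties` as a stand-in for Hadoop's `Configuration` so it runs without HBase on the classpath; the key is one of the candidate names proposed in this thread and the one-minute default comes from the issue description, so both are assumptions rather than committed values.

```java
import java.util.Properties;

// Hypothetical sketch: reads an RIT threshold whose unit is encoded in the key.
final class RitConfig {
  // One of the candidate names from the review; not a confirmed HBase key.
  static final String THRESHOLD_KEY = "hbase.metrics.rit.refresh.threshold.millis";
  // Assumed default: one minute, matching the "more than a minute" metric.
  static final long DEFAULT_THRESHOLD_MILLIS = 60_000L;

  // Returns the configured threshold in milliseconds, or the default if unset.
  static long thresholdMillis(Properties conf) {
    String value = conf.getProperty(THRESHOLD_KEY);
    if (value == null || value.trim().isEmpty()) {
      return DEFAULT_THRESHOLD_MILLIS;
    }
    return Long.parseLong(value.trim());
  }
}
```

With Hadoop's `Configuration` the equivalent would be a single `conf.getLong(key, defaultValue)` call; the `.millis` suffix then tells the reader the unit without looking up the constant.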


- jmhsieh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4402/#review6604
---


On 2012-03-30 05:21:12, Himanshu Vashishtha wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4402/
bq.  ---
bq.  
bq.  (Updated 2012-03-30 05:21:12)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch adds region-in-transition metrics to the HMaster metrics 
system. It also shows these metrics in the master UI, in the Regions in 
Transition section. I have attached the proposed new format to HBASE-4348.
bq.  
bq.  
bq.  This addresses bug HBase-4348.
bq.  https://issues.apache.org/jira/browse/HBase-4348
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon
 0dc0691 
bq.src/main/java/org/apache/hadoop/hbase/HConstants.java 21ac4ba 
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
64def15 
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9bd4ace 
bq.src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetrics.java 
83abc52 
bq.src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java 
91dce36 
bq.  
bq.  Diff: https://reviews.apache.org/r/4402/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran on a 5-node cluster, killing region servers randomly to observe the 
changes in the RIT metrics as emitted by the Master's MXBean.
bq.  
bq.  mvn test passes without any failure.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Himanshu
bq.  
bq.



 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob
 Fix For: 0.96.0

 Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, 
 RITs.png, RegionInTransitions2.png, metrics-v2.patch


 The following metrics would be useful for monitoring the master:
 - the number of regions in transition
 - the number of regions in transition that have been in transition for more 
 than a minute
 - how many seconds the oldest region in transition has been in transition


[jira] [Commented] (HBASE-4348) Add metrics for regions in transition

2012-04-01 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243801#comment-13243801
 ] 

Jonathan Hsieh commented on HBASE-4348:
---

@Otis  I generally like the policy of setting versions on commit, or having a 
release manager set it if they decide it is necessary for a release. 

 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob
 Fix For: 0.96.0

 Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, 
 RITs.png, RegionInTransitions2.png, metrics-v2.patch


 The following metrics would be useful for monitoring the master:
 - the number of regions in transition
 - the number of regions in transition that have been in transition for more 
 than a minute
 - how many seconds the oldest region in transition has been in transition


[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

[
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243836#comment-13243836
]

Hadoop QA commented on HBASE-5693:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12520822/5693.v1.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

-1 patch. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1358//console

This message is automatically generated.

When creating a region, the master initializes it and creates a memstore
within the master server
-

Key: HBASE-5693
URL: https://issues.apache.org/jira/browse/HBASE-5693
Project: HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: 5693.v1.patch

I didn't do a complete analysis, but the attached patch saves more than 0.25s
for each region creation and locally all the unit tests work.


[jira] [Updated] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table


 [ 
https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5665:
---

Attachment: HBASE-5665-trunk.patch

 Repeated split causes HRegionServer failures and breaks table 
 --

 Key: HBASE-5665
 URL: https://issues.apache.org/jira/browse/HBASE-5665
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Priority: Blocker
 Attachments: HBASE-5665-0.92.patch, HBASE-5665-trunk.patch


 Repeated splits on large tables (2 consecutive would suffice) will 
 essentially break the table (and the cluster) unrecoverably.
 The regionserver doing the split dies and the master gets into an 
 infinite loop trying to assign regions whose files seem to be missing 
 from HDFS.
 The table can be disabled once; upon trying to re-enable it, it will remain 
 in an intermediate state forever.
 I was able to reproduce this on a smaller table consistently.
 {code}
 hbase(main):030:0 (0..1).each{|x| put 't1', #{x}, 'f1:t', 'dd'}
 hbase(main):030:0 (0..1000).each{|x| split 't1', #{x*10}}
 {code}
 Running overlapping splits in parallel (e.g. #{x*10+1}, #{x*10+2}... ) 
 will reproduce the issue almost instantly and consistently. 
 {code}
 2012-03-28 10:57:16,320 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Offlined parent region t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1. in 
 META
 2012-03-28 10:57:16,321 DEBUG 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for 
 t1,5,1332957435767.648d30de55a5cec6fc2f56dcb3c7eee1..  
 compaction_queue=(0:1), split_queue=10
 2012-03-28 10:57:16,343 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1.; 
 Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 java.io.IOException: Failed 
 ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:363)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:451)
 at 
 org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.FileNotFoundException: File does not exist: 
 /hbase/t1/589c44cabba419c6ad8c9b427e5894e3.2fb0473f4e71339e88dab0ee0d4dffa1/f1/d62a852c25ad44e09518e102ca557237
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.init(DFSClient.java:1813)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
 at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:341)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.init(StoreFile.java:1008)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader.init(HalfStoreFileReader.java:65)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:467)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
 at 
 org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:284)
 at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:221)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2511)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:450)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3229)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:504)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:484)
 ... 1 more
 2012-03-28 10:57:16,345 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 ld2,60020,1332957343833: Abort; we got an error after point-of-no-return
 {code}
 http://hastebin.com/diqinibajo.avrasm
 Later edit:
 (I'm using the last 4 characters of each identifier.)
 Region 94e3 has storefile 7237.
 Region 94e3 is split into daughters a: ffa1 and b: eee1.
 Daughter region ffa1 is split into daughters a: 3124 and b: dc77.

[jira] [Updated] (HBASE-4393) Implement a canary monitoring program


 [ 
https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-4393:
---

Status: Patch Available  (was: Open)

 Implement a canary monitoring program
 -

 Key: HBASE-4393
 URL: https://issues.apache.org/jira/browse/HBASE-4393
 Project: HBase
  Issue Type: New Feature
  Components: monitoring
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Matteo Bertozzi
 Attachments: Canary-v0.java, HBaseCanary.java


 This JIRA is to implement a standalone program that can be used to do canary 
 monitoring of a running HBase cluster. This program would gather a list of 
 the regions in the cluster, then iterate over them doing lightweight 
 operations (e.g. short scans) to provide metrics about latency as well as alert 
 on availability issues.
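 A minimal sketch of the loop described above, assuming a hypothetical `RegionProbe` interface that performs one lightweight read (such as a short scan) against a named region. It is not HBase's client API and not the attached HBaseCanary.java. Each probe is timed with `System.nanoTime()`; successful latencies are recorded, and regions whose probe throws are collected so the caller can alert on them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CanarySketch {

    /** Hypothetical probe: one lightweight read against a region, throwing on failure. */
    public interface RegionProbe {
        void probe(String regionName) throws Exception;
    }

    /** Outcome of one canary pass over all regions. */
    public static final class Report {
        public final Map<String, Long> latencyMicros = new LinkedHashMap<>();
        public final List<String> failedRegions = new ArrayList<>();
    }

    /** Probes every region once, recording latency on success and the region name on failure. */
    public static Report runOnce(List<String> regions, RegionProbe probe) {
        Report report = new Report();
        for (String region : regions) {
            long start = System.nanoTime();
            try {
                probe.probe(region);
                long elapsedMicros = (System.nanoTime() - start) / 1_000L;
                report.latencyMicros.put(region, elapsedMicros);
            } catch (Exception e) {
                // Availability problem: remember the region so the caller can alert on it.
                report.failedRegions.add(region);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> regions = List.of("t1,,1", "t1,5,2", "t1,9,3");
        // Fake probe for demonstration only: the middle region is "unavailable".
        Report r = runOnce(regions, name -> {
            if (name.startsWith("t1,5")) {
                throw new java.io.IOException("region not online: " + name);
            }
        });
        System.out.println("probed ok=" + r.latencyMicros.keySet() + " failed=" + r.failedRegions);
    }
}
```

 A real canary would run this pass periodically and export the latencies as metrics; keeping the probe behind an interface makes the loop testable without a cluster.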

[jira] [Commented] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table


[ 
https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243842#comment-13243842
 ] 

Ted Yu commented on HBASE-5665:
---

HBASE-5665-trunk.patch looks good.

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243845#comment-13243845
 ] 

Ted Yu commented on HBASE-5666:
---

{code}
+if (keeperEx != null)
+  throw keeperEx;
{code}
Please either lift the throw to the same line as if or add curly braces.
{code}
+checkExists(zk, parentZNode, maxTimeMs);
+LOG.info("Parent znode exists: " + parentZNode);
{code}
If checkExists() returns -1, would the log statement still be true?

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I have a script that starts HBase and a couple of region servers in 
 distributed mode (hbase.cluster.distributed = true):
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers fail to start.
 It seems that while the RS is starting, the base znode is not yet available, 
 and HRegionServer.initializeZooKeeper() checks only once whether it is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

[jira] [Resolved] (HBASE-5681) Split Region crash if region is still offline after a previous split

2012-04-01 Thread Matteo Bertozzi (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi resolved HBASE-5681.


Resolution: Duplicate

Duplicate of HBASE-5665: when trying to split the parent region, hasReferences() 
is true, so the split shouldn't be done.
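The resolution above relies on refusing to split a region whose store files are still references into its parent. Below is a minimal, self-contained model of that rule; `StoreFileInfo` and `canSplit` are hypothetical names for illustration, not HBase classes (the real check is the `hasReferences()` call mentioned in the comment). The region and file labels reuse the short identifiers from HBASE-5665.

```java
import java.util.List;

public class SplitGuardSketch {

    /** Hypothetical view of a store file: a plain HFile, or a reference to half of a parent's HFile. */
    public static final class StoreFileInfo {
        private final String name;
        private final boolean reference;

        public StoreFileInfo(String name, boolean reference) {
            this.name = name;
            this.reference = reference;
        }

        public String name() { return name; }

        public boolean isReference() { return reference; }
    }

    /** True while any store file still points into the parent region's files. */
    public static boolean hasReferences(List<StoreFileInfo> files) {
        return files.stream().anyMatch(StoreFileInfo::isReference);
    }

    /**
     * In this model a region may only be split after compaction has replaced all of its
     * reference files with real files; otherwise its daughters would need references to
     * references, which the model (like the hasReferences() check above) does not allow.
     */
    public static boolean canSplit(List<StoreFileInfo> files) {
        return !hasReferences(files);
    }

    public static void main(String[] args) {
        // Daughter ffa1 right after the first split: it only holds a reference to 94e3's storefile 7237.
        List<StoreFileInfo> daughter = List.of(new StoreFileInfo("7237", true));
        // The same daughter after a (hypothetical) compaction rewrote the data into its own file.
        List<StoreFileInfo> compacted = List.of(new StoreFileInfo("new-hfile", false));
        System.out.println("split right away? " + canSplit(daughter));
        System.out.println("split after compaction? " + canSplit(compacted));
    }
}
```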

 Split Region crash if region is still offline after a previous split
 

 Key: HBASE-5681
 URL: https://issues.apache.org/jira/browse/HBASE-5681
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.92.1, 0.96.0, 0.94.1
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Attachments: logs0-HBASE-5681.tar.bz2, logs1-HBASE-5681.tar.bz2


 I have a script that starts HBase and a couple of region servers in 
 distributed mode (hbase.cluster.distributed = true). Because of HBASE-5666, I 
 need a sleep to ensure that the region servers are up.
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 sleep 5 # bug HBASE-5666 rs doesn't retry if znode is not available.
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 Once HBase is started, I run an HBase shell script file (see below), and 
 everything is fine until the last split operation.
 {code}
 # $HBASE_HOME/bin/hbase shell test.hbase
 # test.hbase
 create 'bugtb-t1', 'tcf11', 'tcf12'
 create 'bugtb-t2', 'tcf11', 'tcf12'
 put 'bugtb-t1', '10', 'tcf11:c1', 'a'
 put 'bugtb-t1', '15', 'tcf11:c2', 'b'
 put 'bugtb-t1', '20', 'tcf11:c1', 'c'
 put 'bugtb-t1', '30', 'tcf11:c2', 'd'
 put 'bugtb-t1', '35', 'tcf11:c1', 'e'
 put 'bugtb-t1', '40', 'tcf11:c2', 'f'
 put 'bugtb-t2', '10', 'tcf11:c1', 'a'
 put 'bugtb-t2', '15', 'tcf11:c2', 'b'
 put 'bugtb-t2', '20', 'tcf11:c1', 'c'
 put 'bugtb-t2', '30', 'tcf11:c2', 'd'
 put 'bugtb-t2', '35', 'tcf11:c1', 'e'
 put 'bugtb-t2', '40', 'tcf11:c2', 'f'
 split 'bugtb-t1', '20'
 split 'bugtb-t2', '20'
 split 'bugtb-t1', '40'
 {code}
 During the last split the region is still offline, and you get the following 
 exception (if you sleep a bit before executing the last split, everything is 
 fine):
 {code}
 ERROR: org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hbase.NotServingRegionException: Region is not online: 
 bugtb-t1,,1333134892936.4e14c2cf4293156d5b099dc3d5c44890.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3123)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2926)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1383)
 {code}

[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread stack (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243853#comment-13243853
]

stack commented on HBASE-5693:
--

+1 on patch. It's silly that we initialize the region over on the master on creation.

bq. CreateTableHandler isn't initializing the regions. Who will initialize them?

The regionserver, when it's assigned the region.

bq. Maybe there is something more complex I didn't see, but at least all the
unit tests went well.

Nothing complex here.

Regarding testing on a real cluster: not necessary for a patch this small. Unit
tests run clusters anyway.

Please make a patch that applies N and run it by hadoopqa. Thanks.

[jira] [Commented] (HBASE-5688) Convert zk root-region-server znode content to pb

[
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243856#comment-13243856
]

Hadoop QA commented on HBASE-5688:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12520807/5688v4.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 11 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
org.apache.hadoop.hbase.master.TestSplitLogManager

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1359//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1359//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1359//console

This message is automatically generated.

Convert zk root-region-server znode content to pb
-

Key: HBASE-5688
URL: https://issues.apache.org/jira/browse/HBASE-5688
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: stack
Fix For: 0.96.0

Attachments: 5688.txt, 5688v4.txt

Change the root-region-server znode content from the versioned bytes that
ServerName.getVersionedBytes outputs to pb (protobuf).

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-01 Thread Matteo Bertozzi (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243855#comment-13243855
 ] 

Matteo Bertozzi commented on HBASE-5666:


@Ted wow, good catch.
I just translated the method without thinking, and this simplified version 
exposes a problem that was already present in the previous version.
In the original version the LOG.info is inside the if, which is fine, but what 
happens if the method returns and the znode is not available?
No exception is raised. I think the caller of waitForXyz() expects an exception 
in case of timeout; otherwise, the value I'm waiting for must be present.
(This function is only called by one test.)
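The contract described here (return only once the znode exists, otherwise fail loudly on timeout) can be sketched as follows. This is a minimal illustration, not the HBASE-5666 patch: the existence check is passed in as a `BooleanSupplier` instead of a ZooKeeper call, and `waitForZNode` is a hypothetical name. Because the method either returns with the node present or throws, a "Parent znode exists" log line after it can never be wrong, which also addresses the question about checkExists() returning -1.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForZNodeSketch {

    /**
     * Polls {@code exists} until it returns true or {@code maxTimeMs} elapses.
     * Returns normally only when the node exists; otherwise throws, so callers
     * never proceed as if the node were present.
     */
    public static void waitForZNode(String path, BooleanSupplier exists, long maxTimeMs, long pollMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + maxTimeMs;
        while (true) {
            if (exists.getAsBoolean()) {
                return;
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new TimeoutException("znode " + path + " not available after " + maxTimeMs + " ms");
            }
            Thread.sleep(Math.min(pollMs, remaining));
        }
    }

    public static void main(String[] args) throws Exception {
        String parentZNode = "/hbase";
        int[] calls = {0};
        // Fake check for demonstration: the node "appears" on the third poll.
        waitForZNode(parentZNode, () -> ++calls[0] >= 3, 1_000L, 10L);
        // Safe to log unconditionally: we only reach this line if the node exists.
        System.out.println("Parent znode exists: " + parentZNode);
    }
}
```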

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available


 [ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5666:
---

Attachment: HBASE-5666-v2.patch

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, 
 hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243863#comment-13243863
 ] 

Ted Yu commented on HBASE-5666:
---

Patch v2 looks good.

[jira] [Created] (HBASE-5694) getRowsWithColumnsTs function Thrift service incorrectly handles time range

2012-04-01 Thread Wouter Bolsterlee (Created) (JIRA)

getRowsWithColumnsTs function Thrift service incorrectly handles time range
---

 Key: HBASE-5694
 URL: https://issues.apache.org/jira/browse/HBASE-5694
 Project: HBase
  Issue Type: Bug
  Components: thrift
Affects Versions: 0.92.1
Reporter: Wouter Bolsterlee
 Fix For: 0.92.2


The getRowsWithColumnsTs() method in the Thrift interface only applies the 
timestamp if columns are explicitly specified. However, this method also allows 
columns to be unspecified (this is even used internally to implement e.g. 
getRows()). The cause of the bug is a minor scoping issue: the time range is 
set inside the wrong if statement.
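To illustrate the scoping issue, here is a small self-contained model of the loop. `FakeGet` is a hypothetical stand-in for HBase's `Get` (whose `addColumn` and `setTimeRange(long, long)` methods the real ThriftServer code calls), and the two builder methods contrast the buggy and the fixed placement of the time-range call. This is a sketch, not the actual ThriftServer code.

```java
import java.util.ArrayList;
import java.util.List;

public class ThriftTimeRangeSketch {

    /** Hypothetical stand-in for HBase's Get: records requested columns and the time range. */
    public static final class FakeGet {
        public final String row;
        public final List<String> columns = new ArrayList<>();
        public long minStamp = 0L;
        public long maxStamp = Long.MAX_VALUE; // unbounded until a time range is set

        public FakeGet(String row) { this.row = row; }

        public void addColumn(String column) { columns.add(column); }

        public void setTimeRange(long min, long max) { minStamp = min; maxStamp = max; }
    }

    /** Buggy shape: the time range is applied only when columns are given. */
    public static FakeGet buildBuggy(String row, List<String> columns, long timestamp) {
        FakeGet get = new FakeGet(row);
        if (columns != null && !columns.isEmpty()) {
            for (String c : columns) {
                get.addColumn(c);
            }
            get.setTimeRange(Long.MIN_VALUE, timestamp);
        }
        return get;
    }

    /** Fixed shape: the time range is applied whether or not columns are given. */
    public static FakeGet buildFixed(String row, List<String> columns, long timestamp) {
        FakeGet get = new FakeGet(row);
        if (columns != null && !columns.isEmpty()) {
            for (String c : columns) {
                get.addColumn(c);
            }
        }
        get.setTimeRange(Long.MIN_VALUE, timestamp);
        return get;
    }

    public static void main(String[] args) {
        // getRows()-style call with no columns: the buggy version silently ignores the timestamp.
        FakeGet buggy = buildBuggy("row1", null, 42L);
        FakeGet fixed = buildFixed("row1", null, 42L);
        System.out.println("buggy maxStamp=" + buggy.maxStamp + ", fixed maxStamp=" + fixed.maxStamp);
    }
}
```

Moving the call out of the `if` is the whole fix; the column handling is unchanged.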

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

[
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243871#comment-13243871
]

Hadoop QA commented on HBASE-5666:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12520825/HBASE-5666-v1.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.util.TestHBaseFsck
org.apache.hadoop.hbase.TestZooKeeper

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1360//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1360//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1360//console

This message is automatically generated.

[jira] [Updated] (HBASE-5694) getRowsWithColumnsTs function Thrift service incorrectly handles time range

2012-04-01 Thread Wouter Bolsterlee (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wouter Bolsterlee updated HBASE-5694:
-

Status: Patch Available  (was: Open)

Trivial patch to fix the reported issue.

[jira] [Commented] (HBASE-5694) getRowsWithColumnsTs function Thrift service incorrectly handles time range

2012-04-01 Thread Wouter Bolsterlee (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243873#comment-13243873
 ] 

Wouter Bolsterlee commented on HBASE-5694:
--

For some reason, JIRA doesn't accept my patch file in the upload dialog. Here 
it is:


--- ThriftServer.java.orig  2012-04-01 23:41:16.881172406 +0200
+++ ThriftServer.java   2012-04-01 23:41:30.177238337 +0200
@@ -477,8 +477,8 @@
 get.addColumn(famAndQf[0], famAndQf[1]);
   }
 }
-get.setTimeRange(Long.MIN_VALUE, timestamp);
   }
+  get.setTimeRange(Long.MIN_VALUE, timestamp);
   gets.add(get);
 }
 Result[] result = table.get(gets);


 getRowsWithColumnsTs function Thrift service incorrectly handles time range
 ---

 Key: HBASE-5694
 URL: https://issues.apache.org/jira/browse/HBASE-5694
 Project: HBase
  Issue Type: Bug
  Components: thrift
Affects Versions: 0.92.1
Reporter: Wouter Bolsterlee
 Fix For: 0.92.2


 The getRowsWithColumnsTs() method in the Thrift interface only applies the 
 timestamp if columns are explicitly specified. However, this method also 
 allows for columns to be unspecified (this is even used internally to 
 implement e.g. getRows()). The cause of the bug is a minor scoping issue: the 
 time range is set inside a wrong if statement.

[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-04-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243874#comment-13243874
 ] 

stack commented on HBASE-5682:
--

On commit, change this '+  LOG.debug("Abort", t);' to include the passed-in 
msg?

Else, +1 on the patch.  Let me ask N if he thinks TRUNK can pick up anything 
from this patch (maybe his keepalive should do this auto-reconnect but maybe it 
doesn't need it).  What were you doing when it was taking a long time to recover?




 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt


 Just realized that without this, HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport).
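The problem described above is a long-lived connection that becomes permanently unusable after a transient ZooKeeper session loss. A common way to avoid it is to drop the broken session and re-create it lazily on next use. The following is a minimal, self-contained sketch of that pattern only. `ReconnectingHolder`, `abort` and the `Supplier`-based factory are hypothetical names; they are not HBase's `HConnectionImplementation` API or the attached patch. The `abort` method also logs the passed-in message, as suggested in the review comment above.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch: holds a lazily created resource (standing in for a
 * ZooKeeper-backed session) and re-creates it on the next get() after an
 * abort, instead of leaving the owner permanently unusable.
 */
public class ReconnectingHolder<T> {
    private final Supplier<T> factory;
    private T current;          // null means "not connected"
    private int creations = 0;  // how many times the factory has been invoked

    public ReconnectingHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the live resource, creating a fresh one if the previous was aborted. */
    public synchronized T get() {
        if (current == null) {
            current = factory.get();
            creations++;
        }
        return current;
    }

    /** Called when the session is lost: drop it so the next get() reconnects. */
    public synchronized void abort(String msg, Throwable t) {
        // Include the passed-in msg, not just the throwable, so the reason is visible.
        System.err.println("Abort: " + msg + (t == null ? "" : " (" + t + ")"));
        current = null;
    }

    public synchronized int creations() {
        return creations;
    }

    public static void main(String[] args) {
        ReconnectingHolder<Object> holder = new ReconnectingHolder<>(Object::new);
        Object first = holder.get();
        holder.abort("session expired", null);
        Object second = holder.get();
        System.out.println(first != second);    // true: a new session object was created
        System.out.println(holder.creations()); // 2
    }
}
```

The design choice here is to recover on demand rather than reconnect eagerly from a watcher thread. That keeps the sketch simple, at the cost of the first caller after a loss paying the reconnect latency.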

[jira] [Updated] (HBASE-5694) getRowsWithColumnsTs function Thrift service incorrectly handles time range

2012-04-01 Thread Wouter Bolsterlee (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wouter Bolsterlee updated HBASE-5694:
-

Attachment: HBASE-5694.patch

Okay, here's the patch. For some reason it works in Firefox, but not in 
Epiphany.

Explanation for the patch: set time range regardless of column specification, 
making the time range actually work when no columns are specified.
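The scoping fix can be illustrated with a self-contained sketch. `SimpleGet` and `buildGet` below are hypothetical simplifications written for this note; they are not HBase's `Get` class or the actual `ThriftServer` code. The point is only that the time range must be applied outside the branch that handles explicitly specified columns, so that it also takes effect when no columns are given.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangeScopingDemo {
    /** Hypothetical stand-in for a Get: records requested columns and a time range. */
    static class SimpleGet {
        final String row;
        final List<String> columns = new ArrayList<>();
        long minStamp = 0L;              // hypothetical default lower bound
        long maxStamp = Long.MAX_VALUE;  // hypothetical default upper bound

        SimpleGet(String row) { this.row = row; }

        void addColumn(String column) { columns.add(column); }

        void setTimeRange(long min, long max) {
            minStamp = min;
            maxStamp = max;
        }
    }

    /** Builds a get the way the patched code does: time range set regardless of columns. */
    static SimpleGet buildGet(String row, List<String> columns, long timestamp) {
        SimpleGet get = new SimpleGet(row);
        if (columns != null) {
            for (String column : columns) {
                get.addColumn(column);
            }
            // Before the fix, setTimeRange was called here, so a request without
            // columns silently ignored the requested timestamp.
        }
        get.setTimeRange(Long.MIN_VALUE, timestamp);
        return get;
    }

    public static void main(String[] args) {
        SimpleGet noColumns = buildGet("row1", null, 42L);
        System.out.println(noColumns.maxStamp); // 42
    }
}
```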

 getRowsWithColumnsTs function Thrift service incorrectly handles time range
 ---

 Key: HBASE-5694
 URL: https://issues.apache.org/jira/browse/HBASE-5694
 Project: HBase
  Issue Type: Bug
  Components: thrift
Affects Versions: 0.92.1
Reporter: Wouter Bolsterlee
 Fix For: 0.92.2

 Attachments: HBASE-5694.patch


 The getRowsWithColumnsTs() method in the Thrift interface only applies the 
 timestamp if columns are explicitly specified. However, this method also 
 allows for columns to be unspecified (this is even used internally to 
 implement e.g. getRows()). The cause of the bug is a minor scoping issue: the 
 time range is set inside a wrong if statement.

[jira] [Commented] (HBASE-5663) MultithreadedTableMapper doesn't work.


[ 
https://issues.apache.org/jira/browse/HBASE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243882#comment-13243882
 ] 

Hadoop QA commented on HBASE-5663:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12520786/5663%2B5636.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1361//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1361//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1361//console

This message is automatically generated.

 MultithreadedTableMapper doesn't work.
 --

 Key: HBASE-5663
 URL: https://issues.apache.org/jira/browse/HBASE-5663
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Takuya Ueshin
Assignee: Takuya Ueshin
 Fix For: 0.94.0, 0.96.0

 Attachments: 5663+5636.txt, HBASE-5663.patch


 MapReduce job using MultithreadedTableMapper goes down throwing the following 
 Exception:
 {noformat}
 java.io.IOException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:260)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.run(MultithreadedTableMapper.java:133)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.mapreduce.Mapper$Context.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.mapred.TaskAttemptID, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordReader,
  
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapRecordWriter,
  org.apache.hadoop.hbase.mapreduce.TableOutputCommitter, 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$SubMapStatusReporter,
  org.apache.hadoop.hbase.mapreduce.TableSplit)
   at java.lang.Class.getConstructor0(Class.java:2706)
   at java.lang.Class.getConstructor(Class.java:1657)
   at 
 org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper$MapRunner.<init>(MultithreadedTableMapper.java:241)
   ... 8 more
 {noformat}
 This occurred when the tasks were creating MapRunner threads.
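The `<init>` in the trace is how Java names a constructor. `Class.getConstructor` throws `NoSuchMethodException` when no public constructor matches the requested parameter types exactly. That is presumably what happens here: the `Mapper.Context` on the classpath does not declare the constructor signature the code looks up. The sketch below reproduces that failure mode in isolation. `ConstructorLookupDemo`, `Context` and `tryLookup` are hypothetical names, not Hadoop's classes.

```java
import java.lang.reflect.Constructor;

public class ConstructorLookupDemo {
    /** Hypothetical class whose only public constructor takes a single String. */
    public static class Context {
        public Context(String conf) {}
    }

    /** Looks up a public constructor reflectively and reports the outcome. */
    static String tryLookup(Class<?> cls, Class<?>... params) {
        try {
            Constructor<?> c = cls.getConstructor(params);
            return "found: " + c.getParameterCount() + " params";
        } catch (NoSuchMethodException e) {
            // The exception message names the missing constructor as <Class>.<init>(<param types>).
            return "missing: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryLookup(Context.class, String.class));                // found: 1 params
        System.out.println(tryLookup(Context.class, String.class, Integer.class)); // missing: ...<init>(...)
    }
}
```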

[jira] [Commented] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table


[ 
https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243898#comment-13243898
 ] 

Hadoop QA commented on HBASE-5665:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12520847/HBASE-5665-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1362//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1362//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1362//console

This message is automatically generated.

 Repeated split causes HRegionServer failures and breaks table 
 --

 Key: HBASE-5665
 URL: https://issues.apache.org/jira/browse/HBASE-5665
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Priority: Blocker
 Attachments: HBASE-5665-0.92.patch, HBASE-5665-trunk.patch


 Repeated splits on large tables (2 consecutive would suffice) will 
 essentially break the table (and the cluster) irrecoverably.
 The regionserver doing the split dies and the master will get into an 
 infinite loop trying to assign regions that seem to have the files missing 
 from HDFS.
 The table can be disabled once. Upon trying to re-enable it, it will remain 
 in an intermediary state forever.
 I was able to reproduce this on a smaller table consistently.
 {code}
 hbase(main):030:0> (0..1).each{|x| put 't1', "#{x}", 'f1:t', 'dd'}
 hbase(main):030:0> (0..1000).each{|x| split 't1', "#{x*10}"}
 {code}
 Running overlapping splits in parallel (e.g. "#{x*10+1}", "#{x*10+2}", ...) 
 will reproduce the issue almost instantly and consistently. 
 {code}
 2012-03-28 10:57:16,320 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
 Offlined parent region t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1. in 
 META
 2012-03-28 10:57:16,321 DEBUG 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for 
 t1,5,1332957435767.648d30de55a5cec6fc2f56dcb3c7eee1..  
 compaction_queue=(0:1), split_queue=10
 2012-03-28 10:57:16,343 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
 of failed split of t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1.; 
 Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 java.io.IOException: Failed 
 ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:363)
 at 
 org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:451)
 at 
 org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.FileNotFoundException: File does not exist: 
 /hbase/t1/589c44cabba419c6ad8c9b427e5894e3.2fb0473f4e71339e88dab0ee0d4dffa1/f1/d62a852c25ad44e09518e102ca557237
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1813)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
 at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:341)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1008)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:65)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:467)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
 at

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-01 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243902#comment-13243902
]

Hadoop QA commented on HBASE-5666:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12520850/HBASE-5666-v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.TestZooKeeper

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1363//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1363//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1363//console

This message is automatically generated.

RegionServer doesn't retry to check if base node is available
-

[jira] [Commented] (HBASE-5696) Use Hadoop's DataOutputOutputStream instead of have a copy local

2012-04-01 Thread stack (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243917#comment-13243917
 ] 

stack commented on HBASE-5696:
--

We have a DOOS and so does hadoop.  If I diff them, the hadoop one is public 
where ours is not (but a patch that is coming in, the protobuf one in 
HBASE-5451, also makes ours public).
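For readers unfamiliar with the class: a DataOutputOutputStream adapts a `java.io.DataOutput` so that code which writes to an `OutputStream` can target it. The sketch below is a hypothetical minimal adapter written for illustration. It is neither the HBase nor the Hadoop source, and `DataOutputAdapter` is an invented name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical minimal adapter: exposes a DataOutput as an OutputStream. */
public class DataOutputAdapter extends OutputStream {
    private final DataOutput out;

    public DataOutputAdapter(DataOutput out) {
        this.out = out;
    }

    @Override
    public void write(int b) throws IOException {
        // Like OutputStream.write(int), only the low-order 8 bits are written.
        out.writeByte(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Bulk writes go straight through instead of byte-by-byte.
        out.write(b, off, len);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStream os = new DataOutputAdapter(new DataOutputStream(bytes));
        os.write(new byte[] {1, 2, 3});
        os.write(4);
        System.out.println(bytes.size()); // 4
    }
}
```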

 Use Hadoop's DataOutputOutputStream instead of have a copy local
 

 Key: HBASE-5696
 URL: https://issues.apache.org/jira/browse/HBASE-5696
 Project: HBase
  Issue Type: Improvement
Reporter: stack



[jira] [Created] (HBASE-5696) Use Hadoop's DataOutputOutputStream instead of have a copy local

2012-04-01 Thread stack (Created) (JIRA)

Use Hadoop's DataOutputOutputStream instead of have a copy local


 Key: HBASE-5696
 URL: https://issues.apache.org/jira/browse/HBASE-5696
 Project: HBase
  Issue Type: Improvement
Reporter: stack




[jira] [Created] (HBASE-5695) Use Hadoop's DataOutputOutputStream instead of have a copy local

2012-04-01 Thread stack (Created) (JIRA)

Use Hadoop's DataOutputOutputStream instead of have a copy local


 Key: HBASE-5695
 URL: https://issues.apache.org/jira/browse/HBASE-5695
 Project: HBase
  Issue Type: Improvement
Reporter: stack




[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs


[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243919#comment-13243919
 ] 

jirapos...@reviews.apache.org commented on HBASE-5451:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4096/#review6613
---


Some more questions.  Just being careful DD. 


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/DataOutputOutputStream.java
https://reviews.apache.org/r/4096/#comment14285

We should just be using the hadoop DOOS... looks like no diff (when I diff 
them).  I'll make an issue to remove.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
https://reviews.apache.org/r/4096/#comment14278

Is this written up anywhere?  That it's hrpc, then version, then a length, 
then a protobuf?

I see it in the proto definition.  That'll do.
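The framing described here (the magic bytes `hrpc`, then a version, then a length, then a protobuf) can be sketched with plain `java.io` streams. This is an illustration only, and several details are assumptions made for this sketch: the one-byte version field, the 4-byte big-endian length, the version values used, and treating the payload as opaque bytes. The authoritative layout is the one in `RPCMessageProto.proto` and the patch under review. `PreambleSketch` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of a "magic, version, length, payload" connection preamble. */
public class PreambleSketch {
    static final byte[] MAGIC = "hrpc".getBytes(StandardCharsets.US_ASCII);

    static byte[] encode(byte version, byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write(MAGIC);              // 4 magic bytes
        out.writeByte(version);        // assumed 1-byte version
        out.writeInt(payload.length);  // assumed 4-byte big-endian length
        out.write(payload);            // opaque bytes standing in for a serialized protobuf
        out.flush();
        return buf.toByteArray();
    }

    /** Returns the payload, rejecting input whose magic or version does not match. */
    static byte[] decode(byte[] frame, byte expectedVersion) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("bad magic");
        }
        byte version = in.readByte();
        if (version != expectedVersion) {
            throw new IOException("unsupported version " + version);
        }
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative length " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {10, 20, 30};
        byte[] frame = encode((byte) 5, payload);
        System.out.println(frame.length);                                    // 12 (4 + 1 + 4 + 3)
        System.out.println(Arrays.equals(decode(frame, (byte) 5), payload)); // true
    }
}
```

An explicit length lets the server read exactly one header message before deciding how to interpret what follows. A version mismatch can then be rejected early, before any payload is parsed.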



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
https://reviews.apache.org/r/4096/#comment14279

We have an issue for removing this Invocation stuff?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/protobuf/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment14280

Should we just remove them in the next iteration on rpc since 0.96 is to be 
a singularity?  Why even bother trying to keep compatibility w/ older clients?

What is 'failure compatibility'?  We are telling the client to go away, 
nicely (smile).

What do you think we should replace hrpc0x0005 with?

this -> these



http://svn.apache.org/repos/asf/hbase/trunk/src/main/protobuf/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment14281

How does RpcRequestWithHeaderProto relate to ConnectionHeaderProto?  This 
text should say?

Would be nice to have an illustration of how the back-and-forth works.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/protobuf/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment14282

We'll send this String each time?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/protobuf/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment14283

Which part in here is the 'header'?  How does it relate to 
ConnectionHeaderProto?

request can be an Invocation/Writable?  Or a protobuf?  Do we need a length 
in here?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/protobuf/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment14284

Should this precede the response?  So if false, a response follows else an 
exception?  Do we need a length here?  Where is the header that the message 
name refers to?


- Michael


On 2012-03-30 23:29:32, Devaraj Das wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4096/
bq.  ---
bq.  
bq.  (Updated 2012-03-30 23:29:32)
bq.  
bq.  
bq.  Review request for Michael Stack and Benoit Sigoure.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Switch RPC call envelope/headers to PBs
bq.  
bq.  
bq.  This addresses bug HBASE-5451.
bq.  https://issues.apache.org/jira/browse/HBASE-5451
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/DataOutputOutputStream.java
 1307644 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
 1307644 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
 1307644 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCMessageProtos.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java
 1307644 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/protobuf/RPCMessageProto.proto
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4096/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Devaraj
bq.  
bq.



 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: rpc-proto.2.txt, rpc-proto.3.txt,

[jira] [Commented] (HBASE-5680) Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1

2012-04-01 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243942#comment-13243942
 ] 

Jonathan Hsieh commented on HBASE-5680:
---

Tried it again, and actually -- if you recompile using -Dhadoop.profile=23 
without the security profile the Master comes up and does not encounter the 
problem.  

(I probably had the wrong hadoop jars in my hbase classpath). 

So it boils down to needing to recompile hbase against hadoop 23. 

Maybe we should catch this exception and warn the user to recompile HBase, or 
possibly put out yet another package that is compiled against 23. 
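The first suggestion, catching the failure and telling the user to recompile, could look roughly like the sketch below. It probes for a class with `Class.forName` (without initializing it) and turns a missing class into an actionable message instead of a bare `NoClassDefFoundError`. `HadoopCompatCheck`, its methods and the message text are hypothetical. The class name probed in `main` is the one reported missing in this issue, and the `-Dhadoop.profile=23` hint comes from the comment above.

```java
/** Hypothetical sketch: fail fast with a hint when a build-time Hadoop class is missing at runtime. */
public class HadoopCompatCheck {
    /** Returns true if the class can be found, without running its static initializers. */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, HadoopCompatCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns null if the class is present, otherwise a message telling the user what to do. */
    static String compatibilityWarning(String className) {
        if (isClassPresent(className)) {
            return null;
        }
        return "Class " + className + " is not on the classpath. This HBase build was likely "
            + "compiled against a different Hadoop version; rebuild HBase against the Hadoop "
            + "version you are running (e.g. with -Dhadoop.profile=23 for Hadoop 0.23).";
    }

    public static void main(String[] args) {
        String warning = compatibilityWarning(
            "org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction");
        System.out.println(warning != null ? warning : "class present");
    }
}
```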

 Hbase94 and Hbase 92.2 is not compatible with the Hadoop 23.1 
 --

 Key: HBASE-5680
 URL: https://issues.apache.org/jira/browse/HBASE-5680
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Kristam Subba Swathi

 HMaster is not able to start because of the following error:
 
 2012-03-30 11:12:19,487 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hdfs/protocol/FSConstants$SafeModeAction
   at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:524)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:324)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:112)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:496)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   ... 7 more
 There is a change in FSConstants.

[jira] [Commented] (HBASE-5443) Add PB-based calls to HRegionInterface

2012-04-01 Thread binlijin (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243948#comment-13243948
 ] 

binlijin commented on HBASE-5443:
-

Hi guys, I have a question: why choose PB? Why not Avro or Thrift?

 Add PB-based calls to HRegionInterface
 --

 Key: HBASE-5443
 URL: https://issues.apache.org/jira/browse/HBASE-5443
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: region_java-proto-mapping.pdf




[jira] [Updated] (HBASE-5644) [findbugs] Fix null pointer warnings.

2012-04-01 Thread Uma Maheswara Rao G (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HBASE-5644:
---

Attachment: NullPointerFindBugs_Analysis.xlsx

 [findbugs] Fix null pointer warnings.
 -

 Key: HBASE-5644
 URL: https://issues.apache.org/jira/browse/HBASE-5644
 Project: HBase
  Issue Type: Sub-task
  Components: scripts
Reporter: Jonathan Hsieh
Assignee: Uma Maheswara Rao G
 Attachments: NullPointerFindBugs_Analysis.xlsx


 See 
 https://builds.apache.org/job/PreCommit-HBASE-Build/1313//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
 Fix the NP category

[jira] [Updated] (HBASE-5644) [findbugs] Fix null pointer warnings.

2012-04-01 Thread Uma Maheswara Rao G (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HBASE-5644:
---

Attachment: HBASE-5644.patch

Attached the patch and analysis sheet.

 [findbugs] Fix null pointer warnings.
 -

 Key: HBASE-5644
 URL: https://issues.apache.org/jira/browse/HBASE-5644
 Project: HBase
  Issue Type: Sub-task
  Components: scripts
Reporter: Jonathan Hsieh
Assignee: Uma Maheswara Rao G
 Attachments: HBASE-5644.patch, NullPointerFindBugs_Analysis.xlsx


 See 
 https://builds.apache.org/job/PreCommit-HBASE-Build/1313//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
 Fix the NP category

[jira] [Commented] (HBASE-5436) Right-size the map when reading attributes.

2012-04-01 Thread Benoit Sigoure (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243952#comment-13243952
 ] 

Benoit Sigoure commented on HBASE-5436:
---

Ping?  Can we get this trivial change in the next point release of 0.92.x?

 Right-size the map when reading attributes.
 ---

 Key: HBASE-5436
 URL: https://issues.apache.org/jira/browse/HBASE-5436
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Trivial
  Labels: performance
 Fix For: 0.94.0

 Attachments: 0001-Right-size-the-map-when-reading-attributes.patch




[jira] [Commented] (HBASE-5443) Add PB-based calls to HRegionInterface

2012-04-01 Thread Jimmy Xiang (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243954#comment-13243954
 ] 

Jimmy Xiang commented on HBASE-5443:


The main reason is that the HBase writable RPC already supports pb.  Hadoop 
uses pb too.

 Add PB-based calls to HRegionInterface
 --

 Key: HBASE-5443
 URL: https://issues.apache.org/jira/browse/HBASE-5443
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: region_java-proto-mapping.pdf




[jira] [Commented] (HBASE-5436) Right-size the map when reading attributes.