date:20110909


[ 
https://issues.apache.org/jira/browse/HBASE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101014#comment-13101014
 ] 

Hudson commented on HBASE-4007:
---

Integrated in HBase-TRUNK #2192 (See 
[https://builds.apache.org/job/HBase-TRUNK/2192/])
HBASE-4007 distributed log splitting can get indefinitely stuck

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java


 distributed log splitting can get indefinitely stuck
 

 Key: HBASE-4007
 URL: https://issues.apache.org/jira/browse/HBASE-4007
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-4007-distributed-log-splitting-can-get-indefin.patch


 Once the configured number of retries is exhausted, SplitLogManager stops 
 resubmitting log-split tasks. In that state, a task is not resubmitted even 
 if the SplitLogWorker that owns it dies.
 When a regionserver goes away, all the split-log tasks that it owned 
 should be resubmitted by the SplitLogManager.
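
A minimal sketch of the behaviour asked for above, not the code in the attached patch. The class SplitTaskTracker and its methods are hypothetical names for illustration: the timeout path honours the retry limit, while the dead-worker path releases every task the worker owned regardless of how many retries were used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the SplitLogManager implementation.
class SplitTaskTracker {
  static final class Task {
    final String path;
    String owner;     // worker currently holding the task, or null if unowned
    int resubmits;    // timeout-driven resubmissions so far
    Task(String path, String owner) {
      this.path = path;
      this.owner = owner;
    }
  }

  private final int maxResubmits;
  private final Map<String, Task> tasks = new HashMap<String, Task>();

  SplitTaskTracker(int maxResubmits) {
    this.maxResubmits = maxResubmits;
  }

  void add(String path, String owner) {
    tasks.put(path, new Task(path, owner));
  }

  // Timeout path: gives up once the retry limit is reached.
  boolean resubmitOnTimeout(String path) {
    Task t = tasks.get(path);
    if (t == null || t.resubmits >= maxResubmits) {
      return false;
    }
    t.resubmits++;
    t.owner = null;
    return true;
  }

  // Dead-worker path: always releases the task, whatever the retry count.
  List<String> handleDeadWorker(String worker) {
    List<String> released = new ArrayList<String>();
    for (Task t : tasks.values()) {
      if (worker.equals(t.owner)) {
        t.owner = null;
        released.add(t.path);
      }
    }
    return released;
  }
}
```

Keeping the two paths separate means a task whose retries are exhausted can still be picked up again once its owner is known to be dead.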

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4350) Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that caught it


[ 
https://issues.apache.org/jira/browse/HBASE-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101015#comment-13101015
 ] 

Hudson commented on HBASE-4350:
---

Integrated in HBase-TRUNK #2192 (See 
[https://builds.apache.org/job/HBase-TRUNK/2192/])
HBASE-4350 Fix a Bloom filter bug introduced by HFile v2 and 
TestMultiColumnScanner that caught it

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java


 Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that 
 caught it
 ---

 Key: HBASE-4350
 URL: https://issues.apache.org/jira/browse/HBASE-4350
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.92.0

 Attachments: 0001-TestMultiColumnScanner-and-Bloom-filter-fix.patch


 Nicolas pointed out to me that the new unit test TestMultiColumnScanner that 
 I wrote for the multi-column scanner Bloom filter optimization (which we will 
 soon release) did not pass on the open-source trunk, and it bisected down to 
 the HFile v2 commit. I debugged the unit test and found that there was a 
 serious bug in HFile v2 Bloom filter lookup not caught by any of the existing 
 unit tests: Bloom filters were used for non-Get Scans, which did not have 
 minimum/maximum row set correctly, and some scan results were not returned.
 This diff is the unit test that helped catch the problem and a one-line fix 
 for the bug.
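
A rough sketch of the kind of guard the description implies, not the actual one-line fix in the patch. BloomGate, RowFilter and shouldRead are hypothetical names: a per-row Bloom filter is consulted only when the scan targets exactly one row, and any other scan always reads the file.

```java
// Hypothetical sketch; names are illustrative, not the StoreFile API.
class BloomGate {
  // Stand-in for a row Bloom filter: false positives possible, no false negatives.
  interface RowFilter {
    boolean mightContain(String row);
  }

  // Treat a scan as a Get only when it targets exactly one row.
  static boolean isSingleRow(String startRow, String stopRow) {
    return startRow != null && startRow.equals(stopRow);
  }

  // Decide whether a store file must be read for this scan.
  static boolean shouldRead(String startRow, String stopRow, RowFilter bloom) {
    if (!isSingleRow(startRow, stopRow)) {
      return true; // range scan: a per-row filter cannot rule the file out
    }
    return bloom.mightContain(startRow);
  }
}
```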


[jira] [Updated] (HBASE-4359) Show dead RegionServer names in the HMaster info page


 [ 
https://issues.apache.org/jira/browse/HBASE-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HBASE-4359:
---

Attachment: HBase Master UI - Dead Servers.png
HBASE-4359.r1.diff

Jamon+tests patch that adds in the improvement. Please review!

I ran the updated master status servlet test.

{code}
- mvn -Dtest=TestMasterStatusServlet test
---
 T E S T S
---
Running org.apache.hadoop.hbase.master.TestMasterStatusServlet
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.93 sec

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0
{code}

I also ran the build manually via bin/start-hbase.sh with proper config and 
HDFS running. The screenshot shows the implementation.

Thanks in advance for reviews!

 Show dead RegionServer names in the HMaster info page
 -

 Key: HBASE-4359
 URL: https://issues.apache.org/jira/browse/HBASE-4359
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.4
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.94.0

 Attachments: HBASE-4359.r1.diff, HBase Master UI - Dead Servers.png


 Unlike other components of the cluster, such as the NameNode and JobTracker 
 pages, the HMaster's info page does not show any data on dead region servers. 
 An RS being stateless is a good reason not to count dead nodes, but I think 
 a list of dead nodes helps when an administrator wants to find out which 
 nodes are missing out on RS action (hey, everyone likes consistently spiking 
 graphs! ;)).
 Following HBASE-3580, I think it makes sense to have the already-maintained 
 list of dead nodes show up in the info UI.


[jira] [Updated] (HBASE-4359) Show dead RegionServer names in the HMaster info page


 [ 
https://issues.apache.org/jira/browse/HBASE-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HBASE-4359:
---

Status: Patch Available  (was: Open)

 Show dead RegionServer names in the HMaster info page
 -

 Key: HBASE-4359
 URL: https://issues.apache.org/jira/browse/HBASE-4359
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.4
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.94.0

 Attachments: HBASE-4359.r1.diff, HBase Master UI - Dead Servers.png


 Unlike other components of the cluster, such as the NameNode and JobTracker 
 pages, the HMaster's info page does not show any data on dead region servers. 
 An RS being stateless is a good reason not to count dead nodes, but I think 
 a list of dead nodes helps when an administrator wants to find out which 
 nodes are missing out on RS action (hey, everyone likes consistently spiking 
 graphs! ;)).
 Following HBASE-3580, I think it makes sense to have the already-maintained 
 list of dead nodes show up in the info UI.


[jira] [Updated] (HBASE-4359) Show dead RegionServer names in the HMaster info page


 [ 
https://issues.apache.org/jira/browse/HBASE-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HBASE-4359:
---

Attachment: HBase Master UI - Dead Servers (Yes, still dead).png
HBASE-4359.r2.diff

Newer patch after a chat with Todd. Made it more consistent with the online 
listing.

In brightest day, in darkest night, no test case shall escape my sight:
{code}
---
 T E S T S
---

Running org.apache.hadoop.hbase.master.TestMasterStatusServlet
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.028 sec

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0
{code}

Also upped a new UI image after manual testing.

 Show dead RegionServer names in the HMaster info page
 -

 Key: HBASE-4359
 URL: https://issues.apache.org/jira/browse/HBASE-4359
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.4
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.94.0

 Attachments: HBASE-4359.r1.diff, HBASE-4359.r2.diff, HBase Master UI 
 - Dead Servers (Yes, still dead).png, HBase Master UI - Dead Servers.png


 Unlike other components of the cluster, such as the NameNode and JobTracker 
 pages, the HMaster's info page does not show any data on dead region servers. 
 An RS being stateless is a good reason not to count dead nodes, but I think 
 a list of dead nodes helps when an administrator wants to find out which 
 nodes are missing out on RS action (hey, everyone likes consistently spiking 
 graphs! ;)).
 Following HBASE-3580, I think it makes sense to have the already-maintained 
 list of dead nodes show up in the info UI.


[jira] [Created] (HBASE-4360) Maintain information on the time a RS went dead

Maintain information on the time a RS went dead
---

 Key: HBASE-4360
 URL: https://issues.apache.org/jira/browse/HBASE-4360
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.94.0


It would be generally helpful to maintain DeadServer info with the timestamp 
at which each server was last determined to be dead.

That makes it easier to hunt through the logs, and I don't think it's too 
expensive to maintain (one additional update per dead determination).
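
One possible shape for this, as a sketch only. DeadServerLog and its methods are hypothetical names and are not the existing DeadServer class: each dead determination costs a single map update, and the latest timestamp wins.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the DeadServer class in HBase.
class DeadServerLog {
  private final Map<String, Long> timeOfDeath = new LinkedHashMap<String, Long>();

  // One extra map update per dead determination; the latest timestamp wins.
  synchronized void markDead(String serverName, long nowMillis) {
    timeOfDeath.put(serverName, nowMillis);
  }

  // Returns null when the server has not been recorded as dead.
  synchronized Long diedAt(String serverName) {
    return timeOfDeath.get(serverName);
  }

  // Read-only copy, e.g. for rendering on an info page.
  synchronized Map<String, Long> snapshot() {
    return Collections.unmodifiableMap(new LinkedHashMap<String, Long>(timeOfDeath));
  }
}
```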


[jira] [Commented] (HBASE-4313) Refactor TestHBaseFsck to make adding individual hbck tests easier


[ 
https://issues.apache.org/jira/browse/HBASE-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101104#comment-13101104
 ] 

Hudson commented on HBASE-4313:
---

Integrated in HBase-TRUNK #2193 (See 
[https://builds.apache.org/job/HBase-TRUNK/2193/])
HBASE-4313 Refactor TestHBaseFsck to make adding individual hbck tests 
easier

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


 Refactor TestHBaseFsck to make adding individual hbck tests easier
 --

 Key: HBASE-4313
 URL: https://issues.apache.org/jira/browse/HBASE-4313
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.5

 Attachments: 
 0001-HBASE-4313-Refactor-TestHBaseFsck-to-make-adding-hbc.patch, 
 0001-HBASE-4313-Refactor-TestHBaseFsck-to-make-adding-hbc.patch, 
 hbase-4313-trunk.patch


 The current TestHBaseFsck has one test case that tests multiple things in the 
 same table. This refactor essentially preserves what is tested but isolates 
 each error type so that errors do not bleed over from table to table. 
 It will also enable writing other simple-to-read tests for other 
 hbck-detectable errors.


[jira] [Commented] (HBASE-4350) Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that caught it


[ 
https://issues.apache.org/jira/browse/HBASE-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101105#comment-13101105
 ] 

Hudson commented on HBASE-4350:
---

Integrated in HBase-TRUNK #2193 (See 
[https://builds.apache.org/job/HBase-TRUNK/2193/])
HBASE-4350 Fix a Bloom filter bug introduced by HFile v2 and 
TestMultiColumnScanner that caught it

stack : 
Files : 
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java


 Fix a Bloom filter bug introduced by HFile v2 and TestMultiColumnScanner that 
 caught it
 ---

 Key: HBASE-4350
 URL: https://issues.apache.org/jira/browse/HBASE-4350
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.92.0

 Attachments: 0001-TestMultiColumnScanner-and-Bloom-filter-fix.patch


 Nicolas pointed out to me that the new unit test TestMultiColumnScanner that 
 I wrote for the multi-column scanner Bloom filter optimization (which we will 
 soon release) did not pass on the open-source trunk, and it bisected down to 
 the HFile v2 commit. I debugged the unit test and found that there was a 
 serious bug in HFile v2 Bloom filter lookup not caught by any of the existing 
 unit tests: Bloom filters were used for non-Get Scans, which did not have 
 minimum/maximum row set correctly, and some scan results were not returned.
 This diff is the unit test that helped catch the problem and a one-line fix 
 for the bug.


[jira] [Commented] (HBASE-4301) META migration from 0.90 to trunk fails


[ 
https://issues.apache.org/jira/browse/HBASE-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101106#comment-13101106
 ] 

Hudson commented on HBASE-4301:
---

Integrated in HBase-TRUNK #2193 (See 
[https://builds.apache.org/job/HBase-TRUNK/2193/])
HBASE-4301  META migration from 0.90 to trunk fails (Subbu Iyer)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java


 META migration from 0.90 to trunk fails
 ---

 Key: HBASE-4301
 URL: https://issues.apache.org/jira/browse/HBASE-4301
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Subbu M Iyer
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 4301-1-Fixed_Root_migration_to_newer_HRI_format_.patch, 
 4301-2-Fixed_Root_migration_to_newer_HRI_format_.patch, 
 4301-Fixed_Root_migration_to_newer_HRI_format_.patch, 4301-v3.txt, 
 4301-v4.txt, 4301-v7.txt, 4301.txt, 4301_v2.txt, logs.tar.gz, master-log.txt, 
 meta_migrate, meta_trunk, root_migrate, root_trunk


 I started a trunk cluster as an upgrade from 0.90.4ish, and now I can't scan 
 my .META. table, etc, and other operations fail.


[jira] [Commented] (HBASE-4340) Hbase can't balance.

2011-09-09 Thread gaojinchao (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101180#comment-13101180
 ] 

gaojinchao commented on HBASE-4340:
---

Yes, all test cases have passed.

 Hbase can't balance.
 

 Key: HBASE-4340
 URL: https://issues.apache.org/jira/browse/HBASE-4340
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.5

 Attachments: HBASE-4340_branch90.patch


 Version: 0.90.4
 Cluster: 40 boxes
 According to the logs below, the balancer could not run because a dead RS 
 was still being processed. I dug deeper and found two issues:
 1. The shutdown handler does not clear numProcessing when it hits some 
 exceptions. Whatever the exception, it seems we should either clear the flag 
 or shut down the master.
 2. The reported "dead regionserver(s): [158-1-130-12,20020,1314971097929]" is 
 inaccurate. The dead server should be 158-1-130-10,20020,1315068597979.
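
For the first issue, a minimal sketch of clearing the counter on every exit path. DeadServerCounter and its methods are hypothetical names, not the actual shutdown handler code: the decrement sits in a finally block, so an exception during recovery cannot leave the balancer blocked forever.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the ServerShutdownHandler code.
class DeadServerCounter {
  private final AtomicInteger numProcessing = new AtomicInteger();

  // The balancer would be skipped while this returns true.
  boolean areDeadServersInProgress() {
    return numProcessing.get() > 0;
  }

  // Runs the recovery work and always decrements the counter, even on failure.
  void process(Runnable recovery) {
    numProcessing.incrementAndGet();
    try {
      recovery.run();
    } finally {
      numProcessing.decrementAndGet();
    }
  }
}
```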
 //master logs:
 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:18:00,543 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running

[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-09 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101200#comment-13101200
 ] 

ramkrishna.s.vasudevan commented on HBASE-4153:
---

After HBASE-4015, my previous observations change as follows. Please note that 
the fix in this JIRA will be that, once we get RegionAlreadyInTransition, I 
will not move the in-memory state to OFFLINE.
- Open Open
Here the first open region call is still in progress:
a) Before the transition from OFFLINE to OPENING or from OPENING to OPENED
The second open region call sets the data to OFFLINE, so there is a version 
mismatch when the first RS tries to transition to OPENING, and the first open 
region call fails.
The second open region call then gets RegionAlreadyInTransition, and it is up 
to the TimeoutMonitor to open the region once it finds the RIT in PENDING_OPEN.
b) After the transition to OPENED
Because the in-memory state is not moved to OFFLINE on 
RegionAlreadyInTransition, once the callback for the OPENED node reaches the 
Master we can delete the in-memory PENDING_OPEN state created by the second 
open region call (this already happens).

If we leave the in-memory state in OFFLINE, as per the current behaviour, the 
OPENED event runs into this check:
{code}
  if (regionState == null ||
      (!regionState.isPendingOpen() && !regionState.isOpening())) {
    LOG.warn("Received OPENED for region " +
        prettyPrintedRegionName +
        " from server " + data.getOrigin() + " but region was in " +
        " the state " + regionState + " and not " +
        "in expected PENDING_OPEN or OPENING states");
    return;
  }
{code}
This is the major problem I see.

- Close Open
As per my previous analysis:
a) Before the transition from CLOSING to CLOSED
When an open call arrives while the close region is in progress:
{code}
try {
  if (ZKAssign.transitionNodeClosed(server.getZooKeeper(), regionInfo,
      server.getServerName(), expectedVersion) == FAILED) {
    LOG.warn("Completed the CLOSE of a region but when transitioning from " +
        " CLOSING to CLOSED got a version mismatch, someone else clashed " +
        "so now unassigning");
    region.close();
    return;
  }
  // excerpt ends here; the rest of the try block is omitted
{code}
The region is closed on the RS side, but the RIT in the master stays in 
PENDING_OPEN because of RegionAlreadyInTransition, so again the TimeoutMonitor 
takes care of opening the region.
b) After setting the node to the CLOSED state
Here the assign call happens again as part of CloseRegionProcessing, and if a 
new open region call arrives in parallel, we are back in the Open Open case 
described previously.

Please note that in all these cases assign() and unassign() were invoked 
manually through admin, in parallel.
I am not sure whether you are planning to handle this scenario in a totally 
different way, since the analysis above shows that the second operation 
largely depends on the TimeoutMonitor to succeed.
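
The version mismatch in case a) of Open Open can be illustrated with a small model of a versioned node. This is a sketch under assumptions: VersionedNode is a hypothetical class, not the ZKAssign API. A forced write to OFFLINE bumps the version, so a conditional transition that still carries the old expected version fails.

```java
// Hypothetical sketch of a versioned znode; not the ZKAssign API.
class VersionedNode {
  private String state;
  private int version;

  VersionedNode(String state) {
    this.state = state;
  }

  synchronized int version() {
    return version;
  }

  synchronized String state() {
    return state;
  }

  // Unconditional write, as when a second assign forces the node to OFFLINE.
  synchronized int force(String newState) {
    state = newState;
    return ++version;
  }

  // Conditional write: succeeds only if nobody changed the node since expectedVersion.
  synchronized boolean transition(String from, String to, int expectedVersion) {
    if (version != expectedVersion || !state.equals(from)) {
      return false;
    }
    state = to;
    version++;
    return true;
  }
}
```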



 Handle RegionAlreadyInTransitionException in AssignmentManager
 --

 Key: HBASE-4153
 URL: https://issues.apache.org/jira/browse/HBASE-4153
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0


 Comment from Stack over in HBASE-3741:
 {quote}
 Question: Looking at this patch again, if we throw a 
 RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
 though RegionAlreadyInTransitionException in at least one case here is saying 
 that the region is already open on this regionserver?
 {quote}
 Indeed looking at the code it's going to be handled the same way other 
 exceptions are. Need to add special cases for assign and unassign.


[jira] [Commented] (HBASE-4212) TestMasterFailover fails occasionally

2011-09-09 Thread gaojinchao (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101213#comment-13101213
 ] 

gaojinchao commented on HBASE-4212:
---

@Stack, thanks for your review. 
In our environment this test fails often, so we skip it (in my setup all test 
cases run automatically every day). 

The steps for opening the root region:
step A: The master tells the region server to open the root region.
step B: The region server opens the root region and sets the zk node 
(rootServerZNodezk). Once this is done, the CatalogTracker can work.
step C: The region server updates the zk node (assignmentZNode) to tell the 
master that root has opened (this may fail in some cases, but we have already 
announced that root can be used).
step D: The master deletes the zk node (assignmentZNode) and adds the root 
region to the online set.

In my case, the master skipped step D because it was delayed, and forced the 
root region online in processFailover. So the zk node was never deleted and 
the failover test case failed.
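
The steps above can be modelled in a few lines to show where the leftover node comes from. This is only a toy model; RootOpenModel and its members are hypothetical names and none of this is HBase code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of steps A-D; not HBase code.
class RootOpenModel {
  boolean rootLocationPublished;                                 // set in step B
  final Set<String> assignmentZNodes = new HashSet<String>();   // nodes in transition
  final Set<String> onlineRegions = new HashSet<String>();

  // Steps A-C: the assignment znode exists and root's location is published.
  void openRootOnRegionServer() {
    assignmentZNodes.add("ROOT");
    rootLocationPublished = true;
  }

  // Step D: the normal path removes the znode and marks root online.
  void masterHandlesOpened() {
    assignmentZNodes.remove("ROOT");
    onlineRegions.add("ROOT");
  }

  // Failover path described above: root is forced online, znode untouched.
  void masterForcesOnline() {
    onlineRegions.add("ROOT");
  }

  // True when root is online but its assignment znode was never cleaned up.
  boolean hasLeftoverZNode() {
    return onlineRegions.contains("ROOT") && assignmentZNodes.contains("ROOT");
  }
}
```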

finishInitialization code:
// Make sure root and meta assigned before proceeding.
assignRootAndMeta();

// Is this fresh start with no regions assigned or are we a master joining
// an already-running cluster?  If regionsCount == 0, then for sure a
// fresh start.  TODO: Be fancier.  If regionsCount == 2, perhaps the
// 2 are .META. and -ROOT- and we should fall into the fresh startup
// branch below.  For now, do processFailover.
if (regionCount == 0) {
  LOG.info("Master startup proceeding: cluster startup");
  this.assignmentManager.cleanoutUnassigned();
  this.assignmentManager.assignAllUserRegions();
} else {
  LOG.info("Master startup proceeding: master failover");
  this.assignmentManager.processFailover();
}

processFailover code:
HServerInfo hsi =
  this.serverManager.getHServerInfo(this.catalogTracker.getMetaLocation());
regionOnline(HRegionInfo.FIRST_META_REGIONINFO, hsi);
hsi =
  this.serverManager.getHServerInfo(this.catalogTracker.getRootLocation());
regionOnline(HRegionInfo.ROOT_REGIONINFO, hsi);


 TestMasterFailover fails occasionally
 -

 Key: HBASE-4212
 URL: https://issues.apache.org/jira/browse/HBASE-4212
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.5

 Attachments: HBASE-4212_TrunkV1.patch, HBASE-4212_branch90V1.patch


 This looks like a bug: the root region in RIT can't be moved.
 In the failover process, the master forces root online but does not clean up 
 the zk node, so the test waits forever.
   void processFailover() throws KeeperException, IOException,
       InterruptedException {
     // we enforce on-line root.
     HServerInfo hsi =
       this.serverManager.getHServerInfo(this.catalogTracker.getMetaLocation());
     regionOnline(HRegionInfo.FIRST_META_REGIONINFO, hsi);
     hsi =
       this.serverManager.getHServerInfo(this.catalogTracker.getRootLocation());
     regionOnline(HRegionInfo.ROOT_REGIONINFO, hsi);
 It seems that we should wait until the assignment has finished, as is done 
 for the meta region:
   int assignRootAndMeta()
   throws InterruptedException, IOException, KeeperException {
     int assigned = 0;
     long timeout = this.conf.getLong("hbase.catalog.verification.timeout", 1000);
     // Work on ROOT region.  Is it in zk in transition?
     boolean rit = this.assignmentManager.
       processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
     if (!catalogTracker.verifyRootRegionLocation(timeout)) {
       this.assignmentManager.assignRoot();
       this.catalogTracker.waitForRoot();
       // we need to add this call to guarantee that the transition has completed
       this.assignmentManager.waitForAssignment(HRegionInfo.ROOT_REGIONINFO);
       assigned++;
     }
 logs:
 2011-08-16 07:45:40,715 DEBUG 
 [RegionServer:0;C4S2.site,47710,1313495126115-EventThread] 
 zookeeper.ZooKeeperWatcher(252): regionserver:47710-0x131d2690f780004 
 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, 
 path=/hbase/unassigned/70236052
 2011-08-16 07:45:40,715 DEBUG [RS_OPEN_ROOT-C4S2.site,47710,1313495126115-0] 
 zookeeper.ZKAssign(712): regionserver:47710-0x131d2690f780004 Successfully 
 transitioned node 70236052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
 2011-08-16 07:45:40,715 DEBUG [Thread-760-EventThread] 
 zookeeper.ZooKeeperWatcher(252): master:60701-0x131d2690f780009 Received 
 ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, 
 path=/hbase/unassigned/70236052
 2011-08-16 07:45:40,716 INFO  [PostOpenDeployTasks:70236052] 
 catalog.RootLocationEditor(62): Setting ROOT region location in ZooKeeper as 
 C4S2.site:47710
 2011-08-16 07:45:40,716 DEBUG [Thread-760-EventThread] 
 zookeeper.ZKUtil(1109): master:60701-0x131d2690f780009 Retrieved

[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager


[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101231#comment-13101231
 ] 

Ted Yu commented on HBASE-4153:
---

So we should handle (Open Open) case b.

Thanks for the analysis Ramkrishna.

 Handle RegionAlreadyInTransitionException in AssignmentManager
 --

 Key: HBASE-4153
 URL: https://issues.apache.org/jira/browse/HBASE-4153
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0


 Comment from Stack over in HBASE-3741:
 {quote}
 Question: Looking at this patch again, if we throw a 
 RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
 though RegionAlreadyInTransitionException in at least one case here is saying 
 that the region is already open on this regionserver?
 {quote}
 Indeed looking at the code it's going to be handled the same way other 
 exceptions are. Need to add special cases for assign and unassign.


[jira] [Commented] (HBASE-4357) Region in transition - in closing state


[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101273#comment-13101273
 ] 

Ming Ma commented on HBASE-4357:


Stack, it is the trunk. I don't know the root cause yet.

 Region in transition - in closing state
 ---

 Key: HBASE-4357
 URL: https://issues.apache.org/jira/browse/HBASE-4357
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma

 Got the following during testing:
 1. On a given machine, kill the RS process. Then kill the HMaster process.
 2. Start the RS first via bin/hbase-daemon.sh --config ./conf start 
 regionserver. Then start the HMaster via bin/hbase-daemon.sh --config ./conf 
 start master.
 One region of a table stayed in the CLOSING state.
 According to zookeeper,
 794a6ff17a4de0dd0a19b984ba18eea9 
 miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
 server=sea-esxi-0,6,1315428682281 
 According to the .META. table, the region has been reassigned from sea-esxi-0 
 to sea-esxi-4.
 miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 


[jira] [Commented] (HBASE-2195) Support cyclic replication


[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101379#comment-13101379
 ] 

Lars Hofhansl commented on HBASE-2195:
--

Should CopyTable be made aware of Master-Master scenarios as well?

Otherwise everything that CopyTable copies from MasterI to MasterII is 
replicated back to MasterI once.
At the least, this should perhaps be added to the documentation (i.e. set up 
Master-Master replication only after CopyTable has finished).


 Support cyclic replication
 --

 Key: HBASE-2195
 URL: https://issues.apache.org/jira/browse/HBASE-2195
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
 Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
 2195-v5.txt, 2195-v6.txt, 2195.txt


 We need to support cyclic replication by using the cluster id of each HLogKey 
 and stop replicating when it goes back to the original cluster.
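
A minimal sketch of the stop condition described above, under assumptions: CycleGuard, Edit and shouldReplicate are hypothetical names, and the use of a UUID as the cluster id is an assumption made for the example rather than something stated in this issue. An edit carries the id of the cluster where it was first written and is not shipped to that cluster again.

```java
import java.util.UUID;

// Hypothetical sketch; not the HBase replication code.
class CycleGuard {
  static final class Edit {
    final UUID originClusterId; // cluster where the edit was first written
    final String payload;
    Edit(UUID originClusterId, String payload) {
      this.originClusterId = originClusterId;
      this.payload = payload;
    }
  }

  // Ship an edit to a peer unless the peer is where the edit originated.
  static boolean shouldReplicate(Edit edit, UUID peerClusterId) {
    return !edit.originClusterId.equals(peerClusterId);
  }
}
```

With two clusters replicating to each other, an edit written on the first is shipped to the second, and the check then prevents the second from sending it back.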


[jira] [Updated] (HBASE-2195) Support cyclic replication


 [ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-2195:
-

Affects Version/s: 0.92.0
Fix Version/s: 0.92.0

 Support cyclic replication
 --

 Key: HBASE-2195
 URL: https://issues.apache.org/jira/browse/HBASE-2195
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
 2195-v5.txt, 2195-v6.txt, 2195.txt


 We need to support cyclic replication by using the cluster id of each HLogKey 
 and stopping replication when an edit arrives back at its original cluster.


[jira] [Commented] (HBASE-4354) track region history

2011-09-09 Thread Andrew Purtell (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101412#comment-13101412
]

Andrew Purtell commented on HBASE-4354:
---

bq. There may have been deadlocks too around updating history while trying to
do edits in .META. but my memory may not be serving me right here

Yes.

bq. The natural place to do this stuff would be in a table inside hbase I'd
think.

The mistake we made last time IMHO was making region historian updating
synchronous with the transitions. If we instead log the transitions to a table
in a background thread (executor?) with best effort, the result could be viable.
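
A rough sketch of the best-effort, asynchronous logging suggested above. It is
not HBase code: RegionHistorySketch, recordTransition, and the in-memory list
standing in for the history table are all hypothetical. The point it
illustrates is that the transition path only enqueues work and never waits; if
the bounded queue is full, the history entry is dropped.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RegionHistorySketch {
    // Stand-in for the history table; a real implementation would write to an HBase table.
    final List<String> historyTable = new CopyOnWriteArrayList<>();

    // One background thread with a bounded queue; a full queue rejects the task.
    private final ExecutorService writer = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>(1024),
            new ThreadPoolExecutor.AbortPolicy());

    /** Called on the transition path; does not block. Returns false if the entry was dropped. */
    boolean recordTransition(String region, String fromState, String toState) {
        String entry = region + ": " + fromState + " -> " + toState;
        try {
            writer.execute(() -> historyTable.add(entry));
            return true;
        } catch (RejectedExecutionException e) {
            return false; // best effort: drop the entry rather than stall the transition
        }
    }

    /** Stops accepting entries and waits briefly for queued writes to finish. */
    void shutdown() {
        writer.shutdown();
        try {
            writer.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        RegionHistorySketch history = new RegionHistorySketch();
        history.recordTransition("794a6ff1", "OPEN", "CLOSING");
        history.recordTransition("794a6ff1", "CLOSING", "CLOSED");
        history.shutdown();
        System.out.println(history.historyTable);
    }
}
```

Because a single writer thread drains the queue, entries are recorded in
submission order, and a slow or failing history write cannot deadlock against
the transition itself.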

track region history

Key: HBASE-4354
URL: https://issues.apache.org/jira/browse/HBASE-4354
Project: HBase
Issue Type: New Feature
Components: master, metrics, regionserver
Reporter: Ming Ma
Assignee: Ming Ma

For debugging and analysis purposes it would be useful to understand a region's
lifecycle: how it was created (from which parent region, for example), how it
was split, assigned, etc. Some of this information is in the logs, the hbase
.META. table, zookeeper, and metrics. Certain history data is lost; for example,
the states are removed from zookeeper /hbase/unassigned once the region is
assigned; also, the .META. table has a max version of 10 and thus only tracks
the last 10 RS assignments of a given region. It would be nice to put this
history in a central place. It can provide:
1. How applications use hbase. For example, an application might create a large
number of regions in a short period of time and drop the table later.
2. How HBase internally manages regions, such as how regions are split,
assigned, taken offline, etc.
Things to track
1. How the region was created, including the parent region in the case of a split.
2. The region transition process, such as region state changes and region server
changes.
One idea is to put such transition history data into zookeeper. One issue is
that it could blow up zookeeper memory if we have a large number of regions and
the cluster runs for a long time. I would like to get your feedback on different
approaches to address the issue. One assumption is that region assignment
doesn't happen with high frequency, and thus the overhead introduced won't have
much impact on system performance.
Approach 1:
Zookeeper knows the history of how /hbase/unassigned is modified; if we can
get zookeeper's logs (Bookkeeper?) somehow, we know the history of region
transitions.
Approach 2:
1. HBase logs extra region transition data to zookeeper. It could be one
zookeeper node per transaction.
2. Have a separate thread on the Master to move data from zookeeper and
append it to HDFS. That will keep the zookeeper size in check.
3. Have some tool or web UI to show the history of a given region by
looking at zookeeper and HDFS.
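
Step 2 of Approach 2 could look roughly like the following. This is only an
in-memory illustration under stated assumptions: the sorted map stands in for
sequentially numbered znodes, the list stands in for an append-only HDFS file,
and TransitionArchiverSketch and drainOnce are hypothetical names. No ZooKeeper
or HDFS API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

class TransitionArchiverSketch {
    // Stand-in for sequential znodes under a history path: sequence number -> payload.
    final ConcurrentSkipListMap<Long, String> znodes = new ConcurrentSkipListMap<>();
    // Stand-in for an append-only file on HDFS; touched only by the archiver thread.
    final List<String> hdfsLog = new ArrayList<>();

    /**
     * Moves every pending transition, oldest first: append to the log, then
     * delete the znode stand-in. Returns the number of entries moved.
     */
    int drainOnce() {
        int moved = 0;
        Map.Entry<Long, String> oldest;
        while ((oldest = znodes.firstEntry()) != null) {
            hdfsLog.add(oldest.getKey() + "\t" + oldest.getValue());
            znodes.remove(oldest.getKey());
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        TransitionArchiverSketch archiver = new TransitionArchiverSketch();
        archiver.znodes.put(2L, "794a6ff1 CLOSING->CLOSED");
        archiver.znodes.put(1L, "794a6ff1 OPEN->CLOSING");
        System.out.println(archiver.drainOnce());
        System.out.println(archiver.hdfsLog);
    }
}
```

Appending before deleting means a crash between the two steps can duplicate an
entry in the log but cannot lose one, which seems the safer trade-off for
history data.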


[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101457#comment-13101457
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--



bq.  On 2011-09-09 02:13:22, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java, line 63
bq.   https://reviews.apache.org/r/1768/diff/1/?file=38944#file38944line63
bq.  
bq.   addFamily() can perform overwrite.
bq.   Better add more javadoc.

I'm not clear on exactly what you mean here. If it's that addFamily() will 
replace the old family descriptor with the new one, it seems like that would be 
expected behavior for a modify family handler.


- Riley


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1826
---


On 2011-09-09 18:39:05, Riley Patterson wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1768/
bq.  ---
bq.  
bq.  (Updated 2011-09-09 18:39:05)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Currently, the RPC provides no way of asking for several table alterations at once, and the master has no way of batch handling alter requests. Thus, when the user requests several changes at the same time (e.g. add these I columns, delete these J columns, and modify these K columns), each region is brought down (I+J+K) times so that it can reflect the new schema. Additionally, multiple writes are made to META, and multiple RPC calls must be made.
bq.  
bq.  This patch provides batching for these operations, both at the RPC level and within the Master's TableEventHandlers. This involves a bit of reorganization in the TableEventHandler class hierarchy, and a new TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the difference seen here:
bq.  
bq.  Before patch:
bq.  hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 2.6450 seconds
bq.  
bq.  After patch:
bq.  hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 1.1930 seconds
bq.  
bq.  Regions are only brought down once, and the duration is cut to roughly 1/N, where N is the number of alterations in the batch.
bq.  
bq.  
bq.  This addresses bug HBASE-4358.
bq.  https://issues.apache.org/jira/browse/HBASE-4358
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java PRE-CREATION 
bq.    /src/main/ruby/hbase/admin.rb 1166933 
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java 1166933 
bq.  
bq.  Diff: https://reviews.apache.org/r/1768/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Sanity checked functionality in pseudo-distributed mode (tried several permutations of different alterations; all completed successfully and with only one round of region restarts). Ran all unit tests successfully.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Riley
bq.  
bq.



 Batch Table Alter Operations
 

 Key: HBASE-4358
 URL: https://issues.apache.org/jira/browse/HBASE-4358
 Project: HBase
  Issue Type: Improvement
  Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson

[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101456#comment-13101456
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--



bq.  On 2011-09-09 03:40:32, Lars Hofhansl wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 949
bq.   https://reviews.apache.org/r/1768/diff/1/?file=38934#file38934line949
bq.  
bq.   Coprocessors can modify these lists, right?
bq.   The list returned by Arrays.asList(...) is fixed-length, i.e. the coprocessor can neither add nor remove entries.
bq.   
bq.   If that's OK you can consider Collections.singletonList(column) instead.
bq.   
bq.   If it's not OK - which I think is the case - this probably needs to be new ArrayList(Collections.singleton(column))
bq.  

Makes sense. Good catch.


- Riley
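
The difference between the three list constructions discussed in this review
comment can be checked with a small standalone program. This is an
illustrative sketch, not code from the patch; ListMutabilitySketch and canAdd
are hypothetical names, and String stands in for the column descriptor type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ListMutabilitySketch {
    /** Returns true if the list accepts add(), false if it throws UnsupportedOperationException. */
    static boolean canAdd(List<String> list, String value) {
        try {
            list.add(value);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String column = "cf1";

        // Fixed-size view over an array: set() works, add()/remove() do not.
        List<String> fixedSize = Arrays.asList(column);
        // Immutable one-element list: no structural or element changes allowed.
        List<String> immutable = Collections.singletonList(column);
        // Independent, growable copy: safe to hand to code that may add or remove entries.
        List<String> growable = new ArrayList<>(Collections.singleton(column));

        System.out.println(canAdd(fixedSize, "cf2")); // false
        System.out.println(canAdd(immutable, "cf2")); // false
        System.out.println(canAdd(growable, "cf2"));  // true
    }
}
```

Only the ArrayList copy lets a coprocessor add or remove entries, which is why
the reviewer suggests it if modification is meant to be supported.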


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1827
---





 Batch Table Alter Operations
 

 Key: HBASE-4358
 URL: https://issues.apache.org/jira/browse/HBASE-4358
 Project: HBase
  Issue Type: Improvement

[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101462#comment-13101462
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--



bq.  On 2011-09-09 03:40:33, Michael Stack wrote:
bq.   Looks good to me.  All the table mod tests still pass though they go via a different path now?

All unit tests pass.


- Riley


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1828
---





 Batch Table Alter Operations
 

 Key: HBASE-4358
 URL: https://issues.apache.org/jira/browse/HBASE-4358
 Project: HBase
  Issue Type: Improvement
  Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor
 Attachments: HBASE-4358.patch


 Currently, the RPC provides no way of asking for several table alterations at 
 once, and the master has no way of batch handling alter requests. Thus, when 
 the user requests several changes at the same time (i.e. add these I columns, 
 delete these J columns, and

[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101461#comment-13101461
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--



bq.  On 2011-09-09 04:38:24, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java, line 75
bq.   https://reviews.apache.org/r/1768/diff/1/?file=38945#file38945line75
bq.  
bq.   IOE may pop up from any operation.
bq.   I think we should document that we adopt a fail-fast strategy.
bq.   
bq.   Personally I think we should catch and store one InvalidFamilyOperationException, if any pops up.
bq.   After completing all operations, we throw the stored InvalidFamilyOperationException.
bq.  

How about we just allocate a new HTableDescriptor object that we pass to 
updateTableDescriptor? Then we can document that if there is even a single 
exception, no changes were made. The updates to the FS and region restarts 
don't occur until after the potential IOExceptions due to updateTableDescriptor.


- Riley
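
The all-or-nothing behaviour proposed in this reply (apply the batch to a
fresh copy of the descriptor and only use the copy if every operation
succeeds) can be sketched as follows. This is an illustration under
assumptions, not the patch: a set of family names stands in for
HTableDescriptor, and AtomicAlterSketch, Op, and applyAll are hypothetical
names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class AtomicAlterSketch {
    /** Hypothetical alteration: add or delete one column family by name. */
    static final class Op {
        final boolean add;
        final String family;

        Op(boolean add, String family) {
            this.add = add;
            this.family = family;
        }
    }

    /**
     * Applies all ops to a copy of the current family set and returns the copy.
     * Throws on the first invalid op; the caller's set is never modified.
     */
    static Set<String> applyAll(Set<String> current, List<Op> ops) {
        Set<String> copy = new LinkedHashSet<>(current);
        for (Op op : ops) {
            boolean changed = op.add ? copy.add(op.family) : copy.remove(op.family);
            if (!changed) {
                throw new IllegalArgumentException(
                        (op.add ? "family already exists: " : "no such family: ") + op.family);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Set<String> current = new LinkedHashSet<>(Arrays.asList("name"));
        List<Op> batch = Arrays.asList(new Op(true, "rawr"), new Op(false, "name"));

        Set<String> updated = applyAll(current, batch);
        System.out.println(updated); // the new schema, to be persisted in one step
        System.out.println(current); // the original schema, still unchanged
    }
}
```

Since nothing is written back until applyAll returns, a failure partway
through the batch leaves the existing schema as it was, which is the property
the reply wants to document.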


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1830
---





 Batch Table

[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101490#comment-13101490
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1841
---


Can you perform testing on a small, real cluster ?


/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
https://reviews.apache.org/r/1768/#comment4212

Where does this method call end up in this patch ?



/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java
https://reviews.apache.org/r/1768/#comment4213

I meant we should document that addFamily() performs modification here.
This is minor.


- Ted





 Batch Table Alter Operations
 

 Key: HBASE-4358
 URL: https://issues.apache.org/jira/browse/HBASE-4358
 Project: HBase
  Issue Type: Improvement
  Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor

[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101503#comment-13101503
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--



bq.  On 2011-09-09 19:00:11, Ted Yu wrote:
bq.   Can you perform testing on a small, real cluster ?

Will do with this next revision.


bq.  On 2011-09-09 19:00:11, Ted Yu wrote:
bq.   /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java, line 64
bq.   https://reviews.apache.org/r/1768/diff/1/?file=38941#file38941line64
bq.  
bq.   Where does this method call end up in this patch ?

It is not in the patch - its functionality is redundant with 
TableFamilyHandler.handleTableOperation(). Both the MasterFileSystem's services 
and the services passed to the handlers are simply a reference to the master 
itself, and both run getTableDescriptors().add() and 
getTableDescriptors().get() on the reference.


- Riley


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1841
---





 Batch Table Alter Operations
 

 Key: HBASE-4358
 URL:

[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101515#comment-13101515
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1843
---



/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
https://reviews.apache.org/r/1768/#comment4215

After HBASE-451, changes to table descriptor have to be persisted to HDFS.

I browsed handleTableOperation() methods in this patch and didn't find that.


- Ted


On 2011-09-09 18:39:05, Riley Patterson wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1768/
bq.  ---
bq.  
bq.  (Updated 2011-09-09 18:39:05)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Currently, the RPC provides no way of asking for several table alterations 
at once, and the master has no way of batch handling alter requests. Thus, when 
the user requests several changes at the same time (i.e. add these I columns, 
delete these J columns, and modify these K columns), each region is brought 
down (I+J+K) times so that it can reflect the new schema. Additionally, 
multiple writes are made to META, and multiple RPC calls must be made.
bq.  
bq.  This patch provides batching for these operations, both at the RPC level 
and within the Master's TableEventHandlers. This involves a bit of 
reorganization in the TableEventHandler class hierarchy, and a new 
TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the 
difference seen here:
bq.  
bq.  Before patch:
bq.  hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 2.6450 seconds
bq.  
bq.  After patch:
bq.  hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 1.1930 seconds
bq.  
bq.  Regions are brought down only once, and the duration is cut to roughly 1/N, where N is the number of alterations in the batch.
bq.  
bq.  
bq.  This addresses bug HBASE-4358.
bq.  https://issues.apache.org/jira/browse/HBASE-4358
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq./src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 
1166933 
bq./src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java 
1166933 
bq./src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 
1166933 
bq./src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1166933 
bq./src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1166933 
bq./src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 
1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java
 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java
 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java
 PRE-CREATION 
bq./src/main/ruby/hbase/admin.rb 1166933 
bq.
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java 
1166933 
bq.  
bq.  Diff: https://reviews.apache.org/r/1768/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Sanity checked functionality in pseudo-distributed mode (tried several 
permutations of different alterations, all completed successfully and with only 
one round of region restarts). Ran all unit tests successfully.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Riley
bq.  
bq.



 Batch Table Alter Operations
 

 Key: HBASE-4358
 URL: https://issues.apache.org/jira/browse/HBASE-4358
 Project: HBase
  Issue Type: Improvement
  Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor
 Attachments: HBASE-4358.patch


 Currently, the RPC provides no way of asking for several table alterations at 
 once, and the master has no way of batch handling alter requests.

[jira] [Updated] (HBASE-4194) RegionSplitter: Split on under-loaded region servers first


 [ 
https://issues.apache.org/jira/browse/HBASE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4194:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

This was committed by Ted a while back.  Resolving.

 RegionSplitter: Split on under-loaded region servers first
 --

 Key: HBASE-4194
 URL: https://issues.apache.org/jira/browse/HBASE-4194
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.92.0

 Attachments: HBASE-4194.patch


 When running RegionSplitter, our app devs noticed that they were getting a 
 lot of NSREs.  This is caused by 2 factors: 
 1. the split itself will cause an NSRE 
 2. any load balancing will cause one.  
 The former cannot be helped.  We can more tightly control load balancing 
 though.  Instead of doing a name-sorted round-robin split across RS in the 
 tier, we could sort the RS's by region count.  That way, we only split an RS 
 with 10 regions after there are no more RS with 9 regions.  This will prevent 
 the load balancing slop from kicking in and will fix the problem where 
 restarting RegionSplitter always starts splitting at RS #1.
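 The ordering proposed above could be sketched roughly as follows. This is a 
 minimal illustration only, not RegionSplitter's code: the class name, the 
 method name, and the map from server name to region count are all 
 hypothetical, and nothing here calls an HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: order region servers so the least-loaded are split first.
// Server names and region counts are plain strings and ints, not HBase types.
final class LeastLoadedFirstSketch {

    // Returns server names sorted by ascending region count. Ties are broken
    // by name so the order is deterministic for a given set of counts.
    static List<String> splitOrder(Map<String, Integer> regionCounts) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(regionCounts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> order = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            order.add(e.getKey());
        }
        return order;
    }
}
```

 With counts rs1=10, rs2=9, rs3=9, the servers holding 9 regions come before 
 the one holding 10, which matches the "no more RS with 9 regions" rule in the 
 description.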


[jira] [Updated] (HBASE-4243) HADOOP_HOME should be auto-detected


 [ 
https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4243:
-

Fix Version/s: (was: 0.92.0)

 HADOOP_HOME should be auto-detected
 ---

 Key: HBASE-4243
 URL: https://issues.apache.org/jira/browse/HBASE-4243
 Project: HBase
  Issue Type: Improvement
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
Priority: Minor
 Attachments: HBASE-4243.patch.txt


 Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect 
 the HADOOP_HOME setting if it is not given explicitly. Something along the 
 lines of:
 {noformat}
 # check for hadoop in the path
 HADOOP_IN_PATH=`which hadoop 2>/dev/null`
 if [ -f "${HADOOP_IN_PATH}" ]; then
   HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
 fi
 # HADOOP_HOME env variable overrides hadoop in the path
 HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
 if [ "$HADOOP_HOME" == "" ]; then
   echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or hadoop must be in the path";
   exit 4;
 fi
 {noformat}
 Thoughts?


[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut


[ 
https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101523#comment-13101523
 ] 

stack commented on HBASE-4347:
--

Mind running tests Lars?  I can fix license on commit.

 Remove duplicated code from Put, Delete, Get, Scan, MultiPut
 

 Key: HBASE-4347
 URL: https://issues.apache.org/jira/browse/HBASE-4347
 Project: HBase
  Issue Type: Improvement
  Components: util
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Priority: Minor
 Fix For: 0.92.0

 Attachments: 4347-v2.txt, 4347.txt


 This came from discussion with Stack w.r.t. HBASE-2195.
 There is currently a lot of duplicated code especially between Put and 
 Delete, and also between all Operations.
 For example all of Put/Delete/Get/Scan have attributes with exactly the same 
 code in all classes.
 Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc.
 One way to do this is to introduce OperationWithAttributes which extends 
 Operation, and have Put/Delete/Get/Scan extend that rather than Operation.
 In addition, Put and Delete could extend Mutation (which itself would 
 extend OperationWithAttributes).
 If a static inheritance hierarchy is not desired here, we can use delegation.
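 A simplified sketch of the hierarchy described above follows. The class names 
 come from the issue text, but every field and method body here is 
 illustrative and assumed; the real HBase classes carry far more state and 
 this is not taken from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the proposed hierarchy. Names follow the issue text;
// fields and methods are illustrative assumptions, not the patch contents.
abstract class Operation {
}

// Holds the attribute map that Put/Delete/Get/Scan each duplicated.
abstract class OperationWithAttributes extends Operation {
    private final Map<String, byte[]> attributes = new HashMap<>();

    // A null value removes the attribute (an assumed convention for this sketch).
    public void setAttribute(String name, byte[] value) {
        if (value == null) {
            attributes.remove(name);
        } else {
            attributes.put(name, value);
        }
    }

    public byte[] getAttribute(String name) {
        return attributes.get(name);
    }
}

// Shared state for the two write operations (row and timestamp only, for brevity).
abstract class Mutation extends OperationWithAttributes {
    protected final byte[] row;
    protected long timestamp = Long.MAX_VALUE;

    protected Mutation(byte[] row) {
        this.row = row;
    }

    public byte[] getRow() {
        return row;
    }
}

class Put extends Mutation {
    Put(byte[] row) { super(row); }
}

class Delete extends Mutation {
    Delete(byte[] row) { super(row); }
}

class Get extends OperationWithAttributes {
}

class Scan extends OperationWithAttributes {
}
```

 The delegation alternative mentioned above would instead give each operation 
 a private attribute-holder field and forward setAttribute/getAttribute to it.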


[jira] [Resolved] (HBASE-4301) META migration from 0.90 to trunk fails


 [ 
https://issues.apache.org/jira/browse/HBASE-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4301.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to TRUNK yesterday by Ted.  Thanks for the patch Subbu (and Ted) and 
to Sebastian for debugging help.

 META migration from 0.90 to trunk fails
 ---

 Key: HBASE-4301
 URL: https://issues.apache.org/jira/browse/HBASE-4301
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Subbu M Iyer
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 4301-1-Fixed_Root_migration_to_newer_HRI_format_.patch, 
 4301-2-Fixed_Root_migration_to_newer_HRI_format_.patch, 
 4301-Fixed_Root_migration_to_newer_HRI_format_.patch, 4301-v3.txt, 
 4301-v4.txt, 4301-v7.txt, 4301.txt, 4301_v2.txt, logs.tar.gz, master-log.txt, 
 meta_migrate, meta_trunk, root_migrate, root_trunk


 I started a trunk cluster as an upgrade from 0.90.4ish, and now I can't scan 
 my .META. table, etc, and other operations fail.


[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut


[ 
https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101538#comment-13101538
 ] 

Lars Hofhansl commented on HBASE-4347:
--

This change is causing a *significant* slowdown in some of the tests. I must 
have missed something...
Will report back when I have found the problem.


 Remove duplicated code from Put, Delete, Get, Scan, MultiPut
 

 Key: HBASE-4347
 URL: https://issues.apache.org/jira/browse/HBASE-4347
 Project: HBase
  Issue Type: Improvement
  Components: util
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Priority: Minor
 Fix For: 0.92.0

 Attachments: 4347-v2.txt, 4347.txt


 This came from discussion with Stack w.r.t. HBASE-2195.
 There is currently a lot of duplicated code especially between Put and 
 Delete, and also between all Operations.
 For example all of Put/Delete/Get/Scan have attributes with exactly the same 
 code in all classes.
 Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc.
 One way to do this is to introduce OperationWithAttributes which extends 
 Operation, and have Put/Delete/Get/Scan extend that rather than Operation.
 In addition, Put and Delete could extend Mutation (which itself would 
 extend OperationWithAttributes).
 If a static inheritance hierarchy is not desired here, we can use delegation.


[jira] [Commented] (HBASE-4243) HADOOP_HOME should be auto-detected

2011-09-09 Thread Roman Shaposhnik (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101540#comment-13101540
 ] 

Roman Shaposhnik commented on HBASE-4243:
-

Sorry about that. I tend to assume Linux these days. What's the level of UNIX 
API I can count on? POSIX? Or even less than that? Please let me know and I'll 
update the patch.

 HADOOP_HOME should be auto-detected
 ---

 Key: HBASE-4243
 URL: https://issues.apache.org/jira/browse/HBASE-4243
 Project: HBase
  Issue Type: Improvement
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
Priority: Minor
 Attachments: HBASE-4243.patch.txt


 Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect 
 the HADOOP_HOME setting if it is not given explicitly. Something along the 
 lines of:
 {noformat}
 # check for hadoop in the path
 HADOOP_IN_PATH=`which hadoop 2>/dev/null`
 if [ -f "${HADOOP_IN_PATH}" ]; then
   HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
 fi
 # HADOOP_HOME env variable overrides hadoop in the path
 HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
 if [ "$HADOOP_HOME" == "" ]; then
   echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or hadoop must be in the path";
   exit 4;
 fi
 {noformat}
 Thoughts?


[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101551#comment-13101551
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--



bq.  On 2011-09-09 19:48:45, Ted Yu wrote:
bq.   
/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java,
 line 64
bq.   https://reviews.apache.org/r/1768/diff/1/?file=38941#file38941line64
bq.  
bq.   After HBASE-451, changes to table descriptor have to be persisted to 
HDFS.
bq.   
bq.   I browsed handleTableOperation() methods in this patch and didn't 
find that.

I'm not familiar enough with how exactly table descriptors are persisted to be 
able to tell you for certain that this approach correctly ensures persistence. 
But I can confidently tell you that this diff does everything that the current 
trunk does with regard to updating table descriptors. If you look at the 
actual implementation of MasterFileSystem.{add,modify,delete}Column(), it gets 
a table descriptor from master services, modifies the table descriptor 
appropriately, then adds it back to master services' table descriptors. Between 
handleTableOperation() and updateTableDescriptor(), this patch follows the 
exact same procedure, and has the same instance of MasterServices. This 
separation is for the purpose of enabling batching to happen in a way that 
doesn't leave the system in an intermediate state in the case of a thrown 
exception. From basic testing, I see that the table descriptor changes are 
actually being written to my fs. If this procedure is not enough to actually 
ensure that this happens, we should file a separate JIRA to look into it.
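The "no intermediate state" idea described above can be sketched as follows. 
This is only an assumption-laden illustration, not the patch itself: the table 
descriptor is reduced to a set of family names, and BatchAlterSketch, Change 
and Kind are hypothetical names that do not exist in HBase.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: apply a batch of family changes to a copy, so the
// caller's descriptor is untouched if any single change turns out invalid.
final class BatchAlterSketch {
    enum Kind { ADD, DELETE }

    static final class Change {
        final Kind kind;
        final String family;

        Change(Kind kind, String family) {
            this.kind = kind;
            this.family = family;
        }
    }

    // Returns the updated family set, or throws without modifying 'families'.
    static Set<String> apply(Set<String> families, List<Change> changes) {
        Set<String> updated = new TreeSet<>(families);
        for (Change c : changes) {
            switch (c.kind) {
                case ADD:
                    if (!updated.add(c.family)) {
                        throw new IllegalArgumentException("family exists: " + c.family);
                    }
                    break;
                case DELETE:
                    if (!updated.remove(c.family)) {
                        throw new IllegalArgumentException("no such family: " + c.family);
                    }
                    break;
            }
        }
        return updated;
    }
}
```

The caller would persist and publish the returned set only once, after every 
change in the batch has been validated, which is what allows a single round 
of region restarts.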


- Riley


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1843
---


On 2011-09-09 18:39:05, Riley Patterson wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1768/
bq.  ---
bq.  
bq.  (Updated 2011-09-09 18:39:05)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Currently, the RPC provides no way of asking for several table alterations 
at once, and the master has no way of batch handling alter requests. Thus, when 
the user requests several changes at the same time (i.e. add these I columns, 
delete these J columns, and modify these K columns), each region is brought 
down (I+J+K) times so that it can reflect the new schema. Additionally, 
multiple writes are made to META, and multiple RPC calls must be made.
bq.  
bq.  This patch provides batching for these operations, both at the RPC level 
and within the Master's TableEventHandlers. This involves a bit of 
reorganization in the TableEventHandler class hierarchy, and a new 
TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the 
difference seen here:
bq.  
bq.  Before patch:
bq.  hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 2.6450 seconds
bq.  
bq.  After patch:
bq.  hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 1.1930 seconds
bq.  
bq.  Regions are brought down only once, and the duration is cut to roughly 1/N, where N is the number of alterations in the batch.
bq.  
bq.  
bq.  This addresses bug HBASE-4358.
bq.  https://issues.apache.org/jira/browse/HBASE-4358
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq./src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 
1166933 
bq./src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java 
1166933 
bq./src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 
1166933 
bq./src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1166933 
bq./src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1166933 
bq./src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 
1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java
 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java
 1166933 
bq.

[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut

[
https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101557#comment-13101557
]

Lars Hofhansl commented on HBASE-4347:
--

Found the problem. I had accidentally written readFields(in) instead of
readAttributes(in) in Scan.readFields(in). So that would just wait forever for
the stream... Running tests now.
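
For readers following along, the slip can be illustrated with a small 
stand-alone sketch. Everything in it is hypothetical: AttributeHolderSketch, 
ScanSketch, the caching field, and the wire layout are made up for 
illustration and are not the HBase classes or the patch contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical base class owning the shared attribute (de)serialization.
abstract class AttributeHolderSketch {
    protected final Map<String, byte[]> attributes = new TreeMap<>();

    protected void writeAttributes(DataOutput out) throws IOException {
        out.writeInt(attributes.size());
        for (Map.Entry<String, byte[]> e : attributes.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().length);
            out.write(e.getValue());
        }
    }

    protected void readAttributes(DataInput in) throws IOException {
        attributes.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            attributes.put(key, value);
        }
    }
}

// Hypothetical stand-in for an operation with one field of its own.
final class ScanSketch extends AttributeHolderSketch {
    int caching;

    void write(DataOutput out) throws IOException {
        out.writeInt(caching);
        writeAttributes(out);
    }

    void readFields(DataInput in) throws IOException {
        caching = in.readInt();
        // Calling readFields(in) here instead would re-enter this method and
        // keep consuming the stream rather than reading the attribute section.
        readAttributes(in);
    }

    // Serializes to a byte array and reads it back into a fresh instance.
    static ScanSketch roundTrip(ScanSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            ScanSketch copy = new ScanSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round-trip check like roundTrip() is the kind of test that would surface 
this mistake quickly on an in-memory stream, where the read fails at end of 
input instead of blocking on a socket.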

Remove duplicated code from Put, Delete, Get, Scan, MultiPut

Key: HBASE-4347
URL: https://issues.apache.org/jira/browse/HBASE-4347
Project: HBase
Issue Type: Improvement
Components: util
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Priority: Minor
Fix For: 0.92.0

Attachments: 4347-v2.txt, 4347.txt

This came from discussion with Stack w.r.t. HBASE-2195.
There is currently a lot of duplicated code especially between Put and
Delete, and also between all Operations.
For example all of Put/Delete/Get/Scan have attributes with exactly the same
code in all classes.
Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc.
One way to do this is to introduce OperationWithAttributes which extends
Operation, and have Put/Delete/Get/Scan extend that rather than Operation.
In addition, Put and Delete could extend Mutation (which itself would
extend OperationWithAttributes).
If a static inheritance hierarchy is not desired here, we can use delegation.


[jira] [Commented] (HBASE-4357) Region in transition - in closing state


[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101574#comment-13101574
 ] 

Ming Ma commented on HBASE-4357:


Here is the issue. It has nothing to do with master restart.

CloseRegionHandler.getCurrentVersion failed. Thus regionserver can't close the 
region properly. One reason it can't get data from zookeeper could be that 
there are lots of regions in transition.


11/09/07 17:21:48 WARN handler.CloseRegionHandler: Error getting node's version 
in CLOSING state, aborting close of 
miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.


Possible fixes:

1. Perhaps CloseRegionHandler.getCurrentVersion should retry on calls to 
ZKAssign.getVersion?
2. Timeout Monitor doesn't do anything for a region that stays in CLOSING state 
for a long time. Perhaps it could try to repair it, e.g. by reissuing a 
close-region request to the RS?
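
The first option could look something like the sketch below. It is 
deliberately generic: VersionRetrySketch and its parameters are hypothetical, 
the lookup is passed in as a Callable rather than calling ZKAssign directly, 
and -1 is just an assumed "no version" marker for this example.

```java
import java.util.concurrent.Callable;

// Hypothetical bounded-retry wrapper around a version lookup.
final class VersionRetrySketch {

    // Calls 'lookup' up to maxAttempts times, sleeping between failures.
    // Returns the first successful result, or -1 if every attempt fails
    // or the thread is interrupted while waiting.
    static int getVersionWithRetry(Callable<Integer> lookup,
                                   int maxAttempts,
                                   long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.call();
            } catch (Exception e) {
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt for the caller and give up.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return -1;
    }
}
```

Whether a bounded retry is enough, or whether the second option is also 
needed for regions already stuck in CLOSING, is left open here.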

 Region in transition - in closing state
 ---

 Key: HBASE-4357
 URL: https://issues.apache.org/jira/browse/HBASE-4357
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma

 Got the following during testing, 
 1. On a given machine, kill RS process id. Then kill HMaster process id.
 2. Start RS first via bin/hbase-daemon.sh --config ./conf start regionserver. 
 Then start HMaster via bin/hbase-daemon.sh --config ./conf start master.
 One region of a table stayed in closing state.
 According to zookeeper,
 794a6ff17a4de0dd0a19b984ba18eea9 
 miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
 server=sea-esxi-0,6,1315428682281 
 According to the .META. table, the region has been reassigned from sea-esxi-0 to 
 sea-esxi-4.
 miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 


[jira] [Updated] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread Riley Patterson (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Riley Patterson updated HBASE-4358:
---

Attachment: HBASE-4358-v2.patch

Addressed comments made on the review board. Cleaned up whitespace.

Batch Table Alter Operations

Key: HBASE-4358
URL: https://issues.apache.org/jira/browse/HBASE-4358
Project: HBase
Issue Type: Improvement
Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor
Attachments: HBASE-4358-v2.patch, HBASE-4358.patch

Currently, the RPC provides no way of asking for several table alterations at
once, and the master has no way of batch handling alter requests. Thus, when
the user requests several changes at the same time (i.e. add these I columns,
delete these J columns, and modify these K columns), each region is brought
down (I+J+K) times so that it can reflect the new schema. Additionally,
multiple writes are made to META, and multiple RPC calls must be made.
This patch provides batching for these operations, both at the RPC level and
within the Master's TableEventHandlers. This involves a bit of reorganization
in the TableEventHandler class hierarchy, and a new TableEventHandler,
TableMultiFamilyHandler. The net effect ends up being the difference seen
here:
Before patch:
hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.6450 seconds
After patch:
hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1930 seconds
Regions are brought down only once, and the duration is cut to roughly 1/N,
where N is the number of alterations in the batch.


[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101579#comment-13101579
]

jirapos...@reviews.apache.org commented on HBASE-4358:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/
---

(Updated 2011-09-09 21:23:14.655517)

Review request for hbase.

Changes
---

Addressed various comments made here.

Summary
---

This patch provides batching for these operations, both at the RPC level and
within the Master's TableEventHandlers. This involves a bit of reorganization
in the TableEventHandler class hierarchy, and a new TableEventHandler,
TableMultiFamilyHandler. The net effect ends up being the difference seen here:

Before patch:
hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.6450 seconds

After patch:
hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.1930 seconds

Regions are brought down only once, and the duration is cut to roughly 1/N,
where N is the number of alterations in the batch.

This addresses bug HBASE-4358.
https://issues.apache.org/jira/browse/HBASE-4358

Diffs (updated)
-

/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 1166933
/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1166933
/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java
1166933
/src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java
1166933
/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1166933
/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1166933
/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1166933
/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java
1166933

/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
1166933

/src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java
1166933
/src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java
PRE-CREATION

/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java
1166933

/src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java
PRE-CREATION
/src/main/ruby/hbase/admin.rb 1166933
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
1166933

Diff: https://reviews.apache.org/r/1768/diff

Testing
---

Sanity checked functionality in pseudo-distributed mode (tried several
permutations of different alterations, all completed successfully and with only
one round of region restarts). Ran all unit tests successfully.

Thanks,

Riley

Batch Table Alter Operations

[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101592#comment-13101592
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1846
---



/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/1768/#comment4218

Missed another Arrays.asList(...) :)



/src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java
https://reviews.apache.org/r/1768/#comment4219

Trailing whitespace :(


- Lars


On 2011-09-09 21:23:14, Riley Patterson wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1768/
bq.  ---
bq.  
bq.  (Updated 2011-09-09 21:23:14)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Currently, the RPC provides no way of asking for several table alterations 
at once, and the master has no way of batch handling alter requests. Thus, when 
the user requests several changes at the same time (i.e. add these I columns, 
delete these J columns, and modify these K columns), each region is brought 
down (I+J+K) times so that it can reflect the new schema. Additionally, 
multiple writes are made to META, and multiple RPC calls must be made.
bq.  
bq.  This patch provides batching for these operations, both at the RPC level 
and within the Master's TableEventHandlers. This involves a bit of 
reorganization in the TableEventHandler class hierarchy, and a new 
TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the 
difference seen here:
bq.  
bq.  Before patch:
bq.  hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 2.6450 seconds
bq.  
bq.  After patch:
bq.  hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 1.1930 seconds
bq.  
bq.  Regions are brought down only once, and the duration is cut to roughly 1/N, where N is the number of alterations in the batch.
bq.  
bq.  
bq.  This addresses bug HBASE-4358.
bq.  https://issues.apache.org/jira/browse/HBASE-4358
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq./src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 1166933 
bq./src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 
1166933 
bq./src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java 
1166933 
bq./src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 
1166933 
bq./src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1166933 
bq./src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1166933 
bq./src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 
1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java
 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java
 1166933 
bq.
/src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java
 PRE-CREATION 
bq./src/main/ruby/hbase/admin.rb 1166933 
bq.
/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java 
1166933 
bq.  
bq.  Diff: https://reviews.apache.org/r/1768/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Sanity checked functionality in pseudo-distributed mode (tried several 
permutations of different alterations, all completed successfully and with only 
one round of region restarts). Ran all unit tests successfully.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Riley
bq.  
bq.



 Batch Table Alter Operations
 

 Key: HBASE-4358
 URL: https://issues.apache.org/jira/browse/HBASE-4358
 Project: HBase
  Issue Type: Improvement
  Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor
 Attachments: HBASE-4358-v2.patch, HBASE-4358.patch


 Currently, the RPC provides no way of

[jira] [Created] (HBASE-4361) Certain filter expressions fail in the shell

Certain filter expressions fail in the shell


 Key: HBASE-4361
 URL: https://issues.apache.org/jira/browse/HBASE-4361
 Project: HBase
  Issue Type: Bug
  Components: filters, shell
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0


Running the following in the shell hangs and then fails:
{noformat}
scan 't1', { FILTER = SingleColumnValueFilter(, '1', 'f1', 'col_a') }
{noformat}
The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) 
undefined method `write' for true:TrueClass


[jira] [Commented] (HBASE-4361) Certain filter expressions fail in the shell


[ 
https://issues.apache.org/jira/browse/HBASE-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101604#comment-13101604
 ] 

Todd Lipcon commented on HBASE-4361:


After hacking HBase to show a full stack trace:
{noformat}
org.jruby.exceptions.RaiseException: (NoMethodError) undefined method `write' for true:TrueClass
org.jruby.exceptions.RaiseException: (NoMethodError) undefined method `write' for true:TrueClass
at Hbase::Table.scan(/home/todd/git/hbase/bin/../bin/../src/main/ruby/hbase/table.rb:255)
at Shell::Commands::Scan.command(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell/commands/scan.rb:61)
at Shell::Commands::Scan.command_safe(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell/commands.rb:31)
at Shell::Commands::Command.translate_hbase_exceptions(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell/commands.rb:70)
at Shell::Commands::Command.command_safe(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell/commands.rb:31)
at Shell::Shell.command(/home/todd/git/hbase/bin/../bin/../src/main/ruby/shell.rb:106)
{noformat}

 Certain filter expressions fail in the shell
 

 Key: HBASE-4361
 URL: https://issues.apache.org/jira/browse/HBASE-4361
 Project: HBase
  Issue Type: Bug
  Components: filters, shell
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0


 Running the following in the shell hangs and then fails:
 {noformat}
 scan 't1', { FILTER => SingleColumnValueFilter(, '1', 'f1', 'col_a') }
 {noformat}
 The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) 
 undefined method `write' for true:TrueClass


[jira] [Updated] (HBASE-4362) SITE: Center logo


 [ 
https://issues.apache.org/jira/browse/HBASE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4362:
-

Attachment: site.txt

 SITE: Center logo
 -

 Key: HBASE-4362
 URL: https://issues.apache.org/jira/browse/HBASE-4362
 Project: HBase
  Issue Type: Task
Reporter: stack
 Attachments: site.txt





[jira] [Commented] (HBASE-4354) track region history

[
https://issues.apache.org/jira/browse/HBASE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101611#comment-13101611
]

Ming Ma commented on HBASE-4354:

Thanks, Stack, Andy. Writing the data to a RegionHistory table in HBase sounds
like a good idea. The key point is to make it async, as Andy said, or to handle
the situations in which RegionHistory isn't available:

1. Tracking the regions of RegionHistory itself. While the regions of
RegionHistory are being moved around, writes to RegionHistory won't work.
2. Tracking the regions of -ROOT- and .META.. Ideally we would like to track
all regions, including those of -ROOT- and .META.. At cluster startup, however,
RegionHistory only becomes available after -ROOT- and .META..

So to make it work:

1. Make the logging async.
2. If we want to keep every entry even in error cases such as master
failover, make the logging reliable. For example, persist the data to ZooKeeper
or HDFS as a buffer while RegionHistory isn't available.

We could also log it to another HBase cluster, but that would create operational
overhead unless it can be combined with other metrics and logging scenarios
(like OpenTSDB).
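
To make the buffering idea above concrete, here is a minimal sketch in Python (chosen only for brevity; nothing here is HBase code). The class name, `write_row`, and the in-memory queue are all assumptions for illustration: a real version would presumably persist the queue to ZooKeeper or HDFS as suggested, and `flush()` would run on a background thread so that callers never wait on RegionHistory.

```python
from collections import deque


class BufferedHistoryLog:
    """Hypothetical sketch: queue region events, write them when the sink is up."""

    def __init__(self, write_row):
        # write_row stands in for a put to the RegionHistory table. It is
        # expected to raise IOError while that table is unavailable.
        self._write_row = write_row
        self._pending = deque()

    def record(self, event):
        # Never blocks the caller: the event is only queued here.
        self._pending.append(event)

    def flush(self):
        """Write queued events in order; stop at the first failure, keep the rest."""
        written = 0
        while self._pending:
            try:
                self._write_row(self._pending[0])
            except IOError:
                break
            self._pending.popleft()
            written += 1
        return written

    def pending(self):
        return list(self._pending)
```

Events recorded while RegionHistory is down (for example before -ROOT- and .META. are assigned) stay queued in order and are written by the first `flush()` that succeeds.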

track region history

Key: HBASE-4354
URL: https://issues.apache.org/jira/browse/HBASE-4354
Project: HBase
Issue Type: New Feature
Components: master, metrics, regionserver
Reporter: Ming Ma
Assignee: Ming Ma


[jira] [Commented] (HBASE-4361) Certain filter expressions fail in the shell


[ 
https://issues.apache.org/jira/browse/HBASE-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101613#comment-13101613
 ] 

Todd Lipcon commented on HBASE-4361:


Several problems here:
1) I was using double-quotes twice, so it was passing true as the filter 
value. JRuby and its lovely lack of type checking then passed that through to 
the point where it tried to write true to the wire as a Writable, and failed.
2) The documentation for SingleColumnValueFilter has the incorrect order of 
arguments.
3) The errors given back by the filter parsing code are inscrutable.

 Certain filter expressions fail in the shell
 

 Key: HBASE-4361
 URL: https://issues.apache.org/jira/browse/HBASE-4361
 Project: HBase
  Issue Type: Bug
  Components: filters, shell
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0


 Running the following in the shell hangs and then fails:
 {noformat}
 scan 't1', { FILTER => SingleColumnValueFilter(, '1', 'f1', 'col_a') }
 {noformat}
 The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) 
 undefined method `write' for true:TrueClass


[jira] [Created] (HBASE-4363) [replication] ReplicationSource won't close if failing to contact the sink

2011-09-09 Thread Jean-Daniel Cryans (JIRA)

[replication] ReplicationSource won't close if failing to contact the sink
--

 Key: HBASE-4363
 URL: https://issues.apache.org/jira/browse/HBASE-4363
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.5


When trying to close a source, it will hang if it's already in shipEdits() and 
has issues reaching the sink. The reason is that in that method the while loop 
only checks if the RS is going down but not if the source was asked to shut down.
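
A rough model of the loop condition described above, written in Python for brevity rather than as the actual ReplicationSource code. The function and parameter names are hypothetical; `server_stopping` and `source_running` are callables standing in for the region server's stop flag and the source's own shutdown flag. A real loop would also sleep between retries.

```python
def ship_edits(send, server_stopping, source_running):
    """Retry send() until it succeeds or either stop condition holds.

    Hypothetical model, not the HBase implementation.
    Returns a (status, attempts) tuple.
    """
    attempts = 0
    # Checking source_running() as well is the point of this sketch: the loop
    # described in the issue only consults the server-wide flag.
    while not server_stopping() and source_running():
        attempts += 1
        try:
            send()
            return "shipped", attempts
        except ConnectionError:
            # Sink unreachable: go around again and re-check both flags.
            continue
    return "stopped", attempts
```

With only the first condition in the `while`, a source whose sink is unreachable would keep retrying after being asked to close, which matches the hang reported here.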


[jira] [Commented] (HBASE-4361) Certain filter expressions fail in the shell


[ 
https://issues.apache.org/jira/browse/HBASE-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101618#comment-13101618
 ] 

Todd Lipcon commented on HBASE-4361:


For reference, the correct way to specify this is:
scan 't1', { FILTER => SingleColumnValueFilter('f1', 'col_a', , 'binary:1') }

But I had to read the code for 30 minutes to figure it out. We need lots of 
docs updates on the filter language.


 Certain filter expressions fail in the shell
 

 Key: HBASE-4361
 URL: https://issues.apache.org/jira/browse/HBASE-4361
 Project: HBase
  Issue Type: Bug
  Components: filters, shell
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0


 Running the following in the shell hangs and then fails:
 {noformat}
 scan 't1', { FILTER => SingleColumnValueFilter(, '1', 'f1', 'col_a') }
 {noformat}
 The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) 
 undefined method `write' for true:TrueClass


[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-09 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101621#comment-13101621
 ] 

Jean-Daniel Cryans commented on HBASE-3130:
---

Looks like it got even worse recently: we hit a situation where the 
SessionExpired was treated as if it were the RS's own, and the RS FATAL'ed.

 [replication] ReplicationSource can't recover from session expired on remote 
 clusters
 -

 Key: HBASE-3130
 URL: https://issues.apache.org/jira/browse/HBASE-3130
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jean-Daniel Cryans

 Currently ReplicationSource cannot recover when its zookeeper connection to 
 its remote cluster expires. HLogs are still being tracked, but a cluster 
 restart is required to continue replication (or a rolling restart).


[jira] [Resolved] (HBASE-4362) SITE: Center logo


 [ 
https://issues.apache.org/jira/browse/HBASE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4362.
--

Resolution: Fixed
  Assignee: stack

Committed.  Updated site.

 SITE: Center logo
 -

 Key: HBASE-4362
 URL: https://issues.apache.org/jira/browse/HBASE-4362
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Attachments: site.txt





[jira] [Updated] (HBASE-4364) Column family pruning incorrectly prunes CFs referred to by filters


 [ 
https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-4364:
---

Affects Version/s: 0.92.0
   0.90.4

 Column family pruning incorrectly prunes CFs referred to by filters
 ---

 Key: HBASE-4364
 URL: https://issues.apache.org/jira/browse/HBASE-4364
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Critical

 For a scan, if you select some set of columns using addColumns(), and then 
 apply a SingleColumnValueFilter that restricts the results based on some 
 other columns which aren't selected, and those non-selected columns are part 
 of a separate column family, then those filter conditions are ignored.


[jira] [Commented] (HBASE-4364) Column family pruning incorrectly prunes CFs referred to by filters


[ 
https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101629#comment-13101629
 ] 

Todd Lipcon commented on HBASE-4364:


Example shell code to reproduce this:
{noformat}
create 't1', 'f1', 'f2'
put 't1', 'r1', 'f1:word', 'hello'
put 't1', 'r1', 'f2:word', 'bonjour'
put 't1', 'r2', 'f1:word', 'goodbye'
put 't1', 'r2', 'f2:word', 'au revoir'

# scan whole table, has 2 rows, each with 2 cols
scan 't1'
# scan selecting only one column - returns 2 distinct rows
scan 't1', { COLUMNS => ['f1:word'] }
# scan with a predicate of the french word  'b', returns 1 row
scan 't1', { FILTER => SingleColumnValueFilter('f2', 'word', , 'binary:b')  }
# scan with a predicate of the french word  'b', selecting only the english word
scan 't1', { COLUMNS => ['f1:word'], FILTER => SingleColumnValueFilter('f2', 'word', , 'binary:b')  }
{noformat}

The incorrect result is as follows:
{noformat}
hbase(main):008:0> scan 't1'
ROW                   COLUMN+CELL
 r1                   column=f1:word, timestamp=1315608975212, value=hello
 r1                   column=f2:word, timestamp=1315608975238, value=bonjour
 r2                   column=f1:word, timestamp=1315608975258, value=goodbye
 r2                   column=f2:word, timestamp=1315608975286, value=au revoir
2 row(s) in 0.0270 seconds

hbase(main):009:0> scan 't1', { COLUMNS => ['f1:word'] }
ROW                   COLUMN+CELL
 r1                   column=f1:word, timestamp=1315608975212, value=hello
 r2                   column=f1:word, timestamp=1315608975258, value=goodbye
2 row(s) in 0.0140 seconds

hbase(main):010:0> scan 't1', { FILTER => SingleColumnValueFilter('f2', 'word', , 'binary:b')  }
ROW                   COLUMN+CELL
 r1                   column=f1:word, timestamp=1315608975212, value=hello
 r1                   column=f2:word, timestamp=1315608975238, value=bonjour
1 row(s) in 0.0250 seconds

hbase(main):011:0> scan 't1', { COLUMNS => ['f1:word'], FILTER => SingleColumnValueFilter('f2', 'word', , 'binary:b')  }
ROW                   COLUMN+CELL
 r1                   column=f1:word, timestamp=1315608975212, value=hello
 r2                   column=f1:word, timestamp=1315608975258, value=goodbye
2 row(s) in 0.0270 seconds  SHOULD NOT HAVE RETURNED ANY VALUE FOR r2!
{noformat}


 Column family pruning incorrectly prunes CFs referred to by filters
 ---

 Key: HBASE-4364
 URL: https://issues.apache.org/jira/browse/HBASE-4364
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Critical

 For a scan, if you select some set of columns using addColumns(), and then 
 apply a SingleColumnValueFilter that restricts the results based on some 
 other columns which aren't selected, and those non-selected columns are part 
 of a separate column family, then those filter conditions are ignored.


[jira] [Commented] (HBASE-4270) IOE ignored during flush-on-close causes dataloss

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101630#comment-13101630
 ] 

jirapos...@reviews.apache.org commented on HBASE-4270:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1784/
---

Review request for hbase.


Summary
---

Todd wrote the patch for this issue.  What's posted here is his patch plus a 
unit test.  The diff is pretty big because I refactored the 
TestOpenRegionHandler so I could share bits of it when creating this new 
TestCloseRegionHandler; the bulk of the patch is making shared mock server and 
shared mock regionserverservice files.


This addresses bug hbase-4270.
https://issues.apache.org/jira/browse/hbase-4270


Diffs
-

  
  src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java b684af2 
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockRegionServerServices.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockServer.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java PRE-CREATION 
  src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java ab12968 

Diff: https://reviews.apache.org/r/1784/diff


Testing
---

I ran the new TestCloseRegionHandler test.


Thanks,

Michael



 IOE ignored during flush-on-close causes dataloss
 -

 Key: HBASE-4270
 URL: https://issues.apache.org/jira/browse/HBASE-4270
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-4270.-Abort-and-rethrow-errors-on-close-failur.patch


 If the RS experiences an exception during the flush of a region while closing 
 it, it currently catches the exception, logs a warning, and keeps going. If 
 the exception was a DroppedSnapshotException, this means that it will 
 silently drop any data that was in memstore when the region was closed.
 Instead, the RS should do a hard abort so that its logs will be replayed.
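
 The proposed behaviour can be sketched as follows. This is a Python model for illustration only, not the CloseRegionHandler code: `close_region`, `flush` and `abort` are hypothetical names, and `DroppedSnapshotError` merely stands in for HBase's DroppedSnapshotException.

```python
class DroppedSnapshotError(IOError):
    """Stands in for HBase's DroppedSnapshotException in this sketch."""


def close_region(flush, abort):
    """Flush on close; abort the server instead of swallowing a failed flush.

    flush and abort are hypothetical callables: flush() may raise IOError,
    abort(reason) models a hard region server abort.
    """
    try:
        flush()
    except IOError as err:
        # The behaviour described in the issue would log a warning here and
        # carry on. The proposal is a hard abort so the logs get replayed.
        abort("flush failed while closing region: %s" % err)
        raise
    return "closed"
```

 The important part is that the exception is both acted on and rethrown, so the close is never reported as successful after memstore data was dropped.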


[jira] [Commented] (HBASE-4364) Column family pruning incorrectly prunes CFs referred to by filters


[ 
https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101632#comment-13101632
 ] 

Todd Lipcon commented on HBASE-4364:


Actually, it turns out this isn't due to column family pruning - the same 
behavior occurs even with just one column family:

{noformat}

create 't2', 'f'
put 't2', 'r1', 'f:e_word', 'hello'
put 't2', 'r1', 'f:f_word', 'bonjour'
put 't2', 'r2', 'f:e_word', 'goodbye'
put 't2', 'r2', 'f:f_word', 'au revoir'
scan 't2'
# scan selecting only one column - returns 2 distinct rows
scan 't2', { COLUMNS => ['f:e_word'] }
# scan with a predicate of the french word  'b', returns 1 row
scan 't2', { FILTER => SingleColumnValueFilter('f', 'f_word', , 'binary:b')  }
# scan with a predicate of the french word  'b', selecting only the english word
scan 't2', { COLUMNS => ['f:e_word'], FILTER => SingleColumnValueFilter('f', 'e_word', , 'binary:b')  }
{noformat}

 Column family pruning incorrectly prunes CFs referred to by filters
 ---

 Key: HBASE-4364
 URL: https://issues.apache.org/jira/browse/HBASE-4364
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Critical

 For a scan, if you select some set of columns using addColumns(), and then 
 apply a SingleColumnValueFilter that restricts the results based on some 
 other columns which aren't selected, and those non-selected columns are part 
 of a separate column family, then those filter conditions are ignored.


[jira] [Updated] (HBASE-4364) Filters applied to rows not in the selected column list are ignored


 [ 
https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-4364:
---

Summary: Filters applied to rows not in the selected column list are 
ignored  (was: Column family pruning incorrectly prunes CFs referred to by 
filters)

 Filters applied to rows not in the selected column list are ignored
 ---

 Key: HBASE-4364
 URL: https://issues.apache.org/jira/browse/HBASE-4364
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Critical

 For a scan, if you select some set of columns using addColumns(), and then 
 apply a SingleColumnValueFilter that restricts the results based on some 
 other columns which aren't selected, and those non-selected columns are part 
 of a separate column family, then those filter conditions are ignored.


[jira] [Updated] (HBASE-4270) IOE ignored during flush-on-close causes dataloss


 [ 
https://issues.apache.org/jira/browse/HBASE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4270:
-

Status: Patch Available  (was: Open)

Marking patch available.

 IOE ignored during flush-on-close causes dataloss
 -

 Key: HBASE-4270
 URL: https://issues.apache.org/jira/browse/HBASE-4270
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-4270.-Abort-and-rethrow-errors-on-close-failur.patch


 If the RS experiences an exception during the flush of a region while closing 
 it, it currently catches the exception, logs a warning, and keeps going. If 
 the exception was a DroppedSnapshotException, this means that it will 
 silently drop any data that was in memstore when the region was closed.
 Instead, the RS should do a hard abort so that its logs will be replayed.


[jira] [Issue Comment Edited] (HBASE-4364) Column family pruning incorrectly prunes CFs referred to by filters


[ 
https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101632#comment-13101632
 ] 

Todd Lipcon edited comment on HBASE-4364 at 9/9/11 11:08 PM:
-

Actually, it turns out this isn't due to column family pruning - the same 
behavior occurs even with just one column family:

{noformat}

create 't2', 'f'
put 't2', 'r1', 'f:e_word', 'hello'
put 't2', 'r1', 'f:f_word', 'bonjour'
put 't2', 'r2', 'f:e_word', 'goodbye'
put 't2', 'r2', 'f:f_word', 'au revoir'
scan 't2'
# scan selecting only one column - returns 2 distinct rows
scan 't2', { COLUMNS => ['f:e_word'] }
# scan with a predicate of the french word  'b', returns 1 row
scan 't2', { FILTER => SingleColumnValueFilter('f', 'f_word', , 'binary:b')  }
# scan with a predicate of the french word  'b', selecting only the english word
scan 't2', { COLUMNS => ['f:e_word'], FILTER => SingleColumnValueFilter('f', 'f_word', , 'binary:b')  }
{noformat}

  was (Author: tlipcon):
Actually, it turns out this isn't due to column family pruning - the same 
behavior occurs even with just one column family:

{noformat}

create 't2', 'f'
put 't2', 'r1', 'f:e_word', 'hello'
put 't2', 'r1', 'f:f_word', 'bonjour'
put 't2', 'r2', 'f:e_word', 'goodbye'
put 't2', 'r2', 'f:f_word', 'au revoir'
scan 't2'
# scan selecting only one column - returns 2 distinct rows
scan 't2', { COLUMNS => ['f:e_word'] }
# scan with a predicate of the french word  'b', returns 1 row
scan 't2', { FILTER => SingleColumnValueFilter('f', 'f_word', , 'binary:b')  }
# scan with a predicate of the french word  'b', selecting only the english word
scan 't2', { COLUMNS => ['f:e_word'], FILTER => SingleColumnValueFilter('f', 'e_word', , 'binary:b')  }
{noformat}
  
 Column family pruning incorrectly prunes CFs referred to by filters
 ---

 Key: HBASE-4364
 URL: https://issues.apache.org/jira/browse/HBASE-4364
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Critical

 For a scan, if you select some set of columns using addColumns(), and then 
 apply a SingleColumnValueFilter that restricts the results based on some 
 other columns which aren't selected, and those non-selected columns are part 
 of a separate column family, then those filter conditions are ignored.


[jira] [Updated] (HBASE-4361) Certain filter expressions fail in the shell


 [ 
https://issues.apache.org/jira/browse/HBASE-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-4361:
---

Attachment: small-improvements.txt

Here are a few improvements. Still need to fix the docs, etc., to be correct. 
Ideally, IMO, the filter parsing would be done by javacc or antlr so we'd have a 
real grammar.

 Certain filter expressions fail in the shell
 

 Key: HBASE-4361
 URL: https://issues.apache.org/jira/browse/HBASE-4361
 Project: HBase
  Issue Type: Bug
  Components: filters, shell
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0

 Attachments: small-improvements.txt


 Running the following in the shell hangs and then fails:
 {noformat}
 scan 't1', { FILTER => SingleColumnValueFilter(, '1', 'f1', 'col_a') }
 {noformat}
 The error seems to be: org.jruby.exceptions.RaiseException: (NoMethodError) 
 undefined method `write' for true:TrueClass


[jira] [Updated] (HBASE-4364) Filters applied to columns not in the selected column list are ignored


 [ 
https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-4364:
---

Component/s: filters
Description: For a scan, if you select some set of columns using 
addColumns(), and then apply a SingleColumnValueFilter that restricts the 
results based on some other columns which aren't selected, then those filter 
conditions are ignored.  (was: For a scan, if you select some set of columns 
using addColumns(), and then apply a SingleColumnValueFilter that restricts the 
results based on some other columns which aren't selected, and those 
non-selected columns are part of a separate column family, then those filter 
conditions are ignored.)
Summary: Filters applied to columns not in the selected column list are 
ignored  (was: Filters applied to rows not in the selected column list are 
ignored)

Updated description to reflect the above: this is a general issue, not related 
to CFs.

 Filters applied to columns not in the selected column list are ignored
 --

 Key: HBASE-4364
 URL: https://issues.apache.org/jira/browse/HBASE-4364
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Critical

 For a scan, if you select some set of columns using addColumns(), and then 
 apply a SingleColumnValueFilter that restricts the results based on some 
 other columns which aren't selected, then those filter conditions are ignored.


[jira] [Commented] (HBASE-4243) HADOOP_HOME should be auto-detected


[ 
https://issues.apache.org/jira/browse/HBASE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101637#comment-13101637
 ] 

stack commented on HBASE-4243:
--

np.  Linux is usually a safe bet, but then you also have to make sure it works 
on the desktop machines a bunch of us whiny engineers use.  I saw this while 
hunting around for a shell-portable readlink: 
http://stackoverflow.com/questions/1055671/how-can-i-get-the-behavior-of-gnus-readlink-f-on-a-mac
  Maybe it'll help?
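
For comparison, the detection logic plus symlink resolution can be sketched without relying on `readlink -f` at all. The Python below is only an illustration of the idea, not a proposed replacement for the shell script; `detect_hadoop_home` and its injectable `which`/`realpath` parameters are hypothetical names. `shutil.which` and `os.path.realpath` are standard-library calls, the latter resolving symlinks much like GNU `readlink -f`.

```python
import os
import shutil


def detect_hadoop_home(env=None, which=shutil.which, realpath=os.path.realpath):
    """Return HADOOP_HOME if set, else guess it from `hadoop` on the PATH.

    Hypothetical helper: returns None when neither source is available.
    """
    env = os.environ if env is None else env
    explicit = env.get("HADOOP_HOME")
    if explicit:
        # An explicit HADOOP_HOME overrides whatever is on the PATH.
        return explicit
    binary = which("hadoop")
    if not binary:
        return None
    # Resolve symlinks, then step up from <home>/bin/hadoop to <home>.
    bin_dir = os.path.dirname(realpath(binary))
    return os.path.dirname(bin_dir)
```

The lookup functions are parameters only so the logic can be exercised without a Hadoop install on the machine.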

 HADOOP_HOME should be auto-detected
 ---

 Key: HBASE-4243
 URL: https://issues.apache.org/jira/browse/HBASE-4243
 Project: HBase
  Issue Type: Improvement
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
Priority: Minor
 Attachments: HBASE-4243.patch.txt


 Now that HBASE-3465 has been integrated, perhaps we should try to auto-detect 
 the HADOOP_HOME setting if it is not given explicitly. Something along the 
 lines of:
 {noformat}
 # check for hadoop in the path
 HADOOP_IN_PATH=`which hadoop 2>/dev/null`
 if [ -f "${HADOOP_IN_PATH}" ]; then
   HADOOP_DIR=`dirname "$HADOOP_IN_PATH"`/..
 fi
 # HADOOP_HOME env variable overrides hadoop in the path
 HADOOP_HOME=${HADOOP_HOME:-$HADOOP_DIR}
 if [ "$HADOOP_HOME" == "" ]; then
   echo "Cannot find hadoop installation: \$HADOOP_HOME must be set or hadoop must be in the path";
   exit 4;
 fi
 {noformat}
 Thoughts?


[jira] [Commented] (HBASE-4358) Batch Table Alter Operations

2011-09-09 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101640#comment-13101640
 ] 

jirapos...@reviews.apache.org commented on HBASE-4358:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1768/#review1847
---



/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java
https://reviews.apache.org/r/1768/#comment4220

Riley:
Thanks for the detailed explanation.
I should have patched your patch locally and performed the drill down.

There're 3 Arrays.asList() calls in patch v2.

If you don't have time, I can change them before committing.

Thanks for the nice work.


- Ted


On 2011-09-09 21:23:14, Riley Patterson wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1768/
bq.  ---
bq.  
bq.  (Updated 2011-09-09 21:23:14)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Currently, the RPC provides no way of asking for several table alterations 
at once, and the master has no way of batch handling alter requests. Thus, when 
the user requests several changes at the same time (i.e. add these I columns, 
delete these J columns, and modify these K columns), each region is brought 
down (I+J+K) times so that it can reflect the new schema. Additionally, 
multiple writes are made to META, and multiple RPC calls must be made.
bq.  
bq.  This patch provides batching for these operations, both at the RPC level 
and within the Master's TableEventHandlers. This involves a bit of 
reorganization in the TableEventHandler class hierarchy, and a new 
TableEventHandler, TableMultiFamilyHandler. The net effect ends up being the 
difference seen here:
bq.  
bq.  Before patch:
bq.  hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 2.6450 seconds
bq.  
bq.  After patch:
bq.  hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
bq.  Updating all regions with the new schema...
bq.  1/1 regions updated.
bq.  Done.
bq.  0 row(s) in 1.1930 seconds
bq.  
bq.  Regions are only brought down once, and the duration is cut 1/N.
bq.  
bq.  
bq.  This addresses bug HBASE-4358.
bq.  https://issues.apache.org/jira/browse/HBASE-4358
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableFamilyHandler.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java 1166933 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableMultiFamilyHandler.java PRE-CREATION 
bq.    /src/main/ruby/hbase/admin.rb 1166933 
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java 1166933 
bq.  
bq.  Diff: https://reviews.apache.org/r/1768/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Sanity checked functionality in pseudo-distributed mode (tried several 
permutations of different alterations, all completed successfully and with only 
one round of region restarts). Ran all unit tests successfully.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Riley
bq.  
bq.
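
The effect described in the quoted summary (regions brought down I+J+K times versus once) can be modelled in a few lines. This is a toy Python model for illustration, not the TableMultiFamilyHandler code; the `Table` class and its method names are hypothetical, and only add and delete are modelled.

```python
class Table:
    """Toy model of a table whose regions reopen after each schema change."""

    def __init__(self, families):
        self.families = set(families)
        self.region_restarts = 0

    def _apply(self, method, family):
        if method == "add":
            self.families.add(family)
        elif method == "delete":
            self.families.discard(family)
        else:
            raise ValueError("unknown alter method: %r" % method)

    def alter_one(self, method, family):
        # Unbatched: every alteration is its own request and its own reopen.
        self._apply(method, family)
        self.region_restarts += 1

    def alter_batch(self, changes):
        # Batched: apply every change first, then reopen the regions once.
        for method, family in changes:
            self._apply(method, family)
        if changes:
            self.region_restarts += 1
```

Replaying the 'peeps' example from the summary (add 'rawr', delete 'name') gives two reopen rounds through `alter_one` and one through `alter_batch`, with the same resulting schema.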



 Batch Table Alter Operations
 

 Key: HBASE-4358
 URL: https://issues.apache.org/jira/browse/HBASE-4358
 Project: HBase
  Issue Type: Improvement
  Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson

[jira] [Commented] (HBASE-4364) Filters applied to columns not in the selected column list are ignored


[ 
https://issues.apache.org/jira/browse/HBASE-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101641#comment-13101641
 ] 

Todd Lipcon commented on HBASE-4364:


Apparently this is actually known behavior according to 
SingleColumnValueFilter. From the JavaDoc:
{noformat}
When using this filter on a {@link Scan} with specified
 * inputs, the column to be tested should also be added as input (otherwise
 * the filter will regard the column as missing).
{noformat}
IMO, it's a bug, though, not a feature! Filters with requirements like this 
should automatically push their column requirements through to the 
ExplicitColumnTracker.
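
To illustrate what pushing the column requirement through would mean, here is a toy model in Python. It is not the HBase scanner: `scan`, `value_filter` and `push_filter_column` are hypothetical names, and the predicate used in the usage note (value greater than 'b') is an assumption picked so that exactly one row matches, as in the shell output earlier in this thread.

```python
def scan(rows, columns=None, value_filter=None, push_filter_column=False):
    """Toy scan over {row: {column: value}}.

    value_filter is a (column, predicate) pair. As in the JavaDoc quoted
    above, a column the filter cannot see is regarded as missing, and the
    row passes.
    """
    results = {}
    for row, cells in sorted(rows.items()):
        fetch = None if columns is None else set(columns)
        if fetch is not None and value_filter and push_filter_column:
            # Proposed behaviour: the filter adds its own column to the fetch set.
            fetch.add(value_filter[0])
        visible = {c: v for c, v in cells.items() if fetch is None or c in fetch}
        if value_filter:
            column, predicate = value_filter
            if column in visible and not predicate(visible[column]):
                continue
        # Project back down to what the caller actually asked for.
        wanted = visible if columns is None else {
            c: v for c, v in visible.items() if c in columns
        }
        if wanted:
            results[row] = wanted
    return results
```

With the four cells from the repro and a filter on `f2:word`, selecting only `f1:word` returns both rows unless `push_filter_column=True`, in which case only r1 comes back and still with just the requested column.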

 Filters applied to columns not in the selected column list are ignored
 --

 Key: HBASE-4364
 URL: https://issues.apache.org/jira/browse/HBASE-4364
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.90.4, 0.92.0
Reporter: Todd Lipcon
Priority: Critical

 For a scan, if you select some set of columns using addColumns(), and then 
 apply a SingleColumnValueFilter that restricts the results based on some 
 other columns which aren't selected, then those filter conditions are ignored.


[jira] [Commented] (HBASE-2195) Support cyclic replication

2011-09-09 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101647#comment-13101647
 ] 

Jean-Daniel Cryans commented on HBASE-2195:
---

I guess we could add some wits... maybe even verify beforehand the definition 
of each table and see if there's a problem.

 Support cyclic replication
 --

 Key: HBASE-2195
 URL: https://issues.apache.org/jira/browse/HBASE-2195
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 2195-v10.txt, 2195-v12.txt, 2195-v13.txt, 2195-v14.txt, 
 2195-v5.txt, 2195-v6.txt, 2195.txt


 We need to support cyclic replication by using the cluster id of each HlogKey 
 and stop replicating when it goes back to the original cluster.
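
The idea in the description can be sketched in a few lines of plain Java. This is a hypothetical illustration, not the attached patch: `CyclicReplicationSketch`, `Edit` and `shouldShip` are made-up names, cluster ids are plain strings assumed to be unique, and the topology is a simple ring where each cluster replicates to the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the HBASE-2195 patch.
final class CyclicReplicationSketch {

    /** An edit tagged with the id of the cluster where it was first written. */
    static final class Edit {
        final String originClusterId;
        final String payload;

        Edit(String originClusterId, String payload) {
            this.originClusterId = originClusterId;
            this.payload = payload;
        }
    }

    /** An edit is shipped to a peer unless that peer is where it originated. */
    static boolean shouldShip(Edit edit, String peerClusterId) {
        return !edit.originClusterId.equals(peerClusterId);
    }

    /**
     * Follows one edit around a ring of clusters (each replicates to the next)
     * and returns the clusters that received it through replication.
     */
    static List<String> replicateAroundRing(List<String> ring, int originIndex) {
        Edit edit = new Edit(ring.get(originIndex), "put row1");
        List<String> received = new ArrayList<>();
        int current = originIndex;
        while (true) {
            int next = (current + 1) % ring.size();
            String peer = ring.get(next);
            if (!shouldShip(edit, peer)) {
                break; // back at the origin: stop instead of looping forever
            }
            received.add(peer);
            current = next;
        }
        return received;
    }

    public static void main(String[] args) {
        // Three clusters in a cycle; the edit is written on A.
        System.out.println(replicateAroundRing(Arrays.asList("A", "B", "C"), 0));
    }
}
```

Because the origin id travels with the edit, the same check covers master-master replication (a ring of two) and longer cycles.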


[jira] [Updated] (HBASE-4358) Batch Table Alter Operations


 [ 
https://issues.apache.org/jira/browse/HBASE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4358:
--

Attachment: 4358-v3.txt

Patch version 3 removes Arrays.asList() calls.

 Batch Table Alter Operations
 

 Key: HBASE-4358
 URL: https://issues.apache.org/jira/browse/HBASE-4358
 Project: HBase
  Issue Type: Improvement
  Components: ipc, master, shell
Affects Versions: 0.92.0
Reporter: Riley Patterson
Assignee: Riley Patterson
Priority: Minor
 Attachments: 4358-v3.txt, HBASE-4358-v2.patch, HBASE-4358.patch


 Currently, the RPC provides no way of asking for several table alterations at 
 once, and the master has no way of batch handling alter requests. Thus, when 
 the user requests several changes at the same time (i.e. add these I columns, 
 delete these J columns, and modify these K columns), each region is brought 
 down (I+J+K) times so that it can reflect the new schema. Additionally, 
 multiple writes are made to META, and multiple RPC calls must be made.
 This patch provides batching for these operations, both at the RPC level and 
 within the Master's TableEventHandlers. This involves a bit of reorganization 
 in the TableEventHandler class hierarchy, and a new TableEventHandler, 
 TableMultiFamilyHandler. The net effect ends up being the difference seen 
 here:
 Before patch:
 hbase(main):001:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
 Updating all regions with the new schema...
 1/1 regions updated.
 Done.
 Updating all regions with the new schema...
 1/1 regions updated.
 Done.
 0 row(s) in 2.6450 seconds
 After patch:
 hbase(main):002:0> alter 'peeps', {NAME => 'rawr'}, {METHOD => 'delete', NAME => 'name'}
 Updating all regions with the new schema...
 1/1 regions updated.
 Done.
 0 row(s) in 1.1930 seconds
 Regions are brought down only once, and the duration is cut to roughly 1/N, where N is the number of alterations requested.
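
The difference between the two shell sessions can be modelled with a small counting sketch. Everything here is hypothetical (`BatchAlterSketch`, `Table`, `alterOneByOne`, `alterBatched`); it only tracks column family names and the number of region reopen rounds, and it leaves out the "modify" case for brevity. It is not the TableEventHandler code from the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the reopen count; not the Master's handler code.
final class BatchAlterSketch {

    enum Kind { ADD, DELETE }

    static final class Alteration {
        final Kind kind;
        final String family;

        Alteration(Kind kind, String family) {
            this.kind = kind;
            this.family = family;
        }
    }

    /** A table reduced to its column family names plus a reopen counter. */
    static final class Table {
        final Set<String> families = new TreeSet<>();
        int reopenRounds = 0;

        private void apply(Alteration a) {
            if (a.kind == Kind.ADD) {
                families.add(a.family);
            } else {
                families.remove(a.family);
            }
        }

        /** One alteration per request: every request reopens the regions. */
        void alterOneByOne(List<Alteration> alterations) {
            for (Alteration a : alterations) {
                apply(a);
                reopenRounds++;
            }
        }

        /** All alterations in one request: schema changes first, one reopen at the end. */
        void alterBatched(List<Alteration> alterations) {
            if (alterations.isEmpty()) {
                return;
            }
            for (Alteration a : alterations) {
                apply(a);
            }
            reopenRounds++;
        }
    }

    public static void main(String[] args) {
        List<Alteration> request = Arrays.asList(
            new Alteration(Kind.ADD, "rawr"),
            new Alteration(Kind.DELETE, "name"));

        Table before = new Table();
        before.families.add("name");
        before.alterOneByOne(request);

        Table after = new Table();
        after.families.add("name");
        after.alterBatched(request);

        System.out.println(before.reopenRounds + " vs " + after.reopenRounds);
    }
}
```

Both paths end with the same schema; only the number of reopen rounds differs.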


[jira] [Commented] (HBASE-4354) track region history

[
https://issues.apache.org/jira/browse/HBASE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101659#comment-13101659
]

Ming Ma commented on HBASE-4354:

Thanks, Todd. Yes, an interface is a good way to abstract the various implementations.

I was about to open a separate jira, "dynamic metrics logging", for a more
general structured data logging infrastructure, something useful for collecting
hbase/mapreduce/hdfs dynamic metrics which aren't predefined and could change
over time. It seems like region transaction history could be an application for
that system.

track region history

Key: HBASE-4354
URL: https://issues.apache.org/jira/browse/HBASE-4354
Project: HBase
Issue Type: New Feature
Components: master, metrics, regionserver
Reporter: Ming Ma
Assignee: Ming Ma


[jira] [Resolved] (HBASE-4331) Bypassing default actions in prePut fails sometimes with HTable client

2011-09-09 Thread Gary Helmling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling resolved HBASE-4331.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to trunk.  Thanks for the patch Lars.

 Bypassing default actions in prePut fails sometimes with HTable client
 --

 Key: HBASE-4331
 URL: https://issues.apache.org/jira/browse/HBASE-4331
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4331-v2.txt, 4331-v3.txt, 4331-v4.txt, 4331.txt


 While testing some other scenario I found calling 
 CoprocessorEnvironment.bypass() fails if all trailing puts in a batch are 
 bypassed that way. By extension a single bypassed put will also fail.
 The problem is that the puts are removed from the batch in a way that does 
 not align them with the result-status, and in addition the result is never 
 marked as success.
 A possible fix is to just mark bypassed puts as SUCCESS and filter them in 
 the following logic.
 (I also contemplated a new BYPASSED OperationStatusCode, but that turned out 
 to be not necessary).
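
A minimal sketch of the proposed fix, in plain Java and under stated assumptions: `BypassedPutSketch`, its `Status` enum and `applyBatch` are hypothetical stand-ins (the real code uses OperationStatusCode and region internals), puts are plain strings, and the set of bypassed indexes stands in for whatever the prePut hook decided. The point is that puts stay in place, so statuses remain aligned by index, and bypassed entries are marked as success and then skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "mark bypassed puts as SUCCESS and filter them later".
final class BypassedPutSketch {

    enum Status { NOT_RUN, SUCCESS }

    /**
     * Applies a batch of puts to the store. Puts whose index is in
     * {@code bypassed} skip the default action but are reported as SUCCESS.
     * The returned array has one status per put, in the original order.
     */
    static Status[] applyBatch(List<String> puts, Set<Integer> bypassed, List<String> store) {
        Status[] statuses = new Status[puts.size()];
        Arrays.fill(statuses, Status.NOT_RUN);

        // Pass 1: nothing is removed from the batch; bypassed puts are only marked.
        for (int i = 0; i < puts.size(); i++) {
            if (bypassed.contains(i)) {
                statuses[i] = Status.SUCCESS;
            }
        }

        // Pass 2: the default action runs only for puts that are still NOT_RUN.
        for (int i = 0; i < puts.size(); i++) {
            if (statuses[i] == Status.NOT_RUN) {
                store.add(puts.get(i));
                statuses[i] = Status.SUCCESS;
            }
        }
        return statuses;
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        Set<Integer> bypassed = new HashSet<>(Arrays.asList(1, 2)); // all trailing puts bypassed
        Status[] statuses = applyBatch(Arrays.asList("a", "b", "c"), bypassed, store);
        System.out.println(Arrays.toString(statuses) + " stored=" + store);
    }
}
```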


[jira] [Created] (HBASE-4365) Add a decent heuristic for region size

Add a decent heuristic for region size
--

 Key: HBASE-4365
 URL: https://issues.apache.org/jira/browse/HBASE-4365
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Todd Lipcon


A few of us were brainstorming this morning about what the default region size 
should be. There were a few general points made:
- in some ways it's better to be too-large than too-small, since you can always 
split a table further, but you can't merge regions currently
- with HFile v2 and multithreaded compactions there are fewer reasons to avoid 
very-large regions (10GB+)
- for small tables you may want a small region size just so you can distribute 
load better across a cluster
- for big tables, multi-GB is probably best


[jira] [Commented] (HBASE-4209) The HBase hbase-daemon.sh SIGKILLs master when stopping it

2011-09-09 Thread Roman Shaposhnik (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101674#comment-13101674
]

Roman Shaposhnik commented on HBASE-4209:
-

stack, I'm sorry for putting this on the back burner, but at least I now have a
better understanding of what's going on.

Basically I got confused by a situation where suppressHdfsShutdownHook would be
called multiple times on the same filesystem object. The first call would
succeed, but all the others would fail. This is, obviously, just a problem
with my patch, not the minihdfs cluster. I'll cook up an alternative and, once I
have run the tests, will attach an updated version.

P.S. Thanks for the encouragement!

The HBase hbase-daemon.sh SIGKILLs master when stopping it
--

Key: HBASE-4209
URL: https://issues.apache.org/jira/browse/HBASE-4209
Project: HBase
Issue Type: Bug
Components: master
Reporter: Roman Shaposhnik

There's a bit of code in hbase-daemon.sh that causes the HBase master to be
SIGKILLed when it is stopped, rather than trying SIGTERM first (as it does for
other daemons). When HBase runs in standalone mode (where the only daemon
you need to run is the master), that causes newly created tables to go missing
because unflushed data is thrown out. If there is no good reason to kill the master
with SIGKILL, perhaps we can take that special case out and rely on SIGTERM.


[jira] [Created] (HBASE-4366) dynamic metrics logging

dynamic metrics logging
---

 Key: HBASE-4366
 URL: https://issues.apache.org/jira/browse/HBASE-4366
 Project: HBase
  Issue Type: New Feature
  Components: metrics
Reporter: Ming Ma
Assignee: Ming Ma


First, if there is an existing solution for this, I will close this jira. I also 
realize we already have various overlapping solutions, so creating another 
one isn't necessarily the best approach. However, I couldn't find anything 
that meets the need, so I am opening this jira for discussion.

We have some scenarios in hbase/mapreduce/hdfs that require logging a large 
number of dynamic metrics. They can be used for troubleshooting, better 
measurement of the system, and scorecards. For example:
 
1. HBase. Get metrics such as requests per second that are specific to a table or 
column family.
2. MapReduce job history analysis. We would like to find out all the job ids that 
were submitted, completed, etc. in a specific time window.

For troubleshooting, what people usually do today is 1) use the current machine-level 
metrics to find out which machine has the issue, then 2) go to that machine and 
analyze the local log.



The characteristics of this kind of metrics:
 
1. It isn't something that can be predefined. The key, such as a table name or job id, 
is dynamic.
2. The number of such metrics could be much larger than what the current metrics 
framework can handle.
3. We don't have a scenario that requires near-real-time query support; e.g., the delay 
from the time a metric is generated to the time it is available to query can 
be about an hour.
4. How the data is consumed is highly application specific.

Some ideas:

1. Provide some interface for any application to log data.
2. The metrics can be written to log files. The log files or log entries will 
be loaded into HBase or HDFS asynchronously. That could go to a separate cluster.
3. To consume such data, an application could run a MapReduce job on the log files 
for aggregation, or do random reads directly from HBase.
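
To make the three ideas concrete, here is one possible shape in plain Java. All names (`DynamicMetricsSketch`, `MetricsLog`, `LineLog`, `sumByKeyAndMetric`) are hypothetical, the "log file" is an in-memory list of tab-separated lines, and the aggregation is a local loop standing in for a MapReduce job. It assumes keys and metric names contain no tab characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a dynamic metrics logging interface.
final class DynamicMetricsSketch {

    /** Idea 1: an interface any application can log to; keys are not predefined. */
    interface MetricsLog {
        void log(String key, String metric, long value);
    }

    /** Idea 2: metrics are appended as log lines (here kept in memory). */
    static final class LineLog implements MetricsLog {
        final List<String> lines = new ArrayList<>();

        @Override
        public void log(String key, String metric, long value) {
            lines.add(key + "\t" + metric + "\t" + value);
        }
    }

    /** Idea 3: an offline pass over the lines that sums values per key and metric. */
    static Map<String, Long> sumByKeyAndMetric(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length != 3) {
                continue; // skip malformed lines
            }
            totals.merge(parts[0] + "/" + parts[1], Long.parseLong(parts[2]), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        LineLog log = new LineLog();
        log.log("table=peeps", "requests", 3);
        log.log("table=peeps", "requests", 4);
        log.log("job=job_1", "completed", 1);
        System.out.println(sumByKeyAndMetric(log.lines));
    }
}
```

Keeping the write path to a plain append is what lets the loading and querying happen later and elsewhere, which matches the "about an hour" latency budget in the characteristics list.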


Comments?


[jira] [Updated] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut


 [ 
https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4347:
-

Attachment: 4347-v3.txt

New patch. Passes all tests.
(That is, the few tests that fail locally fail with or without the patch.)


 Remove duplicated code from Put, Delete, Get, Scan, MultiPut
 

 Key: HBASE-4347
 URL: https://issues.apache.org/jira/browse/HBASE-4347
 Project: HBase
  Issue Type: Improvement
  Components: util
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Priority: Minor
 Fix For: 0.92.0

 Attachments: 4347-v2.txt, 4347-v3.txt, 4347.txt


 This came from discussion with Stack w.r.t. HBASE-2195.
 There is currently a lot of duplicated code especially between Put and 
 Delete, and also between all Operations.
 For example all of Put/Delete/Get/Scan have attributes with exactly the same 
 code in all classes.
 Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc.
 One way to do this is to introduce OperationWithAttributes which extends 
 Operation, and have Put/Delete/Get/Scan extend that rather than Operation.
 In addition, Put and Delete could extend Mutation (which itself would 
 extend OperationWithAttributes).
 If a static inheritance hierarchy is not desired here, we can use delegation.
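
The proposed hierarchy can be sketched as follows. The class names come from the description above, but the bodies are illustrative assumptions only (string rows, a simple attribute map, a `describe()` method) and are wrapped in a hypothetical outer class, `OperationHierarchySketch`, so the sketch stays self-contained. This is not the code in the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative skeleton of the proposed class hierarchy; bodies are assumptions.
final class OperationHierarchySketch {

    static abstract class Operation {
        abstract String describe();
    }

    /** Attribute handling lives here once instead of in every operation class. */
    static abstract class OperationWithAttributes extends Operation {
        private final Map<String, byte[]> attributes = new HashMap<>();

        void setAttribute(String name, byte[] value) {
            attributes.put(name, value);
        }

        byte[] getAttribute(String name) {
            return attributes.get(name);
        }
    }

    /** State shared by Put and Delete (row and timestamp here; familyMap etc. omitted). */
    static abstract class Mutation extends OperationWithAttributes {
        final String row;
        final long timestamp;

        Mutation(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    static final class Put extends Mutation {
        Put(String row, long timestamp) { super(row, timestamp); }
        @Override String describe() { return "Put " + row; }
    }

    static final class Delete extends Mutation {
        Delete(String row, long timestamp) { super(row, timestamp); }
        @Override String describe() { return "Delete " + row; }
    }

    static final class Get extends OperationWithAttributes {
        final String row;
        Get(String row) { this.row = row; }
        @Override String describe() { return "Get " + row; }
    }

    static final class Scan extends OperationWithAttributes {
        @Override String describe() { return "Scan"; }
    }

    public static void main(String[] args) {
        Put put = new Put("row1", 1L);
        put.setAttribute("source", new byte[] {1});
        System.out.println(put.describe() + " attribute bytes=" + put.getAttribute("source").length);
    }
}
```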


[jira] [Commented] (HBASE-4362) SITE: Center logo


[ 
https://issues.apache.org/jira/browse/HBASE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101728#comment-13101728
 ] 

Hudson commented on HBASE-4362:
---

Integrated in HBase-TRUNK #2195 (See 
[https://builds.apache.org/job/HBase-TRUNK/2195/])
HBASE-4362 Center logo

stack : 
Files : 
* /hbase/trunk/src/site/resources/css/site.css
* /hbase/trunk/src/site/site.vm


 SITE: Center logo
 -

 Key: HBASE-4362
 URL: https://issues.apache.org/jira/browse/HBASE-4362
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Attachments: site.txt





[jira] [Assigned] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut


 [ 
https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reassigned HBASE-4347:


Assignee: Lars Hofhansl

 Remove duplicated code from Put, Delete, Get, Scan, MultiPut
 

 Key: HBASE-4347
 URL: https://issues.apache.org/jira/browse/HBASE-4347
 Project: HBase
  Issue Type: Improvement
  Components: util
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.92.0

 Attachments: 4347-v2.txt, 4347-v3.txt, 4347.txt


 This came from discussion with Stack w.r.t. HBASE-2195.
 There is currently a lot of duplicated code especially between Put and 
 Delete, and also between all Operations.
 For example all of Put/Delete/Get/Scan have attributes with exactly the same 
 code in all classes.
 Put and Delete also have the familyMap, Row, Rowlock, Timestamp, etc.
 One way to do this is to introduce OperationWithAttributes which extends 
 Operation, and have Put/Delete/Get/Scan extend that rather than Operation.
 In addition, Put and Delete could extend Mutation (which itself would 
 extend OperationWithAttributes).
 If a static inheritance hierarchy is not desired here, we can use delegation.


[jira] [Commented] (HBASE-4340) Hbase can't balance.


[ 
https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101815#comment-13101815
 ] 

Ted Yu commented on HBASE-4340:
---

Can you prepare a patch for TRUNK as well?
I think the 0.90 branch and TRUNK should be kept in sync.

 Hbase can't balance.
 

 Key: HBASE-4340
 URL: https://issues.apache.org/jira/browse/HBASE-4340
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.5

 Attachments: HBASE-4340_branch90.patch


 Version: 0.90.4
 Cluster : 40 boxes
 As the logs below show, the balancer couldn't run because of a dead RS.
 I dug deeper and found two issues:
 1.   The shutdown handler didn't clear numProcessing when it hit some 
 exceptions. It seems that, whatever the exception, we should clear the flag or close 
 the master.
 2.   dead regionserver(s): [158-1-130-12,20020,1314971097929] is 
 inaccurate. The dead server should be 158-1-130-10,20020,1315068597979.
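
The first issue is the classic case for try/finally. The sketch below is a hypothetical stand-in in plain Java, not the HBASE-4340 patch: `ShutdownCounterSketch`, `process` and `balancerMayRun` are made-up names, and only `numProcessing` is taken from the description. It shows the counter being cleared even when the handler throws, so a later balancer check is not blocked indefinitely.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: always undo the "processing dead server" bookkeeping.
final class ShutdownCounterSketch {

    private final AtomicInteger numProcessing = new AtomicInteger();

    /** Runs a shutdown handler; the counter is decremented whatever happens. */
    void process(Runnable handler) {
        numProcessing.incrementAndGet();
        try {
            handler.run();
        } finally {
            numProcessing.decrementAndGet();
        }
    }

    /** The balancer is allowed to run only when no dead server is being processed. */
    boolean balancerMayRun() {
        return numProcessing.get() == 0;
    }

    public static void main(String[] args) {
        ShutdownCounterSketch master = new ShutdownCounterSketch();
        try {
            master.process(() -> {
                throw new IllegalStateException("simulated handler failure");
            });
        } catch (IllegalStateException expected) {
            // The handler failed, but the finally block still cleared the counter.
        }
        System.out.println("balancer may run: " + master.balancerMayRun());
    }
}
```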
 //master logs:
 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:18:00,543 DEBUG

[jira] [Updated] (HBASE-4340) Hbase can't balance if ServerShutdownHandler encountered exception


 [ 
https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4340:
--

Comment: was deleted

(was: Can you prepare patch for TRUNK as well ?
I think 0.90 branch and TRUNK should be kept in sync.)

 Hbase can't balance if ServerShutdownHandler encountered exception
 --

 Key: HBASE-4340
 URL: https://issues.apache.org/jira/browse/HBASE-4340
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.5

 Attachments: HBASE-4340_branch90.patch



[jira] [Updated] (HBASE-4330) Fix races in slab cache

2011-09-09 Thread Li Pi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Pi updated HBASE-4330:
-

Attachment: hbase-4330v6.txt

Fixed race condition leading to the test failure.

 Fix races in slab cache
 ---

 Key: HBASE-4330
 URL: https://issues.apache.org/jira/browse/HBASE-4330
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Li Pi
 Fix For: 0.92.0

 Attachments: hbase-4330.txt, hbase-4330.txt, hbase-4330v3.txt, 
 hbase-4330v4.txt, hbase-4330v5.txt, hbase-4330v6.txt


 A few races are still lingering in the slab cache. Here are some tests and 
 proposed fixes.


[jira] [Updated] (HBASE-4340) Hbase can't balance if ServerShutdownHandler encountered exception


 [ 
https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4340:
--

Summary: Hbase can't balance if ServerShutdownHandler encountered exception 
 (was: Hbase can't balance.)

 Hbase can't balance if ServerShutdownHandler encountered exception
 --

 Key: HBASE-4340
 URL: https://issues.apache.org/jira/browse/HBASE-4340
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.5

 Attachments: HBASE-4340_branch90.patch



[jira] [Updated] (HBASE-4340) Hbase can't balance if ServerShutdownHandler encountered exception


 [ 
https://issues.apache.org/jira/browse/HBASE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4340:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

 Hbase can't balance if ServerShutdownHandler encountered exception
 --

 Key: HBASE-4340
 URL: https://issues.apache.org/jira/browse/HBASE-4340
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.5

 Attachments: HBASE-4340_branch90.patch


 Version: 0.90.4
 Cluster : 40 boxes
 As the logs below show, the balancer couldn't run because of a dead RS.
 I dug deeper and found two issues:
 1.   The shutdown handler didn't clear numProcessing when it hit some 
 exceptions. It seems that, whatever the exception, we should clear the flag or close 
 the master.
 2.   dead regionserver(s): [158-1-130-12,20020,1314971097929] is 
 inaccurate. The dead server should be 158-1-130-10,20020,1315068597979.
 //master logs:
 2011-09-05 00:28:00,487 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:33:00,489 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:38:00,493 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:43:00,495 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:48:00,499 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:53:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 00:58:00,501 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:03:00,502 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:08:00,506 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:13:00,508 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:18:00,512 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:23:00,514 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:28:00,518 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:33:00,520 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:38:00,524 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:43:00,526 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:48:00,530 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:53:00,532 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 01:58:00,536 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:03:00,537 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:08:00,538 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s): 
 [158-1-130-12,20020,1314971097929]
 2011-09-05 02:13:00,539 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because processing dead regionserver(s):

[jira] [Commented] (HBASE-4330) Fix races in slab cache