[jira] [Issue Comment Edited] (HBASE-4374) Up default regions size from 256M to 1G

2011-09-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103397#comment-13103397
 ] 

Andrew Purtell edited comment on HBASE-4374 at 9/13/11 6:25 AM:


If we can get online schema edits into 0.92, at least metadata like HTD/HCD 
attributes, then users can easily change the region size split threshold as 
tables grow. Can add a ruby script helper for this purpose in bin or shell 
support. Does not take away from talking up presplitting but provides a good 
alternative if presplitting is not an option for whatever reason (e.g. keyspace 
distribution not well known). 

  was (Author: apurtell):
If we can get online schema edits into 0.92, at least metadata like HTD/HCD 
attributes, then users can easily change the split points as tables grow. Can 
add a ruby script helper for this purpose in bin or shell support. Does not 
take away from talking up presplitting but provides a good alternative if 
presplitting is not an option for whatever reason (e.g. keyspace distribution 
not well known). 
  
 Up default regions size from 256M to 1G
 ---

 Key: HBASE-4374
 URL: https://issues.apache.org/jira/browse/HBASE-4374
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Blocker
 Fix For: 0.92.0


 HBASE-4365 has some discussion of why our default for a table should tend 
 toward fewer, bigger regions.  It doesn't look like that issue will be done 
 for 0.92.  For 0.92, let's up the default region size from 256M to 1G and 
 talk up pre-splitting on table creation in the manual.
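
For a rough sense of what the larger default means, the sketch below estimates how many regions a table would occupy at each split threshold. The 1 TB table size and the `RegionCountEstimate` class are illustrative assumptions, not anything from HBase; the estimate also ignores compression, uneven key distribution, and the fact that regions split at roughly the threshold rather than exactly at it.

```java
final class RegionCountEstimate {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** Ceiling division: regions needed to hold tableBytes at the given max region size. */
    public static long regionsFor(long tableBytes, long maxRegionBytes) {
        if (tableBytes < 0 || maxRegionBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (tableBytes + maxRegionBytes - 1) / maxRegionBytes;
    }

    public static void main(String[] args) {
        long table = 1024L * GB; // hypothetical 1 TB table
        System.out.println("256M threshold: " + regionsFor(table, 256 * MB) + " regions");
        System.out.println("1G threshold:   " + regionsFor(table, GB) + " regions");
    }
}
```

Under these assumptions the same table needs a quarter as many regions at 1G as at 256M, which is the "fewer, bigger regions" effect the issue is after.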

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4374) Up default regions size from 256M to 1G

2011-09-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103397#comment-13103397
 ] 

Andrew Purtell commented on HBASE-4374:
---

If we can get online schema edits into 0.92, at least metadata like HTD/HCD 
attributes, then users can easily change the split points as tables grow. Can 
add a ruby script helper for this purpose in bin or shell support. Does not 
take away from talking up presplitting but provides a good alternative if 
presplitting is not an option for whatever reason (e.g. keyspace distribution 
not well known). 





[jira] [Updated] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent

2011-09-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4238:
-

Attachment: 4238-v2.txt

 CatalogJanitor can clear a daughter that split before processing its parent
 ---

 Key: HBASE-4238
 URL: https://issues.apache.org/jira/browse/HBASE-4238
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: stack
Priority: Critical
 Fix For: 0.92.0, 0.90.5

 Attachments: 4238-v2.txt, 4238.txt


 I didn't dig a lot into this issue, but by splitting a table twice in a row I 
 was able to trigger a situation where a daughter of the first split was 
 deleted by the CatalogJanitor before it processed its parent. Will post log 
 in a comment.





[jira] [Updated] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent

2011-09-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4238:
-

Status: Patch Available  (was: Open)

Submitting patch.  Review J-D?





[jira] [Commented] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent

2011-09-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103404#comment-13103404
 ] 

jirapos...@reviews.apache.org commented on HBASE-4238:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1819/
---

Review request for hbase.


Summary
---

Previously, we would not clean up a parent if its daughter region didn't exist 
in the fs.  This stipulation was added by HBASE-3872.  This patch removes that 
barrier to parent cleanup (see HBASE-3872 for why it's OK to do this).

CatalogJanitor

+ Break out the Comparator used by CatalogJanitor.  It was an anonymous class.  
Instead we make it a static inner class so we can add a test that it's actually 
sorting properly.
+ Added method hasNoReferences that returns true if there is no daughter dir OR 
there are no refs in the daughter dir.
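
The hasNoReferences rule described above can be sketched on a local filesystem as follows. This is not the patch's code: the `DaughterReferenceCheck` class, the use of `java.nio.file` instead of the Hadoop FileSystem API, and the rule that a reference is any regular file whose name contains a '.' are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class DaughterReferenceCheck {
    /** Assumption for this sketch only: a reference is a regular file with a '.' in its name. */
    static boolean isReference(Path file) {
        return Files.isRegularFile(file) && file.getFileName().toString().contains(".");
    }

    /** True if the daughter directory is missing OR contains no reference files. */
    public static boolean hasNoReferences(Path daughterDir) {
        if (!Files.isDirectory(daughterDir)) {
            return true; // no daughter dir at all
        }
        try (Stream<Path> paths = Files.walk(daughterDir)) {
            return paths.noneMatch(DaughterReferenceCheck::isReference);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a throwaway layout and reports the missing, empty, and referencing cases. */
    public static String demo() {
        try {
            Path root = Files.createTempDirectory("janitor-sketch");
            Path missing = root.resolve("missing-daughter");
            Path empty = root.resolve("empty-daughter");
            Files.createDirectories(empty.resolve("family"));
            Path withRef = root.resolve("ref-daughter");
            Files.createDirectories(withRef.resolve("family"));
            Files.createFile(withRef.resolve("family").resolve("hfile.parent"));
            return hasNoReferences(missing) + "," + hasNoReferences(empty)
                + "," + hasNoReferences(withRef);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the third case, a daughter that still holds a reference, should block cleanup of the parent.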

Added some TODOs around SplitTransaction -- vaguely related to this patch.

Added new Test that checks cleanParent to ensure it works properly.  Refactored 
bits of previous tests so they use common code.


This addresses bug hbase-4238.
https://issues.apache.org/jira/browse/hbase-4238


Diffs
-

  src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java b53e9a0 
  src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 742aea4 
  src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java abafe5e 
  src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 78e7d62 

Diff: https://reviews.apache.org/r/1819/diff


Testing
---


Thanks,

Michael







[jira] [Commented] (HBASE-4347) Remove duplicated code from Put, Delete, Get, Scan, MultiPut

2011-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103411#comment-13103411
 ] 

Hudson commented on HBASE-4347:
---

Integrated in HBase-TRUNK #2203 (See 
[https://builds.apache.org/job/HBase-TRUNK/2203/])
HBASE-4347 addendum that moves CLUSTER_ID_ATTR to Mutation

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Mutation.java


 Remove duplicated code from Put, Delete, Get, Scan, MultiPut
 

 Key: HBASE-4347
 URL: https://issues.apache.org/jira/browse/HBASE-4347
 Project: HBase
  Issue Type: Improvement
  Components: util
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.92.0

 Attachments: 4347-addendum.txt, 4347-v2.txt, 4347-v3.txt, 4347.txt


 This came from discussion with Stack w.r.t. HBASE-2195.
 There is currently a lot of duplicated code, especially between Put and 
 Delete, and also between all Operations.
 For example, all of Put/Delete/Get/Scan have attributes with exactly the same 
 code in all classes.
 Put and Delete also share the familyMap, Row, Rowlock, Timestamp, etc.
 One way to fix this is to introduce OperationWithAttributes, which extends 
 Operation, and have Put/Delete/Get/Scan extend that rather than Operation.
 In addition, Put and Delete could extend Mutation (which itself would 
 extend OperationWithAttributes).
 If a static inheritance hierarchy is not desired here, we can use delegation.
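
A minimal sketch of the proposed hierarchy, to make the shape concrete. The class names follow the description above, but the fields, constructors, and method bodies are simplified stand-ins and are not the code that was committed; Scan is omitted because it would look like Get.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the existing base class of all client operations. */
abstract class Operation {
    abstract String describe();
}

/** Holds the attribute map that Put/Delete/Get/Scan each used to duplicate. */
abstract class OperationWithAttributes extends Operation {
    private final Map<String, byte[]> attributes = new HashMap<>();

    public void setAttribute(String name, byte[] value) {
        if (value == null) {
            attributes.remove(name); // a null value clears the attribute
        } else {
            attributes.put(name, value);
        }
    }

    public byte[] getAttribute(String name) {
        return attributes.get(name);
    }

    public Map<String, byte[]> getAttributesMap() {
        return Collections.unmodifiableMap(attributes);
    }
}

/** State shared by Put and Delete only (here just row and timestamp). */
abstract class Mutation extends OperationWithAttributes {
    private final byte[] row;
    private final long timestamp;

    Mutation(byte[] row, long timestamp) {
        this.row = row;
        this.timestamp = timestamp;
    }

    public byte[] getRow() { return row; }
    public long getTimeStamp() { return timestamp; }
}

final class Put extends Mutation {
    Put(byte[] row, long timestamp) { super(row, timestamp); }
    @Override String describe() { return "Put"; }
}

final class Delete extends Mutation {
    Delete(byte[] row, long timestamp) { super(row, timestamp); }
    @Override String describe() { return "Delete"; }
}

final class Get extends OperationWithAttributes {
    private final byte[] row;
    Get(byte[] row) { this.row = row; }
    public byte[] getRow() { return row; }
    @Override String describe() { return "Get"; }
}
```

With delegation instead of inheritance, Put and Delete would each hold an attribute-holder object and forward to it, at the cost of some forwarding methods.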





[jira] [Commented] (HBASE-4373) HBaseAdmin.assign() doesnot use force flag

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103425#comment-13103425
 ] 

ramkrishna.s.vasudevan commented on HBASE-4373:
---

@Stack
Except in AssignmentManager.handleHBCK(RegionTransitionData data), where
for the M_ZK_REGION_OFFLINE case we call
assign(regionInfo, false) with force=false, all other places use force=true.

Also, in the HBaseAdmin.assign() API, allowing the user to pass force=false 
may not give the expected result, because the user may not know what state 
the znode is currently in.  So I felt like removing the parameter.  
Please provide your suggestions.

 HBaseAdmin.assign() doesnot use force flag
 --

 Key: HBASE-4373
 URL: https://issues.apache.org/jira/browse/HBASE-4373
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor

 The HBaseAdmin.assign()
 {code}
 public void assign(final byte [] regionName, final boolean force)
 throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
   getMaster().assign(regionName, force);
 }
 {code}
 In the HMaster we call 
 {code}
 Pair<HRegionInfo, ServerName> pair =
   MetaReader.getRegion(this.catalogTracker, regionName);
 if (pair == null) throw new UnknownRegionException(Bytes.toString(regionName));
 if (cpHost != null) {
   if (cpHost.preAssign(pair.getFirst(), force)) {
     return;
   }
 }
 // force reaches the coprocessor hooks but is not passed to assignRegion
 assignRegion(pair.getFirst());
 if (cpHost != null) {
   cpHost.postAssign(pair.getFirst(), force);
 }
 {code}
 The force flag is not getting used.  Maybe we need to update the javadoc, or 
 not provide the force flag as a parameter if we are not going to use it.





[jira] [Created] (HBASE-4383) SlabCache reports negative heap sizes

2011-09-13 Thread Todd Lipcon (JIRA)
SlabCache reports negative heap sizes
-

 Key: HBASE-4383
 URL: https://issues.apache.org/jira/browse/HBASE-4383
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Li Pi
 Fix For: 0.92.0


2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Request Stats
2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: 0 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=0sec
2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 137625: 0 occupied, out of a capacity of 29647 blocks. HeapSize is -202.1m bytes., churnTime=0sec
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Current heap size is: -1000.7m
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Successfully Cached Stats
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: 0 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=0sec
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 137625: 0 occupied, out of a capacity of 29647 blocks. HeapSize is -202.1m bytes., churnTime=0sec
2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Current heap size is: -1000.7m






[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

2011-09-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103459#comment-13103459
 ] 

Ted Yu commented on HBASE-4367:
---

bq. but I can't tell you what time 12349582034 is 
Read as milliseconds since the epoch, it is Sat May 23 1970 15:26:22 GMT-0700 (PDT).
I use http://www.ruddwire.com/handy-code/date-to-millisecond-calculators/ quite 
often.
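
The same conversion can be done locally with the JDK's java.time classes (Java 8 or later, so newer than the JVMs in use when this thread was written). The `EpochMillis` class name is made up for this sketch, and the value is assumed to be milliseconds since the Unix epoch.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class EpochMillis {
    /** ISO-8601 rendering of the instant in UTC. */
    public static String toUtc(long millis) {
        return Instant.ofEpochMilli(millis).toString();
    }

    /** ISO-8601 rendering at a fixed UTC offset given in whole hours. */
    public static String atOffset(long millis, int offsetHours) {
        return Instant.ofEpochMilli(millis)
            .atOffset(ZoneOffset.ofHours(offsetHours))
            .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        long millis = 12349582034L;
        System.out.println(toUtc(millis));        // 1970-05-23T22:26:22.034Z
        System.out.println(atOffset(millis, -7)); // same instant at GMT-0700
    }
}
```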

 Deadlock in MemStore flusher due to JDK internally synchronizing on current 
 thread
 --

 Key: HBASE-4367
 URL: https://issues.apache.org/jira/browse/HBASE-4367
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.4
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4367.txt, hbase-4367.txt


 We observed a deadlock in production between the following threads:
 - IPC handler thread holding the monitor lock on MemStoreFlusher inside 
 reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant 
 lock member)
 - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then 
 calls PriorityCompactionQueue.add, which calls 
 PriorityCompactionQueue.addToRegionsInQueue, which calls 
 CompactionRequest.toString(), which calls Date.toString. If this occurs just 
 after a GC under memory pressure, Date.toString needs to reload locale 
 information (stored in a soft reference), so it calls 
 ResourceBundle.loadBundle, which uses Thread.currentThread() as a 
 synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). 
 Since the current thread is the MemStoreFlusher itself, we have a lock order 
 inversion and a deadlock.





[jira] [Created] (HBASE-4384) Hard to tell what causes failure in ZKAssign#createNodeClosing

2011-09-13 Thread Harsh J (JIRA)
Hard to tell what causes failure in ZKAssign#createNodeClosing
--

 Key: HBASE-4384
 URL: https://issues.apache.org/jira/browse/HBASE-4384
 Project: HBase
  Issue Type: Task
  Components: zookeeper
Affects Versions: 0.90.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.94.0


The current code goes like:

{code}
  public static int createNodeClosing(ZooKeeperWatcher zkw, HRegionInfo region,
      String serverName)
  throws KeeperException, KeeperException.NodeExistsException {
    LOG.debug(zkw.prefix("Creating unassigned node for " +
      region.getEncodedName() + " in a CLOSING state"));

    RegionTransitionData data = new RegionTransitionData(
        EventType.RS_ZK_REGION_CLOSING, region.getRegionName(), serverName);

    synchronized (zkw.getNodes()) {
      String node = getNodeName(zkw, region.getEncodedName());
      zkw.getNodes().add(node);
      return ZKUtil.createAndWatch(zkw, node, data.getBytes());
    }
  }
{code}

Both WARN cases would be identical this way. In case of an exception, I think 
an exception ought to be logged as well.





[jira] [Updated] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion

2011-09-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HBASE-4384:
---

Description: 
The current code goes like:

{code}
  /**
   * Get the node's current version
   * @return The expectedVersion.  If -1, we failed getting the node
   */
  private int getCurrentVersion() {
    int expectedVersion = FAILED;
    try {
      if ((expectedVersion = ZKAssign.getVersion(
          server.getZooKeeper(), regionInfo)) == FAILED) {
        LOG.warn("Error getting node's version in CLOSING state," +
          " aborting close of " + regionInfo.getRegionNameAsString());
      }
    } catch (KeeperException e) {
      LOG.warn("Error creating node in CLOSING state, aborting close of " +
        regionInfo.getRegionNameAsString());
    }
    return expectedVersion;
  }
}
{code}

Both WARN cases look nearly identical this way, so it is hard to tell which 
one fired.  In case of an exception, I think the exception ought to be logged 
as well.
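
One way to make the two failure paths distinguishable is sketched below. It is not the attached patch: `java.util.logging` is swapped in for HBase's logging so the example is self-contained, and `VersionLookupLogging` and its `VersionSource` interface are hypothetical stand-ins for the ZooKeeper lookup.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class VersionLookupLogging {
    static final int FAILED = -1;
    private static final Logger LOG =
        Logger.getLogger(VersionLookupLogging.class.getName());

    /** Hypothetical stand-in for the version lookup; it may throw. */
    interface VersionSource {
        int getVersion() throws Exception;
    }

    public static int getCurrentVersion(VersionSource source, String regionName) {
        try {
            int version = source.getVersion();
            if (version == FAILED) {
                // Path 1: the lookup completed but reported failure.
                LOG.warning("Version lookup returned FAILED, aborting close of "
                    + regionName);
            }
            return version;
        } catch (Exception e) {
            // Path 2: the lookup threw. Passing e puts the cause and stack
            // trace in the log, and the message differs from path 1.
            LOG.log(Level.WARNING,
                "Exception during version lookup, aborting close of " + regionName, e);
            return FAILED;
        }
    }
}
```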

Summary: Hard to tell what causes failure in 
CloseRegionHandler#getCurrentVersion  (was: Hard to tell what causes failure in 
ZKAssign#createNodeClosing)

(Updated topic comment/desc.)





[jira] [Updated] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion

2011-09-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HBASE-4384:
---

Attachment: HBASE-4384.r1.diff





[jira] [Updated] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion

2011-09-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HBASE-4384:
---

Status: Patch Available  (was: Reopened)





[jira] [Assigned] (HBASE-4060) Making region assignment more robust

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4060:
-

Assignee: ramkrishna.s.vasudevan

 Making region assignment more robust
 

 Key: HBASE-4060
 URL: https://issues.apache.org/jira/browse/HBASE-4060
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0


 From Eran Kutner:
 My concern is that the region allocation process seems to rely too much on
 timing considerations and doesn't seem to take enough measures to guarantee
 conflicts do not occur. I understand that in a distributed environment, when
 you don't get a timely response from a remote machine you can't know for
 sure if it did or did not receive the request, however there are things that
 can be done to mitigate this and reduce the conflict time significantly. For
 example, when I run hbck it knows that some regions are multiply assigned,
 the master could do the same and try to resolve the conflict. Another
 approach would be to handle late responses, even if the response from the
 remote machine arrives after it was assumed to be dead the master should
 have enough information to know it had created a conflict by assigning the
 region to another server. An even better solution, I think, is for the RS to
 periodically test that it is indeed the rightful owner of every region it
 holds and relinquish control over the region if it's not.
 Obviously a state where two RSs hold the same region is pathological and can
 lead to data loss, as demonstrated in my case. The system should be able to
 actively protect itself against such a scenario. It probably doesn't need
 saying but there is really nothing worse for a data storage system than data
 loss.
 In my case the problem didn't happen in the initial phase but after
 disabling and enabling a table with about 12K regions.
 For more background information, see 'Errors after major compaction' 
 discussion on u...@hbase.apache.org





[jira] [Commented] (HBASE-4352) Apply version of hbase-4015 to branch

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103615#comment-13103615
 ] 

ramkrishna.s.vasudevan commented on HBASE-4352:
---

@Stack
As part of this, the HBASE-4083 fix also needs to be applied to 0.90.x.  The 
HBASE-4083 fix has been checked into trunk.  If you remember, you had said 
that once rolling restart is tested we can take it to 0.90.x.

 Apply version of hbase-4015 to branch
 -

 Key: HBASE-4352
 URL: https://issues.apache.org/jira/browse/HBASE-4352
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.5


 Consider adding a version of hbase-4015 to 0.90.  It changes HRegionInterface, 
 so we would need to move the change to the end of the interface and then test 
 that it doesn't break rolling restart.





[jira] [Created] (HBASE-4385) Use CacheBuilder in place of MapMaker

2011-09-13 Thread Ted Yu (JIRA)
Use CacheBuilder in place of MapMaker
-

 Key: HBASE-4385
 URL: https://issues.apache.org/jira/browse/HBASE-4385
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu


Guava release 10 introduced CacheBuilder.
We should use it in place of MapMaker which is used for caching.





[jira] [Updated] (HBASE-4321) Add more comprehensive region split calculator

2011-09-13 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4321:
--

Component/s: hbck

 Add more comprehensive region split calculator
 --

 Key: HBASE-4321
 URL: https://issues.apache.org/jira/browse/HBASE-4321
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.4
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.92.0, 0.90.5

 Attachments: 
 0001-HBASE-4321-Add-more-comprehensive-region-split-calcu.patch, 
 0001-HBASE-4321-Add-more-comprehensive-region-split-calcu.patch, 
 hbase-4321.diff, hbase-4321.txt


 Hbck currently scans through meta one entry at a time, only keeping a 
 reference to the previous meta entry.  This is insufficient for capturing all 
 the possible problems in meta and needs something more to properly identify 
 holes, overlaps, duplicate start keys, and otherwise invalid meta entries.
 Ideally, this calculator could also be used online to interrogate an existing 
 meta (HBASE-4058), and offline to generate a completely new meta just from 
 the regioninfo in hdfs (HBASE-3505). 
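
The kind of whole-table check described above can be sketched as a sort followed by a single sweep. This is an illustration, not the RegionSplitCalculator from the patch: the `RegionChainCheck` class is hypothetical, keys are plain non-empty Strings rather than byte arrays, and HBase's empty start/end keys for the table boundaries are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

final class RegionChainCheck {
    /** A region covering the half-open key range [start, end). */
    static final class Region {
        final String start;
        final String end;
        Region(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Reports invalid regions, holes, overlaps, and duplicate start keys. */
    public static List<String> findProblems(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> {
            int c = a.start.compareTo(b.start);
            return c != 0 ? c : a.end.compareTo(b.end);
        });
        List<String> problems = new ArrayList<>();
        String coveredTo = null; // furthest end key seen so far
        Region prev = null;      // previous valid region in sort order
        for (Region r : sorted) {
            if (r.start.compareTo(r.end) >= 0) {
                problems.add("invalid region [" + r.start + "," + r.end + ")");
                continue;
            }
            if (prev != null) {
                boolean duplicate = r.start.equals(prev.start);
                int cmp = r.start.compareTo(coveredTo);
                if (duplicate) {
                    problems.add("duplicate start key " + r.start);
                } else if (cmp > 0) {
                    problems.add("hole [" + coveredTo + "," + r.start + ")");
                } else if (cmp < 0) {
                    problems.add("overlap at " + r.start);
                }
            }
            if (coveredTo == null || r.end.compareTo(coveredTo) > 0) {
                coveredTo = r.end;
            }
            prev = r;
        }
        return problems;
    }
}
```

Tracking the furthest end key seen, rather than only the previous entry, is what lets the sweep catch an overlap caused by a region several entries back.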





[jira] [Updated] (HBASE-4322) [hbck] Update checkIntegrity/checkRegionChain to present more accurate region split problem summary

2011-09-13 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4322:
--

Component/s: hbck

 [hbck] Update checkIntegrity/checkRegionChain to present more accurate region 
 split problem summary
 ---

 Key: HBASE-4322
 URL: https://issues.apache.org/jira/browse/HBASE-4322
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.4, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4322-hbck-Update-checkIntegrity-checkRegionCha.patch, 
 0001-HBASE-4322-hbck-Update-checkIntegrity-checkRegionCha.patch


 This is a mostly semantics-preserving upgrade to hbck.  It uses the 
 RegionSplitCalculator from HBASE-4321 to provide more in-depth information 
 about region split problems in meta when running hbck.





[jira] [Updated] (HBASE-4058) Extend TestHBaseFsck with a complete .META. recovery scenario

2011-09-13 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4058:
--

Component/s: hbck

 Extend TestHBaseFsck with a complete .META. recovery scenario
 -

 Key: HBASE-4058
 URL: https://issues.apache.org/jira/browse/HBASE-4058
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Andrew Purtell
Assignee: stack
 Fix For: 0.94.0


 We should have a unit test that launches a minicluster and constructs a few 
 tables, then deletes META files on disk, then bounces the master, then 
 recovers the result with HBCK. Perhaps it is possible to extend TestHBaseFsck 
 to do this.





[jira] [Updated] (HBASE-3505) hbck should be able to fix case where region is missing from META but on FS

2011-09-13 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-3505:
--

Component/s: hbck

 hbck should be able to fix case where region is missing from META but on FS
 ---

 Key: HBASE-3505
 URL: https://issues.apache.org/jira/browse/HBASE-3505
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hbase-3505.txt








[jira] [Updated] (HBASE-3887) Add region deletion tool

2011-09-13 Thread Daniel Einspanjer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Einspanjer updated HBASE-3887:
-

Attachment: online_delete.rb

This script deletes regions on a live table. The only thing it doesn't 
currently do is split regions that are partially within the start / end key 
range.  I would love it if someone could take a crack at putting that 
enhancement in this or writing a separate script to do it.

Tested with a few test clusters in various conditions and then used this script 
to delete thousands of old regions from our large production table.

 Add region deletion tool
 

 Key: HBASE-3887
 URL: https://issues.apache.org/jira/browse/HBASE-3887
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Ophir Cohen
Priority: Minor
 Attachments: online_delete.rb


 A region deletion tool can be very useful for removing large amounts of data.
 For example, it can be used to remove all data older than a specific date 
 (assuming your data is sorted by date).
 This tool should work roughly as follows:
 Input: region key or (even better!) start and end key.
 1. Split region to isolate the keys.
 2. Disable the relevant regions.
 3. Delete files from the file system.
 4. Update .META. table.
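
The input handling and step 1 above can be sketched as follows. This is a minimal, self-contained illustration, not the attached online_delete.rb script and not an HBase API: the `Region` class, the `classify` method, and the use of plain string row keys are hypothetical, and an empty end key is assumed to mean "end of table". A region that lies fully inside the requested range can be deleted as is; a region that only partially overlaps it would need a split first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: decide which regions can be deleted outright and
// which would first need a split for a requested [startKey, endKey) range.
class RegionDeletePlanSketch {

    // A region covers [start, end); an empty end key means "end of table".
    static final class Region {
        final String start;
        final String end;
        Region(String start, String end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    enum Action { DELETE, SPLIT_FIRST, KEEP }

    static Action classify(Region r, String startKey, String endKey) {
        boolean rangeUnbounded = endKey.isEmpty();
        boolean regionUnbounded = r.end.isEmpty();
        // No overlap: the region ends at or before the range start,
        // or starts at or after the range end.
        boolean endsBefore = !regionUnbounded && r.end.compareTo(startKey) <= 0;
        boolean startsAfter = !rangeUnbounded && r.start.compareTo(endKey) >= 0;
        if (endsBefore || startsAfter) {
            return Action.KEEP;
        }
        // Fully contained regions can be removed without splitting.
        boolean startInside = r.start.compareTo(startKey) >= 0;
        boolean endInside = rangeUnbounded
            || (!regionUnbounded && r.end.compareTo(endKey) <= 0);
        return (startInside && endInside) ? Action.DELETE : Action.SPLIT_FIRST;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("", "b"), new Region("b", "d"),
            new Region("d", "f"), new Region("f", ""));
        for (Region r : regions) {
            System.out.println(r + " -> " + classify(r, "c", "f"));
        }
    }
}
```

For the range [c, f) in this example, only the region [d,f) is classified as deletable, while [b,d) straddles the start key and is flagged for a split.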





[jira] [Updated] (HBASE-4322) [hbck] Update checkIntegrity/checkRegionChain to present more accurate region split problem summary

2011-09-13 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4322:
--

Attachment: hbase-4322-0.90.patch

Attached a 0.90 compatible patch.  

 [hbck] Update checkIntegrity/checkRegionChain to present more accurate region 
 split problem summary
 ---

 Key: HBASE-4322
 URL: https://issues.apache.org/jira/browse/HBASE-4322
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.4, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4322-hbck-Update-checkIntegrity-checkRegionCha.patch, 
 0001-HBASE-4322-hbck-Update-checkIntegrity-checkRegionCha.patch, 
 hbase-4322-0.90.patch


 This is a mostly semantics-preserving upgrade to hbck that uses the 
 RegionSplitCalculator from HBASE-4321 to provide more in-depth information 
 about region split problems in meta when running hbck.





[jira] [Updated] (HBASE-4379) [hbck] Does not complain about tables with no end region [Z,]

2011-09-13 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4379:
--

Component/s: hbck

 [hbck] Does not complain about tables with no end region [Z,]
 -

 Key: HBASE-4379
 URL: https://issues.apache.org/jira/browse/HBASE-4379
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jonathan Hsieh

 hbck does not detect or have an error condition when the last region of a 
 table is missing (end key != '').
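
The missing check can be illustrated with a small sketch. This is only an assumption about what such a check might look like, not hbck code: the class name, the method `hasEndRegion`, and the representation of regions as `{startKey, endKey}` string pairs are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the missing check: a table's region chain should
// end with a region whose end key is empty.
class LastRegionCheckSketch {

    // Each region is a {startKey, endKey} pair; "" as end key means "end of table".
    static boolean hasEndRegion(List<String[]> regions) {
        for (String[] region : regions) {
            if (region[1].isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> truncated = Arrays.asList(
            new String[] {"", "A"}, new String[] {"A", "Z"});
        List<String[]> complete = Arrays.asList(
            new String[] {"", "A"}, new String[] {"A", "Z"}, new String[] {"Z", ""});
        System.out.println("truncated table has end region: " + hasEndRegion(truncated));
        System.out.println("complete table has end region: " + hasEndRegion(complete));
    }
}
```

A table whose regions stop at [A,Z], with no [Z,] region, is the case the issue title describes.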





[jira] [Updated] (HBASE-4375) [hbck] Add region coverage visualization to hbck

2011-09-13 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4375:
--

Status: Patch Available  (was: Open)

 [hbck] Add region coverage visualization to hbck
 

 Key: HBASE-4375
 URL: https://issues.apache.org/jira/browse/HBASE-4375
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch


 After HBASE-4322 and HBASE-4321, we now have an accurate region splits / 
 coverage map for properly identifying holes, overlaps, backwards regions and 
 other kinds of problems in the .META. table.  hbck should display this 
 information so that someone can fix this.
 A simple version for a table with regions [,A], [A,B], [A,C], [C,] would 
 dump out something like this (showing an overlap in [A,B]):
 :  ['table,,..', 'table,A,..']
 A: ['table,A,..', 'B'] ['table,A,..', 'C']
 B: ['table,A,..', 'C']  
 C: ['table,C', '']
 null:
 My first thought is that '-details' should dump the full region map including 
 all good and bad regions.  Without -details, any errors should dump info with 
 some context -- dump one region before problems, problem regions, and then 
 one post problem region.
 Alternately we could add a new option or options to dump the region split map.
 What is the preferred way to toggle display of this information in hbck?
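
The coverage map described above can be sketched in a few lines. This is a simplified illustration and not the RegionSplitCalculator from HBASE-4321: the class name, the `coverage` method, and the `{startKey, endKey}` string pairs are hypothetical, and the sketch assumes there are no backwards regions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a region coverage map: for every split point, list
// the regions covering the key range that starts at that point.
class RegionCoverageSketch {

    // Regions are {startKey, endKey} pairs covering [start, end); an empty
    // end key means "end of table". Assumes non-empty end keys sort after
    // their start keys (no backwards regions).
    static Map<String, List<String>> coverage(String[][] regions) {
        TreeMap<String, List<String>> map = new TreeMap<>();
        // Every start key and every non-empty end key is a split point.
        for (String[] r : regions) {
            map.putIfAbsent(r[0], new ArrayList<>());
            if (!r[1].isEmpty()) {
                map.putIfAbsent(r[1], new ArrayList<>());
            }
        }
        // Attach each region to every split point that falls inside it.
        for (String[] r : regions) {
            for (Map.Entry<String, List<String>> e : map.entrySet()) {
                String point = e.getKey();
                boolean afterStart = point.compareTo(r[0]) >= 0;
                boolean beforeEnd = r[1].isEmpty() || point.compareTo(r[1]) < 0;
                if (afterStart && beforeEnd) {
                    e.getValue().add("[" + r[0] + "," + r[1] + "]");
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[][] regions = { {"", "A"}, {"A", "B"}, {"A", "C"}, {"C", ""} };
        for (Map.Entry<String, List<String>> e : coverage(regions).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

In this representation a split point with more than one region indicates an overlap, and a split point with an empty list indicates a hole. For the example table, the entries at A (two regions) and B (the tail of [A,C]) correspond to the overlap shown in the issue.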





[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck

2011-09-13 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103701#comment-13103701
 ] 

Jonathan Hsieh commented on HBASE-4375:
---

Patch applies on 0.90 and trunk after HBASE-4322.

 [hbck] Add region coverage visualization to hbck
 

 Key: HBASE-4375
 URL: https://issues.apache.org/jira/browse/HBASE-4375
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch






[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck

2011-09-13 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103702#comment-13103702
 ] 

Jonathan Hsieh commented on HBASE-4375:
---

Implemented the simplest behavior: if details mode is on, hbck dumps all 
region split ranges.

 [hbck] Add region coverage visualization to hbck
 

 Key: HBASE-4375
 URL: https://issues.apache.org/jira/browse/HBASE-4375
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch






[jira] [Created] (HBASE-4386) NPE in TaskMonitor

2011-09-13 Thread Todd Lipcon (JIRA)
NPE in TaskMonitor
--

 Key: HBASE-4386
 URL: https://issues.apache.org/jira/browse/HBASE-4386
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0


Saw the following hitting /rs-status
INTERNAL_SERVER_ERROR
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.monitoring.TaskMonitor.purgeExpiredTasks(TaskMonitor.java:97)
    at org.apache.hadoop.hbase.monitoring.TaskMonitor.getTasks(TaskMonitor.java:127)
    at org.apache.hbase.tmpl.common.TaskMonitorTmplImpl.renderNoFlush(TaskMonitorTmplImpl.java:50)
    at org.apache.hbase.tmpl.common.TaskMonitorTmpl.renderNoFlush(TaskMonitorTmpl.java:170)
    at org.apache.hbase.tmpl.regionserver.RSStatusTmplImpl.renderNoFlush(RSStatusTmplImpl.java:70)
    at org.apache.hbase.tmpl.regionserver.RSStatusTmpl.renderNoFlush(RSStatusTmpl.java:176)
    at org.apache.hbase.tmpl.regionserver.RSStatusTmpl.render(RSStatusTmpl.java:167)
    at org.apache.hadoop.hbase.regionserver.RSStatusServlet.doGet(RSStatusServlet.java:48)






[jira] [Commented] (HBASE-4386) NPE in TaskMonitor

2011-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103711#comment-13103711
 ] 

Todd Lipcon commented on HBASE-4386:


I think the issue is that items are added to the {{tasks}} list without 
synchronization. So the ArrayList can get into an inconsistent state where 
iterating it returns null.
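
If that diagnosis is right, one way to avoid it is to guard every access to the list with the same lock. The sketch below is a generic illustration of that idea, not the actual TaskMonitor code or the eventual patch: the class, the `Task` type, and the `concurrentAddCount` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: adds, purges and reads of a shared ArrayList all go
// through the same monitor lock, so iteration never sees a half-updated list.
class SynchronizedTaskListSketch {

    static final class Task {
        final long expiresAtMillis;
        Task(long expiresAtMillis) { this.expiresAtMillis = expiresAtMillis; }
    }

    private final List<Task> tasks = new ArrayList<>();

    synchronized void add(Task task) {
        tasks.add(task);
    }

    // Purges expired tasks, then returns a snapshot copy for rendering.
    synchronized List<Task> getTasks(long nowMillis) {
        for (Iterator<Task> it = tasks.iterator(); it.hasNext();) {
            if (it.next().expiresAtMillis <= nowMillis) {
                it.remove();
            }
        }
        return new ArrayList<>(tasks);
    }

    // Adds tasks from several threads at once and reports how many survived.
    static int concurrentAddCount(int threads, int perThread) {
        final SynchronizedTaskListSketch monitor = new SynchronizedTaskListSketch();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    monitor.add(new Task(Long.MAX_VALUE));
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return monitor.getTasks(0L).size();
    }

    public static void main(String[] args) {
        System.out.println("tasks after concurrent adds: " + concurrentAddCount(4, 1000));
    }
}
```

Returning a copy from `getTasks` means callers such as a status page can iterate the result without holding the lock.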

 NPE in TaskMonitor
 --

 Key: HBASE-4386
 URL: https://issues.apache.org/jira/browse/HBASE-4386
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0






[jira] [Assigned] (HBASE-4386) NPE in TaskMonitor

2011-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HBASE-4386:
--

Assignee: Todd Lipcon

 NPE in TaskMonitor
 --

 Key: HBASE-4386
 URL: https://issues.apache.org/jira/browse/HBASE-4386
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0






[jira] [Created] (HBASE-4387) Error while syncing: DFSOutputStream is closed

2011-09-13 Thread Todd Lipcon (JIRA)
Error while syncing: DFSOutputStream is closed
--

 Key: HBASE-4387
 URL: https://issues.apache.org/jira/browse/HBASE-4387
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0


In a billion-row load on ~25 servers, I see "error while syncing" reasonably 
often, with the error "DFSOutputStream is closed", around a log roll. We have 
some race where a roll at the same time as heavy inserts causes a problem.





[jira] [Updated] (HBASE-4387) Error while syncing: DFSOutputStream is closed

2011-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-4387:
---

Attachment: errors-with-context.txt

Here are the logs with 100 lines of context around all the ERROR lines

 Error while syncing: DFSOutputStream is closed
 --

 Key: HBASE-4387
 URL: https://issues.apache.org/jira/browse/HBASE-4387
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0

 Attachments: errors-with-context.txt






[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4351:
--

Status: Open  (was: Patch Available)

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4351.patch, HBASE-4351_1.patch


 The following is the problem:
 Get the exact region name from the UI and call
 HBaseAdmin.unassign(regionname, true).
 Here true is the forceful option.
 As part of the unassign API
 {code}
   public void unassign(final byte [] regionName, final boolean force)
   throws IOException {
     Pair<HRegionInfo, HServerAddress> pair =
       MetaReader.getRegion(this.catalogTracker, regionName);
     if (pair == null) throw new 
       UnknownRegionException(Bytes.toStringBinary(regionName));
     HRegionInfo hri = pair.getFirst();
     if (force) this.assignmentManager.clearRegionFromTransition(hri);
     this.assignmentManager.unassign(hri, force);
   }
 {code}
 As part of clearRegionFromTransition()
 {code}
     synchronized (this.regions) {
       this.regions.remove(hri);
       for (Set<HRegionInfo> regions : this.servers.values()) {
         regions.remove(hri);
       }
     }
 {code}
 the region is also removed.  Hence when the master tries to identify the 
 region
 {code}
   if (!regions.containsKey(region)) {
     debugLog(region, "Attempted to unassign region " +
       region.getRegionNameAsString() + " but it is not " +
       "currently assigned anywhere");
     return;
   }
 {code}
 It is not able to identify the region.  The problem exists in trunk and 0.90.x as well.
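
The ordering problem described above can be reproduced with a toy model. This is a simplified stand-in for the master's bookkeeping, not HBase code: the class, its method bodies, and the string region and server names are hypothetical, and only the order of the two calls mirrors the snippets in the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the reported behavior: the forced path clears the
// region from the assignment map before unassign() looks it up.
class ForceUnassignSketch {

    // region name -> server name
    private final Map<String, String> regions = new HashMap<>();

    void assign(String region, String server) {
        regions.put(region, server);
    }

    // As described in the issue, this also drops the region's assignment.
    void clearRegionFromTransition(String region) {
        regions.remove(region);
    }

    // Returns false when the region is "not currently assigned anywhere".
    boolean unassign(String region) {
        if (!regions.containsKey(region)) {
            return false;
        }
        regions.remove(region);
        return true;
    }

    // Same call order as the forced branch of the unassign API.
    boolean forceUnassign(String region) {
        clearRegionFromTransition(region);
        return unassign(region);
    }

    public static void main(String[] args) {
        ForceUnassignSketch master = new ForceUnassignSketch();
        master.assign("region-1", "server-a");
        System.out.println("forced unassign found region: " + master.forceUnassign("region-1"));
        master.assign("region-2", "server-a");
        System.out.println("plain unassign found region: " + master.unassign("region-2"));
    }
}
```

In this model the forced call never finds the region, because the lookup runs after the entry has already been removed, while the plain call succeeds.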





[jira] [Updated] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4153:
--

Status: Patch Available  (was: Open)

 Handle RegionAlreadyInTransitionException in AssignmentManager
 --

 Key: HBASE-4153
 URL: https://issues.apache.org/jira/browse/HBASE-4153
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4153_1.patch


 Comment from Stack over in HBASE-3741:
 {quote}
 Question: Looking at this patch again, if we throw a 
 RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
 though RegionAlreadyInTransitionException in at least one case here is saying 
 that the region is already open on this regionserver?
 {quote}
 Indeed looking at the code it's going to be handled the same way other 
 exceptions are. Need to add special cases for assign and unassign.
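
One possible shape for such a special case is sketched below. It is an assumption about the general idea only, not the attached patch: the exception class here is a local stand-in for the HBase class of the same name, and the `Outcome` values, the `OpenRequest` interface, and the `assign` method are hypothetical.

```java
// Hypothetical sketch: treat "already in transition" differently from other
// failures instead of always picking a new server.
class AssignSpecialCaseSketch {

    // Local stand-in for the HBase exception of the same name.
    static class RegionAlreadyInTransitionException extends Exception {
        RegionAlreadyInTransitionException(String message) { super(message); }
    }

    enum Outcome { OPENED, WAIT_ON_SAME_SERVER, REASSIGN_ELSEWHERE }

    // Stand-in for the RPC that asks a region server to open a region.
    interface OpenRequest {
        void send() throws Exception;
    }

    static Outcome assign(OpenRequest request) {
        try {
            request.send();
            return Outcome.OPENED;
        } catch (RegionAlreadyInTransitionException e) {
            // The server is already working on this region: do not move it.
            return Outcome.WAIT_ON_SAME_SERVER;
        } catch (Exception e) {
            // Any other failure: fall back to choosing another server.
            return Outcome.REASSIGN_ELSEWHERE;
        }
    }

    public static void main(String[] args) {
        System.out.println(assign(() -> { }));
        System.out.println(assign(() -> {
            throw new RegionAlreadyInTransitionException("already opening");
        }));
        System.out.println(assign(() -> {
            throw new java.io.IOException("connection refused");
        }));
    }
}
```

The point of the separate catch branch is that the region is not handed to a second server while the first one may already be opening it.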





[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4351:
--

Attachment: HBASE-4351_1.patch

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4351.patch, HBASE-4351_1.patch






[jira] [Updated] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4153:
--

Attachment: HBASE-4153_1.patch

 Handle RegionAlreadyInTransitionException in AssignmentManager
 --

 Key: HBASE-4153
 URL: https://issues.apache.org/jira/browse/HBASE-4153
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4153_1.patch






[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4351:
--

Status: Patch Available  (was: Open)

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4351.patch, HBASE-4351_1.patch






[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4351:
--

Attachment: HBASE-4351_2.patch

Resubmitting patch with minor changes

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch






[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4351:
--

Status: Open  (was: Patch Available)

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch






[jira] [Resolved] (HBASE-3742) Master receives unexpected region close but doesn't do anything

2011-09-13 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-3742.
---

Resolution: Won't Fix

Resolving as won't fix, lots of rework done for the master in trunk. If there's 
still an issue, it'll probably come up differently.

 Master receives unexpected region close but doesn't do anything
 ---

 Key: HBASE-3742
 URL: https://issues.apache.org/jira/browse/HBASE-3742
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans

 We got this in the context of HBASE-3741, a region was closed by a region 
 server but the master wasn't expecting it and didn't do anything about it. We 
 had to force assign it back.
 {quote}
 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode 
 /prodjobs/unassigned/1470298961 and set watcher; 
 region=stumbles_by_userid2,'穗���6,1266566087256, 
 server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING
 2011-04-05 15:15:55,812 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /prodjobs/unassigned/1470298961 
 (region=stumbles_by_userid2,'穗���6,1266566087256, 
 server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING)
 2011-04-05 15:15:55,812 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=sv4borg42,60020,1300920459477, 
 region=1470298961
 2011-04-05 15:15:55,812 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSING for region 
 1470298961 from server sv4borg42,60020,1300920459477 but region was in  the 
 state null and not in expected PENDING_CLOSE or CLOSING states
 2011-04-05 15:15:55,843 DEBUG 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: 
 master:6-0x42ec2cece810b68 Received ZooKeeper Event, 
 type=NodeDataChanged, state=SyncConnected, 
 path=/prodjobs/unassigned/1470298961
 2011-04-05 15:15:55,843 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode 
 /prodjobs/unassigned/1470298961 and set watcher; 
 region=stumbles_by_userid2,'穗���6,1266566087256, 
 server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSED
 2011-04-05 15:15:55,843 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4borg42,60020,1300920459477, 
 region=1470298961
 2011-04-05 15:15:55,843 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 1470298961 from server sv4borg42,60020,1300920459477 but region was in  the 
 state null and not in expected PENDING_CLOSE or CLOSING states
 {quote}





[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4351:
--

Status: Patch Available  (was: Open)

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch


 The problem is as follows.
 Get the exact region name from the UI and call
 HBaseAdmin.unassign(regionname, true).
 Here true is the force option.
 As part of the unassign API:
 {code}
   public void unassign(final byte [] regionName, final boolean force)
   throws IOException {
     Pair<HRegionInfo, HServerAddress> pair =
       MetaReader.getRegion(this.catalogTracker, regionName);
     if (pair == null) throw new
       UnknownRegionException(Bytes.toStringBinary(regionName));
     HRegionInfo hri = pair.getFirst();
     if (force) this.assignmentManager.clearRegionFromTransition(hri);
     this.assignmentManager.unassign(hri, force);
   }
 {code}
 As part of clearRegionFromTransition():
 {code}
     synchronized (this.regions) {
       this.regions.remove(hri);
       for (Set<HRegionInfo> regions : this.servers.values()) {
         regions.remove(hri);
       }
     }
 {code}
 the region is also removed from this.regions.  Hence when the master then 
 tries to identify the region:
 {code}
     if (!regions.containsKey(region)) {
       debugLog(region, "Attempted to unassign region " +
         region.getRegionNameAsString() + " but it is not " +
         "currently assigned anywhere");
       return;
     }
 {code}
 it is not able to identify the region.  The problem exists in trunk and 0.90.x also.
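The sequence described above can be reproduced in isolation. The following is a minimal sketch, not HBase code: the class ForcedUnassignSketch, its String-keyed map, and the method adminUnassign are hypothetical stand-ins for the master's bookkeeping, kept only to show why the forced path never reaches the actual unassign.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the master's assignment bookkeeping.
class ForcedUnassignSketch {
  // Region name -> hosting server; plays the role of AssignmentManager.regions.
  private final Map<String, String> regions = new HashMap<>();

  void regionOnline(String region, String server) {
    regions.put(region, server);
  }

  // Mirrors the effect quoted above: the region is forgotten entirely.
  void clearRegionFromTransition(String region) {
    regions.remove(region);
  }

  // Mirrors the quoted guard: unknown regions are skipped.
  // Returns true only if an unassign was actually started.
  boolean unassign(String region) {
    if (!regions.containsKey(region)) {
      return false;
    }
    regions.remove(region);
    return true;
  }

  // The call order the issue describes for unassign(regionName, force).
  boolean adminUnassign(String region, boolean force) {
    if (force) {
      clearRegionFromTransition(region);
    }
    return unassign(region);
  }

  public static void main(String[] args) {
    ForcedUnassignSketch master = new ForcedUnassignSketch();
    master.regionOnline("r1", "serverA");
    master.regionOnline("r2", "serverA");
    System.out.println("forced unassign started: " + master.adminUnassign("r1", true));
    System.out.println("plain unassign started:  " + master.adminUnassign("r2", false));
  }
}
```

In this sketch the forced call clears the region before the guard runs, so the guard always sees an unknown region and returns early.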





[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103756#comment-13103756
 ] 

Ted Yu commented on HBASE-4153:
---

Also, the relatively long exception messages for closeRegion() and openRegion() 
can be extracted so that the majority of the message is shared.
The javadoc warning to developers that I mentioned above can be placed on the 
extracted exception message.

 Handle RegionAlreadyInTransitionException in AssignmentManager
 --

 Key: HBASE-4153
 URL: https://issues.apache.org/jira/browse/HBASE-4153
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4153_1.patch


 Comment from Stack over in HBASE-3741:
 {quote}
 Question: Looking at this patch again, if we throw a 
 RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
 though RegionAlreadyInTransitionException in at least one case here is saying 
 that the region is already open on this regionserver?
 {quote}
 Indeed looking at the code it's going to be handled the same way other 
 exceptions are. Need to add special cases for assign and unassign.





[jira] [Commented] (HBASE-4387) Error while syncing: DFSOutputStream is closed

2011-09-13 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103774#comment-13103774
 ] 

Jean-Daniel Cryans commented on HBASE-4387:
---

HLog.syncer() syncs outside of the updateLock and has the following comment:

bq. // Done in parallel for all writer threads, thanks to HDFS-895

So we don't need to synchronize for sync'ing but we do need something when 
closing the file.
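One way to picture "parallel syncs, but exclusive close" is a read/write lock. This is only a sketch under assumptions: the class WalCloseGuardSketch and its methods are hypothetical, it does not touch HLog or DFSOutputStream, and it is not necessarily the approach taken in the eventual patch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical guard around a writer: many threads may sync at once,
// but closing (for example during a roll) waits for in-flight syncs.
class WalCloseGuardSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final AtomicInteger completedSyncs = new AtomicInteger();
  private boolean closed = false; // read under the read lock, written under the write lock

  // Returns false instead of failing when the writer is already closed.
  boolean sync() {
    lock.readLock().lock();
    try {
      if (closed) {
        return false;
      }
      completedSyncs.incrementAndGet(); // stands in for the real sync call
      return true;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Takes the write lock, so it cannot interleave with a running sync().
  void close() {
    lock.writeLock().lock();
    try {
      closed = true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  int syncCount() {
    return completedSyncs.get();
  }

  public static void main(String[] args) {
    WalCloseGuardSketch wal = new WalCloseGuardSketch();
    System.out.println("sync before close: " + wal.sync());
    wal.close();
    System.out.println("sync after close:  " + wal.sync());
  }
}
```

Readers do not block each other, so the sync path stays parallel; only close pays for exclusivity.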

 Error while syncing: DFSOutputStream is closed
 --

 Key: HBASE-4387
 URL: https://issues.apache.org/jira/browse/HBASE-4387
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0

 Attachments: errors-with-context.txt


 In a billion-row load on ~25 servers, I see "error while syncing" reasonably 
 often with the error "DFSOutputStream is closed" around a roll. We have some 
 race where a roll at the same time as heavy inserts causes a problem.





[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103792#comment-13103792
 ] 

stack commented on HBASE-4351:
--

+1 on patch.  J-D?

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch






[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4351:
-

Fix Version/s: 0.90.5
   0.92.0

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch






[jira] [Commented] (HBASE-4320) Off Heap Cache never creates Slabs

2011-09-13 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103794#comment-13103794
 ] 

Jonathan Gray commented on HBASE-4320:
--

Looks like this was committed with HBASE-4027 in the message and not 
HBASE-4320.  Guess there's no way to retroactively fix that but in case anyone 
comes here looking for the revision info it's linked over in the other jira.

 Off Heap Cache never creates Slabs
 --

 Key: HBASE-4320
 URL: https://issues.apache.org/jira/browse/HBASE-4320
 Project: HBase
  Issue Type: Sub-task
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.92.0

 Attachments: confnotloading.txt


 On testing, the configuration file is never loaded by the off heap cache.





[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103802#comment-13103802
 ] 

Jean-Daniel Cryans commented on HBASE-4351:
---

Why not just:

{code}
if (force) {
  this.assignmentManager.clearRegionFromTransition(hri);
  assignRegion(hri);
} else {
  this.assignmentManager.unassign(hri, force);
}
cpPostUnassign(hri, force);
{code}

No return, no double cpPostUnassign call.
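The shape of that suggestion can be checked with stubs. In the sketch below, UnassignFlowSketch and its recording methods are hypothetical placeholders for the master's collaborators; only the method names taken from the snippet above come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the master; it only records the call order.
class UnassignFlowSketch {
  final List<String> calls = new ArrayList<>();

  void clearRegionFromTransition(String hri) { calls.add("clear:" + hri); }
  void assignRegion(String hri) { calls.add("assign:" + hri); }
  void unassign(String hri, boolean force) { calls.add("unassign:" + hri); }
  void cpPostUnassign(String hri, boolean force) { calls.add("post:" + hri); }

  // Same branch structure as the suggestion: one exit, one post hook.
  void handleUnassign(String hri, boolean force) {
    if (force) {
      clearRegionFromTransition(hri);
      assignRegion(hri);
    } else {
      unassign(hri, force);
    }
    cpPostUnassign(hri, force);
  }

  public static void main(String[] args) {
    UnassignFlowSketch forced = new UnassignFlowSketch();
    forced.handleUnassign("r1", true);
    System.out.println(forced.calls);

    UnassignFlowSketch plain = new UnassignFlowSketch();
    plain.handleUnassign("r1", false);
    System.out.println(plain.calls);
  }
}
```

Either branch ends in exactly one post hook call, which is the point of dropping the early return.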

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, HBASE-4351_2.patch






[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103808#comment-13103808
 ] 

stack commented on HBASE-4375:
--

This is grand though it explicitly does System.out.  Elsewhere when hbck 
prints, does it not take a PrintWriter or something?  Do you want to do same 
here?

Good stuff Jon.

 [hbck] Add region coverage visualization to hbck
 

 Key: HBASE-4375
 URL: https://issues.apache.org/jira/browse/HBASE-4375
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch


 After HBASE-4322 and HBASE-4321, we now have an accurate region splits / 
 coverage map for properly identifying holes, overlaps, backwards regions and 
 other kinds of problems in the .META. table.  hbck should display this 
 information so that someone can fix this.
 A simple version for a table with regions [,A], [A,B], [A,C], [C,] would 
 dump out something like this (showing an overlap in [A,B]):
 :  ['table,,..', 'table,A,..']
 A: ['table,A,..', 'B'] ['table,A,..', 'C']
 B: ['table,A,..', 'C']  
 C: ['table,C', '']
 null:
 My first thought is that '-details' should dump the full region map, including 
 all good and bad regions.  Without -details, any errors should dump info with 
 some context: one region before the problem, the problem regions, and then 
 one region after the problem.
 Alternately we could add a new option or options to dump the region split map.
 What is the preferred way to toggle display of this information in hbck?
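To make the coverage map concrete, here is a small sketch that builds it for the example regions above. It is an illustration only: RegionCoverageSketch, Region, and coverage are hypothetical names, keys are plain strings rather than byte arrays, and this is not the hbck implementation from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RegionCoverageSketch {
  // A region as a [start, end) key range; an empty end key means "end of table".
  static final class Region {
    final String start;
    final String end;

    Region(String start, String end) {
      this.start = start;
      this.end = end;
    }

    boolean covers(String key) {
      return start.compareTo(key) <= 0 && (end.isEmpty() || key.compareTo(end) < 0);
    }

    @Override
    public String toString() {
      return "[" + start + "," + end + "]";
    }
  }

  // Maps every split point to the regions covering it.
  // One region per point is healthy, none is a hole, several are an overlap.
  static Map<String, List<Region>> coverage(List<Region> regions) {
    Map<String, List<Region>> map = new TreeMap<>();
    for (Region r : regions) {
      map.putIfAbsent(r.start, new ArrayList<>());
      if (!r.end.isEmpty()) {
        map.putIfAbsent(r.end, new ArrayList<>());
      }
    }
    for (Map.Entry<String, List<Region>> e : map.entrySet()) {
      for (Region r : regions) {
        if (r.covers(e.getKey())) {
          e.getValue().add(r);
        }
      }
    }
    return map;
  }

  public static void main(String[] args) {
    List<Region> regions = new ArrayList<>();
    regions.add(new Region("", "A"));
    regions.add(new Region("A", "B"));
    regions.add(new Region("A", "C"));
    regions.add(new Region("C", ""));
    for (Map.Entry<String, List<Region>> e : coverage(regions).entrySet()) {
      System.out.println(e.getKey() + ": " + e.getValue());
    }
  }
}
```

Any split point whose list does not hold exactly one region is a candidate for the "problem with context" output discussed above.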





[jira] [Commented] (HBASE-4373) HBaseAdmin.assign() doesnot use force flag

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103804#comment-13103804
 ] 

stack commented on HBASE-4373:
--

@Ram OK. Going by your rationale above, we should deprecate the override that 
has the force flag.

 HBaseAdmin.assign() doesnot use force flag
 --

 Key: HBASE-4373
 URL: https://issues.apache.org/jira/browse/HBASE-4373
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor

 The HBaseAdmin.assign()
 {code}
   public void assign(final byte [] regionName, final boolean force)
   throws MasterNotRunningException, ZooKeeperConnectionException, IOException 
 {
 getMaster().assign(regionName, force);
   }
 {code}
 In the HMaster we call 
 {code}
 Pair<HRegionInfo, ServerName> pair =
   MetaReader.getRegion(this.catalogTracker, regionName);
 if (pair == null) throw new 
 UnknownRegionException(Bytes.toString(regionName));
 if (cpHost != null) {
   if (cpHost.preAssign(pair.getFirst(), force)) {
 return;
   }
 }
 assignRegion(pair.getFirst());
 if (cpHost != null) {
   cpHost.postAssign(pair.getFirst(), force);
 }
 {code}
 The force flag is not used.  Maybe we need to update the javadoc, or 
 not provide the force flag as a parameter if we are not going to use it.





[jira] [Commented] (HBASE-4380) large scan caching size causes RS to throw OOME

2011-09-13 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103818#comment-13103818
 ] 

Ming Ma commented on HBASE-4380:


Thanks, Ted. That should work for a more controlled environment, like a predefined 
HBase map job where we know the max number of concurrent scans at a given time 
for a given RS. In the case where any number of clients can call at any given 
time, we will need a better solution.

 large scan caching size causes RS to throw OOME
 ---

 Key: HBASE-4380
 URL: https://issues.apache.org/jira/browse/HBASE-4380
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Ming Ma
Assignee: Ming Ma

 If the hbase application specifies a large caching size via 
 Scan.setCaching(...), the RS will try to accumulate enough rows before returning 
 to the client. This could blow up RS memory. In the TableInputFormat scenario, we 
 have a couple of mappers with a large caching size, so RS memory usage goes up 
 quickly.
 The RS should perhaps take memory usage into account, for example by returning 
 fewer results per HRegionInterface.next(long scannerId, int numberOfRows) call 
 when memory is low.
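One possible reading of "return fewer results when memory is low" is a byte budget per call. The sketch below assumes that interpretation: BoundedScanBatchSketch, its next method, and the maxBytes parameter are hypothetical and not part of HRegionInterface; rows are plain byte arrays.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BoundedScanBatchSketch {
  // Returns at most numberOfRows rows, but stops early once the batch
  // reaches maxBytes. At least one available row is always returned,
  // so the scan keeps making progress even with a tiny budget.
  static List<byte[]> next(Iterator<byte[]> rows, int numberOfRows, long maxBytes) {
    List<byte[]> batch = new ArrayList<>();
    long bytes = 0;
    while (batch.size() < numberOfRows && rows.hasNext()) {
      byte[] row = rows.next();
      batch.add(row);
      bytes += row.length;
      if (bytes >= maxBytes) {
        break;
      }
    }
    return batch;
  }

  public static void main(String[] args) {
    List<byte[]> table = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      table.add(new byte[100]);
    }
    Iterator<byte[]> scanner = table.iterator();
    // The client asked for 1000 rows, but the budget cuts the batch short.
    System.out.println("rows returned: " + next(scanner, 1000, 250).size());
  }
}
```

The client simply calls again for the remainder, so a smaller batch costs extra round trips rather than correctness.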





[jira] [Commented] (HBASE-4380) large scan caching size causes RS to throw OOME

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103820#comment-13103820
 ] 

stack commented on HBASE-4380:
--

Agree Ming.

 large scan caching size causes RS to throw OOME
 ---

 Key: HBASE-4380
 URL: https://issues.apache.org/jira/browse/HBASE-4380
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Ming Ma
Assignee: Ming Ma





[jira] [Commented] (HBASE-2196) Support more than one slave cluster

2011-09-13 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103849#comment-13103849
 ] 

Jean-Daniel Cryans commented on HBASE-2196:
---

bq. Might be nice also if ReplicationSource would handle their own hlogs, 
rather than ReplicationSourceManager managing all of them.

Yeah somewhere along the development the design changed but not all the parts 
moved, feel free to try it out in the scope of a follow-up jira.

bq. @J-D, are you aware of anything specific that would not work with your 
patch (or the combined patch I posted earlier)?

Have you tested it? I think it was basically done but I wanted to do more 
testing on real clusters before committing but it's really time-consuming. It's 
meant to be very simple to add multi-slave, it's just the testing part that I 
didn't want to be bothered with when I first wrote replication since we didn't 
need it back then.

 Support more than one slave cluster
 ---

 Key: HBASE-2196
 URL: https://issues.apache.org/jira/browse/HBASE-2196
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0

 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch


 Currently replication supports only 1 slave cluster; we need the ability to add 
 more.





[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103858#comment-13103858
 ] 

ramkrishna.s.vasudevan commented on HBASE-4153:
---

Throwing an exception when we get RegionAlreadyInTransitionException is fine, 
but there are 2 problems:
- If we try HBaseAdmin.move() or HBaseAdmin.unassign(), the ClosedRegionHandler 
will call assign(), and if RegionAlreadyInTransitionException is thrown in this 
flow we cannot bring the exception up to the user, because EventHandler.run() 
catches it.
So only for HBaseAdmin.assign() can the exception be propagated up to the 
user.

- If we make assign() throw the exception, then we need to handle it in many 
places.

So I have just returned once we get RegionAlreadyInTransitionException.

Another interesting thing observed: currently in 
RegionAlreadyInTransitionException.java
{code}
public RegionAlreadyInTransitionException(String action, String region) {
}
{code}
we were passing 2 args.  In the master, when I had to decode and unwrap this 
exception, I was not able to do so because of
{code}
private IOException instantiateException(Class<? extends IOException> cls)
 throws Exception {
   Constructor<? extends IOException> cn = cls.getConstructor(String.class);
{code}
RemoteException.java expects a constructor with a single String argument.  
Hence I have made one modification: passing the exact exception message from 
the OpenRegionHandler and CloseRegionHandler

and using just
{code}
public RegionAlreadyInTransitionException(String s) {
super(s);
}
{code}
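The constructor lookup can be demonstrated with plain reflection. This sketch does not use Hadoop's RemoteException; RemoteExceptionUnwrapSketch and its two nested exception classes are hypothetical, and the helper returns null on failure instead of throwing, purely to keep the example short.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;

class RemoteExceptionUnwrapSketch {
  // Has the single-String constructor that the lookup expects.
  static class OneArgException extends IOException {
    public OneArgException(String msg) {
      super(msg);
    }
  }

  // Only offers a two-argument constructor, like the old exception class.
  static class TwoArgException extends IOException {
    public TwoArgException(String action, String region) {
      super(action + " " + region);
    }
  }

  // Same lookup as quoted above: getConstructor(String.class).
  // Returns null when the class cannot be rebuilt from a single message.
  static IOException instantiate(Class<? extends IOException> cls, String msg) {
    try {
      Constructor<? extends IOException> cn = cls.getConstructor(String.class);
      cn.setAccessible(true);
      return cn.newInstance(msg);
    } catch (ReflectiveOperationException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println("one-arg: " + instantiate(OneArgException.class, "already in transition"));
    System.out.println("two-arg: " + instantiate(TwoArgException.class, "already in transition"));
  }
}
```

The two-argument class cannot be rebuilt on the receiving side, which is why collapsing the exception to a single message constructor makes it unwrappable.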

 Handle RegionAlreadyInTransitionException in AssignmentManager
 --

 Key: HBASE-4153
 URL: https://issues.apache.org/jira/browse/HBASE-4153
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4153_1.patch






[jira] [Commented] (HBASE-4306) Race between CatalogJanitor and LoadBalancer

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103864#comment-13103864
 ] 

stack commented on HBASE-4306:
--

When a region splits, handleSplitReport is called on master.  It calls 
AM.regionOffline so the split parent region should be cleared from AM.regions.  
It should not be in set to balance.

HBASE-4238 being fixed should at least change this from being a blocker to 
something less?

 Race between CatalogJanitor and LoadBalancer
 

 Key: HBASE-4306
 URL: https://issues.apache.org/jira/browse/HBASE-4306
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.92.0, 0.90.5


 It is possible for the LoadBalancer to try to assign an offline/split region 
 while it is waiting to be CatalogJanitor'ed. It goes like this:
 {quote}
 2011-08-25 00:32:07,137 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: parent: Daughters; d1, d2 from 
 sv4r22s16,60020,1314211225331
 ...
 (cleaning never happens or whatever)
 ...
 2011-08-29 13:45:14,561 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=parent, src=sv4r22s16,60020,1314211225331, 
 dest=sv4r19s17,60020,1314218170402
 2011-08-29 13:45:14,561 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region parent (offlining)
 2011-08-29 13:45:14,588 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server 
 serverName=sv4r22s16,60020,1314211225331, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) returned 
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: Received close for parent 
 but we are not serving it for parent
 {quote}
 Here it took 4 days of balancing to finally get to try to balance the parent 
 (that was never deleted because of HBASE-4238), but it can also happen if the 
 balancer decides to balance the parent just before it's cleaned. The end 
 effect is that the balancer will be disabled _forever_ until that's fixed.
 The culprit here is that the master keeps the region online until 
 AssignmentManager.regionOffline is called by the CJ, which means it's still 
 treated like any other region although it's offline.





[jira] [Commented] (HBASE-4381) Refactor split decisions into a split policy class

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103842#comment-13103842
 ] 

stack commented on HBASE-4381:
--

This looks great.  Commit.  Can do other policies later.

 Refactor split decisions into a split policy class
 --

 Key: HBASE-4381
 URL: https://issues.apache.org/jira/browse/HBASE-4381
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.92.0

 Attachments: hbase-4381.txt


 This is a semantics-preserving refactor that moves the code that decides when 
 and where to split into a new split policy class.





[jira] [Commented] (HBASE-2196) Support more than one slave cluster

2011-09-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103867#comment-13103867
 ] 

Lars Hofhansl commented on HBASE-2196:
--

Thanks Stack and J-D. I started on having ReplicationSource manage their own 
logs. So far it does not actually make the code nicer and easier to read, the 
version I have so far also fails TestReplication. So that's for another jira 
(as you say).

One thing I did was to remove HServerAddress from ReplicationSource in favor of 
using ServerName.
HServerAddress resolves hostnames right away, which is good in this case, but 
as HConnectionManager caches the connection anyway, that should not be a 
problem.

I'll add more tests and also do real world testing, and then send an update.


 Support more than one slave cluster
 ---

 Key: HBASE-2196
 URL: https://issues.apache.org/jira/browse/HBASE-2196
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0

 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch






[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103869#comment-13103869
 ] 

Lars Hofhansl commented on HBASE-3130:
--

This seems like an important bug fix. Can we put this into 0.92 even after we 
branched it?

 [replication] ReplicationSource can't recover from session expired on remote 
 clusters
 -

 Key: HBASE-3130
 URL: https://issues.apache.org/jira/browse/HBASE-3130
 Project: HBase
  Issue Type: Bug
  Components: replication
Reporter: Jean-Daniel Cryans

 Currently ReplicationSource cannot recover when its zookeeper connection to 
 its remote cluster expires. HLogs are still being tracked, but a cluster 
 restart is required to continue replication (or a rolling restart).





[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4351:
--

Attachment: HBASE-4351_3.patch

J-D's comment updated in patch

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, 
 HBASE-4351_2.patch, HBASE-4351_3.patch






[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4351:
--

Status: Open  (was: Patch Available)

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, 
 HBASE-4351_2.patch, HBASE-4351_3.patch


 The problem is as follows.
 Get the exact region name from the UI and call
 HBaseAdmin.unassign(regionname, true).
 Here true is the forceful option.
 As part of the unassign API:
 {code}
   public void unassign(final byte [] regionName, final boolean force)
   throws IOException {
     Pair<HRegionInfo, HServerAddress> pair =
       MetaReader.getRegion(this.catalogTracker, regionName);
     if (pair == null) throw new
       UnknownRegionException(Bytes.toStringBinary(regionName));
     HRegionInfo hri = pair.getFirst();
     if (force) this.assignmentManager.clearRegionFromTransition(hri);
     this.assignmentManager.unassign(hri, force);
   }
 {code}
 As part of clearRegionFromTransition()
 {code}
     synchronized (this.regions) {
       this.regions.remove(hri);
       for (Set<HRegionInfo> regions : this.servers.values()) {
         regions.remove(hri);
       }
     }
 {code}
 the region is also removed.  Hence when the master tries to identify the 
 region
 {code}
   if (!regions.containsKey(region)) {
     debugLog(region, "Attempted to unassign region " +
       region.getRegionNameAsString() + " but it is not " +
       "currently assigned anywhere");
     return;
   }
 {code}
 It is not able to identify the region.  It exists in trunk and 0.90.x also.
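
The sequence described in this issue can be reproduced with a small stand-alone model. This is only a sketch: the class ForcedUnassignSketch, its regions map, and the string region names are hypothetical stand-ins for the master's in-memory assignment state, not HBase code.

```java
import java.util.HashMap;
import java.util.Map;

public class ForcedUnassignSketch {
  // Hypothetical stand-in for the master's region -> server map.
  private final Map<String, String> regions = new HashMap<>();
  // Records whether a close request would have been sent.
  public boolean closeSent = false;

  public void assign(String region, String server) {
    regions.put(region, server);
  }

  // Models clearRegionFromTransition(): drops the region from in-memory state.
  void clearRegionFromTransition(String region) {
    regions.remove(region);
  }

  // Models the guard in unassign(): return early if the region is unknown.
  void unassign(String region) {
    if (!regions.containsKey(region)) {
      return; // the "not currently assigned anywhere" path
    }
    closeSent = true;
  }

  // Models the admin-facing unassign(regionName, force).
  public void adminUnassign(String region, boolean force) {
    if (force) clearRegionFromTransition(region);
    unassign(region);
  }

  public static void main(String[] args) {
    ForcedUnassignSketch forced = new ForcedUnassignSketch();
    forced.assign("r1", "serverA");
    forced.adminUnassign("r1", true);
    System.out.println("force=true  closeSent=" + forced.closeSent);

    ForcedUnassignSketch normal = new ForcedUnassignSketch();
    normal.assign("r1", "serverA");
    normal.adminUnassign("r1", false);
    System.out.println("force=false closeSent=" + normal.closeSent);
  }
}
```

In this model the forced path never reaches the close request, because the region is removed from the map just before the containsKey check runs.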





[jira] [Commented] (HBASE-4306) Race between CatalogJanitor and LoadBalancer

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103873#comment-13103873
 ] 

stack commented on HBASE-4306:
--

Chatting with J-D.  Something else must be going on here: if the parent region 
is in the set of regions to balance, the split message must have been missed.

Changing this from blocker to major.  Removing it as a necessary fix for 0.92 and 
0.90.5 till we learn more.

 Race between CatalogJanitor and LoadBalancer
 

 Key: HBASE-4306
 URL: https://issues.apache.org/jira/browse/HBASE-4306
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Blocker

 It is possible for the LoadBalancer to try to assign an offline/split region 
 while it is waiting to be CatalogJanitor'ed. It goes like this:
 {quote}
 2011-08-25 00:32:07,137 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: parent: Daughters; d1, d2 from 
 sv4r22s16,60020,1314211225331
 ...
 (cleaning never happens or whatever)
 ...
 2011-08-29 13:45:14,561 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=parent, src=sv4r22s16,60020,1314211225331, 
 dest=sv4r19s17,60020,1314218170402
 2011-08-29 13:45:14,561 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region parent (offlining)
 2011-08-29 13:45:14,588 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server 
 serverName=sv4r22s16,60020,1314211225331, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) returned 
 org.apache.hadoop.hbase.NotServingRegionException: 
 org.apache.hadoop.hbase.NotServingRegionException: Received close for parent 
 but we are not serving it for parent
 {quote}
 Here it took 4 days of balancing to finally get to try to balance the parent 
 (that was never deleted because of HBASE-4238), but it can also happen if the 
 balancer decides to balance the parent just before it's cleaned. The end 
 effect is that the balancer will be disabled _forever_ until that's fixed.
 The culprit here is that the master keeps the region online until 
 AssignmentManager.regionOffline is called by the CJ, which means it's still 
 treated like any other region although it's offline.
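
One conceivable guard against the behaviour described in this issue is to drop offline or split parents from the balancer's candidate list. The sketch below only illustrates that idea and is not the fix adopted for this issue; BalanceCandidateFilter and its Region class are hypothetical and merely mimic the offline/split flags a region carries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BalanceCandidateFilter {
  // Hypothetical, minimal stand-in for the region metadata the balancer sees.
  public static final class Region {
    public final String name;
    public final boolean offline;
    public final boolean split;

    public Region(String name, boolean offline, boolean split) {
      this.name = name;
      this.offline = offline;
      this.split = split;
    }
  }

  // Keep only regions that are neither offline nor already split.
  public static List<Region> balanceable(List<Region> all) {
    List<Region> out = new ArrayList<>();
    for (Region r : all) {
      if (!r.offline && !r.split) {
        out.add(r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Region> all = Arrays.asList(
        new Region("parent", true, true),   // split parent awaiting cleanup
        new Region("d1", false, false),
        new Region("d2", false, false));
    for (Region r : balanceable(all)) {
      System.out.println("balance candidate: " + r.name);
    }
  }
}
```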





[jira] [Updated] (HBASE-4306) Race between CatalogJanitor and LoadBalancer

2011-09-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4306:
-

 Priority: Minor  (was: Blocker)
Fix Version/s: (was: 0.90.5)
   (was: 0.92.0)

 Race between CatalogJanitor and LoadBalancer
 

 Key: HBASE-4306
 URL: https://issues.apache.org/jira/browse/HBASE-4306
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Minor





[jira] [Updated] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4351:
--

Status: Patch Available  (was: Open)

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, 
 HBASE-4351_2.patch, HBASE-4351_3.patch






[jira] [Commented] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103832#comment-13103832
 ] 

stack commented on HBASE-4384:
--

@Harsh So the patch is for 0.90 branch?

 Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion
 

 Key: HBASE-4384
 URL: https://issues.apache.org/jira/browse/HBASE-4384
 Project: HBase
  Issue Type: Task
  Components: zookeeper
Affects Versions: 0.90.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4384.r1.diff


 The current code goes like:
 {code}
   /**
    * Get the node's current version
    * @return The expectedVersion.  If -1, we failed getting the node
    */
   private int getCurrentVersion() {
     int expectedVersion = FAILED;
     try {
       if ((expectedVersion = ZKAssign.getVersion(
           server.getZooKeeper(), regionInfo)) == FAILED) {
         LOG.warn("Error getting node's version in CLOSING state," +
           " aborting close of " + regionInfo.getRegionNameAsString());
       }
     } catch (KeeperException e) {
       LOG.warn("Error creating node in CLOSING state, aborting close of " +
         regionInfo.getRegionNameAsString());
     }
     return expectedVersion;
   }
 }
 {code}
 Both WARN cases would be identical this way. In case of an exception, I think 
 an exception ought to be logged as well.
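
A minimal sketch of the suggestion in this issue, assuming nothing about the attached patch: the exception is handed to the logger so that the two failure paths can be told apart. VersionLookupLogging and its VersionSource interface are hypothetical stand-ins for the handler and for ZKAssign.getVersion, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class VersionLookupLogging {
  public static final int FAILED = -1;
  private static final Logger LOG = Logger.getLogger("VersionLookupLogging");

  // Hypothetical stand-in for the ZooKeeper version lookup, which may throw.
  public interface VersionSource {
    int getVersion() throws Exception;
  }

  public static int getCurrentVersion(VersionSource source, String regionName) {
    try {
      int version = source.getVersion();
      if (version == FAILED) {
        LOG.warning("Error getting node's version in CLOSING state,"
            + " aborting close of " + regionName);
      }
      return version;
    } catch (Exception e) {
      // Passing the exception keeps its stack trace in the log and makes this
      // path distinguishable from the FAILED return value above.
      LOG.log(Level.WARNING, "Exception while getting node's version in"
          + " CLOSING state, aborting close of " + regionName, e);
      return FAILED;
    }
  }

  public static void main(String[] args) {
    System.out.println(getCurrentVersion(() -> 7, "regionA"));
    System.out.println(getCurrentVersion(
        () -> { throw new Exception("lookup failed"); }, "regionB"));
  }
}
```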





[jira] [Updated] (HBASE-2196) Support more than one slave cluster

2011-09-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2196:
-

Fix Version/s: 0.92.0

Pulling in.  If done by friday, will commit.

 Support more than one slave cluster
 ---

 Key: HBASE-2196
 URL: https://issues.apache.org/jira/browse/HBASE-2196
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0

 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch


 Currently replication supports only 1 slave cluster; we need the ability to add 
 more.





[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck

2011-09-13 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103837#comment-13103837
 ] 

Jonathan Hsieh commented on HBASE-4375:
---

In most other places in hbck, it prints out via System.out.println.  I just 
tried to stay consistent with it. 

Some examples:

{code}
    public synchronized void reportError(ERROR_CODE errorCode, String message) {
      errorList.add(errorCode);
      if (!summary) {
        System.out.println("ERROR: " + message);
      }
      errorCount++;
      showProgress = 0;
    }
{code}

{code}
    public synchronized int summarize() {
      System.out.println(Integer.toString(errorCount) +
                         " inconsistencies detected.");
      if (errorCount == 0) {
        System.out.println("Status: OK");
        return 0;
      } else {
        System.out.println("Status: INCONSISTENT");
        return -1;
      }
    }
{code}

{code}
  /**
   * Prints summary of all tables found on the system.
   */
  private void printTableSummary() {
    System.out.println("Summary:");
    for (TInfo tInfo : tablesInfo.values()) {
      if (errors.tableHasErrors(tInfo)) {
        System.out.println("Table " + tInfo.getName() + " is inconsistent.");
      } else {
        System.out.println("  " + tInfo.getName() + " is okay.");
      }
      System.out.println("Number of regions: " + tInfo.getNumRegions());
      System.out.print("Deployed on: ");
      for (HServerAddress server : tInfo.deployedOn) {
        System.out.print(" " + server.toString());
      }
      System.out.println();
    }
  }
{code}

 [hbck] Add region coverage visualization to hbck
 

 Key: HBASE-4375
 URL: https://issues.apache.org/jira/browse/HBASE-4375
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch


 After HBASE-4322 and HBASE-4321, we now have an accurate region splits / 
 coverage map for properly identifying holes, overlaps, backwards regions and 
 other kinds of problems in the .META. table.  hbck should display this 
 information so that someone can fix this.
 A simple version for a table with regions [,A], [A,B], [A,C], [C,] would 
 dump out something like this (showing an overlap in [A,B]):
 :  ['table,,..', 'table,A,..']
 A: ['table,A,..', 'B'] ['table,A,..', 'C']
 B: ['table,A,..', 'C']  
 C: ['table,C', '']
 null:
 My first thought is that '-details' should dump the full region map, including 
 all good and bad regions.  Without -details, any errors should dump info with 
 some context: one region before the problems, the problem regions, and then 
 one region after the problems.
 Alternatively, we could add a new option or options to dump the region split map.
 What is the preferred way to toggle display of this information in hbck?
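
A coverage map of the kind described here can be sketched in a few lines. This is an illustration under stated assumptions, not the code in the attached patch: RegionCoverageSketch and its Region class are hypothetical, keys are plain strings, an empty end key means "to the end of the table", and backwards regions are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionCoverageSketch {
  // Hypothetical region covering [start, end); an empty end key is open-ended.
  public static final class Region {
    public final String start;
    public final String end;

    public Region(String start, String end) {
      this.start = start;
      this.end = end;
    }

    boolean covers(String key) {
      return start.compareTo(key) <= 0
          && (end.isEmpty() || key.compareTo(end) < 0);
    }

    @Override
    public String toString() {
      return "[" + start + "," + end + ")";
    }
  }

  // Maps every split point to the regions covering the range that starts there.
  public static TreeMap<String, List<Region>> coverage(List<Region> regions) {
    TreeMap<String, List<Region>> map = new TreeMap<>();
    for (Region r : regions) {
      map.putIfAbsent(r.start, new ArrayList<>());
      if (!r.end.isEmpty()) {
        map.putIfAbsent(r.end, new ArrayList<>());
      }
    }
    for (Map.Entry<String, List<Region>> e : map.entrySet()) {
      for (Region r : regions) {
        if (r.covers(e.getKey())) {
          e.getValue().add(r);
        }
      }
    }
    return map;
  }

  public static void main(String[] args) {
    List<Region> regions = Arrays.asList(
        new Region("", "A"), new Region("A", "B"),
        new Region("A", "C"), new Region("C", ""));
    for (Map.Entry<String, List<Region>> e : coverage(regions).entrySet()) {
      int n = e.getValue().size();
      String flag = n == 0 ? "  <-- hole" : n > 1 ? "  <-- overlap" : "";
      System.out.println(e.getKey() + ": " + e.getValue() + flag);
    }
  }
}
```

A split point with more than one covering region marks an overlap, and one with no covering region marks a hole.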





[jira] [Commented] (HBASE-4380) large scan caching size causes RS to throw OOME

2011-09-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103828#comment-13103828
 ] 

Ted Yu commented on HBASE-4380:
---

Can we utilize the following ?
http://download.oracle.com/javase/1.5.0/docs/guide/management/mxbeans.html#low_memory
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/management/MemoryPoolMXBean.html

 large scan caching size causes RS to throw OOME
 ---

 Key: HBASE-4380
 URL: https://issues.apache.org/jira/browse/HBASE-4380
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Ming Ma
Assignee: Ming Ma

 If the HBase application specifies a large caching size via 
 Scan.setCaching(...), the RS will try to accumulate enough rows before returning 
 to the client. This could blow up RS memory. In the TableInputFormat scenario, we 
 have a couple of mappers with a large caching size, so RS memory usage goes up 
 quickly.
 The RS should perhaps take memory usage into account, for example by returning fewer 
 results per HRegionInterface.next(long scannerId, int numberOfRows) call in 
 the case of low memory.
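
The low-memory idea, combined with the MemoryPoolMXBean links from the comment, could look roughly like the sketch below. It uses only the standard java.lang.management API; the class LowMemoryScanSketch, the 0.9 threshold fraction, and the rowsToReturn policy with its floor parameter are assumptions made for illustration and are not part of HBase.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class LowMemoryScanSketch {
  // Arm a usage threshold on every heap pool that supports one.
  public static void armThresholds(double fraction) {
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
        MemoryUsage usage = pool.getUsage();
        // getMax() is -1 when the pool has no defined maximum.
        if (usage != null && usage.getMax() > 0) {
          pool.setUsageThreshold((long) (usage.getMax() * fraction));
        }
      }
    }
  }

  // True if any armed heap pool currently exceeds its threshold.
  public static boolean lowMemory() {
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() == MemoryType.HEAP
          && pool.isUsageThresholdSupported()
          && pool.getUsageThreshold() > 0
          && pool.isUsageThresholdExceeded()) {
        return true;
      }
    }
    return false;
  }

  // Hypothetical policy: cap the rows served per call while memory is low.
  public static int rowsToReturn(int requested, boolean lowMemory, int floor) {
    return lowMemory ? Math.min(requested, floor) : requested;
  }

  public static void main(String[] args) {
    armThresholds(0.9);
    int rows = rowsToReturn(10000, lowMemory(), 100);
    System.out.println("rows to return this call: " + rows);
  }
}
```

Because the client already loops on next() until the scanner is exhausted, serving a smaller batch in this model only costs extra round trips.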





[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103892#comment-13103892
 ] 

Jean-Daniel Cryans commented on HBASE-4351:
---

If it passes the tests :)

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, 
 HBASE-4351_2.patch, HBASE-4351_3.patch






[jira] [Commented] (HBASE-4351) If from Admin we try to unassign a region forcefully, though a valid region name is given the master is not able to identify the region to unassign.

2011-09-13 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103891#comment-13103891
 ] 

Jean-Daniel Cryans commented on HBASE-4351:
---

+1

 If from Admin we try to unassign a region forcefully, though a valid region 
 name is given the master is not able to identify the region to unassign.
 

 Key: HBASE-4351
 URL: https://issues.apache.org/jira/browse/HBASE-4351
 Project: HBase
  Issue Type: Bug
 Environment: Linux
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4351.patch, HBASE-4351_1.patch, 
 HBASE-4351_2.patch, HBASE-4351_3.patch






[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103898#comment-13103898
 ] 

stack commented on HBASE-4375:
--

ok.  looks like something we need to clean up; do PrintWriter or System.out.  
Can do in another issue.  Let me commit Jon.

 [hbck] Add region coverage visualization to hbck
 

 Key: HBASE-4375
 URL: https://issues.apache.org/jira/browse/HBASE-4375
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch






[jira] [Commented] (HBASE-4375) [hbck] Add region coverage visualization to hbck

2011-09-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103910#comment-13103910
 ] 

stack commented on HBASE-4375:
--

I tried applying and it fails.  Need to wait on other patches to go in first.  
Flag me Jon when this can go in (when we have necessary prereqs applied).  Good 
stuff.

 [hbck] Add region coverage visualization to hbck
 

 Key: HBASE-4375
 URL: https://issues.apache.org/jira/browse/HBASE-4375
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4375-Add-region-coverage-visualization-to-hbck.patch






[jira] [Commented] (HBASE-4383) SlabCache reports negative heap sizes

2011-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103914#comment-13103914
 ] 

Todd Lipcon commented on HBASE-4383:


It's also now reporting negative occupied:

2011-09-13 12:06:18,183 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: -11917 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=7mins, 53sec


 SlabCache reports negative heap sizes
 -

 Key: HBASE-4383
 URL: https://issues.apache.org/jira/browse/HBASE-4383
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Li Pi
 Fix For: 0.92.0


 2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Request Stats
 2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: 0 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=0sec
 2011-09-13 00:36:17,734 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 137625: 0 occupied, out of a capacity of 29647 blocks. HeapSize is -202.1m bytes., churnTime=0sec
 2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Current heap size is: -1000.7m
 2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Successfully Cached Stats
 2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 72089: 0 occupied, out of a capacity of 226398 blocks. HeapSize is -798.5m bytes., churnTime=0sec
 2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SingleSizeCache: For Slab of size 137625: 0 occupied, out of a capacity of 29647 blocks. HeapSize is -202.1m bytes., churnTime=0sec
 2011-09-13 00:36:17,735 INFO org.apache.hadoop.hbase.io.hfile.slab.SlabCache: Current heap size is: -1000.7m





[jira] [Updated] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent

2011-09-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4238:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch (didn't add test to branch because wouldn't apply 
-- uses TRUNK stuff like ServerName)

 CatalogJanitor can clear a daughter that split before processing its parent
 ---

 Key: HBASE-4238
 URL: https://issues.apache.org/jira/browse/HBASE-4238
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: stack
Priority: Critical
 Fix For: 0.92.0, 0.90.5

 Attachments: 4238-v2.txt, 4238.txt


 I didn't dig a lot into this issue, but by splitting a table twice in a row I 
 was able to trigger a situation where a daughter of the first split was 
 deleted by the CatalogJanitor before it processed its parent. Will post log 
 in a comment.





[jira] [Updated] (HBASE-4381) Refactor split decisions into a split policy class

2011-09-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4381:
-

Status: Patch Available  (was: Open)

 Refactor split decisions into a split policy class
 --

 Key: HBASE-4381
 URL: https://issues.apache.org/jira/browse/HBASE-4381
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.92.0

 Attachments: hbase-4381.txt


 This is a semantics-preserving refactor that moves the code that decides when 
 and where to split into a new split policy class.





[jira] [Created] (HBASE-4388) Second start after migration from 90 to trunk crashes

2011-09-13 Thread Todd Lipcon (JIRA)
Second start after migration from 90 to trunk crashes
-

 Key: HBASE-4388
 URL: https://issues.apache.org/jira/browse/HBASE-4388
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.92.0


I started a trunk cluster to upgrade from 90, inserted a ton of data, then did 
a clean shutdown. When I started again, I got the following exception:

11/09/13 12:29:09 INFO master.HMaster: Meta has HRI with HTDs. Updating meta now.
11/09/13 12:29:09 FATAL master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NegativeArraySizeException: -102
        at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:147)
        at org.apache.hadoop.hbase.HTableDescriptor.readFields(HTableDescriptor.java:606)
        at org.apache.hadoop.hbase.migration.HRegionInfo090x.readFields(HRegionInfo090x.java:641)
        at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:133)
        at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:103)
        at org.apache.hadoop.hbase.util.Writables.getHRegionInfoForMigration(Writables.java:228)
        at org.apache.hadoop.hbase.catalog.MetaEditor.getHRegionInfoForMigration(MetaEditor.java:350)
        at org.apache.hadoop.hbase.catalog.MetaEditor$1.visit(MetaEditor.java:273)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:633)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:235)
        at org.apache.hadoop.hbase.catalog.MetaEditor.updateMetaWithNewRegionInfo(MetaEditor.java:284)
        at org.apache.hadoop.hbase.catalog.MetaEditor.migrateRootAndMeta(MetaEditor.java:298)
        at org.apache.hadoop.hbase.master.HMaster.updateMetaWithNewHRI(HMaster.java:529)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:472)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)






[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-09-13 Thread Subbu M Iyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103925#comment-13103925
 ] 

Subbu M Iyer commented on HBASE-4213:
-

Based on further discussions, here are some callouts:

1. Provide separate public APIs for instant alter operations so as not to 
break existing public APIs without deprecation or prior notice.

2. Provide a config-level setting to enable the instant schema update feature 
(defaults to false). This will also enable us to release this feature in a more 
controlled and transparent manner.

3. We don't want to intimidate developers with scary boolean flags
that do things in a way that they may not completely
understand or care about.

4. Providing a developer-level API to use instant alter is good in the
sense that developers can fully capitalize on the scalable/fault-tolerant
variant. At the same time, it might confuse some, who may wonder why we
are even providing a variant that is not scalable/fault tolerant in the
first place.

5. We don't want to expose implementation details such as "this flag
uses ZK to track schema changes" and so on.

So, long story short:

In addition to review comments, V7 will include the following:

1. Add a new config parameter hbase.instant.schema.change.enabled
(exact name of the flag is open), defaulting to false so that all
existing APIs go over the current path untouched.

2. Separate public APIs for all alter operations that support the
new pattern, in addition to the existing public APIs. The new APIs will take
a boolean parameter that overrides the config setting on a per-request
basis.

3. Internally, both sets of APIs will go through the same pipeline to
promote reuse as well as maintainability.

Please let me know your thoughts/comments.
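
The three V7 points can be pictured with a small stand-alone sketch. Only the property name comes from the comment above, and even that name is stated to be open; the class InstantSchemaFlagSketch, its method names, and the use of java.util.Properties in place of the HBase configuration are assumptions made for illustration.

```java
import java.util.Properties;

public class InstantSchemaFlagSketch {
  // Property name as proposed in the comment; the final name was still open.
  public static final String KEY = "hbase.instant.schema.change.enabled";

  // The cluster-wide setting defaults to false.
  static boolean configured(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
  }

  // A null override means "use the cluster-wide setting".
  static boolean useInstantPath(Properties conf, Boolean perRequestOverride) {
    return perRequestOverride != null ? perRequestOverride : configured(conf);
  }

  // Existing-style API: no flag, follows the configuration.
  public static String modifyTable(Properties conf, String table) {
    return alter(table, useInstantPath(conf, null));
  }

  // New-style API: the boolean overrides the configuration for this request.
  public static String modifyTable(Properties conf, String table, boolean instant) {
    return alter(table, useInstantPath(conf, instant));
  }

  // Both variants funnel into one shared pipeline.
  private static String alter(String table, boolean instant) {
    return table + (instant ? ":instant" : ":classic");
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(modifyTable(conf, "t1"));
    System.out.println(modifyTable(conf, "t1", true));
  }
}
```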


 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.92.0

 Attachments: 4213-Instant_Schema_change_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion

2011-09-13 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103926#comment-13103926
 ] 

Harsh J commented on HBASE-4384:


stack, No, patch is for all branches 0.90 to trunk. Please disregard my first 
comment, it was made when I was under a great deal of workspace switching and I 
thought I was looking at one snippet of trunk source, while I was looking at 
something else instead.

This patch is targeted for trunk, but can also be backported atop other 
branches (0.92 if branched already, and 0.90).

 Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion
 

 Key: HBASE-4384
 URL: https://issues.apache.org/jira/browse/HBASE-4384
 Project: HBase
  Issue Type: Task
  Components: zookeeper
Affects Versions: 0.90.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4384.r1.diff


 The current code goes like:
 {code}
   /**
    * Get the node's current version
    * @return The expectedVersion.  If -1, we failed getting the node
    */
   private int getCurrentVersion() {
     int expectedVersion = FAILED;
     try {
       if ((expectedVersion = ZKAssign.getVersion(
           server.getZooKeeper(), regionInfo)) == FAILED) {
         LOG.warn("Error getting node's version in CLOSING state," +
           " aborting close of " + regionInfo.getRegionNameAsString());
       }
     } catch (KeeperException e) {
       LOG.warn("Error creating node in CLOSING state, aborting close of " +
         regionInfo.getRegionNameAsString());
     }
     return expectedVersion;
   }
 }
 {code}
 Both WARN cases would be identical this way. In case of an exception, I think 
 an exception ought to be logged as well.
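
 A rough sketch of the change suggested above, written so it compiles on its own. It is not the attached patch: java.util.logging replaces the HBase logger, and the VersionSource interface is a hypothetical stand-in for the ZKAssign.getVersion(...) call.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class GetCurrentVersionSketch {
  static final int FAILED = -1;
  private static final Logger LOG = Logger.getLogger("GetCurrentVersionSketch");

  /** Hypothetical stand-in for ZKAssign.getVersion(zk, regionInfo). */
  interface VersionSource {
    int getVersion() throws Exception;
  }

  static int getCurrentVersion(VersionSource source, String regionName) {
    int expectedVersion = FAILED;
    try {
      expectedVersion = source.getVersion();
      if (expectedVersion == FAILED) {
        LOG.warning("Error getting node's version in CLOSING state,"
            + " aborting close of " + regionName);
      }
    } catch (Exception e) {
      // Different wording from the branch above, and the exception itself
      // is handed to the logger so the cause ends up in the log.
      LOG.log(Level.WARNING, "Exception while getting node's version in"
          + " CLOSING state, aborting close of " + regionName, e);
    }
    return expectedVersion;
  }
}
```

 With this shape the two WARN lines can be told apart, and the second one carries the stack trace.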





[jira] [Updated] (HBASE-4388) Second start after migration from 90 to trunk crashes

2011-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-4388:
---

Attachment: meta.tgz

Attached the META table directory

 Second start after migration from 90 to trunk crashes
 -

 Key: HBASE-4388
 URL: https://issues.apache.org/jira/browse/HBASE-4388
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.92.0

 Attachments: meta.tgz


 I started a trunk cluster to upgrade from 90, inserted a ton of data, then 
 did a clean shutdown. When I started again, I got the following exception:
 11/09/13 12:29:09 INFO master.HMaster: Meta has HRI with HTDs. Updating meta now.
 11/09/13 12:29:09 FATAL master.HMaster: Unhandled exception. Starting shutdown.
 java.lang.NegativeArraySizeException: -102
     at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:147)
     at org.apache.hadoop.hbase.HTableDescriptor.readFields(HTableDescriptor.java:606)
     at org.apache.hadoop.hbase.migration.HRegionInfo090x.readFields(HRegionInfo090x.java:641)
     at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:133)
     at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:103)
     at org.apache.hadoop.hbase.util.Writables.getHRegionInfoForMigration(Writables.java:228)
     at org.apache.hadoop.hbase.catalog.MetaEditor.getHRegionInfoForMigration(MetaEditor.java:350)
     at org.apache.hadoop.hbase.catalog.MetaEditor$1.visit(MetaEditor.java:273)
     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:633)
     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:235)
     at org.apache.hadoop.hbase.catalog.MetaEditor.updateMetaWithNewRegionInfo(MetaEditor.java:284)
     at org.apache.hadoop.hbase.catalog.MetaEditor.migrateRootAndMeta(MetaEditor.java:298)
     at org.apache.hadoop.hbase.master.HMaster.updateMetaWithNewHRI(HMaster.java:529)
     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:472)
     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)





[jira] [Created] (HBASE-4389) Address lots of issues with migration from 90 to trunk

2011-09-13 Thread Todd Lipcon (JIRA)
Address lots of issues with migration from 90 to trunk
--

 Key: HBASE-4389
 URL: https://issues.apache.org/jira/browse/HBASE-4389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0


Looking over the migration code that removes HTD from HRI, there are lots of 
issues. This JIRA is to redo this code in a way that will be less bug prone, 
and also future proof.





[jira] [Commented] (HBASE-4389) Address lots of issues with migration from 90 to trunk

2011-09-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103938#comment-13103938
 ] 

Todd Lipcon commented on HBASE-4389:


After a quick pass through the migration code, here are the various issues I 
see:
- HRegionInfo didn't have its VERSION incremented. Hence exception catching is 
used to try to determine which version is being read.
- A single migrated boolean flag is used in ROOT to indicate that META has 
been updated to the new format. This leaves us no room for future migrations. 
migrated should not be a boolean. It should instead be migratedToVersion or 
something
- Migration should be idempotent - ie even if the migratedToVersion flag 
didn't get updated, migration should be able to re-run without crashing
- Duplicated code between updateRootWithNewRegionInfo and 
updateMetaWithNewRegionInfo
- Each region that is processed results in a call to createTableDescriptor, 
which results in calls to the NN - this will take a long time on a big cluster, 
and is unnecessary
- No sanity checking that all of the HTDs for a table are equal
- Migration code should ideally be moved to a separate class, instead of mixed 
with the non-migration code paths
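
To illustrate the versioning and idempotency points in the list above, here is a small self-contained sketch. Row, Catalog, CURRENT_VERSION and the field names are all hypothetical stand-ins for the real catalog structures; this is not the existing migration code, only one shape a re-runnable migration could take.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class VersionedMigrationSketch {
  static final int CURRENT_VERSION = 1; // hypothetical target format version

  /** Hypothetical stand-in for a serialized HRegionInfo with an explicit version. */
  static final class Row {
    int version;
    boolean embedsTableDescriptor;

    Row(int version, boolean embedsTableDescriptor) {
      this.version = version;
      this.embedsTableDescriptor = embedsTableDescriptor;
    }
  }

  /** Hypothetical stand-in for the catalog rows plus the migration marker. */
  static final class Catalog {
    int migratedToVersion = 0;
    final Map<String, Row> rows = new LinkedHashMap<>();
  }

  /** Converts only rows still in an older format; returns how many were rewritten. */
  static int migrate(Catalog catalog) {
    int rewritten = 0;
    for (Row row : catalog.rows.values()) {
      if (row.version < CURRENT_VERSION) { // explicit version check, no exception catching
        row.embedsTableDescriptor = false;  // stand-in for removing the HTD from the HRI
        row.version = CURRENT_VERSION;
        rewritten++;
      }
    }
    // Recorded last: if the process dies before this line, a re-run finds the
    // converted rows already at CURRENT_VERSION and leaves them alone.
    catalog.migratedToVersion = CURRENT_VERSION;
    return rewritten;
  }
}
```

Because each row carries its own version, losing the migratedToVersion marker only costs a rescan, not a crash.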

 Address lots of issues with migration from 90 to trunk
 --

 Key: HBASE-4389
 URL: https://issues.apache.org/jira/browse/HBASE-4389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0


 Looking over the migration code that removes HTD from HRI, there are lots of 
 issues. This JIRA is to redo this code in a way that will be less bug prone, 
 and also future proof.





[jira] [Created] (HBASE-4390) [replication] ReplicationSource's UncaughtExceptionHandler shouldn't join

2011-09-13 Thread Jean-Daniel Cryans (JIRA)
[replication] ReplicationSource's UncaughtExceptionHandler shouldn't join
-

 Key: HBASE-4390
 URL: https://issues.apache.org/jira/browse/HBASE-4390
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.5


From Jeff Whiting on the ML:

{quote}
regionserver60020.replicationSource,dev2 daemon prio=10 tid=0x2aaaf0312800 nid=0x69f8 in Object.wait() [0x4533e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x2aaab12464c0 (a org.apache.hadoop.hbase.replication.regionserver.ReplicationSource)
    at java.lang.Thread.join(Thread.java:1151)
    - locked 0x2aaab12464c0 (a org.apache.hadoop.hbase.replication.regionserver.ReplicationSource)
    at org.apache.hadoop.hbase.util.Threads.shutdown(Threads.java:91)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.terminate(ReplicationSource.java:649)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$1.uncaughtException(ReplicationSource.java:628)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1831)
{quote}

That's pretty dumb, the thread is trying to join itself. 
UncaughtExceptionHandler shouldn't try to terminate() but just clear resources 
and then return.
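
A minimal sketch of the behaviour described above, assuming a handler that only releases resources. The class, the thread name and the AtomicBoolean standing in for real cleanup are hypothetical; this is not the ReplicationSource code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class UncaughtHandlerSketch {

  /** Builds a worker whose handler only releases resources and never joins the worker. */
  static Thread newWorker(Runnable body, AtomicBoolean resourcesReleased) {
    Thread worker = new Thread(body, "replicationSource-sketch");
    worker.setUncaughtExceptionHandler((thread, error) -> {
      // This runs on the dying thread itself. Calling thread.join() here would
      // wait on a thread that cannot finish until this handler returns.
      resourcesReleased.set(true); // stand-in for closing readers, connections, etc.
    });
    return worker;
  }

  /** Starts a failing worker and waits for it from the caller's thread. */
  static boolean failingWorkerExitsCleanly() {
    AtomicBoolean released = new AtomicBoolean(false);
    Thread worker = newWorker(() -> {
      throw new IllegalStateException("simulated failure");
    }, released);
    worker.start();
    try {
      worker.join(5000); // joining from a different thread is fine
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !worker.isAlive() && released.get();
  }
}
```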





[jira] [Updated] (HBASE-4384) Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion

2011-09-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4384:
-

   Resolution: Fixed
Fix Version/s: (was: 0.94.0)
   0.90.5
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed branch and trunk.  Thanks for the patch Harsh.

 Hard to tell what causes failure in CloseRegionHandler#getCurrentVersion
 

 Key: HBASE-4384
 URL: https://issues.apache.org/jira/browse/HBASE-4384
 Project: HBase
  Issue Type: Task
  Components: zookeeper
Affects Versions: 0.90.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 0.90.5

 Attachments: HBASE-4384.r1.diff


 The current code goes like:
 {code}
   /**
    * Get the node's current version
    * @return The expectedVersion.  If -1, we failed getting the node
    */
   private int getCurrentVersion() {
     int expectedVersion = FAILED;
     try {
       if ((expectedVersion = ZKAssign.getVersion(
           server.getZooKeeper(), regionInfo)) == FAILED) {
         LOG.warn("Error getting node's version in CLOSING state," +
           " aborting close of " + regionInfo.getRegionNameAsString());
       }
     } catch (KeeperException e) {
       LOG.warn("Error creating node in CLOSING state, aborting close of " +
         regionInfo.getRegionNameAsString());
     }
     return expectedVersion;
   }
 }
 {code}
 Both WARN cases would be identical this way. In case of an exception, I think 
 an exception ought to be logged as well.





[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-3446:
---

Component/s: master

 ProcessServerShutdown fails if META moves, orphaning lots of regions
 

 Key: HBASE-3446
 URL: https://issues.apache.org/jira/browse/HBASE-3446
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v2.txt, 
 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt


 I ran a rolling restart on a 5 node cluster with lots of regions, and 
 afterwards had LOTS of regions left orphaned. The issue appears to be that 
 ProcessServerShutdown failed because the server hosting META was restarted 
 around the same time as another server was being processed





[jira] [Updated] (HBASE-2730) Expose RS work queue contents on web UI

2011-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-2730:
---

Component/s: monitoring

 Expose RS work queue contents on web UI
 ---

 Key: HBASE-2730
 URL: https://issues.apache.org/jira/browse/HBASE-2730
 Project: HBase
  Issue Type: New Feature
  Components: monitoring, regionserver
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.94.0


 Would be nice to be able to see the contents of the various work queues - eg 
 to know what regions are pending compaction/split/flush/etc. This is handy 
 for debugging why a region might be blocked, etc.





[jira] [Commented] (HBASE-2196) Support more than one slave cluster

2011-09-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103996#comment-13103996
 ] 

Lars Hofhansl commented on HBASE-2196:
--

One of my tests makes sure that no rows that existed before a peer was added are 
replicated to a new peer. It fails.

But that's actually potentially the case even now, isn't it? Unless we roll the 
log when a peer is added, everything in the latest log (which might be older) is 
replicated to the new peer. Correct?

 Support more than one slave cluster
 ---

 Key: HBASE-2196
 URL: https://issues.apache.org/jira/browse/HBASE-2196
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0

 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch


 Currently replication supports only 1 slave cluster, need the ability to add 
 more.





[jira] [Created] (HBASE-4391) Add ability to start RS as root and call mlockall

2011-09-13 Thread Todd Lipcon (JIRA)
Add ability to start RS as root and call mlockall
-

 Key: HBASE-4391
 URL: https://issues.apache.org/jira/browse/HBASE-4391
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
 Fix For: 0.94.0


A common issue we've seen in practice is that users oversubscribe their region 
servers with too many MR tasks, etc. As soon as the machine starts swapping, 
the RS grinds to a halt, loses ZK session, aborts, etc.

This can be combatted by starting the RS as root, calling mlockall(), and then 
setuid down to the hbase user. We should not require this, but we should 
provide it as an option.





[jira] [Created] (HBASE-4392) Add metrics for read/write throughput (#cells, #bytes)

2011-09-13 Thread Todd Lipcon (JIRA)
Add metrics for read/write throughput (#cells, #bytes)
--

 Key: HBASE-4392
 URL: https://issues.apache.org/jira/browse/HBASE-4392
 Project: HBase
  Issue Type: Improvement
  Components: metrics, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
 Fix For: 0.94.0


Most of our metrics are currently based on RPC count. This is an inaccurate 
metric since some RPCs can be much more heavyweight than others. We should 
maintain our current metrics but also add counters for bytes and cells inserted 
/ scanned. That gives a better idea of total load.
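
A small sketch of what such counters could look like next to the existing RPC count. The class and field names are hypothetical, and a cell is modelled as a raw byte array; real metrics would hook into the regionserver's metrics classes instead.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ThroughputCountersSketch {
  final AtomicLong rpcCount = new AtomicLong();     // what is measured today
  final AtomicLong cellsWritten = new AtomicLong(); // proposed addition
  final AtomicLong bytesWritten = new AtomicLong(); // proposed addition

  /** Records one write RPC; each element of cells is one cell's raw bytes. */
  void recordWrite(byte[][] cells) {
    rpcCount.incrementAndGet();
    cellsWritten.addAndGet(cells.length);
    long bytes = 0;
    for (byte[] cell : cells) {
      bytes += cell.length;
    }
    bytesWritten.addAndGet(bytes);
  }
}
```

One RPC with a single 10-byte cell and another with 100 cells of 1000 bytes both count as one RPC, but the cell and byte counters show the difference in load.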





[jira] [Commented] (HBASE-2196) Support more than one slave cluster

2011-09-13 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103998#comment-13103998
 ] 

Jean-Daniel Cryans commented on HBASE-2196:
---

Yes that's why I do this in TestReplication:

{code}
for ( JVMClusterUtil.RegionServerThread r : 
utility1.getHBaseCluster().getRegionServerThreads()) {
  r.getRegionServer().getWAL().rollWriter();
}
{code}

 Support more than one slave cluster
 ---

 Key: HBASE-2196
 URL: https://issues.apache.org/jira/browse/HBASE-2196
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0

 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch


 Currently replication supports only 1 slave cluster, need the ability to add 
 more.





[jira] [Commented] (HBASE-2196) Support more than one slave cluster

2011-09-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104003#comment-13104003
 ] 

Lars Hofhansl commented on HBASE-2196:
--

Cool. I already added a log roll in my MultiSlaveReplication as well, 
to verify that old logs are not replicated. Thanks for the clarification.

 Support more than one slave cluster
 ---

 Key: HBASE-2196
 URL: https://issues.apache.org/jira/browse/HBASE-2196
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0

 Attachments: 2196-v2.txt, 2196.txt, HBASE-2196-wip.patch


 Currently replication supports only 1 slave cluster, need the ability to add 
 more.





[jira] [Created] (HBASE-4393) Implement a canary monitoring program

2011-09-13 Thread Todd Lipcon (JIRA)
Implement a canary monitoring program
-

 Key: HBASE-4393
 URL: https://issues.apache.org/jira/browse/HBASE-4393
 Project: HBase
  Issue Type: New Feature
  Components: monitoring
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


This JIRA is to implement a standalone program that can be used to do canary 
monitoring of a running HBase cluster. This program would gather a list of the 
regions in the cluster, then iterate over them doing lightweight operations (eg 
short scans) to provide metrics about latency as well as alert on availability 
issues.
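
A rough sketch of the sweep described above, kept independent of the HBase client so it runs on its own. RegionProbe, Result and sweep are hypothetical names; in a real canary the probe would be a short scan or get against the region.

```java
import java.util.ArrayList;
import java.util.List;

final class CanarySketch {
  /** Hypothetical probe: performs one lightweight read against a region. */
  interface RegionProbe {
    void probe(String regionName) throws Exception;
  }

  static final class Result {
    final String region;
    final long latencyNanos;
    final boolean available;

    Result(String region, long latencyNanos, boolean available) {
      this.region = region;
      this.latencyNanos = latencyNanos;
      this.available = available;
    }
  }

  /** Probes every region once, recording latency and whether the probe succeeded. */
  static List<Result> sweep(List<String> regions, RegionProbe probe) {
    List<Result> results = new ArrayList<>();
    for (String region : regions) {
      long start = System.nanoTime();
      boolean ok = true;
      try {
        probe.probe(region);
      } catch (Exception e) {
        ok = false; // an unavailable region is reported, it does not stop the sweep
      }
      results.add(new Result(region, System.nanoTime() - start, ok));
    }
    return results;
  }
}
```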





[jira] [Created] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)
Add support for seeking hints to FilterList
---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0


Currently FilterLists do not support getNextKeyHint() even if the underlying 
filters are giving hints.  We should add support for FilterList to pass these 
through.
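
One possible way to combine hints from the underlying filters, sketched with String keys so it runs standalone. This is an assumption about a reasonable rule, not a description of what any patch on this issue does: with MUST_PASS_ONE only the smallest hint is safe to seek to, and a filter with no hint blocks seeking; with MUST_PASS_ALL the largest hint can be used. The Operator names mirror FilterList's; the class and method are hypothetical.

```java
import java.util.List;

final class HintCombinerSketch {
  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

  /** Combines per-filter hints; a null element or a null result means "no hint". */
  static String combine(Operator op, List<String> hints) {
    String result = null;
    for (String hint : hints) {
      if (hint == null) {
        if (op == Operator.MUST_PASS_ONE) {
          return null; // this filter might accept the very next key, so no seeking
        }
        continue; // MUST_PASS_ALL: a filter without a hint does not constrain the seek
      }
      if (result == null) {
        result = hint;
        continue;
      }
      int cmp = hint.compareTo(result);
      boolean better = (op == Operator.MUST_PASS_ALL) ? cmp > 0 : cmp < 0;
      if (better) {
        result = hint;
      }
    }
    return result;
  }
}
```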





[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4394:
-

Attachment: HBASE-4394-v1.patch

Adds support for seek hints to FilterList and adds a unit test to 
TestFilterList that ensures it does the right thing across the different 
variations of inputs to a filterlist.

 Add support for seeking hints to FilterList
 ---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4394-v1.patch


 Currently FilterLists do not support getNextKeyHint() even if the underlying 
 filters are giving hints.  We should add support for FilterList to pass these 
 through.





[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4394:
-

Status: Patch Available  (was: Open)

 Add support for seeking hints to FilterList
 ---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4394-v1.patch


 Currently FilterLists do not support getNextKeyHint() even if the underlying 
 filters are giving hints.  We should add support for FilterList to pass these 
 through.





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-09-13 Thread Subbu M Iyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subbu M Iyer updated HBASE-4213:


Attachment: 4213-V7-Support_instant_schema_changes_through_ZK.patch

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.92.0

 Attachments: 4213-Instant_Schema_change_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. With out enable/disabling the table.
 2. With out bulk unassign/assign of regions.





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-09-13 Thread Subbu M Iyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104010#comment-13104010
 ] 

Subbu M Iyer commented on HBASE-4213:
-

Attached V7 with all the review comments + above mentioned additions.

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.92.0

 Attachments: 4213-Instant_Schema_change_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213.v6, 
 HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. With out enable/disabling the table.
 2. With out bulk unassign/assign of regions.





[jira] [Updated] (HBASE-4394) Add support for seeking hints to FilterList

2011-09-13 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4394:
-

Attachment: HBASE-4394-trunk-v2.patch

Rebased for trunk

 Add support for seeking hints to FilterList
 ---

 Key: HBASE-4394
 URL: https://issues.apache.org/jira/browse/HBASE-4394
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4394-trunk-v2.patch, HBASE-4394-v1.patch


 Currently FilterLists do not support getNextKeyHint() even if the underlying 
 filters are giving hints.  We should add support for FilterList to pass these 
 through.





[jira] [Updated] (HBASE-4388) Second start after migration from 90 to trunk crashes

2011-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-4388:
---

Component/s: migration

 Second start after migration from 90 to trunk crashes
 -

 Key: HBASE-4388
 URL: https://issues.apache.org/jira/browse/HBASE-4388
 Project: HBase
  Issue Type: Bug
  Components: master, migration
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.92.0

 Attachments: meta.tgz


 I started a trunk cluster to upgrade from 90, inserted a ton of data, then 
 did a clean shutdown. When I started again, I got the following exception:
 11/09/13 12:29:09 INFO master.HMaster: Meta has HRI with HTDs. Updating meta now.
 11/09/13 12:29:09 FATAL master.HMaster: Unhandled exception. Starting shutdown.
 java.lang.NegativeArraySizeException: -102
     at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:147)
     at org.apache.hadoop.hbase.HTableDescriptor.readFields(HTableDescriptor.java:606)
     at org.apache.hadoop.hbase.migration.HRegionInfo090x.readFields(HRegionInfo090x.java:641)
     at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:133)
     at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:103)
     at org.apache.hadoop.hbase.util.Writables.getHRegionInfoForMigration(Writables.java:228)
     at org.apache.hadoop.hbase.catalog.MetaEditor.getHRegionInfoForMigration(MetaEditor.java:350)
     at org.apache.hadoop.hbase.catalog.MetaEditor$1.visit(MetaEditor.java:273)
     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:633)
     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:235)
     at org.apache.hadoop.hbase.catalog.MetaEditor.updateMetaWithNewRegionInfo(MetaEditor.java:284)
     at org.apache.hadoop.hbase.catalog.MetaEditor.migrateRootAndMeta(MetaEditor.java:298)
     at org.apache.hadoop.hbase.master.HMaster.updateMetaWithNewHRI(HMaster.java:529)
     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:472)
     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)





[jira] [Updated] (HBASE-4389) Address lots of issues with migration from 90 to trunk

2011-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-4389:
---

Component/s: migration

 Address lots of issues with migration from 90 to trunk
 --

 Key: HBASE-4389
 URL: https://issues.apache.org/jira/browse/HBASE-4389
 Project: HBase
  Issue Type: Bug
  Components: master, migration
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0


 Looking over the migration code that removes HTD from HRI, there are lots of 
 issues. This JIRA is to redo this code in a way that will be less bug prone, 
 and also future proof.





[jira] [Commented] (HBASE-4238) CatalogJanitor can clear a daughter that split before processing its parent

2011-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104020#comment-13104020
 ] 

Hudson commented on HBASE-4238:
---

Integrated in HBase-TRUNK #2205 (See 
[https://builds.apache.org/job/HBase-TRUNK/2205/])
HBASE-4238 CatalogJanitor can clear a daughter that split before processing 
its parent

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java


 CatalogJanitor can clear a daughter that split before processing its parent
 ---

 Key: HBASE-4238
 URL: https://issues.apache.org/jira/browse/HBASE-4238
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: stack
Priority: Critical
 Fix For: 0.92.0, 0.90.5

 Attachments: 4238-v2.txt, 4238.txt


 I didn't dig a lot into this issue, but by splitting a table twice in a row I 
 was able to trigger a situation where a daughter of the first split was 
 deleted by the CatalogJanitor before it processed its parent. Will post log 
 in a comment.





[jira] [Created] (HBASE-4395) EnableTableHandler races with itself

2011-09-13 Thread Jean-Daniel Cryans (JIRA)
EnableTableHandler races with itself


 Key: HBASE-4395
 URL: https://issues.apache.org/jira/browse/HBASE-4395
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.5


Very often when we try to enable a big table we get something like:

{quote}
2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
java.lang.IllegalStateException
    at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1074)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1030)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:858)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:838)
    at org.apache.hadoop.hbase.master.handler.EnableTableHandler$BulkEnabler$1.run(EnableTableHandler.java:154)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2011-09-02 12:21:56,620 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
{quote}

The issue is that EnableTableHandler calls BulkEnabler multiple times, and 
by the time it calls it a second time, using a stale list of 
still-not-enabled regions, it can try to set a region offline in ZK 
just after that region's state changed. Case in point:

{quote}
2011-09-02 12:21:56,616 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region huge_ass_region_name to sv4r23s16,60020,1314880035029
2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
{quote}

Here the first line is the assign done by the first thread, and the second 
line is the second thread processing the same region 3ms later. After that the 
master dies, and it's pretty sad when it restarts because it has to fail over 
a table that is still enabling, which is ungodly slow.

I'm pretty sure there's a window where double assignments are possible.

Talking with Stack, it doesn't really make sense to call multiple enables since 
the list of regions is static (the table is disabled!). We should just call it 
and wait. Also there's a lot of cleanup to do in EnableTableHandler since it 
refers to disabling the table (copy pasta I guess).
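
To make the proposed "call it once and wait" approach concrete, here is a minimal, self-contained sketch. It is not HBase code: the class EnableOnceSketch, the State enum, and the methods assign, enableOnce and staleReassignFails are hypothetical stand-ins, and the in-memory map only imitates the state check that aborts the master in the log above. It uses modern Java (lambdas) for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EnableOnceSketch {
    // Hypothetical, simplified region states (the real master tracks more).
    enum State { OFFLINE, PENDING_OPEN }

    // Stand-in for the master's view of region states.
    static final ConcurrentMap<String, State> STATES = new ConcurrentHashMap<>();

    // Imitates the failing check: a region can only start assignment from
    // OFFLINE. A second caller finds PENDING_OPEN and gets an exception.
    static void assign(String region) {
        if (!STATES.replace(region, State.OFFLINE, State.PENDING_OPEN)) {
            throw new IllegalStateException("Unexpected state trying to OFFLINE; "
                + region + " state=" + STATES.get(region));
        }
    }

    // Submits every region exactly once, then waits for the pool to drain.
    // Returns how many regions reached PENDING_OPEN.
    static int enableOnce(List<String> regions) {
        for (String r : regions) {
            STATES.put(r, State.OFFLINE); // table is disabled: list is static
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String r : regions) {
            pool.execute(() -> assign(r));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for assigns");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        int pending = 0;
        for (String r : regions) {
            if (STATES.get(r) == State.PENDING_OPEN) {
                pending++;
            }
        }
        return pending;
    }

    // Replays what a second pass over a stale list does to one region.
    static boolean staleReassignFails(String region) {
        try {
            assign(region);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("r1", "r2", "r3");
        System.out.println("assigned once: " + enableOnce(regions));
        System.out.println("stale second pass fails: " + staleReassignFails("r1"));
    }
}
```

Because each region is submitted once, no second pass can find it in PENDING_OPEN. The staleReassignFails helper only shows how a repeated pass trips the same kind of state check.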





[jira] [Assigned] (HBASE-4391) Add ability to start RS as root and call mlockall

2011-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HBASE-4391:
--

Assignee: Todd Lipcon

 Add ability to start RS as root and call mlockall
 -

 Key: HBASE-4391
 URL: https://issues.apache.org/jira/browse/HBASE-4391
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.94.0


 A common issue we've seen in practice is that users oversubscribe their 
 region servers with too many MR tasks, etc. As soon as the machine starts 
 swapping, the RS grinds to a halt, loses ZK session, aborts, etc.
 This can be combatted by starting the RS as root, calling mlockall(), and 
 then setuid down to the hbase user. We should not require this, but we should 
 provide it as an option.





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-09-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104053#comment-13104053
 ] 

jirapos...@reviews.apache.org commented on HBASE-4213:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1786/#review1883
---



/src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
https://reviews.apache.org/r/1786/#comment4335

How many times is this expected to spin? Should there be a sleep here?
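
For illustration only, a bounded wait with a sleep between polls could look like the sketch below. The loop in TableEventHandler is not reproduced in this message, so the class BoundedWaitSketch, the method waitFor, and its parameters are hypothetical and not taken from the patch under review.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class BoundedWaitSketch {
    // Polls `done` until it returns true or `timeoutMs` elapses, sleeping
    // `sleepMs` between polls so the thread does not spin on the CPU.
    // Returns true if the condition was met, false on timeout or interrupt.
    static boolean waitFor(BooleanSupplier done, long timeoutMs, long sleepMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!done.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the bound on spinning was reached
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger polls = new AtomicInteger();
        // Condition becomes true on the third poll.
        boolean ok = waitFor(() -> polls.incrementAndGet() >= 3, 1000, 10);
        System.out.println("condition met: " + ok + " after " + polls.get() + " polls");
    }
}
```

The timeout caps how long the loop can run, and the sleep keeps each iteration from burning CPU while waiting.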


- Andrew


On 2011-09-12 18:36:02, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1786/
bq.  ---
bq.  
bq.  (Updated 2011-09-12 18:36:02)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  From Subbu:
bq.  here is the latest patch that supports alter_instant, an instant schema change command that supports (Add, Modify, Delete column and Modify table) actions through ZK.
bq.  
bq.  1. This pattern capitalizes on the fact that HRIs are now in HDFS and need not be sent over again from Master to RS cloud on every schema change event.
bq.  
bq.  2. Offers real time instant schema change as we bypass the explicit bulk reassign (unassign + assign) of regions from master to RS.
bq.  
bq.  3. Offers fault tolerant schema change support as schema changes now go through ZK. Secondary master taking over a failed schema change will be addressed through a separate JIRA.
bq.  
bq.  
bq.  This addresses bug HBASE-4213.
bq.  https://issues.apache.org/jira/browse/HBASE-4213
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/avro/AvroServer.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/master/MasterServices.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/ModifyTableHandler.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/rest/SchemaResource.java 1169522 
bq.    /src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java PRE-CREATION 
bq.    /src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1169522 
bq.    /src/main/ruby/hbase/admin.rb 1169522 
bq.    /src/main/ruby/shell.rb 1169522 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java 1169522 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 1169522 
bq.    /src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChange.java PRE-CREATION 
bq.    /src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java 1169522 
bq.    /src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 1169522 
bq.    /src/test/java/org/apache/hadoop/hbase/regionserver/handler/MockRegionServerServices.java 1169522 
bq.  
bq.  Diff: https://reviews.apache.org/r/1786/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Unit tests pass.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.92.0

 Attachments: 
