[jira] Created: (HBASE-3148) Duplicate check table name in HBaseAdmin's createTable method

2010-10-25 Thread Jeff Zhang (JIRA)
Duplicate check table name in HBaseAdmin's createTable method
-

 Key: HBASE-3148
 URL: https://issues.apache.org/jira/browse/HBASE-3148
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Jeff Zhang




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3148) Duplicate check table name in HBaseAdmin's createTable method

2010-10-25 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924456#action_12924456
 ] 

Jeff Zhang commented on HBASE-3148:
---

While reading the HBase code, I found that HBaseAdmin's createTable method 
checks the table name twice.
Line 282 in createTable does one check, and line 332 in createTableAsync, 
which is called by createTable, does another. I believe one of the checks can 
be removed.
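
A minimal sketch of the redundancy described above. The class SketchAdmin, its validateTableName helper, and the naming rule are hypothetical stand-ins and not the actual HBaseAdmin code; only the createTable -> createTableAsync call shape comes from the comment.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the client admin class; not the real HBaseAdmin.
class SketchAdmin {
    // Counts validations so the duplicated check becomes visible.
    final AtomicInteger checks = new AtomicInteger();

    // Assumed rule, for illustration only: word characters, '.' or '-'.
    void validateTableName(String name) {
        checks.incrementAndGet();
        if (name == null || !name.matches("[\\w.-]+")) {
            throw new IllegalArgumentException("Illegal table name: " + name);
        }
    }

    // Shape reported in the issue: the synchronous method validates,
    // then delegates to the async method, which validates again.
    void createTableWithDuplicateCheck(String name) {
        validateTableName(name);   // first check
        createTableAsync(name);    // second check happens inside
    }

    // One possible cleanup: rely on the check in the async method only.
    void createTable(String name) {
        createTableAsync(name);
    }

    void createTableAsync(String name) {
        validateTableName(name);
        // ... the create request would be sent to the master here ...
    }

    public static void main(String[] args) {
        SketchAdmin before = new SketchAdmin();
        before.createTableWithDuplicateCheck("usertable");
        SketchAdmin after = new SketchAdmin();
        after.createTable("usertable");
        System.out.println(before.checks.get() + " checks before, "
            + after.checks.get() + " after");
    }
}
```

Which of the two checks to drop is a judgment call. The sketch keeps the inner one so that direct callers of the async method are still validated.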






[jira] Commented: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens

2010-10-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924602#action_12924602
 ] 

stack commented on HBASE-3147:
--

I got this when I tried running the patch:

{code}
java.lang.IllegalAccessError: tried to access method org.apache.hadoop.hbase.zookeeper.ZKAssign.getNodeName(Lorg/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher;Ljava/lang/String;)Ljava/lang/String; from class org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor
        at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1457)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
2010-10-25 16:07:44,354 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: sv2borg180:6.timeoutMonitor exiting
{code}

Let me try to fix it.

 Regions stuck in transition after rolling restart, perpetual timeout handling 
 but nothing happens
 -

 Key: HBASE-3147
 URL: https://issues.apache.org/jira/browse/HBASE-3147
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.90.0


 The rolling restart script is great for bringing on the weird stuff.  On my 
 little loaded cluster if I run it, it horks the cluster and it doesn't 
 recover.  I notice two issues that need fixing:
 1. We'll miss noticing that a server was carrying .META. and it never gets 
 assigned -- the shutdown handlers get stuck in perpetual wait on a .META. 
 assign that will never happen.
 2. Perpetual cycling of this sequence for each region not successfully assigned:
 {code}
 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b. state=PENDING_OPEN, ts=1287869814294
 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN or OPENING for too long, reassigning region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
 2010-10-23 21:37:57,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2bd57d1475046a Attempting to transition node 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE
 2010-10-23 21:37:57,404 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2bd57d1475046a Attempt to transition the unassigned node for 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE failed, the node existed but was in the state M_ZK_REGION_OFFLINE
 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region transitioned OPENING to OFFLINE so skipping timeout, region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
 ,,,
 {code}
 The timeout period elapses again and then the same sequence repeats.
 This is what I've been working on.
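
One way to read the cycle in item 2 is as a compare-and-set that can never succeed: the timeout handler expects the node to be in RS_ZK_REGION_OPENING, but it is already in M_ZK_REGION_OFFLINE, so every round fails and nothing changes. The sketch below models only that reading. The class TransitionSketch and its methods are hypothetical and are not the ZKAssign code; the two state names are taken from the log.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the unassigned node; not the real ZKAssign.
class TransitionSketch {
    // Only the two states that appear in the log above.
    enum State { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }

    // Succeeds only if the node currently holds the expected state.
    static boolean transition(AtomicReference<State> node, State expected, State target) {
        return node.compareAndSet(expected, target);
    }

    // Runs the timeout handling several times and counts the rounds in
    // which the OPENING -> OFFLINE transition failed.
    static int futileRounds(AtomicReference<State> node, int rounds) {
        int futile = 0;
        for (int i = 0; i < rounds; i++) {
            boolean moved = transition(node,
                State.RS_ZK_REGION_OPENING, State.M_ZK_REGION_OFFLINE);
            if (!moved) {
                futile++;   // node was not in OPENING, so nothing happened
            }
        }
        return futile;
    }

    public static void main(String[] args) {
        AtomicReference<State> node = new AtomicReference<>(State.M_ZK_REGION_OFFLINE);
        System.out.println("futile rounds: " + futileRounds(node, 3));
        System.out.println("final state: " + node.get());
    }
}
```

Starting from M_ZK_REGION_OFFLINE, every round is futile and the state never moves, which matches the repeating log sequence.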




[jira] Commented: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens

2010-10-25 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924614#action_12924614
 ] 

Jonathan Gray commented on HBASE-3147:
--

Hmm... you should have:

  public static String getNodeName(ZooKeeperWatcher zkw, String regionName) {

as part of the diff up on RB




[jira] Commented: (HBASE-2001) Coprocessors: Colocate user code with regions

2010-10-25 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924653#action_12924653
 ] 

HBase Review Board commented on HBASE-2001:
---

Message from: Andrew Purtell apurt...@apache.org


bq.  On 2010-10-25 06:49:15, Himanshu Vashishtha wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/CoprocessorHost.java, line 343
bq.   http://review.cloudera.org/r/876/diff/7/?file=14190#file14190line343
bq.  
bq.   What is its purpose here? I couldn't see it being used as of now. Is it for some future functionality?

The access controller coprocessor (HBASE-3025) needs a CatalogTracker. Other 
future functionality is also considered.


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/876/#review1646
---





 Coprocessors: Colocate user code with regions
 -

 Key: HBASE-2001
 URL: https://issues.apache.org/jira/browse/HBASE-2001
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
Assignee: Mingjie Lai
 Fix For: 0.92.0

 Attachments: asm-transformations.pdf, 
 HBASE-2001-RegionObserver-2.patch, HBASE-2001-RegionObserver.patch, 
 HBASE-2001.patch.gz, packge-info.html, packge-info.html


 Support user code that runs next to each region in a table. As regions 
 split and move, coprocessor code should automatically move also.
 Use classloader which looks on HDFS.
 Associate a list of classes to load with each table. Put this in HRI so it 
 inherits from table but can be changed on a per region basis (so then those 
 region specific changes can be inherited by daughters). 
 Not completely arbitrary code, should require implementation of an interface 
 with callbacks for:
 * Open
 * Close
 * Split
 * Compact
 * (Multi)get and scanner next()
 * (Multi)put
 * (Multi)delete
 Add method to HRegionInterface for invoking coprocessor methods and 
 retrieving results.  
 Add methods in o.a.h.h.regionserver or subpackage which implement convenience 
 functions for coprocessor methods and consistent/controlled access to 
 internals: store access, threading, persistent and ephemeral state, scratch 
 storage, etc. 
 GitHub: http://github.com/mlai/hbase/tree/0.90_coprocessor




[jira] Commented: (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable

2010-10-25 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924664#action_12924664
 ] 

Nicolas Spiegelberg commented on HBASE-2462:


So, we've been talking about a new compaction algorithm internally and wanted 
to get external feedback as well...

The existing store file selection algorithm seems to not utilize enough 
context.  We start at the oldest and compact everything else when it's no 
longer 2x the next oldest.  It seems like we want to approach from the opposite 
direction:

1. Start at the newest file
2. Unconditionally compact as long as the StoreFiles are less than a certain
size (thinking hbase.regionserver.hlog.blocksize).
3. After that metric has been met, if next oldest file < sum(all newer files) 
* R, we include it in the compaction.  R = 2.
4. If files-to-compact < max(HColumnDescriptor.maxVersions(),3), skip the 
compaction

This algorithm can serve a very generic workload.  Axiom: It's worth compacting 
if sum(files) >= 150% * max(files).  Maybe make this adjustable.  The main 
point is that the ratio between file[i], file[i+1] is less useful than 
sum(files), max(files).

A. With files[i] < files[i+1] * 2, our worst case ends up with a decreasing 
triangle of 2x.
B. With files[i] < sum(files[0..i-1]) * 2, we are dealing with the derivative.  
Our worst case ends up with decreasing triangle of 4x

With a 4x ratio and a 64 MB hlog blocksize, we could support up to a 21.3GB 
Store while using less than 8 files.  3 minimal threshold files + 5 worst case 
files that would be roughly: 64MB, 256MB, 1GB, 4GB, 16GB == 21.3GB.  Assuming 
that the average user has a 1-2 GB store, the number of HFiles should never get 
above 6.
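
The four selection rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the committed HBase code: the class and method names are hypothetical, rule 2 is read as "each individual file is below the size threshold", selection stops at the first file that satisfies neither rule 2 nor rule 3, and the caller passes max(maxVersions, 3) as minFiles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed store file selection.
class CompactionSelectionSketch {
    /**
     * @param sizesNewestFirst store file sizes, newest file first (rule 1)
     * @param minSize  files below this size are always included (rule 2)
     * @param ratio    R in rule 3
     * @param minFiles minimum number of files worth compacting (rule 4)
     * @return the sizes selected for compaction, or an empty list to skip
     */
    static List<Long> select(List<Long> sizesNewestFirst, long minSize,
                             double ratio, int minFiles) {
        List<Long> selected = new ArrayList<>();
        long sumNewer = 0;   // total size of the files already selected
        for (long size : sizesNewestFirst) {
            boolean small = size < minSize;                 // rule 2
            boolean withinRatio = size < sumNewer * ratio;  // rule 3
            if (!small && !withinRatio) {
                break;       // this file and everything older is left alone
            }
            selected.add(size);
            sumNewer += size;
        }
        if (selected.size() < minFiles) {
            return new ArrayList<>();                       // rule 4
        }
        return selected;
    }

    public static void main(String[] args) {
        // Sizes in MB, newest first; 64 stands in for the size threshold.
        List<Long> sizes = Arrays.asList(10L, 20L, 50L, 100L, 1000L);
        System.out.println(select(sizes, 64L, 2.0, 3));
    }
}
```

With the sizes in main, the three files under 64 are taken unconditionally, 100 is taken because 100 < 2 * 80, and 1000 is rejected because 1000 >= 2 * 180, so the selection is [10, 20, 50, 100].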


 Review compaction heuristic and move compaction code out so standalone and 
 independently testable
 -

 Key: HBASE-2462
 URL: https://issues.apache.org/jira/browse/HBASE-2462
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Jonathan Gray
Priority: Critical

 Anything that improves our i/o profile makes hbase run smoother.  Over in 
 HBASE-2457, good work has been done already describing the tension between 
 minimizing compactions versus minimizing count of store files.  This issue is 
 about following on from what has been done in 2457 but also, breaking the 
 hard-to-read compaction code out of Store.java out to a standalone class that 
 can be the easier tested (and easily analyzed for its performance 
 characteristics).
 If possible, in the refactor, we'd allow specification of alternate merge 
 sort implementations. 




[jira] Created: (HBASE-3149) Make flush decisions per column family

2010-10-25 Thread Karthik Ranganathan (JIRA)
Make flush decisions per column family
--

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan


Today, the flush decision is made using the aggregate size of all column 
families. When large and small column families co-exist, this causes many small 
flushes of the smaller CF. We need to make per-CF flush decisions.
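
To make the difference concrete, here is a small sketch comparing the two decisions. The issue does not say what the per-CF policy should be, so the per-family threshold used here is only an assumption, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the regionserver's actual flush logic.
class FlushDecisionSketch {
    // Current behavior as described: once the aggregate size crosses the
    // threshold, every family is flushed, however small it is.
    static List<String> aggregateDecision(Map<String, Long> memstoreSizes, long threshold) {
        long total = 0;
        for (long size : memstoreSizes.values()) {
            total += size;
        }
        return total >= threshold
            ? new ArrayList<>(memstoreSizes.keySet())
            : new ArrayList<>();
    }

    // Assumed per-CF policy: flush only the families that have
    // individually reached the threshold.
    static List<String> perFamilyDecision(Map<String, Long> memstoreSizes, long threshold) {
        List<String> toFlush = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
            if (e.getValue() >= threshold) {
                toFlush.add(e.getKey());
            }
        }
        return toFlush;
    }

    public static void main(String[] args) {
        Map<String, Long> sizesMb = new LinkedHashMap<>();
        sizesMb.put("big", 64L);
        sizesMb.put("small", 1L);
        System.out.println("aggregate:  " + aggregateDecision(sizesMb, 64L));
        System.out.println("per family: " + perFamilyDecision(sizesMb, 64L));
    }
}
```

In this example the aggregate decision flushes both families and produces a 1 MB store file for the small one, while the per-family decision flushes only the large family. The cost is that memstore usage per region is no longer bounded by a single threshold.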




[jira] Commented: (HBASE-3149) Make flush decisions per column family

2010-10-25 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924705#action_12924705
 ] 

Jean-Daniel Cryans commented on HBASE-3149:
---

I have been thinking about this one for some time... I think it makes sense in 
loads of ways since a common problem of multi-CF is that during the initial 
import the user ends up with thousands of small store files because some family 
grows faster and triggers the flushes, which in turn generates incredible 
compaction churn. On the other hand, it means that we almost consider a family 
as a region, e.g. one region with 3 CF can have up to 3x64MB in the memstores.




[jira] Commented: (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable

2010-10-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924717#action_12924717
 ] 

stack commented on HBASE-2462:
--

hbase.regionserver.hlog.blocksize == fs default block size.  Better to use fs 
default block size rather than an hlog setting.

What's the rationale for rule 4?  Do you rather mean the compaction threshold 
here?

Sorry, what's max(files)?  The largest file?  And sum(files) is all files or 
just some subset (you keep adding to the subset till you are > 150% the 
biggest?)

So, you think this algo will make for fewer compactions yet keep count of files 
low?










[jira] Created: (HBASE-3150) Allow some column to not write WALs

2010-10-25 Thread Karthik Ranganathan (JIRA)
Allow some column to not write WALs
---

 Key: HBASE-3150
 URL: https://issues.apache.org/jira/browse/HBASE-3150
 Project: HBase
  Issue Type: Improvement
Reporter: Karthik Ranganathan
Priority: Minor


We have this unique requirement where some column families hold data that is 
indexed from other existing column families. The index data is very large, and 
we end up writing these inserts into the WAL and then into the store files. In 
addition to taking more iops, this also slows down splitting files for 
recovery, etc.

Creating this task to have an option to suppress WAL logging on a per CF basis.




[jira] Commented: (HBASE-2946) Increment multiple columns in a row at once

2010-10-25 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924728#action_12924728
 ] 

HBase Review Board commented on HBASE-2946:
---

Message from: Jonathan Gray jg...@apache.org


bq.  On 2010-10-24 21:41:48, khemani wrote:
bq.   trunk/src/main/java/org/apache/hadoop/hbase/client/Increment.java, line 
45
bq.   http://review.cloudera.org/r/1088/diff/2/?file=15904#file15904line45
bq.  
bq.   setWriteToWal() is missing?

yup.  will add.


bq.  On 2010-10-24 21:41:48, khemani wrote:
bq.   trunk/src/main/java/org/apache/hadoop/hbase/client/Increment.java, lines 
46-47
bq.   http://review.cloudera.org/r/1088/diff/2/?file=15904#file15904line46
bq.  
bq.   why a navigable map? why not just a map?

You can do things like tailMap() with it.


bq.  On 2010-10-24 21:41:48, khemani wrote:
bq.   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, 
line 3012
bq.   http://review.cloudera.org/r/1088/diff/2/?file=15907#file15907line3012
bq.  
bq.   I am not sure how it is ensured that the order of iteration over the 
columns in family.getValue.entrySet() is same as the order of results returned?
bq.   
bq.   Also, if get finds multiple matches then will it return all of them? 
If yes then this will not work.

familyMap and the map of columns to amounts are both TreeMaps ordered with 
Bytes.BYTES_COMPARATOR.  Results are also guaranteed to be in order.  And our 
Get has maxVersions=1 so we will not get multiple matches per column.
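
A small sketch of the ordering argument above and of what a NavigableMap adds over a plain Map. The comparator here is a hand-written stand-in that assumes Bytes.BYTES_COMPARATOR orders byte arrays lexicographically by unsigned byte value; the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch; not the Increment or HRegion code.
class OrderedColumnsSketch {
    // Stand-in for Bytes.BYTES_COMPARATOR (assumed unsigned lexicographic order).
    static final Comparator<byte[]> LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Column qualifier -> increment amount, inserted out of order on purpose.
    static NavigableMap<byte[], Long> amounts() {
        NavigableMap<byte[], Long> m = new TreeMap<>(LEXICOGRAPHIC);
        m.put(bytes("c"), 3L);
        m.put(bytes("a"), 1L);
        m.put(bytes("b"), 2L);
        return m;
    }

    // Concatenates the keys in iteration order, for easy inspection.
    static String keysInOrder(NavigableMap<byte[], Long> m) {
        StringBuilder sb = new StringBuilder();
        for (byte[] k : m.keySet()) {
            sb.append(new String(k, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        NavigableMap<byte[], Long> m = amounts();
        System.out.println(keysInOrder(m));                          // iteration is sorted
        System.out.println(keysInOrder(m.tailMap(bytes("b"), true))); // view from "b" onward
    }
}
```

Because iteration follows the comparator rather than insertion order, walking the map and walking sorted results visits the columns in the same sequence, and tailMap gives a view starting at a chosen qualifier.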





 Increment multiple columns in a row at once
 ---

 Key: HBASE-2946
 URL: https://issues.apache.org/jira/browse/HBASE-2946
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Jonathan Gray

 Currently there is no way to do multiple increments to a single row in one 
 RPC.  This jira is about adding an HTable and HRegionInterface method to 
 increment multiple columns within a single row at once.




[jira] Commented: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans

2010-10-25 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924759#action_12924759
 ] 

HBase Review Board commented on HBASE-2753:
---

Message from: Ryan Rawson ryano...@gmail.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1092/
---

Review request for hbase.


Summary
---

the last hurrah, removing the sorting from Result. sorted() is already 
@deprecated


This addresses bug HBASE-2753.
http://issues.apache.org/jira/browse/HBASE-2753


Diffs
-

  trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java 1026537 

Diff: http://review.cloudera.org/r/1092/diff


Testing
---


Thanks,

Ryan




 Remove sorted() methods from Result now that Gets are Scans
 ---

 Key: HBASE-2753
 URL: https://issues.apache.org/jira/browse/HBASE-2753
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: ryan rawson
 Fix For: 0.90.0


 With the old Get codepath, we used to sometimes get results sent to the 
 client that weren't fully sorted.  Now that Gets are Scans, results should 
 always be sorted.
 Confirm that we always get back sorted results and if so drop the 
 Result.sorted() method and update javadoc accordingly.




[jira] Commented: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans

2010-10-25 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924762#action_12924762
 ] 

HBase Review Board commented on HBASE-2753:
---

Message from: Jonathan Gray jg...@apache.org

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1092/#review1661
---

Ship it!


looks good to me

- Jonathan








[jira] Resolved: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans

2010-10-25 Thread ryan rawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ryan rawson resolved HBASE-2753.


Resolution: Fixed

this is committed, we no longer sort in Result.




[jira] Updated: (HBASE-2645) HLog writer can do 1-2 sync operations after lease has been recovered for split process.

2010-10-25 Thread ryan rawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ryan rawson updated HBASE-2645:
---

Fix Version/s: (was: 0.90.0)
   0.92.0

moving fix version to 0.92

 HLog writer can do 1-2 sync operations after lease has been recovered for 
 split process.
 

 Key: HBASE-2645
 URL: https://issues.apache.org/jira/browse/HBASE-2645
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.92.0
Reporter: Cosmin Lehene
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.92.0


 TestHLogSplit.testLogCannotBeWrittenOnceParsed is failing. 
 This test starts a thread that writes one edit to the log, syncs and counts. 
 During this, a HLog.splitLog operation is started. splitLog recovers the log 
 lease before reading the log, so that the original regionserver could not 
 wake up and write after the split process started.  
 The test compares the number of edits reported by the split process and by 
 the writer thread. The writer thread (called zombie in the test) should report <= 
 the count reported by splitLog (sync() might raise after the last edit gets written and 
 the edit won't get counted by zombie thread). However it appears that the 
 zombie counts 1-2 more edits. So it looks like it can sync without a lease.
 This might be a hdfs-0.20 related issue. 




[jira] Commented: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens

2010-10-25 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924782#action_12924782
 ] 

HBase Review Board commented on HBASE-3147:
---

Message from: st...@duboce.net

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1087/
---

(Updated 2010-10-25 16:29:36.379908)


Review request for hbase and stack.


Changes
---

Added metaservershutdownhandler and rootservershutdownhandler


Summary (updated)
---

Adds new handling of the timeouts for PENDING_OPEN and PENDING_CLOSE in-memory 
master RIT states.

Adds some new broken RIT states into TestMasterFailover.

Some of these broken states don't seem possible to me but as long as we aren't 
breaking the existing behaviors and tests I think it's okay if we handle odd 
cases that can be mocked.  Who knows what will happen in the real world.

The reason TestMasterFailover didn't/doesn't really test for the issue in 
HBASE-3147 is this new broken condition happens when an RS dies / goes offline 
rather than a master failover concurrent w/ RS failure.


v4 of the patch adds to Jon's fixes.  It adds a shutdown server handler for 
root and another for meta so the processing of servers hosting meta/root does 
not get frozen out.  I've seen this in my testing.


This addresses bug HBASE-3147.
http://issues.apache.org/jira/browse/HBASE-3147


Diffs (updated)
-

  trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1027291 
  trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 1027291 
  trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1027291 
  trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java 1027291 
  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1027291 
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1027291 
  trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1027291 
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java PRE-CREATION 
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/RootServerShutdownHandler.java PRE-CREATION 
  trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 1027292 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1027291 
  trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 1027291 

Diff: http://review.cloudera.org/r/1087/diff


Testing
---

TestMasterFailover passes.


Thanks,

Jonathan





[jira] Commented: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens

2010-10-25 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924784#action_12924784
 ] 

HBase Review Board commented on HBASE-3147:
---

Message from: Jonathan Gray jg...@apache.org

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1087/#review1662
---

Ship it!


Looks good.  Not sure if I can +1 my patch but I think we should commit :)


trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
http://review.cloudera.org/r/1087/#comment5542

Should we remove this code from inside of ServerShutdownHandler now?  Not a 
big deal but being done twice.


- Jonathan





 Regions stuck in transition after rolling restart, perpetual timeout handling 
 but nothing happens
 -

 Key: HBASE-3147
 URL: https://issues.apache.org/jira/browse/HBASE-3147
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.90.0


 The rolling restart script is great for bringing on the weird stuff.  On my 
 little loaded cluster if I run it, it horks the cluster and it doesn't 
 recover.  I notice two issues that need fixing:
 1. We'll miss noticing that a server was carrying .META. and it never gets 
 assigned -- the shutdown handlers get stuck in perpetual wait on a .META. 
 assign that will never happen.
 2. Perpetual cycling of the this sequence per region not succesfully assigned:
 {code}
 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b. state=PENDING_OPEN, ts=1287869814294
 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN or OPENING for too long, reassigning region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
 2010-10-23 21:37:57,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2bd57d1475046a Attempting to transition node 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE
 2010-10-23 21:37:57,404 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2bd57d1475046a Attempt to transition the unassigned node for 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE failed, the node existed but was in the state M_ZK_REGION_OFFLINE
 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region transitioned OPENING to OFFLINE so skipping timeout, region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
 ,,,
 {code}
 The timeout period elapses again and then the same sequence repeats.
 This is what I've been working on.
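
 The cycle in the log above can be read as a guarded state transition that 
 never succeeds: the timeout handler asks to move the node from 
 RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE, the node is already 
 M_ZK_REGION_OFFLINE, so the request is refused and nothing changes before the 
 next timeout.  A minimal sketch of that shape, assuming a compare-and-set 
 style check; the class, enum and method below are hypothetical stand-ins and 
 not the ZKAssign API:

```java
import java.util.concurrent.atomic.AtomicReference;

public class TransitionSketch {
    // Hypothetical stand-ins for the ZooKeeper node states named in the log.
    enum State { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }

    // Moves the node to target only if it is currently in the expected state.
    static boolean transition(AtomicReference<State> node, State expected, State target) {
        return node.compareAndSet(expected, target);
    }

    public static void main(String[] args) {
        // The node is already OFFLINE, as in the WARN line of the log.
        AtomicReference<State> node = new AtomicReference<>(State.M_ZK_REGION_OFFLINE);
        for (int timeout = 1; timeout <= 3; timeout++) {
            // Each timeout expects OPENING, so the transition is refused
            // and the state is the same when the next timeout fires.
            boolean moved = transition(node, State.RS_ZK_REGION_OPENING, State.M_ZK_REGION_OFFLINE);
            System.out.println("timeout " + timeout + ": moved=" + moved + ", state=" + node.get());
        }
    }
}
```

 Each pass reports moved=false with the state unchanged, which matches the 
 "skipping timeout" line repeating once per timeout period.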




[jira] Updated: (HBASE-3147) Regions stuck in transition after rolling restart, perpetual timeout handling but nothing happens

2010-10-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3147:
-

Attachment: HBASE-3147-v6.patch

Here is what I'll commit.  It does as Jon suggests, removing the check for 
root or meta carrying from inside the shutdown handler since we're doing the 
check on the outside now.  This patch also includes a missing hookup that 
testing found.

There is still work to do on this issue.  What seems to be happening is that a 
watcher is not being triggered.  Need to figure out how that is happening.  
I'll see a regionserver with all of its opener handlers stuck waiting on 
notification that meta has been deployed.  Other servers will have gotten 
their watcher triggered, but not one or two in the cluster.  Master is then 
stuck timing out this regionserver's allocations and then reassigning... 
calling open on the rpc, which adds the region to the queue, but since all 
openers are stuck waiting on meta, the queues don't get processed.
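
The stall described above has the shape of a fixed-size handler pool whose 
workers all block on a notification that never arrives, so queued work is 
never picked up.  A minimal sketch of that shape, assuming a plain 
java.util.concurrent pool and a latch standing in for the meta-deployed 
notification; the class and method names are hypothetical and this is not the 
regionserver code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StuckOpenersSketch {
    // Returns how many queued region opens finished within waitMillis.
    // metaNotified=false models the watcher that is never triggered.
    static int openedWithin(int handlers, int regions, boolean metaNotified, long waitMillis)
            throws InterruptedException {
        CountDownLatch metaDeployed = new CountDownLatch(1);
        AtomicInteger opened = new AtomicInteger();
        ExecutorService openers = Executors.newFixedThreadPool(handlers);
        for (int i = 0; i < regions; i++) {
            openers.execute(() -> {
                try {
                    metaDeployed.await();      // every handler waits on meta
                    opened.incrementAndGet();  // only then does the open complete
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        if (metaNotified) {
            metaDeployed.countDown();
        }
        openers.shutdown();
        openers.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        int done = opened.get();
        openers.shutdownNow(); // release any handlers still blocked
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("notified:     " + openedWithin(3, 10, true, 2000));
        System.out.println("not notified: " + openedWithin(3, 10, false, 200));
    }
}
```

Without the notification, every handler is parked on the latch and the 
remaining queued opens wait behind them, however many times the master 
resubmits.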




[jira] Commented: (HBASE-2462) Review compaction heuristic and move compaction code out so standalone and independently testable

2010-10-25 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924792#action_12924792
 ] 

Nicolas Spiegelberg commented on HBASE-2462:


@stack: 

1. FS default blocksize is the default for a non-custom hlog.blocksize, but 
they are not necessarily 1-1.   The idea is that new HFiles created should 
always be = hlog.blocksize, so we unconditionally compact for HFiles that have 
not already been compacted at least once.

2.  The idea behind step #4 is that compaction becomes extremely useful when 
you can use it to dedupe.  We should definitely use the compactionThreshold 
metric here instead of the hard-coded 3.  However, I don't think this should 
be an absolute number of StoreFiles, but rather the number of 
relatively-small StoreFiles.  If you have huge region sizes (e.g. large object 
store), then you don't mind having 6 storefiles and really just want to 
compact when it will save a decent amount of space.

3. This algorithm will perform roughly the same for compacting small/new files; 
however, it will be more aggressive about including older files in the 
compaction because it can more quickly detect when it's advantageous to 
compact.  Because of the 4x (vs. 2x) multiplier, it's 2x more scalable and 
should result in half as many large StoreFiles for large regions.  For 
DEFAULT_MAX_FILE_SIZE == 256MB, you should never have more than 5 StoreFiles 
before triggering a split.
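
To illustrate how a larger multiplier pulls older files into a compaction, 
here is a minimal sketch.  The selection rule is an assumption, not taken from 
the patch: an older file is left out only while it is larger than the 
multiplier times the total size of all newer files.  The class and method 
names are hypothetical and the sizes are arbitrary units.

```java
public class CompactionSelectionSketch {
    // sizes are ordered oldest to newest. Returns the index of the first
    // (oldest) file that would be included in the compaction.
    static int firstIncluded(long[] sizes, double multiplier) {
        // newerSum[i] = total size of the files newer than file i
        long[] newerSum = new long[sizes.length];
        long running = 0;
        for (int i = sizes.length - 1; i >= 0; i--) {
            newerSum[i] = running;
            running += sizes[i];
        }
        int start = 0;
        // Assumed rule: skip an old file while it dwarfs everything newer.
        while (start < sizes.length && sizes[start] > multiplier * newerSum[start]) {
            start++;
        }
        return start;
    }

    public static void main(String[] args) {
        long[] sizes = {200, 40, 10, 10}; // oldest to newest
        System.out.println("2x: start at file " + firstIncluded(sizes, 2.0));
        System.out.println("4x: start at file " + firstIncluded(sizes, 4.0));
    }
}
```

With these sizes the 2x rule leaves the oldest file out (200 > 2 * 60) while 
the 4x rule includes it (200 <= 4 * 60), which is the "more aggressive about 
including older files" behaviour under the assumed rule.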

 Review compaction heuristic and move compaction code out so standalone and 
 independently testable
 -

 Key: HBASE-2462
 URL: https://issues.apache.org/jira/browse/HBASE-2462
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Jonathan Gray
Priority: Critical

 Anything that improves our i/o profile makes hbase run smoother.  Over in 
 HBASE-2457, good work has been done already describing the tension between 
 minimizing compactions versus minimizing count of store files.  This issue is 
 about following on from what has been done in 2457 but also breaking the 
 hard-to-read compaction code out of Store.java into a standalone class that 
 can be more easily tested (and easily analyzed for its performance 
 characteristics).
 If possible, in the refactor, we'd allow specification of alternate merge 
 sort implementations. 




[jira] Created: (HBASE-3151) NPE when trying to read regioninfo from .META.

2010-10-25 Thread stack (JIRA)
NPE when trying to read regioninfo from .META.
--

 Key: HBASE-3151
 URL: https://issues.apache.org/jira/browse/HBASE-3151
 Project: HBase
  Issue Type: Bug
Reporter: stack


This is an old issue perhaps in a new guise.  From the list, Sebastien Bauer 
reports:

{code}
 2010-10-25 08:13:01,690 ERROR org.apache.hadoop.hbase.master.CatalogJanitor: Caught exception
 java.lang.NullPointerException
 2010-10-25 08:13:24,385 INFO org.apache.hadoop.hbase.master.ServerManager: regionservers=2, averageload=2538

 2010-10-23 20:16:17,890 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Cached location for .META.,,1.1028785192 is db2a.goldenline.pl:60020
 2010-10-23 20:16:18,432 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
 java.lang.NullPointerException
         at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
         at org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
         at org.apache.hadoop.hbase.client.MetaScanner$1.processRow(MetaScanner.java:188)
         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:69)
         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:54)
         at org.apache.hadoop.hbase.client.MetaScanner.listAllRegions(MetaScanner.java:195)
         at org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:1048)
         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:379)
         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:265)
 2010-10-23 20:16:18,433 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
 2010-10-23 20:16:18,433 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
{code}


I think he has an old master... checking.
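
The trace shows the NPE raised while deserializing a region info from a 
.META. row inside the scanner callback.  The root cause is not established 
here, so the following is only a sketch of one defensive pattern, assuming the 
row carries no regioninfo bytes: skip such rows rather than hand a null to the 
deserializer.  The class, method and the Map-based row model are hypothetical 
and do not reflect the MetaScanner or Writables API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetaRowGuardSketch {
    // Hypothetical row model: row key -> raw regioninfo bytes,
    // or null when the row has no regioninfo cell.
    static List<String> rowsWithRegionInfo(Map<String, byte[]> metaRows) {
        List<String> usable = new ArrayList<>();
        for (Map.Entry<String, byte[]> row : metaRows.entrySet()) {
            byte[] regionInfo = row.getValue();
            if (regionInfo == null || regionInfo.length == 0) {
                continue; // skip the row instead of deserializing null
            }
            usable.add(row.getKey());
        }
        return usable;
    }

    public static void main(String[] args) {
        Map<String, byte[]> rows = new LinkedHashMap<>();
        rows.put("row-with-info", new byte[] {1, 2, 3});
        rows.put("row-without-info", null);
        System.out.println(rowsWithRegionInfo(rows));
    }
}
```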
