[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-27 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115249#comment-13115249
 ] 

Ted Yu commented on HBASE-4492:
---

Found the following in output for the above timeout case:
{code}
2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZooKeeperWatcher(233): regionserver:57539-0x132a95afa18000d Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/root-region-server
2011-09-27 05:28:59,047 DEBUG [RegionServer:3;us.ciq.com,58748,131710132-EventThread] zookeeper.ZKUtil(226): regionserver:58748-0x132a95afa18000a /hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,047 DEBUG [Master:0;us.ciq.com,56327,1317101304726-EventThread] zookeeper.ZKUtil(226): hconnection-0x132a95afa180005 /hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,048 DEBUG [Thread-1-EventThread] zookeeper.ZooKeeperWatcher(233): master:51567-0x132a95afa180008 Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/unassigned
2011-09-27 05:28:59,048 DEBUG [RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] zookeeper.ZKUtil(226): regionserver:57539-0x132a95afa18000d /hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,049 INFO  [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.AssignmentManager(1485): No previous transition plan was found (or we are ignoring an existing plan) for -ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=, dest=us.ciq.com,57500,1317101330748; 3 (online=3, exclude=null) available servers
2011-09-27 05:28:59,049 INFO  [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.AssignmentManager(1485): Assigning region -ROOT-,,0.70236052 to us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.ServerManager(448): New connection to us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG [Thread-1-EventThread] zookeeper.ZKUtil(224): master:51567-0x132a95afa180008 Set watcher on existing znode /hbase/unassigned/70236052
2011-09-27 05:28:59,049 FATAL [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.HMaster(1181): Master server abort: loaded coprocessors are: []
2011-09-27 05:28:59,050 FATAL [MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] master.HMaster(1186): Unexpected state trying to OFFLINE; -ROOT-,,0.70236052 state=PENDING_OPEN, ts=1317101339049, server=us.ciq.com,57500,1317101330748
java.lang.IllegalStateException
    at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1517)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1392)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1169)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1144)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1139)
    at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1816)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:105)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:123)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:186)
{code}

 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
 Attachments: 4492.txt


 I got the following when running test suite on TRUNK:
 {code}
 testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  Time elapsed: 300.28 sec   ERROR!
 java.lang.Exception: test timed out after 30 milliseconds
     at java.lang.Thread.sleep(Native Method)
     at org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
     at org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
 {code}
 I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
 wiped out test output file for the failed test.
 Similar failure can be found on Jenkins:
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/


[jira] [Created] (HBASE-4495) CatalogTracker has an identity crisis; needs to be cut-back in scope

2011-09-27 Thread stack (Created) (JIRA)
CatalogTracker has an identity crisis; needs to be cut-back in scope


 Key: HBASE-4495
 URL: https://issues.apache.org/jira/browse/HBASE-4495
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: stack


CT needs a good reworking.  I'd suggest its scope be cut way down so that it 
deals only in zk transactions, rather than also reading the meta location in 
hbase (over an HConnection), being a purveyor of HRegionInterfaces on meta and 
root servers, being an Abortable, and being a verifier of catalog locations.  
Once this is done, I would suggest it then belongs under the zk package and 
that the Meta* classes move to the client package.

Here are some messy notes I added to the head of the CT class in hbase-3446, 
where I spent some time trying to make out what it was CT did.

{code}
  // TODO: This class needs a rethink.  The original intent was that it would be
  // the one-stop-shop for root and meta locations and that it would get this
  // info from reading and watching zk state.  The class was to be used by
  // servers when they needed to know of root and meta movement but also by
  // client-side (inside in HTable) so rather than figure root and meta
  // locations on fault, the client would instead get notifications out of zk.
  // 
  // But this original intent is frustrated by the fact that this class has to
  // read an hbase table, the -ROOT- table, to figure out the .META. region
  // location which means we depend on an HConnection.  HConnection will do
  // retrying but also, it has its own mechanism for finding root and meta
  // locations (and for 'verifying'; it tries the location and if it fails, does
  // new lookup, etc.).  So, at least for now, HConnection (or HTable) can't
  // have a CT since CT needs a HConnection (Even then, do want HT to have a CT?
  // For HT keep up a session with ZK?  Rather, shouldn't we do like asynchbase
  // where we'd open a connection to zk, read what we need then let the
  // connection go?).  The 'fix' is to make it so both root and meta addresses
  // are wholly up in zk -- not in zk (root) and in an hbase table (meta).
  //
  // But even then, this class does 'verification' of the location and it does
  // this by making a call over an HConnection (which will do its own root
  // and meta lookups).  Isn't this verification 'useless' since when we
  // return, whatever is dependent on the result of this call then needs to
  // use HConnection; what we have verified may change in meantime (HConnection
  // uses the CT primitives, the root and meta trackers finding root locations).
  //
  // When meta is moved to zk, this class may make more sense.  In the
  // meantime, it does not cohere.  It should just watch meta and root and
  // NOT do verification -- let that be out in HConnection since it's going to
  // be done there ultimately anyway.
  //
  // This class has spread throughout the codebase.  It needs to be reined in.
  // This class should be used server-side only, even if we move meta location
  // up into zk.  Currently it's used over in the client package. It's used in
  // MetaReader and MetaEditor classes usually just to get the Configuration
  // it's using (It does this indirectly by asking its HConnection for its
  // Configuration and even then this is just used to get an HConnection out on
  // the other end). St.Ack 10/23/2011.
  //
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3446:
-

Status: Patch Available  (was: Open)

 ProcessServerShutdown fails if META moves, orphaning lots of regions
 

 Key: HBASE-3446
 URL: https://issues.apache.org/jira/browse/HBASE-3446
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, 
 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, 
 3446v15.txt


 I ran a rolling restart on a 5 node cluster with lots of regions, and 
 afterwards had LOTS of regions left orphaned. The issue appears to be that 
 ProcessServerShutdown failed because the server hosting META was restarted 
 around the same time as another server was being processed





[jira] [Created] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread Lars Hofhansl (Created) (JIRA)
HFile V2 does not honor setCacheBlocks when scanning.
-

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0


While testing the LRU cache during scanning I noticed quite a lot of churn in 
the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
found that HFile V2 always caches blocks in the LRU cache regardless of the 
cacheBlocks setting.

Here's a trace (from Eclipse) showing the problem:

HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
StoreFileScanner.reseek(KeyValue) line: 110
KeyValueHeap.reseek(KeyValue) line: 255
StoreScanner.reseek(KeyValue) line: 409
StoreScanner.next(List<KeyValue>, int) line: 304
KeyValueHeap.next(List<KeyValue>, int) line: 114
KeyValueHeap.next(List<KeyValue>) line: 143
HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
HRegion$RegionScannerImpl.nextInternal(int) line: 2722
HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
HRegionServer.next(long, int) line: 2092

Every scanner.next causes a reseek, which eventually causes a call to 
HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.

The fix is not immediately clear, unless we want to pass cacheBlocks to 
HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly as 
readBlockData should not care about caching.

Avoiding caching during scans is somewhat important for us.
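
To make the shape of the problem concrete, here is a minimal sketch of a call 
chain that drops the flag, next to one that carries it through. It is a toy 
model only: CacheBlocksSketch and its methods are hypothetical names, not the 
real HFileReaderV2 or HFileBlockIndex code, and the "cache" is just a list of 
block offsets.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the call chain; none of these are the real HBase classes. */
final class CacheBlocksSketch {
  /** Records which block offsets ended up in the pretend LRU cache. */
  static final List<Long> cached = new ArrayList<>();

  /** Stand-in for a block read that caches only when asked to. */
  static byte[] readBlock(long offset, boolean cacheBlock) {
    if (cacheBlock) {
      cached.add(offset);
    }
    return new byte[0];
  }

  /** Behaviour reported in this issue: the flag is gone, so true is hard-coded. */
  static byte[] readBlockDataLossy(long offset) {
    return readBlock(offset, true);
  }

  /** One possible shape of a fix: carry the scan's flag down the chain. */
  static byte[] seekToDataBlock(long offset, boolean cacheBlocks) {
    return readBlock(offset, cacheBlocks);
  }
}
```

With the lossy path every block read lands in the cache; with the flag passed 
through, a scan that asked for cacheBlocks=false leaves the cache alone. 
Whether threading the flag through is the right fix is exactly the open 
question above.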






[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115353#comment-13115353
 ] 

Hudson commented on HBASE-4433:
---

Integrated in HBase-TRUNK #2261 (See 
[https://builds.apache.org/job/HBase-TRUNK/2261/])
HBASE-4433  avoid extra next (potentially a seek) if done with column/row 
(kannan via jgray)

jgray : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java


 avoid extra next (potentially a seek) if done with column/row
 -

 Key: HBASE-4433
 URL: https://issues.apache.org/jira/browse/HBASE-4433
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
 Fix For: 0.94.0


 [Noticed this in 89, but quite likely true of trunk as well.]
 When we are done with the requested column(s) the code still does an extra 
 next() call before it realizes that it is actually done. This extra next() 
 call could potentially result in an unnecessary extra block load. This is 
 likely to be especially bad for CFs where the KVs are large blobs where each 
 KV may be occupying a block of its own. So the next() can often load a new 
 unrelated block unnecessarily.
 --
 For the simple case of reading say the top-most column in a row in a single 
 file, where each column (KV) was say a block of its own-- it seems that we 
 are reading 3 blocks, instead of 1 block!
 I am working on a simple patch and with that the number of seeks is down to 
 2. 
 [There is still an extra seek left.  I think there were two levels of 
 extra/unnecessary next() we were doing without actually confirming that the 
 next was needed. One at the StoreScanner/ScanQueryMatcher level which this 
 diff avoids. I think the other is at hfs.next() (at the storefile scanner 
 level) that's happening whenever an HFile scanner serves out data-- and 
 perhaps that's the additional seek that we need to avoid. But I want to 
 tackle this optimization first as the two issues seem unrelated.]
 -- 
 The basic idea of the patch I am working on/testing is as follows. The 
 ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if 
 the KV needs to be included and then, if done, only in the next call it 
 returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases 
 when ExplicitColumnTracker knows it is done with a particular column/row, the 
 patch attempts to combine the INCLUDE code and done hint into a single match 
 code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
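
 A minimal sketch of the combined-code idea follows. It is an illustration 
 under stated assumptions, not the patch: ColumnTrackerSketch, its 
 String-based qualifiers and its checkColumn method are hypothetical 
 simplifications, and only the two INCLUDE_AND_SEEK_* names come from the 
 description above.

```java
import java.util.List;

/** Hypothetical, much-simplified stand-in for an explicit column tracker. */
final class ColumnTrackerSketch {
  enum MatchCode {
    INCLUDE, SEEK_NEXT_COL, SEEK_NEXT_ROW,
    INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
  }

  private final List<String> wanted;  // requested qualifiers, sorted ascending
  private final int maxVersions;      // versions to return per column
  private int index = 0;              // position of the next wanted qualifier
  private int versionsSeen = 0;       // versions returned for wanted.get(index)

  ColumnTrackerSketch(List<String> wantedSorted, int maxVersions) {
    this.wanted = wantedSorted;
    this.maxVersions = maxVersions;
  }

  /** Classifies one cell of the current row, given its qualifier. */
  MatchCode checkColumn(String qualifier) {
    while (index < wanted.size()) {
      int cmp = qualifier.compareTo(wanted.get(index));
      if (cmp < 0) {
        return MatchCode.SEEK_NEXT_COL;   // cell sorts before the next wanted column
      }
      if (cmp > 0) {
        index++;                          // wanted column is absent from this row
        versionsSeen = 0;
        continue;
      }
      versionsSeen++;
      if (versionsSeen < maxVersions) {
        return MatchCode.INCLUDE;         // more versions of this column may follow
      }
      index++;                            // done with this column: say so right away
      versionsSeen = 0;
      return index == wanted.size()
          ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
          : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
    }
    return MatchCode.SEEK_NEXT_ROW;       // nothing left to match in this row
  }
}
```

 The point of the sketch is the last return: the tracker already knows it is 
 done when it includes the final version, so the caller gets the seek hint 
 together with the include instead of having to call next() to find out.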





[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115391#comment-13115391
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2083
---


Thanks for the cleanup.


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4725

Back to future :-)



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4727

Nice.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4726

Extra so.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4730

Do we need to update now and pass to the helper methods ?
The helper methods can easily figure out what now should be.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4729

I think  would be enough.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4731

I am confused by the condition here.


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic but couldn't help it since was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it in all inside in MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shutdown access on some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and change it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Change from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.    class since these classes are for a one-time migration only.
bq.  
bq.  A 

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115394#comment-13115394
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2084
---



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4732

Log should be changed accordingly.


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic but couldn't help it since was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it in all inside in MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.    Clean up.  Shutdown access on some of these unused methods.  Don't
bq.    let out HRegionInterface instances in particular since we are going
bq.    away from raw HRI use to instead use a connection with retries:
bq.    i.e. HTable.
bq.  
bq.    Comments on state of this class. Javadoc edits.
bq.    getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.    in constructor.  Override MetaNodeTracker and on node delete
bq.    reset meta location (We used to do this over in MetaNodeTracker
bq.    but to do that we had to have a CatalogTracker over in zk package
bq.    which is silly -- bad package encapsulation).
bq.  
bq.    (waitForRootServer) Renamed getRootServerConnection and change it
bq.    from public to package private.
bq.    (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.    (getMetaServerConnection) Change from public to package private.
bq.    Use MetaReader to read the meta location in root rather than a
bq.    raw HRegionInterface so we get retrying.
bq.    (remaining, timedout) Added utility methods.
bq.    (waitForMetaServer) Changed from public to private.
bq.    (resetMetaLocation) Made it synchronized on metaAvailable.
bq.    Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.    Refactor to use HTable instead of raw HRegionInterface so we get
bq.    retrying.  For each operation we get an HTable, use it, then close it.
bq.    (putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.    (updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.    class since these classes are for a one-time migration only.
bq.  
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.    New class that holds all Meta* methods updating meta table used
bq.    doing the one-time migration done to meta on startup.  This class
bq.    is marked deprecated because its going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.    Retrofit methods in here to use fullScan methods with Visitor.
bq.    (getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.    getCatalogRegionNameForRegion) Removed.
bq.    (fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.    (fullScanOfResults) Renamed as fullScan override.
bq.    (fullScanOfRoot) Added as deprecated. We should be doing
bq.    this against zk.
bq.    (metaRowToRegionPair, getServerNameFromResult) 

[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread Doug Meil (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115435#comment-13115435
 ] 

Doug Meil commented on HBASE-4448:
--

More than one cluster in the same test?  That would have to be a replication 
test.  I'm not sure that's going to work; I think the best case is to reuse 
one cluster and then fire up another cluster from scratch.

 HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
 instances across unit tests
 -

 Key: HBASE-4448
 URL: https://issues.apache.org/jira/browse/HBASE-4448
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: HBaseTestingUtilityFactory.java, 
 hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch


 Setting up and tearing down HBaseTestingUtility instances in unit tests is 
 very expensive.  On my MacBook it takes about 10 seconds to set up a 
 MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
 test classes that use this facility, that's a lot of time in the build.
 This factory assumes that the JVM is being re-used across test classes in the 
 build, otherwise this pattern won't work. 
 I don't think this is appropriate for every use, but I think it can be 
 applicable in a great many cases - especially where developers just want a 
 simple MiniCluster with 1 slave.
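
 As a rough illustration of the reuse pattern (not the attached 
 HBaseTestingUtilityFactory.java), here is a minimal pool sketch. 
 ReusableFixturePool and its acquire/release/createdCount methods are 
 hypothetical names, and the fixture type is generic so the example runs 
 without a cluster; a real version would hand out started testing utilities.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical pool that lends out an expensive test fixture and takes it back. */
final class ReusableFixturePool<T> {
  private final Deque<T> idle = new ArrayDeque<>();  // fixtures waiting for reuse
  private final Supplier<T> factory;                 // builds a fixture when none is idle
  private int created = 0;                           // how many fixtures were ever built

  ReusableFixturePool(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Returns an idle fixture if one exists, otherwise builds a new one. */
  synchronized T acquire() {
    T fixture = idle.poll();
    if (fixture == null) {
      fixture = factory.get();
      created++;
    }
    return fixture;
  }

  /** Hands a fixture back so a later test class in the same JVM can reuse it. */
  synchronized void release(T fixture) {
    idle.push(fixture);
  }

  synchronized int createdCount() {
    return created;
  }
}
```

 A test class would call acquire() in its setup and release() in its 
 teardown instead of starting and stopping a cluster each time. Because the 
 pool lives in a static or otherwise JVM-wide holder, the saving only shows 
 up when the build reuses the JVM across test classes, as noted above, and 
 any state a test leaves behind has to be cleaned up before release.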





[jira] [Updated] (HBASE-4485) Eliminate window of missing Data

2011-09-27 Thread Amitanand Aiyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4485:
---

Attachment: repro_bug-4485.diff

Here is a way to repro the bug.

It uses an unnecessary sleep to get things to go bad and is not intended to be 
included in the final diff/submission.

 Eliminate window of missing Data
 

 Key: HBASE-4485
 URL: https://issues.apache.org/jira/browse/HBASE-4485
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4485-v1.diff, repro_bug-4485.diff


 After incorporating v11 of the 2856 fix, we discovered that we are still 
 having some ACID violations.
 This time, however, the problem is not about including newer updates, but 
 about missing older updates that should be included.
 Here is what seems to be happening.
 There is a race condition in StoreScanner.getScanners():
   private List<KeyValueScanner> getScanners(Scan scan,
       final NavigableSet<byte[]> columns) throws IOException {
     // First the store file scanners
     List<StoreFileScanner> sfScanners = StoreFileScanner
       .getScannersForStoreFiles(store.getStorefiles(), cacheBlocks,
                                 isGet, false);
     List<KeyValueScanner> scanners =
       new ArrayList<KeyValueScanner>(sfScanners.size()+1);
     // include only those scan files which pass all filters
     for (StoreFileScanner sfs : sfScanners) {
       if (sfs.shouldSeek(scan, columns)) {
         scanners.add(sfs);
       }
     }
     // Then the memstore scanners
     if (this.store.memstore.shouldSeek(scan)) {
       scanners.addAll(this.store.memstore.getScanners());
     }
     return scanners;
   }
 If, for example, a call to Store.updateStorefiles() happens between
 store.getStorefiles() and this.store.memstore.getScanners(), then
 it is possible that a new HFile was created that is not seen by the
 StoreScanner, and the data is not present in the Memstore.snapshot either.





[jira] [Updated] (HBASE-4485) Eliminate window of missing Data

2011-09-27 Thread Amitanand Aiyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4485:
---

Attachment: 4485-v2.diff

Fix the issue using locks to ensure that Store.updateStorefiles() is not 
called between the StoreScanner getting the list of store files and it getting 
the MemStoreScanner.
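
For readers following along, here is a minimal sketch of the locking idea, 
under loud assumptions: StoreSketch is a toy with String "key values", its 
method names are hypothetical, and the choice of a ReentrantReadWriteLock is 
an illustration only; it is not taken from 4485-v2.diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy store: 'files' stands in for store files, 'memstore' for in-memory edits. */
final class StoreSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> files = new ArrayList<>();
  private final List<String> memstore = new ArrayList<>();

  /** Adds an edit to the memstore (write lock keeps this toy simple). */
  void put(String kv) {
    lock.writeLock().lock();
    try {
      memstore.add(kv);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Flush: memstore contents move to the file list as one step under the write lock. */
  void flush() {
    lock.writeLock().lock();
    try {
      files.addAll(memstore);
      memstore.clear();
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Scanner setup: reads both sources under one read lock, so no flush lands in between. */
  List<String> snapshotForScan() {
    lock.readLock().lock();
    try {
      List<String> view = new ArrayList<>(files);
      view.addAll(memstore);
      return view;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Without the lock, a flush between the two reads in snapshotForScan() could 
move data out of the memstore after the file list had already been copied, 
which is the window of missing data this issue describes.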

 Eliminate window of missing Data
 

 Key: HBASE-4485
 URL: https://issues.apache.org/jira/browse/HBASE-4485
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, 
 repro_bug-4485.diff


 After incorporating v11 of the 2856 fix, we discovered that we are still 
 having some ACID violations.
 This time, however, the problem is not about including newer updates, but 
 about missing older updates that should be included.
 Here is what seems to be happening.
 There is a race condition in StoreScanner.getScanners():
   private List<KeyValueScanner> getScanners(Scan scan,
       final NavigableSet<byte[]> columns) throws IOException {
     // First the store file scanners
     List<StoreFileScanner> sfScanners = StoreFileScanner
       .getScannersForStoreFiles(store.getStorefiles(), cacheBlocks,
                                 isGet, false);
     List<KeyValueScanner> scanners =
       new ArrayList<KeyValueScanner>(sfScanners.size()+1);
     // include only those scan files which pass all filters
     for (StoreFileScanner sfs : sfScanners) {
       if (sfs.shouldSeek(scan, columns)) {
         scanners.add(sfs);
       }
     }
     // Then the memstore scanners
     if (this.store.memstore.shouldSeek(scan)) {
       scanners.addAll(this.store.memstore.getScanners());
     }
     return scanners;
   }
 If, for example, a call to Store.updateStorefiles() happens between
 store.getStorefiles() and this.store.memstore.getScanners(), then
 it is possible that a new HFile was created that is not seen by the
 StoreScanner, and the data is not present in the Memstore.snapshot either.





[jira] [Updated] (HBASE-4485) Eliminate window of missing Data

2011-09-27 Thread Amitanand Aiyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4485:
---

Attachment: 4485-v3.diff

Move some functions to Store.java so that we do not have to access store.lock 
from StoreScanner.java

Passes the TestAcidGuarantees on the internal branch (0.89). 

Running the test suite on the open source trunk (in progress)

 Eliminate window of missing Data
 

 Key: HBASE-4485
 URL: https://issues.apache.org/jira/browse/HBASE-4485
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, 
 repro_bug-4485.diff







[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-27 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115543#comment-13115543
 ] 

ramkrishna.s.vasudevan commented on HBASE-4492:
---

Hi Ted
I found the problem but I don't have the patch with me now.
Please correct me if I am wrong.
As per my analysis, the test case always passes in one scenario: when ROOT 
and META are on the same RS.
Consider the test code:
{code}
// Bring the RS hosting ROOT down and the RS hosting META down at once
RegionServerThread rootServer = getServerHostingRoot(cluster);
RegionServerThread metaServer = getServerHostingMeta(cluster);
if (rootServer == metaServer) {
  log("ROOT and META on the same server so killing another random server");
  int i = 0;
  while (rootServer == metaServer) {
    metaServer = cluster.getRegionServerThreads().get(i);
    i++;
  }
}
log("Stopping server hosting ROOT");
rootServer.getRegionServer().stop("Stopping ROOT server");
log("Stopping server hosting META #1");
metaServer.getRegionServer().stop("Stopping META server");
{code}
We try to be cautious if ROOT and META are on the same RS.  If we find ROOT 
and META on the same RS, we just assign metaServer to another RS (we don't 
check whether it is the one hosting META) but still call rootServer.stop().
This stops the rootServer, internally closing both the ROOT and META regions 
on it.
Hence the line
{code}
cluster.hbaseCluster.waitOnRegionServer(metaServer);
{code}
also returns.
Then, when the ServerShutdownHandler processes this, it can cleanly open a new 
ROOT and META.

In the failure cases, ROOT and META are assigned to different servers.
There is a time gap between the ROOT RS going down and the META RS going 
down. This is where something goes wrong: the ServerShutdownHandler invoked 
for the ROOT RS tries to assign ROOT while, at the same time, someone else 
tries to assign the ROOT node (who this someone is, I am not clear :)).
Please do correct me if I am wrong.  I will try to provide a patch so that this 
issue doesn't occur.
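
As an illustration only, the "pick another server" step in the test can be sketched as a bounded search. PickSecondServerSketch and pickOtherThan are hypothetical names, not part of TestRollingRestart, and strings stand in for RegionServerThread instances.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of the HBase test code. */
class PickSecondServerSketch {
  /** Returns the first server that differs from rootServer, or null if none exists. */
  static <T> T pickOtherThan(List<T> servers, T rootServer) {
    for (T candidate : servers) {
      if (!candidate.equals(rootServer)) {
        return candidate;
      }
    }
    return null; // only one server in the cluster
  }

  public static void main(String[] args) {
    List<String> servers = Arrays.asList("rs1", "rs2", "rs3");
    System.out.println(pickOtherThan(servers, "rs1")); // prints rs2
  }
}
```

Note that in this co-located case the second server stopped hosts neither ROOT nor META, which matches the observation that the test passes there.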




 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
 Attachments: 4492.txt


 I got the following when running test suite on TRUNK:
 {code}
 testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
 Time elapsed: 300.28 sec   ERROR!
 java.lang.Exception: test timed out after 30 milliseconds
 at java.lang.Thread.sleep(Native Method)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
 {code}
 I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
 wiped out test output file for the failed test.
 Similar failure can be found on Jenkins:
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Updated] (HBASE-4485) Eliminate window of missing Data

2011-09-27 Thread Amitanand Aiyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4485:
---

Attachment: 4485-v4.diff

Move notifyChangeReaderObservers() outside the lock in all places.

Testing in progress.

 Eliminate window of missing Data
 

 Key: HBASE-4485
 URL: https://issues.apache.org/jira/browse/HBASE-4485
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, 4485-v4.diff, 
 repro_bug-4485.diff







[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-27 Thread Amitanand Aiyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4344:
---

Attachment: 4344-v12.txt

combine 4344-v11.txt with 4485-v4.txt

Running the testsuite (in progress)

 Persist memstoreTS to disk
 --

 Key: HBASE-4344
 URL: https://issues.apache.org/jira/browse/HBASE-4344
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.89.20100924

 Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v12.txt, 4344-v2.txt, 
 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, 
 patch-2


 Atomicity can be achieved in two ways -- (i) by using  a multiversion 
 concurrency system (MVCC), or (ii) by ensuring that new writes do not 
 complete, until the old reads complete.
 Currently, Memstore uses something along the lines of MVCC (called RWCC for 
 read-write-consistency-control). But, this mechanism is not incorporated for 
 the key-values written to the disk, as they do not include the memstore TS.
 Let us make the two approaches be similar, by persisting the memstoreTS along 
 with the key-value when it is written to the disk.
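
A rough sketch of why the memstoreTS has to travel with the key-value: under an MVCC-style scheme a reader only accepts cells whose write number is at or below its read point, and that check is impossible once the number is dropped at flush time. ReadPointFilterSketch, Cell and visible are hypothetical names for illustration, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of read-point filtering; not HBase code. */
class ReadPointFilterSketch {
  /** A value tagged with the write number (memstoreTS) that produced it. */
  static final class Cell {
    final String value;
    final long memstoreTS;

    Cell(String value, long memstoreTS) {
      this.value = value;
      this.memstoreTS = memstoreTS;
    }
  }

  /** A reader sees only cells whose write number is at or below its read point. */
  static List<String> visible(List<Cell> cells, long readPoint) {
    List<String> out = new ArrayList<String>();
    for (Cell c : cells) {
      if (c.memstoreTS <= readPoint) {
        out.add(c.value);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Cell> cells = Arrays.asList(
        new Cell("a", 1L), new Cell("b", 2L), new Cell("c", 3L));
    System.out.println(visible(cells, 2L)); // prints [a, b]
  }
}
```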





[jira] [Updated] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread Doug Meil (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4448:
-

Attachment: java_HBASE_4448_v2.patch

Fixed point #3.

 HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
 instances across unit tests
 -

 Key: HBASE-4448
 URL: https://issues.apache.org/jira/browse/HBASE-4448
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: HBaseTestingUtilityFactory.java, 
 hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, 
 java_HBASE_4448_v2.patch


 Setting up and tearing down HBaseTestingUtility instances in unit tests is 
 very expensive.  On my MacBook it takes about 10 seconds to set up a 
 MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
 test classes that use this facility, that's a lot of time in the build.
 This factory assumes that the JVM is being re-used across test classes in the 
 build, otherwise this pattern won't work. 
 I don't think this is appropriate for every use, but I think it can be 
 applicable in a great many cases - especially where developers just want a 
 simple MiniCluster with 1 slave.
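
The re-use pattern can be sketched roughly as follows. This is not the attached HBaseTestingUtilityFactory; SharedClusterFactorySketch and FakeCluster are hypothetical, and the sketch assumes, as the description does, that test classes run in the same JVM so that static state survives between them.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a shared-instance factory; not the attached class. */
class SharedClusterFactorySketch {
  /** Stand-in for an expensive resource such as a mini cluster. */
  static final class FakeCluster {
    static final AtomicInteger STARTS = new AtomicInteger();

    FakeCluster() {
      STARTS.incrementAndGet(); // counts how many times the expensive setup ran
    }
  }

  private static FakeCluster shared;
  private static int users;

  /** Returns the shared instance, creating it only on first use. */
  static synchronized FakeCluster acquire() {
    if (shared == null) {
      shared = new FakeCluster();
    }
    users++;
    return shared;
  }

  /** Marks one user as done; the instance is kept for the next test class. */
  static synchronized void release() {
    if (users > 0) {
      users--;
    }
    // A real factory would have to decide when to tear down, for example
    // in a JVM shutdown hook; this sketch simply keeps the instance alive.
  }

  public static void main(String[] args) {
    FakeCluster first = acquire();
    release();
    FakeCluster second = acquire();
    release();
    System.out.println(first == second); // prints true
  }
}
```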





[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115592#comment-13115592
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2097
---



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4761

I was trying to minimize how many times we do System.currentTimeMillis



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4762

Agreed



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4764

Will fix w/ a comment.  If timeout  is 0, then we do not timeout.  I should 
call it out explicitly.
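
For illustration, the convention could be called out in a helper along these lines. TimeoutSketch and timedOut are hypothetical names, not the CatalogTracker methods, and the sketch assumes that a timeout of exactly 0 means "wait indefinitely", as the comment reads here; negative values are not treated specially.

```java
/** Hypothetical timeout helper; not the CatalogTracker implementation. */
class TimeoutSketch {
  /**
   * Returns true once timeoutMillis has elapsed since startMillis.
   * A timeout of 0 means "never time out".
   */
  static boolean timedOut(long startMillis, long timeoutMillis, long nowMillis) {
    if (timeoutMillis == 0) {
      return false; // explicit: zero disables the timeout
    }
    return nowMillis - startMillis >= timeoutMillis;
  }

  public static void main(String[] args) {
    System.out.println(timedOut(1000L, 0L, 999999L)); // prints false
    System.out.println(timedOut(1000L, 500L, 1600L)); // prints true
  }
}
```

Passing the current time in as a parameter also keeps the number of System.currentTimeMillis calls under the caller's control.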


- Michael


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable 
instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan 
etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of 
HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup 
but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside 
too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic but couldn't help it since was in MetaReader and 
MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it in all inside in MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.Clean up.  Shutdown access on some of these unused methods.  Don't
bq.let out HRegionInterface instances in particular since we are going
bq.away from raw HRI use to instead use a connection with retries:
bq.i.e. HTable.
bq.  
bq.Comments on state of this class. Javadoc edits.
bq.getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.in constructor.  Override MetaNodeTracker and on node delete
bq.reset meta location (We used to do this over in MetaNodeTracker
bq.but to do that we had to have a CatalogTracker over in zk package
bq.which is silly -- bad package encapsulation).
bq.  
bq.(waitForRootServer) Renamed getRootServerConnection and change it
bq.from public to package private.
bq.(waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.(getMetaServerConnection) Change from public to package private.
bq.Use MetaReader to read the meta location in root rather than a
bq.raw HRegionInterface so we get retrying.
bq.(remaining, timedout) Added utility methods.
bq.(waitForMetaServer) Changed from public to private.
bq.(resetMetaLocation) Made it synchronized on metaAvailable.
bq.Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.Refactor to use HTable instead of raw HRegionInterface so we get
bq.retrying.  For each operation we get an HTable, use it, then close it.
bq.(putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.(updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.class since these classes are for a one-time migration only.
bq.  
bq.  A 
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.New class that holds all Meta* methods updating meta table used
bq.doing the one-time migration done to meta on startup.  This class
bq.is marked deprecated because its going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.Retrofit methods in here to use fullScan methods with 

[jira] [Assigned] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread Lars Hofhansl (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reassigned HBASE-4496:


Assignee: Lars Hofhansl

 HFile V2 does not honor setCacheBlocks when scanning.
 -

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0


 While testing the LRU cache during the scanning I noticed quite some churn in 
 the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
 found that HFile V2 always caches blocks in the LRU cache regardless of the 
 cacheBlocks setting.
 Here's a trace (from Eclipse) showing the problem:
 HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279   
 HFileReaderV2.readBlockData(long, long, int, boolean) line: 219   
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, 
 HFileBlock) line: 191  
 HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502   
 HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539  
 StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
 StoreFileScanner.reseek(KeyValue) line: 110   
 KeyValueHeap.reseek(KeyValue) line: 255   
 StoreScanner.reseek(KeyValue) line: 409   
 StoreScanner.next(List<KeyValue>, int) line: 304  
 KeyValueHeap.next(List<KeyValue>, int) line: 114  
 KeyValueHeap.next(List<KeyValue>) line: 143   
 HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774  
 HRegion$RegionScannerImpl.nextInternal(int) line: 2722
 HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
 HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699 
 HRegionServer.next(long, int) line: 2092  
 Every scanner.next causes a reseek, which eventually causes a call to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
 cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
 HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
 The fix is not immediately clear, unless we want to pass cacheBlocks to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
 HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly 
 as readBlockData should not care about caching.
 Avoiding caching during scans is somewhat important for us.
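
The option the description calls ugly, forwarding the flag through the index lookup, can be sketched like this. CacheFlagSketch is a hypothetical toy reader, not HFileReaderV2; it only shows the difference between forwarding the caller's cacheBlocks value and hard-coding true.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy reader; not the HFile V2 implementation. */
class CacheFlagSketch {
  final Map<Long, String> cache = new HashMap<Long, String>();

  /** Reads a block, adding it to the cache only when cacheBlock is true. */
  String readBlock(long offset, boolean cacheBlock) {
    String cached = cache.get(offset);
    if (cached != null) {
      return cached;
    }
    String block = "block@" + offset; // stand-in for a disk read
    if (cacheBlock) {
      cache.put(offset, block);
    }
    return block;
  }

  /** Index lookup that forwards the caller's flag instead of hard-coding true. */
  String seekToDataBlock(long offset, boolean cacheBlocks) {
    return readBlock(offset, cacheBlocks);
  }

  public static void main(String[] args) {
    CacheFlagSketch reader = new CacheFlagSketch();
    reader.seekToDataBlock(10L, false);
    System.out.println(reader.cache.size()); // prints 0
    reader.seekToDataBlock(10L, true);
    System.out.println(reader.cache.size()); // prints 1
  }
}
```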





[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115708#comment-13115708
 ] 

jirapos...@reviews.apache.org commented on HBASE-3025:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2041/#review2077
---


Looks good. The majority of my comments have to do with inconsistent logging 
practice.


security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlFilter.java
https://reviews.apache.org/r/2041/#comment4718

Could be stated better.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java
https://reviews.apache.org/r/2041/#comment4719

No.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java
https://reviews.apache.org/r/2041/#comment4720

Comment needs updating.




security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4721

Can we make this 1?



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4775

Debug logging should go to LOG not AUDITLOG



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4782

Should be INFO or TRACE level? TRACE makes more sense to me.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4776

Debug logging should go to LOG not AUDITLOG



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4779

Should be INFO or TRACE level? TRACE makes more sense to me.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4795

Should something go to AUDITLOG here?



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4797

Should hasFamilyQualifierPermission log to AUDITLOG? It is used in places 
to make decisions -- an exception is thrown directly or not.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4783

Another one of these was sent to AUDITLOG above. Do the same here? Should 
be INFO or TRACE level? TRACE makes more sense to me.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4786

Ultimately users should be allowed to enable or disable their own tables, 
but only after such operations don't carry as much systemic risk as they do 
currently.

In that case, CREATE permission and an ownership check could follow the 
test for ADMIN permission.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4787

As above



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4791

Should be logged with ERROR?



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4799

Would it be clearer then to call permissionGranted() something like 
hasColumnsPermission() ? Just a random thought.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4803

Should this go to AUDITLOG? At INFO or TRACE level? My preference is TRACE.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4804

Should this go to AUDITLOG? At INFO or TRACE level? My preference is TRACE.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java
https://reviews.apache.org/r/2041/#comment4807

What if instead we check for version 0 and throw an 
IllegalArgumentException if so? Technically, it is an invalid request if it 
contains an unrecognizable action code. Skipping this check if version  0 
would be a way to handle new perms while not accepting incorrect input 
otherwise.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java
https://reviews.apache.org/r/2041/#comment4813

Maybe we can call this .auth.? We don't really have an RBAC 
implementation yet. Likewise for the package name for all of this stuff? Just a 
random thought.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java
https://reviews.apache.org/r/2041/#comment4815

Isn't this an error?


[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115728#comment-13115728
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2099
---



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4835

The condition, now  stopTime, is reversed for isTimedOut().


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable 
instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan 
etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of 
HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup 
but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside 
too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic but couldn't help it since was in MetaReader and 
MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it in all inside in MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.Clean up.  Shutdown access on some of these unused methods.  Don't
bq.let out HRegionInterface instances in particular since we are going
bq.away from raw HRI use to instead use a connection with retries:
bq.i.e. HTable.
bq.  
bq.Comments on state of this class. Javadoc edits.
bq.getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.in constructor.  Override MetaNodeTracker and on node delete
bq.reset meta location (We used to do this over in MetaNodeTracker
bq.but to do that we had to have a CatalogTracker over in zk package
bq.which is silly -- bad package encapsulation).
bq.  
bq.(waitForRootServer) Renamed getRootServerConnection and change it
bq.from public to package private.
bq.(waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.(getMetaServerConnection) Change from public to package private.
bq.Use MetaReader to read the meta location in root rather than a
bq.raw HRegionInterface so we get retrying.
bq.(remaining, timedout) Added utility methods.
bq.(waitForMetaServer) Changed from public to private.
bq.(resetMetaLocation) Made it synchronized on metaAvailable.
bq.Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.Refactor to use HTable instead of raw HRegionInterface so we get
bq.retrying.  For each operation we get an HTable, use it, then close it.
bq.(putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.(updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.class since these classes are for a one-time migration only.
bq.  
bq.  A 
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.New class that holds all Meta* methods updating meta table used
bq.doing the one-time migration done to meta on startup.  This class
bq.is marked deprecated because its going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.Retrofit methods in here to use fullScan methods with Visitor.
bq.(getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.  getCatalogRegionNameForRegion) Removed.
bq.(fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.(fullScanOfResults) Renamed as fullScan override.
bq.(fullScanOfRoot) Added as deprecated. We should be doing
bq.this against zk.
bq.(metaRowToRegionPair, 

[jira] [Issue Comment Edited] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-27 Thread Ted Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115249#comment-13115249
 ] 

Ted Yu edited comment on HBASE-4492 at 9/27/11 5:26 PM:


Found the following in output for the above timeout case:
{code}
2011-09-27 05:28:59,047 DEBUG 
[RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] 
zookeeper.ZooKeeperWatcher(233): regionserver:57539-0x132a95afa18000d Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, 
path=/hbase/root-region-server
2011-09-27 05:28:59,047 DEBUG 
[RegionServer:3;us.ciq.com,58748,131710132-EventThread] 
zookeeper.ZKUtil(226): regionserver:58748-0x132a95afa18000a 
/hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,047 DEBUG 
[Master:0;us.ciq.com,56327,1317101304726-EventThread] zookeeper.ZKUtil(226): 
hconnection-0x132a95afa180005 /hbase/root-region-server does not exist. Watcher 
is set.
2011-09-27 05:28:59,048 DEBUG [Thread-1-EventThread] 
zookeeper.ZooKeeperWatcher(233): master:51567-0x132a95afa180008 Received 
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, 
path=/hbase/unassigned
2011-09-27 05:28:59,048 DEBUG 
[RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] 
zookeeper.ZKUtil(226): regionserver:57539-0x132a95afa18000d 
/hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,049 INFO  
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.AssignmentManager(1485): No previous transition plan was found (or we 
are ignoring an existing plan) for -ROOT-,,0.70236052 so generated a random 
one; hri=-ROOT-,,0.70236052, src=, dest=us.ciq.com,57500,1317101330748; 3 
(online=3, exclude=null) available servers
2011-09-27 05:28:59,049 INFO  
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.AssignmentManager(1485): Assigning region -ROOT-,,0.70236052 to 
us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG 
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.ServerManager(448): New connection to us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG [Thread-1-EventThread] zookeeper.ZKUtil(224): 
master:51567-0x132a95afa180008 Set watcher on existing znode 
/hbase/unassigned/70236052
2011-09-27 05:28:59,049 FATAL 
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.HMaster(1181): Master server abort: loaded coprocessors are: []
2011-09-27 05:28:59,050 FATAL 
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.HMaster(1186): Unexpected state trying to OFFLINE; -ROOT-,,0.70236052 
state=PENDING_OPEN, ts=1317101339049, server=us.ciq.com,57500,1317101330748
java.lang.IllegalStateException
    at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1517)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1392)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1169)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1144)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1139)
    at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1816)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:105)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:123)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:186)
{code}


[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115743#comment-13115743
 ] 

Andrew Purtell commented on HBASE-4433:
---

According to my tests, this is safe to do on 0.92 and 0.90 branches as well. 
This change should be applied there.

 avoid extra next (potentially a seek) if done with column/row
 -

 Key: HBASE-4433
 URL: https://issues.apache.org/jira/browse/HBASE-4433
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
 Fix For: 0.94.0


 [Noticed this in 89, but quite likely true of trunk as well.]
 When we are done with the requested column(s) the code still does an extra 
 next() call before it realizes that it is actually done. This extra next() 
 call could potentially result in an unnecessary extra block load. This is 
 likely to be especially bad for CFs where the KVs are large blobs where each 
 KV may be occupying a block of its own. So the next() can often load a new 
 unrelated block unnecessarily.
 --
 For the simple case of reading say the top-most column in a row in a single 
 file, where each column (KV) was say a block of its own-- it seems that we 
 are reading 3 blocks, instead of 1 block!
 I am working on a simple patch and with that the number of seeks is down to 
 2. 
 [There is still an extra seek left.  I think there were two levels of 
 extra/unnecessary next() we were doing without actually confirming that the 
 next was needed. One at the StoreScanner/ScanQueryMatcher level which this 
 diff avoids. I think the other is at hfs.next() (at the storefile scanner
 level), which happens whenever an HFile scanner serves out data, and
 perhaps that's the additional seek that we need to avoid. But I want to
 tackle this optimization first as the two issues seem unrelated.]
 -- 
 The basic idea of the patch I am working on/testing is as follows. The 
 ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if
 the KV needs to be included and then, if done, returns the appropriate
 SEEK_NEXT_COL or SEEK_NEXT_ROW hint only on the next call. For the cases
 when ExplicitColumnTracker knows it is done with a particular column/row, the 
 patch attempts to combine the INCLUDE code and done hint into a single match 
 code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
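
The combined match codes can be illustrated with a minimal Java sketch. The
ColumnTracker class and its check method below are hypothetical stand-ins, not
the real ExplicitColumnTracker API; the sketch assumes one version per column
and that columns arrive in sorted order within a row.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the idea in this issue; not the real HBase classes.
class MatchCodeSketch {
    enum MatchCode {
        SEEK_NEXT_COL, SEEK_NEXT_ROW,
        INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
    }

    /** Tracks an explicit, sorted list of requested columns (one version each). */
    static final class ColumnTracker {
        private final List<String> wanted;
        private int index = 0;

        ColumnTracker(List<String> wantedSorted) {
            this.wanted = wantedSorted;
        }

        /** Decides what to do with the next column seen in the current row. */
        MatchCode check(String column) {
            while (index < wanted.size()) {
                int cmp = column.compareTo(wanted.get(index));
                if (cmp < 0) {
                    // Column was not requested: skip ahead to the next column.
                    return MatchCode.SEEK_NEXT_COL;
                }
                index++; // the wanted column is either matched or absent from this row
                if (cmp == 0) {
                    // Include the KV and return the "done" hint in the same call,
                    // so the caller never has to issue the extra next().
                    return index == wanted.size()
                            ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                            : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
                }
            }
            return MatchCode.SEEK_NEXT_ROW; // every requested column has been handled
        }
    }

    public static void main(String[] args) {
        ColumnTracker tracker = new ColumnTracker(Arrays.asList("a", "c"));
        for (String column : Arrays.asList("a", "b", "c")) {
            System.out.println(column + " -> " + tracker.check(column));
        }
    }
}
```

The point of the design is that the tracker already knows it is done at the
moment it includes the last wanted KV, so reporting both facts in one code
saves the caller a round trip through next().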

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread ramkrishna.s.vasudevan (Created) (JIRA)
If region opening fails after updating META HBCK reports it as inconsistent and 
scanning the region throws NSRE
---

 Key: HBASE-4497
 URL: https://issues.apache.org/jira/browse/HBASE-4497
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical


This JIRA was created following the discussion in the mail thread "HBCK
reporting of possible mismatch in RS assignment".
Consider two region servers, RS1 and RS2.
A region starts opening on RS1, but the open takes a while: RS1 has not yet
updated META or transitioned the znode from OPENING to OPENED.
So the assignment timeout fires and the region is assigned to RS2.  RS2
successfully updates META and opens the region.
Now RS1 tries to finish its own open by first updating META and then
transitioning the znode from OPENING to OPENED.

The znode transition from OPENING to OPENED by RS1 will fail, but the META
entry will have RS1 as the latest location.
HBCK then reports this as an inconsistency, and scanning the region throws
NotServingRegionException.
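
The interleaving above can be replayed with a small Java model. Everything in
it is a hypothetical stand-in (a string for the unassigned znode, a field for
the META location); it only assumes, as the description states, that a region
server updates META first and then attempts the OPENING to OPENED transition.

```java
// Hypothetical model of the race; not HBase code.
class OpenRaceSketch {
    String znode;        // "<server>:<state>", stand-in for the unassigned znode
    String metaLocation; // stand-in for the region's location recorded in META
    String servingOn;    // server that actually ends up hosting the region

    /** Conditional znode update: succeeds only if the znode still holds 'expected'. */
    synchronized boolean transition(String expected, String next) {
        if (!expected.equals(znode)) {
            return false;
        }
        znode = next;
        return true;
    }

    /** A region server finishing its open: META update first, znode transition second. */
    boolean finishOpen(String server) {
        metaLocation = server; // step 1: not undone if step 2 fails
        boolean ok = transition(server + ":OPENING", server + ":OPENED"); // step 2
        if (ok) {
            servingOn = server;
        }
        return ok;
    }

    /** Replays the sequence from the description. */
    static OpenRaceSketch replay() {
        OpenRaceSketch s = new OpenRaceSketch();
        s.znode = "RS1:OPENING"; // RS1 starts opening and stalls
        s.znode = "RS2:OPENING"; // timeout: region reassigned to RS2
        s.finishOpen("RS2");     // RS2 updates META and opens the region
        s.finishOpen("RS1");     // RS1 wakes up: META overwritten, transition fails
        return s;
    }

    public static void main(String[] args) {
        OpenRaceSketch s = replay();
        System.out.println("META says " + s.metaLocation + ", region is on " + s.servingOn);
    }
}
```

In this model META ends up naming RS1 while RS2 serves the region, which is
the mismatch HBCK reports. Because the two steps are not atomic, reordering
them narrows the window but does not obviously close it.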






[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115775#comment-13115775
 ] 

Lars Hofhansl commented on HBASE-4496:
--

In a simple test this shaves about 20% off scanning time (with caching switched
off), in addition to avoiding LRU and GC churn.

 HFile V2 does not honor setCacheBlocks when scanning.
 -

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0


 While testing the LRU cache during the scanning I noticed quite some churn in 
 the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
 found that HFile V2 always caches blocks in the LRU cache regardless of the 
 cacheBlocks setting.
 Here's a trace (from Eclipse) showing the problem:
 HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
 HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
 HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
 HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
 StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
 StoreFileScanner.reseek(KeyValue) line: 110
 KeyValueHeap.reseek(KeyValue) line: 255
 StoreScanner.reseek(KeyValue) line: 409
 StoreScanner.next(List<KeyValue>, int) line: 304
 KeyValueHeap.next(List<KeyValue>, int) line: 114
 KeyValueHeap.next(List<KeyValue>) line: 143
 HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
 HRegion$RegionScannerImpl.nextInternal(int) line: 2722
 HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
 HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
 HRegionServer.next(long, int) line: 2092
 Every scanner.next causes a reseek, which eventually causes a call to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
 cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
 HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
 The fix is not immediately clear, unless we want to pass cacheBlocks to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
 HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly 
 as readBlockData should not care about caching.
 Avoiding caching during scans is somewhat important for us.
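
The lost flag can be shown with a deliberately tiny Java model. The class and
method names below are hypothetical and only mirror the shape of the call
chain described above; the real HFileReaderV2 signatures differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a block reader with an LRU-like cache; not HBase code.
class CacheFlagSketch {
    final Map<Long, byte[]> blockCache = new HashMap<>();

    /** Reads a block, caching it only when the caller asks for that. */
    byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] cached = blockCache.get(offset);
        if (cached != null) {
            return cached;
        }
        byte[] block = new byte[] { (byte) offset }; // pretend this came from disk
        if (cacheBlock) {
            blockCache.put(offset, block);
        }
        return block;
    }

    /** Behaviour described in the report: the intermediate layer drops the flag. */
    byte[] readBlockDataIgnoringFlag(long offset) {
        return readBlock(offset, true);
    }

    /** Direction of the fix: thread the caller's choice through the intermediate layer. */
    byte[] readBlockData(long offset, boolean cacheBlock) {
        return readBlock(offset, cacheBlock);
    }

    /** Scans five blocks with caching switched off and reports how many got cached. */
    static int cachedAfterScan(boolean flagPassedThrough) {
        CacheFlagSketch reader = new CacheFlagSketch();
        for (long offset = 0; offset < 5; offset++) {
            if (flagPassedThrough) {
                reader.readBlockData(offset, false);
            } else {
                reader.readBlockDataIgnoringFlag(offset);
            }
        }
        return reader.blockCache.size();
    }

    public static void main(String[] args) {
        System.out.println("flag dropped: " + cachedAfterScan(false) + " blocks cached");
        System.out.println("flag passed:  " + cachedAfterScan(true) + " blocks cached");
    }
}
```

With the flag dropped, every block touched by the scan lands in the cache even
though the scan asked for no caching, which matches the churn described in the
report.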





[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115787#comment-13115787
 ] 

stack commented on HBASE-4496:
--

Good find Lars.






[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115788#comment-13115788
 ] 

stack commented on HBASE-4497:
--

Do we need to add an extra tickle of OPENING znode after open of region and 
before we go to do meta update?






[jira] [Reopened] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread stack (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-4433:
--


Reopening until we apply to 0.92 and 0.90 too (as per Andrew).   I'll do it 
(stop me Jon if you are already at this).






[jira] [Resolved] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4433.
--

   Resolution: Fixed
Fix Version/s: (was: 0.94.0)
   0.92.0

Applied to 0.92 branch too.  Didn't add to 0.90 because it doesn't apply cleanly
(there are test files missing).

 avoid extra next (potentially a seek) if done with column/row
 -

 Key: HBASE-4433
 URL: https://issues.apache.org/jira/browse/HBASE-4433
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
 Fix For: 0.92.0







[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115795#comment-13115795
 ] 

Jean-Daniel Cryans commented on HBASE-4497:
---

Stack, it seems that it's already the case:

{code}
if (tickleOpening(post_region_open)) {
  if (updateMeta(region)) failed = false;
}
{code}

In any case, there's still a hole as those two operations aren't done in an 
atomic fashion.






[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-27 Thread Ming Ma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115798#comment-13115798
 ] 

Ming Ma commented on HBASE-4492:


Here is a sequence of events showing how this could happen. It applies to any
region.

T1. After AM sends the openRegion RPC to RS1, RS1 sets the ZK state to OPENING.
T2. RS1 is stopped.
T3. The ZK callback carrying the OPENING state is delayed. AM skips processing
it because RS1 is offline, so the region's state in AM is still PENDING_OPEN.
T4. RS1's ServerShutdownHandler tries to assign the region by setting it
offline first. This isn't allowed because the state isn't CLOSING or CLOSED.
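
Step T4 can be sketched as a guard on a region state machine. The enum and
the setOffline method below are hypothetical; the allowed source states are
taken from the statement above rather than from the AssignmentManager source.

```java
// Hypothetical model of the state check behind "Unexpected state trying to OFFLINE".
class OfflineGuardSketch {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, PENDING_CLOSE, CLOSING, CLOSED }

    /** Moves a region to OFFLINE, but only from the states assumed to be legal. */
    static State setOffline(State current) {
        if (current != State.CLOSING && current != State.CLOSED) {
            // Mirrors the abort seen in the log: the region is still PENDING_OPEN
            // because the delayed OPENING callback was skipped.
            throw new IllegalStateException("Unexpected state trying to OFFLINE; state=" + current);
        }
        return State.OFFLINE;
    }

    public static void main(String[] args) {
        System.out.println(setOffline(State.CLOSED));
        try {
            setOffline(State.PENDING_OPEN);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```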


 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
 Attachments: 4492.txt


 I got the following when running test suite on TRUNK:
 {code}
 testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  Time elapsed: 300.28 sec   ERROR!
 java.lang.Exception: test timed out after 30 milliseconds
     at java.lang.Thread.sleep(Native Method)
     at org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
     at org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
 {code}
 I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
 wiped out test output file for the failed test.
 Similar failure can be found on Jenkins:
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/





[jira] [Updated] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4496:
-

Attachment: 4496.txt

This fixes the problem for me.

TestHFileBlock, TestCacheOnWrite, TestHFileBlockIndex, and TestHFileWriterV2 
still pass.

It's not pretty, though. There are lots of places where cacheBlock now needs to
be passed around.
I erred on the side of caution: where it wasn't clear whether the block should
be cached or not, I kept it being cached (as before).

Somebody with more knowledge about HFileReaderV2 should have a look. If there's
something nicer to do, please let me know.
It is not clear to me how to write a test for this.

 HFile V2 does not honor setCacheBlocks when scanning.
 -

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4496.txt







[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115804#comment-13115804
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2106
---


Only part way done, will finish in the afternoon.  I like the idea though, good 
stuff stack.


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4855

Supposed to read "When meta is moved to zk"?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4857

this comment talks a lot about what is wrong but it's not clear to me what 
changes are actually made right now.  i see you say server-side only, but what 
do you propose instead?  (i imagine i will find out reading the rest of the 
diff)



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4858

update this javadoc a bit... it's missing conf and you might also add 
additional context to stuff like abortable (which now appears optional and 
falls back to the connection itself?)



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4859

when would one override this?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4860

is the behavior of this method unchanged?  i guess now it returns before 
it's verified?  any specific reason for the name change?  (its behavior is 
definitely different from the old getRootServerConnection()).  is it to match 
the Meta method?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4861

i thought no verification in CT?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4862

this is public but one with specified timeout is private now?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4863

eeek, good catch


- Jonathan


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic but couldn't help it: since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.Clean up.  Shutdown access on some of these unused methods.  Don't
bq.let out HRegionInterface instances in particular since we are going
bq.away from raw HRI use to instead use a connection with retries:
bq.i.e. HTable.
bq.  
bq.Comments on state of this class. Javadoc edits.
bq.getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.in constructor.  Override MetaNodeTracker and on node delete
bq.reset meta location (We used to do this over in MetaNodeTracker
bq.but to do that we had to have a CatalogTracker over in zk package
bq.which is silly -- bad package encapsulation).
bq.  
bq.(waitForRootServer) Renamed getRootServerConnection and change it
bq.from public 

[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-27 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115805#comment-13115805
 ] 

Lars Hofhansl commented on HBASE-4488:
--

@Jon, so you gonna commit this? :)


 Store could miss rows during flush
 --

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4488.txt


 While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
 critical mistake:
 The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
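
 A minimal sketch of the kind of loop bug described above. It assumes a 
 scanner contract where next(kvs) fills kvs and returns false when nothing 
 follows the batch it just delivered; FakeScanner, drainBuggy and drainFixed 
 are hypothetical names for illustration, not code from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class FlushLoopSketch {
  /** Stand-in scanner: next() fills 'out' and returns true only if more rows follow. */
  static final class FakeScanner {
    private final Iterator<String> rows;

    FakeScanner(List<String> rows) {
      this.rows = rows.iterator();
    }

    boolean next(List<String> out) {
      if (rows.hasNext()) {
        out.add(rows.next());
      }
      return rows.hasNext();
    }
  }

  /** Buggy shape: the batch delivered together with 'false' is never processed. */
  static List<String> drainBuggy(FakeScanner scanner) {
    List<String> flushed = new ArrayList<>();
    List<String> kvs = new ArrayList<>();
    while (scanner.next(kvs)) {
      flushed.addAll(kvs);
      kvs.clear();
    }
    return flushed;
  }

  /** Corrected shape: always process what next() filled in, then test the flag. */
  static List<String> drainFixed(FakeScanner scanner) {
    List<String> flushed = new ArrayList<>();
    List<String> kvs = new ArrayList<>();
    boolean hasMore;
    do {
      hasMore = scanner.next(kvs);
      flushed.addAll(kvs);
      kvs.clear();
    } while (hasMore);
    return flushed;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("row1", "row2", "row3");
    System.out.println(drainBuggy(new FakeScanner(rows)));
    System.out.println(drainFixed(new FakeScanner(rows)));
  }
}
```

 Under that assumed contract the buggy loop drops whatever arrived with the 
 final false, which matches the "might miss the last edits" symptom.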

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115815#comment-13115815
 ] 

Jesse Yates commented on HBASE-4448:


Comments on the latest patch:

In the usage, 
{code}
-  private final static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
+  private static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
{code}

Should probably not even assign anything, since it is done in setup.

Going back to Ted's comment,
{code}
 Integer i = mcUsageCount.get(tu);  
+if (i == null) {
+  i = ONE;
+} else {
+  int j = i.intValue() + 1;
+  i = new Integer(j);
+}
+mcUsageCount.put(tu, i);
+  }
{code}

This seems overly complex - just use autoboxing here. Maybe we should use 
specific names for what ONE and ZERO mean (e.g. ONE - USE_INCREMENT, ZERO - 
UNUSED).
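
Something like the following is what I have in mind for the autoboxing 
version. It is only a sketch: UsageCountSketch and acquire are made-up names, 
and USE_INCREMENT is the suggested replacement name for ONE, not something in 
the patch.

```java
import java.util.HashMap;
import java.util.Map;

final class UsageCountSketch {
  // Hypothetical constant name; stands in for the patch's ONE.
  private static final int USE_INCREMENT = 1;

  private final Map<Object, Integer> usageCount = new HashMap<>();

  /** Records one more user of 'key' and returns the updated count. */
  int acquire(Object key) {
    Integer current = usageCount.get(key);
    // Unboxing and boxing happen implicitly; no new Integer(...) needed.
    int updated = (current == null) ? USE_INCREMENT : current + USE_INCREMENT;
    usageCount.put(key, updated);
    return updated;
  }
}
```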

In the UtilityCleaner,
{code}
+ try {
+while (true) {
{code}

could lead to some issues. We just need to make sure that the thread is a 
daemon thread when we start.
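
A rough sketch of starting such a loop on a daemon thread, so it cannot keep 
the JVM alive after the tests finish. CleanerThreadSketch, startCleaner and 
the thread name are assumptions for illustration; the patch's UtilityCleaner 
may be structured differently.

```java
final class CleanerThreadSketch {
  /** Starts a background loop that runs 'sweepOnce' until the thread is interrupted. */
  static Thread startCleaner(Runnable sweepOnce, long intervalMillis) {
    Thread t = new Thread(() -> {
      try {
        while (true) {
          sweepOnce.run();
          Thread.sleep(intervalMillis);
        }
      } catch (InterruptedException e) {
        // Restore the flag and let the thread end.
        Thread.currentThread().interrupt();
      }
    }, "utility-cleaner");
    t.setDaemon(true); // must be called before start()
    t.start();
    return t;
  }
}
```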


 HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
 instances across unit tests
 -

 Key: HBASE-4448
 URL: https://issues.apache.org/jira/browse/HBASE-4448
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: HBaseTestingUtilityFactory.java, 
 hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, 
 java_HBASE_4448_v2.patch


 Setting up and tearing down HBaseTestingUtility instances in unit tests is 
 very expensive.  On my MacBook it takes about 10 seconds to set up a 
 MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
 test classes that use this facility, that's a lot of time in the build.
 This factory assumes that the JVM is being re-used across test classes in the 
 build, otherwise this pattern won't work. 
 I don't think this is appropriate for every use, but I think it can be 
 applicable in a great many cases - especially where developers just want a 
 simple MiniCluster with 1 slave.





[jira] [Commented] (HBASE-4352) Apply version of hbase-4015 to branch

2011-09-27 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115816#comment-13115816
 ] 

Todd Lipcon commented on HBASE-4352:


Can anyone volunteer to do some serious cluster testing with this patch? E.g., 
load up 1000 regions per server on 5+ nodes, and do rolling restarts or rolling 
crashes?

 Apply version of hbase-4015 to branch
 -

 Key: HBASE-4352
 URL: https://issues.apache.org/jira/browse/HBASE-4352
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.5

 Attachments: HBASE-4352_0.90.patch, HBASE-4352_0.90_1.patch


 Consider adding a version of hbase-4015 to 0.90.  It changes HRegionInterface, 
 so we would need to move the change to the end of the interface and then test 
 that it doesn't break rolling restart.





[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115818#comment-13115818
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2107
---


Thanks for reviews Ted and Jon.  Will put up new patch when you fellas finish...


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4864

@Jon True.  I opened another issue with suggested fix.  I should at least 
reference it in here.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4865

Nah.  That's a TODO.


- Michael


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic, but since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.Clean up.  Shutdown access on some of these unused methods.  Don't
bq.let out HRegionInterface instances in particular since we are going
bq.away from raw HRI use to instead use a connection with retries:
bq.i.e. HTable.
bq.  
bq.Comments on state of this class. Javadoc edits.
bq.getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.in constructor.  Override MetaNodeTracker and on node delete
bq.reset meta location (We used to do this over in MetaNodeTracker
bq.but to do that we had to have a CatalogTracker over in zk package
bq.which is silly -- bad package encapsulation).
bq.  
bq.(waitForRootServer) Renamed getRootServerConnection and change it
bq.from public to package private.
bq.(waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.(getMetaServerConnection) Change from public to package private.
bq.Use MetaReader to read the meta location in root rather than a
bq.raw HRegionInterface so we get retrying.
bq.(remaining, timedout) Added utility methods.
bq.(waitForMetaServer) Changed from public to private.
bq.(resetMetaLocation) Made it synchronized on metaAvailable.
bq.Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.Refactor to use HTable instead of raw HRegionInterface so we get
bq.retrying.  For each operation we get an HTable, use it, then close it.
bq.(putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.(updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.class since these classes are for a one-time migration only.
bq.  
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.New class that holds all Meta* methods updating meta table used
bq.doing the one-time migration done to meta on startup.  This class
bq.is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.Retrofit methods in here to use fullScan methods with Visitor.
bq.(getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.  
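
The retry loop that the summary above talks about (the one the old 
RetryableMetaOperation duplicated and that HTable now supplies) can be pictured 
roughly as below. This is a generic sketch, not HBase code: RetrySketch, 
callWithRetries and its parameters are hypothetical, and it retries on 
unchecked exceptions only to keep the example self-contained.

```java
import java.util.function.Supplier;

final class RetrySketch {
  /** Runs 'op' up to maxAttempts times, pausing between failed attempts. */
  static <T> T callWithRetries(Supplier<T> op, int maxAttempts, long pauseMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.get();
      } catch (RuntimeException e) {
        last = e;
        if (attempt < maxAttempts) {
          try {
            Thread.sleep(pauseMillis);
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while retrying", ie);
          }
        }
      }
    }
    // Every attempt failed; surface the most recent failure.
    throw last;
  }
}
```

Keeping this loop in one place is the point made in the summary: callers get 
retrying without each catalog helper re-implementing it.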

[jira] [Commented] (HBASE-4245) Cluster hangs if RS serving root fails during startup sequence

2011-09-27 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115819#comment-13115819
 ] 

Todd Lipcon commented on HBASE-4245:


Yep, I agree that Ming's patch for 4455 fixes this.

 Cluster hangs if RS serving root fails during startup sequence
 --

 Key: HBASE-4245
 URL: https://issues.apache.org/jira/browse/HBASE-4245
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: Todd Lipcon
Assignee: Ming Ma

 On a large-ish cluster, the following sequence of events was seen to happen:
 - master started, ROOT and META were both unassigned
 - ROOT is assigned to rs01
 - META is assigned to rs02
 - Upon open of META, it writes its location into ROOT on rs01
 - rs01 crashes while appending to its HLog due to some other bug
 - rs02 fails the region open sequence
 - master notices that rs01 has crashed, and enqueues a ServerShutdownHandler
 - ServerShutdownHandler blocks on CatalogTracker.waitForMeta() since ROOT and 
 META are not assigned yet
 - master times out assignment of META, but never succeeds because ROOT 
 location is still marked as rs01
 This causes the cluster to never start up.





[jira] [Commented] (HBASE-4192) Optimize HLog for Throughput Using Delayed RPCs

2011-09-27 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115824#comment-13115824
 ] 

Todd Lipcon commented on HBASE-4192:


That sounds easier to review, thanks for the suggestion, dhruba.

 Optimize HLog for Throughput Using Delayed RPCs
 ---

 Key: HBASE-4192
 URL: https://issues.apache.org/jira/browse/HBASE-4192
 Project: HBase
  Issue Type: New Feature
  Components: wal
Affects Versions: 0.92.0
Reporter: Vlad Dogaru
Priority: Minor

 Introduce a new HLog configuration parameter (batchEntries) for more 
 aggressive batching of appends.  If this is enabled, HLog appends are not 
 written to the HLog writer immediately, but batched and written either 
 periodically or when a sync is requested.  Because sync times become larger, 
 they use delayed RPCs to free up RPC handler threads.
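
 A minimal sketch of the batching idea only, under the assumption that 
 appends are buffered in memory and handed to the writer on a periodic flush 
 or on sync. BatchedLogSketch and its methods are hypothetical names; the 
 delayed-RPC part is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchedLogSketch {
  private final boolean batchEntries;
  private final List<String> pending = new ArrayList<>();
  // Stands in for the underlying log writer.
  private final List<String> written = new ArrayList<>();

  BatchedLogSketch(boolean batchEntries) {
    this.batchEntries = batchEntries;
  }

  /** With batching on, the edit is only buffered; otherwise it goes straight to the writer. */
  synchronized void append(String edit) {
    if (batchEntries) {
      pending.add(edit);
    } else {
      written.add(edit);
    }
  }

  /** Moves buffered edits to the writer; meant to be called periodically or from sync(). */
  synchronized int flushPending() {
    int flushed = pending.size();
    written.addAll(pending);
    pending.clear();
    return flushed;
  }

  /** A sync request first drains the batch; a real log would then sync the writer. */
  synchronized void sync() {
    flushPending();
  }

  synchronized int writtenCount() {
    return written.size();
  }
}
```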





[jira] [Commented] (HBASE-4352) Apply version of hbase-4015 to branch

2011-09-27 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115822#comment-13115822
 ] 

stack commented on HBASE-4352:
--

I signed up to check we don't break rolling restart.  Let me get that done 
first.  Will report back if can do what you ask above (I'm trying to test 205 
as background task, could do two things at once).






[jira] [Updated] (HBASE-4488) Store could miss rows during flush

2011-09-27 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4488:
-

Status: Patch Available  (was: Open)

Marking patch available.






[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115828#comment-13115828
 ] 

stack commented on HBASE-4497:
--

Thanks J-D.  That's what I was too lazy to look up for myself.  Looks like we 
are doing enough tickling.  Weird that timeout monitor can cut in, region can 
be assigned elsewhere AND successfully update meta before this comes back.  
Here is from Ram's email up on the list earlier, with log snippets:

{code}
RS1
===
2011-09-23 22:34:34,000 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: addToOnlineRegions is
doneREGION = {NAME =
't5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.', TableName = 't5',
STARTKEY = '', ENDKEY = '', ENCODED = 2d06b3ca4d398ec96920ae86441a68c9,}
2011-09-23 22:34:34,009 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Updated row t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9. in region
.META.,,1 with serverName=linux76,60020,1316796517682
2011-09-23 22:34:34,009 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open
deploy taks for region=t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.,
daughter=false
2011-09-23 22:34:34,009 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x1328ceaa1ff0037 Attempting to transition node
2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENED
2011-09-23 22:34:34,038 WARN
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Completed
the OPEN of region t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9. but
when transitioning from  OPENING to OPENED got a version mismatch, someone
else clashed so now unassigning -- closing region
2011-09-23 22:34:34,038 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.: disabling
compactions  flushes
2011-09-23 22:34:34,038 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region
t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.
2011-09-23 22:34:34,038 DEBUG org.apache.hadoop.hbase.regionserver.Store:
closed f5
2011-09-23 22:34:34,038 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.

RS2
===
2011-09-23 22:33:56,546 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x1328ceaa1ff0039 Successfully transitioned node
2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-09-23 22:33:56,845 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
for region=t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.,
daughter=false
2011-09-23 22:33:56,845 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: addToOnlineRegions is
doneREGION = {NAME =
't5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.', TableName = 't5',
STARTKEY = '', ENDKEY = '', ENCODED = 2d06b3ca4d398ec96920ae86441a68c9,}
2011-09-23 22:33:56,856 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Updated row t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9. in region
.META.,,1 with serverName=linux146,60020,1316796499216
2011-09-23 22:33:56,856 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open
deploy taks for region=t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.,
daughter=false
2011-09-23 22:33:58,887 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x1328ceaa1ff0039 Attempting to transition node
2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENED
2011-09-23 22:33:58,893 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x1328ceaa1ff0039 Successfully transitioned node
2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENED
2011-09-23 22:33:58,893 DEBUG
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.
{code}
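
The "version mismatch" RS1 hits above can be pictured as a compare-and-set on 
a versioned node. The sketch below is a toy model under that assumption, with 
hypothetical names (VersionedNodeSketch, transition); it is not the ZKAssign 
code.

```java
final class VersionedNodeSketch {
  enum State { OPENING, OPENED }

  private State state = State.OPENING;
  private int version = 0;

  synchronized int version() {
    return version;
  }

  synchronized State state() {
    return state;
  }

  /**
   * Moves the node from 'from' to 'to' only if the caller saw the latest version.
   * Re-writing OPENING over OPENING (a "tickle") still bumps the version.
   */
  synchronized boolean transition(State from, State to, int expectedVersion) {
    if (state != from || version != expectedVersion) {
      return false; // someone else changed the node since the caller last read it
    }
    state = to;
    version++;
    return true;
  }
}
```

In this model, a server that read the node before the reassignment holds a 
stale version, so its OPENING to OPENED attempt fails even though it has 
already done its other post-open work.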

 If region opening fails after updating META HBCK reports it as inconsistent 
 and scanning the region throws NSRE
 ---

 Key: HBASE-4497
 URL: https://issues.apache.org/jira/browse/HBASE-4497
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

 As per the discussion in the mail chain "HBCK reporting of possible mismatch 
 in RS assignment", this JIRA is created.
 Consider two RSs, RS1 and RS2.
 A region tries to open on RS1, but it takes a while.  RS1 has still not 
 updated META or transitioned the node from OPENING to OPENED,
 so the timeout monitor assigns the region to RS2.  RS2 successfully updates 
 META and opens the region.
 Now RS1 tries to act on the region by first updating META and then 
 transitioning the node from OPENING to OPENED.
 RS1 transiting the node to OPENING to OPENED 

[jira] [Commented] (HBASE-4494) AvroServer:: get fails with NPE on a non-existent row

2011-09-27 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115833#comment-13115833
 ] 

stack commented on HBASE-4494:
--

Are you back Kay Kay?

 AvroServer:: get fails with NPE on a non-existent row
 -

 Key: HBASE-4494
 URL: https://issues.apache.org/jira/browse/HBASE-4494
 Project: HBase
  Issue Type: Bug
  Components: avro
Affects Versions: 0.90.4
Reporter: Kay Kay
Assignee: Kay Kay
 Fix For: 0.90.5

 Attachments: HBASE-4494.patch


 Try to submit a get request to the avro gateway. 
 If the row specified for a given table does not exist, the server request 
 fails with an NPE.





[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115867#comment-13115867
 ] 

Jonathan Gray commented on HBASE-4433:
--

Is this not strictly an improvement/feature?  It seems like it doesn't belong 
in stable branches :)

 avoid extra next (potentially a seek) if done with column/row
 -

 Key: HBASE-4433
 URL: https://issues.apache.org/jira/browse/HBASE-4433
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
 Fix For: 0.92.0


 [Noticed this in 89, but quite likely true of trunk as well.]
 When we are done with the requested column(s) the code still does an extra 
 next() call before it realizes that it is actually done. This extra next() 
 call could potentially result in an unnecessary extra block load. This is 
 likely to be especially bad for CFs where the KVs are large blobs where each 
 KV may be occupying a block of its own. So the next() can often load a new 
 unrelated block unnecessarily.
 --
 For the simple case of reading say the top-most column in a row in a single 
 file, where each column (KV) was say a block of its own-- it seems that we 
 are reading 3 blocks, instead of 1 block!
 I am working on a simple patch and with that the number of seeks is down to 
 2. 
 [There is still an extra seek left.  I think there were two levels of 
 extra/unnecessary next() we were doing without actually confirming that the 
 next was needed. One at the StoreScanner/ScanQueryMatcher level which this 
 diff avoids. I think the other is at hfs.next() (at the storefile scanner 
 level) that's happening whenever an HFile scanner serves out data-- and 
 perhaps that's the additional seek that we need to avoid. But I want to 
 tackle this optimization first as the two issues seem unrelated.]
 -- 
 The basic idea of the patch I am working on/testing is as follows. The 
 ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if 
 the KV needs to be included and then if done, only in the next call it 
 returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases 
 when ExplicitColumnTracker knows it is done with a particular column/row, the 
 patch attempts to combine the INCLUDE code and done hint into a single match 
 code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
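
 A simplified sketch of the combined match codes. The enum constants are the 
 ones named above; everything else (ColumnTrackerSketch, checkColumn, the 
 string columns, the maxVersions handling) is an assumption made for 
 illustration and is not the actual ExplicitColumnTracker.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnTrackerSketch {
  enum MatchCode {
    INCLUDE, SEEK_NEXT_COL, SEEK_NEXT_ROW,
    INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
  }

  private final List<String> wanted; // requested columns, in sorted order
  private final int maxVersions;
  private int index = 0; // next wanted column
  private int seen = 0;  // versions already included for that column

  ColumnTrackerSketch(List<String> sortedWanted, int maxVersions) {
    if (maxVersions < 1) {
      throw new IllegalArgumentException("maxVersions must be >= 1");
    }
    this.wanted = new ArrayList<>(sortedWanted);
    this.maxVersions = maxVersions;
  }

  /** Columns of one row are assumed to arrive in sorted order. */
  MatchCode checkColumn(String column) {
    while (index < wanted.size()) {
      int cmp = column.compareTo(wanted.get(index));
      if (cmp < 0) {
        return MatchCode.SEEK_NEXT_COL; // not requested; skip ahead
      }
      if (cmp == 0) {
        seen++;
        if (seen < maxVersions) {
          return MatchCode.INCLUDE; // more versions of this column may follow
        }
        // Done with this column: return the include and the hint in one code.
        seen = 0;
        index++;
        return index == wanted.size()
            ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
            : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
      }
      // The wanted column is absent from this row; move to the next wanted one.
      index++;
      seen = 0;
    }
    return MatchCode.SEEK_NEXT_ROW;
  }
}
```

 The caller learns it is done at the same moment it receives the last KV it 
 needs, so it has no reason to call next() just to find that out.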





[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115868#comment-13115868
 ] 

jirapos...@reviews.apache.org commented on HBASE-3025:
--



bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 98
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line98
bq.  
bq.   Can we make this 1?

sure


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 192
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line192
bq.  
bq.   Debug logging should go to LOG not AUDITLOG

The idea was that all authorization decisions should be separated into audit 
log.  Here we're allowing access, so AUDITLOG seemed to make sense.  I agree 
that this still needs to be cleaned up a lot.  Maybe all audit logging should 
be done up in requirePermission() with authorization result?  At the very least 
we need a consistent format and consistent logging levels for messages (trace, 
right?).


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 200
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line200
bq.  
bq.   Should be INFO or TRACE level? TRACE makes more sense to me.

Sure, can use trace for all audit log decisions.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 208
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line208
bq.  
bq.   Debug logging should go to LOG not AUDITLOG

This is an authorization decision since we're returning true below.  We can 
make this trace level, and improve the format, but I think AUDITLOG (if 
enabled) should contain a single message per request on why the request was 
allowed or denied.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 274
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line274
bq.  
bq.   Should be INFO or TRACE level? TRACE makes more sense to me.

will change to trace.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 354
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line354
bq.  
bq.   Should something go to AUDITLOG here?

Failure should already have been recorded in AUDITLOG via logDenied().  Agree 
that moving AUDITLOG messages up here with consistent format would be clearer, 
but will require some restructuring of return value from permissionGranted() so 
that some context specific reason can be pulled back up for logging.
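
Roughly what I mean by restructuring the return value, as a sketch only. 
AuthResult, DeniedException and the boolean owner check are hypothetical 
stand-ins, and the list stands in for AUDITLOG; none of this is in the 
current patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AuditSketch {
  /** Decision plus the context-specific reason, so the caller can log it. */
  static final class AuthResult {
    final boolean allowed;
    final String reason;

    private AuthResult(boolean allowed, String reason) {
      this.allowed = allowed;
      this.reason = reason;
    }

    static AuthResult allow(String reason) {
      return new AuthResult(true, reason);
    }

    static AuthResult deny(String reason) {
      return new AuthResult(false, reason);
    }
  }

  /** Unchecked stand-in for the exception thrown on denial. */
  static final class DeniedException extends RuntimeException {
    DeniedException(String message) {
      super(message);
    }
  }

  private final List<String> auditLog = new ArrayList<>();

  /** Only decides; does no logging of its own. */
  AuthResult permissionGranted(String user, String action, boolean userIsOwner) {
    return userIsOwner
        ? AuthResult.allow("user is table owner")
        : AuthResult.deny("no matching grant");
  }

  /** Single decision point: exactly one audit line per request, allowed or denied. */
  void requirePermission(String user, String action, boolean userIsOwner) {
    AuthResult result = permissionGranted(user, action, userIsOwner);
    auditLog.add((result.allowed ? "allowed" : "denied")
        + " user=" + user + " action=" + action + " reason=" + result.reason);
    if (!result.allowed) {
      throw new DeniedException("Insufficient permissions: " + result.reason);
    }
  }

  List<String> auditLines() {
    return Collections.unmodifiableList(auditLog);
  }
}
```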


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 366
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line366
bq.  
bq.   Should hasFamilyQualifierPermission log to AUDITLOG? It is used in places to make decisions -- an exception is thrown directly or not.

Yes, agree, we should either log to AUDITLOG at decision points here or 
consistently move the AUDITLOG logging up a level out of permissionGranted() 
and hasFamilyQualifierPermission().


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 375
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line375
bq.  
bq.   Another one of these was sent to AUDITLOG above. Do the same here? Should be INFO or TRACE level? TRACE makes more sense to me.

Agree, should go to AUDITLOG at trace.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java,
 line 590
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line590
bq.  
bq.   Should be logged with ERROR?

sure


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java,
 line 856
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line856
bq.  
bq.   Should this go to AUDITLOG? At INFO or TRACE level? My preference is 
TRACE.

Yes, agree.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java, line 174
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45406#file45406line174
bq.  
bq.   What if instead we check for version 0 and throw an IllegalArgumentException if so? 

[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115869#comment-13115869
 ] 

jirapos...@reviews.apache.org commented on HBASE-3025:
--



bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   Looks good. The majority of my comments have to do with inconsistent logging practice.

Thanks for the review.  I'll post an update with some cleanups and some 
reworking of the AUDITLOG handling.


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2041/#review2077
---


On 2011-09-23 19:14:20, Gary Helmling wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2041/
bq.  ---
bq.  
bq.  (Updated 2011-09-23 19:14:20)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch implements access control list based authorization of HBase operations.  The patch depends on the currently posted patch for HBASE-2742 (secure RPC engine).
bq.  
bq.  Key parts of the implementation are:
bq.  
bq.  * AccessControlLists - encapsulates storage of permission grants in a metadata table (_acl_).  This differs from previous implementation where the .META. table was used to store permissions.
bq.  
bq.  * AccessController - 
bq.    - implements MasterObserver and RegionObserver, performing authorization checks in each of the preXXX() hooks.  If authorization fails, an AccessDeniedException is thrown.
bq.    - implements AccessControllerProtocol as a coprocessor endpoint to provide RPC methods for granting, revoking and listing permissions.
bq.  
bq.  * ZKPermissionWatcher (and TableAuthManager) - synchronizes ACL entries and updates throughout the cluster nodes using ZK.  ACL entries are stored in per-table znodes as /hbase/acl/tablename.
bq.  
bq.  * Additional ruby shell scripts providing the grant, revoke and user_permission commands
bq.  
bq.  * Support for a new OWNER attribute in HTableDescriptor.  I could separate out this change into a new JIRA for discussion, but I don't see it as currently useful outside of security.  Alternately, I could handle the OWNER attribute completely in AccessController without changing HTD, but that would make interaction via hbase shell a bit uglier.
bq.  
bq.  
bq.  This addresses bug HBASE-3025.
bq.  https://issues.apache.org/jira/browse/HBASE-3025
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlFilter.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControllerProtocol.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/TablePermission.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/UserPermission.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/ZKPermissionWatcher.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/SecureTestUtil.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestAccessControlFilter.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestAccessController.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestTablePermissions.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestZKPermissionsWatcher.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 46a1a3d 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 699a5f5 
bq.    src/main/resources/hbase-default.xml 2c8f44b 
bq.    src/main/ruby/hbase.rb 4d27191 
bq.    src/main/ruby/hbase/admin.rb b244ffe 
bq.    src/main/ruby/hbase/hbase.rb beb2450 
bq.    src/main/ruby/hbase/security.rb PRE-CREATION 
bq.    src/main/ruby/shell.rb 9a47600 
bq.    src/main/ruby/shell/commands.rb a352c2e 
bq.    src/main/ruby/shell/commands/grant.rb PRE-CREATION 
bq.    src/main/ruby/shell/commands/revoke.rb PRE-CREATION 
bq.    src/main/ruby/shell/commands/table_permission.rb PRE-CREATION 
bq.

[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-27 Thread Dave Revell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115870#comment-13115870
 ] 

Dave Revell commented on HBASE-4489:


@Ted: Bytes.split() will check the number of splits and throw 
IllegalArgumentException if it is <= 0. It seems like a shame to add more code to 
duplicate a check that's already being done.

 Better key splitting in RegionSplitter
 --

 Key: HBASE-4489
 URL: https://issues.apache.org/jira/browse/HBASE-4489
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
 Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch


 The RegionSplitter utility allows users to create a pre-split table from the 
 command line or do a rolling split on an existing table. It supports 
 pluggable split algorithms that implement the SplitAlgorithm interface. The 
 only/default SplitAlgorithm is one that assumes keys fall in the range from 
 ASCII string  to ASCII string 7FFF. This is not a sane 
 default, and seems useless to most users. Users are likely to be surprised by 
 the fact that all the region splits occur in the byte range of ASCII 
 characters.
 A better default split algorithm would be one that evenly divides the space 
 of all bytes, which is what this patch does. Making a table with five regions 
 would split at \x33\x33..., \x66\x66..., \x99\x99..., \xCC\xCC..., and 
 \xFF\xFF.
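
 The even division described above can be sketched as follows. This is an 
 illustration only, not the code from the attached patch: the class and 
 method names are hypothetical, keys are assumed to be fixed-width byte 
 strings, and \xFF\xFF... is treated as the upper bound of the key space, so 
 five regions yield four interior split points.

```java
import java.math.BigInteger;

// Hypothetical sketch: evenly divide the space of all `width`-byte keys.
final class UniformSplitSketch {

    /** Returns the (regions - 1) interior split points for keys of `width` bytes. */
    static byte[][] splitPoints(int regions, int width) {
        if (regions < 2 || width < 1) {
            throw new IllegalArgumentException("need regions >= 2 and width >= 1");
        }
        // Largest key of this width: 0xFF...FF.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * width).subtract(BigInteger.ONE);
        byte[][] splits = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(regions));
            splits[i - 1] = toFixedWidth(point, width);
        }
        return splits;
    }

    /** Converts a non-negative value below 2^(8*width) to exactly `width` big-endian bytes. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may be shorter, or carry a leading 0x00 sign byte
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    /** Formats a key in the \xNN notation used in the issue description. */
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (byte[] split : splitPoints(5, 2)) {
            System.out.println(toHex(split));
        }
    }
}
```

 With five regions and two-byte keys, the sketch prints \x33\x33, \x66\x66, 
 \x99\x99 and \xCC\xCC, matching the prefixes listed in the description.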

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115876#comment-13115876
 ] 

Jonathan Gray commented on HBASE-4488:
--

Aren't you a committer now?  Or didn't get the rights yet? :)  Yes, I will 
commit later today.

 Store could miss rows during flush
 --

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4488.txt


 While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
 critical mistake:
 The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
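
 One plausible reading of the bug, sketched under assumptions: if next() 
 fills the list and returns whether MORE data remains, then the call that 
 returns false can still have delivered a final batch, which a plain while 
 loop never processes. BatchScanner and the helper methods below are 
 hypothetical stand-ins, not the HBase scanner API or the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class FlushLoopSketch {

    /** Hypothetical scanner: next() appends one batch to `out` and returns true if more remain. */
    interface BatchScanner {
        boolean next(List<String> out);
    }

    static BatchScanner scannerOver(List<List<String>> batches) {
        Iterator<List<String>> it = batches.iterator();
        return out -> {
            if (it.hasNext()) {
                out.addAll(it.next());
            }
            return it.hasNext();
        };
    }

    /** Buggy shape: the body is skipped for the call that returns false, so the last batch is lost. */
    static List<String> drainWithWhile(BatchScanner scanner) {
        List<String> written = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (scanner.next(kvs)) {
            written.addAll(kvs);
            kvs.clear();
        }
        return written;
    }

    /** Fixed shape: process whatever next() delivered before testing "has more". */
    static List<String> drainWithDoWhile(BatchScanner scanner) {
        List<String> written = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean hasMore;
        do {
            hasMore = scanner.next(kvs);
            written.addAll(kvs);
            kvs.clear();
        } while (hasMore);
        return written;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("r1"), Arrays.asList("r2"), Arrays.asList("r3"));
        System.out.println(drainWithWhile(scannerOver(rows)));   // last row missing
        System.out.println(drainWithDoWhile(scannerOver(rows))); // all rows
    }
}
```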





[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-27 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115877#comment-13115877
 ] 

Lars Hofhansl commented on HBASE-4488:
--

Not yet... :)

 Store could miss rows during flush
 --

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4488.txt







[jira] [Created] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

2011-09-27 Thread Eric Yang (Created) (JIRA)
HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
-

 Key: HBASE-4498
 URL: https://issues.apache.org/jira/browse/HBASE-4498
 Project: HBase
  Issue Type: Bug
  Components: build, scripts
Affects Versions: 0.92.0
 Environment: Java, Linux
Reporter: Eric Yang
Assignee: Eric Yang
 Fix For: 0.92.0


HBase RPM packaging was done prior to completion of ZooKeeper RPM packaging.  
update-hbase-env.sh expects the ZooKeeper environment script to exist at 
/etc/default/zookeeper-env.sh.  After several revisions of ZOOKEEPER-999, the 
ZooKeeper community decided to remove /etc/default/zookeeper-env.sh.  Hence, 
update-hbase-env.sh should not depend on /etc/default/zookeeper-env.sh.  
Instead, update-hbase-env.sh should assume ZooKeeper exists in /usr for RPM/deb 
packages.





[jira] [Updated] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

2011-09-27 Thread Eric Yang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HBASE-4498:
-

Status: Patch Available  (was: Open)

 HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
 -

 Key: HBASE-4498
 URL: https://issues.apache.org/jira/browse/HBASE-4498
 Project: HBase
  Issue Type: Bug
  Components: build, scripts
Affects Versions: 0.92.0
 Environment: Java, Linux
Reporter: Eric Yang
Assignee: Eric Yang
 Fix For: 0.92.0

 Attachments: HBASE-4498.patch







[jira] [Updated] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

2011-09-27 Thread Eric Yang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HBASE-4498:
-

Attachment: HBASE-4498.patch

Set ZOOKEEPER_HOME to /usr by default.

 HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
 -

 Key: HBASE-4498
 URL: https://issues.apache.org/jira/browse/HBASE-4498
 Project: HBase
  Issue Type: Bug
  Components: build, scripts
Affects Versions: 0.92.0
 Environment: Java, Linux
Reporter: Eric Yang
Assignee: Eric Yang
 Fix For: 0.92.0

 Attachments: HBASE-4498.patch







[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-27 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115886#comment-13115886
 ] 

Jean-Daniel Cryans commented on HBASE-3130:
---

Another thing about the latest patch, this line needs to be removed in RP:

bq. * @param zkw zookeeper connection to the peer

 [replication] ReplicationSource can't recover from session expired on remote 
 clusters
 -

 Key: HBASE-3130
 URL: https://issues.apache.org/jira/browse/HBASE-3130
 Project: HBase
  Issue Type: Bug
  Components: replication
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Chris Trezzo
 Fix For: 0.92.0

 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt


 Currently ReplicationSource cannot recover when its zookeeper connection to 
 its remote cluster expires. HLogs are still being tracked, but a cluster 
 restart is required to continue replication (or a rolling restart).





[jira] [Commented] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115895#comment-13115895
 ] 

jirapos...@reviews.apache.org commented on HBASE-4498:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2075/
---

Review request for hbase.


Summary
---

Updated ZOOKEEPER_HOME location to default to /usr (default location of 
ZooKeeper rpm/deb package)


This addresses bug HBASE-4498.
https://issues.apache.org/jira/browse/HBASE-4498


Diffs
-

  /src/packages/update-hbase-env.sh 1176602 

Diff: https://reviews.apache.org/r/2075/diff


Testing
---


Thanks,

Eric



 HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
 -

 Key: HBASE-4498
 URL: https://issues.apache.org/jira/browse/HBASE-4498
 Project: HBase
  Issue Type: Bug
  Components: build, scripts
Affects Versions: 0.92.0
 Environment: Java, Linux
Reporter: Eric Yang
Assignee: Eric Yang
 Fix For: 0.92.0

 Attachments: HBASE-4498.patch







[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115897#comment-13115897
 ] 

jirapos...@reviews.apache.org commented on HBASE-3025:
--



bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java,
 line 192
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line192
bq.  
bq.   Debug logging should go to LOG not AUDITLOG
bq.  
bq.  Gary Helmling wrote:
bq.  The idea was that all authorization decisions should be separated into 
audit log.  Here we're allowing access, so AUDITLOG seemed to make sense.  I 
agree that this still needs to be cleaned up a lot.  Maybe all audit logging 
should be done up in requirePermission() with authorization result?  At the 
very least we need a consistent format and consistent logging levels for 
messages (trace, right?).

bq.  Maybe all audit logging should be done up in requirePermission() with 
authorization result?

Sounds good.

bq.  At the very least we need a consistent format and consistent logging 
levels for messages (trace, right?).

I'd argue for TRACE


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   
security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java,
 line 47
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45407#file45407line47
bq.  
bq.   Maybe we can call this .auth.? We don't really have an RBAC 
implementation yet. Likewise for the package name for all of this stuff? Just a 
random thought.
bq.  
bq.  Gary Helmling wrote:
bq.  Yeah rbac here and in package name is a misnomer.  How about using 
access instead?  auth seems ambiguous to me as it could mean 
authentication or authorization.  JDK uses auth in javax.security.auth 
and claims it's for both, but seems like that and sub-packages are more 
authentication related to me.  Hadoop uses authorize for a similar package 
to this.

access sounds good to me.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   
security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java,
 line 84
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45407#file45407line84
bq.  
bq.   Isn't this an error?
bq.  
bq.  Gary Helmling wrote:
bq.  Yes, and in this context a pretty bad one, as it probably means region 
server initiated RPCs won't work or will be denied.  We should probably let the 
IOE escape here...

Agree.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   
security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java, 
line 174
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45406#file45406line174
bq.  
bq.   What if instead we check for version 0 and throw an 
IllegalArgumentException if so? Technically, it is an invalid request if it 
contains an unrecognizable action code. Skipping this check if version > 0 
would be a way to handle new perms while not accepting incorrect input 
otherwise.
bq.  
bq.  Gary Helmling wrote:
bq.  Yeah, seems safer to throw an exception here than to ignore invalid 
input.  What about throwing an IOException (to tie in to existing error 
handling)?
bq.  
bq.  We could potentially trap the VersionMismatchException from 
VersionedWritable to allow skip and continue when reading newer versions of 
Permission with potentially added Action codes.  Would need to think about what 
kind of errors that would expose us to.

bq.  What about throwing an IOException (to tie in to existing error handling)?

Throwing an IOE sounds good.

bq.  We could potentially trap the VersionMismatchException from 
VersionedWritable to allow skip and continue when reading newer versions of 
Permission with potentially added Action codes. 

I think that is reasonable, with something logged at WARN level. The idea here 
is to ride over a rolling restart. Would not see long term operation with 
mismatching versions.
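
A minimal sketch of the "throw an IOE on an unrecognized action code" idea 
discussed above. The Action enum, its one-byte codes, and the length-prefixed 
wire layout are assumptions made for illustration; they are not taken from the 
Permission class under review.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ActionCodeSketch {

    /** Hypothetical action set; the codes used by the real Permission class may differ. */
    enum Action {
        READ('R'), WRITE('W');

        final byte code;

        Action(char c) {
            this.code = (byte) c;
        }
    }

    /** Maps a wire code to an Action, rejecting unknown codes instead of silently skipping them. */
    static Action fromCode(byte code) throws IOException {
        for (Action a : Action.values()) {
            if (a.code == code) {
                return a;
            }
        }
        throw new IOException("Unrecognized action code: " + code);
    }

    /** Decodes an assumed layout: one count byte followed by that many action codes. */
    static List<Action> decode(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int count = in.readByte();
        if (count < 0) {
            throw new IOException("Negative action count: " + count);
        }
        List<Action> actions = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            actions.add(fromCode(in.readByte()));
        }
        return actions;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decode(new byte[] {2, (byte) 'R', (byte) 'W'}));
    }
}
```

Throwing IOException keeps the failure on the existing I/O error path; tolerating 
codes from a newer version during a rolling restart would need a separate, 
explicitly logged branch.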


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   
security/src/main/java/org/apache/hadoop/hbase/security/rbac/ZKPermissionWatcher.java,
 line 59
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45410#file45410line59
bq.  
bq.   I wonder if there is some way we can check if a secure variant of 
ZooKeeper is running, and refuse to initialize if not.
bq.  
bq.  Gary Helmling wrote:
bq.  My thinking has been to handle all secure ZooKeeper changes 
separately.  So I'd prefer to handle any check here as part of that.
bq.  
bq.  I do think it's reasonable to run AccessController with only SIMPLE 
auth and no secure ZooKeeper.  It's not secure but could still be useful (we 
currently use this setup for tests).
bq.  
bq.  We could complain loudly to give an indication that you have a 
security hole though.

bq.  I do think it's reasonable to run AccessController with only SIMPLE auth 
and 

[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-27 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115917#comment-13115917
 ] 

Ted Yu commented on HBASE-4455:
---

Integrated to 0.92 and TRUNK.

Thanks for the nice work, Ming.

Thanks for the review Todd, Jonathan, Ramkrishna and Stack.

 Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
 AssignmentManager
 --

 Key: HBASE-4455
 URL: https://issues.apache.org/jira/browse/HBASE-4455
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0


 Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
 wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
 RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. 
 regions aren't in "regions in transition" from the AssignmentManager's point 
 of view, but they aren't assigned to any region server either. Here are the 
 issues.
 1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is 
 invoked to check if it contains the -ROOT- region. That is due to the long 
 delay of ZK notifications and the async nature of the system. Here is an 
 example: even though the new root region server sea-lab-1,60020,1316380133656 
 is set at T2, at T3, during the shutdown process for 
 sea-lab-1,60020,1316380133656, the root location still points to the old 
 server sea-lab-3,60020,1316380037898.
 T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6-0x1327e43175e Retrieved 29 byte(s) of data from znode 
 /hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898
 T2: 2011-09-18 14:08:57,173 INFO 
 org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region 
 location in ZooKeeper as sea-lab-1,60020,1316380133656
 T3: 2011-09-18 14:10:26,393 DEBUG 
 org.apache.hadoop.hbase.master.ServerManager: 
 Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler 
 to be executed, root=false, meta=true, current Root Location: 
 sea-lab-3,60020,1316380037898
 T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6-0x1327e43175e Retrieved 29 byte(s) of data from znode 
 /hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656
 2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or 
 .META. availability could be blocked. If, meanwhile, the new server to which 
 -ROOT- or .META. is being assigned restarts, another instance of 
 MetaServerShutdownHandler is queued. Eventually, all 
 MetaServerShutdownHandler worker threads are filled up. It looks like 
 HBASE-4245.





[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115938#comment-13115938
 ] 

Hudson commented on HBASE-4433:
---

Integrated in HBase-0.92 #23 (See 
[https://builds.apache.org/job/HBase-0.92/23/])
HBASE-4433: avoid extra next (potentially a seek) if done with column/row
HBASE-4433: avoid extra next (potentially a seek) if done with column/row

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java


 avoid extra next (potentially a seek) if done with column/row
 -

 Key: HBASE-4433
 URL: https://issues.apache.org/jira/browse/HBASE-4433
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
 Fix For: 0.92.0


 [Noticed this in 89, but quite likely true of trunk as well.]
 When we are done with the requested column(s) the code still does an extra 
 next() call before it realizes that it is actually done. This extra next() 
 call could potentially result in an unnecessary extra block load. This is 
 likely to be especially bad for CFs where the KVs are large blobs where each 
 KV may be occupying a block of its own. So the next() can often load a new 
 unrelated block unnecessarily.
 --
 For the simple case of reading say the top-most column in a row in a single 
 file, where each column (KV) was say a block of its own-- it seems that we 
 are reading 3 blocks, instead of 1 block!
 I am working on a simple patch and with that the number of seeks is down to 
 2. 
 [There is still an extra seek left.  I think there were two levels of 
 extra/unnecessary next() we were doing without actually confirming that the 
 next was needed. One at the StoreScanner/ScanQueryMatcher level which this 
 diff avoids. I think the other is at hfs.next() (at the storefile scanner 
 level) that's happening whenever an HFile scanner serves out data-- and 
 perhaps that's the additional seek that we need to avoid. But I want to 
 tackle this optimization first as the two issues seem unrelated.]
 -- 
 The basic idea of the patch I am working on/testing is as follows. The 
 ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if 
 the KV needs to be included and then if done, only in the next call it 
 returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases 
 when ExplicitColumnTracker knows it is done with a particular column/row, the 
 patch attempts to combine the INCLUDE code and done hint into a single match 
 code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
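
 The combined match codes can be illustrated with a heavily simplified 
 tracker. This is a sketch under assumptions, not the patch itself: column 
 qualifiers are plain strings, the requested columns arrive sorted, and the 
 class below is hypothetical. Only the match code names come from the 
 description above.

```java
import java.util.Arrays;
import java.util.List;

final class ColumnTrackerSketch {

    enum MatchCode {
        INCLUDE, SEEK_NEXT_COL, SEEK_NEXT_ROW,
        INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
    }

    private final List<String> wanted; // requested qualifiers, assumed sorted
    private final int maxVersions;
    private int index = 0;             // position of the column currently being collected
    private int versionsSeen = 0;      // versions already included for that column

    ColumnTrackerSketch(int maxVersions, String... sortedColumns) {
        this.maxVersions = maxVersions;
        this.wanted = Arrays.asList(sortedColumns);
    }

    /** Decides what to do with a KV of the given qualifier within the current row. */
    MatchCode check(String qualifier) {
        while (index < wanted.size()) {
            int cmp = qualifier.compareTo(wanted.get(index));
            if (cmp < 0) {
                return MatchCode.SEEK_NEXT_COL; // KV sorts before the column we want
            }
            if (cmp > 0) {
                index++;                        // wanted column is absent, try the next one
                versionsSeen = 0;
                continue;
            }
            versionsSeen++;
            if (versionsSeen < maxVersions) {
                return MatchCode.INCLUDE;       // more versions of this column may follow
            }
            // Done with this column: say so in the same answer instead of
            // waiting for an extra next() call to find out.
            index++;
            versionsSeen = 0;
            return index < wanted.size()
                ? MatchCode.INCLUDE_AND_SEEK_NEXT_COL
                : MatchCode.INCLUDE_AND_SEEK_NEXT_ROW;
        }
        return MatchCode.SEEK_NEXT_ROW;         // nothing left to collect in this row
    }

    public static void main(String[] args) {
        ColumnTrackerSketch tracker = new ColumnTrackerSketch(1, "a", "c");
        for (String q : new String[] {"a", "b", "c"}) {
            System.out.println(q + " -> " + tracker.check(q));
        }
    }
}
```

 The point of the combined codes is visible in the last branch: the caller 
 learns that the row is finished together with the final included KV, so it 
 has no reason to fetch another KV (and possibly another block) first.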





[jira] [Created] (HBASE-4499) [replication] Source shouldn't update ZK if it didn't progress

2011-09-27 Thread Jean-Daniel Cryans (Created) (JIRA)
[replication] Source shouldn't update ZK if it didn't progress
--

 Key: HBASE-4499
 URL: https://issues.apache.org/jira/browse/HBASE-4499
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.92.0


A relatively minor optimization to be done in ReplicationSource: currently it 
calls ReplicationSourceManager.logPositionAndCleanOldLogs whether it made 
progress or not, generating more load on ZK than necessary. The last position 
should be kept around so that we can compare.
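
A minimal sketch of "keep the last position around and compare", under stated 
assumptions: PositionStore is a hypothetical stand-in for the ZooKeeper update 
made via logPositionAndCleanOldLogs, and the log name is compared along with 
the offset so that a rolled log with the same offset still counts as progress.

```java
final class PositionLoggerSketch {

    /** Hypothetical stand-in for the call that records the position in ZooKeeper. */
    interface PositionStore {
        void write(String logName, long position);
    }

    private final PositionStore store;
    private String lastLogName = null;
    private long lastPosition = -1L;

    PositionLoggerSketch(PositionStore store) {
        this.store = store;
    }

    /** Writes to the store only if the (log, position) pair changed; returns true if it wrote. */
    boolean logIfProgressed(String logName, long position) {
        if (logName.equals(lastLogName) && position == lastPosition) {
            return false; // no progress, skip the ZK round trip
        }
        store.write(logName, position);
        lastLogName = logName;
        lastPosition = position;
        return true;
    }

    public static void main(String[] args) {
        PositionLoggerSketch logger = new PositionLoggerSketch(
            (log, pos) -> System.out.println("store " + log + "@" + pos));
        logger.logIfProgressed("log.1", 100L); // written
        logger.logIfProgressed("log.1", 100L); // skipped
        logger.logIfProgressed("log.1", 250L); // written
    }
}
```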





[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-09-27 Thread Scott Kuehn (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115966#comment-13115966
 ] 

Scott Kuehn commented on HBASE-4480:


I'm happy to pick this up, if nobody is working on it.

 Testing script to simplfy local testing
 ---

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: runtest.sh


 As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
 http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
 script that would handle more of the finer points of running/checking our 
 test suite.
 This script should:
 (1) Allow people to determine which tests are hanging/taking a long time to 
 run
 (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
 running the whole suite that caused the failure
 (3) Allow people to specify to run just unit tests or also integration tests 
 (essentially wrapping calls to 'maven test' and 'maven verify').
 This script should just be a convenience script - running tests directly from 
 maven should not be impacted.





[jira] [Created] (HBASE-4500) [replication] Add a delayed option

2011-09-27 Thread Jean-Daniel Cryans (Created) (JIRA)
[replication] Add a delayed option
--

 Key: HBASE-4500
 URL: https://issues.apache.org/jira/browse/HBASE-4500
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.94.0


A typical DR solution with databases is to have one slave that receives data a 
few hours late to guard against destructive operations. Off the top of my head, 
one way to implement this would be to tail the log, see how current the edit 
is, sleep if needed, replicate, repeat.

The harder part will be adding a configuration to the stream.
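
The "see how current the edit is, sleep if needed" step reduces to a small 
calculation. The sketch below assumes each edit carries a write timestamp in 
milliseconds and that the delay is a per-stream setting; the class and method 
names are hypothetical.

```java
final class DelayedShipperSketch {

    /**
     * Milliseconds to wait before shipping an edit written at editTimestampMs
     * so that it reaches the slave at least delayMs after it was written.
     */
    static long sleepMillis(long editTimestampMs, long delayMs, long nowMs) {
        return Math.max(0L, editTimestampMs + delayMs - nowMs);
    }

    public static void main(String[] args) throws InterruptedException {
        long delayMs = 50L; // a real deployment would use hours, not milliseconds
        long editTimestampMs = System.currentTimeMillis();
        long waitMs = sleepMillis(editTimestampMs, delayMs, System.currentTimeMillis());
        Thread.sleep(waitMs); // sleep if needed, then replicate
        System.out.println("shipping edit after waiting " + waitMs + " ms");
    }
}
```

Edits that are already older than the delay get a wait of zero, so a source 
that has fallen behind ships them immediately instead of sleeping again.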





[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-09-27 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115972#comment-13115972
 ] 

stack commented on HBASE-4480:
--

@Ted You want to check this in?  Where?  Into src/test/bin?

 Testing script to simplfy local testing
 ---

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: runtest.sh







[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-09-27 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115975#comment-13115975
 ] 

Jesse Yates commented on HBASE-4480:


@stack - I was thinking this script would be incorporated into a bigger script 
that we would run for testing.

@Scott - I'm fine if you want to do the bigger script.

 Testing script to simplfy local testing
 ---

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: runtest.sh







[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-09-27 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115978#comment-13115978
 ] 

stack commented on HBASE-4480:
--

@Jesse Sounds good





[jira] [Created] (HBASE-4501) [replication] Shutting down a stream leaves recovered sources running

2011-09-27 Thread Jean-Daniel Cryans (Created) (JIRA)
[replication] Shutting down a stream leaves recovered sources running
-

 Key: HBASE-4501
 URL: https://issues.apache.org/jira/browse/HBASE-4501
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0


When removing a peer it will call ReplicationSourceManager.removePeer which 
calls closeRecoveredQueue which does this:

{code}
LOG.info("Done with the recovered queue " + src.getPeerClusterZnode());
this.oldsources.remove(src);
this.zkHelper.deleteSource(src.getPeerClusterZnode(), false);
{code}

This works when the recovered source is done and calls this method itself, but 
when a peer is removed, terminate is never called on the source, so it is left 
running.
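
A minimal sketch of the missing step, under stated assumptions: `RecoveredSource` and `SourceRegistry` are hypothetical stand-ins, not the real ReplicationSourceManager classes, and the `terminate` signature is assumed. It only illustrates the ordering, where each recovered source of the removed peer is terminated before it is dropped from the list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a recovered replication source; not the HBase class.
class RecoveredSource {
    private final String peerId;
    private volatile boolean running = true;

    RecoveredSource(String peerId) { this.peerId = peerId; }

    String getPeerId() { return peerId; }
    boolean isRunning() { return running; }

    // Stops the source; a real source would also stop its worker thread here.
    void terminate(String reason) { running = false; }
}

// Hypothetical registry showing the ordering: terminate first, then forget.
class SourceRegistry {
    private final List<RecoveredSource> oldSources = new ArrayList<>();

    void add(RecoveredSource src) { oldSources.add(src); }

    int size() { return oldSources.size(); }

    // Removes every recovered source of the given peer, terminating each one
    // so that none is left running after it is dropped from the list.
    List<RecoveredSource> removePeer(String peerId) {
        List<RecoveredSource> removed = new ArrayList<>();
        for (Iterator<RecoveredSource> it = oldSources.iterator(); it.hasNext();) {
            RecoveredSource src = it.next();
            if (src.getPeerId().equals(peerId)) {
                src.terminate("Peer " + peerId + " removed");
                it.remove();
                removed.add(src);
            }
        }
        return removed;
    }
}
```

Sources that belong to other peers are untouched, which matches the intent of removing a single stream.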





[jira] [Assigned] (HBASE-2196) Support more than one slave cluster

2011-09-27 Thread Jean-Daniel Cryans (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reassigned HBASE-2196:
-

Assignee: Lars Hofhansl

 Support more than one slave cluster
 ---

 Key: HBASE-2196
 URL: https://issues.apache.org/jira/browse/HBASE-2196
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 2196-add.txt, 2196-v2.txt, 2196-v5.txt, 2196-v6.txt, 
 2196.txt, HBASE-2196-0.90-v2.patch, HBASE-2196-0.90.patch, 
 HBASE-2196-wip.patch


 Currently replication supports only 1 slave cluster; we need the ability to add 
 more.





[jira] [Created] (HBASE-4502) Error at HMaster start

2011-09-27 Thread Madhab Nayak (Created) (JIRA)
Error at HMaster start
--

 Key: HBASE-4502
 URL: https://issues.apache.org/jira/browse/HBASE-4502
 Project: HBase
  Issue Type: Bug
 Environment: Ubuntu 11
Reporter: Madhab Nayak


I have set up hadoop 0.20.2 in pseudo-distributed mode. This is running fine.

I am trying to integrate hbase-0.90.4/3 with hadoop, but while starting I get 
the following error in the Master log.


2011-09-27 15:01:16,295 INFO org.apache.zookeeper.server.NIOServerCnxn: Client 
attempting to establish new session at /127.0.0.1:37643
2011-09-27 15:01:16,296 INFO org.apache.zookeeper.server.NIOServerCnxn: 
Established session 0x132ace7f4fa0002 with negotiated timeout 4 for client 
/127.0.0.1:37643
2011-09-27 15:01:16,296 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server localhost/127.0.0.1:2181, sessionid = 
0x132ace7f4fa0002, negotiated timeout = 4
2011-09-27 15:01:16,468 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
java.io.IOException: Call to localhost/127.0.0.1:8020 failed on local 
exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy6.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at 
org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:213)
at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:180)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:364)
at 
org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:81)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:346)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
at 
org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:193)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:812)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:720)
2011-09-27 15:01:16,470 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-09-27 15:01:16,470 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping 
service threads
2011-09-27 15:01:16,470 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server 
on 51567



hbase-site.xml file structure is as follows

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>


Please help me to fix this error 






[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116008#comment-13116008
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2111
---


There is a lot of excellence in here.  I'm going to look at the code itself 
with this diff applied to try and understand where/how CT is now being used.  
I'm a little unclear between the lines you'd like to draw and the lines you 
actually draw in this diff.

Great work!


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4873

maybe note here that you should not be synchronized on metaAvailable (and 
it will do so in the method)... the next method below is nicely clear in this 
regard



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4874

verify the connection works, and also that the server is actually hosting 
the region we think it is... the comment makes me think this is looking up 
which server hosts the passed region but it's just verifying if we can connect 
to the server we think is hosting the region and verifies whether it's hosting 
it or not (so this fails if we can't connect or if the region is not on this 
server)



src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
https://reviews.apache.org/r/2065/#comment4875

i'm still trying to understand exactly what you've changed and what is 
still a TODO, but this looks much nicer now! :)



src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
https://reviews.apache.org/r/2065/#comment4876

same here!  nice  (old stuff looks ripe with race conditions)



src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
https://reviews.apache.org/r/2065/#comment4877

missing copyright and year?



src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
https://reviews.apache.org/r/2065/#comment4878

nice moving cruft to separate classes



src/main/java/org/apache/hadoop/hbase/client/Result.java
https://reviews.apache.org/r/2065/#comment4881

seems like this should be moved to static methods in a helper class rather 
than exposing to our client-side Result



src/main/java/org/apache/hadoop/hbase/client/Result.java
https://reviews.apache.org/r/2065/#comment4882

yeah, shouldn't this be in MetaReader or some such class?



src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
https://reviews.apache.org/r/2065/#comment4895

missed some whitespace



src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
https://reviews.apache.org/r/2065/#comment4896

nice



src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
https://reviews.apache.org/r/2065/#comment4897

this seems like an important public method.  i like the rename and your 
additional comments, but maybe we should add more.  default behavior is to use 
a cached location, if one is not found, it is looked up in a catalog.  setting 
reload to true bypasses the cache and forces the lookup to a catalog.  and 
then, under what cases do we get an exception?  does this verify that the 
server is actually hosting the region?  or it just looks up in the catalog (i 
guess failure there could cause IOE) and if it finds something, just returns a 
connection to that RS (w/ no verification)... correct?
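
The behaviour being asked about can be sketched as follows. This is an illustration only, assuming a hypothetical `CachingLocator` class and a plain `Function` standing in for the catalog lookup; it is not the ServerCallable code under review. It shows the default path using the cache, `reload` forcing a catalog lookup, and no verification that the returned server really hosts the region.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical locator illustrating the cache/reload behaviour; not HBase code.
class CachingLocator {
    private final Map<String, String> cache = new HashMap<>();
    // Stand-in for a catalog read: maps a region name to a server name.
    private final Function<String, String> catalogLookup;

    CachingLocator(Function<String, String> catalogLookup) {
        this.catalogLookup = catalogLookup;
    }

    // With reload == false a cached entry is returned if present; with
    // reload == true the cache is bypassed and the catalog is consulted.
    // Nothing here checks that the server is actually hosting the region;
    // a failing lookup simply propagates its exception to the caller.
    String getLocation(String regionName, boolean reload) {
        if (!reload) {
            String cached = cache.get(regionName);
            if (cached != null) {
                return cached;
            }
        }
        String server = catalogLookup.apply(regionName);
        cache.put(regionName, server);
        return server;
    }
}
```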



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/2065/#comment4898

why do you remove the javadoc on this method?



src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
https://reviews.apache.org/r/2065/#comment4899

not even necessary to put this method in here at all now (we're just using 
it for getting the node name at this point but it's probably still nice to have 
the name in stacks and such)



src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
https://reviews.apache.org/r/2065/#comment4900

yay!  3



src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
https://reviews.apache.org/r/2065/#comment4901

huh? :)



src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
https://reviews.apache.org/r/2065/#comment4902

awesome



src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
https://reviews.apache.org/r/2065/#comment4903

30,000 ft desc?  i guess test name is self descriptive? :)


- Jonathan


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  

[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread Doug Meil (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116012#comment-13116012
 ] 

Doug Meil commented on HBASE-4448:
--

Thanks Stack.  In general, do you support the HBaseTestingUtilityFactory 
approach, where generic MiniClusters can be re-used (wherever possible)?

I think that HBaseClusterTestCase could be refactored to use 
HBaseTestingUtilityFactory to re-use the MiniCluster since they are all the 
same config.  It's a plus but not a huge win, but still should be done.  
Every little bit helps.

 HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
 instances across unit tests
 -

 Key: HBASE-4448
 URL: https://issues.apache.org/jira/browse/HBASE-4448
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: HBaseTestingUtilityFactory.java, 
 hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, 
 java_HBASE_4448_v2.patch


 Setting up and tearing down HBaseTestingUtility instances in unit tests is 
 very expensive.  On my MacBook it takes about 10 seconds to set up a 
 MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
 test classes that use this facility, that's a lot of time in the build.
 This factory assumes that the JVM is being re-used across test classes in the 
 build, otherwise this pattern won't work. 
 I don't think this is appropriate for every use, but I think it can be 
 applicable in a great many cases - especially where developers just want a 
 simple MiniCluster with 1 slave.
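
A rough sketch of the reuse pattern described above, assuming hypothetical names (`SharedCluster`, `SharedClusterFactory`); it is not the attached HBaseTestingUtilityFactory.java. The factory starts one fixture per configuration key and hands the same instance to every later caller in the same JVM, so the start-up cost is paid once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an expensive test fixture such as a mini cluster.
class SharedCluster {
    private boolean running;

    void start() { running = true; }   // placeholder for the slow start-up
    void stop() { running = false; }   // placeholder for the slow tear-down
    boolean isRunning() { return running; }
}

// Hypothetical factory: one started fixture per configuration key, kept
// alive for later test classes that run in the same JVM.
final class SharedClusterFactory {
    private static final Map<String, SharedCluster> INSTANCES = new HashMap<>();
    private static int starts = 0;

    private SharedClusterFactory() {}

    // Returns the shared fixture for the key, starting it on first use only.
    static synchronized SharedCluster get(String configKey) {
        SharedCluster cluster = INSTANCES.get(configKey);
        if (cluster == null) {
            cluster = new SharedCluster();
            cluster.start();
            starts++;
            INSTANCES.put(configKey, cluster);
        }
        return cluster;
    }

    static synchronized int startCount() { return starts; }

    // Stops every fixture, for example at the end of the whole build.
    static synchronized void shutdownAll() {
        for (SharedCluster cluster : INSTANCES.values()) {
            cluster.stop();
        }
        INSTANCES.clear();
    }
}
```

Because the instances live in static state, the sketch only helps when test classes share a JVM, which is the same assumption the description makes.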





[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116014#comment-13116014
 ] 

Jonathan Gray commented on HBASE-4496:
--

I can roll this into my changes for CacheConfig.  Don't have the JIRA offhand 
(but the point of it is to make it so we don't have to change a bunch of 
constructors all the time).

I'm hoping to have a diff out tonight or tomorrow morning.

 HFile V2 does not honor setCacheBlocks when scanning.
 -

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4496.txt


 While testing the LRU cache during the scanning I noticed quite some churn in 
 the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
 found that HFile V2 always caches blocks in the LRU cache regardless of the 
 cacheBlocks setting.
 Here's a trace (from Eclipse) showing the problem:
 HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279   
 HFileReaderV2.readBlockData(long, long, int, boolean) line: 219   
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, 
 HFileBlock) line: 191  
 HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502   
 HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539  
 StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
 StoreFileScanner.reseek(KeyValue) line: 110   
 KeyValueHeap.reseek(KeyValue) line: 255   
 StoreScanner.reseek(KeyValue) line: 409   
 StoreScanner.next(List<KeyValue>, int) line: 304
 KeyValueHeap.next(List<KeyValue>, int) line: 114
 KeyValueHeap.next(List<KeyValue>) line: 143
 HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
 HRegion$RegionScannerImpl.nextInternal(int) line: 2722
 HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
 HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
 HRegionServer.next(long, int) line: 2092  
 Every scanner.next causes a reseek, which eventually causes a call to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
 cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
 HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
 The fix is not immediately clear, unless we want to pass cacheBlocks to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
 HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly 
 as readBlockData should not care about caching.
 Avoiding caching during scans is somewhat important for us.
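
To make the problem concrete, here is a small sketch of the option the description calls ugly: forwarding the scanner's flag through the index lookup instead of hardcoding true. `SketchBlockReader` and its methods are hypothetical and heavily simplified; they only borrow names from the trace and are not the HFileReaderV2 code or the eventual fix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader; names echo the discussion but this is not HBase code.
class SketchBlockReader {
    private final Map<Long, byte[]> blockCache = new HashMap<>();

    // Reads a block and caches it only when the caller asks for it.
    byte[] readBlock(long offset, boolean cacheBlock) {
        byte[] cached = blockCache.get(offset);
        if (cached != null) {
            return cached;
        }
        byte[] block = loadFromDisk(offset);
        if (cacheBlock) {
            blockCache.put(offset, block);
        }
        return block;
    }

    // The index lookup forwards the scanner's flag instead of passing true.
    byte[] seekToDataBlock(long offset, boolean cacheBlocks) {
        return readBlock(offset, cacheBlocks);
    }

    int cachedBlockCount() { return blockCache.size(); }

    // Placeholder for real file I/O.
    private byte[] loadFromDisk(long offset) {
        return new byte[] { (byte) offset };
    }
}
```

With the flag forwarded, a scan that sets it to false leaves the cache empty, which is the behaviour the reporter expected.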





[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-27 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116016#comment-13116016
 ] 

Jean-Daniel Cryans commented on HBASE-3130:
---

Also I'd add that I was able to test the patch (on 0.90) and it really works, 
proof:

First it loses the connection:
{quote}
2011-09-27 16:44:54,984 WARN 
org.apache.hadoop.hbase.replication.ReplicationPeer: connection to cluster: 
10.10.30.7:2181:/hbase1-0x132ad0f29d70017 connection to cluster: 
10.10.30.7:2181:/hbase1-0x132ad0f29d70017 received expired from ZooKeeper, 
aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired
at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-09-27 16:44:54,984 INFO org.apache.zookeeper.ClientCnxn: EventThread shut 
down
{quote}

Then later when it tries to replicate it tries to talk to ZK again and it works 
after a reload:

{quote}
2011-09-27 16:49:03,738 DEBUG 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Since we 
are unable to replicate, sleeping 1000 times 10
2011-09-27 16:49:13,738 WARN 
org.apache.hadoop.hbase.replication.ReplicationZookeeper: Lost the ZooKeeper 
connection for peer 1
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /hbase1/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:389)
at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndGetAsAddresses(ZKUtil.java:355)
at 
org.apache.hadoop.hbase.replication.ReplicationZookeeper.fetchSlavesAddresses(ReplicationZookeeper.java:268)
at 
org.apache.hadoop.hbase.replication.ReplicationZookeeper.getSlavesAddresses(ReplicationZookeeper.java:239)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:205)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341)
2011-09-27 16:49:13,772 INFO org.apache.zookeeper.ZooKeeper: Initiating client 
connection, connectString=10.10.30.7:2181 sessionTimeout=2 
watcher=connection to cluster: 10.10.30.7:2181:/hbase1
2011-09-27 16:49:13,773 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server /10.10.30.7:2181
2011-09-27 16:49:14,111 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to hbasedev.sfo.stumble.net/10.10.30.7:2181, initiating session
2011-09-27 16:49:14,140 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server hbasedev.sfo.stumble.net/10.10.30.7:2181, 
sessionid = 0x132ad0f29d70024, negotiated timeout = 2
{quote}

 [replication] ReplicationSource can't recover from session expired on remote 
 clusters
 -

 Key: HBASE-3130
 URL: https://issues.apache.org/jira/browse/HBASE-3130
 Project: HBase
  Issue Type: Bug
  Components: replication
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Chris Trezzo
 Fix For: 0.92.0

 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt


 Currently ReplicationSource cannot recover when its zookeeper connection to 
 its remote cluster expires. HLogs are still being tracked, but a cluster 
 restart is required to continue replication (or a rolling restart).





[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116019#comment-13116019
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--



bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   There is a lot of excellence in here.  I'm going to look at the code 
itself with this diff applied to try and understand where/how CT is now being 
used.  I'm a little unclear between the lines you'd like to draw and the lines 
you actually draw in this diff.
bq.   
bq.   Great work!

Sorry about that.  Let me get you a better answer to your question.  I think it's 
not very clear because I myself was unclear on the scope of CT when I started in.  
What this patch has here is an attempt at shutting down CT scope with 
subsequent work put off for HBASE-4495.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 
513
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line513
bq.  
bq.   maybe note here that you should not be synchronized on metaAvailable 
(and it will do so in the method)... the next method below is nicely clear in 
this regard

Will do


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 
575
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line575
bq.  
bq.   verify the connection works, and also that the server is actually 
hosting the region we think it is... the comment makes me think this is looking 
up which server hosts the passed region but it's just verifying if we can 
connect to the server we think is hosting the region and verifies whether it's 
hosting it or not (so this fails if we can't connect or if the region is not on 
this server)

Good point.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java, line 194
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45908#file45908line194
bq.  
bq.   i'm still trying to understand exactly what you've changed and what 
is still a TODO, but this looks much nicer now! :)

In the above, we'd get the HRegionInterface and do the invocation on the actual 
Interface.  The alternative steps back and asks an HTable instance to do the 
work.  If there is an issue with the former, we'd just let the exception out.  In the 
alternative, we'll do HTable retries before we let the exception out (and the 
retries are boosted in server-context).
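
The difference between the two approaches can be sketched with a generic retry helper. `RetryingCaller` is a hypothetical name and this is not how HTable implements its retries; it only shows that the exception is let out after the attempts are used up rather than on the first failure.

```java
import java.util.function.Supplier;

// Hypothetical retry helper; not the HTable implementation.
final class RetryingCaller {
    private RetryingCaller() {}

    // Invokes the call up to maxAttempts times and rethrows the last failure
    // once every attempt has failed.
    static <T> T callWithRetries(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;  // remember the failure and try again
            }
        }
        throw last;
    }
}
```

Calling the operation directly corresponds to `maxAttempts` of 1; the alternative corresponds to a larger value.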


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java, 
line 2
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45909#file45909line2
bq.  
bq.   missing copyright and year?

Turns out that copyright is not actually needed 
https://issues.apache.org/jira/browse/HBASE-3870


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/Result.java, line 568
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45915#file45915line568
bq.  
bq.   seems like this should be moved to static methods in a helper class 
rather than exposing to our client-side Result

OK.  It was kinda nice being able to do result.getServerNameFromCatalogResult.  
I suppose it does pollute.  I can move it back to MetaReader since that seems 
like next best place.  You are right shouldn't be generally public stuff.  Will 
fix.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 70
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45918#file45918line70
bq.  
bq.   this seems like an important public method.  i like the rename and 
your additional comments, but maybe we should add more.  default behavior is to 
use a cached location, if one is not found, it is looked up in a catalog.  
setting reload to true bypasses the cache and forces the lookup to a catalog.  
and then, under what cases do we get an exception?  does this verify that the 
server is actually hosting the region?  or it just looks up in the catalog (i 
guess failure there could cause IOE) and if it finds something, just returns a 
connection to that RS (w/ no verification)... correct?

Will look into this.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, lines 
1041-1047
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45921#file45921line1041
bq.  
bq.   why do you remove the javadoc on this method?

Will look into this.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java, 
line 132
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45931#file45931line132
bq.  
bq.   huh? :)

Let me fix.


bq.  On 2011-09-27 23:36:00, 

[jira] [Created] (HBASE-4503) Purge deprecated HBaseClusterTestCase

2011-09-27 Thread stack (Created) (JIRA)
Purge deprecated HBaseClusterTestCase
-

 Key: HBASE-4503
 URL: https://issues.apache.org/jira/browse/HBASE-4503
 Project: HBase
  Issue Type: Improvement
Reporter: stack


It could gain us a few minutes on overall test run in the cases where we don't 
spin up a cluster for each test.





[jira] [Updated] (HBASE-4503) Purge deprecated HBaseClusterTestCase

2011-09-27 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4503:
-

Attachment: 4503.txt

Here are a few tests converted.  Few more to do.

 Purge deprecated HBaseClusterTestCase
 -

 Key: HBASE-4503
 URL: https://issues.apache.org/jira/browse/HBASE-4503
 Project: HBase
  Issue Type: Improvement
Reporter: stack
 Attachments: 4503.txt


 It could gain us a few minutes on overall test run in the cases where we 
 don't spin up a cluster for each test.





[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-27 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116021#comment-13116021
 ] 

Jonathan Hsieh commented on HBASE-4489:
---

A few thoughts:

I agree with jgray -- I think one fix should correct the MD5 string split so 
that it splits from 0x00.. 0xff.  I think there could be another separate patch 
that adds the UniformSplit.  

I'd be wary of changing the default, especially if this is meant to go into a 
0.90.x branch.  It looks like as a user you can add and use the UniformSplit by 
changing the conf option. 

Ideally patches with new functionality or changing semantics would also 
introduce corresponding tests.  There were no tests on the previous code, and 
no tests on the newly introduced code.  Adding tests, especially around edge 
cases, could accommodate Ted's concerns, and it doesn't really hurt to be extra 
defensive when coding non-performance-sensitive code.



 Better key splitting in RegionSplitter
 --

 Key: HBASE-4489
 URL: https://issues.apache.org/jira/browse/HBASE-4489
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
 Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch


 The RegionSplitter utility allows users to create a pre-split table from the 
 command line or do a rolling split on an existing table. It supports 
 pluggable split algorithms that implement the SplitAlgorithm interface. The 
 only/default SplitAlgorithm is one that assumes keys fall in the range from 
 ASCII string  to ASCII string 7FFF. This is not a sane 
 default, and seems useless to most users. Users are likely to be surprised by 
 the fact that all the region splits occur in the byte range of ASCII 
 characters.
 A better default split algorithm would be one that evenly divides the space 
 of all bytes, which is what this patch does. Making a table with five regions 
 would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
 \xFF\xFF.
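
The arithmetic behind those boundaries can be sketched as follows. `UniformSplitSketch` is a hypothetical helper written for illustration, not the code in the attached patch, and the fixed key width is an assumption. Each boundary is the largest key value multiplied by i/n, encoded big-endian.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the RegionSplitter code from the patch.
final class UniformSplitSketch {
    private UniformSplitSketch() {}

    // Returns numRegions boundaries that evenly divide the space of all keys
    // of the given byte width; the last boundary is the all-0xFF key.
    static List<byte[]> boundaries(int numRegions, int keyWidth) {
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyWidth).subtract(BigInteger.ONE);
        List<byte[]> result = new ArrayList<>();
        for (int i = 1; i <= numRegions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(numRegions));
            result.add(toFixedWidth(point, keyWidth));
        }
        return result;
    }

    // Big-endian encoding, left-padded with zero bytes to the given width.
    private static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();  // may carry a leading sign byte
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    // Renders a key in the \xNN notation used in the description.
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

For five regions and two-byte keys, `boundaries(5, 2)` yields \x33\x33, \x66\x66, \x99\x99, \xCC\xCC and \xFF\xFF, the same pattern as the example in the description.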





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116023#comment-13116023
 ] 

stack commented on HBASE-4448:
--

I agree with your "Every little bit helps."  (I am working on hbase-4503 after 
going through your excel spreadsheet).  I'm still going through this issue.  
On the factory, I'm not sure yet.  I'm unclear on how it'll be passed from test 
to test and also how we prevent one test polluting another test's run.  I also 
need to see for myself how this approach is different from the approach where 
we spin up the cluster at start of a suite and then do a bunch of tests as we 
do in TestFromClientSide and TestAdmin. 

Will be back with more comments.

A two hour test run is killing us.  It means we only get max of 12 runs a day 
(and never get that many anyways)... and we have four test branches run up on 
hudson so the long running test suite is a progress killer (as well as a 
productivity killer).  Reminds me of the microsoft build.

 HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
 instances across unit tests
 -

 Key: HBASE-4448
 URL: https://issues.apache.org/jira/browse/HBASE-4448
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: HBaseTestingUtilityFactory.java, 
 hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, 
 java_HBASE_4448_v2.patch


 Setting up and tearing down HBaseTestingUtility instances in unit tests is 
 very expensive.  On my MacBook it takes about 10 seconds to set up a 
 MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
 test classes that use this facility, that's a lot of time in the build.
 This factory assumes that the JVM is being re-used across test classes in the 
 build, otherwise this pattern won't work. 
 I don't think this is appropriate for every use, but I think it can be 
 applicable in a great many cases - especially where developers just want a 
 simple MiniCluster with 1 slave.





[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116024#comment-13116024
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2124
---



src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
https://reviews.apache.org/r/2065/#comment4913

This doesn't seem right.


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan, etc.).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic, but I couldn't help it: since I was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving the meta migration code out to its
bq.  own class rather than have it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.Clean up.  Shutdown access on some of these unused methods.  Don't
bq.let out HRegionInterface instances in particular since we are going
bq.away from raw HRI use to instead use a connection with retries:
bq.i.e. HTable.
bq.  
bq.Comments on state of this class. Javadoc edits.
bq.getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.in constructor.  Override MetaNodeTracker and on node delete
bq.reset meta location (We used to do this over in MetaNodeTracker
bq.but to do that we had to have a CatalogTracker over in zk package
bq.which is silly -- bad package encapsulation).
bq.  
bq.(waitForRootServer) Renamed getRootServerConnection and change it
bq.from public to package private.
bq.(waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.(getMetaServerConnection) Change from public to package private.
bq.Use MetaReader to read the meta location in root rather than a
bq.raw HRegionInterface so we get retrying.
bq.(remaining, timedout) Added utility methods.
bq.(waitForMetaServer) Changed from public to private.
bq.(resetMetaLocation) Made it synchronized on metaAvailable.
bq.Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.Refactor to use HTable instead of raw HRegionInterface so we get
bq.retrying.  For each operation we get an HTable, use it, then close it.
bq.(putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.(updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.class since these classes are for a one-time migration only.
bq.  
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.New class that holds all Meta* methods updating meta table used
bq.doing the one-time migration done to meta on startup.  This class
bq.is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.Retrofit methods in here to use fullScan methods with Visitor.
bq.(getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.  getCatalogRegionNameForRegion) Removed.
bq.(fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.(fullScanOfResults) Renamed as fullScan override.
bq.(fullScanOfRoot) Added as deprecated. We should be doing
bq.this against zk.
bq.(metaRowToRegionPair, getServerNameFromResult) Moved to Result

[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116032#comment-13116032
 ] 

Jonathan Gray commented on HBASE-4497:
--

I was just discussing this scenario with Dhruba a few days back.  There's 
definitely a race condition here and I don't see a trivial fix.

We use HLog IO-fencing to ensure that edits don't slip into an HLog after a 
server is considered dead by the Master.  But the Master has no way to prevent 
this META update from slipping in.

We need to make some modification to how the master can safely timeout an 
OPENING.  One possibility is for the master to require either an acknowledgment 
from the RS before moving the region elsewhere or for the RS to die.  It seems 
unlikely that we will actually see the RS to Master acknowledgment since 
OPENING taking too long is usually a sign of brokenness or the RS being backed 
up, I think.  But in any case I'd imagine some kind of OPEN_CANCEL_REQUESTED 
state that the Master transitions the node to and only when the RS transitions 
to OPEN_CANCELED or OFFLINE or something, then it's safe to reassign elsewhere.

I think this design still has a hole in it though because there are scenarios 
where the RS doesn't actually die but for some reason doesn't OPEN or ack the 
cancel. 

Another option would be to do the RS-performed META edits using a CheckAndPut 
rather than a straight Put.  Or we could move META editing back to the Master 
where it's easy to do things atomically :)

The CheckAndPut idea is kind of neat but we'd probably have to send more data 
on the OPEN_RPC.  For example, the existing server start code or server name + 
start code or something guaranteed unique (guaranteed that a conflicting RS 
opening stuff wouldn't be able to use the same thing).  Then the atomicity is 
on the META region.
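
A minimal sketch of the CheckAndPut idea, assuming an in-memory map stands in for the META row and that the master records a unique "expected opener" token (server name + start code) before sending the open request. The class name, the PENDING:/OPEN: prefixes and the method names are all hypothetical illustrations; none of this is HBase code or the HBase API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for per-region META state; hypothetical, for illustration only. */
public class MetaCheckAndPutSketch {

    // region name -> state; "PENDING:<token>" means the master expects <token> to open it.
    private final ConcurrentMap<String, String> meta = new ConcurrentHashMap<>();

    /** Master side: record which server (name + start code) is expected to open the region. */
    public void assign(String region, String serverToken) {
        meta.put(region, "PENDING:" + serverToken);
    }

    /**
     * RS side: publish the location only if the master still expects this server.
     * The compare and the write happen atomically, like a checkAndPut on the META row.
     */
    public boolean reportOpened(String region, String serverToken) {
        return meta.replace(region, "PENDING:" + serverToken, "OPEN:" + serverToken);
    }

    /** Returns the current state recorded for the region, or null if there is none. */
    public String lookup(String region) {
        return meta.get(region);
    }
}
```

In the scenario from this issue, the master would call assign for RS1 and then, after the timeout, assign for RS2. RS2's reportOpened succeeds, and the late reportOpened from RS1 returns false instead of overwriting the entry, because the expected token no longer matches.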

 If region opening fails after updating META HBCK reports it as inconsistent 
 and scanning the region throws NSRE
 ---

 Key: HBASE-4497
 URL: https://issues.apache.org/jira/browse/HBASE-4497
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

 As per the discussion in the mail chain "HBCK reporting of possible mismatch 
 in RS assignment", this JIRA is created.
 Consider two RSs: RS1 and RS2.
 A region tries to open in RS1, but it takes a while.  RS1 has still not 
 updated META and transitioned the node from OPENING to OPENED,
 so the timeout assigns the region to RS2.  RS2 successfully updates META and 
 opens the region.
 Now RS1 tries to act on the region by first updating META and then 
 transitioning the node from OPENING to OPENED.
 RS1 transitioning the node from OPENING to OPENED will fail, but the META entry 
 will have RS1 as the latest.
 Now HBCK reports this as an inconsistency, and if we try to scan the region we 
 get NotServingRegionException.





[jira] [Updated] (HBASE-4209) The HBase hbase-daemon.sh SIGKILLs master when stopping it

2011-09-27 Thread Roman Shaposhnik (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated HBASE-4209:


Attachment: HBASE-4209.final.patch.txt

Latest version of the patch attached. Unfortunately, it has grown since the last 
time I submitted it, in order to incorporate feedback from this JIRA. 

All tests pass on my local workstation.

 The HBase hbase-daemon.sh SIGKILLs master when stopping it
 --

 Key: HBASE-4209
 URL: https://issues.apache.org/jira/browse/HBASE-4209
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Attachments: HBASE-4209.final.patch.txt, HBASE-4209.patch.txt


 There's a bit of code in hbase-daemon.sh that causes the HBase master to be 
 SIGKILLed when stopping it rather than trying SIGTERM (like it does for other 
 daemons). When HBase is executed in standalone mode (and the only daemon 
 you need to run is master), that causes newly created tables to go missing as 
 unflushed data is thrown out. If there was not a good reason to kill master 
 with SIGKILL, perhaps we can take that special case out and rely on SIGTERM.





[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-27 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116059#comment-13116059
 ] 

Hudson commented on HBASE-4455:
---

Integrated in HBase-0.92 #24 (See 
[https://builds.apache.org/job/HBase-0.92/24/])
HBASE-4455 Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
AssignmentManager (Ming Ma)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/MasterAddressTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java


 Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
 AssignmentManager
 --

 Key: HBASE-4455
 URL: https://issues.apache.org/jira/browse/HBASE-4455
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0


 Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
 wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
 RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. 
 regions aren't in regions in transition from the AssignmentManager point of 
 view, but they aren't assigned to any region server. Here are the issues.
 1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is 
 invoked to check if it contains the -ROOT- region. That is due to the long delay 
 of ZK notification and the async nature of the system. Here is an example: even 
 though the new root region server sea-lab-1,60020,1316380133656 is set at T2, at 
 T3, during the shutdown process for sea-lab-1,60020,1316380133656, the root 
 location still points to the old server sea-lab-3,60020,1316380037898.
 T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6-0x1327e43175e Retrieved 29 byte(s) of data from znode 
 /hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898
 T2: 2011-09-18 14:08:57,173 INFO 
 org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region 
 location in ZooKeeper as sea-lab-1,60020,1316380133656
 T3: 2011-09-18 14:10:26,393 DEBUG 
 org.apache.hadoop.hbase.master.ServerManager: 
 Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler 
 to be executed, root=false, meta=true, current Root Location: 
 sea-lab-3,60020,1316380037898
 T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6-0x1327e43175e Retrieved 29 byte(s) of data from znode 
 /hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656
 2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or 
 .META. availability could be blocked. If, meanwhile, the new server that 
 -ROOT- or .META. is being assigned to is restarted, another instance of 
 MetaServerShutdownHandler is queued. Eventually, all 
 MetaServerShutdownHandler worker threads are filled up. It looks like 
 HBASE-4245.





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread Doug Meil (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116061#comment-13116061
 ] 

Doug Meil commented on HBASE-4448:
--

re:  passed from test to test

This is described in the issue description:  this approach depends on JVM 
re-use, otherwise, it won't work.  The factory has a cache of MiniClusters 
based on the number of slaves, and it can be added down the road for DFS 
cluster or ZkCluster as needed.

re:  pollution

I hate pollution!  :-)

There are two cases:  intra-test and inter-test.
Inter-test is handled via all the tables getting disabled and whacked when the 
minicluster is returned to the factory.  
Intra-test (i.e., multiple test methods in the same class) is the same as it is 
now.  There is no automatic cleanup between test methods in the same class even 
without this utility, so it's no worse.  
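
To make the two points above concrete, here is a rough sketch of the caching and cleanup behaviour, assuming a single JVM is re-used across test classes. It is not the attached HBaseTestingUtilityFactory.java: the class name ReusableClusterFactory, the generic cluster type C, and the starter/cleaner callbacks are hypothetical stand-ins (a real version would start a MiniCluster and would disable and drop all tables on return).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.IntFunction;

/** Hypothetical sketch of a cluster cache keyed by number of slaves. */
public class ReusableClusterFactory<C> {

    // Idle clusters, keyed by the number of slaves they were started with.
    private final Map<Integer, Deque<C>> idle = new HashMap<>();
    private final IntFunction<C> starter; // expensive: starts a cluster with n slaves
    private final Consumer<C> cleaner;    // wipes test state, e.g. drops all tables

    public ReusableClusterFactory(IntFunction<C> starter, Consumer<C> cleaner) {
        this.starter = starter;
        this.cleaner = cleaner;
    }

    /** Hands out an idle cluster of the requested size, or starts one if none is cached. */
    public synchronized C get(int numSlaves) {
        Deque<C> pool = idle.get(numSlaves);
        if (pool != null && !pool.isEmpty()) {
            return pool.pop();
        }
        return starter.apply(numSlaves);
    }

    /** Cleans the cluster (inter-test isolation) and caches it for the next test class. */
    public synchronized void giveBack(int numSlaves, C cluster) {
        cleaner.accept(cluster);
        idle.computeIfAbsent(numSlaves, k -> new ArrayDeque<>()).push(cluster);
    }
}
```

A test class would call get(1) in its setup and giveBack(1, cluster) in its teardown. Nothing in this sketch cleans up between test methods of the same class, which mirrors the intra-test case described above.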
  





[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-27 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116072#comment-13116072
 ] 

Lars Hofhansl commented on HBASE-3130:
--

You sound like you are surprised :)




 [replication] ReplicationSource can't recover from session expired on remote 
 clusters
 -

 Key: HBASE-3130
 URL: https://issues.apache.org/jira/browse/HBASE-3130
 Project: HBase
  Issue Type: Bug
  Components: replication
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Chris Trezzo
 Fix For: 0.92.0

 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt


 Currently ReplicationSource cannot recover when its zookeeper connection to 
 its remote cluster expires. HLogs are still being tracked, but a cluster 
 restart is required to continue replication (or a rolling restart).





[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-27 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116073#comment-13116073
 ] 

Jean-Daniel Cryans commented on HBASE-3130:
---

More like joyful amusement, it's been broken for so long...  we're pushing this 
in prod really soon.





[jira] [Commented] (HBASE-4062) Multi-column scanner unit test

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116075#comment-13116075
 ] 

jirapos...@reviews.apache.org commented on HBASE-4062:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1012/
---

(Updated 2011-09-28 02:16:10.418401)


Review request for hbase and Michael Stack.


Changes
---

Linking the HBASE-4062 JIRA to this review.


Summary
---

Adding a unit test for the multi-column scanner. We are using this to test 
an optimization we are making to multi-column scans using row-column Bloom 
filters. The scanner creates multiple StoreFiles for a single column family, 
each containing a randomized set of columns with different timestamps, and then 
tests scanning through the whole region with all possible sets of columns 
specified in the query. Point deletes (deletes of a specific timestamp) are 
also tested.


This addresses bug HBASE-4062.
https://issues.apache.org/jira/browse/HBASE-4062


Diffs
-

  
src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/1012/diff


Testing
---

Run the unit test. Break the store scanner and make sure the test breaks.


Thanks,

Mikhail



 Multi-column scanner unit test
 --

 Key: HBASE-4062
 URL: https://issues.apache.org/jira/browse/HBASE-4062
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, test
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
  Labels: test
 Attachments: test-multi-column-scanner.patch


 Adding a unit test for the multi-column scanner. We are using this to 
 test an optimization we are making to multi-column scans using row-column 
 Bloom filters. The scanner creates multiple StoreFiles for a single column 
 family, each containing a randomized set of columns with different 
 timestamps, and then tests scanning through the whole region with all 
 possible sets of columns specified in the query. Point deletes (deletes of a 
 specific timestamp) are also tested.





[jira] [Commented] (HBASE-4062) Multi-column scanner unit test

2011-09-27 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116077#comment-13116077
 ] 

Mikhail Bautin commented on HBASE-4062:
---

@Michael: apparently the patch is already committed -- can I close this JIRA?





[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116101#comment-13116101
 ] 

Lars Hofhansl commented on HBASE-4496:
--

That sounds like a plan Jon. Let me know if you'd like me to have a look at the 
combined patch.

 HFile V2 does not honor setCacheBlocks when scanning.
 -

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4496.txt


 While testing the LRU cache during the scanning I noticed quite some churn in 
 the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
 found that HFile V2 always caches blocks in the LRU cache regardless of the 
 cacheBlocks setting.
 Here's a trace (from Eclipse) showing the problem:
 HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279   
 HFileReaderV2.readBlockData(long, long, int, boolean) line: 219   
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, 
 HFileBlock) line: 191  
 HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502   
 HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539  
 StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
 StoreFileScanner.reseek(KeyValue) line: 110   
 KeyValueHeap.reseek(KeyValue) line: 255   
 StoreScanner.reseek(KeyValue) line: 409   
 StoreScanner.next(List<KeyValue>, int) line: 304  
 KeyValueHeap.next(List<KeyValue>, int) line: 114  
 KeyValueHeap.next(List<KeyValue>) line: 143   
 HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774  
 HRegion$RegionScannerImpl.nextInternal(int) line: 2722
 HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
 HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699 
 HRegionServer.next(long, int) line: 2092  
 Every scanner.next causes a reseek, which eventually causes a call to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
 cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
 HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
 The fix is not immediately clear, unless we want to pass cacheBlocks to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
 HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly 
 as readBlockData should not care about caching.
 Avoiding caching during scans is somewhat important for us.





[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-27 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116106#comment-13116106
 ] 

dhruba borthakur commented on HBASE-4477:
-

@Andrew: I would like to do this via a coprocessor API. Can I change the 
signature of RegionObserver.prePut() to take in two additional arguments: a 
WALEdit and a Put object? If so, shall I mark the existing prePut coprocessor API 
as deprecated?

Does anybody else have any opinions on where to put the code that implements this 
functionality? One place is 
org.apache.hadoop.hbase.coprocessor.library.walMetadata. 

 Ability for an application to store metadata into the transaction log
 -

 Key: HBASE-4477
 URL: https://issues.apache.org/jira/browse/HBASE-4477
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: hlogMetadata1.txt


 MySQL allows an application to store an arbitrary blob along with each 
 transaction in its transaction logs. This JIRA is a request for a similar 
 feature in HBase.
 The use case is as follows: An application in data center A stores a blob 
 of data along with each transaction. Replication software picks up these 
 blobs from the transaction logs in A and hands them to another instance of the 
 same application running in a remote data center B. The application in B is 
 responsible for applying them to the remote HBase cluster (and also for handling 
 conflict resolution, if any).





[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116134#comment-13116134
 ] 

ramkrishna.s.vasudevan commented on HBASE-4497:
---

I got this problem for 3 regions before HBASE-4452 went in. Now HBASE-4452 will 
definitely reduce the probability of this happening.







[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116139#comment-13116139
 ] 

dhruba borthakur commented on HBASE-4497:
-

Can somebody please elaborate on any disadvantages if we make the master the only 
entity that can update META about where the region is being served from?





[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread Ming Ma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116151#comment-13116151
 ] 

Ming Ma commented on HBASE-4497:


checkAndPut might work. We would use checkAndPut on both ZK and HBase.
There are other bugs due to the lack of strong synchronization on the ZK nodes
between AssignmentManager and the RSs. Here is another scenario: a race between
the AM timeoutMonitor and the first RS's openRegion operation.

RS1 successfully transitions to the OPENED state at around the same time as
timeoutMonitor kicks in. timeoutMonitor reads the data from ZK right before RS1
sets it to OPENED, so timeoutMonitor sees RS_ZK_REGION_OPENING and tries to
reassign the region. In that case, we end up with the same region on two
RSs.
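
The race above is a stale read followed by a write that does not check whether
the node changed in between. A minimal sketch of the difference, using an
in-memory stand-in for a versioned znode: VersionedNode, forceSet, setIfVersion
and TimeoutRaceDemo are hypothetical names, not ZooKeeper or HBase API, and the
state strings are used only as labels. The conditional write is modelled on the
expected-version argument that ZooKeeper's setData accepts.

```java
/** In-memory stand-in for a versioned znode; not the real ZooKeeper client. */
final class VersionedNode {
    private String state;
    private int version = 0;

    VersionedNode(String initialState) { this.state = initialState; }

    synchronized String getState() { return state; }
    synchronized int getVersion() { return version; }

    /** Unconditional write: ignores whatever happened since the caller last read. */
    synchronized void forceSet(String newState) {
        state = newState;
        version++;
    }

    /** Conditional write: applied only if the node is still at expectedVersion. */
    synchronized boolean setIfVersion(String newState, int expectedVersion) {
        if (version != expectedVersion) {
            return false;
        }
        state = newState;
        version++;
        return true;
    }
}

final class TimeoutRaceDemo {
    /** Replays the interleaving; returns true if the monitor's reassignment was applied. */
    static boolean monitorReassigns(VersionedNode node, boolean conditional) {
        // 1. timeoutMonitor reads the node while it is still OPENING.
        String seen = node.getState();
        int seenVersion = node.getVersion();
        // 2. RS1 finishes and moves the node to OPENED before the monitor acts.
        node.setIfVersion("RS_ZK_REGION_OPENED", node.getVersion());
        // 3. timeoutMonitor acts on its stale snapshot.
        if (!"RS_ZK_REGION_OPENING".equals(seen)) {
            return false;
        }
        if (conditional) {
            return node.setIfVersion("M_ZK_REGION_OFFLINE", seenVersion);
        }
        node.forceSet("M_ZK_REGION_OFFLINE");
        return true;
    }

    public static void main(String[] args) {
        System.out.println(monitorReassigns(new VersionedNode("RS_ZK_REGION_OPENING"), false));
        System.out.println(monitorReassigns(new VersionedNode("RS_ZK_REGION_OPENING"), true));
    }
}
```

With the unconditional write the monitor overwrites RS1's OPENED state and the
region can be handed to a second RS. With the version check the monitor's write
is rejected and the node stays OPENED.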


Would the following work?

1. ZKAssign.transitionNode has some sort of checkAndPut semantics when it tries
to enforce that the original state is the expected one. However, it isn't atomic:
it first calls getData on ZK and then compares. Instead, we can use ZK's
checkAndPut-style conditional update (setData with an expected version) to
enforce atomicity.
2. Introduce a ZK-based global AtomicInteger for region operations; e.g., each
openRegion operation will use a new, incremented region_operation_ID. Each
openRegion operation will validate its own ID against the ZK state via checkAndPut.
Thus one of the two openRegion operations on the RSs will fail.
3. For the HBase .META. update, we can put region_operation_ID into the
table and enforce that a new update's region operation ID is greater than the
previous one for a given region. That way the older RS won't be able to
update the table. We will need to introduce a new API for HBase,
similar to checkAndPut, more like checkGreaterAndPut.
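
Point 3 can be sketched as follows. This is an illustration under assumptions,
not an existing HBase API: MetaLocationTable and checkGreaterAndPut are
hypothetical, the table is an in-memory map standing in for .META., and the
operation IDs are assumed to be handed out in increasing order by something like
the counter in point 2.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the region-location rows of .META.; hypothetical. */
final class MetaLocationTable {
    private static final class Entry {
        final long operationId;
        final String server;

        Entry(long operationId, String server) {
            this.operationId = operationId;
            this.server = server;
        }
    }

    private final Map<String, Entry> rows = new HashMap<>();

    /**
     * Writes the new location only if operationId is strictly greater than the
     * ID stored for this region (or if the region has no entry yet).
     */
    synchronized boolean checkGreaterAndPut(String region, long operationId, String server) {
        Entry current = rows.get(region);
        if (current != null && operationId <= current.operationId) {
            return false; // stale or duplicate operation: leave the row untouched
        }
        rows.put(region, new Entry(operationId, server));
        return true;
    }

    synchronized String serverFor(String region) {
        Entry e = rows.get(region);
        return e == null ? null : e.server;
    }

    public static void main(String[] args) {
        MetaLocationTable meta = new MetaLocationTable();
        // RS1 was assigned with operation ID 1, RS2 (after the timeout) with ID 2.
        System.out.println(meta.checkGreaterAndPut("region-A", 2L, "RS2")); // RS2 updates first
        System.out.println(meta.checkGreaterAndPut("region-A", 1L, "RS1")); // RS1's late update
        System.out.println(meta.serverFor("region-A"));
    }
}
```

Because RS1 carries the older ID, its late update is rejected and the row keeps
pointing at RS2, whichever order the two writes arrive in.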






[jira] [Commented] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes

2011-09-27 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116155#comment-13116155
 ] 

dhruba borthakur commented on HBASE-4459:
-

@Todd: which varint are you referring to here?

 HbaseObjectWritable code is a byte, we will eventually run out of codes
 ---

 Key: HBASE-4459
 URL: https://issues.apache.org/jira/browse/HBASE-4459
 Project: HBase
  Issue Type: Bug
  Components: io
Reporter: Jonathan Gray
Priority: Critical
 Fix For: 0.94.0


 There are about 90 classes/codes in HbaseObjectWritable currently and 
 Byte.MAX_VALUE is 127.  In addition, anyone wanting to add custom classes but 
 not break compatibility might want to leave a gap before using codes and 
 that's difficult in such limited space.
 Eventually we should get rid of this pattern that makes compatibility 
 difficult (better client/server protocol handshake) but we should probably at 
 least bump this to a short for 0.94.





[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116160#comment-13116160
 ] 

ramkrishna.s.vasudevan commented on HBASE-4497:
---

I am not very familiar with ZK.
Your 3rd point looks good to me, Ming.
I think HBASE-4015 may handle the race you described above.
{code}
if (hijack && null != curDataInZNode) {
  EventType eventType = curDataInZNode.getEventType();
  // Bail out (return -1) when the znode is already CLOSING, CLOSED or OPENED.
  if (eventType.equals(EventType.RS_ZK_REGION_CLOSING)
      || eventType.equals(EventType.RS_ZK_REGION_CLOSED)
      || eventType.equals(EventType.RS_ZK_REGION_OPENED)) {
    return -1;
  }
}
{code}
Also, if the timeout reassignment succeeds, the transition from OPENING to
OPENED will fail on the RS.
There may still be an entry in META pointing to the old RS.

