date:20110927


 [ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4485:
---

Attachment: repro_bug-4485.diff

Here is a way to reproduce the bug.

It uses an otherwise unnecessary sleep to make the failure happen. It is not 
intended to be included in the final diff/submission.

 Eliminate window of missing Data
 

 Key: HBASE-4485
 URL: https://issues.apache.org/jira/browse/HBASE-4485
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4485-v1.diff, repro_bug-4485.diff


 After incorporating v11 of the 2856 fix, we discovered that we are still 
 having some ACID violations.
 This time, however, the problem is not about including newer updates, but 
 about missing older updates that should be included.
 Here is what seems to be happening.
 There is a race condition in StoreScanner.getScanners():
   private List<KeyValueScanner> getScanners(Scan scan,
       final NavigableSet<byte[]> columns) throws IOException {
     // First the store file scanners
     List<StoreFileScanner> sfScanners = StoreFileScanner
         .getScannersForStoreFiles(store.getStorefiles(), cacheBlocks,
             isGet, false);
     List<KeyValueScanner> scanners =
         new ArrayList<KeyValueScanner>(sfScanners.size()+1);
     // include only those scan files which pass all filters
     for (StoreFileScanner sfs : sfScanners) {
       if (sfs.shouldSeek(scan, columns)) {
         scanners.add(sfs);
       }
     }
     // Then the memstore scanners
     if (this.store.memstore.shouldSeek(scan)) {
       scanners.addAll(this.store.memstore.getScanners());
     }
     return scanners;
   }
 If, for example, a call to Store.updateStorefiles() happens between
 store.getStorefiles() and this.store.memstore.getScanners(), then it is
 possible that a new HFile was created that is not seen by the StoreScanner,
 and the data is not present in the Memstore.snapshot either.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4485) Eliminate window of missing Data


 [ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4485:
---

Attachment: 4485-v2.diff

Fix the issue using locks to ensure that Store.updateStoreFiles does not get 
called between StoreScanner getting the list of store files and it getting 
the MemStoreScanner.
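
The locking idea can be sketched roughly as follows. This is not the 
4485-v2 patch itself: SketchStore, its fields, and the use of plain strings 
in place of store files and key-values are hypothetical stand-ins, and 
taking the write lock in add() is a simplification made only to keep the 
example short. The point it illustrates is that the reader collects the 
file view and the memstore view under one read lock, so a flush cannot land 
between the two reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for Store; strings stand in for key-values.
final class SketchStore {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<List<String>> storeFiles = new ArrayList<>();
  private final List<String> memstore = new ArrayList<>();

  // Simplification: writes take the write lock so the sketch stays thread-safe.
  void add(String kv) {
    lock.writeLock().lock();
    try {
      memstore.add(kv);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Flush: move the memstore contents into a new "file" in one atomic step.
  void flush() {
    lock.writeLock().lock();
    try {
      storeFiles.add(new ArrayList<>(memstore));
      memstore.clear();
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Reader: both views are taken under the same read lock, so no flush
  // can happen after the files are listed but before the memstore is read.
  List<String> snapshotView() {
    lock.readLock().lock();
    try {
      List<String> view = new ArrayList<>();
      for (List<String> file : storeFiles) {
        view.addAll(file);
      }
      view.addAll(memstore);
      return view;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Without the shared lock, a reader could list the files, lose the race to 
flush(), and then read an already emptied memstore, which is the window of 
missing data described in the issue.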

 Eliminate window of missing Data
 

 Key: HBASE-4485
 URL: https://issues.apache.org/jira/browse/HBASE-4485
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, 
 repro_bug-4485.diff



[jira] [Updated] (HBASE-4485) Eliminate window of missing Data

2011-09-27 Thread ramkrishna.s.vasudevan (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4485:
---

Attachment: 4485-v3.diff

Move some functions to Store.java so that we do not have to access store.lock 
from StoreScanner.java.

Passes TestAcidGuarantees on the internal branch (0.89).

Running the test suite on the open source trunk (in progress).

 Eliminate window of missing Data
 

 Key: HBASE-4485
 URL: https://issues.apache.org/jira/browse/HBASE-4485
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, 
 repro_bug-4485.diff



[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently


[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115543#comment-13115543
 ] 

ramkrishna.s.vasudevan commented on HBASE-4492:
---

Hi Ted,
I found the problem, but I don't have the patch with me now.
Please correct me if I am wrong.
As per my analysis, the test case always passes in one scenario: when ROOT 
and META are on the same RS.
Take the test code:
{code}
// Bring the RS hosting ROOT down and the RS hosting META down at once
RegionServerThread rootServer = getServerHostingRoot(cluster);
RegionServerThread metaServer = getServerHostingMeta(cluster);
if (rootServer == metaServer) {
  log("ROOT and META on the same server so killing another random server");
  int i = 0;
  while (rootServer == metaServer) {
    metaServer = cluster.getRegionServerThreads().get(i);
    i++;
  }
}
log("Stopping server hosting ROOT");
rootServer.getRegionServer().stop("Stopping ROOT server");
log("Stopping server hosting META #1");
metaServer.getRegionServer().stop("Stopping META server");
{code}
We try to be cautious when ROOT and META are on the same RS.  If we find ROOT 
and META on the same RS, we just assign metaServer to another RS (we don't 
check whether it is the one hosting META) but still call rootServer.stop().
This stops rootServer, internally closing both the ROOT and META regions on 
it.
Hence the line
{code}
cluster.hbaseCluster.waitOnRegionServer(metaServer);
{code}
This call also returns.
Now, when the ServerShutdownHandler processes this, it can cleanly open a new 
ROOT and META.

In the failure cases, ROOT and META are assigned to different servers.
There is a time gap between the ROOT RS going down and the META RS going 
down. This is where something goes wrong: the ServerShutdownHandler invoked 
for the ROOT RS tries to assign ROOT at the same time as someone else tries 
to assign the ROOT node (who this someone is, I am not clear :)).
Please correct me if I am wrong.  I will try to provide a patch so that this 
issue doesn't occur.




 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
 Attachments: 4492.txt


 I got the following when running test suite on TRUNK:
 {code}
 testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  Time elapsed: 300.28 sec   ERROR!
 java.lang.Exception: test timed out after 30 milliseconds
   at java.lang.Thread.sleep(Native Method)
   at org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
   at org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
 {code}
 I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
 wiped out test output file for the failed test.
 Similar failure can be found on Jenkins:
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/


[jira] [Updated] (HBASE-4485) Eliminate window of missing Data


 [ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4485:
---

Attachment: 4485-v4.diff

Move notifyChangeReaderObservers() outside the lock in all places.

Testing in progress.
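
A rough sketch of what calling observers outside the lock looks like is 
given below. It is not taken from 4485-v4.diff: NotifyOutsideLockSketch, 
ReaderObserver, and the helper methods are hypothetical names chosen for 
illustration. The state change happens under the write lock, and the 
callbacks run only after the lock has been released, so an observer that 
itself needs the lock cannot deadlock against the updater.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a store that notifies scanners of new files.
final class NotifyOutsideLockSketch {
  // Hypothetical observer interface, loosely modelled on a reader-change callback.
  interface ReaderObserver {
    void updateReaders();
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> storeFiles = new ArrayList<>();
  private final List<ReaderObserver> observers = new CopyOnWriteArrayList<>();

  void addObserver(ReaderObserver observer) {
    observers.add(observer);
  }

  boolean isWriteLockedByCurrentThread() {
    return lock.isWriteLockedByCurrentThread();
  }

  int fileCount() {
    lock.readLock().lock();
    try {
      return storeFiles.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  void updateStoreFiles(String newFile) {
    lock.writeLock().lock();
    try {
      storeFiles.add(newFile);   // state change happens under the lock
    } finally {
      lock.writeLock().unlock();
    }
    // Callbacks run after the lock is released.
    for (ReaderObserver observer : observers) {
      observer.updateReaders();
    }
  }
}
```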

 Eliminate window of missing Data
 

 Key: HBASE-4485
 URL: https://issues.apache.org/jira/browse/HBASE-4485
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, 4485-v4.diff, 
 repro_bug-4485.diff



[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer updated HBASE-4344:
---

Attachment: 4344-v12.txt

Combine 4344-v11.txt with 4485-v4.txt.

Running the test suite (in progress).

 Persist memstoreTS to disk
 --

 Key: HBASE-4344
 URL: https://issues.apache.org/jira/browse/HBASE-4344
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.89.20100924

 Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v12.txt, 4344-v2.txt, 
 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, 
 patch-2


 Atomicity can be achieved in two ways: (i) by using multiversion 
 concurrency control (MVCC), or (ii) by ensuring that new writes do not 
 complete until the old reads complete.
 Currently, Memstore uses something along the lines of MVCC (called RWCC, for 
 read-write-consistency-control). But this mechanism is not applied to the 
 key-values written to disk, as they do not include the memstoreTS.
 Let us make the two approaches similar by persisting the memstoreTS along 
 with the key-value when it is written to disk.
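
To illustrate the idea, here is a minimal sketch that assumes each cell 
carries its write number (memstoreTS) whether it sits in the memstore or in 
a flushed file. MemstoreTsSketch, Cell, and visible() are hypothetical names 
and do not reflect the actual KeyValue layout or the 4344 patch. A reader 
with a fixed read point then applies the same visibility rule to both 
sources.

```java
import java.util.ArrayList;
import java.util.List;

final class MemstoreTsSketch {
  // Hypothetical cell: a value plus the write number it was committed under.
  static final class Cell {
    final String row;
    final String value;
    final long memstoreTS;

    Cell(String row, String value, long memstoreTS) {
      this.row = row;
      this.value = value;
      this.memstoreTS = memstoreTS;
    }
  }

  // Returns the cells a reader may see: those written at or before its read point.
  // Because memstoreTS is kept with the cell, the rule is the same for
  // cells read from the memstore and cells read from a flushed file.
  static List<Cell> visible(List<Cell> cells, long readPoint) {
    List<Cell> out = new ArrayList<>();
    for (Cell cell : cells) {
      if (cell.memstoreTS <= readPoint) {
        out.add(cell);
      }
    }
    return out;
  }
}
```

If the write number were dropped at flush time, a cell flushed after the 
reader picked its read point would look like any other on-disk cell and 
could not be filtered out.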


[jira] [Updated] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread Doug Meil (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4448:
-

Attachment: java_HBASE_4448_v2.patch

Fixed point #3.

 HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
 instances across unit tests
 -

 Key: HBASE-4448
 URL: https://issues.apache.org/jira/browse/HBASE-4448
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: HBaseTestingUtilityFactory.java, 
 hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, 
 java_HBASE_4448_v2.patch


 Setting up and tearing down HBaseTestingUtility instances in unit tests is 
 very expensive.  On my MacBook it takes about 10 seconds to set up a 
 MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
 test classes that use this facility, that's a lot of time in the build.
 This factory assumes that the JVM is being re-used across test classes in the 
 build, otherwise this pattern won't work. 
 I don't think this is appropriate for every use, but I think it can be 
 applicable in a great many cases - especially where developers just want a 
 simple MiniCluster with 1 slave.
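
The reuse pattern can be sketched generically as below. This is not the 
attached HBaseTestingUtilityFactory.java: ReusableFactorySketch and its 
methods are hypothetical, and a plain Supplier stands in for the expensive 
MiniCluster setup. It only shows the core idea of handing back an idle 
instance instead of building a new one, which pays off only when the same 
JVM runs several test classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical pool of expensive-to-create test fixtures.
final class ReusableFactorySketch<T> {
  private final Supplier<T> creator;
  private final Deque<T> idle = new ArrayDeque<>();
  private int created = 0;

  ReusableFactorySketch(Supplier<T> creator) {
    this.creator = creator;
  }

  // Hand out an idle instance if one exists; otherwise pay the setup cost once.
  synchronized T acquire() {
    T instance = idle.poll();
    if (instance == null) {
      instance = creator.get();
      created++;
    }
    return instance;
  }

  // Return an instance for later reuse instead of tearing it down.
  synchronized void release(T instance) {
    idle.push(instance);
  }

  synchronized int createdCount() {
    return created;
  }
}
```

A real factory would presumably also need to clean up state left behind by 
the previous test before handing the instance out again; that step is 
omitted here.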


[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions


[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115592#comment-13115592
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2097
---



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4761

I was trying to minimize how many times we do System.currentTimeMillis



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4762

Agreed



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4764

Will fix w/ a comment.  If timeout  is 0, then we do not timeout.  I should 
call it out explicitly.
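
For illustration only, the timeout convention discussed here could look 
like the sketch below. DeadlineSketch is a hypothetical helper, not the 
CatalogTracker code under review, and treating any non-positive timeout as 
"never time out" is an assumption (the comment above only mentions 0). The 
caller reads the clock once per loop iteration and passes the value in, 
which keeps the number of System.currentTimeMillis calls down.

```java
// Hypothetical deadline helper; not the CatalogTracker implementation.
final class DeadlineSketch {
  private final long timeoutMs;
  private final long stopTime;   // computed once, at construction

  DeadlineSketch(long timeoutMs, long nowMs) {
    this.timeoutMs = timeoutMs;
    this.stopTime = nowMs + timeoutMs;
  }

  // Assumption: a timeout of 0 (or less) means "wait forever".
  boolean noTimeout() {
    return timeoutMs <= 0;
  }

  // True once the supplied clock reading has reached the stop time.
  boolean timedOut(long nowMs) {
    return !noTimeout() && nowMs >= stopTime;
  }

  // Milliseconds left before the deadline, never negative.
  long remaining(long nowMs) {
    return noTimeout() ? Long.MAX_VALUE : Math.max(0L, stopTime - nowMs);
  }
}
```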


- Michael


[jira] [Assigned] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread Lars Hofhansl (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reassigned HBASE-4496:


Assignee: Lars Hofhansl

 HFile V2 does not honor setCacheBlocks when scanning.
 -

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0


 While testing the LRU cache during the scanning I noticed quite some churn in 
 the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
 found that HFile V2 always caches blocks in the LRU cache regardless of the 
 cacheBlocks setting.
 Here's a trace (from Eclipse) showing the problem:
 HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
 HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
 HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
 HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
 StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
 StoreFileScanner.reseek(KeyValue) line: 110
 KeyValueHeap.reseek(KeyValue) line: 255
 StoreScanner.reseek(KeyValue) line: 409
 StoreScanner.next(List<KeyValue>, int) line: 304
 KeyValueHeap.next(List<KeyValue>, int) line: 114
 KeyValueHeap.next(List<KeyValue>) line: 143
 HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
 HRegion$RegionScannerImpl.nextInternal(int) line: 2722
 HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
 HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
 HRegionServer.next(long, int) line: 2092
 Every scanner.next causes a reseek, which eventually causes a call to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
 cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
 HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
 The fix is not immediately clear, unless we want to pass cacheBlocks to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
 HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly 
 as readBlockData should not care about caching.
 Avoiding caching during scans is somewhat important for us.
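
 One possible shape for the option mentioned above (passing cacheBlocks 
 down the call chain) is sketched here. CacheBlocksSketch and its methods 
 are hypothetical and much simpler than HFileReaderV2; a HashMap stands in 
 for the LRU cache and loadFromDisk() fabricates a block. The sketch only 
 shows the flag travelling from the seek down to the block read instead of 
 being hard-coded to true.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader; a HashMap stands in for the LRU block cache.
final class CacheBlocksSketch {
  private final Map<Long, byte[]> blockCache = new HashMap<>();

  // The index lookup forwards the caller's flag rather than passing true.
  byte[] seekToDataBlock(long offset, boolean cacheBlocks) {
    return readBlock(offset, cacheBlocks);
  }

  byte[] readBlock(long offset, boolean cacheBlock) {
    byte[] cached = blockCache.get(offset);
    if (cached != null) {
      return cached;
    }
    byte[] block = loadFromDisk(offset);
    if (cacheBlock) {
      blockCache.put(offset, block);   // only cache when the scan asked for it
    }
    return block;
  }

  int cachedBlockCount() {
    return blockCache.size();
  }

  // Stand-in for a real disk read.
  private byte[] loadFromDisk(long offset) {
    return new byte[] { (byte) offset };
  }
}
```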


[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115708#comment-13115708
 ] 

jirapos...@reviews.apache.org commented on HBASE-3025:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2041/#review2077
---


Looks good. The majority of my comments have to do with inconsistent logging 
practice.


security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlFilter.java
https://reviews.apache.org/r/2041/#comment4718

Could be stated better.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java
https://reviews.apache.org/r/2041/#comment4719

No.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java
https://reviews.apache.org/r/2041/#comment4720

Comment needs updating.




security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4721

Can we make this 1?



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4775

Debug logging should go to LOG not AUDITLOG



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4782

Should be INFO or TRACE level? TRACE makes more sense to me.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4776

Debug logging should go to LOG not AUDITLOG



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4779

Should be INFO or TRACE level? TRACE makes more sense to me.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4795

Should something go to AUDITLOG here?



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4797

Should hasFamilyQualifierPermission log to AUDITLOG? It is used in places 
to make decisions -- an exception is thrown directly or not.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4783

Another one of these was sent to AUDITLOG above. Do the same here? Should 
be INFO or TRACE level? TRACE makes more sense to me.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4786

Ultimately users should be allowed to enable or disable their own tables, 
but only after such operations don't carry as much systemic risk as they do 
currently.

In that case, CREATE permission and an ownership check could follow the 
test for ADMIN permission.
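
The ordering suggested in this comment might look roughly like the 
following. EnableTableCheckSketch, its Action enum, and 
mayEnableOrDisable() are hypothetical and are not the AccessController API 
under review; the sketch only shows the ADMIN test coming first, with 
CREATE plus an ownership check as the fallback.

```java
import java.util.Set;

// Hypothetical permission check; not the AccessController code under review.
final class EnableTableCheckSketch {
  enum Action { READ, WRITE, CREATE, ADMIN }

  static boolean mayEnableOrDisable(String user, Set<Action> granted, String tableOwner) {
    // ADMIN is tested first and is sufficient on its own.
    if (granted.contains(Action.ADMIN)) {
      return true;
    }
    // Otherwise require CREATE permission and ownership of the table.
    return granted.contains(Action.CREATE) && user.equals(tableOwner);
  }
}
```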



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4787

As above



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4791

Should be logged with ERROR?



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4799

Would it be clearer then to call permissionGranted() something like 
hasColumnsPermission() ? Just a random thought.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4803

Should this go to AUDITLOG? At INFO or TRACE level? My preference is TRACE.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java
https://reviews.apache.org/r/2041/#comment4804

Should this go to AUDITLOG? At INFO or TRACE level? My preference is TRACE.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java
https://reviews.apache.org/r/2041/#comment4807

What if instead we check for version 0 and throw an 
IllegalArgumentException if so? Technically, it is an invalid request if it 
contains an unrecognizable action code. Skipping this check if version  0 
would be a way to handle new perms while not accepting incorrect input 
otherwise.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java
https://reviews.apache.org/r/2041/#comment4813

Maybe we can call this .auth.? We don't really have an RBAC 
implementation yet. Likewise for the package name for all of this stuff? Just a 
random thought.



security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java
https://reviews.apache.org/r/2041/#comment4815

Isn't this an error?

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115728#comment-13115728
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2099
---



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4835

The condition, now  stopTime, is reversed for isTimedOut().


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic but couldn't help it since was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it in all inside in MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.Clean up.  Shutdown access on some of these unused methods.  Don't
bq.let out HRegionInterface instances in particular since we are going
bq.away from raw HRI use to instead use a connection with retries:
bq.i.e. HTable.
bq.  
bq.Comments on state of this class. Javadoc edits.
bq.getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.in constructor.  Override MetaNodeTracker and on node delete
bq.reset meta location (We used to do this over in MetaNodeTracker
bq.but to do that we had to have a CatalogTracker over in zk package
bq.which is silly -- bad package encapsulation).
bq.  
bq.(waitForRootServer) Renamed getRootServerConnection and change it
bq.from public to package private.
bq.(waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.(getMetaServerConnection) Change from public to package private.
bq.Use MetaReader to read the meta location in root rather than a
bq.raw HRegionInterface so we get retrying.
bq.(remaining, timedout) Added utility methods.
bq.(waitForMetaServer) Changed from public to private.
bq.(resetMetaLocation) Made it synchronized on metaAvailable.
bq.Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.Refactor to use HTable instead of raw HRegionInterface so we get
bq.retrying.  For each operation we get an HTable, use it, then close it.
bq.(putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.(updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.class since these classes are for a one-time migration only.
bq.  
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.New class that holds all Meta* methods that update the meta table during
bq.the one-time migration done to meta on startup.  This class
bq.is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.Retrofit methods in here to use fullScan methods with Visitor.
bq.(getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.  getCatalogRegionNameForRegion) Removed.
bq.(fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.(fullScanOfResults) Renamed as fullScan override.
bq.(fullScanOfRoot) Added as deprecated. We should be doing
bq.this against zk.
bq.(metaRowToRegionPair,

[jira] [Issue Comment Edited] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-27 Thread Ted Yu (Issue Comment Edited) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115249#comment-13115249
 ] 

Ted Yu edited comment on HBASE-4492 at 9/27/11 5:26 PM:


Found the following in output for the above timeout case:
{code}
2011-09-27 05:28:59,047 DEBUG 
[RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] 
zookeeper.ZooKeeperWatcher(233): regionserver:57539-0x132a95afa18000d Received 
ZooKeeper Event, type=NodeDeleted, state=SyncConnected, 
path=/hbase/root-region-server
2011-09-27 05:28:59,047 DEBUG 
[RegionServer:3;us.ciq.com,58748,131710132-EventThread] 
zookeeper.ZKUtil(226): regionserver:58748-0x132a95afa18000a 
/hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,047 DEBUG 
[Master:0;us.ciq.com,56327,1317101304726-EventThread] zookeeper.ZKUtil(226): 
hconnection-0x132a95afa180005 /hbase/root-region-server does not exist. Watcher 
is set.
2011-09-27 05:28:59,048 DEBUG [Thread-1-EventThread] 
zookeeper.ZooKeeperWatcher(233): master:51567-0x132a95afa180008 Received 
ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, 
path=/hbase/unassigned
2011-09-27 05:28:59,048 DEBUG 
[RegionServer:3;us.ciq.com,57539,1317101335695-EventThread] 
zookeeper.ZKUtil(226): regionserver:57539-0x132a95afa18000d 
/hbase/root-region-server does not exist. Watcher is set.
2011-09-27 05:28:59,049 INFO  
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.AssignmentManager(1485): No previous transition plan was found (or we 
are ignoring an existing plan) for -ROOT-,,0.70236052 so generated a random 
one; hri=-ROOT-,,0.70236052, src=, dest=us.ciq.com,57500,1317101330748; 3 
(online=3, exclude=null) available servers
2011-09-27 05:28:59,049 INFO  
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.AssignmentManager(1485): Assigning region -ROOT-,,0.70236052 to 
us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG 
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.ServerManager(448): New connection to us.ciq.com,57500,1317101330748
2011-09-27 05:28:59,049 DEBUG [Thread-1-EventThread] zookeeper.ZKUtil(224): 
master:51567-0x132a95afa180008 Set watcher on existing znode 
/hbase/unassigned/70236052
2011-09-27 05:28:59,049 FATAL 
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.HMaster(1181): Master server abort: loaded coprocessors are: []
2011-09-27 05:28:59,050 FATAL 
[MASTER_META_SERVER_OPERATIONS-us.ciq.com,51567,1317101319637-3] 
master.HMaster(1186): Unexpected state trying to OFFLINE; -ROOT-,,0.70236052 
state=PENDING_OPEN, ts=1317101339049, server=us.ciq.com,57500,1317101330748
java.lang.IllegalStateException
at 
org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1517)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1392)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1169)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1144)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1139)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1816)
at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:105)
at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:123)
at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:186)
{code}


[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Andrew Purtell (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115743#comment-13115743
]

Andrew Purtell commented on HBASE-4433:
---

According to my tests, this is safe to do on 0.92 and 0.90 branches as well.
This change should be applied there.

avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
Fix For: 0.94.0

[Noticed this in 89, but quite likely true of trunk as well.]
When we are done with the requested column(s) the code still does an extra
next() call before it realizes that it is actually done. This extra next()
call could potentially result in an unnecessary extra block load. This is
likely to be especially bad for CFs where the KVs are large blobs where each
KV may be occupying a block of its own. So the next() can often load a new
unrelated block unnecessarily.
--
For the simple case of reading say the top-most column in a row in a single
file, where each column (KV) was say a block of its own-- it seems that we
are reading 3 blocks, instead of 1 block!
I am working on a simple patch and with that the number of seeks is down to
2.
[There is still an extra seek left. I think there were two levels of
extra/unnecessary next() we were doing without actually confirming that the
next was needed. One at the StoreScanner/ScanQueryMatcher level which this
diff avoids. I think the other is at hfs.next() (at the storefile scanner
level) that's happening whenever an HFile scanner serves out data-- and
perhaps that's the additional seek that we need to avoid. But I want to
tackle this optimization first as the two issues seem unrelated.]
--
The basic idea of the patch I am working on/testing is as follows. The
ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if
the KV needs to be included and then, if done, only in the next call
returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases
when ExplicitColumnTracker knows it is done with a particular column/row, the
patch attempts to combine the INCLUDE code and done hint into a single match
code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
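
To make the idea of the combined match codes concrete, here is a minimal sketch. It is a toy model, not the HBase classes: the `Tracker` class, the `kvsExamined` helper, and the single-wanted-column setup are assumptions made for illustration. Each loop iteration stands in for a next() call that could load a block.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the combined INCLUDE + seek hint; names are hypothetical. */
final class CombinedMatchCodeSketch {

  enum MatchCode { SKIP, INCLUDE, SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_ROW }

  /** Hypothetical tracker that wants exactly one column from the row. */
  static final class Tracker {
    private final String wanted;
    private final boolean combined;
    private boolean done;

    Tracker(String wanted, boolean combined) {
      this.wanted = wanted;
      this.combined = combined;
    }

    MatchCode check(String column) {
      if (done) {
        return MatchCode.SEEK_NEXT_ROW; // old style: the hint arrives one call late
      }
      if (!column.equals(wanted)) {
        return MatchCode.SKIP;
      }
      done = true;
      // New style: say "include this KV and you are done" in a single code.
      return combined ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW : MatchCode.INCLUDE;
    }
  }

  /** Counts how many KVs of the row are looked at before the scan stops. */
  static int kvsExamined(List<String> row, String wanted, boolean combined) {
    Tracker tracker = new Tracker(wanted, combined);
    int examined = 0;
    for (String column : row) {
      examined++; // stands in for a next() call
      MatchCode code = tracker.check(column);
      if (code == MatchCode.SEEK_NEXT_ROW
          || code == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW) {
        break;
      }
    }
    return examined;
  }

  public static void main(String[] args) {
    List<String> row = Arrays.asList("a", "b", "c");
    System.out.println("separate codes: " + kvsExamined(row, "a", false));
    System.out.println("combined code:  " + kvsExamined(row, "a", true));
  }
}
```

In this model the separate-codes variant has to look at one more KV than the combined variant whenever the wanted column is not the last one in the row, which is the extra next() the description talks about.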

[jira] [Created] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread ramkrishna.s.vasudevan (Created) (JIRA)

If region opening fails after updating META HBCK reports it as inconsistent and 
scanning the region throws NSRE
---

 Key: HBASE-4497
 URL: https://issues.apache.org/jira/browse/HBASE-4497
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical


As per the discussion in the mail chain "HBCK reporting of possible mismatch in 
RS assignment", this JIRA is created.
Consider two RSs, RS1 and RS2.
A region tries to open on RS1, but it takes a while.  RS1 has still not 
updated META and transitioned the node from OPENING to OPENED,
so on timeout the region is assigned to RS2.  RS2 successfully updates META and 
opens the region.
Now RS1 tries to act on the region by first updating META and then 
transitioning the node from OPENING to OPENED.

RS1's transition of the node from OPENING to OPENED will fail, but the META entry 
will have RS1 as the latest.
Now HBCK reports this as an inconsistency, and if we try to scan the region we 
get NotServingRegionException.
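
The interleaving above can be replayed in a small single-threaded sketch. Everything here is a stand-in invented for illustration (the fields modelling the META row and the znode, and the method names); it only assumes, as the description says, that the META update is unconditional while the znode transition is checked.

```java
/** Toy replay of the RS1/RS2 interleaving; not HBase code. */
final class OpenRaceSketch {
  // Hypothetical stand-ins for the META row and the region's znode.
  String metaServer;
  String znodeState = "OPENING";
  String znodeOwner = "RS1";
  boolean rs1TransitionOk;

  /** The META update is unconditional: the last writer wins. */
  void updateMeta(String server) {
    metaServer = server;
  }

  /** The znode transition only succeeds for the current owner in OPENING. */
  boolean transitionToOpened(String server) {
    if (!"OPENING".equals(znodeState) || !server.equals(znodeOwner)) {
      return false;
    }
    znodeState = "OPENED";
    return true;
  }

  /** Models the reassignment after the open on the first server timed out. */
  void reassign(String server) {
    znodeState = "OPENING";
    znodeOwner = server;
  }

  static OpenRaceSketch replay() {
    OpenRaceSketch s = new OpenRaceSketch();
    s.reassign("RS2");            // RS1 is slow, region goes to RS2
    s.updateMeta("RS2");
    s.transitionToOpened("RS2");  // RS2 now serves the region
    s.updateMeta("RS1");          // slow RS1 wakes up and writes META first
    s.rs1TransitionOk = s.transitionToOpened("RS1"); // this step fails
    return s;
  }

  public static void main(String[] args) {
    OpenRaceSketch s = replay();
    System.out.println("META says: " + s.metaServer
        + ", znode owner: " + s.znodeOwner
        + ", RS1 transition ok: " + s.rs1TransitionOk);
  }
}
```

After the replay, META points at RS1 while the znode says the region was opened by RS2, which is the mismatch HBCK flags.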


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.


[ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115775#comment-13115775
 ] 

Lars Hofhansl commented on HBASE-4496:
--

In a simple test this shaves about 20% off scanning time (with caching switched 
off), in addition to avoiding LRU and GC churn.

 HFile V2 does not honor setCacheBlocks when scanning.
 -

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0


 While testing the LRU cache during the scanning I noticed quite some churn in 
 the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
 found that HFile V2 always caches blocks in the LRU cache regardless of the 
 cacheBlocks setting.
 Here's a trace (from Eclipse) showing the problem:
 HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
 HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
 HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
 HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
 StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
 StoreFileScanner.reseek(KeyValue) line: 110
 KeyValueHeap.reseek(KeyValue) line: 255
 StoreScanner.reseek(KeyValue) line: 409
 StoreScanner.next(List<KeyValue>, int) line: 304
 KeyValueHeap.next(List<KeyValue>, int) line: 114
 KeyValueHeap.next(List<KeyValue>) line: 143
 HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
 HRegion$RegionScannerImpl.nextInternal(int) line: 2722
 HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
 HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
 HRegionServer.next(long, int) line: 2092
 Every scanner.next causes a reseek, which eventually causes a call to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
 cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
 HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
 The fix is not immediately clear, unless we want to pass cacheBlocks to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
 HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly 
 as readBlockData should not care about caching.
 Avoiding caching during scans is somewhat important for us.
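
 The way the flag gets lost can be shown with a small sketch. This is a toy model only: the class, the `seekToDataBlockLossy` / `seekToDataBlock` methods and the `HashSet` standing in for the LRU cache are assumptions for illustration and do not mirror the real HFileReaderV2 signatures.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a reader whose seek path drops the caller's cacheBlocks flag. */
final class CacheFlagSketch {
  /** Stand-in for the LRU block cache: remembers which offsets were cached. */
  final Set<Long> blockCache = new HashSet<>();

  byte[] readBlock(long offset, boolean cacheBlock) {
    byte[] data = new byte[] {(byte) offset}; // pretend this came from disk
    if (cacheBlock) {
      blockCache.add(offset);
    }
    return data;
  }

  /** As described in the report: the index seek hard-codes cacheBlock = true. */
  byte[] seekToDataBlockLossy(long offset) {
    return readBlock(offset, true);
  }

  /** One possible fix: thread the caller's flag through the seek path. */
  byte[] seekToDataBlock(long offset, boolean cacheBlocks) {
    return readBlock(offset, cacheBlocks);
  }

  /** Scans `blocks` blocks and reports how many ended up in the cache. */
  static int cachedAfterScan(boolean threadFlag, boolean cacheBlocks, int blocks) {
    CacheFlagSketch reader = new CacheFlagSketch();
    for (long offset = 0; offset < blocks; offset++) {
      if (threadFlag) {
        reader.seekToDataBlock(offset, cacheBlocks);
      } else {
        reader.seekToDataBlockLossy(offset);
      }
    }
    return reader.blockCache.size();
  }

  public static void main(String[] args) {
    System.out.println("flag dropped:  " + cachedAfterScan(false, false, 5));
    System.out.println("flag threaded: " + cachedAfterScan(true, false, 5));
  }
}
```

 In the lossy variant every block read by the scan lands in the cache even though the caller asked for no caching; threading the flag through is the "ugly" option mentioned above, since every layer between the scanner and readBlock has to carry it.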


[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.


[ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115787#comment-13115787
 ] 

stack commented on HBASE-4496:
--

Good find Lars.


[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115788#comment-13115788
]

stack commented on HBASE-4497:
--

Do we need to add an extra tickle of OPENING znode after open of region and
before we go to do meta update?


[jira] [Reopened] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread stack (Reopened) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

stack reopened HBASE-4433:
--

Reopening until we apply to 0.92 and 0.90 too (as per Andrew). I'll do it
(stop me Jon if you are already at this).


[jira] [Resolved] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread stack (Resolved) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

stack resolved HBASE-4433.
--

Resolution: Fixed
Fix Version/s: (was: 0.94.0)
0.92.0

Applied to 0.92 branch too. Didn't add to 0.90 because doesn't apply clean
(there are test files missing).

avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
Fix For: 0.92.0

[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-27 Thread Jean-Daniel Cryans (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115795#comment-13115795
]

Jean-Daniel Cryans commented on HBASE-4497:
---

Stack, it seems that it's already the case:

{code}
if (tickleOpening("post_region_open")) {
  if (updateMeta(region)) failed = false;
}
{code}

In any case, there's still a hole as those two operations aren't done in an
atomic fashion.


[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-27 Thread Ming Ma (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115798#comment-13115798
 ] 

Ming Ma commented on HBASE-4492:


Here is a sequence of events showing how this could happen. It applies to any region.

T1. After AM sends the openRegion RPC to RS1, RS1 sets the ZK state to OPENING.
T2. RS1 is stopped.
T3. The ZK callback carrying state OPENING is delayed. AM skips processing it 
because RS1 is offline, so the region's state in AM is still PENDING_OPEN.
T4. RS1's ServerShutdownHandler tries to assign the region by 
setting it offline first. This isn't allowed because the state isn't CLOSING or 
CLOSED.
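
A minimal sketch of T1 to T4, under the assumption stated above that setting a region offline is only accepted from CLOSING or CLOSED. The class, the enum, and the method names are hypothetical and only model the master-side bookkeeping; they are not the AssignmentManager API.

```java
/** Toy model of the master-side region state during T1..T4; not HBase code. */
final class OfflineStateCheckSketch {
  enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, CLOSING, CLOSED }

  private State state = State.OFFLINE;

  /** T1: the master has sent openRegion to the region server. */
  void sendOpenRpc() {
    state = State.PENDING_OPEN;
  }

  /** T3: the OPENING event is skipped when its server is already offline. */
  void onZkOpening(boolean serverOnline) {
    if (!serverOnline) {
      return; // the in-memory state stays PENDING_OPEN
    }
    state = State.OPENING;
  }

  /** T4: assumed precondition, taken from the comment above. */
  void setOffline() {
    if (state != State.CLOSING && state != State.CLOSED) {
      throw new IllegalStateException(
          "Unexpected state trying to OFFLINE; state=" + state);
    }
    state = State.OFFLINE;
  }

  /** Replays T1..T4 and reports whether the offline step was rejected. */
  static boolean replayFails() {
    OfflineStateCheckSketch region = new OfflineStateCheckSketch();
    region.sendOpenRpc();       // T1
    region.onZkOpening(false);  // T2 + T3: server stopped, event skipped
    try {
      region.setOffline();      // T4: shutdown handler re-assigns
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("offline step rejected: " + replayFails());
  }
}
```

The rejected step corresponds to the IllegalStateException in the log posted earlier, where the region is still PENDING_OPEN when the shutdown handler tries to offline it.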


 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
 Attachments: 4492.txt


 I got the following when running test suite on TRUNK:
 {code}
 testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
 Time elapsed: 300.28 sec   ERROR!
 java.lang.Exception: test timed out after 30 milliseconds
 at java.lang.Thread.sleep(Native Method)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
 {code}
 I ran TestRollingRestart#testBasicRollingRestart manually afterwards which 
 wiped out test output file for the failed test.
 Similar failure can be found on Jenkins:
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/


[jira] [Updated] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread Lars Hofhansl (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4496:
-

Attachment: 4496.txt

This fixes the problem for me.

TestHFileBlock, TestCacheOnWrite, TestHFileBlockIndex, and TestHFileWriterV2 
still pass.

It's not pretty, though. There are lots of places where cacheBlock now needs to be 
passed around.
I erred on the side of caution: where it wasn't clear whether the block should 
be cached or not, I kept it being cached (as before).

Somebody with more knowledge about HFileReaderV2 should have a look. If there's 
something nicer to do, please let me know.
It is not clear to me how to write a test for this.


[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions


[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115804#comment-13115804
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2106
---


Only part way done, will finish in the afternoon.  I like the idea though, good 
stuff stack.


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4855

Supposed to read "When meta is moved to zk"?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4857

this comment talks a lot about what is wrong but it's not clear to me what 
changes are actually made right now.  i see you say server-side only, but what 
do you propose instead?  (i imagine i will find out reading the rest of the 
diff)



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4858

update this javadoc a bit... it's missing conf and you might also add 
additional context to stuff like abortable (which now appears optional and 
falls back to the connection itself?)



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4859

when would one override this?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4860

is the behavior of this method unchanged?  i guess now it returns before 
it's verified?  any specific reason for the name change?  (its behavior is 
definitely different from the old getRootServerConnection()).  is it to match 
the Meta method?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4861

i thought no verification in CT?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4862

this is public but one with specified timeout is private now?



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4863

eeek, good catch


- Jonathan


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  

[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115805#comment-13115805
 ] 

Lars Hofhansl commented on HBASE-4488:
--

@Jon, so you gonna commit this? :)


 Store could miss rows during flush
 --

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4488.txt


 While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
 critical mistake:
 The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
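
 To illustrate why such a loop can drop the last edits, here is a sketch with a toy scanner. It assumes the contract that next() fills the list with the current row and returns whether more rows remain after it; the `ToyScanner` class and the two drain helpers are hypothetical and not the HBase scanner or flush code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Toy illustration of the while(scanner.next(kvs)) pitfall. */
final class ScannerLoopSketch {

  /** next() adds one row to `out` and returns whether MORE rows remain. */
  static final class ToyScanner {
    private final Iterator<String> rows;

    ToyScanner(List<String> rows) {
      this.rows = rows.iterator();
    }

    boolean next(List<String> out) {
      if (rows.hasNext()) {
        out.add(rows.next());
      }
      return rows.hasNext();
    }
  }

  /** Buggy: the batch returned together with `false` is never written. */
  static List<String> drainBuggy(List<String> rows) {
    ToyScanner scanner = new ToyScanner(rows);
    List<String> written = new ArrayList<>();
    List<String> kvs = new ArrayList<>();
    while (scanner.next(kvs)) {
      written.addAll(kvs);
      kvs.clear();
    }
    return written;
  }

  /** Fixed: always process the batch, then decide whether to continue. */
  static List<String> drainFixed(List<String> rows) {
    ToyScanner scanner = new ToyScanner(rows);
    List<String> written = new ArrayList<>();
    List<String> kvs = new ArrayList<>();
    boolean more;
    do {
      more = scanner.next(kvs);
      written.addAll(kvs);
      kvs.clear();
    } while (more);
    return written;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("r1", "r2", "r3");
    System.out.println("buggy: " + drainBuggy(rows));
    System.out.println("fixed: " + drainFixed(rows));
  }
}
```

 With three rows the buggy loop writes only the first two, because the final call both delivers the last row and returns false.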


[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread Jesse Yates (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115815#comment-13115815
 ] 

Jesse Yates commented on HBASE-4448:


Comments on the latest patch:

In the usage, 
{code}
-  private final static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
+  private static HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
{code}

Should probably not even assign anything, since it is done in setup.

Going back to Ted's comment,
{code}
 Integer i = mcUsageCount.get(tu);  
+if (i == null) {
+  i = ONE;
+} else {
+  int j = i.intValue() + 1;
+  i = new Integer(j);
+}
+mcUsageCount.put(tu, i);
+  }
{code}

This seems overly complex; just use autoboxing here. Maybe we should use 
specific names for what ONE and ZERO mean (e.g. ONE -> USE_INCREMENT, ZERO -> 
UNUSED).
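
For illustration, a sketch of what the autoboxed version of that counter update could look like. UsageCounter is a hypothetical helper, not part of the patch; in the patch the map is mcUsageCount and the key is tu.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: per-key usage count that relies on autoboxing. */
class UsageCounter<K> {
  private final Map<K, Integer> usage = new HashMap<K, Integer>();

  /** Increments the count for key and returns the new value. */
  int increment(K key) {
    Integer current = usage.get(key);
    // Auto-unboxing handles current + 1; autoboxing handles the put.
    int next = (current == null) ? 1 : current + 1;
    usage.put(key, next);
    return next;
  }

  /** Returns the current count, or 0 for a key that was never used. */
  int get(K key) {
    Integer current = usage.get(key);
    return (current == null) ? 0 : current;
  }
}
```

This avoids both the explicit new Integer(j) and the ONE constant.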

In the UtilityCleaner,
{code}
+ try {
+while (true) {
{code}

could lead to some issues. We just need to make sure that the thread is a 
daemon thread when we start.
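
A small sketch of the daemon-thread point, assuming the cleaner is a plain java.lang.Thread. CleanerThreadSketch and the thread name are made up for the example.

```java
/** Hypothetical launcher for a background cleaner that loops forever. */
class CleanerThreadSketch {
  static Thread startCleaner(Runnable cleanupLoop) {
    Thread cleaner = new Thread(cleanupLoop, "utility-cleaner");
    // setDaemon must be called before start(); a daemon thread does not
    // keep the JVM alive once all non-daemon threads have finished.
    cleaner.setDaemon(true);
    cleaner.start();
    return cleaner;
  }

  public static void main(String[] args) {
    Thread cleaner = startCleaner(new Runnable() {
      public void run() {
        try {
          while (true) {
            Thread.sleep(1000L); // stand-in for the periodic cleanup work
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // exit quietly when interrupted
        }
      }
    });
    System.out.println("daemon=" + cleaner.isDaemon());
  }
}
```

Without setDaemon(true), a while (true) loop like this would keep the test JVM from exiting.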


 HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
 instances across unit tests
 -

 Key: HBASE-4448
 URL: https://issues.apache.org/jira/browse/HBASE-4448
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: HBaseTestingUtilityFactory.java, 
 hbase_hbaseTestingUtility_uses_2011_09_22.xlsx, java_HBASE_4448.patch, 
 java_HBASE_4448_v2.patch


 Setting up and tearing down HBaseTestingUtility instances in unit tests is 
 very expensive.  On my MacBook it takes about 10 seconds to set up a 
 MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
 test classes that use this facility, that's a lot of time in the build.
 This factory assumes that the JVM is being re-used across test classes in the 
 build, otherwise this pattern won't work. 
 I don't think this is appropriate for every use, but I think it can be 
 applicable in a great many cases - especially where developers just want a 
 simple MiniCluster with 1 slave.


[jira] [Commented] (HBASE-4352) Apply version of hbase-4015 to branch

2011-09-27 Thread Todd Lipcon (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115816#comment-13115816
 ] 

Todd Lipcon commented on HBASE-4352:


Can anyone volunteer to do some serious cluster testing with this patch? E.g., 
load up 1000 regions per server on 5+ nodes, and do rolling restarts or rolling 
crashes?

 Apply version of hbase-4015 to branch
 -

 Key: HBASE-4352
 URL: https://issues.apache.org/jira/browse/HBASE-4352
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.5

 Attachments: HBASE-4352_0.90.patch, HBASE-4352_0.90_1.patch


 Consider adding a version of hbase-4015 to 0.90.  It changes HRegionInterface, 
 so we would need to move the change to the end of the interface and then test that it 
 doesn't break rolling restart.


[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions


[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115818#comment-13115818
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2107
---


Thanks for reviews Ted and Jon.  Will put up new patch when you fellas finish...


src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4864

@Jon True.  I opened another issue with suggested fix.  I should at least 
reference it in here.



src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4865

Nah.  That's a TODO.


- Michael


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic but couldn't help it since was in MetaReader and MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it all inside MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.Clean up.  Shutdown access on some of these unused methods.  Don't
bq.let out HRegionInterface instances in particular since we are going
bq.away from raw HRI use to instead use a connection with retries:
bq.i.e. HTable.
bq.  
bq.Comments on state of this class. Javadoc edits.
bq.getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.in constructor.  Override MetaNodeTracker and on node delete
bq.reset meta location (We used to do this over in MetaNodeTracker
bq.but to do that we had to have a CatalogTracker over in zk package
bq.which is silly -- bad package encapsulation).
bq.  
bq.(waitForRootServer) Renamed getRootServerConnection and change it
bq.from public to package private.
bq.(waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.(getMetaServerConnection) Change from public to package private.
bq.Use MetaReader to read the meta location in root rather than a
bq.raw HRegionInterface so we get retrying.
bq.(remaining, timedout) Added utility methods.
bq.(waitForMetaServer) Changed from public to private.
bq.(resetMetaLocation) Made it synchronized on metaAvailable.
bq.Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.Refactor to use HTable instead of raw HRegionInterface so we get
bq.retrying.  For each operation we get an HTable, use it, then close it.
bq.(putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.(updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.class since these classes are for a one-time migration only.
bq.  
bq.  A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.New class that holds all Meta* methods updating meta table used
bq.doing the one-time migration done to meta on startup.  This class
bq.is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.Retrofit methods in here to use fullScan methods with Visitor.
bq.(getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.

[jira] [Commented] (HBASE-4245) Cluster hangs if RS serving root fails during startup sequence

2011-09-27 Thread Todd Lipcon (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115819#comment-13115819
 ] 

Todd Lipcon commented on HBASE-4245:


Yep, I agree that Ming's patch for 4455 fixes this.

 Cluster hangs if RS serving root fails during startup sequence
 --

 Key: HBASE-4245
 URL: https://issues.apache.org/jira/browse/HBASE-4245
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: Todd Lipcon
Assignee: Ming Ma

 On a large-ish cluster, the following sequence of events was seen to happen:
 - master started, ROOT and META were both unassigned
 - ROOT is assigned to rs01
 - META is assigned to rs02
 - Upon open of META, it writes its location into ROOT on rs01
 - rs01 crashes while appending to its HLog due to some other bug
 - rs02 fails the region open sequence
 - master notices that rs01 has crashed, and enqueues a ServerShutdownHandler
 - ServerShutdownHandler blocks on CatalogTracker.waitForMeta() since ROOT and 
 META are not assigned yet
 - master times out assignment of META, but never succeeds because ROOT 
 location is still marked as rs01
 This causes the cluster to never start up.


[jira] [Commented] (HBASE-4192) Optimize HLog for Throughput Using Delayed RPCs

2011-09-27 Thread Todd Lipcon (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115824#comment-13115824
 ] 

Todd Lipcon commented on HBASE-4192:


That sounds easier to review, thanks for the suggestion, dhruba.

 Optimize HLog for Throughput Using Delayed RPCs
 ---

 Key: HBASE-4192
 URL: https://issues.apache.org/jira/browse/HBASE-4192
 Project: HBase
  Issue Type: New Feature
  Components: wal
Affects Versions: 0.92.0
Reporter: Vlad Dogaru
Priority: Minor

 Introduce a new HLog configuration parameter (batchEntries) for more 
 aggressive batching of appends.  If this is enabled, HLog appends are not 
 written to the HLog writer immediately, but batched and written either 
 periodically or when a sync is requested.  Because sync times become larger, 
 they use delayed RPCs to free up RPC handler threads.
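
A rough sketch of the batching idea described above, under simplifying assumptions: entries are plain strings, the "writer" is an in-memory list, and the periodic flush and delayed-RPC handling are left out. BatchingLogSketch is a hypothetical class; only the batchEntries name comes from the issue text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical log that buffers appends until a sync (or timer) flushes them. */
class BatchingLogSketch {
  private final List<String> pending = new ArrayList<String>();
  private final List<String> written = new ArrayList<String>(); // stand-in for the HLog writer
  private final boolean batchEntries;

  BatchingLogSketch(boolean batchEntries) {
    this.batchEntries = batchEntries;
  }

  /** With batching on, the entry is only buffered; otherwise it is written at once. */
  synchronized void append(String entry) {
    if (batchEntries) {
      pending.add(entry);
    } else {
      written.add(entry);
    }
  }

  /** Writes all buffered entries; a periodic timer could call this as well. */
  synchronized int sync() {
    int flushed = pending.size();
    written.addAll(pending);
    pending.clear();
    return flushed;
  }

  synchronized int writtenCount() {
    return written.size();
  }
}
```

Because sync() now also carries the cost of writing the whole batch, it takes longer, which is the motivation the issue gives for answering syncs with delayed RPCs.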


[jira] [Commented] (HBASE-4352) Apply version of hbase-4015 to branch


[ 
https://issues.apache.org/jira/browse/HBASE-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115822#comment-13115822
 ] 

stack commented on HBASE-4352:
--

I signed up to check we don't break rolling restart.  Let me get that done 
first.  Will report back if can do what you ask above (I'm trying to test 205 
as background task, could do two things at once).


[jira] [Updated] (HBASE-4488) Store could miss rows during flush

2011-09-27 Thread stack (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4488:
-

Status: Patch Available  (was: Open)

Marking patch available.


[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE


[ 
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115828#comment-13115828
 ] 

stack commented on HBASE-4497:
--

Thanks J-D.  That's what I was too lazy to looksee for myself.  Looks like we 
are doing enough tickling.  Weird that timeout monitor can cut in, region can 
be assigned elsewhere AND successfully update meta before this comes back.  
Here is from Ram's email up on list earlier with log snippets:

{code}
RS1
===
2011-09-23 22:34:34,000 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: addToOnlineRegions is
doneREGION => {NAME =>
't5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.', TableName => 't5',
STARTKEY => '', ENDKEY => '', ENCODED => 2d06b3ca4d398ec96920ae86441a68c9,}
2011-09-23 22:34:34,009 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Updated row t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9. in region
.META.,,1 with serverName=linux76,60020,1316796517682
2011-09-23 22:34:34,009 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open
deploy taks for region=t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.,
daughter=false
2011-09-23 22:34:34,009 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x1328ceaa1ff0037 Attempting to transition node
2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENED
2011-09-23 22:34:34,038 WARN
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Completed
the OPEN of region t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9. but
when transitioning from  OPENING to OPENED got a version mismatch, someone
else clashed so now unassigning -- closing region
2011-09-23 22:34:34,038 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.: disabling
compactions & flushes
2011-09-23 22:34:34,038 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region
t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.
2011-09-23 22:34:34,038 DEBUG org.apache.hadoop.hbase.regionserver.Store:
closed f5
2011-09-23 22:34:34,038 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.

RS2
===
2011-09-23 22:33:56,546 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x1328ceaa1ff0039 Successfully transitioned node
2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENING
2011-09-23 22:33:56,845 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks
for region=t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.,
daughter=false
2011-09-23 22:33:56,845 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: addToOnlineRegions is
doneREGION => {NAME =>
't5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.', TableName => 't5',
STARTKEY => '', ENDKEY => '', ENCODED => 2d06b3ca4d398ec96920ae86441a68c9,}
2011-09-23 22:33:56,856 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Updated row t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9. in region
.META.,,1 with serverName=linux146,60020,1316796499216
2011-09-23 22:33:56,856 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open
deploy taks for region=t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.,
daughter=false
2011-09-23 22:33:58,887 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x1328ceaa1ff0039 Attempting to transition node
2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENED
2011-09-23 22:33:58,893 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x1328ceaa1ff0039 Successfully transitioned node
2d06b3ca4d398ec96920ae86441a68c9 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_OPENED
2011-09-23 22:33:58,893 DEBUG
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
t5,,1316797380065.2d06b3ca4d398ec96920ae86441a68c9.
{code}
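
The "version mismatch" in the RS1 log reads like a failed compare-and-set on the node's version. The sketch below only illustrates that idea and is not ZKAssign's actual code; VersionedNodeSketch and its methods are hypothetical, and it assumes every successful transition bumps the version.

```java
/** Hypothetical node whose state changes only if the caller's version is current. */
class VersionedNodeSketch {
  private String state = "OPENING";
  private int version = 0;

  synchronized int version() {
    return version;
  }

  synchronized String state() {
    return state;
  }

  /** Succeeds, and bumps the version, only if version and state still match. */
  synchronized boolean transition(int expectedVersion, String from, String to) {
    if (version != expectedVersion || !state.equals(from)) {
      return false; // someone else changed the node in the meantime
    }
    state = to;
    version++;
    return true;
  }

  public static void main(String[] args) {
    VersionedNodeSketch node = new VersionedNodeSketch();
    int rs1Version = node.version();              // RS1 starts opening, then stalls
    node.transition(0, "OPENING", "OPENING");     // reassignment to RS2 bumps the version
    int rs2Version = node.version();
    System.out.println(node.transition(rs2Version, "OPENING", "OPENED")); // RS2 succeeds
    System.out.println(node.transition(rs1Version, "OPENING", "OPENED")); // RS1 is rejected
  }
}
```

The check protects the node itself, but in the logs above RS1 had already written its server name to META before the failed transition, which is the inconsistency this issue is about.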

 If region opening fails after updating META HBCK reports it as inconsistent 
 and scanning the region throws NSRE
 ---

 Key: HBASE-4497
 URL: https://issues.apache.org/jira/browse/HBASE-4497
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

 As per the discussion in the mail thread "HBCK reporting of possible mismatch 
 in RS assignment", this JIRA is created.
 Consider two region servers, RS1 and RS2.
 A region tries to open on RS1, but it takes a while.  RS1 has still not 
 updated META or transitioned the node from OPENING to OPENED,
 so the timeout monitor assigns the region to RS2.  RS2 successfully updates META and 
 opens the region.
 Now RS1 tries to act on the region by first updating META and then 
 transitioning the node from OPENING to OPENED.
 RS1 transitioning the node from OPENING to OPENED

[jira] [Commented] (HBASE-4494) AvroServer:: get fails with NPE on a non-existent row


[ 
https://issues.apache.org/jira/browse/HBASE-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115833#comment-13115833
 ] 

stack commented on HBASE-4494:
--

Are you back Kay Kay?

 AvroServer:: get fails with NPE on a non-existent row
 -

 Key: HBASE-4494
 URL: https://issues.apache.org/jira/browse/HBASE-4494
 Project: HBase
  Issue Type: Bug
  Components: avro
Affects Versions: 0.90.4
Reporter: Kay Kay
Assignee: Kay Kay
 Fix For: 0.90.5

 Attachments: HBASE-4494.patch


 Try to submit a get request to the avro gateway. 
 If the row specified for a given table does not exist, the server request 
 fails with an NPE.


[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115867#comment-13115867
]

Jonathan Gray commented on HBASE-4433:
--

Is this not strictly an improvement/feature? It seems like it doesn't belong
in stable branches :)

avoid extra next (potentially a seek) if done with column/row
-

Key: HBASE-4433
URL: https://issues.apache.org/jira/browse/HBASE-4433
Project: HBase
Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
Fix For: 0.92.0

[jira] [Commented] (HBASE-3025) Coprocessor based simple access control


[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115868#comment-13115868
 ] 

jirapos...@reviews.apache.org commented on HBASE-3025:
--



bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 98
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line98
bq.  
bq.   Can we make this 1?

sure


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 192
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line192
bq.  
bq.   Debug logging should go to LOG not AUDITLOG

The idea was that all authorization decisions should be separated into audit 
log.  Here we're allowing access, so AUDITLOG seemed to make sense.  I agree 
that this still needs to be cleaned up a lot.  Maybe all audit logging should 
be done up in requirePermission() with authorization result?  At the very least 
we need a consistent format and consistent logging levels for messages (trace, 
right?).


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 200
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line200
bq.  
bq.   Should be INFO or TRACE level? TRACE makes more sense to me.

Sure, can use trace for all audit log decisions.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 208
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line208
bq.  
bq.   Debug logging should go to LOG not AUDITLOG

This is an authorization decision since we're returning true below.  We can 
make this trace level, and improve the format, but I think AUDITLOG (if 
enabled) should contain a single message per request on why the request was 
allowed or denied.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 274
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line274
bq.  
bq.   Should be INFO or TRACE level? TRACE makes more sense to me.

will change to trace.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 354
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line354
bq.  
bq.   Should something go to AUDITLOG here?

Failure should already have been recorded in AUDITLOG via logDenied().  Agree 
that moving AUDITLOG messages up here with consistent format would be clearer, 
but will require some restructuring of return value from permissionGranted() so 
that some context specific reason can be pulled back up for logging.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 366
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line366
bq.  
bq.   Should hasFamilyQualifierPermission log to AUDITLOG? It is used in places to make decisions -- an exception is thrown directly or not.

Yes, agree, we should either log to AUDITLOG at decision points here or 
consistently move the AUDITLOG logging up a level out of permissionGranted() 
and hasFamilyQualifierPermission().


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 375
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line375
bq.  
bq.   Another one of these was sent to AUDITLOG above. Do the same here? Should be INFO or TRACE level? TRACE makes more sense to me.

Agree, should go to AUDITLOG at trace.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 590
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line590
bq.  
bq.   Should be logged with ERROR?

sure


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java, line 856
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line856
bq.  
bq.   Should this go to AUDITLOG? At INFO or TRACE level? My preference is TRACE.

Yes, agree.


bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java, line 174
bq.   https://reviews.apache.org/r/2041/diff/1/?file=45406#file45406line174
bq.  
bq.   What if instead we check for version 0 and throw an IllegalArgumentException if so?

[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115869#comment-13115869
 ] 

jirapos...@reviews.apache.org commented on HBASE-3025:
--



bq.  On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.   Looks good. The majority of my comments have to do with inconsistent logging practice.

Thanks for the review.  I'll post an update with some cleanups and some 
reworking of the AUDITLOG handling.


- Gary


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2041/#review2077
---


On 2011-09-23 19:14:20, Gary Helmling wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2041/
bq.  ---
bq.  
bq.  (Updated 2011-09-23 19:14:20)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch implements access control list based authorization of HBase operations.  The patch depends on the currently posted patch for HBASE-2742 (secure RPC engine).
bq.  
bq.  Key parts of the implementation are:
bq.  
bq.  * AccessControlLists - encapsulates storage of permission grants in a metadata table (_acl_).  This differs from previous implementation where the .META. table was used to store permissions.
bq.  
bq.  * AccessController - 
bq.    - implements MasterObserver and RegionObserver, performing authorization checks in each of the preXXX() hooks.  If authorization fails, an AccessDeniedException is thrown.
bq.    - implements AccessControllerProtocol as a coprocessor endpoint to provide RPC methods for granting, revoking and listing permissions.
bq.  
bq.  * ZKPermissionWatcher (and TableAuthManager) - synchronizes ACL entries and updates throughout the cluster nodes using ZK.  ACL entries are stored in per-table znodes as /hbase/acl/tablename.
bq.  
bq.  * Additional ruby shell scripts providing the grant, revoke and user_permission commands
bq.  
bq.  * Support for a new OWNER attribute in HTableDescriptor.  I could separate out this change into a new JIRA for discussion, but I don't see it as currently useful outside of security.  Alternately, I could handle the OWNER attribute completely in AccessController without changing HTD, but that would make interaction via hbase shell a bit uglier.
bq.  
bq.  
bq.  This addresses bug HBASE-3025.
bq.  https://issues.apache.org/jira/browse/HBASE-3025
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlFilter.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControlLists.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessControllerProtocol.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/TablePermission.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/UserPermission.java PRE-CREATION 
bq.    security/src/main/java/org/apache/hadoop/hbase/security/rbac/ZKPermissionWatcher.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/SecureTestUtil.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestAccessControlFilter.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestAccessController.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestTablePermissions.java PRE-CREATION 
bq.    security/src/test/java/org/apache/hadoop/hbase/security/rbac/TestZKPermissionsWatcher.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 46a1a3d 
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 699a5f5 
bq.    src/main/resources/hbase-default.xml 2c8f44b 
bq.    src/main/ruby/hbase.rb 4d27191 
bq.    src/main/ruby/hbase/admin.rb b244ffe 
bq.    src/main/ruby/hbase/hbase.rb beb2450 
bq.    src/main/ruby/hbase/security.rb PRE-CREATION 
bq.    src/main/ruby/shell.rb 9a47600 
bq.    src/main/ruby/shell/commands.rb a352c2e 
bq.    src/main/ruby/shell/commands/grant.rb PRE-CREATION 
bq.    src/main/ruby/shell/commands/revoke.rb PRE-CREATION 
bq.    src/main/ruby/shell/commands/table_permission.rb PRE-CREATION 
bq.

[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-27 Thread Dave Revell (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115870#comment-13115870
]

Dave Revell commented on HBASE-4489:

@Ted: Bytes.split() will check the number of splits and throw
IllegalArgumentException if it is <= 0. It seems like a shame to add more code to
duplicate a check that's already being done.

Better key splitting in RegionSplitter
--

Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch

The RegionSplitter utility allows users to create a pre-split table from the
command line or do a rolling split on an existing table. It supports
pluggable split algorithms that implement the SplitAlgorithm interface. The
only/default SplitAlgorithm is one that assumes keys fall in the range from
ASCII string to ASCII string 7FFF. This is not a sane
default, and seems useless to most users. Users are likely to be surprised by
the fact that all the region splits occur in the byte range of ASCII
characters.
A better default split algorithm would be one that evenly divides the space
of all bytes, which is what this patch does. Making a table with five regions
would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and
\xFF\xFF.
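
One way to arrive at the leading bytes listed above, as a sketch only: divide the one-byte range 0x00..0xFF into equal steps. UniformSplitSketch and leadingSplitBytes are hypothetical names, and the patch itself may compute its boundaries differently (for example over multi-byte keys).

```java
/** Hypothetical helper: evenly spaced boundary bytes across 0x00..0xFF. */
class UniformSplitSketch {
  /** Returns the leading byte (as 0..255) of each boundary for the given region count. */
  static int[] leadingSplitBytes(int regions) {
    if (regions <= 0) {
      throw new IllegalArgumentException("regions must be positive, got " + regions);
    }
    int[] boundaries = new int[regions];
    for (int i = 1; i <= regions; i++) {
      boundaries[i - 1] = i * 0xFF / regions; // integer division; exact when regions divides 255
    }
    return boundaries;
  }

  public static void main(String[] args) {
    for (int b : leadingSplitBytes(5)) {
      System.out.println(String.format("\\x%02X", b));
    }
  }
}
```

For five regions this gives 0x33, 0x66, 0x99, 0xCC and 0xFF, matching the first byte of each value quoted in the description.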

[jira] [Commented] (HBASE-4488) Store could miss rows during flush


[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115876#comment-13115876
 ] 

Jonathan Gray commented on HBASE-4488:
--

Aren't you a committer now?  Or didn't get the rights yet? :)  Yes, I will 
commit later today.


[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115877#comment-13115877
 ] 

Lars Hofhansl commented on HBASE-4488:
--

Not yet... :)


[jira] [Created] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

2011-09-27 Thread Eric Yang (Created) (JIRA)

HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
-

 Key: HBASE-4498
 URL: https://issues.apache.org/jira/browse/HBASE-4498
 Project: HBase
  Issue Type: Bug
  Components: build, scripts
Affects Versions: 0.92.0
 Environment: Java, Linux
Reporter: Eric Yang
Assignee: Eric Yang
 Fix For: 0.92.0


HBase RPM packaging was done before ZooKeeper RPM packaging was complete.
update-hbase-env.sh expects the ZooKeeper environment script to exist at
/etc/default/zookeeper-env.sh.  After several revisions of ZOOKEEPER-999, the
ZooKeeper community decided to remove /etc/default/zookeeper-env.sh.  Hence,
update-hbase-env.sh should not depend on /etc/default/zookeeper-env.sh.
Instead, update-hbase-env.sh should assume that ZooKeeper is installed under
/usr for RPM/deb packages.


[jira] [Updated] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

2011-09-27 Thread Eric Yang (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HBASE-4498:
-

Status: Patch Available  (was: Open)

 HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly
 -

 Key: HBASE-4498
 URL: https://issues.apache.org/jira/browse/HBASE-4498
 Project: HBase
  Issue Type: Bug
  Components: build, scripts
Affects Versions: 0.92.0
 Environment: Java, Linux
Reporter: Eric Yang
Assignee: Eric Yang
 Fix For: 0.92.0

 Attachments: HBASE-4498.patch


 HBase RPM packaging was done before ZooKeeper RPM packaging was complete.
 update-hbase-env.sh expects the ZooKeeper environment script to exist at
 /etc/default/zookeeper-env.sh.  After several revisions of ZOOKEEPER-999, the
 ZooKeeper community decided to remove /etc/default/zookeeper-env.sh.  Hence,
 update-hbase-env.sh should not depend on /etc/default/zookeeper-env.sh.
 Instead, update-hbase-env.sh should assume that ZooKeeper is installed under
 /usr for RPM/deb packages.


[jira] [Updated] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly

2011-09-27 Thread Eric Yang (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HBASE-4498:
-

Attachment: HBASE-4498.patch

Set ZOOKEEPER_HOME to /usr by default.


[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-27 Thread Jean-Daniel Cryans (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115886#comment-13115886
 ] 

Jean-Daniel Cryans commented on HBASE-3130:
---

Another thing about the latest patch, this line needs to be removed in RP:

bq. * @param zkw zookeeper connection to the peer

 [replication] ReplicationSource can't recover from session expired on remote 
 clusters
 -

 Key: HBASE-3130
 URL: https://issues.apache.org/jira/browse/HBASE-3130
 Project: HBase
  Issue Type: Bug
  Components: replication
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Chris Trezzo
 Fix For: 0.92.0

 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt


 Currently ReplicationSource cannot recover when its zookeeper connection to 
 its remote cluster expires. HLogs are still being tracked, but a cluster 
 restart is required to continue replication (or a rolling restart).


[jira] [Commented] (HBASE-4498) HBase RPM/DEB packages attempt to setup ZooKeeper environment incorrectly


[ 
https://issues.apache.org/jira/browse/HBASE-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115895#comment-13115895
 ] 

jirapos...@reviews.apache.org commented on HBASE-4498:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2075/
---

Review request for hbase.


Summary
---

Updated ZOOKEEPER_HOME location to default to /usr (default location of 
ZooKeeper rpm/deb package)


This addresses bug HBASE-4498.
https://issues.apache.org/jira/browse/HBASE-4498


Diffs
-

  /src/packages/update-hbase-env.sh 1176602 

Diff: https://reviews.apache.org/r/2075/diff


Testing
---


Thanks,

Eric




[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115897#comment-13115897
]

jirapos...@reviews.apache.org commented on HBASE-3025:
--

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.
security/src/main/java/org/apache/hadoop/hbase/security/rbac/AccessController.java,
line 192
bq. https://reviews.apache.org/r/2041/diff/1/?file=45404#file45404line192
bq.
bq. Debug logging should go to LOG not AUDITLOG
bq.
bq. Gary Helmling wrote:
bq. The idea was that all authorization decisions should be separated into
audit log. Here we're allowing access, so AUDITLOG seemed to make sense. I
agree that this still needs to be cleaned up a lot. Maybe all audit logging
should be done up in requirePermission() with authorization result? At the
very least we need a consistent format and consistent logging levels for
messages (trace, right?).

bq. Maybe all audit logging should be done up in requirePermission() with
authorization result?

Sounds good.

bq. At the very least we need a consistent format and consistent logging
levels for messages (trace, right?).

I'd argue for TRACE

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.
security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java,
line 47
bq. https://reviews.apache.org/r/2041/diff/1/?file=45407#file45407line47
bq.
bq. Maybe we can call this .auth.? We don't really have an RBAC
implementation yet. Likewise for the package name for all of this stuff? Just a
random thought.
bq.
bq. Gary Helmling wrote:
bq. Yeah, "rbac" here and in the package name is a misnomer. How about using
"access" instead? "auth" seems ambiguous to me as it could mean
authentication or authorization. JDK uses "auth" in javax.security.auth
and claims it's for both, but seems like that and sub-packages are more
authentication related to me. Hadoop uses "authorize" for a similar package
to this.

"access" sounds good to me.

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.
security/src/main/java/org/apache/hadoop/hbase/security/rbac/TableAuthManager.java,
line 84
bq. https://reviews.apache.org/r/2041/diff/1/?file=45407#file45407line84
bq.
bq. Isn't this an error?
bq.
bq. Gary Helmling wrote:
bq. Yes, and in this context a pretty bad one, as it probably means region
server initiated RPCs won't work or will be denied. We should probably let the
IOE escape here...

Agree.

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.
security/src/main/java/org/apache/hadoop/hbase/security/rbac/Permission.java,
line 174
bq. https://reviews.apache.org/r/2041/diff/1/?file=45406#file45406line174
bq.
bq. What if instead we check for version 0 and throw an
IllegalArgumentException if so? Technically, it is an invalid request if it
contains an unrecognizable action code. Skipping this check if version 0
would be a way to handle new perms while not accepting incorrect input
otherwise.
bq.
bq. Gary Helmling wrote:
bq. Yeah, seems safer to throw an exception here than to ignore invalid
input. What about throwing an IOException (to tie in to existing error
handling)?
bq.
bq. We could potentially trap the VersionMismatchException from
VersionedWritable to allow skip and continue when reading newer versions of
Permission with potentially added Action codes. Would need to think about what
kind of errors that would expose us to.

bq. What about throwing an IOException (to tie in to existing error handling)?

Throwing an IOE sounds good.
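
A rough sketch of what rejecting an unrecognized action code with an IOException could look like. The Action enum, its codes, and the length-prefixed wire layout are assumptions made for illustration; they are not taken from the patch under review.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ActionCodeSketch {

    /** Hypothetical action set; the codes are illustrative only. */
    enum Action {
        READ('R'), WRITE('W');

        final byte code;

        Action(char c) {
            this.code = (byte) c;
        }

        static Action fromCode(byte b) throws IOException {
            for (Action a : values()) {
                if (a.code == b) {
                    return a;
                }
            }
            // Invalid input surfaces through the existing IOException handling.
            throw new IOException("Unrecognized action code: " + (b & 0xFF));
        }
    }

    /** Reads a count byte followed by that many action codes (assumed layout). */
    static List<Action> readActions(DataInput in) throws IOException {
        int count = in.readByte();
        if (count < 0) {
            throw new IOException("Negative action count: " + count);
        }
        List<Action> actions = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            actions.add(Action.fromCode(in.readByte()));
        }
        return actions;
    }

    /** Convenience wrapper for decoding from a byte array. */
    static List<Action> decode(byte[] bytes) throws IOException {
        return readActions(new DataInputStream(new ByteArrayInputStream(bytes)));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decode(new byte[] {2, 'R', 'W'})); // prints [READ, WRITE]
        try {
            decode(new byte[] {1, 'X'});
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```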

bq. We could potentially trap the VersionMismatchException from
VersionedWritable to allow skip and continue when reading newer versions of
Permission with potentially added Action codes.

I think that is reasonable, with something logged at WARN level. The idea here
is to ride over a rolling restart. Would not see long term operation with
mismatching versions.

bq. On 2011-09-27 16:58:47, Andrew Purtell wrote:
bq.
security/src/main/java/org/apache/hadoop/hbase/security/rbac/ZKPermissionWatcher.java,
line 59
bq. https://reviews.apache.org/r/2041/diff/1/?file=45410#file45410line59
bq.
bq. I wonder if there is some way we can check if a secure variant of
ZooKeeper is running, and refuse to initialize if not.
bq.
bq. Gary Helmling wrote:
bq. My thinking has been to handle all secure ZooKeeper changes
separately. So I'd prefer to handle any check here as part of that.
bq.
bq. I do think it's reasonable to run AccessController with only SIMPLE
auth and no secure ZooKeeper. It's not secure but could still be useful (we
currently use this setup for tests).
bq.
bq. We could complain loudly to give an indication that you have a
security hole though.

bq. I do think it's reasonable to run AccessController with only SIMPLE auth
and

[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-27 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115917#comment-13115917
]

Ted Yu commented on HBASE-4455:
---

Integrated to 0.92 and TRUNK.

Thanks for the nice work, Ming.

Thanks for the review Todd, Jonathan, Ramkrishna and Stack.

Rolling restart RSs scenario, -ROOT-, .META. regions are lost in
AssignmentManager
--

Key: HBASE-4455
URL: https://issues.apache.org/jira/browse/HBASE-4455
Project: HBase
Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
Fix For: 0.92.0

Keep the Master up the whole time and do a rolling restart of the RSs like
this: stop RS1, wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds,
stop RS3, start RS2, wait for 2 seconds, etc. After a while, you will find
that the -ROOT- and .META. regions aren't in "regions in transition" from the
AssignmentManager's point of view, but they aren't assigned to any region
server either. Here are the issues.
1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is
invoked to check whether it contains the -ROOT- region. That is due to the long
delay of ZK notifications and the async nature of the system. Here is an
example: even though the new root region server sea-lab-1,60020,1316380133656
is set at T2, at T3, when the shutdown of sea-lab-1,60020,1316380133656 is
processed, the root location still points to the old server
sea-lab-3,60020,1316380037898.
T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:6
-0x1327e43175e Retrieved 29 byte(s) of data from znode
/hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898
T2: 2011-09-18 14:08:57,173 INFO
org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region
location in ZooKeeper as sea-lab-1,60020,1316380133656
T3: 2011-09-18 14:10:26,393 DEBUG
org.apache.hadoop.hbase.master.ServerManager:
Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler
to be executed, root=false, meta=true, current Root Location:
sea-lab-3,60020,1316380037898
T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:6
-0x1327e43175e Retrieved 29 byte(s) of data from znode
/hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656
2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or
.META. availability could be blocked. If, meanwhile, the new server that
-ROOT- or .META. is being assigned to restarts, another instance of
MetaServerShutdownHandler is queued. Eventually, all
MetaServerShutdownHandler worker threads are occupied. It looks like
HBASE-4245.

[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2011-09-27 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115938#comment-13115938
 ] 

Hudson commented on HBASE-4433:
---

Integrated in HBase-0.92 #23 (See 
[https://builds.apache.org/job/HBase-0.92/23/])
HBASE-4433: avoid extra next (potentially a seek) if done with column/row
HBASE-4433: avoid extra next (potentially a seek) if done with column/row

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java


 avoid extra next (potentially a seek) if done with column/row
 -

 Key: HBASE-4433
 URL: https://issues.apache.org/jira/browse/HBASE-4433
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
 Fix For: 0.92.0


 [Noticed this in 89, but quite likely true of trunk as well.]
 When we are done with the requested column(s) the code still does an extra 
 next() call before it realizes that it is actually done. This extra next() 
 call could potentially result in an unnecessary extra block load. This is 
 likely to be especially bad for CFs where the KVs are large blobs where each 
 KV may be occupying a block of its own. So the next() can often load a new 
 unrelated block unnecessarily.
 --
 For the simple case of reading say the top-most column in a row in a single 
 file, where each column (KV) was say a block of its own-- it seems that we 
 are reading 3 blocks, instead of 1 block!
 I am working on a simple patch and with that the number of seeks is down to 
 2. 
 [There is still an extra seek left.  I think there were two levels of 
 extra/unnecessary next() we were doing without actually confirming that the 
 next was needed. One at the StoreScanner/ScanQueryMatcher level which this 
 diff avoids. I think the other is at hfs.next() (at the storefile scanner 
 level) that's happening whenever an HFile scanner serves out data-- and 
 perhaps that's the additional seek that we need to avoid. But I want to 
 tackle this optimization first as the two issues seem unrelated.]
 -- 
 The basic idea of the patch I am working on/testing is as follows. The 
 ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if 
 the KV needs to be included and then if done, only in the next call it 
 returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases 
 when ExplicitColumnTracker knows it is done with a particular column/row, the 
 patch attempts to combine the INCLUDE code and done hint into a single match 
 code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
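
 The idea of the combined match codes can be sketched roughly as below. This is
 a simplified illustration, not the ExplicitColumnTracker from the patch: the
 Tracker class is hypothetical, tracks one version per column, and uses plain
 SKIP where the real code would return seek hints.

```java
import java.util.Arrays;
import java.util.List;

class ColumnTrackerSketch {

    enum MatchCode {
        INCLUDE, SKIP, SEEK_NEXT_ROW,
        INCLUDE_AND_SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_ROW
    }

    /** Hypothetical tracker over a sorted list of requested columns. */
    static final class Tracker {
        private final List<String> wanted;
        private int index = 0;

        Tracker(List<String> wantedSorted) {
            this.wanted = wantedSorted;
        }

        MatchCode check(String column) {
            // Skip past requested columns that this row does not contain.
            while (index < wanted.size() && column.compareTo(wanted.get(index)) > 0) {
                index++;
            }
            if (index >= wanted.size()) {
                return MatchCode.SEEK_NEXT_ROW;
            }
            if (column.compareTo(wanted.get(index)) < 0) {
                return MatchCode.SKIP; // sorts before the next requested column
            }
            // Match: report "include" together with the done hint, instead of
            // returning plain INCLUDE now and the hint only on the following call.
            index++;
            return index == wanted.size()
                ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker(Arrays.asList("b", "d"));
        for (String col : Arrays.asList("a", "b", "c", "d")) {
            System.out.println(col + " -> " + tracker.check(col));
        }
    }
}
```

 With the combined code the caller learns it is done in the same call that
 returned the last wanted KV, so it has no reason to issue another next().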


[jira] [Created] (HBASE-4499) [replication] Source shouldn't update ZK if it didn't progress

2011-09-27 Thread Jean-Daniel Cryans (Created) (JIRA)

[replication] Source shouldn't update ZK if it didn't progress
--

 Key: HBASE-4499
 URL: https://issues.apache.org/jira/browse/HBASE-4499
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.92.0


A relatively minor optimization to be done in ReplicationSource: currently it 
calls ReplicationSourceManager.logPositionAndCleanOldLogs whether it made 
progress or not, generating more load on ZK than necessary. The last position 
should be kept around so that we can compare.
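
A minimal sketch of the "keep the last position and compare" idea. PositionLogger and Source are hypothetical stand-ins used only for illustration; the real call is ReplicationSourceManager.logPositionAndCleanOldLogs, whose signature is not reproduced here.

```java
class PositionLogSketch {

    /** Hypothetical stand-in for the call that writes the position to ZK. */
    interface PositionLogger {
        void logPosition(long position);
    }

    static final class Source {
        private final PositionLogger logger;
        private long lastLoggedPosition = -1L; // nothing logged yet

        Source(PositionLogger logger) {
            this.logger = logger;
        }

        /** Writes the position only if it moved; returns true when a write was issued. */
        boolean maybeLogPosition(long currentPosition) {
            if (currentPosition == lastLoggedPosition) {
                return false; // no progress, skip the ZK update
            }
            logger.logPosition(currentPosition);
            lastLoggedPosition = currentPosition;
            return true;
        }
    }

    public static void main(String[] args) {
        int[] writes = {0};
        Source source = new Source(position -> writes[0]++);
        source.maybeLogPosition(10);
        source.maybeLogPosition(10); // unchanged, no write
        source.maybeLogPosition(25);
        System.out.println("writes issued: " + writes[0]); // prints writes issued: 2
    }
}
```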


[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-09-27 Thread Scott Kuehn (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115966#comment-13115966
 ] 

Scott Kuehn commented on HBASE-4480:


I'm happy to pick this up, if nobody is working on it.

 Testing script to simplfy local testing
 ---

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: runtest.sh


 As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
 http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
 script that would handle more of the finer points of running/checking our 
 test suite.
 This script should:
 (1) Allow people to determine which tests are hanging/taking a long time to 
 run
 (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
 running the whole suite that caused the failure
 (3) Allow people to specify to run just unit tests or also integration tests 
 (essentially wrapping calls to 'maven test' and 'maven verify').
 This script should just be a convenience script - running tests directly from 
 maven should not be impacted.


[jira] [Created] (HBASE-4500) [replication] Add a delayed option

2011-09-27 Thread Jean-Daniel Cryans (Created) (JIRA)

[replication] Add a delayed option
--

 Key: HBASE-4500
 URL: https://issues.apache.org/jira/browse/HBASE-4500
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.94.0


A typical DR solution with databases is to have one slave that receives data a 
few hours late to guard against destructive operations. Off the top of my head, 
one way to implement this would be to tail the log, see how current the edit 
is, sleep if needed, replicate, repeat.

The harder part will be adding a configuration to the stream.
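
The "see how current the edit is, sleep if needed" step could look roughly like this. It is only a sketch under assumptions: the method name and the idea of a per-stream delay in milliseconds are hypothetical, and no such configuration exists yet.

```java
class DelayedReplicationSketch {

    /**
     * Milliseconds to wait before an edit written at editTimeMs is at least
     * delayMs old; zero if it is already old enough to ship.
     */
    static long sleepNeeded(long editTimeMs, long nowMs, long delayMs) {
        long age = nowMs - editTimeMs;
        return Math.max(0L, delayMs - age);
    }

    public static void main(String[] args) throws InterruptedException {
        long delayMs = 200L; // hypothetical per-stream delay (hours in a real DR setup)
        long editTimeMs = System.currentTimeMillis(); // pretend the edit was just written
        long wait = sleepNeeded(editTimeMs, System.currentTimeMillis(), delayMs);
        Thread.sleep(wait); // sleep if needed
        System.out.println("edit is now old enough to replicate");
    }
}
```

A replication loop would call sleepNeeded for the edit at the head of the log, sleep for the returned duration, ship the edit, and repeat.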


[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing


[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115972#comment-13115972
 ] 

stack commented on HBASE-4480:
--

@Ted You want to check this in?  Where?  Into src/test/bin?


[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-09-27 Thread Jesse Yates (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115975#comment-13115975
 ] 

Jesse Yates commented on HBASE-4480:


@stack - I was thinking this script would be incorporated into a bigger script 
that we would run for testing.

@Scott - I'm fine if you want to do the bigger script.


[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115978#comment-13115978
 ] 

stack commented on HBASE-4480:
--

@Jesse Sounds good


[jira] [Created] (HBASE-4501) [replication] Shutting down a stream leaves recovered sources running

2011-09-27 Thread Jean-Daniel Cryans (Created) (JIRA)

[replication] Shutting down a stream leaves recovered sources running
-

 Key: HBASE-4501
 URL: https://issues.apache.org/jira/browse/HBASE-4501
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0


When removing a peer it will call ReplicationSourceManager.removePeer which 
calls closeRecoveredQueue which does this:

{code}
LOG.info("Done with the recovered queue " + src.getPeerClusterZnode());
this.oldsources.remove(src);
this.zkHelper.deleteSource(src.getPeerClusterZnode(), false);
{code}

This works in the case where the recovered source is done and is calling this 
method itself, but when removing a peer nothing ever calls terminate on the 
recovered source, thus leaving it running.
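
One possible shape of a fix, sketched with hypothetical types (RecoveredSource and this removePeer are stand-ins, not the actual ReplicationSourceManager code or the eventual patch): terminate each recovered source that belongs to the peer before dropping it from the list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class RemovePeerSketch {

    /** Hypothetical stand-in for a recovered replication source. */
    static final class RecoveredSource {
        final String peerId;
        boolean running = true;

        RecoveredSource(String peerId) {
            this.peerId = peerId;
        }

        void terminate(String reason) {
            running = false; // a real source would also stop its thread
        }
    }

    /** Terminates and removes every recovered source of peerId; returns how many were stopped. */
    static int removePeer(List<RecoveredSource> oldSources, String peerId) {
        int stopped = 0;
        for (Iterator<RecoveredSource> it = oldSources.iterator(); it.hasNext();) {
            RecoveredSource src = it.next();
            if (src.peerId.equals(peerId)) {
                src.terminate("peer " + peerId + " was removed");
                it.remove();
                stopped++;
            }
        }
        return stopped;
    }

    public static void main(String[] args) {
        List<RecoveredSource> oldSources = new ArrayList<>();
        oldSources.add(new RecoveredSource("1"));
        oldSources.add(new RecoveredSource("2"));
        oldSources.add(new RecoveredSource("1"));
        System.out.println("stopped: " + removePeer(oldSources, "1")); // prints stopped: 2
        System.out.println("remaining: " + oldSources.size());         // prints remaining: 1
    }
}
```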


[jira] [Assigned] (HBASE-2196) Support more than one slave cluster

2011-09-27 Thread Jean-Daniel Cryans (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reassigned HBASE-2196:
-

Assignee: Lars Hofhansl

 Support more than one slave cluster
 ---

 Key: HBASE-2196
 URL: https://issues.apache.org/jira/browse/HBASE-2196
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 2196-add.txt, 2196-v2.txt, 2196-v5.txt, 2196-v6.txt, 
 2196.txt, HBASE-2196-0.90-v2.patch, HBASE-2196-0.90.patch, 
 HBASE-2196-wip.patch


 Currently replication supports only 1 slave cluster; we need the ability to 
 add more.


[jira] [Created] (HBASE-4502) Error at HMaster start

2011-09-27 Thread Madhab Nayak (Created) (JIRA)

Error at HMaster start
--

 Key: HBASE-4502
 URL: https://issues.apache.org/jira/browse/HBASE-4502
 Project: HBase
  Issue Type: Bug
 Environment: Ubuntu 11
Reporter: Madhab Nayak


I have set up Hadoop 0.20.2 in pseudo-distributed mode. This is running fine.

I am now trying to integrate hbase-0.90.4/3 with Hadoop, but while starting I 
get the following error in the Master log.


2011-09-27 15:01:16,295 INFO org.apache.zookeeper.server.NIOServerCnxn: Client 
attempting to establish new session at /127.0.0.1:37643
2011-09-27 15:01:16,296 INFO org.apache.zookeeper.server.NIOServerCnxn: 
Established session 0x132ace7f4fa0002 with negotiated timeout 4 for client 
/127.0.0.1:37643
2011-09-27 15:01:16,296 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server localhost/127.0.0.1:2181, sessionid = 
0x132ace7f4fa0002, negotiated timeout = 4
2011-09-27 15:01:16,468 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
java.io.IOException: Call to localhost/127.0.0.1:8020 failed on local 
exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy6.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:364)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:81)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:346)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:193)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:812)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:720)
2011-09-27 15:01:16,470 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-09-27 15:01:16,470 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping 
service threads
2011-09-27 15:01:16,470 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server 
on 51567



hbase-site.xml file structure is as follows

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>


Please help me to fix this error 



[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions

[
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116008#comment-13116008
]

jirapos...@reviews.apache.org commented on HBASE-3446:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2111
---

There is a lot of excellence in here. I'm going to look at the code itself
with this diff applied to try and understand where/how CT is now being used.
I'm a little unclear between the lines you'd like to draw and the lines you
actually draw in this diff.

Great work!

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4873

maybe note here that you should not be synchronized on metaAvailable (and
it will do so in the method)... the next method below is nicely clear in this
regard

src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
https://reviews.apache.org/r/2065/#comment4874

verify the connection works, and also that the server is actually hosting
the region we think it is... the comment makes me think this is looking up
which server hosts the passed region but it's just verifying if we can connect
to the server we think is hosting the region and verifies whether it's hosting
it or not (so this fails if we can't connect or if the region is not on this
server)

src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
https://reviews.apache.org/r/2065/#comment4875

i'm still trying to understand exactly what you've changed and what is
still a TODO, but this looks much nicer now! :)

src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
https://reviews.apache.org/r/2065/#comment4876

same here! nice (old stuff looks ripe with race conditions)

src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
https://reviews.apache.org/r/2065/#comment4877

missing copyright and year?

src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
https://reviews.apache.org/r/2065/#comment4878

nice moving cruft to separate classes

src/main/java/org/apache/hadoop/hbase/client/Result.java
https://reviews.apache.org/r/2065/#comment4881

seems like this should be moved to static methods in a helper class rather
than exposing to our client-side Result

src/main/java/org/apache/hadoop/hbase/client/Result.java
https://reviews.apache.org/r/2065/#comment4882

yeah, shouldn't this be in MetaReader or some such class?

src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
https://reviews.apache.org/r/2065/#comment4895

missed some whitespace

src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
https://reviews.apache.org/r/2065/#comment4896

nice

src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
https://reviews.apache.org/r/2065/#comment4897

this seems like an important public method. i like the rename and your
additional comments, but maybe we should add more. default behavior is to use
a cached location, if one is not found, it is looked up in a catalog. setting
reload to true bypasses the cache and forces the lookup to a catalog. and
then, under what cases do we get an exception? does this verify that the
server is actually hosting the region? or it just looks up in the catalog (i
guess failure there could cause IOE) and if it finds something, just returns a
connection to that RS (w/ no verification)... correct?
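The behavior described in the comment above (use a cached location by default, fall back to a catalog lookup on a miss, and bypass the cache when reload is true) can be sketched as follows. This is only an illustration: `RegionLocationCacheSketch`, its `locate` method and the `catalogLookup` function are hypothetical names, not the ServerCallable or HConnection API, and the sketch does no verification that the returned server actually hosts the region.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of cache-then-catalog lookup; not HBase code. */
class RegionLocationCacheSketch {
  private final Map<String, String> cache = new HashMap<>();
  // Stands in for a read of the catalog table (assumption for this sketch).
  private final Function<String, String> catalogLookup;
  int catalogReads = 0;

  RegionLocationCacheSketch(Function<String, String> catalogLookup) {
    this.catalogLookup = catalogLookup;
  }

  /** Returns the server for a row; reload=true skips the cache and re-reads the catalog. */
  String locate(String row, boolean reload) {
    if (!reload) {
      String cached = cache.get(row);
      if (cached != null) {
        return cached;
      }
    }
    catalogReads++;
    String server = catalogLookup.apply(row);
    if (server == null) {
      throw new IllegalStateException("no location in catalog for row " + row);
    }
    cache.put(row, server);
    // Nothing here checks that the server still hosts the region.
    return server;
  }

  public static void main(String[] args) {
    RegionLocationCacheSketch locator = new RegionLocationCacheSketch(row -> "rs-example:60020");
    locator.locate("row1", false); // miss: goes to the catalog
    locator.locate("row1", false); // hit: served from the cache
    locator.locate("row1", true);  // reload: forces a second catalog read
    System.out.println("catalog reads: " + locator.catalogReads);
  }
}
```

In this sketch the only failure mode is a missing catalog entry; whether the real method also verifies hosting is exactly the question the comment raises.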

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/2065/#comment4898

why do you remove the javadoc on this method?

src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
https://reviews.apache.org/r/2065/#comment4899

not even necessary to put this method in here at all now (we're just using
it for getting the node name at this point but it's probably still nice to have
the name in stacks and such)

src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
https://reviews.apache.org/r/2065/#comment4900

yay! <3

src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
https://reviews.apache.org/r/2065/#comment4901

huh? :)

src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
https://reviews.apache.org/r/2065/#comment4902

awesome

src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
https://reviews.apache.org/r/2065/#comment4903

30,000 ft desc? i guess test name is self descriptive? :)

- Jonathan

On 2011-09-27 06:38:09, Michael Stack wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq.

[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread Doug Meil (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116012#comment-13116012
]

Doug Meil commented on HBASE-4448:
--

Thanks Stack. In general, do you support the HBaseTestingUtilityFactory
approach, where generic MiniClusters can be re-used (wherever possible)?

I think that HBaseClusterTestCase could be refactored to use
HBaseTestingUtilityFactory to re-use the MiniCluster, since they all use the
same config. It's a plus rather than a huge win, but it should still be done.
Every little bit helps.
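A minimal sketch of the re-use idea, assuming instances can be keyed by their configuration: the factory starts an expensive resource once per distinct config and hands the same instance to every later caller. `SharedClusterFactorySketch` and its `get` method are hypothetical names for illustration and are not the HBaseTestingUtilityFactory from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical illustration of re-using an expensive test fixture per config. */
class SharedClusterFactorySketch<T> {
  private final Map<String, T> byConfig = new HashMap<>();
  private final Supplier<T> starter; // stands in for "spin up a mini cluster"
  int starts = 0;

  SharedClusterFactorySketch(Supplier<T> starter) {
    this.starter = starter;
  }

  /** Returns the shared instance for this config key, starting it on first use. */
  synchronized T get(String configKey) {
    T existing = byConfig.get(configKey);
    if (existing == null) {
      existing = starter.get();
      starts++;
      byConfig.put(configKey, existing);
    }
    return existing;
  }

  public static void main(String[] args) {
    SharedClusterFactorySketch<Object> factory = new SharedClusterFactorySketch<>(Object::new);
    Object first = factory.get("default");
    Object second = factory.get("default");
    System.out.println("same instance: " + (first == second) + ", starts: " + factory.starts);
  }
}
```

The sketch does nothing to isolate tests from each other's data, so tests sharing an instance would still need their own table names or cleanup.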

HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility
instances across unit tests
-

[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116014#comment-13116014
 ] 

Jonathan Gray commented on HBASE-4496:
--

I can roll this in to my changes for CacheConfig.  Don't have the JIRA off hand 
(but the point of it is to make it so we don't have to change a bunch of 
constructors all the time).

I'm hoping to have a diff out tonight or tomorrow morning.

 HFile V2 does not honor setCacheBlocks when scanning.
 -

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4496.txt


 While testing the LRU cache during the scanning I noticed quite some churn in 
 the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
 found that HFile V2 always caches blocks in the LRU cache regardless of the 
 cacheBlocks setting.
 Here's a trace (from Eclipse) showing the problem:
 HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
 HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
 HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
 HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
 StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
 StoreFileScanner.reseek(KeyValue) line: 110
 KeyValueHeap.reseek(KeyValue) line: 255
 StoreScanner.reseek(KeyValue) line: 409
 StoreScanner.next(List<KeyValue>, int) line: 304
 KeyValueHeap.next(List<KeyValue>, int) line: 114
 KeyValueHeap.next(List<KeyValue>) line: 143
 HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
 HRegion$RegionScannerImpl.nextInternal(int) line: 2722
 HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
 HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
 HRegionServer.next(long, int) line: 2092
 Every scanner.next causes a reseek, which eventually causes a call to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
 cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
 HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.
 The fix is not immediately clear, unless we want to pass cacheBlocks to 
 HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
 HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly 
 as readBlockData should not care about caching.
 Avoiding caching during scans is somewhat important for us.
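The shape of the problem described above can be shown with a small sketch: a caller-supplied flag is lost when an intermediate method hard-codes `true`, and is preserved when the flag is threaded through. The class and method names below are hypothetical stand-ins, not the HFileReaderV2 code, and the second method is only one possible direction (the one the description calls ugly), not the committed fix.

```java
/** Hypothetical illustration of a caching flag being dropped along a call chain. */
class BlockReaderSketch {
  int blocksCached = 0;

  /** Leaf read: honors the flag it is given. */
  byte[] readBlock(long offset, boolean cacheBlock) {
    if (cacheBlock) {
      blocksCached++; // stands in for inserting the block into the LRU cache
    }
    return new byte[0];
  }

  /** Mirrors the reported behavior: the caller's preference never gets here. */
  byte[] readBlockDataLossy(long offset) {
    return readBlock(offset, true);
  }

  /** Passes the caller's preference down to the leaf read. */
  byte[] readBlockData(long offset, boolean cacheBlock) {
    return readBlock(offset, cacheBlock);
  }

  public static void main(String[] args) {
    BlockReaderSketch reader = new BlockReaderSketch();
    reader.readBlockDataLossy(0L);   // scan wanted no caching, block is cached anyway
    reader.readBlockData(0L, false); // flag reaches the leaf, nothing is cached
    System.out.println("blocks cached: " + reader.blocksCached); // prints "blocks cached: 1"
  }
}
```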


[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-27 Thread Jean-Daniel Cryans (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116016#comment-13116016
 ] 

Jean-Daniel Cryans commented on HBASE-3130:
---

Also I'd add that I was able to test the patch (on 0.90) and it really works, 
proof:

First it loses the connection:
{quote}
2011-09-27 16:44:54,984 WARN 
org.apache.hadoop.hbase.replication.ReplicationPeer: connection to cluster: 
10.10.30.7:2181:/hbase1-0x132ad0f29d70017 connection to cluster: 
10.10.30.7:2181:/hbase1-0x132ad0f29d70017 received expired from ZooKeeper, 
aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-09-27 16:44:54,984 INFO org.apache.zookeeper.ClientCnxn: EventThread shut 
down
{quote}

Then later when it tries to replicate it tries to talk to ZK again and it works 
after a reload:

{quote}
2011-09-27 16:49:03,738 DEBUG 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Since we 
are unable to replicate, sleeping 1000 times 10
2011-09-27 16:49:13,738 WARN 
org.apache.hadoop.hbase.replication.ReplicationZookeeper: Lost the ZooKeeper 
connection for peer 1
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /hbase1/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:389)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndGetAsAddresses(ZKUtil.java:355)
at org.apache.hadoop.hbase.replication.ReplicationZookeeper.fetchSlavesAddresses(ReplicationZookeeper.java:268)
at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getSlavesAddresses(ReplicationZookeeper.java:239)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:205)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341)
2011-09-27 16:49:13,772 INFO org.apache.zookeeper.ZooKeeper: Initiating client 
connection, connectString=10.10.30.7:2181 sessionTimeout=2 
watcher=connection to cluster: 10.10.30.7:2181:/hbase1
2011-09-27 16:49:13,773 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server /10.10.30.7:2181
2011-09-27 16:49:14,111 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established to hbasedev.sfo.stumble.net/10.10.30.7:2181, initiating session
2011-09-27 16:49:14,140 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server hbasedev.sfo.stumble.net/10.10.30.7:2181, 
sessionid = 0x132ad0f29d70024, negotiated timeout = 2
{quote}

 [replication] ReplicationSource can't recover from session expired on remote 
 clusters
 -

 Key: HBASE-3130
 URL: https://issues.apache.org/jira/browse/HBASE-3130
 Project: HBase
  Issue Type: Bug
  Components: replication
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Chris Trezzo
 Fix For: 0.92.0

 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt


 Currently ReplicationSource cannot recover when its zookeeper connection to 
 its remote cluster expires. HLogs are still being tracked, but a cluster 
 restart is required to continue replication (or a rolling restart).
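The recovery seen in the logs above (an expired session is noticed on the next use, a fresh connection is opened, and the call is retried) can be sketched roughly as below. All names are hypothetical: `ReconnectingPeerSketch`, `Session` and `SessionExpired` stand in for the peer wrapper, the ZooKeeper handle and its expiry exception, and the address in `main` is made up. This is an assumption about the general pattern, not the code in the attached patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical illustration of reopening a session after expiry; not HBase code. */
class ReconnectingPeerSketch {
  static class SessionExpired extends RuntimeException {
    SessionExpired() { super("Session expired"); }
  }

  /** Stands in for a handle to the remote cluster's coordination service. */
  interface Session {
    List<String> listSlaves();
  }

  private final Supplier<Session> connector;
  private Session session;
  int reconnects = 0;

  ReconnectingPeerSketch(Supplier<Session> connector) {
    this.connector = connector;
    this.session = connector.get();
  }

  /** Lists slave addresses, reopening the session once if it has expired. */
  List<String> slaveAddresses() {
    try {
      return session.listSlaves();
    } catch (SessionExpired e) {
      session = connector.get(); // "reload": replace the dead session
      reconnects++;
      return session.listSlaves();
    }
  }

  public static void main(String[] args) {
    int[] opened = {0};
    ReconnectingPeerSketch peer = new ReconnectingPeerSketch(() -> {
      int n = opened[0]++;
      return () -> {
        if (n == 0) throw new SessionExpired(); // the first session is already dead
        return Arrays.asList("slave-rs-example:60020");
      };
    });
    System.out.println(peer.slaveAddresses() + " after " + peer.reconnects + " reconnect(s)");
  }
}
```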


[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions


[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116019#comment-13116019
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--



bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   There is a lot of excellence in here.  I'm going to look at the code 
itself with this diff applied to try and understand where/how CT is now being 
used.  I'm a little unclear between the lines you'd like to draw and the lines 
you actually draw in this diff.
bq.   
bq.   Great work!

Sorry about that.  Let me get you a better answer to your question.  I think it's 
not very clear because I myself was unclear on the scope of CT when I started in.  
What this patch has here is an attempt at cutting back CT's scope, with 
subsequent work put off for HBASE-4495.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 
513
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line513
bq.  
bq.   maybe note here that you should not be synchronized on metaAvailable 
(and it will do so in the method)... the next method below is nicely clear in 
this regard

Will do


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 
575
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line575
bq.  
bq.   verify the connection works, and also that the server is actually 
hosting the region we think it is... the comment makes me think this is looking 
up which server hosts the passed region but it's just verifying if we can 
connect to the server we think is hosting the region and verifies whether it's 
hosting it or not (so this fails if we can't connect or if the region is not on 
this server)

Good point.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java, line 194
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45908#file45908line194
bq.  
bq.   i'm still trying to understand exactly what you've changed and what 
is still a TODO, but this looks much nicer now! :)

In the above, we'd get the HRegionInterface and do the invocation on the actual 
interface.  The alternative steps back and asks an HTable instance to do the 
work.  If there was an issue with the former, we'd just let the exception out.  In the 
alternative, we'll do HTable retries before we let the exception out (and the 
retries are boosted in server-context).
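The difference between the two approaches comes down to whether a failed call is retried before the exception escapes. A minimal sketch of such a retry wrapper, assuming a fixed pause between attempts, is below. `RetryingCallerSketch.callWithRetries` is a hypothetical helper written for this note, not HTable or HConnection.getRegionServerWithRetries, and it uses `UncheckedIOException` only to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical illustration of retry-before-rethrow; not the HBase client code. */
class RetryingCallerSketch {
  /** Runs op up to maxAttempts times, pausing between failures, then rethrows the last error. */
  static <T> T callWithRetries(Supplier<T> op, int maxAttempts, long pauseMs) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    UncheckedIOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.get();
      } catch (UncheckedIOException e) {
        last = e; // remember the failure and try again
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(pauseMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    String result = callWithRetries(() -> {
      if (calls[0]++ < 2) {
        throw new UncheckedIOException(new IOException("meta not available yet"));
      }
      return "put applied";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

With `maxAttempts` set to 1 this behaves like the former approach: the first exception is let out immediately.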


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java, 
line 2
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45909#file45909line2
bq.  
bq.   missing copyright and year?

Turns out that copyright is not actually needed 
https://issues.apache.org/jira/browse/HBASE-3870


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/Result.java, line 568
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45915#file45915line568
bq.  
bq.   seems like this should be moved to static methods in a helper class 
rather than exposing to our client-side Result

OK.  It was kinda nice being able to do result.getServerNameFromCatalogResult.  
I suppose it does pollute.  I can move it back to MetaReader since that seems 
like the next best place.  You are right, it shouldn't be generally public stuff.  Will 
fix.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 70
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45918#file45918line70
bq.  
bq.   this seems like an important public method.  i like the rename and 
your additional comments, but maybe we should add more.  default behavior is to 
use a cached location, if one is not found, it is looked up in a catalog.  
setting reload to true bypasses the cache and forces the lookup to a catalog.  
and then, under what cases do we get an exception?  does this verify that the 
server is actually hosting the region?  or it just looks up in the catalog (i 
guess failure there could cause IOE) and if it finds something, just returns a 
connection to that RS (w/ no verification)... correct?

Will look into this.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, lines 
1041-1047
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45921#file45921line1041
bq.  
bq.   why do you remove the javadoc on this method?

Will look into this.


bq.  On 2011-09-27 23:36:00, Jonathan Gray wrote:
bq.   src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java, 
line 132
bq.   https://reviews.apache.org/r/2065/diff/1/?file=45931#file45931line132
bq.  
bq.   huh? :)

Let me fix.


bq.  On 2011-09-27 23:36:00,

[jira] [Created] (HBASE-4503) Purge deprecated HBaseClusterTestCase

2011-09-27 Thread stack (Created) (JIRA)

Purge deprecated HBaseClusterTestCase
-

 Key: HBASE-4503
 URL: https://issues.apache.org/jira/browse/HBASE-4503
 Project: HBase
  Issue Type: Improvement
Reporter: stack


It could gain us a few minutes on overall test run in the cases where we don't 
spin up a cluster for each test.


[jira] [Updated] (HBASE-4503) Purge deprecated HBaseClusterTestCase

2011-09-27 Thread stack (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4503:
-

Attachment: 4503.txt

Here are a few tests converted.  Few more to do.

 Purge deprecated HBaseClusterTestCase
 -

 Key: HBASE-4503
 URL: https://issues.apache.org/jira/browse/HBASE-4503
 Project: HBase
  Issue Type: Improvement
Reporter: stack
 Attachments: 4503.txt


 It could gain us a few minutes on overall test run in the cases where we 
 don't spin up a cluster for each test.


[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-27 Thread Jonathan Hsieh (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116021#comment-13116021
]

Jonathan Hsieh commented on HBASE-4489:
---

A few thoughts:

I agree with jgray -- I think one fix should correct the MD5 string split so
that it splits from 0x00.. 0xff. I think there could be another separate patch
that adds the UniformSplit.

I'd be wary of changing the default, especially if this is meant to go into a
0.90.x branch. It looks like as a user you can add and use the UniformSplit by
changing the conf option.

Ideally patches with new functionality or changed semantics would also
introduce corresponding tests. There were no tests on the previous code, and
there are no tests on the newly introduced code. Adding tests, especially
around edge cases, could address Ted's concerns, and it doesn't really hurt to
be extra defensive when coding non-performance-sensitive code.
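To make the 0x00.. to 0xff.. point concrete, here is a minimal sketch of splitting a fixed-width byte key space into evenly sized regions. It is an illustration under stated assumptions (fixed key width, split points computed with `BigInteger`), and `UniformSplitSketch` is a hypothetical class, not the UniformSplit from the patch under review.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of evenly splitting the byte key space. */
class UniformSplitSketch {
  /** Returns numRegions-1 split keys of keyWidth bytes, evenly spaced over 0x00.. to 0xFF... */
  static List<byte[]> split(int numRegions, int keyWidth) {
    if (numRegions < 2 || keyWidth < 1) {
      throw new IllegalArgumentException("need at least 2 regions and 1 key byte");
    }
    BigInteger range = BigInteger.ONE.shiftLeft(8 * keyWidth); // number of distinct keys
    List<byte[]> splits = new ArrayList<>();
    for (int i = 1; i < numRegions; i++) {
      BigInteger point = range.multiply(BigInteger.valueOf(i))
          .divide(BigInteger.valueOf(numRegions));
      splits.add(toFixedWidth(point, keyWidth));
    }
    return splits;
  }

  /** Converts a non-negative value below 256^width into exactly width big-endian bytes. */
  static byte[] toFixedWidth(BigInteger value, int width) {
    byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter than width
    byte[] out = new byte[width];
    int copy = Math.min(raw.length, width);
    System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
    return out;
  }

  public static void main(String[] args) {
    for (byte[] key : split(4, 1)) {
      System.out.printf("0x%02x%n", key[0] & 0xff); // prints 0x40, 0x80, 0xc0
    }
  }
}
```

Edge cases such as a region count larger than the number of distinct keys would produce duplicate split points here, which is the kind of case the requested tests would pin down.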

Better key splitting in RegionSplitter
--

[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116023#comment-13116023
]

stack commented on HBASE-4448:
--

I agree with your "Every little bit helps." (I am working on hbase-4503 after
going through your Excel spreadsheet). I'm still going through this issue.
On the factory, I'm not sure yet. I'm unclear on how it'll be passed from test
to test and also how we prevent one test polluting another test's run. I also
need to see for myself how this approach is different from the approach where
we spin up the cluster at start of a suite and then do a bunch of tests as we
do in TestFromClientSide and TestAdmin.

Will be back with more comments.

A two hour test run is killing us. It means we only get max of 12 runs a day
(and never get that many anyways)... and we have four test branches run up on
hudson so the long running test suite is a progress killer (as well as a
productivity killer). Reminds me of the microsoft build.

HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility
instances across unit tests
-

[jira] [Commented] (HBASE-3446) ProcessServerShutdown fails if META moves, orphaning lots of regions


[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116024#comment-13116024
 ] 

jirapos...@reviews.apache.org commented on HBASE-3446:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/#review2124
---



src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
https://reviews.apache.org/r/2065/#comment4913

This doesn't seem right.


- Ted


On 2011-09-27 06:38:09, Michael Stack wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2065/
bq.  ---
bq.  
bq.  (Updated 2011-09-27 06:38:09)
bq.  
bq.  
bq.  Review request for hbase and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Make the Meta* operations against meta retry.  We do it by using HTable 
instances.
bq.  (HTable calls HConnection.getRegionServerWithRetries for get, put, scan 
etc).
bq.  In 0.89, we had special RetryableMetaOperation class that was a
bq.  subclass of Callable which reproduced the guts of 
HConnection.getRegionServerWithRetries
bq.  with its retry loop.  Now we just use HTable instead (Costs some on setup 
but
bq.  otherwise, we avoid duplicating code).  Upped the retries on serverside 
too.
bq.  
bq.  Had problem with CatalogJanitor.  MetaReader and MetaEditor were relying
bq.  heavily on CT methods getting proxy connections to meta and root servers.
bq.  CT needs to be cut back.  This patch closes down access on (unused) public
bq.  methods and removes being able to get an HRegionInterface on meta and root
bq.  -- this stuff is used internally to CT only now; use MetaEditor or
bq.  MetaReader if you want to update or read catalog tables.  Opening new issue
bq.  to cutback CT use over the code base.
bq.  
bq.  A little off topic but couldn't help it since was in MetaReader and 
MetaEditor
bq.  trying to clean them up, I ended up moving meta migration code out to its
bq.  own class rather than have it in all inside in MetaEditor.
bq.  
bq.  Here is some detail to help reviews.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
bq.Clean up.  Shutdown access on some of these unused methods.  Don't
bq.let out HRegionInterface instances in particular since we are going
bq.away from raw HRI use to instead use a connection with retries:
bq.i.e. HTable.
bq.  
bq.Comments on state of this class. Javadoc edits.
bq.getZooKeeperWatcher on HConnection is deprecated so don't use it
bq.in constructor.  Override MetaNodeTracker and on node delete
bq.reset meta location (We used to do this over in MetaNodeTracker
bq.but to do that we had to have a CatalogTracker over in zk package
bq.which is silly -- bad package encapsulation).
bq.  
bq.(waitForRootServer) Renamed getRootServerConnection and change it
bq.from public to package private.
bq.(waitForRootServerConnectionDefault, getRootServerConnection) Removed.
bq.(getMetaServerConnection) Change from public to package private.
bq.Use MetaReader to read the meta location in root rather than a
bq.raw HRegionInterface so we get retrying.
bq.(remaining, timedout) Added utility methods.
bq.(waitForMetaServer) Changed from public to private.
bq.(resetMetaLocation) Made it synchronized on metaAvailable.
bq.Not all accesses were synchronized.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
bq.Refactor to use HTable instead of raw HRegionInterface so we get
bq.retrying.  For each operation we get an HTable, use it, then close it.
bq.(putToMetaTable, putsToMetaTable, etc) Utility methods.
bq.(updateRootWithMetaMigrationStatus, etc.) Moved out to own
bq.class since these classes are for a one-time migration only.
bq.  
bq.  A 
src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
bq.New class that holds all Meta* methods updating meta table used
bq.doing the one-time migration done to meta on startup.  This class
bq.is marked deprecated because it's going to be dropped in 0.94.
bq.  
bq.  M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
bq.Retrofit methods in here to use fullScan methods with Visitor.
bq.(getCatalogRegionInterface, getCatalogRegionNameForTable,
bq.  getCatalogRegionNameForRegion) Removed.
bq.(fullScan) Cleaned up the fullScans.  Fixed up wrong javadoc.
bq.(fullScanOfResults) Renamed as fullScan override.
bq.(fullScanOfRoot) Added as deprecated. We should be doing
bq.this against zk.
bq.(metaRowToRegionPair, getServerNameFromResult) Moved to Result

[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116032#comment-13116032
]

Jonathan Gray commented on HBASE-4497:
--

I was just discussing this scenario with Dhruba a few days back. There's
definitely a race condition here and I don't see a trivial fix.

We use HLog IO-fencing to ensure that edits don't slip into an HLog after a
server is considered dead by the Master. But the Master has no way to prevent
this META update from slipping in.

We need to make some modification to how the master can safely time out an
OPENING. One possibility is for the master to require either an acknowledgment
from the RS before moving the region elsewhere or for the RS to die. It seems
unlikely that we will actually see the RS to Master acknowledgment since
OPENING taking too long is usually a sign of brokenness or the RS being backed
up, I think. But in any case I'd imagine some kind of OPEN_CANCEL_REQUESTED
state that the Master transitions the node to and only when the RS transitions
to OPEN_CANCELED or OFFLINE or something, then it's safe to reassign elsewhere.

I think this design still has a hole in it though because there are scenarios
where the RS doesn't actually die but for some reason doesn't OPEN or ack the
cancel.

Another option would be to do the RS-performed META edits using a CheckAndPut
rather than a straight Put. Or we could move META editing back to the Master
where it's easy to do things atomically :)

The CheckAndPut idea is kind of neat but we'd probably have to send more data
on the OPEN_RPC. For example, the existing server start code or server name +
start code or something guaranteed unique (guaranteed that a conflicting RS
opening stuff wouldn't be able to use the same thing). Then the atomicity is
on the META region.
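The CheckAndPut idea can be sketched with an in-memory stand-in for META: a region server may record itself as the host only if the row still carries the unique token the master handed out with that particular open request. The class, the token format and the method names below are hypothetical, and `ConcurrentHashMap.replace(key, expected, new)` is used here only as an analogue of an atomic check-and-put on a single row.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration of guarding a META update with compare-and-set. */
class MetaCheckAndPutSketch {
  // region name -> current owner token or server name (stand-in for a META row)
  private final ConcurrentHashMap<String, String> meta = new ConcurrentHashMap<>();

  /** Master side: record which open request is currently allowed to complete. */
  void assign(String region, String openToken) {
    meta.put(region, openToken);
  }

  /** RS side: succeeds only if the row still holds the token this RS was given. */
  boolean markOpened(String region, String expectedToken, String serverName) {
    return meta.replace(region, expectedToken, serverName);
  }

  String owner(String region) {
    return meta.get(region);
  }

  public static void main(String[] args) {
    MetaCheckAndPutSketch sketch = new MetaCheckAndPutSketch();
    sketch.assign("regionX", "rsA,token-1"); // first open request goes to rsA
    sketch.assign("regionX", "rsB,token-2"); // master times out rsA and reassigns to rsB
    boolean lateA = sketch.markOpened("regionX", "rsA,token-1", "rsA"); // stale edit is rejected
    boolean okB = sketch.markOpened("regionX", "rsB,token-2", "rsB");
    System.out.println("rsA accepted: " + lateA + ", rsB accepted: " + okB
        + ", owner: " + sketch.owner("regionX"));
  }
}
```

As the comment notes, the token has to be something a conflicting server could not reuse, which is why more data would need to travel on the open request.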

If region opening fails after updating META HBCK reports it as inconsistent
and scanning the region throws NSRE
---

Key: HBASE-4497
URL: https://issues.apache.org/jira/browse/HBASE-4497
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

[jira] [Updated] (HBASE-4209) The HBase hbase-daemon.sh SIGKILLs master when stopping it

2011-09-27 Thread Roman Shaposhnik (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated HBASE-4209:


Attachment: HBASE-4209.final.patch.txt

Latest version of the patch attached. Unfortunately, it has grown since I last 
submitted it, in order to incorporate feedback from this JIRA. 

All tests pass on my local workstation.

 The HBase hbase-daemon.sh SIGKILLs master when stopping it
 --

 Key: HBASE-4209
 URL: https://issues.apache.org/jira/browse/HBASE-4209
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Attachments: HBASE-4209.final.patch.txt, HBASE-4209.patch.txt


 There's a bit of code in hbase-daemon.sh that causes the HBase master to be 
 SIGKILLed when stopping it rather than trying SIGTERM (like it does for other 
 daemons). When HBase is executed in standalone mode (and the only daemon 
 you need to run is the master), that causes newly created tables to go missing as 
 unflushed data is thrown out. If there was no good reason to kill the master 
 with SIGKILL, perhaps we can take that special case out and rely on SIGTERM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-27 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116059#comment-13116059
 ] 

Hudson commented on HBASE-4455:
---

Integrated in HBase-0.92 #24 (See 
[https://builds.apache.org/job/HBase-0.92/24/])
HBASE-4455 Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
AssignmentManager (Ming Ma)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/MasterAddressTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java


 Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
 AssignmentManager
 --

 Key: HBASE-4455
 URL: https://issues.apache.org/jira/browse/HBASE-4455
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0


 Keep the Master up all the time and do a rolling restart of RSs like this: stop RS1, 
 wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
 RS2, wait for 2 seconds, etc. After a while, you will find that the -ROOT- and .META. 
 regions aren't in regions-in-transition from the AssignmentManager's point of 
 view, but they aren't assigned to any region server either. Here are the issues.
 1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is 
 invoked to check if the server contains the -ROOT- region. That is due to the long 
 delay in ZK notification and the async nature of the system. Here is an example: even 
 though the new root region server sea-lab-1,60020,1316380133656 is set at T2, at 
 T3, when the shutdown of sea-lab-1,60020,1316380133656 is processed, the root location 
 still points to the old server sea-lab-3,60020,1316380037898.
 T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6
 -0x1327e43175e Retrieved 29 byte(s) of data from znode 
 /hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898
 T2: 2011-09-18 14:08:57,173 INFO 
 org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region 
 location in ZooKeeper as sea-lab-1,60020,1316380133656
 T3: 2011-09-18 14:10:26,393 DEBUG 
 org.apache.hadoop.hbase.master.ServerManager: 
 Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler 
 to be executed, root=false, meta=true, current Root Location: 
 sea-lab-3,60020,1316380037898
 T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6
 -0x1327e43175e Retrieved 29 byte(s) of data from znode 
 /hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656
 2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or 
 .META. availability could be blocked. If, meanwhile, the new server that 
 -ROOT- or .META. is being assigned to is restarted, another instance of 
 MetaServerShutdownHandler is queued. Eventually, all 
 MetaServerShutdownHandler worker threads are filled up. It looks like 
 HBASE-4245.


[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-27 Thread Doug Meil (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116061#comment-13116061
]

Doug Meil commented on HBASE-4448:
--

re: passed from test to test

This is described in the issue description: this approach depends on JVM
re-use; otherwise it won't work. The factory has a cache of MiniClusters
based on the number of slaves, and caching for a DFS cluster or ZkCluster
can be added down the road as needed.

re: pollution

I hate pollution! :-)

There are two cases: intra-test and inter-test.
Inter-test is handled via all the tables getting disabled and whacked when the
minicluster is returned to the factory.
Intra-test (i.e., multiple test methods in the same class) is the same as it is
now. There is no automatic cleanup between test methods in the same class even
without this utility, so it's no worse.
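
The factory pattern described above might look roughly like the following. This is a sketch under stated assumptions, not the code in the patch: `MiniCluster`, `ClusterFactory`, `acquire`, `release`, and `startedCount` are hypothetical names, and "disabling and whacking" tables is reduced to clearing a set.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a running mini cluster. */
final class MiniCluster {
    final int slaves;
    final Set<String> tables = new HashSet<>();

    MiniCluster(int slaves) {
        this.slaves = slaves;
    }
}

/** Hypothetical factory that reuses clusters within one JVM, keyed by slave count. */
final class ClusterFactory {
    private final Map<Integer, Deque<MiniCluster>> idle = new HashMap<>();
    private int started = 0;

    /** Hand out an idle cluster of the requested size, starting one only if none is cached. */
    synchronized MiniCluster acquire(int slaves) {
        Deque<MiniCluster> pool = idle.get(slaves);
        if (pool != null && !pool.isEmpty()) {
            return pool.pop();
        }
        started++;
        return new MiniCluster(slaves);
    }

    /** Drop every table (the inter-test cleanup), then cache the cluster for the next test. */
    synchronized void release(MiniCluster cluster) {
        cluster.tables.clear();
        idle.computeIfAbsent(cluster.slaves, k -> new ArrayDeque<>()).push(cluster);
    }

    /** Number of clusters actually started, as opposed to reused. */
    synchronized int startedCount() {
        return started;
    }
}
```

Because the cache lives in ordinary instance state, reuse only happens when the test classes share a factory inside the same JVM, which matches the JVM re-use caveat above.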

HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility
instances across unit tests
-

[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-27 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116072#comment-13116072
 ] 

Lars Hofhansl commented on HBASE-3130:
--

You sound like you are surprised :)




 [replication] ReplicationSource can't recover from session expired on remote 
 clusters
 -

 Key: HBASE-3130
 URL: https://issues.apache.org/jira/browse/HBASE-3130
 Project: HBase
  Issue Type: Bug
  Components: replication
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: Chris Trezzo
 Fix For: 0.92.0

 Attachments: 3130-v2.txt, 3130-v3.txt, 3130.txt


 Currently ReplicationSource cannot recover when its zookeeper connection to 
 its remote cluster expires. HLogs are still being tracked, but a cluster 
 restart is required to continue replication (or a rolling restart).


[jira] [Commented] (HBASE-3130) [replication] ReplicationSource can't recover from session expired on remote clusters

2011-09-27 Thread Jean-Daniel Cryans (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116073#comment-13116073
 ] 

Jean-Daniel Cryans commented on HBASE-3130:
---

More like joyful amusement, it's been broken for so long...  we're pushing this 
in prod really soon.


[jira] [Commented] (HBASE-4062) Multi-column scanner unit test

[
https://issues.apache.org/jira/browse/HBASE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116075#comment-13116075
]

jirapos...@reviews.apache.org commented on HBASE-4062:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1012/
---

(Updated 2011-09-28 02:16:10.418401)

Review request for hbase and Michael Stack.

Changes
---

Linking the HBASE-4062 JIRA to this review.

Summary
---

Adding a unit test for the multi-column scanner. We are using this to test
an optimization we are making to multi-column scans using row-column Bloom
filters. The scanner creates multiple StoreFiles for a single column family,
each containing a randomized set of columns with different timestamps, and then
tests scanning through the whole region with all possible sets of columns
specified in the query. Point deletes (deletes of a specific timestamp) are
also tested.
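
Scanning with "all possible sets of columns" amounts to enumerating every subset of the column qualifiers. A minimal sketch of that enumeration follows, using a bitmask over a list of qualifier names. The class and method names are illustrative assumptions and are not taken from TestMultiColumnScanner.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: enumerates every subset of a small list of column qualifiers. */
final class ColumnSubsets {
    /** Returns all 2^n subsets of the given qualifiers, including the empty set. */
    static List<List<String>> allSubsets(List<String> qualifiers) {
        int n = qualifiers.size();
        if (n > 20) {
            // 2^n grows quickly; refuse inputs that would produce millions of sets.
            throw new IllegalArgumentException("too many qualifiers to enumerate: " + n);
        }
        List<List<String>> result = new ArrayList<>(1 << n);
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // Bit i of the mask decides whether qualifier i is part of this subset.
                if ((mask & (1 << i)) != 0) {
                    subset.add(qualifiers.get(i));
                }
            }
            result.add(subset);
        }
        return result;
    }
}
```

With n qualifiers this yields 2^n column sets, so the exhaustive approach only stays practical for a small number of columns.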

This addresses bug HBASE-4062.
https://issues.apache.org/jira/browse/HBASE-4062

Diffs
-

src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java
PRE-CREATION

Diff: https://reviews.apache.org/r/1012/diff

Testing
---

Run the unit test. Break the store scanner and make sure the test breaks.

Thanks,

Mikhail

Multi-column scanner unit test
--

Key: HBASE-4062
URL: https://issues.apache.org/jira/browse/HBASE-4062
Project: HBase
Issue Type: Improvement
Components: regionserver, test
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
Labels: test
Attachments: test-multi-column-scanner.patch

Adding a unit test for the multi-column scanner. We are using this to
test an optimization we are making to multi-column scans using row-column
Bloom filters. The scanner creates multiple StoreFiles for a single column
family, each containing a randomized set of columns with different
timestamps, and then tests scanning through the whole region with all
possible sets of columns specified in the query. Point deletes (deletes of a
specific timestamp) are also tested.

[jira] [Commented] (HBASE-4062) Multi-column scanner unit test

2011-09-27 Thread Mikhail Bautin (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116077#comment-13116077
 ] 

Mikhail Bautin commented on HBASE-4062:
---

@Michael: apparently the patch is already committed -- can I close this JIRA?


[jira] [Commented] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.