[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418968#comment-13418968
 ] 

Hadoop QA commented on HBASE-6411:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12537268/HBASE-6411-0.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 16 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
  org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//console

This message is automatically generated.

> Move Master Metrics to metrics 2
> 
>
> Key: HBASE-6411
> URL: https://issues.apache.org/jira/browse/HBASE-6411
> Project: HBase
> Issue Type: Sub-task
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-19 Thread ShiXing (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ShiXing updated HBASE-3725:
---

Attachment: HBASE-3725-0.92-V6.patch

To Ted:

bq. TestHRegion#testIncrementWithFlushAndDelete passed without that assignment.

Because the iscan still reads from the memstore after I remove this line:
{code}
List fileResults = new ArrayList();
- iscan.checkOnlyStoreFiles();
scanner = null;
try {
scanner = getScanner(iscan);
{code}

And there is no result in the memstore, so the increment treats the value as 0, 
which has the same effect as the delete.

I added this case to TestHRegion#testIncrementWithFlushAndDelete in V6.
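
As a rough illustration of the underlying bug described in this issue, here is a toy model in plain Java. It is not HBase code: the class, the two maps standing in for the store file and the memstore, and both method names are hypothetical, and the read order (memstore first, then store files only) is my reading of the discussion above. It shows how a delete marker that exists only in memory is missed when the fallback read consults store files alone.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of one column's storage: flushed values plus newer in-memory edits. */
final class IncrementAfterDeleteSketch {
    // Hypothetical stand-ins for the store file and the memstore.
    private final Map<String, Long> storeFile = new HashMap<>();
    private final Map<String, Long> memstore = new HashMap<>();
    // Delete markers that have not been flushed yet exist only in memory.
    private final Set<String> memstoreDeletes = new HashSet<>();

    void putFlushed(String col, long value) { storeFile.put(col, value); }

    void delete(String col) {
        memstore.remove(col);
        memstoreDeletes.add(col);
    }

    /** Problematic read: memstore only, then store files only; the delete marker is never consulted. */
    long incrementFilesOnlyFallback(String col, long amount) {
        Long current = memstore.get(col);
        if (current == null) {
            current = storeFile.get(col); // ignores memstoreDeletes
        }
        long next = (current == null ? 0L : current) + amount;
        memstore.put(col, next);
        return next;
    }

    /** Merged read: the in-memory delete marker masks the older flushed value. */
    long incrementMergedRead(String col, long amount) {
        Long current = memstore.get(col);
        if (current == null && !memstoreDeletes.contains(col)) {
            current = storeFile.get(col);
        }
        long next = (current == null ? 0L : current) + amount;
        memstore.put(col, next);
        return next;
    }

    public static void main(String[] args) {
        IncrementAfterDeleteSketch buggy = new IncrementAfterDeleteSketch();
        buggy.putFlushed("x", 100L);
        buggy.delete("x");
        System.out.println(buggy.incrementFilesOnlyFallback("x", 10L)); // 110: old value resurfaces

        IncrementAfterDeleteSketch merged = new IncrementAfterDeleteSketch();
        merged.putFlushed("x", 100L);
        merged.delete("x");
        System.out.println(merged.incrementMergedRead("x", 10L)); // 10: starts from scratch
    }
}
```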

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
> Issue Type: Bug
> Components: io, regionserver
> Affects Versions: 0.90.1
> Reporter: Nathaniel Cook
> Assignee: Jonathan Gray
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
> HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>    * This code reproduces a bug with increment column values in hbase
>    * Usage: First run part one by passing '1' as the first arg
>    *        Then restart the hbase cluster so it writes everything to disk
>    *        Run part two by passing '2' as the first arg
>    *
>    * This will result in the old deleted data being found and used for
>    * the increment calls
>    *
>    * @param args
>    * @throws IOException
>    */
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>    * Creates a table and increments a column value 10 times by 10 each
>    * time.
>    * Results in a value of 100 for the column
>    *
>    * @throws IOException
>    */
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   // Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>  

[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes

2012-07-19 Thread Shengsheng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418940#comment-13418940
 ] 

Shengsheng Huang commented on HBASE-6363:
-

Seems reasonable. I only have a slight concern about the package dependency, 
because some of our customers are very reluctant to upgrade their stable Hadoop 
deployments. A standalone patch would be good to have.

> HBaseConfiguration can carry a main method that dumps XML output for debug 
> purposes
> ---
>
> Key: HBASE-6363
> URL: https://issues.apache.org/jira/browse/HBASE-6363
> Project: HBase
> Issue Type: Improvement
> Components: util
> Affects Versions: 0.94.0
> Reporter: Harsh J
> Priority: Trivial
> Labels: conf, newbie, noob
> Attachments: HBASE-6363.2.patch, HBASE-6363.patch
>
>
> Just like the Configuration class carries a main() method that simply 
> loads itself and writes XML out to System.out, HBaseConfiguration can use the 
> same kind of method.
> That way we can do "hbase org.apache.hadoop.….HBaseConfiguration" to get an 
> XML dump of things HBaseConfiguration has properly loaded. Nifty for checking 
> app classpaths sometimes.





[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit

2012-07-19 Thread Jie Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Huang updated HBASE-6429:
-

Attachment: hbase-6429-trunk.patch

1. Prepare a patch against trunk
2. Add one more unit test case (TestFilterWithScanLimits)
3. Fix 2 unit test failures in the previous version.

> Filter with filterRow() returning true is also incompatible with scan with 
> limit
> 
>
> Key: HBASE-6429
> URL: https://issues.apache.org/jira/browse/HBASE-6429
> Project: HBase
> Issue Type: Bug
> Components: filters
> Affects Versions: 0.96.0
> Reporter: Jason Dai
> Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch
>
>
> Currently if we scan with both a limit and a Filter with 
> filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will 
> be thrown. The same exception should also be thrown if the filter has its 
> filterRow() implemented.





[jira] [Commented] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently

2012-07-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418936#comment-13418936
 ] 

Lars Hofhansl commented on HBASE-6406:
--

TestZooKeeper.testClientSessionExpired failed again in the latest 0.94 build.
Although this is not obvious from the logs, the pattern in the code is the same 
as in TestReplicationPeer.

My initial suspicion was RecoverableZooKeeper and that it somehow retries the 
operation and thereby reconnects the expired session. According to the code it 
does not do that, though.

Somehow HBaseTestingUtil.expireSession is subject to a race.
In the case of TestReplicationPeer that happened when expireSession was called 
before the connection was actually established.

Is there a way to check whether the connection was established first and wait 
if it wasn't?
Otherwise, I'd say we disable this test for now.
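
One way to "check whether the connection was established first and wait if it wasn't" is a latch that a connection callback releases. The sketch below uses only the JDK; `ConnectionGate` and its methods are hypothetical names, not part of HBase or ZooKeeper. In ZooKeeper the callback would typically be a Watcher that observes KeeperState.SyncConnected, but how that would be wired into expireSession is an assumption and is not taken from this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: lets a caller wait until a "connected" event has been seen. */
final class ConnectionGate {
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Call this from whatever callback reports that the session is established. */
    void onConnected() {
        connected.countDown();
    }

    /** Returns true if the connection was reported within the timeout, false otherwise. */
    boolean awaitConnected(long timeout, TimeUnit unit) {
        try {
            return connected.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionGate gate = new ConnectionGate();

        // Simulate a client that connects asynchronously after a short delay.
        Thread connector = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            gate.onConnected();
        });
        connector.start();

        if (gate.awaitConnected(5, TimeUnit.SECONDS)) {
            System.out.println("connected; expiring the session is now meaningful");
        } else {
            System.out.println("timed out waiting for the connection");
        }
        connector.join();
    }
}
```

A test would call awaitConnected before expiring the session and fail fast on a timeout instead of racing against the connection attempt.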


> TestReplicationPeer.testResetZooKeeperSession and 
> TestZooKeeper.testClientSessionExpired fail frequently
> 
>
> Key: HBASE-6406
> URL: https://issues.apache.org/jira/browse/HBASE-6406
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.1
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.1
>
> Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack
>
>
> Looking back through the 0.94 test runs these two tests accounted for 11 of 
> 34 failed tests.
> They should be fixed or (temporarily) disabled.





[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-19 Thread ShiXing (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418933#comment-13418933
 ] 

ShiXing commented on HBASE-3725:


@Ted, the reassignment is needed because there is no interface to set the iscan 
back to both memstore and filestore. At the beginning, the iscan is set to read 
the memstore only:
{code}
// memstore scan
iscan.checkOnlyMemStore();
{code}

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>Assignee: Jonathan Gray
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>    * This code reproduces a bug with increment column values in hbase
>    * Usage: First run part one by passing '1' as the first arg
>    *        Then restart the hbase cluster so it writes everything to disk
>    *        Run part two by passing '2' as the first arg
>    *
>    * This will result in the old deleted data being found and used for
>    * the increment calls
>    *
>    * @param args
>    * @throws IOException
>    */
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>    * Creates a table and increments a column value 10 times by 10 each
>    * time.
>    * Results in a value of 100 for the column
>    *
>    * @throws IOException
>    */
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   // Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> B

[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-19 Thread Francis Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Liu updated HBASE-6432:
---

Description: 
ClusterId is normally set into the passed conf during instantiation of an 
HTable class. In the case of a HRegionServer this is bypassed and it is set to 
"default", since getMaster() uses HBaseRPC to create the proxy directly and 
bypasses the class which retrieves and sets the correct clusterId. 

This becomes a problem for clients (i.e. within a coprocessor) using delegation 
tokens for authentication, since the token's service will be the correct 
clusterId while the TokenSelector is looking for one with service "default".

  was:
ClusterId is normally set into the passed conf during instantiation of an 
HTable class. In the case of a HRegionServer this is bypassed and set to 
"default" since getMaster() bypasses the class which sets clusterID clusterId 
since it uses HBaseRPC to create the proxy to create the proxy directly. 

This becomes a problem with clients (ie within a coprocessor) using delegation 
tokens for authentication. Since the token's service will be the correct 
clusterId and while the TokenSelector is looking for one with service "default".


> HRegionServer doesn't properly set clusterId in conf
> 
>
> Key: HBASE-6432
> URL: https://issues.apache.org/jira/browse/HBASE-6432
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.0
> Reporter: Francis Liu
> Assignee: Francis Liu
> Fix For: 0.96.0
>
> Attachments: HBASE-6432_94.patch
>
>
> ClusterId is normally set into the passed conf during instantiation of an 
> HTable class. In the case of a HRegionServer this is bypassed and it is set to 
> "default", since getMaster() uses HBaseRPC to create the proxy directly and 
> bypasses the class which retrieves and sets the correct clusterId. 
> This becomes a problem for clients (i.e. within a coprocessor) using 
> delegation tokens for authentication, since the token's service will be the 
> correct clusterId while the TokenSelector is looking for one with service 
> "default".





[jira] [Commented] (HBASE-6431) Some FilterList Constructors break addFilter

2012-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418931#comment-13418931
 ] 

Hadoop QA commented on HBASE-6431:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537269/0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 12 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2417//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2417//console

This message is automatically generated.

> Some FilterList Constructors break addFilter
> 
>
> Key: HBASE-6431
> URL: https://issues.apache.org/jira/browse/HBASE-6431
> Project: HBase
> Issue Type: Bug
> Components: filters
> Affects Versions: 0.92.1, 0.94.0
> Reporter: Alex Newman
> Assignee: Alex Newman
> Priority: Minor
> Attachments: 
> 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch
>
>
> Some of the constructors for FilterList set the internal list of filters to 
> list types which don't support the add operation. As a result 
> FilterList(final List<Filter> rowFilters)
> FilterList(final Filter... rowFilters)
> FilterList(final Operator operator, final List<Filter> rowFilters)
> FilterList(final Operator operator, final Filter... rowFilters)
> may initialize private List<Filter> filters = new ArrayList<Filter>(); 
> incorrectly.
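
To make the failure mode concrete, here is a minimal sketch in plain Java. It is not the FilterList source: `FilterHolderSketch` is a hypothetical stand-in, and String replaces Filter so the example stays self-contained. One common way to end up with a list that rejects add is to keep the fixed-size view returned by Arrays.asList; whether that is the exact cause in FilterList is an assumption here. Copying into a new ArrayList avoids the problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a FilterList-like holder; String replaces Filter. */
final class FilterHolderSketch {
    private final List<String> filters;

    private FilterHolderSketch(List<String> filters) {
        this.filters = filters;
    }

    /** Problematic pattern: keeps the fixed-size list returned by Arrays.asList. */
    static FilterHolderSketch wrapping(String... rowFilters) {
        return new FilterHolderSketch(Arrays.asList(rowFilters));
    }

    /** Defensive copy: the internal list is always a growable ArrayList. */
    static FilterHolderSketch copying(String... rowFilters) {
        return new FilterHolderSketch(new ArrayList<>(Arrays.asList(rowFilters)));
    }

    void addFilter(String filter) {
        filters.add(filter);
    }

    int size() {
        return filters.size();
    }

    public static void main(String[] args) {
        FilterHolderSketch ok = copying("a", "b");
        ok.addFilter("c");
        System.out.println(ok.size()); // 3

        try {
            wrapping("a", "b").addFilter("c");
        } catch (UnsupportedOperationException e) {
            System.out.println("add is unsupported on the fixed-size list");
        }
    }
}
```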





[jira] [Updated] (HBASE-5498) Secure Bulk Load

2012-07-19 Thread Francis Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Liu updated HBASE-5498:
---

Attachment: HBASE-5498_draft_94.patch

Laxman, here's a working patch. It incorporates HBASE-6432, which took some time 
to debug. I still have to address the other comments, some cleanup and TODOs. 
Let me know if this works for you.

> Secure Bulk Load
> 
>
> Key: HBASE-5498
> URL: https://issues.apache.org/jira/browse/HBASE-5498
> Project: HBase
> Issue Type: Improvement
> Components: mapred, security
> Reporter: Francis Liu
> Assignee: Francis Liu
> Fix For: 0.96.0
>
> Attachments: HBASE-5498_draft.patch, HBASE-5498_draft_94.patch
>
>
> Design doc: 
> https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load
> Short summary:
> Security as it stands does not cover the bulkLoadHFiles() feature. Users 
> calling this method will bypass ACLs. Also loading is made more cumbersome in 
> a secure setting because of hdfs privileges. bulkLoadHFiles() moves the data 
> from user's directory to the hbase directory, which would require certain 
> write access privileges set.
> Our solution is to create a coprocessor which makes use of AuthManager to 
> verify if a user has write access to the table. If so, it launches an MR job as 
> the hbase user to do the importing (i.e. rewrite from text to hfiles). One 
> tricky part this job will have to do is impersonate the calling user when 
> reading the input files. We can do this by expecting the user to pass an hdfs 
> delegation token as part of the secureBulkLoad() coprocessor call and extend 
> an inputformat to make use of that token. The output is written to a 
> temporary directory accessible only by hbase and then bulkloadHFiles() is 
> called.





[jira] [Updated] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently

2012-07-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6406:
-

Fix Version/s: (was: 0.94.2)
   0.94.1
   0.96.0

> TestReplicationPeer.testResetZooKeeperSession and 
> TestZooKeeper.testClientSessionExpired fail frequently
> 
>
> Key: HBASE-6406
> URL: https://issues.apache.org/jira/browse/HBASE-6406
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.1
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.1
>
> Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack
>
>
> Looking back through the 0.94 test runs these two tests accounted for 11 of 
> 34 failed tests.
> They should be fixed or (temporarily) disabled.





[jira] [Commented] (HBASE-6428) Pluggable Compaction policies

2012-07-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418924#comment-13418924
 ] 

Lars Hofhansl commented on HBASE-6428:
--

Another way of looking at this is a possible policy that considers all HFiles in 
terms of a baseline + changes on top of that baseline.

(For the record: I am not saying that I will do this any time soon, just 
recording this as an idea).


> Pluggable Compaction policies
> -
>
> Key: HBASE-6428
> URL: https://issues.apache.org/jira/browse/HBASE-6428
> Project: HBase
> Issue Type: New Feature
> Reporter: Lars Hofhansl
>
> For some use cases it is useful to allow more control over how KVs get compacted.
> For example one could envision storing old versions of a KV in separate HFiles, 
> which then rarely have to be touched/cached by queries querying for new data.
> In addition these date-ranged HFiles can easily be used for backups while 
> maintaining historical data.
> This would be a major change, allowing compactions to provide multiple 
> targets (not just a filter).





[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-19 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418923#comment-13418923
 ] 

Elliott Clark commented on HBASE-6411:
--

Sorry, I didn't mean to re-assign.  I must have done that when submitting to 
Hadoop QA.  I didn't mean to step on any toes.

I agree that a metrics factory or something like it could be very useful.  
As I said above, I was hoping to take a crack at using Guice to do most 
of the factory stuff, but until I get that up a factory may still be useful.

On #2, I don't think removing the interface completely is really the way to go, 
since both the replication metrics and the region server metrics are mostly 
dynamic metrics; i.e. they aren't pre-created like the master metrics. I think it 
still makes sense to have a source that's mostly focused on those map-based 
metrics.

> Move Master Metrics to metrics 2
> 
>
> Key: HBASE-6411
> URL: https://issues.apache.org/jira/browse/HBASE-6411
> Project: HBase
> Issue Type: Sub-task
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2





[jira] [Commented] (HBASE-6427) Pluggable policy for smallestReadPoint in HRegion

2012-07-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418922#comment-13418922
 ] 

Lars Hofhansl commented on HBASE-6427:
--

Let me clarify what I mean by this:
If I wanted to implement an MVCC based optimistic transaction engine on top of 
HBase I would naturally want to use HBase's built in versioning (where 
possible).
In that case it is not clear a priori how many versions to keep or for how long 
(i.e. specifying VERSION/TTL is too static). The outside engine would need to 
determine that.
The simplest of all approaches would be to do that via the smallestReadpoint in 
each region, by making its determination pluggable.
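
A pluggable determination could look roughly like the sketch below. Everything in it is hypothetical: the `Policy` interface, its method, and the "pinned" example are illustrations of the idea and not an existing HBase API. The assumption is that the region asks the policy for the read point it should honor, so an outside engine can hold it back while older versions are still needed.

```java
/** Sketch of a pluggable smallest-read-point decision; all names are hypothetical. */
final class ReadPointPolicySketch {

    /** Hypothetical plug-in point consulted by a region. */
    interface Policy {
        /**
         * @param regionReadPoint the read point the region derived from its own open scanners
         * @return the read point the region should actually honor
         */
        long smallestReadPoint(long regionReadPoint);
    }

    /** Default behavior: use the region's own value unchanged. */
    static final Policy DEFAULT = regionReadPoint -> regionReadPoint;

    /** An outside transaction engine holds the read point back to its oldest open transaction. */
    static Policy pinnedAt(long oldestActiveTransaction) {
        return regionReadPoint -> Math.min(regionReadPoint, oldestActiveTransaction);
    }

    public static void main(String[] args) {
        System.out.println(DEFAULT.smallestReadPoint(42L));       // 42
        System.out.println(pinnedAt(17L).smallestReadPoint(42L)); // 17
    }
}
```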


> Pluggable policy for smallestReadPoint in HRegion
> -
>
> Key: HBASE-6427
> URL: https://issues.apache.org/jira/browse/HBASE-6427
> Project: HBase
> Issue Type: New Feature
> Reporter: Lars Hofhansl
> Priority: Minor
>
> When implementing higher-level stores on top of HBase it is necessary to 
> allow dynamic control over how long KVs must be kept around.
> Semi-static config options for ColumnFamilies (# of versions or TTL) are not 
> sufficient.
> The simplest way to achieve this is to have a pluggable class to determine 
> the smallestReadpoint for a Region. That way outside code can control what KVs 
> to retain.





[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-19 Thread Francis Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Liu updated HBASE-6432:
---

Attachment: HBASE-6432_94.patch

A patch for 0.94 to get feedback on the approach. Things changed significantly 
enough in trunk to need a separate patch. I'm hoping to get this backported to 
0.94 since it is needed for security.

> HRegionServer doesn't properly set clusterId in conf
> 
>
> Key: HBASE-6432
> URL: https://issues.apache.org/jira/browse/HBASE-6432
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.0
> Reporter: Francis Liu
> Assignee: Francis Liu
> Fix For: 0.96.0
>
> Attachments: HBASE-6432_94.patch
>
>
> ClusterId is normally set into the passed conf during instantiation of an 
> HTable class. In the case of a HRegionServer this is bypassed and it is set to 
> "default", since getMaster() uses HBaseRPC to create the proxy directly and 
> bypasses the class which sets the clusterId. 
> This becomes a problem for clients (i.e. within a coprocessor) using 
> delegation tokens for authentication, since the token's service will be the 
> correct clusterId while the TokenSelector is looking for one with service 
> "default".





[jira] [Created] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-19 Thread Francis Liu (JIRA)
Francis Liu created HBASE-6432:
--

 Summary: HRegionServer doesn't properly set clusterId in conf
 Key: HBASE-6432
 URL: https://issues.apache.org/jira/browse/HBASE-6432
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Francis Liu
Assignee: Francis Liu
 Fix For: 0.96.0


ClusterId is normally set into the passed conf during instantiation of an 
HTable class. In the case of an HRegionServer this is bypassed and clusterId 
is set to "default", because getMaster() uses HBaseRPC to create the proxy 
directly and so bypasses the class which sets clusterId. 

This becomes a problem for clients (i.e. within a coprocessor) that use 
delegation tokens for authentication: the token's service is the correct 
clusterId, while the TokenSelector looks for a token with service "default".
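
A minimal sketch of the mismatch described above, assuming a selector that 
matches tokens purely by their service string. The Token class, the 
selectToken method and the "cluster-1234" id are hypothetical stand-ins for 
illustration only; they are not the HBase or Hadoop classes.

```java
import java.util.Arrays;
import java.util.List;

final class TokenServiceSketch {
  /** Hypothetical stand-in for a delegation token; only the service matters here. */
  static final class Token {
    final String service;

    Token(String service) {
      this.service = service;
    }
  }

  /** Returns the first token whose service equals the requested one, or null. */
  static Token selectToken(String service, List<Token> tokens) {
    for (Token t : tokens) {
      if (t.service.equals(service)) {
        return t;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // The token was issued with the real cluster id as its service.
    List<Token> tokens = Arrays.asList(new Token("cluster-1234"));

    // A client whose conf carries the real cluster id finds the token.
    System.out.println(selectToken("cluster-1234", tokens) != null); // true

    // A conf left at "default" looks for the wrong service and finds nothing.
    System.out.println(selectToken("default", tokens) != null); // false
  }
}
```

Under this assumption the token itself is valid; the lookup fails only 
because the conf used to select it still carries "default".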





[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover

2012-07-19 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418919#comment-13418919
 ] 

nkeywal commented on HBASE-5843:


bq. I'm confused as to what the 180s gap refers to. I see 980 (test 2) - 800 
(test1) = 180, but that is against 0.92, which doesn't have HBASE-5970, right? 
Could you clarify?
Yes, it's because with a clean stop, the RS unregisters itself in ZK, so the 
recovery starts immediately. With a kill -9, the RS remains registered in ZK. 
So if you don't have HBASE-5844 or HBASE-5926, you wait for the ZK timeout.

bq. Awesome.. We think this is also due to HBASE-5970 and HBASE-6109? 
Yes.
 
bq. Has a JIRA been filed?
Not yet. I'm writing specific unit tests for this. I found issues that I have 
not yet fully analyzed, and I need to create the JIRAs. Also, maybe my test 
was not good for this part: as I was running it without a datanode, it 
could be that the recovery was not working for this reason (I wonder if the 
sync works with the local file system, for example).


bq. Test to be changed to get a real difference when we need to replay the wal.
bq. Could you clarify what you mean here?
It does not last long enough, so I won't be able to see much difference even 
if there is one. So I need to redo the work with a real datanode, check that it 
recovers, then check that I measure something meaningful.
I will also redo the first tests with a DN to see if there is still a gap.




> Improve HBase MTTR - Mean Time To Recover
> -
>
> Key: HBASE-5843
> URL: https://issues.apache.org/jira/browse/HBASE-5843
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
>
> A part of the approach is described here: 
> https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit
> The ideal target is:
> - failure impact client applications only by an added delay to execute a 
> query, whatever the failure.
> - this delay is always inferior to 1 second.
> We're not going to achieve that immediately...
> Priority will be given to the most frequent issues.
> Short term:
> - software crash
> - standard administrative tasks as stop/start of a cluster.





[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes

2012-07-19 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418917#comment-13418917
 ] 

Harsh J commented on HBASE-6363:


Sorry, I didn't notice 1.x didn't have it! (I checked only against my 2.x 
installation, and CDH3 here seems to have had it backported at some point too). 
Instead of working around it, I think we can backport it to a future v1 
release via HADOOP-8567.

> HBaseConfiguration can carry a main method that dumps XML output for debug 
> purposes
> ---
>
> Key: HBASE-6363
> URL: https://issues.apache.org/jira/browse/HBASE-6363
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 0.94.0
>Reporter: Harsh J
>Priority: Trivial
>  Labels: conf, newbie, noob
> Attachments: HBASE-6363.2.patch, HBASE-6363.patch
>
>
> Just like the Configuration class carries a main() method in it, that simply 
> loads itself and writes XML out to System.out, HBaseConfiguration can use the 
> same kinda method.
> That way we can do "hbase org.apache.hadoop.….HBaseConfiguration" to get an 
> Xml dump of things HBaseConfiguration has properly loaded. Nifty in checking 
> app classpaths sometimes.
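
A rough sketch of the idea in the description above: a main() that loads a 
configuration and writes it to System.out as XML. To keep it runnable without 
Hadoop on the classpath, it uses java.util.Properties.storeToXML as a 
stand-in, so the XML layout is that of Properties and not Hadoop's. The class 
name, the dumpAsXml helper and the property key are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

final class ConfigDumpSketch {
  /** Serializes the given properties to an XML string (stand-in for a config dump). */
  static String dumpAsXml(Properties props) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      props.storeToXML(out, null, "UTF-8");
      return out.toString("UTF-8");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // Hypothetical key; a real dump would show whatever was actually loaded.
    props.setProperty("hbase.example.key", "example-value");
    System.out.print(dumpAsXml(props));
  }
}
```

A dump like this shows which values were actually picked up, which is the 
classpath check the description has in mind.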





[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-19 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418904#comment-13418904
 ] 

Zhihong Ted Yu commented on HBASE-3725:
---

Looking at existing code:
{code}
  private List getLastIncrement(final Get get) throws IOException {
    InternalScan iscan = new InternalScan(get);
{code}
iscan was assigned at the beginning. Looks like the assignment in the else 
block is redundant.

TestHRegion#testIncrementWithFlushAndDelete passed without that assignment.

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>Assignee: Jonathan Gray
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>    * This code reproduces a bug with increment column values in hbase
>    * Usage: First run part one by passing '1' as the first arg
>    *        Then restart the hbase cluster so it writes everything to disk
>    *        Run part two by passing '2' as the first arg
>    *
>    * This will result in the old deleted data being found and used for
>    * the increment calls
>    *
>    * @param args
>    * @throws IOException
>    */
>   public static void main(String[] args) throws IOException
>   {
>     if ("1".equals(args[0]))
>       partOne();
>     if ("2".equals(args[0]))
>       partTwo();
>     if ("both".equals(args[0]))
>     {
>       partOne();
>       partTwo();
>     }
>   }
>   /**
>    * Creates a table and increments a column value 10 times by 10 each time.
>    * Results in a value of 100 for the column
>    *
>    * @throws IOException
>    */
>   static void partOne() throws IOException
>   {
>     Configuration conf = HBaseConfiguration.create();
>     HBaseAdmin admin = new HBaseAdmin(conf);
>     HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>     tableDesc.addFamily(new HColumnDescriptor(infoCF));
>     if (admin.tableExists(tableName))
>     {
>       admin.disableTable(tableName);
>       admin.deleteTable(tableName);
>     }
>     admin.createTable(tableDesc);
>     HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>     HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>     //Increment uninitialized column
>     for (int j = 0; j < 10; j++)
>     {
>       table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
>       Increment inc = new Increment(rowKey);
>       inc.addColumn(infoCF, newInc, (long)10);
>       table.increment(inc);
>     }
>     Get get = new Get(rowKey);
>     Result r = table.get(get);
> 

[jira] [Resolved] (HBASE-6345) Utilize fault injection in testing using AspectJ

2012-07-19 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu resolved HBASE-6345.
---

Resolution: Won't Fix

There was not enough incentive to pursue fault injection using AspectJ.

> Utilize fault injection in testing using AspectJ
> 
>
> Key: HBASE-6345
> URL: https://issues.apache.org/jira/browse/HBASE-6345
> Project: HBase
>  Issue Type: Bug
>Reporter: Zhihong Ted Yu
>
> HDFS uses fault injection to test pipeline failure in addition to mock, spy. 
> HBase uses mock, spy. But there are cases where mock, spy aren't convenient.
> An example from DFSClientAspects.aj:
> {code}
>   pointcut pipelineInitNonAppend(DataStreamer datastreamer):
>     callCreateBlockOutputStream(datastreamer)
>     && cflow(execution(* nextBlockOutputStream(..)))
>     && within(DataStreamer);
>   after(DataStreamer datastreamer) returning :
>     pipelineInitNonAppend(datastreamer) {
>     LOG.info("FI: after pipelineInitNonAppend: hasError="
>         + datastreamer.hasError + " errorIndex=" + datastreamer.errorIndex);
>     if (datastreamer.hasError) {
>       DataTransferTest dtTest = DataTransferTestUtil.getDataTransferTest();
>       if (dtTest != null)
>         dtTest.fiPipelineInitErrorNonAppend.run(datastreamer.errorIndex);
>     }
>   }
> {code}





[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes

2012-07-19 Thread Shengsheng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418900#comment-13418900
 ] 

Shengsheng Huang commented on HBASE-6363:
-

Thanks very much for the clarification, Harsh. It seems /conf was only added to 
Hadoop in release 0.21 (HADOOP-6408). As we're using Hadoop v1, it didn't 
work on our local cluster. We would consider adding the HADOOP-6408 patch to 
our local Hadoop branch. After all, the servlet config dump would contain all 
the configuration changes made in code. Anyway, do you think it is worth adding 
a separate servlet to dump the configuration as XML only? Or reorganizing the 
dump output into a more consistent format to make it easier to parse 
automatically?






[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6389:
--

Status: Open  (was: Patch Available)

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> been reached.
> Reading the current conditions of waitForRegionServers() clarifies it:
> {code:title=ServerManager.java (trunk rev:1360470)}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> ..
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
> ..
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes than required have checked in with the Master, 
> and the master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, 
> especially now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time AND
>    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>       getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>       getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}
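
To make the difference between the two loop conditions quoted above concrete, 
here is a small sketch that restates them as pure predicates and evaluates 
both for one scenario. The class and method names and the numbers in main() 
are assumptions made for illustration; only the boolean expressions are taken 
from the quoted snippets.

```java
final class WaitConditionSketch {
  /** Loop condition as quoted from trunk: waiting stops once the timeout has elapsed. */
  static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  /** Proposed condition: the timeout alone no longer ends the wait. */
  static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept || count < minToStart);
  }

  public static void main(String[] args) {
    // Hypothetical scenario: the timeout (4500 ms) has elapsed after 5000 ms,
    // but only 1 of the 3 required region servers has checked in.
    boolean current = keepWaitingCurrent(
        false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000);
    boolean proposed = keepWaitingProposed(
        false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000);

    System.out.println("current keeps waiting:  " + current);  // false
    System.out.println("proposed keeps waiting: " + proposed); // true
  }
}
```

In this scenario the current condition lets the master proceed with a single 
region server, while the proposed one keeps waiting until minToStart is met.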





[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-19 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418898#comment-13418898
 ] 

Alex Baranau commented on HBASE-6411:
-

Looks like you reassigned the task, so I should probably not touch the patch to 
avoid conflicting edits, right?

I was going to add actual metrics tests (which test changes in metric values in 
addition to testing factory/class loading) and perhaps apply the 2nd point 
above, if it makes sense to you.

> Move Master Metrics to metrics 2
> 
>
> Key: HBASE-6411
> URL: https://issues.apache.org/jira/browse/HBASE-6411
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418895#comment-13418895
 ] 

Hadoop QA commented on HBASE-6389:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537286/testReplication.jstack
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2416//console

This message is automatically generated.


[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866
 ] 

Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 2:53 AM:


I ran the test suite with the latest patch on trunk and got:
{code}
Failed tests:
  testRunThriftServer[12](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[14](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[15](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[16](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[17](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>

Tests in error:
  testRegionCaching(org.apache.hadoop.hbase.client.TestHCM): org.apache.hadoop.hbase.UnknownRegionException: bd992463917ba68fe5389c5bf9e94a3a
  testCloseRegionThatFetchesTheHRIFromMeta(org.apache.hadoop.hbase.client.TestAdmin): -1
  testTableExists(org.apache.hadoop.hbase.catalog.TestMetaReaderEditor): org.apache.hadoop.hbase.TableNotEnabledException: testTableExists
  testRunThriftServer[11](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): test timed out after 6 milliseconds
  testRunThriftServer[13](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): test timed out after 6 milliseconds
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}

BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?

  was (Author: zhi...@ebaysf.com):
I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec <<< FAILURE!
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}

BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?
  

[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-19 Thread ShiXing (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418892#comment-13418892
 ] 

ShiXing commented on HBASE-3725:


@Ted
bq.  I generate a region with 3 store files. The increment slows from 1810 tps 
to 1020 tps, it slows 43.6%, .
That tps is for incrementing the same rowkey.


The performance depends on how frequently the memstore is flushed to a store 
file. If I run the same test case, the latest patch performs the same as the 
original, because the incremented rowkey is always in the memstore and we do 
not need to read the store file. 

The difference only affects a rowKey whose value cannot be found in the 
memstore: it needs one more read from the memstore, compared to the 0.92 trunk, 
which reads only from the store file.

Note that the original's high performance comes solely from reading only from 
the memstore.

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>Assignee: Jonathan Gray
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>    * Creates a table and increments a column value 10 times by 10 each
>    * time.
>    * Results in a value of 100 for the column
>    *
>    * @throws IOException
>    */
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   //Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
> 

[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-19 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418891#comment-13418891
 ] 

Alex Baranau commented on HBASE-6411:
-

Glanced over your patch. I like this approach better than the initial patch at 
4050: it exposes the real interface of MetricsSource (in this case master 
metrics), i.e. with methods defined rather than an empty interface plus a hashmap.

1. What do you think about making a MasterMetricsFactory available through the 
compat module (created by CompatibilitySingletonFactory?), which would create 
the MetricsSource, like this:

interface MasterMetricsFactory {
  MasterMetricsSource create(final String name, final String sessionId);
}

This way we could pass parameters and control creation of metrics source.

2. Independent of the above: how about removing the BaseMetricsSource interface 
from compat, since we don't really need it once metrics are defined explicitly 
in the sources? The current BaseMetricsSourceImpl could then be renamed to 
MetricsRegistry and used via composition (as a field) in metrics sources 
instead of realization. Creating and initializing the sources, which might 
differ for each one, could then stay in the metrics source implementation 
itself, including the decision whether to use JvmMetricsSource (I assume not 
every source should create it), etc. 
This way they would look like normal metrics sources from the hadoop codebase, 
except that they would use hbase's MetricsRegistry, which allows metrics removal.

Thoughts?
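
To make the two suggestions concrete, here is a minimal sketch, not taken from any patch. Only MasterMetricsFactory, MasterMetricsSource, MetricsRegistry and the create(name, sessionId) signature come from the comment above; the incRequests/getRequests methods, the counter name and the registry internals are hypothetical placeholders.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class MetricsCompositionSketch {
  // The factory is the only thing callers would obtain from the compat module.
  static final MasterMetricsFactory FACTORY = MasterMetricsSourceImpl::new;

  public static void main(String[] args) {
    MasterMetricsSource source = FACTORY.create("master", "session-1");
    source.incRequests(3);
    System.out.println(source.getRequests());
  }
}

/** Explicit metrics interface: methods are defined, not looked up in a map. */
interface MasterMetricsSource {
  void incRequests(long delta);   // hypothetical metric
  long getRequests();             // hypothetical accessor
}

/** Factory shape proposed in point 1. */
interface MasterMetricsFactory {
  MasterMetricsSource create(final String name, final String sessionId);
}

/** Hypothetical stand-in for the renamed BaseMetricsSourceImpl (point 2). */
class MetricsRegistry {
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  void incr(String key, long delta) {
    counters.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(delta);
  }

  long get(String key) {
    AtomicLong counter = counters.get(key);
    return counter == null ? 0L : counter.get();
  }

  void remove(String key) {
    counters.remove(key);
  }
}

/** The source holds a registry as a field instead of extending a base class. */
class MasterMetricsSourceImpl implements MasterMetricsSource {
  private final MetricsRegistry registry = new MetricsRegistry();
  private final String name;
  private final String sessionId;

  MasterMetricsSourceImpl(String name, String sessionId) {
    this.name = name;
    this.sessionId = sessionId;
  }

  @Override
  public void incRequests(long delta) {
    registry.incr("requests", delta);
  }

  @Override
  public long getRequests() {
    return registry.get("requests");
  }
}
```

With this shape each source decides in its own constructor what to register, which is where a choice such as creating a JvmMetricsSource would live.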
  
  



> Move Master Metrics to metrics 2
> 
>
> Key: HBASE-6411
> URL: https://issues.apache.org/jira/browse/HBASE-6411
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6417) hbck merges .META. regions if there's an old leftover

2012-07-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418887#comment-13418887
 ] 

Jonathan Hsieh commented on HBASE-6417:
---

Feels like we could add an option to not do repairs on META unless forced to.

> hbck merges .META. regions if there's an old leftover
> -
>
> Key: HBASE-6417
> URL: https://issues.apache.org/jira/browse/HBASE-6417
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Daniel Cryans
> Fix For: 0.96.0, 0.94.2
>
> Attachments: hbck.log
>
>
> Trying to see what caused HBASE-6310, one of the things I figured is that the 
> bad .META. row is actually one from the time that we were permitting meta 
> splitting and that folder had just been staying there for a while.
> So I tried to recreate the issue with -repair and it merged my good .META. 
> region with the one that's 3 years old that also has the same start key. I 
> ended up with a brand new .META. region!
> I'll be attaching the full log in a separate file.





[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes

2012-07-19 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418882#comment-13418882
 ] 

Harsh J commented on HBASE-6363:


Thanks again Shengsheng.

The /dump servlet is more verbose than the simple XML given by the /conf 
servlet. If it's just config you need, /conf is where you need to go, not 
/dump. But for the sake of debuggability, suggesting /dump in the javadoc does 
seem fine to do for HBase.

I think the patch looks good. If needed, we can switch /dump with /conf (since 
we're discussing just configs, not env. info as well), but otherwise I think it 
does what the goal of this report was. Thanks again!

> HBaseConfiguration can carry a main method that dumps XML output for debug 
> purposes
> ---
>
> Key: HBASE-6363
> URL: https://issues.apache.org/jira/browse/HBASE-6363
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 0.94.0
>Reporter: Harsh J
>Priority: Trivial
>  Labels: conf, newbie, noob
> Attachments: HBASE-6363.2.patch, HBASE-6363.patch
>
>
> Just like the Configuration class carries a main() method in it, that simply 
> loads itself and writes XML out to System.out, HBaseConfiguration can use the 
> same kinda method.
> That way we can do "hbase org.apache.hadoop.….HBaseConfiguration" to get an 
> Xml dump of things HBaseConfiguration has properly loaded. Nifty in checking 
> app classpaths sometimes.





[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418880#comment-13418880
 ] 

Hudson commented on HBASE-6325:
---

Integrated in HBase-0.92 #480 (See 
[https://builds.apache.org/job/HBase-0.92/480/])
HBASE-6319  ReplicationSource can call terminate on itself and deadlock
HBASE-6325  [replication] Race in ReplicationSourceManager.init can initiate a 
failover even if the node is alive (Revision 1363571)

 Result = FAILURE
jdcryans : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


> [replication] Race in ReplicationSourceManager.init can initiate a failover 
> even if the node is alive
> -
>
> Key: HBASE-6325
> URL: https://issues.apache.org/jira/browse/HBASE-6325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch
>
>
> Yet another bug found during the leap second madness: it's possible to miss 
> the registration of new region servers so that in 
> ReplicationSourceManager.init we start the failover of a live and replicating 
> region server. I don't think there's data loss but the RS that's being failed 
> over will die on:
> {noformat}
> 2012-07-01 06:25:15,604 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> sv4r23s48,10304,1341112194623: Writing replication status
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
> at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
> {noformat}
> It seems to me that just refreshing {{otherRegionServers}} after getting the 
> list of {{currentReplicators}} would be enough to fix this.
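
A simplified model of the ordering problem described above, offered as a sketch only. It assumes that init treats every replicator missing from its list of live region servers as dead; the server names and the toFailOver helper are hypothetical and are not part of ReplicationSourceManager.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class ReplicatorRaceSketch {
  /** Replicators with no matching live server are treated as dead. */
  static Set<String> toFailOver(Set<String> currentReplicators,
                                Set<String> otherRegionServers) {
    Set<String> dead = new TreeSet<>(currentReplicators);
    dead.removeAll(otherRegionServers);
    return dead;
  }

  public static void main(String[] args) {
    // Live-server list read BEFORE rs3 registered itself.
    Set<String> staleServers = new HashSet<>(Arrays.asList("rs1", "rs2"));
    // Replicator list read afterwards: rs3 is already replicating.
    Set<String> replicators = new HashSet<>(Arrays.asList("rs1", "rs2", "rs3"));
    // Live-server list refreshed AFTER reading the replicators.
    Set<String> freshServers = new HashSet<>(Arrays.asList("rs1", "rs2", "rs3"));

    System.out.println(toFailOver(replicators, staleServers)); // [rs3]
    System.out.println(toFailOver(replicators, freshServers)); // []
  }
}
```

In this model the stale list makes the live server rs3 look dead, while refreshing the list after reading the replicators leaves nothing to fail over.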





[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418879#comment-13418879
 ] 

Hudson commented on HBASE-6319:
---

Integrated in HBase-0.92 #480 (See 
[https://builds.apache.org/job/HBase-0.92/480/])
HBASE-6319  ReplicationSource can call terminate on itself and deadlock
HBASE-6325  [replication] Race in ReplicationSourceManager.init can initiate a 
failover even if the node is alive (Revision 1363571)

 Result = FAILURE
jdcryans : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


> ReplicationSource can call terminate on itself and deadlock
> ---
>
> Key: HBASE-6319
> URL: https://issues.apache.org/jira/browse/HBASE-6319
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.94.1
>
> Attachments: HBASE-6319-0.92.patch
>
>
> In a few places the ReplicationSource code calls terminate on itself, which 
> is a problem since in terminate() we wait on that thread to die.
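
The failure mode can be illustrated with a small sketch. It assumes only what the description states, namely that terminate() waits for the source thread to die; the class names and the self-call guard are hypothetical and are not necessarily what the attached patch does.

```java
class SelfTerminationSketch {
  /** Starts a worker that terminates itself; true if it exits cleanly. */
  static boolean runDemo() {
    SelfTerminatingWorker worker = new SelfTerminatingWorker();
    worker.start();
    try {
      worker.join(5000); // bounded wait so a regression cannot hang the demo
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !worker.isAlive() && worker.joinSkipped;
  }

  public static void main(String[] args) {
    System.out.println(runDemo());
  }
}

class SelfTerminatingWorker extends Thread {
  private volatile boolean running = true;
  volatile boolean joinSkipped = false;

  @Override
  public void run() {
    while (running) {
      // Simulates hitting an unrecoverable error inside the run loop.
      terminate();
    }
  }

  void terminate() {
    running = false;
    if (Thread.currentThread() == this) {
      // Joining our own thread would wait for ourselves to finish,
      // which cannot happen while we are blocked in join().
      joinSkipped = true;
      return;
    }
    try {
      join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Without the identity check, the call from run() would block in join() on its own thread and never return.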





[jira] [Updated] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes

2012-07-19 Thread Shengsheng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shengsheng Huang updated HBASE-6363:


Attachment: HBASE-6363.2.patch

Updated the patch according to @Harsh's comments. Actually, we wrote the patch 
for automation purposes: the HTTP master/dump page contains much more 
information than we need. 

> HBaseConfiguration can carry a main method that dumps XML output for debug 
> purposes
> ---
>
> Key: HBASE-6363
> URL: https://issues.apache.org/jira/browse/HBASE-6363
> Project: HBase
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 0.94.0
>Reporter: Harsh J
>Priority: Trivial
>  Labels: conf, newbie, noob
> Attachments: HBASE-6363.2.patch, HBASE-6363.patch
>
>
> Just like the Configuration class carries a main() method in it, that simply 
> loads itself and writes XML out to System.out, HBaseConfiguration can use the 
> same kinda method.
> That way we can do "hbase org.apache.hadoop.….HBaseConfiguration" to get an 
> Xml dump of things HBaseConfiguration has properly loaded. Nifty in checking 
> app classpaths sometimes.





[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866
 ] 

Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 1:41 AM:


I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec 
<<< FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec 
<<< FAILURE!
{code}
There was one hanging test:
{code}
at 
org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}

BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?

  was (Author: zhi...@ebaysf.com):
I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec 
<<< FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec 
<<< FAILURE!
{code}
There was one hanging test:
{code}
at 
org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}

BTW what do R~i~, C and F~i~ represent in the formula above ?
  
> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes have checked in with the Master, and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur,

[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866
 ] 

Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 1:37 AM:


I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec 
<<< FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec 
<<< FAILURE!
{code}
There was one hanging test:
{code}
at 
org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}

BTW what do R~i~, C and F~i~ represent in the formula above ?

  was (Author: zhi...@ebaysf.com):
I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec 
<<< FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec 
<<< FAILURE!
{code}
There was one hanging test:
{code}
at 
org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}

BTW what do R sub i, C and F sub i represent in the formula above ?
  
> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes have checked in with the Master, and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concu

[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6389:
--

Attachment: testReplication.jstack

jstack for the hanging TestReplication

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes have checked in with the Master, and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, 
> especially now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}
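
The difference between the two loop conditions quoted above can be checked in isolation. The sketch below copies the two boolean expressions into standalone methods; the method names and the sample values are hypothetical, and nothing here touches ServerManager itself.

```java
class WaitConditionSketch {
  /** Loop condition as quoted from trunk: the timeout alone ends the wait. */
  static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  /** Proposed condition: the timeout is only one of three reasons to wait. */
  static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept
            || count < minToStart);
  }

  public static void main(String[] args) {
    // Timeout (4500 ms) already exceeded, but only 1 of the 3 required
    // region servers has checked in and none arrived recently.
    boolean current = keepWaitingCurrent(
        false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000);
    boolean proposed = keepWaitingProposed(
        false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000);
    System.out.println(current);  // false: master proceeds with 1 RS
    System.out.println(proposed); // true: master keeps waiting for mintostart
  }
}
```

Note that under the proposed condition the master also keeps waiting until the timeout has elapsed even after mintostart is reached, which matches the revised javadoc (mintostart AND interval AND timeout).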





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866
 ] 

Zhihong Ted Yu commented on HBASE-6389:
---

I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec 
<<< FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< 
FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec 
<<< FAILURE!
{code}
There was one hanging test:
{code}
at 
org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}

BTW what do R sub i, C and F sub i represent in the formula above ?

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes have checked in with the Master, and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, 
> especially now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfi

[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover

2012-07-19 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418863#comment-13418863
 ] 

Gregory Chanan commented on HBASE-5843:
---

Looks great so far, nkeywal.

Some questions:

{quote}
2) Kill -9 of a RS; wait for all regions to become online again:
0.92: 980s
0.96: ~13s
=> The 180s gap comes from HBASE-5844. For master, HBASE-5926 is not tested but 
should bring similar results.
{quote}

I'm confused as to what the 180s gap refers to.  I see 980 (test 2) - 800 
(test 1) = 180, but that is against 0.92, which doesn't have HBASE-5970, right?  
Could you clarify?

{quote}
3) Start of the cluster after a clean stop; wait for all regions to
become online.
0.92: ~1020s
0.94: ~1023s (tested once only)
0.96: ~31s
=> The benefit is visible at startup
=> This does not come from something implemented for 0.94
{quote}

Awesome. Do we think this is also due to HBASE-5970 and HBASE-6109? (I assume 
HBASE-5844 and HBASE-5926 do not apply in this case.)

{quote}
7) With 2 RS, Insert 20M simple puts; then kill -9 the second one. See how long 
it takes to have all the regions available.
0.92) 180s detection time+ then hangs twice out of 2 tests.
0.96) 14s (hangs once out of 3)
=> There's a bug 
{quote}
Has a JIRA been filed?

{quote}
Test to be changed to get a real difference when we need to replay the wal.
{quote}
Could you clarify what you mean here?


> Improve HBase MTTR - Mean Time To Recover
> -
>
> Key: HBASE-5843
> URL: https://issues.apache.org/jira/browse/HBASE-5843
> Project: HBase
>  Issue Type: Umbrella
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
>
> A part of the approach is described here: 
> https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit
> The ideal target is:
> - a failure impacts client applications only through an added delay to 
> execute a query, whatever the failure.
> - this delay is always less than 1 second.
> We're not going to achieve that immediately...
> Priority will be given to the most frequent issues.
> Short term:
> - software crash
> - standard administrative tasks such as stop/start of a cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6386) Audit log messages do not include column family / qualifier information consistently

2012-07-19 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418844#comment-13418844
 ] 

Marcelo Vanzin commented on HBASE-6386:
---

Other methods also seem to suffer from similar issues; for example, 
preIncrementColumnValue does this:

{code}
requirePermission(TablePermission.Action.WRITE, c.getEnvironment(),
Arrays.asList(new byte[][]{family}));
{code}

This happens even though there is a "qualifier" argument, so the qualifier 
information never makes it to the audit log. It also kinda sucks that there's 
no standard "family map" type for all these operations, so to come up with one 
common type for auditing, you'd have to make copies of that data (or use ugly 
wrapper objects).
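
To illustrate the "copy into one common type" option mentioned above, here is a 
minimal sketch using only JDK collections. Everything in it is an assumption 
for illustration: the class AuditFamilyMapSketch, the methods forColumn and 
describe, and the choice of Map<String, Set<String>> as the common type are 
hypothetical and are not part of AccessController or any HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: normalize (family, qualifier) arguments into one audit-friendly map. */
final class AuditFamilyMapSketch {

  /** Copies a single family/qualifier pair into a family map; a null qualifier means the whole family. */
  static Map<String, Set<String>> forColumn(byte[] family, byte[] qualifier) {
    Set<String> qualifiers = new LinkedHashSet<>();
    if (qualifier != null) {
      qualifiers.add(new String(qualifier, StandardCharsets.UTF_8));
    }
    Map<String, Set<String>> families = new LinkedHashMap<>();
    families.put(new String(family, StandardCharsets.UTF_8), qualifiers);
    return families;
  }

  /** Renders the map as "family:qualifier" entries so the qualifier reaches the log line. */
  static String describe(Map<String, Set<String>> families) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Set<String>> e : families.entrySet()) {
      if (sb.length() > 0) {
        sb.append(", ");
      }
      sb.append(e.getKey());
      if (!e.getValue().isEmpty()) {
        sb.append(':').append(String.join("|", e.getValue()));
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] family = "cf".getBytes(StandardCharsets.UTF_8);
    byte[] qualifier = "q1".getBytes(StandardCharsets.UTF_8);
    System.out.println(describe(forColumn(family, qualifier))); // family and qualifier
    System.out.println(describe(forColumn(family, null)));      // family only
  }
}
```

The copy is exactly the cost the comment complains about; the sketch only shows 
that a single map type is enough to carry the qualifier through to the message.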


> Audit log messages do not include column family / qualifier information 
> consistently
> 
>
> Key: HBASE-6386
> URL: https://issues.apache.org/jira/browse/HBASE-6386
> Project: HBase
>  Issue Type: Improvement
>  Components: security
>Reporter: Marcelo Vanzin
>
> The code related to this issue is in 
> AccessController.java:permissionGranted().
> When creating audit logs, that method will do one of the following:
> * grant access, create audit log with table name only
> * deny access because of table permission, create audit log with table name 
> only
> * deny access because of column family / qualifier permission, create audit 
> log with specific family / qualifier
> So, in the case where more than one column family and/or qualifier are in the 
> same request, there will be a loss of information. Even in the case where 
> only one column family and/or qualifier is involved, information may be lost.
> It would be better if this behavior consistently included all the information 
> in the request; regardless of access being granted or denied, and regardless 
> which permission caused the denial, the column family and qualifier info 
> should be part of the audit log message.





[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418842#comment-13418842
 ] 

Hudson commented on HBASE-5966:
---

Integrated in HBase-0.94 #344 (See 
[https://builds.apache.org/job/HBase-0.94/344/])
HBASE-5966 MapReduce based tests broken on Hadoop 2.0.0-alpha (Gregory 
Chanan) (Revision 1363586)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java


> MapReduce based tests broken on Hadoop 2.0.0-alpha
> --
>
> Key: HBASE-5966
> URL: https://issues.apache.org/jira/browse/HBASE-5966
> Project: HBase
>  Issue Type: Bug
>  Components: mapred, mapreduce, test
>Affects Versions: 0.94.0, 0.96.0
> Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, 
> Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
>Reporter: Andrew Purtell
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, 
> HBASE-5966.patch, hbase-5966.patch
>
>
> Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test 
> rigging. Below is a representative error, which can be easily reproduced with:
> {noformat}
> mvn -PlocalTests -Psecurity \
>   -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
>   clean test \
>   -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> {noformat}
> And the result:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec 
> <<< FAILURE!
> ---
> Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec 
> <<< FAILURE!
> testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce)  
> Time elapsed: 21.935 sec  <<< ERROR!
> java.lang.reflect.UndeclaredThrowableException
>   at 
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
>   at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:18)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at o

[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter

2012-07-19 Thread Alex Newman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-6431:
---

  Component/s: filters
Affects Version/s: 0.92.1
   0.94.0

> Some FilterList Constructors break addFilter
> 
>
> Key: HBASE-6431
> URL: https://issues.apache.org/jira/browse/HBASE-6431
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Alex Newman
>Assignee: Alex Newman
>Priority: Minor
> Attachments: 
> 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch
>
>
> Some of the constructors for FilterList set the internal list of filters to 
> list types which don't support the add operation. As a result 
> FilterList(final List rowFilters)
> FilterList(final Filter... rowFilters)
> FilterList(final Operator operator, final List rowFilters)
> FilterList(final Operator operator, final Filter... rowFilters)
> may init private List filters = new ArrayList(); incorrectly.
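
The failure mode described above can be reproduced with plain JDK collections. 
The sketch below assumes the affected constructors end up holding a fixed-size 
list such as the one returned by Arrays.asList (the description does not say 
which list type is involved); FilterListSketch and its methods are hypothetical 
stand-ins, not FilterList itself, and strings stand in for Filter objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in showing why a later add can fail on some list types. */
final class FilterListSketch {

  /** Mimics a constructor that keeps a fixed-size view over the varargs array. */
  static List<String> fixedSize(String... filters) {
    return Arrays.asList(filters);
  }

  /** Mimics a constructor that copies the arguments into a growable ArrayList. */
  static List<String> copied(String... filters) {
    return new ArrayList<>(Arrays.asList(filters));
  }

  /** Returns true if add succeeds, false if the list type rejects it. */
  static boolean tryAdd(List<String> filters, String extra) {
    try {
      filters.add(extra);
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("fixed-size add ok: " + tryAdd(fixedSize("a", "b"), "c"));
    System.out.println("copied add ok:     " + tryAdd(copied("a", "b"), "c"));
  }
}
```

Copying the incoming list or array into a new ArrayList is one way to keep 
addFilter usable regardless of which constructor was called; whether the 
attached patch takes that route is not stated here.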





[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter

2012-07-19 Thread Alex Newman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-6431:
---

Priority: Minor  (was: Major)

> Some FilterList Constructors break addFilter
> 
>
> Key: HBASE-6431
> URL: https://issues.apache.org/jira/browse/HBASE-6431
> Project: HBase
>  Issue Type: Bug
>Reporter: Alex Newman
>Assignee: Alex Newman
>Priority: Minor
> Attachments: 
> 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch
>
>
> Some of the constructors for FilterList set the internal list of filters to 
> list types which don't support the add operation. As a result 
> FilterList(final List rowFilters)
> FilterList(final Filter... rowFilters)
> FilterList(final Operator operator, final List rowFilters)
> FilterList(final Operator operator, final Filter... rowFilters)
> may init private List filters = new ArrayList(); incorrectly.





[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit

2012-07-19 Thread Jie Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418830#comment-13418830
 ] 

Jie Huang commented on HBASE-6429:
--

Oops. I will fix those 2 failures and regenerate the patch soon. Thanks Ted.

> Filter with filterRow() returning true is also incompatible with scan with 
> limit
> 
>
> Key: HBASE-6429
> URL: https://issues.apache.org/jira/browse/HBASE-6429
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6429_0_94_0.patch
>
>
> Currently if we scan with both a limit and a Filter with 
> filterRow(List) implemented, an IncompatibleFilterException will 
> be thrown. The same exception should also be thrown if the filter has its 
> filterRow() implemented.





[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-19 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6411:
-

Attachment: HBASE-6411-0.patch

Here's a working implementation of master with metrics2.  It includes some 
tests but not a whole lot.  I plan to include a lot more once I am able to 
inject test metricsources (HBASE-6407).

It doesn't include histograms of the split size (HBASE-6409).

> Move Master Metrics to metrics 2
> 
>
> Key: HBASE-6411
> URL: https://issues.apache.org/jira/browse/HBASE-6411
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Alex Baranau
> Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2





[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter

2012-07-19 Thread Alex Newman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-6431:
---

Status: Patch Available  (was: Open)

> Some FilterList Constructors break addFilter
> 
>
> Key: HBASE-6431
> URL: https://issues.apache.org/jira/browse/HBASE-6431
> Project: HBase
>  Issue Type: Bug
>Reporter: Alex Newman
>Assignee: Alex Newman
> Attachments: 
> 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch
>
>
> Some of the constructors for FilterList set the internal list of filters to 
> list types which don't support the add operation. As a result 
> FilterList(final List rowFilters)
> FilterList(final Filter... rowFilters)
> FilterList(final Operator operator, final List rowFilters)
> FilterList(final Operator operator, final Filter... rowFilters)
> may init private List filters = new ArrayList(); incorrectly.





[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-19 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6411:
-

Assignee: Elliott Clark  (was: Alex Baranau)
  Status: Patch Available  (was: Open)

> Move Master Metrics to metrics 2
> 
>
> Key: HBASE-6411
> URL: https://issues.apache.org/jira/browse/HBASE-6411
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2





[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter

2012-07-19 Thread Alex Newman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-6431:
---

Attachment: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch

> Some FilterList Constructors break addFilter
> 
>
> Key: HBASE-6431
> URL: https://issues.apache.org/jira/browse/HBASE-6431
> Project: HBase
>  Issue Type: Bug
>Reporter: Alex Newman
>Assignee: Alex Newman
> Attachments: 
> 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch
>
>
> Some of the constructors for FilterList set the internal list of filters to 
> list types which don't support the add operation. As a result 
> FilterList(final List rowFilters)
> FilterList(final Filter... rowFilters)
> FilterList(final Operator operator, final List rowFilters)
> FilterList(final Operator operator, final Filter... rowFilters)
> may init private List filters = new ArrayList(); incorrectly.





[jira] [Created] (HBASE-6431) Some FilterList Constructors break addFilter

2012-07-19 Thread Alex Newman (JIRA)
Alex Newman created HBASE-6431:
--

 Summary: Some FilterList Constructors break addFilter
 Key: HBASE-6431
 URL: https://issues.apache.org/jira/browse/HBASE-6431
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
Assignee: Alex Newman


Some of the constructors for FilterList set the internal list of filters to 
list types which don't support the add operation. As a result 

FilterList(final List rowFilters)
FilterList(final Filter... rowFilters)
FilterList(final Operator operator, final List rowFilters)
FilterList(final Operator operator, final Filter... rowFilters)

may init private List filters = new ArrayList(); incorrectly.





[jira] [Updated] (HBASE-6405) Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics

2012-07-19 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6405:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Create Hadoop compatibilty modules and Metrics2 implementation of replication 
> metrics
> -
>
> Key: HBASE-6405
> URL: https://issues.apache.org/jira/browse/HBASE-6405
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zhihong Ted Yu
>Assignee: Elliott Clark
> Fix For: 0.96.0
>
> Attachments: 6405.txt, HBASE-6405-ADD.patch, 
> hbase-6405-addendum-2-v2.patch, hbase-6405-addendum-2.patch
>
>






[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418815#comment-13418815
 ] 

Lars Hofhansl commented on HBASE-6389:
--

:) didn't pick up on the "was"

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> ..
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes than required have checked in with the Master, and 
> the master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time AND
>    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>       getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>       getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}
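
To make the difference between the two loop conditions quoted above concrete, 
here is a minimal sketch that evaluates both on the same inputs. The class 
WaitConditionSketch and its method names are hypothetical and not part of 
ServerManager; only the boolean expressions are copied from the quoted 
snippets, with master.isStopped() passed in as a plain flag. The numbers in 
main are assumed for illustration (the 4.5s timeout is the default mentioned 
in the discussion; the interval and counts are made up).

```java
/** Hypothetical helper: only the boolean expressions mirror the quoted snippets. */
final class WaitConditionSketch {

  /** Current trunk condition: once the timeout has elapsed, the wait always ends. */
  static boolean currentKeepsWaiting(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  /** Proposed condition: the timeout is only one of three reasons to keep waiting. */
  static boolean proposedKeepsWaiting(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept || count < minToStart);
  }

  public static void main(String[] args) {
    // Assumed scenario: timeout elapsed, no recent check-ins, 1 of 3 required servers present.
    long slept = 5000, timeout = 4500, lastCountChange = 0, interval = 1500, now = 5000;
    int count = 1, minToStart = 3, maxToStart = Integer.MAX_VALUE;
    System.out.println("current keeps waiting:  " + currentKeepsWaiting(
        false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
    System.out.println("proposed keeps waiting: " + proposedKeepsWaiting(
        false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
  }
}
```

With these inputs the current condition stops waiting (false) while the 
proposed one keeps waiting (true) until minToStart servers have reported in, 
which is the behavior change the description asks for.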





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418812#comment-13418812
 ] 

Aditya Kishore commented on HBASE-6389:
---

@Lars

Completely agree and definitely would not want to hold 0.94.1 for this. (That's 
why "My vote *was*..." :) ).

Documentation can take care of this in 0.94.1

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> ..
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes than required have checked in with the Master, and 
> the master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time AND
>    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>       getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>       getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}


[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418813#comment-13418813
 ] 

Hudson commented on HBASE-6325:
---

Integrated in HBase-TRUNK #3154 (See 
[https://builds.apache.org/job/HBase-TRUNK/3154/])
HBASE-6325  [replication] Race in ReplicationSourceManager.init can 
initiate a failover even if the node is alive (Revision 1363573)

 Result = SUCCESS
jdcryans : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


> [replication] Race in ReplicationSourceManager.init can initiate a failover 
> even if the node is alive
> -
>
> Key: HBASE-6325
> URL: https://issues.apache.org/jira/browse/HBASE-6325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch
>
>
> Yet another bug found during the leap second madness: it's possible to miss 
> the registration of new region servers so that in 
> ReplicationSourceManager.init we start the failover of a live and replicating 
> region server. I don't think there's data loss but the RS that's being failed 
> over will die on:
> {noformat}
> 2012-07-01 06:25:15,604 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sv4r23s48,10304,1341112194623: Writing replication status
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
>         at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
> {noformat}
> It seems to me that just refreshing {{otherRegionServers}} after getting the 
> list of {{currentReplicators}} would be enough to fix this.
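
The ordering problem described above can be sketched in isolation. This is a minimal illustration, not the ReplicationSourceManager code: the `Registry` interface, its two methods, and the `lateJoiner` simulation are hypothetical stand-ins for the ZooKeeper lookups, and only the order of the two reads is the point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FailoverCheck {
  // Hypothetical stand-in for the ZooKeeper-backed lookups; not an HBase API.
  interface Registry {
    Set<String> liveRegionServers();    // servers currently registered as alive
    List<String> currentReplicators();  // servers that own replication queues
  }

  // Racy order: live servers are read BEFORE the replicator list, so a server
  // that registers between the two reads is wrongly reported as dead.
  static List<String> deadReplicatorsRacy(Registry zk) {
    Set<String> otherRegionServers = zk.liveRegionServers();
    List<String> currentReplicators = zk.currentReplicators();
    return missingFrom(otherRegionServers, currentReplicators);
  }

  // Suggested order: refresh the live servers AFTER reading the replicators.
  static List<String> deadReplicatorsRefreshed(Registry zk) {
    List<String> currentReplicators = zk.currentReplicators();
    Set<String> otherRegionServers = zk.liveRegionServers();
    return missingFrom(otherRegionServers, currentReplicators);
  }

  private static List<String> missingFrom(Set<String> live, List<String> replicators) {
    List<String> dead = new ArrayList<>();
    for (String rs : replicators) {
      if (!live.contains(rs)) {
        dead.add(rs);
      }
    }
    return dead;
  }

  // Simulated registry in which "rs2" registers at the moment the replicator
  // list is read, i.e. after a racy caller has already taken its snapshot.
  static Registry lateJoiner() {
    final Set<String> live = new HashSet<>(Arrays.asList("rs1"));
    return new Registry() {
      @Override public Set<String> liveRegionServers() {
        return new HashSet<>(live); // snapshot copy
      }
      @Override public List<String> currentReplicators() {
        live.add("rs2");
        return Arrays.asList("rs1", "rs2");
      }
    };
  }

  public static void main(String[] args) {
    System.out.println("racy:      " + deadReplicatorsRacy(lateJoiner()));
    System.out.println("refreshed: " + deadReplicatorsRefreshed(lateJoiner()));
  }
}
```

In the simulation the racy order flags the live server `rs2` for failover, while the refreshed order flags nothing.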

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418808#comment-13418808
 ] 

Lars Hofhansl edited comment on HBASE-6389 at 7/19/12 11:47 PM:


@Aditya: I do agree. (see my comment about how I'm sure the logic of this 
change is correct).

It now seems, though, that it is the default timeout that is too short (4.5s).
Folks with 5k regions should know to increase the minToStart parameter and the 
timeout. We should document that better.
I can also see changing the timeout into a failure condition (as discussed above).

I'm not opposed. It's just that 0.94.1 needs to go out because of HBASE-6311, and I 
do not want to risk delaying it further. It also seems this could use further 
discussion.
(Sometimes it is amazing how much discussion a two-line change can cause :) )

@Ted and @Stack: What do you guys think? 

Edit: Spelling.

  was (Author: lhofhansl):
@Aditya: I do agree. (see my comment about how I'm sure the logic of this 
change is correct).

It now seems, though, that it is the default timeout that is too short (4.5s).
Folks with 5k regions should know to increase the minToStart parameter and the 
timeout. We should document that better.
I can also see to change the timeout to failure condition (as discussed above).

I'm not opposed. It's just that 0.94.1 needs to go out because of HBASE-6311, I 
do not want to risk delaying this further. It also seems this can use further 
discussion.
(Sometimes it is amazing how much discussion as two change can cause :) )

@Ted and @Stack: What do you guys think? 

  

[jira] [Resolved] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-07-19 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-5966.


Resolution: Fixed

Integrated into 0.94. Thanks to Greg for the patch and Lars for the review.

> MapReduce based tests broken on Hadoop 2.0.0-alpha
> --
>
> Key: HBASE-5966
> URL: https://issues.apache.org/jira/browse/HBASE-5966
> Project: HBase
> Issue Type: Bug
> Components: mapred, mapreduce, test
> Affects Versions: 0.94.0, 0.96.0
> Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, 
> Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
> Reporter: Andrew Purtell
> Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, 
> HBASE-5966.patch, hbase-5966.patch
>
>
> Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test 
> rigging. Below is a representative error, which can be easily reproduced with:
> {noformat}
> mvn -PlocalTests -Psecurity \
>   -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
>   clean test \
>   -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> {noformat}
> And the result:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> ---
> Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce)  Time elapsed: 21.935 sec  <<< ERROR!
> java.lang.reflect.UndeclaredThrowableException
>   at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>   at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
>   at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
>   at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
>   at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
>   at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:18)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at org.junit.internal.runners.statements.RunBe

[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418808#comment-13418808
 ] 

Lars Hofhansl commented on HBASE-6389:
--

@Aditya: I do agree. (see my comment about how I'm sure the logic of this 
change is correct).

It now seems, though, that it is the default timeout that is too short (4.5s).
Folks with 5k regions should know to increase the minToStart parameter and the 
timeout. We should document that better.
I can also see changing the timeout into a failure condition (as discussed above).

I'm not opposed. It's just that 0.94.1 needs to go out because of HBASE-6311, and I 
do not want to risk delaying it further. It also seems this could use further 
discussion.
(Sometimes it is amazing how much discussion a two-line change can cause :) )

@Ted and @Stack: What do you guys think? 
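
To make the behaviour under discussion concrete, here is a small sketch of the wait predicate from the issue description quoted below, next to a variant in which the timeout no longer ends the wait on its own. It is an illustration only: the class and method names are invented, the second predicate is one reading of the proposed Javadoc rather than the attached patch, and the sample values other than the 4.5s timeout are arbitrary.

```java
final class WaitPredicates {
  // Condition quoted in the issue (trunk rev 1360470): the loop exits as soon
  // as the timeout elapses, even when fewer than minToStart servers reported.
  static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  // Variant: the wait ends only when the master stops, maxToStart is reached,
  // or ALL of these hold: timeout elapsed, minToStart reached, and no new
  // server checked in during the last interval.
  static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (slept < timeout
            || lastCountChange + interval > now
            || count < minToStart);
  }

  public static void main(String[] args) {
    // Timeout of 4500 ms already elapsed; only 1 of 3 required servers is in.
    boolean current = keepWaitingCurrent(
        false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000);
    boolean proposed = keepWaitingProposed(
        false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000);
    System.out.println("current keeps waiting:  " + current);
    System.out.println("proposed keeps waiting: " + proposed);
  }
}
```

With these inputs the current predicate stops waiting and the variant keeps waiting, which is the difference the thread is about. The variant can wait indefinitely if minToStart is never reached, which is where turning the timeout into a failure condition would come in.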


> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.94.0, 0.96.0
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> the default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior changed from 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, the Master proceeds immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> been reached.
> Reading the current conditions of waitForRegionServers() makes this clear:
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *   there have been no new region server in for
>    *      'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> ..
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes than required have checked in with the Master, and 
> the master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, especially 
> now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> I

[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418803#comment-13418803
 ] 

Hudson commented on HBASE-6325:
---

Integrated in HBase-0.94 #343 (See 
[https://builds.apache.org/job/HBase-0.94/343/])
HBASE-6319  ReplicationSource can call terminate on itself and deadlock
HBASE-6325  [replication] Race in ReplicationSourceManager.init can initiate a 
failover even if the node is alive (Revision 1363570)

 Result = SUCCESS
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java







[jira] [Reopened] (HBASE-6276) TestClassLoading is racy

2012-07-19 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reopened HBASE-6276:
---

  Assignee: (was: Andrew Purtell)

> TestClassLoading is racy
> 
>
> Key: HBASE-6276
> URL: https://issues.apache.org/jira/browse/HBASE-6276
> Project: HBase
> Issue Type: Bug
> Components: coprocessors, test
> Affects Versions: 0.92.2, 0.96.0, 0.94.1
> Reporter: Andrew Purtell
> Priority: Minor
> Attachments: HBASE-6276-0.94.patch, HBASE-6276.patch
>
>






[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418802#comment-13418802
 ] 

Hudson commented on HBASE-6319:
---

Integrated in HBase-0.94 #343 (See 
[https://builds.apache.org/job/HBase-0.94/343/])
HBASE-6319  ReplicationSource can call terminate on itself and deadlock
HBASE-6325  [replication] Race in ReplicationSourceManager.init can initiate a 
failover even if the node is alive (Revision 1363570)

 Result = SUCCESS
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


> ReplicationSource can call terminate on itself and deadlock
> ---
>
> Key: HBASE-6319
> URL: https://issues.apache.org/jira/browse/HBASE-6319
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.6, 0.92.1, 0.94.0
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.94.1
>
> Attachments: HBASE-6319-0.92.patch
>
>
> In a few places the ReplicationSource code calls terminate on itself, which 
> is a problem since in terminate() we wait on that thread to die.
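
The hazard is general: a thread that waits for its own death never returns. The sketch below shows one common guard, skipping the join when terminate() runs on the worker's own thread. The `Worker` class and helper names are hypothetical, and this is not claimed to be how the attached patch resolves the issue.

```java
final class SelfTerminateDemo {
  static final class Worker extends Thread {
    private volatile boolean running = true;

    @Override public void run() {
      // Simulate an error path in which the worker decides to shut itself down.
      terminate();
    }

    // Stops the worker. When called from another thread, waits for the worker
    // to die; when called from the worker itself, skips the join, because a
    // thread waiting for its own death would never return.
    void terminate() {
      running = false;
      if (Thread.currentThread() == this) {
        return;
      }
      try {
        join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
      }
    }

    boolean isRunning() {
      return running;
    }
  }

  // Starts a worker that terminates itself and reports whether it exited
  // within the given number of milliseconds.
  static boolean selfTerminateCompletes(long millis) {
    Worker w = new Worker();
    w.start();
    try {
      w.join(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !w.isAlive() && !w.isRunning();
  }

  public static void main(String[] args) {
    System.out.println("worker exited: " + selfTerminateCompletes(2000));
  }
}
```

Without the `Thread.currentThread() == this` check, the call to `join()` inside `run()` would block the worker forever.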





[jira] [Commented] (HBASE-4956) Control direct memory buffer consumption by HBaseClient

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418797#comment-13418797
 ] 

Hudson commented on HBASE-4956:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #100 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/100/])
HBASE-4956 Control direct memory buffer consumption by HBaseClient (Bob 
Copeland) (Revision 1363526)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Result.java


> Control direct memory buffer consumption by HBaseClient
> ---
>
> Key: HBASE-4956
> URL: https://issues.apache.org/jira/browse/HBASE-4956
> Project: HBase
> Issue Type: New Feature
> Reporter: Ted Yu
> Assignee: Bob Copeland
> Fix For: 0.96.0, 0.94.1
>
> Attachments: 4956.txt, thread_get.rb
>
>
> As Jonathan explained here 
> https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1
> , the standard HBase client inadvertently consumes a large amount of direct memory.
> We should consider using netty for NIO-related tasks.
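
One general way to bound the temporary direct buffers that the JDK's NIO layer may allocate when a heap `ByteBuffer` is written to a channel is to cap the size of each individual write. The sketch below shows only that idea. Whether it applies to the consumption described above is an assumption, it is not taken from the attached patch, and `ChunkedWriter` and its methods are hypothetical names. It also assumes a blocking channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

final class ChunkedWriter {
  // Writes buf to ch so that no single write() call is handed more than
  // chunkSize bytes. Returns the number of bytes written.
  static int writeChunked(WritableByteChannel ch, ByteBuffer buf, int chunkSize)
      throws IOException {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    int total = 0;
    int originalLimit = buf.limit();
    try {
      while (buf.position() < originalLimit) {
        // Narrow the visible window to at most chunkSize bytes.
        buf.limit(Math.min(buf.position() + chunkSize, originalLimit));
        total += ch.write(buf); // advances buf.position()
      }
    } finally {
      buf.limit(originalLimit); // restore the caller's view of the buffer
    }
    return total;
  }

  // Demo helper: pushes data through writeChunked into an in-memory channel
  // and returns what arrived there.
  static byte[] roundTrip(byte[] data, int chunkSize) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (WritableByteChannel ch = Channels.newChannel(sink)) {
      writeChunked(ch, ByteBuffer.wrap(data), chunkSize);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    byte[] data = new byte[1000];
    System.out.println("bytes copied: " + roundTrip(data, 64).length);
  }
}
```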





[jira] [Commented] (HBASE-6312) Make BlockCache eviction thresholds configurable

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418798#comment-13418798
 ] 

Hudson commented on HBASE-6312:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #100 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/100/])
HBASE-6312 Make BlockCache eviction thresholds configurable (Jie Huang) 
(Revision 1363468)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java


> Make BlockCache eviction thresholds configurable
> 
>
> Key: HBASE-6312
> URL: https://issues.apache.org/jira/browse/HBASE-6312
> Project: HBase
> Issue Type: Improvement
> Components: io
> Affects Versions: 0.94.0
> Reporter: Jie Huang
> Assignee: Jie Huang
> Priority: Minor
> Fix For: 0.96.0
>
> Attachments: hbase-6312.patch, hbase-6312_v2.patch, 
> hbase-6312_v3.patch
>
>
> Some of our customers found that tuning the BlockCache eviction thresholds 
> made test results different in their test environment. However, those 
> thresholds are not configurable in the current implementation. The only way 
> to change those values is to re-compile the HBase source code. We wonder if 
> it is possible to make them configurable.
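
As a rough illustration of what "configurable" could look like, the sketch below reads two eviction factors from a configuration object and falls back to built-in defaults. It uses `java.util.Properties` instead of the Hadoop `Configuration` class to stay self-contained. The property keys, default values, and class name are placeholders chosen for this example, not the names introduced by the patch.

```java
import java.util.Properties;

final class EvictionThresholds {
  // Placeholder keys and defaults for illustration only.
  static final String MIN_FACTOR_KEY = "example.blockcache.min.factor";
  static final String ACCEPTABLE_FACTOR_KEY = "example.blockcache.acceptable.factor";
  static final float DEFAULT_MIN_FACTOR = 0.75f;
  static final float DEFAULT_ACCEPTABLE_FACTOR = 0.85f;

  final float minFactor;        // eviction frees space down to this fraction
  final float acceptableFactor; // eviction starts above this fraction

  EvictionThresholds(Properties conf) {
    minFactor = Float.parseFloat(
        conf.getProperty(MIN_FACTOR_KEY, String.valueOf(DEFAULT_MIN_FACTOR)));
    acceptableFactor = Float.parseFloat(
        conf.getProperty(ACCEPTABLE_FACTOR_KEY,
            String.valueOf(DEFAULT_ACCEPTABLE_FACTOR)));
    // Reject combinations that could never trigger or finish an eviction.
    if (!(minFactor > 0f && minFactor <= acceptableFactor && acceptableFactor <= 1f)) {
      throw new IllegalArgumentException(
          "expected 0 < min <= acceptable <= 1, got min=" + minFactor
              + " acceptable=" + acceptableFactor);
    }
  }

  // Cache size (in bytes) above which eviction would start.
  long acceptableSize(long maxSize) {
    return (long) Math.floor(maxSize * acceptableFactor);
  }

  // Cache size (in bytes) that an eviction run would aim to get back to.
  long minSize(long maxSize) {
    return (long) Math.floor(maxSize * minFactor);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(MIN_FACTOR_KEY, "0.5");
    conf.setProperty(ACCEPTABLE_FACTOR_KEY, "0.75");
    EvictionThresholds t = new EvictionThresholds(conf);
    System.out.println(t.minSize(1000) + " .. " + t.acceptableSize(1000));
  }
}
```

Validating the pair at construction time means a bad configuration fails at startup rather than showing up later as odd cache behaviour.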





[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418799#comment-13418799
 ] 

Hudson commented on HBASE-6325:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #100 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/100/])
HBASE-6325  [replication] Race in ReplicationSourceManager.init can 
initiate a failover even if the node is alive (Revision 1363573)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java







[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-07-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418793#comment-13418793
 ] 

Lars Hofhansl commented on HBASE-5966:
--

+1


[jira] [Commented] (HBASE-3432) [hbck] Add "remove table" switch

2012-07-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418792#comment-13418792
 ] 

Jonathan Hsieh commented on HBASE-3432:
---

[juneng603] eventually, after region assignments are completed and the region 
is opened on the target RS, information is updated in the META table so that 
other clients can go to the proper RS.

> [hbck] Add "remove table" switch
> 
>
> Key: HBASE-3432
> URL: https://issues.apache.org/jira/browse/HBASE-3432
> Project: HBase
>  Issue Type: New Feature
>  Components: util
>Affects Versions: 0.89.20100924
>Reporter: Lars George
>Priority: Minor
>
> This happened before and I am not sure how the new Master improves on it 
> (this stuff is only available between the lines or buried in some people's 
> heads - one other thing I wish for is a better place to communicate what 
> each patch improves). Just so we do not miss it, there is an issue that 
> sometimes disabling large tables simply times out and the table gets stuck in 
> limbo. 
> From the CDH User list:
> {quote}
> On Fri, Jan 7, 2011 at 1:57 PM, Sean Sechrist  wrote:
> To get them out of META, you can just scan '.META.' for that table name, and 
> delete those rows. We had to do that a few months ago.
> -Sean
> That did it.  For the benefit of others, here's code.  Beware the literal 
> table names, run at your own peril.
> {quote}
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.MetaScanner;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class CleanFromMeta {
>   // Visits every .META. row and deletes those belonging to "webtable".
>   public static class Cleaner implements MetaScanner.MetaScannerVisitor {
>     public HTable meta = null;
>
>     public Cleaner(Configuration conf) throws IOException {
>       meta = new HTable(conf, ".META.");
>     }
>
>     public boolean processRow(Result rowResult) throws IOException {
>       String r = Bytes.toString(rowResult.getRow());
>       // The trailing comma keeps tables that merely share the prefix untouched.
>       if (r.startsWith("webtable,")) {
>         meta.delete(new Delete(rowResult.getRow()));
>         System.out.println("Deleting row " + rowResult);
>       }
>       return true; // keep scanning
>     }
>   }
>
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     MetaScanner.metaScan(conf, new Cleaner(conf),
>         Bytes.toBytes("webtable"));
>   }
> }
> {code}
> I suggest moving this into HBaseFsck. Personally, I do not like having these 
> JRuby scripts floating around that may or may not help. This should be 
> available if a user gets stuck and knows what they are doing (they can delete 
> from .META. anyway). Maybe a "--disable-table  --force" or 
> so? But since disable is already in the shell we could add a "--force" 
> there? Or add a "--delete-table " to hbck?
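
As a hedged aside on the quoted snippet: it selects rows with startsWith("webtable,"), and the trailing comma matters because .META. row keys begin with the table name followed by a comma. A minimal sketch of that prefix test, with the table name passed in rather than hard-coded, might look like the following. MetaRowFilter and belongsToTable are hypothetical names used only for illustration and are not part of HBase; the sample row keys are made up.

```java
/** Illustrative helper; not an HBase class. */
final class MetaRowFilter {
    private MetaRowFilter() {}

    /**
     * Returns true if a .META. row key belongs to the given table.
     * Assumes row keys of the form "<table>,<startKey>,<regionId>".
     */
    static boolean belongsToTable(String metaRowKey, String tableName) {
        // The trailing comma keeps "webtable" from matching "webtable2,...".
        return metaRowKey.startsWith(tableName + ",");
    }

    public static void main(String[] args) {
        // Made-up row keys for demonstration only.
        System.out.println(belongsToTable("webtable,,1294437000000", "webtable"));
        System.out.println(belongsToTable("webtable2,,1294437000000", "webtable"));
    }
}
```

Running main prints true for the first key and false for the second, which is the distinction a plain startsWith("webtable") would miss.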

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3432) [hbck] Add "remove table" switch

2012-07-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418790#comment-13418790
 ] 

Jonathan Hsieh commented on HBASE-3432:
---

[~vamshi] root and meta are special regions but regions nonetheless. They get 
assigned to arbitrary (possibly different) region servers, and are hit on every 
new client's read and write path.  
 
[~juneng603] /hbase/unassigned is where regions-in-transition information is 
kept.  These znodes are modified as regions are being assigned to particular 
region servers.  They coordinate the state between the assigning master and 
the RS assignee.





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418789#comment-13418789
 ] 

Aditya Kishore commented on HBASE-6389:
---

My vote was for its inclusion for 2 reasons.

# This was a behavior change in 0.94.0 and I am not sure we have completely 
understood its impact.
# In a large MSLAB-enabled cluster, I have repeatedly seen all the regions (in 
excess of 5K, with Σ_{i=1..n}(R_i CF_i) > 8K; with MSLAB on, the RS needs > 
16G just to open them) being assigned to a single region server, causing it to 
crash with an OOM and creating quite a few HBCK inconsistencies on subsequent 
recovery.

Lastly, so far all the test failures seem to be due to errors in the test code 
unmasked by this change.


[jira] [Commented] (HBASE-6310) -ROOT- corruption when .META. is using the old encoding scheme

2012-07-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418777#comment-13418777
 ] 

Jonathan Hsieh commented on HBASE-6310:
---

hbck writes directly to .META. but I don't think it ever writes to root unless 
you put the -metaonly flag on.  

It may be possible that if there were two .META. region dirs, hbck tried to 
pull in the old .META. dir.  This would probably write something goofy to 
.META. though.  If you just used the -repair option, it would have first tried 
to merge regions before modifying meta (but also would likely not have 
modified ROOT).

> -ROOT- corruption when .META. is using the old encoding scheme
> --
>
> Key: HBASE-6310
> URL: https://issues.apache.org/jira/browse/HBASE-6310
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.94.0
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.96.0, 0.94.2
>
>
> We're still working on the root cause here, but after the leap second 
> armageddon we had a hard time getting our 0.94 cluster back up. This is what 
> we saw in the logs until the master died by itself:
> {noformat}
> 2012-07-01 23:01:52,149 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> locateRegionInMeta parentTable=-ROOT-,
> metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28,
> port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000
> because: HRegionInfo was null or empty in -ROOT-,
> row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0,
> .META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0}
> {noformat}
> (it's strange that we retry this)
> This was really misleading because I could see the regioninfo in a scan:
> {noformat}
> hbase(main):002:0> scan '-ROOT-'
> ROW                     COLUMN+CELL
>  .META.,,1              column=info:regioninfo, timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
>  .META.,,1              column=info:server, timestamp=1341183448693, value=sfor3s40:10304
>  .META.,,1              column=info:serverstartcode, timestamp=1341183448693, value=1341183444689
>  .META.,,1              column=info:v, timestamp=1331755419291, value=\x00\x00
>  .META.,,1259448304806  column=info:server, timestamp=1341124914705, value=sfor3s24:10304
>  .META.,,1259448304806  column=info:serverstartcode, timestamp=1341124914705, value=1341124455863
> {noformat}
> Except that the devil is in the details, ".META.,,1" is not 
> ".META.,,1259448304806". Basically something writes to .META. by directly 
> creating the row key without caring if the row is in the old format. I did a 
> deleteall in the shell and it fixed the issue... until some time later it was 
> stuck again because the edits reappeared (still not sure why). This time the 
> PostOpenDeployTasksThread threads were stuck in the RS trying to update 
> .META., but there was no logging (saw it with a jstack). I deleted the row 
> again to make it work.
> I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1 
> out, but I wouldn't recommend upgrading to 0.94 if your cluster was created 
> before 0.89.
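
The mismatch described above (the row ".META.,,1259448304806" carries info:server and info:serverstartcode but no info:regioninfo) can be spotted mechanically. The following is only a sketch under stated assumptions: it works on plain (row, column) string pairs copied from the quoted scan rather than on HBase client types, and RootRowCheck and rowsMissingRegionInfo are hypothetical names, not HBase or hbck APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative check; operates on plain strings, not on HBase client types. */
final class RootRowCheck {
    private RootRowCheck() {}

    /**
     * Given (rowKey, column) pairs as they appear in a scan, returns the row
     * keys that never carry an info:regioninfo column, in first-seen order.
     */
    static List<String> rowsMissingRegionInfo(String[][] cells) {
        // Group the column names by row key, preserving scan order.
        Map<String, Set<String>> columnsByRow = new LinkedHashMap<>();
        for (String[] cell : cells) {
            columnsByRow.computeIfAbsent(cell[0], k -> new LinkedHashSet<>()).add(cell[1]);
        }
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : columnsByRow.entrySet()) {
            if (!e.getValue().contains("info:regioninfo")) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Row/column pairs taken from the -ROOT- scan quoted in the issue.
        String[][] cells = {
            {".META.,,1", "info:regioninfo"},
            {".META.,,1", "info:server"},
            {".META.,,1", "info:serverstartcode"},
            {".META.,,1", "info:v"},
            {".META.,,1259448304806", "info:server"},
            {".META.,,1259448304806", "info:serverstartcode"},
        };
        System.out.println(rowsMissingRegionInfo(cells));
    }
}
```

With the cells from the quoted scan, only ".META.,,1259448304806" is reported, which is the row the "HRegionInfo was null or empty" message complains about.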





[jira] [Commented] (HBASE-6417) hbck merges .META. regions if there's an old leftover

2012-07-19 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418774#comment-13418774
 ] 

Jean-Daniel Cryans commented on HBASE-6417:
---

No, but I can reproduce.

> hbck merges .META. regions if there's an old leftover
> -
>
> Key: HBASE-6417
> URL: https://issues.apache.org/jira/browse/HBASE-6417
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Daniel Cryans
> Fix For: 0.96.0, 0.94.2
>
> Attachments: hbck.log
>
>
> Trying to see what caused HBASE-6310, one of the things I figured is that the 
> bad .META. row is actually one from the time that we were permitting meta 
> splitting and that folder had just been staying there for a while.
> So I tried to recreate the issue with -repair and it merged my good .META. 
> region with the one that's 3 years old that also has the same start key. I 
> ended up with a brand new .META. region!
> I'll be attaching the full log in a separate file.





[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-07-19 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418772#comment-13418772
 ] 

Jimmy Xiang commented on HBASE-5966:


looks good to me, will commit to 0.94 tonight if no objection.

> MapReduce based tests broken on Hadoop 2.0.0-alpha
> --
>
> Key: HBASE-5966
> URL: https://issues.apache.org/jira/browse/HBASE-5966
> Project: HBase
>  Issue Type: Bug
>  Components: mapred, mapreduce, test
>Affects Versions: 0.94.0, 0.96.0
> Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, 
> Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
>Reporter: Andrew Purtell
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, 
> HBASE-5966.patch, hbase-5966.patch
>
>
> Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test 
> rigging. Below is a representative error, which can be easily reproduced with:
> {noformat}
> mvn -PlocalTests -Psecurity \
>   -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
>   clean test \
>   -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> {noformat}
> And the result:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec 
> <<< FAILURE!
> ---
> Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec 
> <<< FAILURE!
> testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce)  
> Time elapsed: 21.935 sec  <<< ERROR!
> java.lang.reflect.UndeclaredThrowableException
>   at 
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
>   at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:18)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at 
> org.junit.inter

[jira] [Commented] (HBASE-6417) hbck merges .META. regions if there's an old leftover

2012-07-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418769#comment-13418769
 ] 

Jonathan Hsieh commented on HBASE-6417:
---

Did you keep a copy of the hbck details before you ran the -repair option?  





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418771#comment-13418771
 ] 

Lars Hofhansl commented on HBASE-6389:
--

I'd like to leave this with 0.94.2, unless you think this must go into 0.94.1.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been 
> reached.
> Reading the current conditions of waitForRegionServers() clarifies it:
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> /**
>  * Wait for the region servers to report in.
>  * We will wait until one of this condition is met:
>  *  - the master is stopped
>  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>  *    region servers is reached
>  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>  *    there have been no new region server in for
>  *    'hbase.master.wait.on.regionservers.interval' time
>  *
>  * @throws InterruptedException
>  */
> public void waitForRegionServers(MonitoredTask status)
>     throws InterruptedException {
> ..
>   while (
>     !this.master.isStopped() &&
>       slept < timeout &&
>       count < maxToStart &&
>       (lastCountChange+interval > now || count < minToStart)
>     ){
> ..
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes have checked in with the Master, and the 
> master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, 
> especially now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time AND
>    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>         getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>         getInt("hbase.master.wait.on.regionservers.maxtostart",
>             Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept ||
>           count < minToStart)
>       ){
> ..
> {code}
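
To make the difference between the two quoted loop conditions concrete, here is a small self-contained sketch that evaluates both as pure boolean functions. It is not the ServerManager code: WaitConditionSketch and its method names are hypothetical, the variables are passed in as plain parameters, and the numbers in main are made up to model "timeout elapsed, only 1 of 3 required region servers checked in".

```java
/** Illustrative model of the two loop conditions; not the ServerManager code itself. */
final class WaitConditionSketch {
    private WaitConditionSketch() {}

    /** Loop condition as quoted from trunk: the timeout alone ends the wait. */
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    /** Proposed loop condition: the timeout no longer overrides minToStart. */
    static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Made-up scenario: 5000 ms slept against a 4500 ms timeout, one region
        // server checked in, three required, no check-in for longer than interval.
        boolean current = keepWaitingCurrent(false, 5000, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 6000);
        boolean proposed = keepWaitingProposed(false, 5000, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 6000);
        System.out.println("current keeps waiting:  " + current);
        System.out.println("proposed keeps waiting: " + proposed);
    }
}
```

In this scenario the current condition evaluates to false (the master stops waiting and assigns regions to the single server), while the proposed condition evaluates to true because count is still below minToStart.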


   

[jira] [Commented] (HBASE-6393) Decouple audit event creation from storage in AccessController

2012-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418768#comment-13418768
 ] 

Hadoop QA commented on HBASE-6393:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12537256/hbase-6393-v1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 15 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2414//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2414//console

This message is automatically generated.

> Decouple audit event creation from storage in AccessController
> --
>
> Key: HBASE-6393
> URL: https://issues.apache.org/jira/browse/HBASE-6393
> Project: HBase
>  Issue Type: Brainstorming
>  Components: security
>Affects Versions: 0.96.0
>Reporter: Marcelo Vanzin
> Attachments: hbase-6393-v1.patch
>
>
> Currently, AccessController takes care of both generating audit events (by 
> performing access checks) and storing them (by creating a log message and 
> writing it to the AUDITLOG logger).
> This makes the logging system the only way to catch audit events. It means 
> that if someone wants to do something fancier (like writing these records to 
> a database somewhere), they need to hack through the logging system, and 
> parse the messages generated by AccessController, which is not optimal.
> The attached patch decouples generation and storage by introducing a new 
> interface, used by AccessController, to log the audit events. The current, 
> log-based storage is kept in place so that current users won't be affected by 
> the change.
> I'm filing this as an RFC at this point, so the patch is not totally clean; 
> it's on top of HBase 0.92 (which is easier for me to test) and doesn't have 
> any unit tests, for starters. But the changes should be very similar on trunk 
> - I don't remember changes in this particular area of the code between those 
> versions.
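
The decoupling described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the attached patch: AuditEvent, AuditSink, LoggingAuditSink, InMemoryAuditSink and AccessCheckSketch are hypothetical names invented for illustration, and java.util.logging stands in for whatever logger the real AUDITLOG uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical audit event; the fields are illustrative only. */
final class AuditEvent {
    final String user;
    final String action;
    final boolean allowed;

    AuditEvent(String user, String action, boolean allowed) {
        this.user = user;
        this.action = action;
        this.allowed = allowed;
    }
}

/** Hypothetical storage interface: the access check only knows about this. */
interface AuditSink {
    void record(AuditEvent event);
}

/** Log-based storage, standing in for the existing behavior. */
final class LoggingAuditSink implements AuditSink {
    private static final Logger LOG = Logger.getLogger("AUDITLOG");

    @Override
    public void record(AuditEvent event) {
        LOG.info((event.allowed ? "allowed" : "denied")
            + " user=" + event.user + " action=" + event.action);
    }
}

/** Alternative storage that keeps structured events, e.g. for a database writer. */
final class InMemoryAuditSink implements AuditSink {
    private final List<AuditEvent> events = new ArrayList<>();

    @Override
    public void record(AuditEvent event) {
        events.add(event);
    }

    List<AuditEvent> events() {
        return events;
    }
}

/** Generates audit events and hands them to whichever sink was configured. */
final class AccessCheckSketch {
    private final AuditSink sink;

    AccessCheckSketch(AuditSink sink) {
        this.sink = sink;
    }

    boolean check(String user, String action, boolean permitted) {
        sink.record(new AuditEvent(user, action, permitted));
        return permitted;
    }

    public static void main(String[] args) {
        InMemoryAuditSink sink = new InMemoryAuditSink();
        new AccessCheckSketch(sink).check("alice", "scan", false);
        System.out.println(sink.events().size());
    }
}
```

The point of the design is that a consumer wanting structured records swaps in its own AuditSink instead of parsing log messages, while the logging sink keeps current users unaffected.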





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418767#comment-13418767
 ] 

Hadoop QA commented on HBASE-6389:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537258/org.apache.hadoop.hbase.TestZooKeeper-output.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 10 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2415//console

This message is automatically generated.


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418766#comment-13418766
 ] 

stack commented on HBASE-6389:
--

@Aditya Makes sense.  You got what you needed from Ted?  Let us know.  Thanks.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior changed from 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> been reached.
> Reading the current conditions of waitForRegionServers() clarifies it:
> {code:title=ServerManager.java (trunk rev:1360470)}
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> ..
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
> ..
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes than required have checked in with the Master, 
> and the master will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, 
> especially now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time AND
>    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>         getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>         getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}
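
The difference between the two loop conditions quoted above can be sketched in isolation. This is a minimal model and not the ServerManager code: the class WaitConditions and its method names are hypothetical, and the numbers used in main (a 4500 ms timeout, a 1500 ms interval, 3 required region servers) are illustrative assumptions.

```java
/** Hypothetical stand-alone model of the two while-conditions; not HBase code. */
final class WaitConditions {
    private WaitConditions() {}

    /** Condition quoted from trunk: an elapsed timeout ends the wait on its own. */
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    /** Proposed condition: the timeout moves inside the OR, so it cannot end the wait alone. */
    static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Assumed scenario: the timeout has just elapsed and only 1 of the
        // 3 required region servers has checked in.
        long timeout = 4500, slept = 4500, interval = 1500, lastCountChange = 0, now = 4500;
        int count = 1, minToStart = 3, maxToStart = Integer.MAX_VALUE;

        System.out.println("current keeps waiting:  " + keepWaitingCurrent(
            false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
        System.out.println("proposed keeps waiting: " + keepWaitingProposed(
            false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
    }
}
```

With these inputs the current condition stops waiting once the timeout has elapsed, while the proposed one keeps waiting until the mintostart count is met.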





[jira] [Updated] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-07-19 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-5966:
--

Attachment: HBASE-5966-94.patch

Attached patch for 0.94.  Ran TestTableMapReduce against both 1.0 and 2.0 
hadoop profiles, both passed:


mvn test -PlocalTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

---
 T E S T S
---
Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 188.087 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

mvn test -PlocalTests -Dhadoop.profile=2.0 -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

---
 T E S T S
---
Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 167.49 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0


> MapReduce based tests broken on Hadoop 2.0.0-alpha
> --
>
> Key: HBASE-5966
> URL: https://issues.apache.org/jira/browse/HBASE-5966
> Project: HBase
>  Issue Type: Bug
>  Components: mapred, mapreduce, test
>Affects Versions: 0.94.0, 0.96.0
> Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, 
> Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
>Reporter: Andrew Purtell
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, 
> HBASE-5966.patch, hbase-5966.patch
>
>
> Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test 
> rigging. Below is a representative error, which can be easily reproduced with:
> {noformat}
> mvn -PlocalTests -Psecurity \
>   -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
>   clean test \
>   -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> {noformat}
> And the result:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> ---
> Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce)  Time elapsed: 21.935 sec  <<< ERROR!
> java.lang.reflect.UndeclaredThrowableException
>   at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>   at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
>   at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
>   at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
>   at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
>   at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at org.junit.internal.run

[jira] [Resolved] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock

2012-07-19 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-6319.
---

   Resolution: Fixed
Fix Version/s: (was: 0.90.8)
 Hadoop Flags: Reviewed

Committed to 0.92 and 0.94, skipping 0.90 like HBASE-6325. Trunk was already 
fixed.

> ReplicationSource can call terminate on itself and deadlock
> ---
>
> Key: HBASE-6319
> URL: https://issues.apache.org/jira/browse/HBASE-6319
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.94.1
>
> Attachments: HBASE-6319-0.92.patch
>
>
> In a few places the ReplicationSource code calls terminate on itself, which 
> is a problem since in terminate() we wait on that thread to die.
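
The hazard described above (a thread waiting for its own death) can be sketched as follows. This is a minimal illustration, not the ReplicationSource code or the committed fix: the class SelfTerminatingWorker and its methods are hypothetical. The guard shown, skipping the join when terminate() is invoked from the worker's own thread, is one common way to avoid the self-wait and is an assumption about a reasonable remedy.

```java
/** Hypothetical worker thread; not the real ReplicationSource. */
final class SelfTerminatingWorker extends Thread {
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            // Simulate an unrecoverable condition detected inside the worker itself.
            terminate("giving up from inside run()");
        }
    }

    /** Stops the worker and waits for it only when called from another thread. */
    void terminate(String reason) {
        running = false;
        // Without this check, join() would be called on the current thread,
        // which waits for itself to die and therefore never returns.
        if (Thread.currentThread() != this) {
            try {
                join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Starts a worker and reports whether it exited within a bounded wait. */
    static boolean exitsCleanly() {
        SelfTerminatingWorker worker = new SelfTerminatingWorker();
        worker.start();
        try {
            worker.join(5000); // bounded wait so the demo itself cannot hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker exited cleanly: " + exitsCleanly());
    }
}
```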





[jira] [Commented] (HBASE-4470) ServerNotRunningException coming out of assignRootAndMeta kills the Master

2012-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418753#comment-13418753
 ] 

Hadoop QA commented on HBASE-4470:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537246/HBASE-4470-v2-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 12 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2413//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2413//console

This message is automatically generated.

> ServerNotRunningException coming out of assignRootAndMeta kills the Master
> --
>
> Key: HBASE-4470
> URL: https://issues.apache.org/jira/browse/HBASE-4470
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
>Assignee: Gregory Chanan
>Priority: Critical
> Fix For: 0.90.7
>
> Attachments: HBASE-4470-90.patch, HBASE-4470-v2-90.patch, 
> HBASE-4470-v2-92_94.patch, HBASE-4470-v2-trunk.patch
>
>
> I'm surprised we still have issues like that, and I didn't get a hit while 
> googling, so forgive me if there's already a jira about it.
> When the master starts it verifies the locations of root and meta before 
> assigning them; if the server is started but not running you'll get this:
> {quote}
> 2011-09-23 04:47:44,859 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: RemoteException connecting to RS
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)
>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>   at $Proxy6.getProtocolVersion(Unknown Source)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969)
>   at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388)
>   at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287)
>   at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484)
>   at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441)
>   at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
> {quote}
> I hit that 3-4 times this week while debugging something else. The worst is 
> that when you restart the master it sees that as a failover, but none of the 
> regions are assigned so it takes an eternity to get back fully online.


[jira] [Resolved] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-19 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-6325.
---

   Resolution: Fixed
Fix Version/s: (was: 0.90.8)
 Hadoop Flags: Reviewed

Committed to 0.92, 0.94 and trunk. Not caring about 0.90 either.

> [replication] Race in ReplicationSourceManager.init can initiate a failover 
> even if the node is alive
> -
>
> Key: HBASE-6325
> URL: https://issues.apache.org/jira/browse/HBASE-6325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch
>
>
> Yet another bug found during the leap second madness: it's possible to miss 
> the registration of new region servers so that in 
> ReplicationSourceManager.init we start the failover of a live and replicating 
> region server. I don't think there's data loss, but the RS that's being failed 
> over will die on:
> {noformat}
> 2012-07-01 06:25:15,604 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sv4r23s48,10304,1341112194623: Writing replication status
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
>   at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
>   at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
> {noformat}
> It seems to me that just refreshing {{otherRegionServers}} after getting the 
> list of {{currentReplicators}} would be enough to fix this.
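
The ordering problem described above can be sketched with plain collections. This is a hedged illustration, not the ReplicationSourceManager code: the class ReplicatorFailoverCheck, the method findDead, and the server names are all hypothetical. It assumes that a replicator is considered dead when it is absent from the set of known live region servers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the stale-snapshot race; not HBase code. */
final class ReplicatorFailoverCheck {
    private ReplicatorFailoverCheck() {}

    /** Replicators that have no matching live server are treated as dead. */
    static Set<String> findDead(List<String> currentReplicators, Set<String> liveServers) {
        Set<String> dead = new TreeSet<>(currentReplicators);
        dead.removeAll(liveServers);
        return dead;
    }

    public static void main(String[] args) {
        // Live servers as seen BEFORE "rs3" registers.
        Set<String> liveBefore = new HashSet<>(Arrays.asList("rs1", "rs2"));
        // Replicators listed AFTER "rs3" registered and started replicating.
        List<String> currentReplicators = Arrays.asList("rs1", "rs2", "rs3");
        // Live servers refreshed after listing the replicators.
        Set<String> liveAfter = new HashSet<>(Arrays.asList("rs1", "rs2", "rs3"));

        // Stale view: the live server rs3 is wrongly selected for failover.
        System.out.println("stale view fails over: " + findDead(currentReplicators, liveBefore));
        // Refreshed view: nothing is failed over.
        System.out.println("fresh view fails over: " + findDead(currentReplicators, liveAfter));
    }
}
```

Refreshing the live set after reading the replicator list means any server present in the replicator list at that moment is also visible as live, which is the effect the description asks for.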





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418752#comment-13418752
 ] 

Zhihong Ted Yu commented on HBASE-6389:
---

Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/2406/console, 
there was still a hanging test, although I wasn't able to find which test 
hung.


[jira] [Commented] (HBASE-6430) Few modifications in section 2.4.2.1 of Apache HBase Reference Guide

2012-07-19 Thread Mohammad Tariq Iqbal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418749#comment-13418749
 ] 

Mohammad Tariq Iqbal commented on HBASE-6430:
-

Thanks a lot for the support, stack. I'll go through the link you provided. I 
have made the following changes, in case the attachment was ambiguous (I should 
have listed them beforehand, my bad):
1- Addition of the 'core-site.xml' file to point out how to give the value of 
the 'hbase.rootdir' property so that HMaster can contact the NameNode properly.
2- /etc/hosts file modification to avoid the loopback problem (as proper DNS 
resolution is very important in order to get HBase to work properly).
3- Modification of the hbase-env.sh file to enable the use of HBase's ZooKeeper.
4- Addition of the 'hbase.cluster.distributed' and 
'hbase.zookeeper.property.clientPort' properties in conf/hbase-site.xml.
5- Copying hadoop-core-*.jar and commons-collections-3.2.1.jar from the 
HADOOP_HOME/lib folder into the HBASE_HOME/lib folder to avoid any 
compatibility issues between Hadoop and HBase.

Apologies for my ignorance. Many thanks.

> Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
> 
>
> Key: HBASE-6430
> URL: https://issues.apache.org/jira/browse/HBASE-6430
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mohammad Tariq Iqbal
>Priority: Minor
> Attachments: HBASE-6430.txt
>
>
> Quite often, newbies face some issues while configuring HBase in pseudo 
> distributed mode. I was no exception. I would like to propose some solutions 
> for these problems which worked for me. If the community finds it 
> appropriate, I would like to apply the patch for the same. This is the first 
> time I am trying to do something like this, so please pardon me if I have put 
> it in an inappropriate manner.





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418750#comment-13418750
 ] 

Aditya Kishore commented on HBASE-6389:
---

@Stack

No, the current patch does not modify the way a live RS is evaluated, but it 
ensures that the dying RS's thread is actually dead before moving forward.

{quote}
What is the below change doing?

conf.setInt("hbase.master.wait.on.regionservers.mintostart", numSlaves);
conf.setInt("hbase.master.wait.on.regionservers.maxtostart", numSlaves);
+ String count = String.valueOf(numSlaves);
+ conf.setIfUnset("hbase.master.wait.on.regionservers.mintostart", count);
+ conf.setIfUnset("hbase.master.wait.on.regionservers.maxtostart", count);
{quote}

This change was to preserve the values of 'mintostart' and 'maxtostart' in the 
configuration if the caller of HBaseTestingUtility.startMiniHBaseCluster(int, 
int) has set them (which was the case with 
TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS failure).
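
The effect of switching from an unconditional set to setIfUnset can be sketched with a small stand-in. MiniConf below is a hypothetical map-backed class written for this illustration and is not the Hadoop Configuration class; it only mimics the overwrite versus set-if-absent behaviour discussed above. The helper names startOverwriting and startPreserving are likewise assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map-backed stand-in for a configuration object; not the Hadoop class. */
final class MiniConf {
    static final String MIN = "hbase.master.wait.on.regionservers.mintostart";

    private final Map<String, String> props = new HashMap<>();

    /** Always overwrites any existing value. */
    void setInt(String name, int value) {
        props.put(name, Integer.toString(value));
    }

    /** Writes the value only when the key has no value yet. */
    void setIfUnset(String name, String value) {
        props.putIfAbsent(name, value);
    }

    int getInt(String name, int defaultValue) {
        String value = props.get(name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    /** Models the old behaviour: the caller's value is replaced by numSlaves. */
    static int startOverwriting(MiniConf conf, int numSlaves) {
        conf.setInt(MIN, numSlaves);
        return conf.getInt(MIN, 1);
    }

    /** Models the patched behaviour: a value set by the caller is preserved. */
    static int startPreserving(MiniConf conf, int numSlaves) {
        conf.setIfUnset(MIN, String.valueOf(numSlaves));
        return conf.getInt(MIN, 1);
    }

    public static void main(String[] args) {
        MiniConf overwritten = new MiniConf();
        overwritten.setInt(MIN, 1); // the test sets its own value first
        System.out.println("overwriting: mintostart = " + startOverwriting(overwritten, 3));

        MiniConf preserved = new MiniConf();
        preserved.setInt(MIN, 1); // same caller-provided value
        System.out.println("preserving:  mintostart = " + startPreserving(preserved, 3));
    }
}
```

When the caller has not set the key at all, startPreserving still falls back to numSlaves, so callers that rely on the default see no change.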


[jira] [Commented] (HBASE-5985) TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418747#comment-13418747
 ] 

Hudson commented on HBASE-5985:
---

Integrated in HBase-0.94 #342 (See 
[https://builds.apache.org/job/HBase-0.94/342/])
HBASE-5985 TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0 (Revision 
1363561)

 Result = SUCCESS
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigrationRemovingHTD.java


> TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0
> -
>
> Key: HBASE-5985
> URL: https://issues.apache.org/jira/browse/HBASE-5985
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: hbase-5985.patch
>
>
> ---
> Test set: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.448 sec <<< 
> FAILURE!
> org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD  Time elapsed: 0 
> sec  <<< ERROR!
> java.io.IOException: Failed put; errcode=1
> at 
> org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.doFsCommand(TestMetaMigrationRemovingHTD.java:124)
> at 
> org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.setUpBeforeClass(TestMetaMigrationRemovingHTD.java:80)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> at 
> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6389:
--

Attachment: org.apache.hadoop.hbase.TestZooKeeper-output.txt

Here is the test output from yesterday.

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in assuming that raising 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> the default of 1) would prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior changed in 
> 0.94.0 to address HBASE-4993.
> From 0.94.0 onwards, the Master proceeds as soon as the timeout has 
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> been reached.
> The current conditions in waitForRegionServers() make this clear:
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *   there have been no new region server in for
>    *      'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> ..
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
> ..
> {code}
> So with the current conditions, the wait ends as soon as the timeout is 
> reached, even if fewer than the required number of RSes have checked in with 
> the Master, and the Master proceeds with region assignment among those RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, especially 
> now that MSLAB is turned on.
> To enforce the quorum specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, 
> the conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *   there have been no new region server in for
>    *      'hbase.master.wait.on.regionservers.interval' time AND
>    *   the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}
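
To make the difference between the two loop conditions concrete, here is a minimal sketch that lifts both predicates out of the loop as pure functions. The class and method names are hypothetical and exist only for illustration; the boolean expressions are copied from the two snippets quoted above, and the values in main() are made up.

```java
// Sketch only: not part of ServerManager. Names are hypothetical.
final class WaitConditionSketch {

    // Loop condition as quoted from trunk rev 1360470: once slept >= timeout,
    // the loop exits regardless of how many region servers have checked in.
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    // Loop condition as proposed in the description: the timeout becomes one of
    // three reasons to keep waiting, so count < minToStart alone keeps the loop alive.
    static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Illustrative values: 1 of 3 required servers has checked in,
        // the timeout (4500) has lapsed (slept 5000), and the count has been stable.
        System.out.println("current keeps waiting:  "
            + keepWaitingCurrent(false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000));
        System.out.println("proposed keeps waiting: "
            + keepWaitingProposed(false, 5000, 4500, 1, 3, Integer.MAX_VALUE, 0, 1500, 5000));
    }
}
```

With these inputs the current condition stops waiting while the proposed one keeps waiting. Note that the proposed condition also keeps waiting until the timeout lapses even after mintostart is met, which matches the updated Javadoc ("AND the timeout is reached").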





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418744#comment-13418744
 ] 

Aditya Kishore commented on HBASE-6389:
---

Unfortunately, even after repeated attempts, I am not able to make the test 
fail with the last patch applied. But I do have a theory.

Could you please run the test with the last patch once more with debug logging 
enabled and send me the log?


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418742#comment-13418742
 ] 

stack commented on HBASE-6389:
--

@Aditya Thanks for the debugging that went into figuring out the above.

bq. The precondition of each test ensures that a minimum of two 
region servers are online (by testing if their threads are "alive" and not by 
testing their ZK node or connecting to it).

Does this patch change how we evaluate "alive" regionservers?  (If not, it 
should; given your debugging above, it seems like a good change for HTU.)

What is the change below doing?

{code}
-conf.setInt("hbase.master.wait.on.regionservers.mintostart", numSlaves);
-conf.setInt("hbase.master.wait.on.regionservers.maxtostart", numSlaves);
+String count = String.valueOf(numSlaves);
+conf.setIfUnset("hbase.master.wait.on.regionservers.mintostart", count);
+conf.setIfUnset("hbase.master.wait.on.regionservers.maxtostart", count);
{code}

Thanks.
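
For readers following the diff, one possible reading is sketched below. It assumes setIfUnset only writes a value when the key has none, so a test that configured mintostart/maxtostart beforehand would keep its own values, whereas setInt always overwrites. ConfSketch is a hypothetical map-backed stand-in written for this illustration, not the Hadoop Configuration class, and the sketch does not claim to state the patch author's intent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a string-keyed configuration object.
final class ConfSketch {
    private final Map<String, String> props = new HashMap<>();

    // Always overwrites any existing value.
    void setInt(String name, int value) {
        props.put(name, Integer.toString(value));
    }

    // Writes only when the key currently has no value (assumed semantics).
    void setIfUnset(String name, String value) {
        props.putIfAbsent(name, value);
    }

    int getInt(String name, int defaultValue) {
        String v = props.get(name);
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        String key = "hbase.master.wait.on.regionservers.mintostart";
        int numSlaves = 2; // illustrative value

        ConfSketch overwritten = new ConfSketch();
        overwritten.setInt(key, 1);           // value chosen by a test beforehand
        overwritten.setInt(key, numSlaves);   // old code path: replaces it
        System.out.println("setInt:     " + overwritten.getInt(key, 1));

        ConfSketch preserved = new ConfSketch();
        preserved.setInt(key, 1);             // value chosen by a test beforehand
        preserved.setIfUnset(key, String.valueOf(numSlaves)); // new code path: keeps it
        System.out.println("setIfUnset: " + preserved.getInt(key, 1));
    }
}
```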


[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418738#comment-13418738
 ] 

Zhihong Ted Yu commented on HBASE-6389:
---

Thanks for your explanation.

Have you seen the test failure that I described above at 19/Jul/12 03:34?





[jira] [Commented] (HBASE-6430) Few modifications in section 2.4.2.1 of Apache HBase Reference Guide

2012-07-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418735#comment-13418735
 ] 

stack commented on HBASE-6430:
--

Thank you for helping out, Mohammad.   You might want to give this a review: 
http://hbase.apache.org/book.html#submitting.patches  It explains how to 
submit patches. (It's hard to tell from what you have attached what has 
been changed, and it's also not a 'patch' file. Let us know if the doc is not 
sufficient and we'll help you make a patch.)  Good on you, Mohammad.

> Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
> 
>
> Key: HBASE-6430
> URL: https://issues.apache.org/jira/browse/HBASE-6430
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mohammad Tariq Iqbal
>Priority: Minor
> Attachments: HBASE-6430.txt
>
>
> Quite often, newbies face issues while configuring HBase in pseudo-distributed 
> mode. I was no exception. I would like to propose some solutions 
> to these problems that worked for me. If the community finds it 
> appropriate, I would like to submit a patch for them. This is the first 
> time I am trying to do something like this, so please pardon me if I have put 
> it in an inappropriate manner.





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-19 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418731#comment-13418731
 ] 

Aditya Kishore commented on HBASE-6389:
---

@Ted

The last patch addresses the exact issue listed at 
https://issues.apache.org/jira/browse/HBASE-6406?focusedCommentId=13417665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13417665

What happened is that in most test runs, 
testRegionServerSessionExpired() gets launched before testMasterSessionExpired() 
or testMasterZKSessionRecoveryFailure(). 
testRegionServerSessionExpired() brings down one of the two region servers, but 
this RS is not yet dead by the time testMasterSessionExpired() or 
testMasterZKSessionRecoveryFailure() starts.

The precondition of each test ensures that a minimum of two region 
servers are online (by testing whether their threads are "alive", not by testing 
their ZK node or connecting to it).

So while one of the RSes is shutting itself down (and its thread is still alive), 
either testMasterSessionExpired() or 
testMasterZKSessionRecoveryFailure() can start because the test precondition 
is satisfied. However, both of these test cases trigger Master recovery and 
reinitialization, which checks for the quorum of 2 online 
region servers. Since only one region server is online at this point, 
the initialization fails with a timeout and the master is killed.

By this time the dying region server's thread is dead, and the precondition of 
the next test sees that it needs to create one region server. But since no 
master is running at this point, the newly created region server's run thread 
gets blocked in HRegionServer.blockAndCheckIfStopped() and the RS does not come 
online. As a result, the test thread that is waiting for the RS to come online 
keeps waiting, which is why you see the test hung in setup().

My last patch ensures that the dying RS is completely stopped before 
testRegionServerSessionExpired() completes, so that the subsequent tests' 
precondition is not fooled into thinking that the minimum server count is 
met and starting the test case.
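
The gap between "thread is alive" and "server is online" described above can be reproduced in isolation. The following is a minimal sketch, not HBase test code: FakeRegionServerSketch is a hypothetical thread that goes offline first and exits later, and the latches exist only to make the ordering deterministic. It shows that an isAlive()-based precondition still counts a server that is shutting down, and that joining the thread closes the window.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a region server thread; not an HBase class.
final class FakeRegionServerSketch extends Thread {
    private volatile boolean online = true;
    private final CountDownLatch stopRequested = new CountDownLatch(1);
    private final CountDownLatch wentOffline = new CountDownLatch(1);
    private final CountDownLatch mayExit = new CountDownLatch(1);

    @Override
    public void run() {
        try {
            stopRequested.await();
            online = false;          // stops serving first...
            wentOffline.countDown();
            mayExit.await();         // ...but the thread lingers during cleanup
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isOnline() {
        return online;
    }

    // Returns {aliveWhileStopping, onlineWhileStopping, aliveAfterJoin}.
    static boolean[] demo() {
        FakeRegionServerSketch rs = new FakeRegionServerSketch();
        rs.start();
        try {
            rs.stopRequested.countDown();
            rs.wentOffline.await();
            boolean aliveWhileStopping = rs.isAlive();    // what the precondition sees
            boolean onlineWhileStopping = rs.isOnline();  // what the master sees
            rs.mayExit.countDown();
            rs.join();                                    // wait for the thread to end
            boolean aliveAfterJoin = rs.isAlive();
            return new boolean[] {aliveWhileStopping, onlineWhileStopping, aliveAfterJoin};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("alive while stopping:  " + r[0]);
        System.out.println("online while stopping: " + r[1]);
        System.out.println("alive after join:      " + r[2]);
    }
}
```

Waiting on join() is only one way to close the window; checking the server's registered state instead of its thread would be another, as suggested elsewhere in this thread.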


[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-19 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418730#comment-13418730
 ] 

Zhihong Ted Yu commented on HBASE-5547:
---

I ran TestRegionServerCoprocessorExceptionWithAbort based on patch v16 and it 
passed.

> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way it should be possible to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.
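
As a rough illustration of option 2 (move into a special directory instead of deleting), here is a minimal sketch against the local filesystem using java.nio.file. It is not the patch attached to this issue: HFileRetirerSketch, the backupInProgress flag, and the archive directory are all hypothetical, and a real implementation would work against HDFS and learn about the backup via a znode or similar.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: deletes files normally, archives them while a backup runs.
final class HFileRetirerSketch {
    private final Path archiveDir;
    private volatile boolean backupInProgress;

    HFileRetirerSketch(Path archiveDir) {
        this.archiveDir = archiveDir;
    }

    void setBackupInProgress(boolean backupInProgress) {
        this.backupInProgress = backupInProgress;
    }

    // Returns the archived location, or null if the file was deleted outright.
    Path retire(Path hfile) throws IOException {
        if (!backupInProgress) {
            Files.delete(hfile);
            return null;
        }
        Files.createDirectories(archiveDir);
        return Files.move(hfile, archiveDir.resolve(hfile.getFileName()));
    }

    // Returns {deletedOutsideBackup, goneFromStoreDuringBackup, presentInArchive}.
    static boolean[] demo() {
        try {
            Path root = Files.createTempDirectory("hfile-sketch");
            Path store = Files.createDirectories(root.resolve("store"));
            HFileRetirerSketch retirer = new HFileRetirerSketch(root.resolve("archive"));

            Path first = Files.createFile(store.resolve("hfile-a"));
            retirer.retire(first);
            boolean deleted = !Files.exists(first);

            retirer.setBackupInProgress(true);
            Path second = Files.createFile(store.resolve("hfile-b"));
            Path archived = retirer.retire(second);
            return new boolean[] {deleted, !Files.exists(second), Files.exists(archived)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the archive in one directory is what makes cleanup simpler than the .bck rename of option 1: the directory can be emptied in one pass once the backup ends.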





[jira] [Updated] (HBASE-6393) Decouple audit event creation from storage in AccessController

2012-07-19 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HBASE-6393:
--

Affects Version/s: 0.96.0
   Status: Patch Available  (was: Open)

> Decouple audit event creation from storage in AccessController
> --
>
> Key: HBASE-6393
> URL: https://issues.apache.org/jira/browse/HBASE-6393
> Project: HBase
>  Issue Type: Brainstorming
>  Components: security
>Affects Versions: 0.96.0
>Reporter: Marcelo Vanzin
> Attachments: hbase-6393-v1.patch
>
>
> Currently, AccessController takes care of both generating audit events (by 
> performing access checks) and storing them (by creating a log message and 
> writing it to the AUDITLOG logger).
> This makes the logging system the only way to catch audit events. It means 
> that if someone wants to do something fancier (like writing these records to 
> a database somewhere), they need to hack through the logging system, and 
> parse the messages generated by AccessController, which is not optimal.
> The attached patch decouples generation and storage by introducing a new 
> interface, used by AccessController, to log the audit events. The current, 
> log-based storage is kept in place so that current users won't be affected by 
> the change.
> I'm filing this as an RFC at this point, so the patch is not totally clean; 
> it's on top of HBase 0.92 (which is easier for me to test) and doesn't have 
> any unit tests, for starters. But the changes should be very similar on trunk 
> - I don't remember changes in this particular area of the code between those 
> versions.
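
The decoupling described above might look something like the sketch below. All names here (`AuditSink`, `LoggingAuditSink`, `InMemoryAuditSink`, `AccessChecker`) are hypothetical and the interface in the attached patch may well differ; java.util.logging is used only as a stand-in for the AUDITLOG logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical names throughout; the interface in the attached patch may differ.
interface AuditSink {
    void record(String user, String action, boolean allowed);
}

// Keeps the existing behaviour: every event becomes a log message.
class LoggingAuditSink implements AuditSink {
    private static final Logger AUDITLOG = Logger.getLogger("AUDITLOG");

    @Override
    public void record(String user, String action, boolean allowed) {
        AUDITLOG.info((allowed ? "allowed" : "denied") + " user=" + user + " action=" + action);
    }
}

// An alternative store; a real one might write to a database instead.
class InMemoryAuditSink implements AuditSink {
    final List<String> events = new ArrayList<>();

    @Override
    public void record(String user, String action, boolean allowed) {
        events.add(user + ":" + action + ":" + allowed);
    }
}

// Stand-in for the access checker: it generates events but does not store them.
class AccessChecker {
    private final AuditSink sink;

    AccessChecker(AuditSink sink) {
        this.sink = sink;
    }

    boolean check(String user, String action) {
        boolean allowed = "admin".equals(user); // toy policy for the example
        sink.record(user, action, allowed);
        return allowed;
    }
}
```

With this shape, a consumer that wants structured records supplies its own sink instead of parsing log messages, while the log-based sink remains the default.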





[jira] [Updated] (HBASE-6393) Decouple audit event creation from storage in AccessController

2012-07-19 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HBASE-6393:
--

Attachment: hbase-6393-v1.patch

Patch against trunk, modeled after similar change in HDFS (HDFS-3680).

> Decouple audit event creation from storage in AccessController
> --
>
> Key: HBASE-6393
> URL: https://issues.apache.org/jira/browse/HBASE-6393
> Project: HBase
>  Issue Type: Brainstorming
>  Components: security
>Affects Versions: 0.96.0
>Reporter: Marcelo Vanzin
> Attachments: hbase-6393-v1.patch
>
>
> Currently, AccessController takes care of both generating audit events (by 
> performing access checks) and storing them (by creating a log message and 
> writing it to the AUDITLOG logger).
> This makes the logging system the only way to catch audit events. It means 
> that if someone wants to do something fancier (like writing these records to 
> a database somewhere), they need to hack through the logging system, and 
> parse the messages generated by AccessController, which is not optimal.
> The attached patch decouples generation and storage by introducing a new 
> interface, used by AccessController, to log the audit events. The current, 
> log-based storage is kept in place so that current users won't be affected by 
> the change.
> I'm filing this as an RFC at this point, so the patch is not totally clean; 
> it's on top of HBase 0.92 (which is easier for me to test) and doesn't have 
> any unit tests, for starters. But the changes should be very similar on trunk 
> - I don't remember changes in this particular area of the code between those 
> versions.





[jira] [Updated] (HBASE-6393) Decouple audit event creation from storage in AccessController

2012-07-19 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HBASE-6393:
--

Attachment: (was: accesslogger-v1.patch)

> Decouple audit event creation from storage in AccessController
> --
>
> Key: HBASE-6393
> URL: https://issues.apache.org/jira/browse/HBASE-6393
> Project: HBase
>  Issue Type: Brainstorming
>  Components: security
>Reporter: Marcelo Vanzin
>
> Currently, AccessController takes care of both generating audit events (by 
> performing access checks) and storing them (by creating a log message and 
> writing it to the AUDITLOG logger).
> This makes the logging system the only way to catch audit events. It means 
> that if someone wants to do something fancier (like writing these records to 
> a database somewhere), they need to hack through the logging system, and 
> parse the messages generated by AccessController, which is not optimal.
> The attached patch decouples generation and storage by introducing a new 
> interface, used by AccessController, to log the audit events. The current, 
> log-based storage is kept in place so that current users won't be affected by 
> the change.
> I'm filing this as an RFC at this point, so the patch is not totally clean; 
> it's on top of HBase 0.92 (which is easier for me to test) and doesn't have 
> any unit tests, for starters. But the changes should be very similar on trunk 
> - I don't remember changes in this particular area of the code between those 
> versions.





[jira] [Updated] (HBASE-2877) Unnecessary byte written when serializing a Writable RPC parameter

2012-07-19 Thread Benoit Sigoure (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Sigoure updated HBASE-2877:
--

Affects Version/s: 0.90.0
   0.90.1
   0.90.2
   0.90.3
   0.90.4
   0.90.5
   0.90.6
   0.92.0
   0.92.1
   0.94.0

> Unnecessary byte written when serializing a Writable RPC parameter
> --
>
> Key: HBASE-2877
> URL: https://issues.apache.org/jira/browse/HBASE-2877
> Project: HBase
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 0.20.5, 0.89.20100621, 0.90.0, 0.90.1, 0.90.2, 0.90.3, 
> 0.90.4, 0.90.5, 0.90.6, 0.92.0, 0.92.1, 0.94.0
>Reporter: Benoit Sigoure
>Priority: Minor
>
> When {{HbaseObjectWritable#writeObject}} serializes a {{Writable}} RPC 
> parameter, it writes its "class code" twice to the wire.  {{writeClassCode}} 
> is already called once unconditionally at the beginning of the method, and 
> for {{Writable}} arguments, it's called a second time towards the end of the 
> method.  It seems that the code is trying to deal with the "declared type" 
> vs. "actual type" of a parameter.  The Hadoop RPC code was already doing this 
> before Stack changed it to use codes in r608738 for HADOOP-2519.  It's not 
> documented when this is useful though, and I couldn't find any use case.  
> Every RPC I've seen so far just ends up with the same byte sent twice to the 
> wire.
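
The effect on the wire can be illustrated with a toy model. This is not the HbaseObjectWritable implementation: the class `ClassCodeDemo`, the code value 42 and the payload are invented for the example, which only shows that writing a declared-type code and an actual-type code costs one extra byte when both are the same.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy model of the wire format described above; not the HBase implementation.
class ClassCodeDemo {
    // Hypothetical one-byte code for some Writable class.
    static final byte SOME_WRITABLE_CODE = 42;

    // Writes the declared-type code, then the actual-type code, then the payload.
    static byte[] writeWithBothCodes(byte declaredCode, byte actualCode, byte[] payload) {
        return write(payload, declaredCode, actualCode);
    }

    // Writes a single code followed by the payload.
    static byte[] writeWithOneCode(byte code, byte[] payload) {
        return write(payload, code);
    }

    private static byte[] write(byte[] payload, byte... codes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (byte code : codes) {
                out.writeByte(code);
            }
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

For a three-byte payload the first form produces five bytes and the second four; when the declared and actual types coincide, the first two bytes are identical.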





[jira] [Updated] (HBASE-6430) Few modifications in section 2.4.2.1 of Apache HBase Reference Guide

2012-07-19 Thread Mohammad Tariq Iqbal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Tariq Iqbal updated HBASE-6430:


Attachment: HBASE-6430.txt

Please have a look at the attachment and let me know if it requires any 
modification or if it is not eligible to be submitted. Many thanks.

> Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
> 
>
> Key: HBASE-6430
> URL: https://issues.apache.org/jira/browse/HBASE-6430
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mohammad Tariq Iqbal
>Priority: Minor
> Attachments: HBASE-6430.txt
>
>
> Quite often, newbies face issues while configuring HBase in 
> pseudo-distributed mode. I was no exception. I would like to propose some 
> solutions for these problems which worked for me. If the community finds it 
> appropriate, I would like to apply the patch for the same. This is the first 
> time I am trying to do something like this, so please pardon me if I have not 
> put it in an appropriate manner.





[jira] [Updated] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-07-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5966:
-

Fix Version/s: 0.94.1

Discussed with Jimmy. Let's have this in 0.94.1

> MapReduce based tests broken on Hadoop 2.0.0-alpha
> --
>
> Key: HBASE-5966
> URL: https://issues.apache.org/jira/browse/HBASE-5966
> Project: HBase
>  Issue Type: Bug
>  Components: mapred, mapreduce, test
>Affects Versions: 0.94.0, 0.96.0
> Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, 
> Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
>Reporter: Andrew Purtell
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-5966-1.patch, HBASE-5966.patch, hbase-5966.patch
>
>
> Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test 
> rigging. Below is a representative error, which can be easily reproduced with:
> {noformat}
> mvn -PlocalTests -Psecurity \
>   -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
>   clean test \
>   -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> {noformat}
> And the result:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> ---
> Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce)  Time elapsed: 21.935 sec  <<< ERROR!
> java.lang.reflect.UndeclaredThrowableException
>   at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>   at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
>   at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
>   at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
>   at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
>   at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:18)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> 

[jira] [Reopened] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-07-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reopened HBASE-5966:
--


> MapReduce based tests broken on Hadoop 2.0.0-alpha
> --
>
> Key: HBASE-5966
> URL: https://issues.apache.org/jira/browse/HBASE-5966
> Project: HBase
>  Issue Type: Bug
>  Components: mapred, mapreduce, test
>Affects Versions: 0.94.0, 0.96.0
> Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, 
> Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
>Reporter: Andrew Purtell
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-5966-1.patch, HBASE-5966.patch, hbase-5966.patch
>
>
> Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test 
> rigging. Below is a representative error, which can be easily reproduced with:
> {noformat}
> mvn -PlocalTests -Psecurity \
>   -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
>   clean test \
>   -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> {noformat}
> And the result:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> ---
> Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce)  Time elapsed: 21.935 sec  <<< ERROR!
> java.lang.reflect.UndeclaredThrowableException
>   at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>   at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
>   at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
>   at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
>   at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
>   at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:18)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:3

[jira] [Updated] (HBASE-5985) TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0

2012-07-19 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5985:
---

Fix Version/s: 0.94.1

> TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0
> -
>
> Key: HBASE-5985
> URL: https://issues.apache.org/jira/browse/HBASE-5985
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: hbase-5985.patch
>
>
> ---
> Test set: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.448 sec <<< FAILURE!
> org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD  Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Failed put; errcode=1
>   at org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.doFsCommand(TestMetaMigrationRemovingHTD.java:124)
>   at org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.setUpBeforeClass(TestMetaMigrationRemovingHTD.java:80)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>   at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)





[jira] [Updated] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6325:
-

Fix Version/s: (was: 0.94.2)
   0.94.1

+1 on patch

> [replication] Race in ReplicationSourceManager.init can initiate a failover 
> even if the node is alive
> -
>
> Key: HBASE-6325
> URL: https://issues.apache.org/jira/browse/HBASE-6325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.1, 0.90.8
>
> Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch
>
>
> Yet another bug found during the leap second madness: it's possible to miss 
> the registration of new region servers so that in 
> ReplicationSourceManager.init we start the failover of a live and replicating 
> region server. I don't think there's data loss, but the RS that's being failed 
> over will die on:
> {noformat}
> 2012-07-01 06:25:15,604 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sv4r23s48,10304,1341112194623: Writing replication status
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
>   at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
>   at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
> {noformat}
> It seems to me that just refreshing {{otherRegionServers}} after getting the 
> list of {{currentReplicators}} would be enough to fix this.
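
The ordering issue can be modelled with plain sets. The sketch below is hypothetical and is not the ReplicationSourceManager code: `FailoverCheck` and its method names are invented, and the two suppliers stand in for whatever reads the replicator list and the live region server list. It only shows why a live-server view taken before the replicator list can make a live server look dead.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical model of the startup check; names follow the description only loosely.
class FailoverCheck {
    // Replicators that have queues but are missing from the given live-server view.
    static Set<String> deadReplicators(Set<String> currentReplicators,
                                       Set<String> otherRegionServers) {
        Set<String> dead = new HashSet<>(currentReplicators);
        dead.removeAll(otherRegionServers);
        return dead;
    }

    // Suggested ordering: read the replicators first, then refresh the live view.
    static Set<String> deadReplicatorsRefreshed(Supplier<Set<String>> readReplicators,
                                                Supplier<Set<String>> readLiveServers) {
        Set<String> currentReplicators = readReplicators.get();
        Set<String> otherRegionServers = readLiveServers.get(); // refreshed after the list above
        return deadReplicators(currentReplicators, otherRegionServers);
    }
}
```

If a server registers between the two reads, a stale live view reports it as dead, while the refreshed view, read second, already contains it.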





[jira] [Updated] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock

2012-07-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6319:
-

Fix Version/s: (was: 0.94.2)
   0.94.1

+1 on patch.

> ReplicationSource can call terminate on itself and deadlock
> ---
>
> Key: HBASE-6319
> URL: https://issues.apache.org/jira/browse/HBASE-6319
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.94.1, 0.90.8
>
> Attachments: HBASE-6319-0.92.patch
>
>
> In a few places the ReplicationSource code calls terminate on itself, which 
> is a problem since in terminate() we wait on that thread to die.
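
One common way to avoid this kind of self-join is to skip the wait when the caller is the thread being terminated. The sketch below is an assumption about how such a guard could look, not the ReplicationSource code or the attached patch; `ReplicationWorkerSketch` and its fields are invented for the example.

```java
// Hypothetical worker, not the ReplicationSource code or the attached patch.
class ReplicationWorkerSketch extends Thread {
    private volatile boolean running = true;
    private final boolean stopFromInside;

    ReplicationWorkerSketch(boolean stopFromInside) {
        this.stopFromInside = stopFromInside;
    }

    @Override
    public void run() {
        while (running) {
            if (stopFromInside) {
                terminate(); // the problematic pattern: the thread stops itself
            } else {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return; // interrupted by terminate() from another thread
                }
            }
        }
    }

    void terminate() {
        running = false;
        if (Thread.currentThread() == this) {
            return; // joining ourselves would wait forever
        }
        interrupt();
        try {
            join(); // wait for the worker to die, as terminate() is described to do
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Without the `Thread.currentThread() == this` check, the call from inside `run()` would block in `join()` waiting for a thread that cannot exit until `join()` returns.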





[jira] [Updated] (HBASE-6312) Make BlockCache eviction thresholds configurable

2012-07-19 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6312:
--

  Resolution: Fixed
Release Note: 
From now on, the block cache will use all the memory it's given as its upper 
bound was raised from 85% to 99%. The lower bound for evictions, called 
"minimum factor", was raised from 75% to 95% and is now configurable via 
"hbase.lru.blockcache.min.factor". This means that 4% of the block cache is 
evicted at a time instead of 10%, so evictions may run more often but each 
will be less disruptive.

Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Closed the jira and added a release note.
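
The 4% versus 10% figures in the release note follow from the gap between the upper bound and the minimum factor. The sketch below only reproduces that arithmetic and is not the block cache implementation: `EvictionMath` is an invented name, and `java.util.Properties` is used as a stand-in for the HBase configuration object. The property key is the one quoted in the release note.

```java
import java.util.Properties;

// Rough arithmetic behind the release note; not the block cache implementation.
class EvictionMath {
    static final String MIN_FACTOR_KEY = "hbase.lru.blockcache.min.factor"; // from the release note
    static final double DEFAULT_MIN_FACTOR = 0.95; // new lower bound per the release note
    static final double UPPER_FACTOR = 0.99;       // new upper bound per the release note

    // Reads the minimum factor, falling back to the new default.
    static double minFactor(Properties conf) {
        String value = conf.getProperty(MIN_FACTOR_KEY);
        return value == null ? DEFAULT_MIN_FACTOR : Double.parseDouble(value);
    }

    // Bytes freed when the cache shrinks from the upper bound to the lower bound.
    static long bytesFreedPerEviction(long cacheBytes, double upperFactor, double lowerFactor) {
        long upper = Math.round(cacheBytes * upperFactor);
        long lower = Math.round(cacheBytes * lowerFactor);
        return upper - lower;
    }
}
```

For a cache of 1,000,000 bytes, the new bounds (0.99 and 0.95) free 40,000 bytes per eviction, while the old bounds (0.85 and 0.75) freed 100,000.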

> Make BlockCache eviction thresholds configurable
> 
>
> Key: HBASE-6312
> URL: https://issues.apache.org/jira/browse/HBASE-6312
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: hbase-6312.patch, hbase-6312_v2.patch, 
> hbase-6312_v3.patch
>
>
> Some of our customers found that tuning the BlockCache eviction thresholds 
> made test results different in their test environment. However, those 
> thresholds are not configurable in the current implementation. The only way 
> to change those values is to re-compile the HBase source code. We wonder if 
> it is possible to make them configurable.





[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-19 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418707#comment-13418707
 ] 

Jesse Yates commented on HBASE-5547:


Assuming the changes between v15 and v16 are just what Ted mentioned in his last 
post on RB, then I'm good. Let's give it a day or so before we integrate, so 
people have time to look at RB if they haven't yet. The failed test doesn't apply 
to this code.

> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles that are to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way it should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.





[jira] [Commented] (HBASE-4956) Control direct memory buffer consumption by HBaseClient

2012-07-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418705#comment-13418705
 ] 

Hudson commented on HBASE-4956:
---

Integrated in HBase-0.94 #341 (See 
[https://builds.apache.org/job/HBase-0.94/341/])
HBASE-4956 Control direct memory buffer consumption by HBaseClient (Bob 
Copeland) (Revision 1363533)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java


> Control direct memory buffer consumption by HBaseClient
> ---
>
> Key: HBASE-4956
> URL: https://issues.apache.org/jira/browse/HBASE-4956
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ted Yu
>Assignee: Bob Copeland
> Fix For: 0.96.0, 0.94.1
>
> Attachments: 4956.txt, thread_get.rb
>
>
> As Jonathan explained here 
> https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1
> , the standard HBase client inadvertently consumes a large amount of direct memory.
> We should consider using netty for NIO-related tasks.





[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418690#comment-13418690
 ] 

Hadoop QA commented on HBASE-5547:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12537240/5547-v16.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 22 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 13 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2412//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2412//console

This message is automatically generated.

> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles that are to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way it should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.




