[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418968#comment-13418968 ] Hadoop QA commented on HBASE-6411: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537268/HBASE-6411-0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 16 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//console This message is automatically generated. > Move Master Metrics to metrics 2 > > > Key: HBASE-6411 > URL: https://issues.apache.org/jira/browse/HBASE-6411 > Project: HBase > Issue Type: Sub-task >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch > > > Move Master Metrics to metrics 2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShiXing updated HBASE-3725: --- Attachment: HBASE-3725-0.92-V6.patch @Ted bq. TestHRegion#testIncrementWithFlushAndDelete passed without that assignment. That is because the iscan still reads from the memstore after I remove this line:
{code}
  List fileResults = new ArrayList();
- iscan.checkOnlyStoreFiles();
  scanner = null;
  try {
    scanner = getScanner(iscan);
{code}
There is no result in the memstore, so the increment treats the value as 0, which has the same effect as the delete. I added this case to TestHRegion#testIncrementWithFlushAndDelete in V6. > HBase increments from old value after delete and write to disk > -- > > Key: HBASE-3725 > URL: https://issues.apache.org/jira/browse/HBASE-3725 > Project: HBase > Issue Type: Bug > Components: io, regionserver >Affects Versions: 0.90.1 >Reporter: Nathaniel Cook >Assignee: Jonathan Gray > Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, > HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, > HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, > HBASE-3725.patch > > > Deleted row values are sometimes used for starting points on new increments. > To reproduce: > Create a row "r". Set column "x" to some default value. > Force hbase to write that value to the file system (such as restarting the > cluster). > Delete the row. > Call table.incrementColumnValue with "some_value" > Get the row. > The returned value in the column was incremented from the old value before > the row was deleted instead of being initialized to "some_value".
[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418940#comment-13418940 ] Shengsheng Huang commented on HBASE-6363: - Seems reasonable. My only small concern is the package dependency, because some of our customers are very reluctant to upgrade their stable Hadoop deployments. A standalone patch would be good to have. > HBaseConfiguration can carry a main method that dumps XML output for debug > purposes > --- > > Key: HBASE-6363 > URL: https://issues.apache.org/jira/browse/HBASE-6363 > Project: HBase > Issue Type: Improvement > Components: util >Affects Versions: 0.94.0 >Reporter: Harsh J >Priority: Trivial > Labels: conf, newbie, noob > Attachments: HBASE-6363.2.patch, HBASE-6363.patch > > > Just like the Configuration class carries a main() method in it, that simply > loads itself and writes XML out to System.out, HBaseConfiguration can use the > same kinda method. > That way we can do "hbase org.apache.hadoop.….HBaseConfiguration" to get an > Xml dump of things HBaseConfiguration has properly loaded. Nifty in checking > app classpaths sometimes.
[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Huang updated HBASE-6429: - Attachment: hbase-6429-trunk.patch 1. Prepared a patch against trunk. 2. Added one more unit test case (TestFilterWithScanLimits). 3. Fixed 2 unit test failures in the previous version. > Filter with filterRow() returning true is also incompatible with scan with > limit > > > Key: HBASE-6429 > URL: https://issues.apache.org/jira/browse/HBASE-6429 > Project: HBase > Issue Type: Bug > Components: filters >Affects Versions: 0.96.0 >Reporter: Jason Dai > Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch > > > Currently if we scan with both a limit and a Filter with > filterRow(List) implemented, an IncompatibleFilterException will > be thrown. The same exception should also be thrown if the filter has its > filterRow() implemented.
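The incompatibility described in this issue can be sketched with a small JDK-only model. This is not the HBase source: `ScanSettings`, `RowAwareFilter`, and the unchecked `IncompatibleFilterException` below are hypothetical stand-ins, and the sketch assumes that "limit" means a cap on the cells returned per call, so that a row may be split across calls.

```java
class ScanLimitDemo {
    // Stand-in for HBase's IncompatibleFilterException (modeled here as unchecked).
    static final class IncompatibleFilterException extends RuntimeException {
        IncompatibleFilterException(String message) {
            super(message);
        }
    }

    // Hypothetical minimal filter contract: reports whether it decides on whole rows.
    interface RowAwareFilter {
        boolean hasFilterRow();
    }

    // Hypothetical scan settings: an optional filter plus a per-call cell limit.
    static final class ScanSettings {
        private final RowAwareFilter filter; // may be null
        private int limit = -1;              // -1 means "no limit"

        ScanSettings(RowAwareFilter filter) {
            this.filter = filter;
        }

        void setLimit(int limit) {
            // With a limit, a row can arrive in pieces, so a row-level filter
            // would be asked to judge partial rows. Reject the combination.
            if (filter != null && filter.hasFilterRow()) {
                throw new IncompatibleFilterException(
                    "Cannot set a limit on a scan whose filter decides on whole rows");
            }
            this.limit = limit;
        }

        int getLimit() {
            return limit;
        }
    }

    // Returns true if setting a limit is rejected for the given filter.
    static boolean rejects(RowAwareFilter filter) {
        try {
            new ScanSettings(filter).setLimit(10);
            return false;
        } catch (IncompatibleFilterException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("row-level filter rejected: " + rejects(() -> true));
        System.out.println("cell-level filter rejected: " + rejects(() -> false));
    }
}
```

The point of the sketch is only that the check should key on "does this filter make row-level decisions" rather than on one particular overload being implemented.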
[jira] [Commented] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418936#comment-13418936 ] Lars Hofhansl commented on HBASE-6406: -- TestZooKeeper.testClientSessionExpired failed again in the latest 0.94 build. Although this is not obvious from the logs, the pattern in the code is the same as in TestReplicationPeer. My initial suspicion was that RecoverableZooKeeper somehow retries the operation and thereby reconnects the expired session. According to the code it does not do that, though. Somehow HBaseTestingUtil.expireSession is subject to a race. In the case of TestReplicationPeer that happened when expireSession was called before the connection was actually established. Is there a way to check whether the connection was established first and wait if it wasn't? Otherwise, I'd say we disable this test for now. > TestReplicationPeer.testResetZooKeeperSession and > TestZooKeeper.testClientSessionExpired fail frequently > > > Key: HBASE-6406 > URL: https://issues.apache.org/jira/browse/HBASE-6406 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.1 >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.96.0, 0.94.1 > > Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack > > > Looking back through the 0.94 test runs these two tests accounted for 11 of > 34 failed tests. > They should be fixed or (temporarily) disabled.
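One way to "check whether the connection was established first and wait if it wasn't" is a latch that the connection callback releases. The sketch below uses only JDK classes; `Session`, `onConnected`, and `expireWhenConnected` are hypothetical names, not ZooKeeper or HBase APIs, and a real test would release the latch from its connection watcher.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ExpireAfterConnectDemo {
    // Hypothetical session handle used only for this illustration.
    static final class Session {
        private final CountDownLatch connected = new CountDownLatch(1);
        private volatile boolean expired;

        // Called by whatever observes the "connection established" event.
        void onConnected() {
            connected.countDown();
        }

        void expire() {
            expired = true;
        }

        boolean isExpired() {
            return expired;
        }
    }

    // Expires the session only once the connection is known to be up.
    // Returns false if the connection did not come up within the timeout.
    static boolean expireWhenConnected(Session session, long timeoutMs) {
        try {
            if (!session.connected.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // never connected: report it instead of racing
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
        session.expire();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        Session session = new Session();
        Thread connector = new Thread(session::onConnected); // simulates the async connect
        connector.start();
        System.out.println("expired: " + expireWhenConnected(session, 5_000));
        connector.join();
    }
}
```

Returning `false` on timeout lets a test fail with a clear message rather than expiring a session that was never established.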
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418933#comment-13418933 ] ShiXing commented on HBASE-3725: @Ted, the reassignment is needed because there is no interface to set the iscan back to both memstore and store files; at the beginning, the iscan is set to the memstore only:
{code}
// memstore scan
iscan.checkOnlyMemStore();
{code}
> HBase increments from old value after delete and write to disk > -- > > Key: HBASE-3725 > URL: https://issues.apache.org/jira/browse/HBASE-3725 > Project: HBase > Issue Type: Bug > Components: io, regionserver >Affects Versions: 0.90.1 >Reporter: Nathaniel Cook >Assignee: Jonathan Gray > Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, > HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, > HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch > > > Deleted row values are sometimes used for starting points on new increments. > To reproduce: > Create a row "r". Set column "x" to some default value. > Force hbase to write that value to the file system (such as restarting the > cluster). > Delete the row. > Call table.incrementColumnValue with "some_value" > Get the row. > The returned value in the column was incremented from the old value before > the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class HBaseTestIncrement
> {
>     static String tableName = "testIncrement";
>     static byte[] infoCF = Bytes.toBytes("info");
>     static byte[] rowKey = Bytes.toBytes("test-rowKey");
>     static byte[] newInc = Bytes.toBytes("new");
>     static byte[] oldInc = Bytes.toBytes("old");
>
>     /**
>      * This code reproduces a bug with increment column values in hbase
>      * Usage: First run part one by passing '1' as the first arg
>      *        Then restart the hbase cluster so it writes everything to disk
>      *        Run part two by passing '2' as the first arg
>      *
>      * This will result in the old deleted data being found and used for the increment calls
>      *
>      * @param args
>      * @throws IOException
>      */
>     public static void main(String[] args) throws IOException
>     {
>         if("1".equals(args[0]))
>             partOne();
>         if("2".equals(args[0]))
>             partTwo();
>         if ("both".equals(args[0]))
>         {
>             partOne();
>             partTwo();
>         }
>     }
>
>     /**
>      * Creates a table and increments a column value 10 times by 10 each time.
>      * Results in a value of 100 for the column
>      *
>      * @throws IOException
>      */
>     static void partOne()throws IOException
>     {
>         Configuration conf = HBaseConfiguration.create();
>         HBaseAdmin admin = new HBaseAdmin(conf);
>         HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>         tableDesc.addFamily(new HColumnDescriptor(infoCF));
>         if(admin.tableExists(tableName))
>         {
>             admin.disableTable(tableName);
>             admin.deleteTable(tableName);
>         }
>         admin.createTable(tableDesc);
>         HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>         HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>         //Increment uninitialized column
>         for (int j = 0; j < 10; j++)
>         {
>             table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
>             Increment inc = new Increment(rowKey);
>             inc.addColumn(infoCF, newInc, (long)10);
>             table.increment(inc);
>         }
>         Get get = new Get(rowKey);
>         Result r = table.get(get);
>         System.out.println("initial values: new " + Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + B
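The behavior reported in HBASE-3725 can be illustrated with a toy model that needs no cluster. Everything here is an assumption made for illustration: plain maps stand in for the memstore and the flushed store files, and the "buggy" lookup assumes the increment path falls back to store files alone, so that a delete marker still held in memory never masks the flushed value. It is not the HRegion code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: these collections stand in for the memstore and store files.
class IncrementAfterDeleteDemo {
    final Map<String, Long> memstore = new HashMap<>();
    final Set<String> memstoreDeletes = new HashSet<>();
    final Map<String, Long> storeFiles = new HashMap<>();

    // Assumed buggy lookup: memstore values first, then store files alone.
    // The in-memory delete marker is never consulted.
    long readIgnoringDeletes(String row) {
        Long inMemory = memstore.get(row);
        if (inMemory != null) {
            return inMemory;
        }
        return storeFiles.getOrDefault(row, 0L);
    }

    // Assumed fixed lookup: the delete marker masks older flushed values.
    long readHonoringDeletes(String row) {
        Long inMemory = memstore.get(row);
        if (inMemory != null) {
            return inMemory;
        }
        if (memstoreDeletes.contains(row)) {
            return 0L;
        }
        return storeFiles.getOrDefault(row, 0L);
    }

    void delete(String row) {
        memstore.remove(row);
        memstoreDeletes.add(row);
    }

    // Mirrors the reproduction steps: value flushed to disk, then the row deleted.
    static IncrementAfterDeleteDemo flushedThenDeleted() {
        IncrementAfterDeleteDemo region = new IncrementAfterDeleteDemo();
        region.storeFiles.put("r", 100L); // value already written to disk
        region.delete("r");               // delete marker exists only in memory
        return region;
    }

    public static void main(String[] args) {
        IncrementAfterDeleteDemo region = flushedThenDeleted();
        System.out.println("buggy increment: " + (region.readIgnoringDeletes("r") + 10));
        System.out.println("fixed increment: " + (region.readHonoringDeletes("r") + 10));
    }
}
```

In the model, the first lookup resumes from the deleted value while the second starts from zero, which matches the symptom in the issue description.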
[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-6432: --- Description: ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of a HRegionServer this is bypassed and set to "default" since getMaster() since it uses HBaseRPC to create the proxy directly and bypasses the class which retrieves and sets the correct clusterId. This becomes a problem with clients (ie within a coprocessor) using delegation tokens for authentication. Since the token's service will be the correct clusterId and while the TokenSelector is looking for one with service "default". was: ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of a HRegionServer this is bypassed and set to "default" since getMaster() bypasses the class which sets clusterID clusterId since it uses HBaseRPC to create the proxy to create the proxy directly. This becomes a problem with clients (ie within a coprocessor) using delegation tokens for authentication. Since the token's service will be the correct clusterId and while the TokenSelector is looking for one with service "default". > HRegionServer doesn't properly set clusterId in conf > > > Key: HBASE-6432 > URL: https://issues.apache.org/jira/browse/HBASE-6432 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.0 >Reporter: Francis Liu >Assignee: Francis Liu > Fix For: 0.96.0 > > Attachments: HBASE-6432_94.patch > > > ClusterId is normally set into the passed conf during instantiation of an > HTable class. In the case of a HRegionServer this is bypassed and set to > "default" since getMaster() since it uses HBaseRPC to create the proxy > directly and bypasses the class which retrieves and sets the correct > clusterId. > This becomes a problem with clients (ie within a coprocessor) using > delegation tokens for authentication. 
Since the token's service will be the > correct clusterId and while the TokenSelector is looking for one with service > "default".
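The mismatch described in HBASE-6432 comes down to a lookup keyed on the wrong string. A minimal sketch follows, with hypothetical names throughout: `Token` and `selectToken` are not the Hadoop security classes, and `"cluster-42"` is an invented cluster id used only for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class TokenSelectionDemo {
    // Hypothetical token carrying only the service it was issued for.
    static final class Token {
        final String service;

        Token(String service) {
            this.service = service;
        }
    }

    // Picks the first token whose service equals the cluster id found in the conf.
    static Optional<Token> selectToken(List<Token> tokens, String clusterIdFromConf) {
        return tokens.stream()
                .filter(token -> token.service.equals(clusterIdFromConf))
                .findFirst();
    }

    public static void main(String[] args) {
        // The token was issued for the real cluster id (invented value).
        List<Token> tokens = Arrays.asList(new Token("cluster-42"));

        // A conf left at "default" finds nothing; the correct id finds the token.
        System.out.println("conf=default    -> " + selectToken(tokens, "default").isPresent());
        System.out.println("conf=cluster-42 -> " + selectToken(tokens, "cluster-42").isPresent());
    }
}
```

This is why setting the real clusterId in the region server's conf matters for code running inside it, such as a coprocessor authenticating with a delegation token.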
[jira] [Commented] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418931#comment-13418931 ] Hadoop QA commented on HBASE-6431: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537269/0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2417//console This message is automatically generated. 
> Some FilterList Constructors break addFilter > > > Key: HBASE-6431 > URL: https://issues.apache.org/jira/browse/HBASE-6431 > Project: HBase > Issue Type: Bug > Components: filters >Affects Versions: 0.92.1, 0.94.0 >Reporter: Alex Newman >Assignee: Alex Newman >Priority: Minor > Attachments: > 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch > > > Some of the constructors for FilterList set the internal list of filters to > list types which don't support the add operation. As a result > FilterList(final List rowFilters) > FilterList(final Filter... rowFilters) > FilterList(final Operator operator, final List rowFilters) > FilterList(final Operator operator, final Filter... rowFilters) > may init private List filters = new ArrayList(); incorrectly.
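The failure mode described above can be reproduced with plain JDK collections. The sketch below is not the HBase source: `SimpleFilterList` is a hypothetical stand-in for `FilterList`, with filters modeled as strings. It assumes the constructor is handed a fixed-size list such as the one returned by `Arrays.asList`, and shows that copying into a fresh `ArrayList` keeps `addFilter` working.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterListDemo {
    // Hypothetical stand-in for FilterList; filters are modeled as plain strings.
    static final class SimpleFilterList {
        private final List<String> filters;

        private SimpleFilterList(List<String> filters) {
            this.filters = filters;
        }

        // Problematic variant: keeps the caller's list, which may be fixed-size.
        static SimpleFilterList wrapping(List<String> rowFilters) {
            return new SimpleFilterList(rowFilters);
        }

        // Safer variant: always copies into a growable ArrayList.
        static SimpleFilterList copying(List<String> rowFilters) {
            return new SimpleFilterList(new ArrayList<>(rowFilters));
        }

        void addFilter(String filter) {
            filters.add(filter);
        }

        int size() {
            return filters.size();
        }
    }

    // Returns true if addFilter throws UnsupportedOperationException.
    static boolean addFails(SimpleFilterList list) {
        try {
            list.addFilter("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("a", "b"); // fixed-size view over an array
        System.out.println("wrapping fails: " + addFails(SimpleFilterList.wrapping(fixed)));
        System.out.println("copying fails:  " + addFails(SimpleFilterList.copying(fixed)));
    }
}
```

The defensive copy also stops later changes to the caller's list from leaking into the filter list.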
[jira] [Updated] (HBASE-5498) Secure Bulk Load
[ https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-5498: --- Attachment: HBASE-5498_draft_94.patch Laxman, here's a working patch. It incorporates HBASE-6432 which took some time debugging. I still have to address the other comments, some cleanup and TODOs. Let me know if this works for you. > Secure Bulk Load > > > Key: HBASE-5498 > URL: https://issues.apache.org/jira/browse/HBASE-5498 > Project: HBase > Issue Type: Improvement > Components: mapred, security >Reporter: Francis Liu >Assignee: Francis Liu > Fix For: 0.96.0 > > Attachments: HBASE-5498_draft.patch, HBASE-5498_draft_94.patch > > > Design doc: > https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load > Short summary: > Security as it stands does not cover the bulkLoadHFiles() feature. Users > calling this method will bypass ACLs. Also loading is made more cumbersome in > a secure setting because of hdfs privileges. bulkLoadHFiles() moves the data > from user's directory to the hbase directory, which would require certain > write access privileges set. > Our solution is to create a coprocessor which makes use of AuthManager to > verify if a user has write access to the table. If so, launches a MR job as > the hbase user to do the importing (ie rewrite from text to hfiles). One > tricky part this job will have to do is impersonate the calling user when > reading the input files. We can do this by expecting the user to pass an hdfs > delegation token as part of the secureBulkLoad() coprocessor call and extend > an inputformat to make use of that token. The output is written to a > temporary directory accessible only by hbase and then bulkloadHFiles() is > called.
[jira] [Updated] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently
[ https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6406: - Fix Version/s: (was: 0.94.2) 0.94.1 0.96.0 > TestReplicationPeer.testResetZooKeeperSession and > TestZooKeeper.testClientSessionExpired fail frequently > > > Key: HBASE-6406 > URL: https://issues.apache.org/jira/browse/HBASE-6406 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.1 >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.96.0, 0.94.1 > > Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack > > > Looking back through the 0.94 test runs these two tests accounted for 11 of > 34 failed tests. > They should be fixed or (temporarily) disabled.
[jira] [Commented] (HBASE-6428) Pluggable Compaction policies
[ https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418924#comment-13418924 ] Lars Hofhansl commented on HBASE-6428: -- Another way of looking at this is a possible policy that considers all HFiles in terms of a baseline plus changes on top of that baseline. (For the record: I am not saying that I will do this any time soon, just recording this as an idea.) > Pluggable Compaction policies > - > > Key: HBASE-6428 > URL: https://issues.apache.org/jira/browse/HBASE-6428 > Project: HBase > Issue Type: New Feature >Reporter: Lars Hofhansl > > For some use cases it is useful to allow more control over how KVs get compacted. > For example one could envision storing old versions of a KV in separate HFiles, > which then rarely have to be touched/cached by queries querying for new data. > In addition these date-ranged HFiles can be easily used for backups while > maintaining historical data. > This would be a major change, allowing compactions to provide multiple > targets (not just a filter).
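To make "multiple targets (not just a filter)" concrete, here is a rough sketch of what such a plug-in point might look like. All names are hypothetical (`Cell`, `CompactionPolicy`, `compact`, and the target names `"current"` and `"history"`); nothing here reflects an existing HBase interface, and a real compaction would stream sorted files rather than group an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CompactionTargetsDemo {
    // Hypothetical cell: a key plus a version timestamp.
    static final class Cell {
        final String key;
        final long timestamp;

        Cell(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }
    }

    // Hypothetical plug-in point: names the output file group for each cell.
    interface CompactionPolicy {
        String targetFor(Cell cell, long newestTimestampForKey);
    }

    // Example policy: newest version of each key goes to "current", the rest to "history".
    static final CompactionPolicy NEWEST_VS_HISTORY =
        (cell, newest) -> cell.timestamp == newest ? "current" : "history";

    // Routes every cell to the target chosen by the policy.
    static Map<String, List<Cell>> compact(List<Cell> cells, CompactionPolicy policy) {
        Map<String, Long> newest = new HashMap<>();
        for (Cell cell : cells) {
            newest.merge(cell.key, cell.timestamp, Long::max);
        }
        Map<String, List<Cell>> outputs = new HashMap<>();
        for (Cell cell : cells) {
            String target = policy.targetFor(cell, newest.get(cell.key));
            outputs.computeIfAbsent(target, t -> new ArrayList<>()).add(cell);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(new Cell("a", 1), new Cell("a", 5), new Cell("b", 2));
        Map<String, List<Cell>> outputs = compact(cells, NEWEST_VS_HISTORY);
        for (Map.Entry<String, List<Cell>> entry : outputs.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue().size() + " cell(s)");
        }
    }
}
```

A filter-only policy is the special case where the policy returns a single target or drops the cell.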
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418923#comment-13418923 ] Elliott Clark commented on HBASE-6411: -- Sorry, I didn't mean to re-assign. I must have done that when submitting to Hadoop QA. Sorry, I didn't mean to step on any toes. I agree that a metrics factory or something like it could be very useful. However, like I said above, I was hoping to take a crack at using Guice to do most of the factory stuff. However, maybe until I get that up it would be useful. On #2, I don't think removing the interface completely is really the way to go, since both the replication metrics and the region server metrics are mostly dynamic metrics; i.e. they aren't pre-created like the master metrics. I think it still makes sense to have a source that's mostly focused on those map-based metrics. > Move Master Metrics to metrics 2 > > > Key: HBASE-6411 > URL: https://issues.apache.org/jira/browse/HBASE-6411 > Project: HBase > Issue Type: Sub-task >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch > > > Move Master Metrics to metrics 2
[jira] [Commented] (HBASE-6427) Pluggable policy for smallestReadPoint in HRegion
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418922#comment-13418922 ] Lars Hofhansl commented on HBASE-6427: -- Let me clarify what I mean by this: If I wanted to implement an MVCC-based optimistic transaction engine on top of HBase, I would naturally want to use HBase's built-in versioning (where possible). In that case it is not clear a priori how many versions to keep or for how long (i.e. specifying VERSION/TTL is too static). The outside engine would need to determine that. The simplest of all approaches would be to do that via the smallestReadpoint in each region, by making its determination pluggable. > Pluggable policy for smallestReadPoint in HRegion > - > > Key: HBASE-6427 > URL: https://issues.apache.org/jira/browse/HBASE-6427 > Project: HBase > Issue Type: New Feature >Reporter: Lars Hofhansl >Priority: Minor > > When implementing higher level stores on top of HBase it is necessary to > allow dynamic control over how long KVs must be kept around. > Semi-static config options for ColumnFamilies (# of versions or TTL) are not > sufficient. > The simplest way to achieve this is to have a pluggable class to determine > the smallestReadpoint for Region. That way outside code can control what KVs > to retain.
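A pluggable read-point policy of the kind proposed here could be as small as a single-method interface. The sketch below is purely illustrative: `SmallestReadPointPolicy`, `OLDEST_SCANNER`, and `pinnedAt` are hypothetical names, and the "oldest active scanner, else the current read point" default is an assumption about the baseline behavior, not a quote of HRegion.

```java
import java.util.Arrays;
import java.util.List;

class ReadPointPolicyDemo {
    // Hypothetical plug-in point: returns the oldest read point whose KVs must be kept.
    interface SmallestReadPointPolicy {
        long smallestReadPoint(List<Long> activeScannerReadPoints, long currentReadPoint);
    }

    // Assumed baseline: the oldest active scanner, or the current read point if none.
    static final SmallestReadPointPolicy OLDEST_SCANNER = (active, current) -> {
        long min = current;
        for (long readPoint : active) {
            min = Math.min(min, readPoint);
        }
        return min;
    };

    // Example external policy: an outside transaction engine pins a lower bound,
    // so versions visible to its oldest open transaction are retained.
    static SmallestReadPointPolicy pinnedAt(long oldestOpenTransaction) {
        return (active, current) ->
            Math.min(oldestOpenTransaction, OLDEST_SCANNER.smallestReadPoint(active, current));
    }

    public static void main(String[] args) {
        List<Long> scanners = Arrays.asList(7L, 9L);
        System.out.println("baseline: " + OLDEST_SCANNER.smallestReadPoint(scanners, 12L));
        System.out.println("pinned:   " + pinnedAt(3L).smallestReadPoint(scanners, 12L));
    }
}
```

Composing the external bound with the baseline, rather than replacing it, keeps active scanners safe even if the outside engine reports a newer point.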
[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
[ https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-6432: --- Attachment: HBASE-6432_94.patch A patch for 0.94 to get feedback on the approach. Things changed significantly enough in trunk to need a separate patch. I'm hoping to get this backported to 0.94 since it is needed for security. > HRegionServer doesn't properly set clusterId in conf > > > Key: HBASE-6432 > URL: https://issues.apache.org/jira/browse/HBASE-6432 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.0 >Reporter: Francis Liu >Assignee: Francis Liu > Fix For: 0.96.0 > > Attachments: HBASE-6432_94.patch > > > ClusterId is normally set into the passed conf during instantiation of an > HTable class. In the case of a HRegionServer this is bypassed and set to > "default" since getMaster() bypasses the class which sets clusterID clusterId > since it uses HBaseRPC to create the proxy to create the proxy directly. > This becomes a problem with clients (ie within a coprocessor) using > delegation tokens for authentication. Since the token's service will be the > correct clusterId and while the TokenSelector is looking for one with service > "default".
[jira] [Created] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf
Francis Liu created HBASE-6432: -- Summary: HRegionServer doesn't properly set clusterId in conf Key: HBASE-6432 URL: https://issues.apache.org/jira/browse/HBASE-6432 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Francis Liu Assignee: Francis Liu Fix For: 0.96.0 ClusterId is normally set into the passed conf during instantiation of an HTable class. In the case of a HRegionServer this is bypassed and set to "default" since getMaster() bypasses the class which sets clusterID clusterId since it uses HBaseRPC to create the proxy to create the proxy directly. This becomes a problem with clients (ie within a coprocessor) using delegation tokens for authentication. Since the token's service will be the correct clusterId and while the TokenSelector is looking for one with service "default".
[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418919#comment-13418919 ] nkeywal commented on HBASE-5843: bq. I'm confused as to what the 180s gap refers to. I see 980 (test 2) - 800 (test1) = 180, but that is against 0.92, which doesn't have HBASE-5970, right? Could you clarify? Yes, it's because with a clean stop, the RS unregisters itself in ZK, so the recovery starts immediately. With a kill -9, the RS remains registered in ZK. So if you don't have HBASE-5844 or HBASE-5926, you wait for the ZK timeout. bq. Awesome.. We think this is also due to HBASE-5970 and HBASE-6109? Yes. bq. Has a JIRA been filed? Not yet. I'm writing specific unit tests for this; I found issues that I have not yet fully analyzed, and I need to create the JIRAs. Also, maybe my test was not good for this part: as I was doing the test without a datanode, it could be that the recovery was not working for this reason (I wonder if the sync works with the local file system, for example). bq. Test to be changed to get a real difference when we need to replay the wal. bq. Could you clarify what you mean here? It does not last long enough, so I won't be able to see much difference even if there is one. So I need to redo the work with a real datanode, check that it recovers, then check that I measure something meaningful. I will also redo the first tests with a DN to see if there is still a gap. > Improve HBase MTTR - Mean Time To Recover > - > > Key: HBASE-5843 > URL: https://issues.apache.org/jira/browse/HBASE-5843 > Project: HBase > Issue Type: Umbrella >Affects Versions: 0.96.0 >Reporter: nkeywal >Assignee: nkeywal > > A part of the approach is described here: > https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit > The ideal target is: > - failures impact client applications only through an added delay in executing a > query, whatever the failure. > - this delay is always less than 1 second. > We're not going to achieve that immediately... > Priority will be given to the most frequent issues. > Short term: > - software crash > - standard administrative tasks such as stop/start of a cluster.
[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418917#comment-13418917 ]

Harsh J commented on HBASE-6363:

Sorry, I didn't notice 1.x didn't have it! (I checked only against my 2.x installation, and CDH3 here seems to have had it backported at some point too).

Instead of working around, I think we can rather backport it to a v1 future release, via: HADOOP-8567.

> HBaseConfiguration can carry a main method that dumps XML output for debug purposes
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-6363
>                 URL: https://issues.apache.org/jira/browse/HBASE-6363
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.94.0
>            Reporter: Harsh J
>            Priority: Trivial
>              Labels: conf, newbie, noob
>         Attachments: HBASE-6363.2.patch, HBASE-6363.patch
>
>
> Just like the Configuration class carries a main() method in it, that simply loads itself and writes XML out to System.out, HBaseConfiguration can use the same kinda method.
> That way we can do "hbase org.apache.hadoop.….HBaseConfiguration" to get an Xml dump of things HBaseConfiguration has properly loaded. Nifty in checking app classpaths sometimes.
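The idea in the issue description above can be sketched without Hadoop on the classpath. The stand-in below is not Hadoop's `Configuration`: `ConfDump`, `toXml`, `escape`, and the sample property value are hypothetical and exist only for illustration. It mirrors the shape of the proposal, that is, load key/value pairs and write them to standard output in the `<configuration>` / `<property>` / `<name>` / `<value>` layout used by Hadoop configuration files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed main() method:
// it holds loaded properties and dumps them as Hadoop-style XML.
final class ConfDump {

    // Escape the characters that must not appear verbatim in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render the properties in insertion order, one <property> per entry.
    static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property><name>").append(escape(e.getKey()))
              .append("</name><value>").append(escape(e.getValue()))
              .append("</value></property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        // Sample entry only; a real dump would come from the loaded config files.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hbase.master.wait.on.regionservers.mintostart", "1");
        System.out.print(toXml(props));
    }
}
```

In HBase itself the equivalent would presumably build the configuration with `HBaseConfiguration.create()` and hand `System.out` to Hadoop's `Configuration.writeXml(OutputStream)`. The attached patches are not shown in this message, so that is an assumption.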
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418904#comment-13418904 ]

Zhihong Ted Yu commented on HBASE-3725:
---------------------------------------

Looking at existing code:
{code}
  private List getLastIncrement(final Get get) throws IOException {
    InternalScan iscan = new InternalScan(get);
{code}
iscan was assigned at the beginning. Looks like the assignment in else block is redundant.
TestHRegion#testIncrementWithFlushAndDelete passed without that assignment.

> HBase increments from old value after delete and write to disk
> --------------------------------------------------------------
>
>                 Key: HBASE-3725
>                 URL: https://issues.apache.org/jira/browse/HBASE-3725
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.90.1
>            Reporter: Nathaniel Cook
>            Assignee: Jonathan Gray
>         Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>    * This code reproduces a bug with increment column values in hbase
>    * Usage: First run part one by passing '1' as the first arg
>    *        Then restart the hbase cluster so it writes everything to disk
>    *        Run part two by passing '2' as the first arg
>    *
>    * This will result in the old deleted data being found and used for the increment calls
>    *
>    * @param args
>    * @throws IOException
>    */
>   public static void main(String[] args) throws IOException
>   {
>     if("1".equals(args[0]))
>       partOne();
>     if("2".equals(args[0]))
>       partTwo();
>     if ("both".equals(args[0]))
>     {
>       partOne();
>       partTwo();
>     }
>   }
>   /**
>    * Creates a table and increments a column value 10 times by 10 each time.
>    * Results in a value of 100 for the column
>    *
>    * @throws IOException
>    */
>   static void partOne()throws IOException
>   {
>     Configuration conf = HBaseConfiguration.create();
>     HBaseAdmin admin = new HBaseAdmin(conf);
>     HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>     tableDesc.addFamily(new HColumnDescriptor(infoCF));
>     if(admin.tableExists(tableName))
>     {
>       admin.disableTable(tableName);
>       admin.deleteTable(tableName);
>     }
>     admin.createTable(tableDesc);
>     HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>     HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>     //Increment uninitialized column
>     for (int j = 0; j < 10; j++)
>     {
>       table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
>       Increment inc = new Increment(rowKey);
>       inc.addColumn(infoCF, newInc, (long)10);
>       table.increment(inc);
>     }
>     Get get = new Get(rowKey);
>     Result r = table.get(get);
[jira] [Resolved] (HBASE-6345) Utilize fault injection in testing using AspectJ
[ https://issues.apache.org/jira/browse/HBASE-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu resolved HBASE-6345.
-----------------------------------
    Resolution: Won't Fix

There was not enough incentive to pursue fault injection using AspectJ.

> Utilize fault injection in testing using AspectJ
> ------------------------------------------------
>
>                 Key: HBASE-6345
>                 URL: https://issues.apache.org/jira/browse/HBASE-6345
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zhihong Ted Yu
>
> HDFS uses fault injection to test pipeline failure in addition to mock, spy.
> HBase uses mock, spy. But there are cases where mock, spy aren't convenient.
> Some example from DFSClientAspects.aj :
> {code}
>   pointcut pipelineInitNonAppend(DataStreamer datastreamer):
>     callCreateBlockOutputStream(datastreamer)
>     && cflow(execution(* nextBlockOutputStream(..)))
>     && within(DataStreamer);
>   after(DataStreamer datastreamer) returning : pipelineInitNonAppend(datastreamer) {
>     LOG.info("FI: after pipelineInitNonAppend: hasError="
>         + datastreamer.hasError + " errorIndex=" + datastreamer.errorIndex);
>     if (datastreamer.hasError) {
>       DataTransferTest dtTest = DataTransferTestUtil.getDataTransferTest();
>       if (dtTest != null)
>         dtTest.fiPipelineInitErrorNonAppend.run(datastreamer.errorIndex);
>     }
>   }
> {code}
[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418900#comment-13418900 ]

Shengsheng Huang commented on HBASE-6363:
-----------------------------------------

Thanks very much for the clarification, Harsh. It seems /conf was only added to Hadoop in release 0.21 (HADOOP-6408). As we're using Hadoop v1, it didn't work on our local cluster. We will consider adding the HADOOP-6408 patch to our local Hadoop branch. After all, the servlet config dump would contain all the configuration changes made in code.

Anyway, do you think it is worth adding a separate servlet to dump the configuration as XML only? Or reorganizing the dump output into a more consistent format to make automatic parsing easier?

> HBaseConfiguration can carry a main method that dumps XML output for debug purposes
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-6363
>                 URL: https://issues.apache.org/jira/browse/HBASE-6363
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.94.0
>            Reporter: Harsh J
>            Priority: Trivial
>              Labels: conf, newbie, noob
>         Attachments: HBASE-6363.2.patch, HBASE-6363.patch
>
>
> Just like the Configuration class carries a main() method in it, that simply loads itself and writes XML out to System.out, HBaseConfiguration can use the same kinda method.
> That way we can do "hbase org.apache.hadoop.….HBaseConfiguration" to get an Xml dump of things HBaseConfiguration has properly loaded. Nifty in checking app classpaths sometimes.
[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6389:
----------------------------------
    Status: Open  (was: Patch Available)

> Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6389
>                 URL: https://issues.apache.org/jira/browse/HBASE-6389
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.0, 0.96.0
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
>
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
>
>
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
>
> {code}
> So with the current conditions, the wait will end as soon as the timeout is reached, even if a lesser number of RS have checked in with the Master, and the master will proceed with the region assignment among these RSes alone.
> As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on.
> To enforce the required quorum as specified by "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, these conditions need to be modified as follows
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time AND
>    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>       getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>       getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}
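To make the difference between the two loop conditions in the description above concrete, here is a minimal sketch that evaluates both as plain boolean predicates. `WaitCondition`, `keepWaitingCurrent`, and `keepWaitingProposed` are hypothetical names, the numbers in `main` are illustrative values rather than measured ones, and the master-stopped check is passed in as a flag. It is not the `ServerManager` code, only the two `while` expressions lifted out of it.

```java
// Hypothetical helper; mirrors the two while-conditions quoted in the issue.
final class WaitCondition {

    // Condition quoted from trunk: waiting stops as soon as the timeout is reached.
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    // Proposed condition: the timeout alone no longer ends the wait.
    static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Illustrative scenario: the timeout has passed, but only 2 of the
        // required 5 region servers have checked in.
        long slept = 5000, timeout = 4500, interval = 1500;
        long lastCountChange = 1000, now = 6000;
        int count = 2, minToStart = 5, maxToStart = Integer.MAX_VALUE;

        System.out.println("current keeps waiting:  " + keepWaitingCurrent(
            false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
        System.out.println("proposed keeps waiting: " + keepWaitingProposed(
            false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
    }
}
```

With these inputs the current condition evaluates to `false` (the master stops waiting with 2 servers), while the proposed one evaluates to `true` because `count < minToStart` still holds. Once `count` reaches `minToStart`, the proposed condition also ends the wait, provided the timeout has passed and no new server has reported within the interval.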
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418898#comment-13418898 ]

Alex Baranau commented on HBASE-6411:
-------------------------------------

Looks like you reassigned the task, so I should probably not touch the patch to avoid intersection, right?

Was going to add actual metrics tests (which test metrics values changes in addition to testing factories/classes loading) and perhaps apply the 2nd point above, if it makes sense to you.

> Move Master Metrics to metrics 2
> --------------------------------
>
>                 Key: HBASE-6411
>                 URL: https://issues.apache.org/jira/browse/HBASE-6411
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418895#comment-13418895 ]

Hadoop QA commented on HBASE-6389:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12537286/testReplication.jstack
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 12 new or modified tests.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2416//console

This message is automatically generated.

> Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6389
>                 URL: https://issues.apache.org/jira/browse/HBASE-6389
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.0, 0.96.0
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been reached.
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866 ]

Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 2:53 AM:

I ran test suite with latest patch on trunk and got:
{code}
Failed tests:
  testRunThriftServer[12](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[14](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[15](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[16](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>
  testRunThriftServer[17](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): expected:<1> but was:<0>

Tests in error:
  testRegionCaching(org.apache.hadoop.hbase.client.TestHCM): org.apache.hadoop.hbase.UnknownRegionException: bd992463917ba68fe5389c5bf9e94a3a
  testCloseRegionThatFetchesTheHRIFromMeta(org.apache.hadoop.hbase.client.TestAdmin): -1
  testTableExists(org.apache.hadoop.hbase.catalog.TestMetaReaderEditor): org.apache.hadoop.hbase.TableNotEnabledException: testTableExists
  testRunThriftServer[11](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): test timed out after 6 milliseconds
  testRunThriftServer[13](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine): test timed out after 6 milliseconds
{code}
There was one hanging test:
{code}
  at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?

      was (Author: zhi...@ebaysf.com):
    I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec <<< FAILURE!
{code}
There was one hanging test:
{code}
  at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?

> Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6389
>                 URL: https://issues.apache.org/jira/browse/HBASE-6389
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.0, 0.96.0
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>            Priority: Critical
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
>
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> .
[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418892#comment-13418892 ]

ShiXing commented on HBASE-3725:

@Ted
bq. I generate a region with 3 store files. The increment slows from 1810 tps to 1020 tps, it slows 43.6%, .

That tps figure is for incrementing the same rowkey. The performance depends on how frequently the memstore is flushed to a store file. If I run the same test case, the latest patch performs the same as the original, because the incremented rowkey is always in the memstore and we do not need to read the store files.

The only difference is for a rowkey whose value cannot be found in the memstore: it needs one more read, from the memstore, compared to 0.92 trunk, which reads only from the store file. Note that the original's high performance comes solely from reading only the memstore.

> HBase increments from old value after delete and write to disk
> --------------------------------------------------------------
>
>                 Key: HBASE-3725
>                 URL: https://issues.apache.org/jira/browse/HBASE-3725
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.90.1
>            Reporter: Nathaniel Cook
>            Assignee: Jonathan Gray
>         Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before the row was deleted instead of being initialized to "some_value".
[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418891#comment-13418891 ]

Alex Baranau commented on HBASE-6411:
-------------------------------------

Glanced over your patch. I like this way better (over the initial patch at 4050): exposing the real interface of MetricsSource (in this case master metrics), i.e. with methods defined, not empty + hashmap.

1. What do you think about having MasterMetricsFactory available through the compat module (created by CompatibilitySingletonFactory?) which creates the MetricsSource, like this:

  interface MasterMetricsFactory {
    MasterMetricsSource create(final String name, final String sessionId);
  }

This way we could pass parameters and control creation of the metrics source.

2. Independent of the above: how about removing the BaseMetricsSource interface from compat, as we don't really need it with explicit definition of metrics in sources? This way the current BaseMetricsSourceImpl could be renamed to MetricsRegistry and used via composition (as a field) in metrics sources instead of realization. Thus, creating & initializing the sources, which might be different for each, could stay in the metrics source implementation itself, including deciding on using JvmMetricsSource (I assume not every source should create it), etc. This way they would look like normal metrics sources from the Hadoop codebase, except that they will use HBase's MetricsRegistry, which allows metrics removals.

Thoughts?

> Move Master Metrics to metrics 2
> --------------------------------
>
>                 Key: HBASE-6411
>                 URL: https://issues.apache.org/jira/browse/HBASE-6411
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2
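The two suggestions in the comment above can be sketched together. Everything below is hypothetical apart from the `MasterMetricsFactory` signature, which is copied from the comment: the methods on `MasterMetricsSource`, the `MetricsRegistry` API, and `MasterMetricsSourceImpl` are made-up stand-ins for illustration, not HBase or Hadoop classes. The sketch shows a factory that takes creation parameters, and a source that holds a registry as a field instead of extending a base implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Entry point for the sketch; all types below are hypothetical stand-ins.
final class MetricsSketch {
    public static void main(String[] args) {
        MasterMetricsFactory factory = MasterMetricsSourceImpl::new;
        MasterMetricsSource source = factory.create("master", "session-1");
        source.incRequests(3);
        System.out.println(source.getRequests());
    }
}

// Signature as proposed in the comment.
interface MasterMetricsFactory {
    MasterMetricsSource create(final String name, final String sessionId);
}

// Explicit metrics interface: real methods rather than an empty interface plus a map.
interface MasterMetricsSource {
    void incRequests(long delta); // hypothetical metric, for illustration only
    long getRequests();
}

// Hypothetical registry used via composition; supports metric removal.
final class MetricsRegistry {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    void increment(String name, long delta) {
        counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
    }

    long get(String name) {
        AtomicLong c = counters.get(name);
        return c == null ? 0L : c.get();
    }

    void remove(String name) {
        counters.remove(name);
    }
}

// The source owns a registry as a field and decides its own initialization.
final class MasterMetricsSourceImpl implements MasterMetricsSource {
    private final MetricsRegistry registry = new MetricsRegistry();
    private final String key;

    MasterMetricsSourceImpl(String name, String sessionId) {
        this.key = name + "." + sessionId + ".requests";
    }

    @Override public void incRequests(long delta) { registry.increment(key, delta); }
    @Override public long getRequests() { return registry.get(key); }
}
```

Because each source constructs and owns its registry, per-source decisions (such as whether to also register JVM metrics) would stay inside the implementation, which is the point of the second suggestion.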
[jira] [Commented] (HBASE-6417) hbck merges .META. regions if there's an old leftover
[ https://issues.apache.org/jira/browse/HBASE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418887#comment-13418887 ]

Jonathan Hsieh commented on HBASE-6417:
---------------------------------------

Feels like we could add an option to not do repairs on META unless forced to.

> hbck merges .META. regions if there's an old leftover
> -----------------------------------------------------
>
>                 Key: HBASE-6417
>                 URL: https://issues.apache.org/jira/browse/HBASE-6417
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: hbck.log
>
>
> Trying to see what caused HBASE-6310, one of the things I figured is that the bad .META. row is actually one from the time that we were permitting meta splitting and that folder had just been staying there for a while.
> So I tried to recreate the issue with -repair and it merged my good .META. region with the one that's 3 years old that also has the same start key. I ended up with a brand new .META. region!
> I'll be attaching the full log in a separate file.
[jira] [Commented] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418882#comment-13418882 ]

Harsh J commented on HBASE-6363:
--------------------------------

Thanks again Shengsheng.

The /dump servlet is more verbose than the simple XML given by the /conf servlet. If it's just config you need, /conf is where you need to go, not /dump. But for the sake of debuggability, suggesting /dump in the javadoc does seem fine to do for HBase.

I think the patch looks good. If needed, we can switch /dump with /conf (since we're discussing just configs, not env. info as well), but otherwise I think it does what the goal of this report was.

Thanks again!

> HBaseConfiguration can carry a main method that dumps XML output for debug
> purposes
> ---------------------------------------------------------------------------
>
> Key: HBASE-6363
> URL: https://issues.apache.org/jira/browse/HBASE-6363
> Project: HBase
> Issue Type: Improvement
> Components: util
> Affects Versions: 0.94.0
> Reporter: Harsh J
> Priority: Trivial
> Labels: conf, newbie, noob
> Attachments: HBASE-6363.2.patch, HBASE-6363.patch
>
>
> Just like the Configuration class carries a main() method in it, that simply
> loads itself and writes XML out to System.out, HBaseConfiguration can use the
> same kinda method.
> That way we can do "hbase org.apache.hadoop.….HBaseConfiguration" to get an
> Xml dump of things HBaseConfiguration has properly loaded. Nifty in checking
> app classpaths sometimes.
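For readers who have not seen the pattern HBASE-6363 refers to, here is a minimal, JDK-only sketch of a class whose main() writes its loaded properties as Hadoop-style configuration XML to System.out. It is a stand-in, not the attached patch: `ConfDumpSketch`, its methods and the sample property value are all hypothetical. In a real HBase build the equivalent would presumably load the configuration with `HBaseConfiguration.create()` and call Hadoop's `Configuration.writeXml(OutputStream)`, which is an assumption about how the patch is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a configuration class that can dump itself as XML.
class ConfDumpSketch {

    /** Escapes the five XML special characters so values cannot break the markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Writes the properties in the name/value layout used by Hadoop config files. */
    static void writeXml(Map<String, String> props, PrintStream out) {
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<configuration>");
        // Sort by key so that two dumps can be compared line by line.
        for (Map.Entry<String, String> e : new TreeMap<>(props).entrySet()) {
            out.println("  <property><name>" + escape(e.getKey())
                    + "</name><value>" + escape(e.getValue()) + "</value></property>");
        }
        out.println("</configuration>");
    }

    /** Convenience wrapper that returns the dump as a string. */
    static String dumpToString(Map<String, String> props) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeXml(props, new PrintStream(buf, true));
        return buf.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put("hbase.rootdir", "hdfs://example:8020/hbase"); // illustrative value only
        writeXml(props, System.out);
    }
}
```

Dumping from the command line shows what the client classpath actually resolved, which is the "checking app classpaths" use mentioned in the issue; the /conf and /dump servlets instead show what a running daemon loaded.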
[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418880#comment-13418880 ]

Hudson commented on HBASE-6325:
-------------------------------

Integrated in HBase-0.92 #480 (See [https://builds.apache.org/job/HBase-0.92/480/])
HBASE-6319 ReplicationSource can call terminate on itself and deadlock
HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363571)

Result = FAILURE
jdcryans :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java

> [replication] Race in ReplicationSourceManager.init can initiate a failover
> even if the node is alive
> ---------------------------------------------------------------------------
>
> Key: HBASE-6325
> URL: https://issues.apache.org/jira/browse/HBASE-6325
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.6, 0.92.1, 0.94.0
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch
>
>
> Yet another bug found during the leap second madness, it's possible to miss
> the registration of new region servers so that in
> ReplicationSourceManager.init we start the failover of a live and replicating
> region server. I don't think there's data loss but the RS that's being failed
> over will die on:
> {noformat}
> 2012-07-01 06:25:15,604 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server sv4r23s48,10304,1341112194623: Writing replication status
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
> {noformat}
> It seems to me that just refreshing {{otherRegionServers}} after getting the
> list of {{currentReplicators}} would be enough to fix this.
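The ordering problem described in HBASE-6325 can be illustrated with plain sets. This is a simplified model, not the HBase code: only the names `otherRegionServers` and `currentReplicators` come from the issue text, and the class, method and server names below are hypothetical. The assumption is that a replicator missing from the live-server snapshot is treated as dead and failed over.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the check done at init time.
class ReplicatorFailoverSketch {

    /**
     * Returns the replicators that do not appear in the live-server snapshot,
     * i.e. the ones this model would start a failover for.
     */
    static Set<String> failoverCandidates(Collection<String> currentReplicators,
                                          Collection<String> otherRegionServers) {
        Set<String> candidates = new TreeSet<>(currentReplicators);
        candidates.removeAll(otherRegionServers);
        return candidates;
    }

    public static void main(String[] args) {
        // Snapshot taken before rs2 registered: rs2 is alive but not listed.
        Set<String> staleSnapshot = new TreeSet<>(Arrays.asList("rs1"));
        // Replicators listed afterwards already include rs2.
        Set<String> currentReplicators = new TreeSet<>(Arrays.asList("rs1", "rs2"));

        // With the stale snapshot the live server rs2 is wrongly selected.
        System.out.println(failoverCandidates(currentReplicators, staleSnapshot));

        // Refreshing the live-server list after reading the replicators
        // removes the false positive.
        Set<String> refreshedSnapshot = new TreeSet<>(Arrays.asList("rs1", "rs2"));
        System.out.println(failoverCandidates(currentReplicators, refreshedSnapshot));
    }
}
```

The point of the suggested fix is the read order: the live-server list has to be at least as recent as the replicator list, otherwise a server that registered in between looks dead.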
[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock
[ https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418879#comment-13418879 ]

Hudson commented on HBASE-6319:
-------------------------------

Integrated in HBase-0.92 #480 (See [https://builds.apache.org/job/HBase-0.92/480/])
HBASE-6319 ReplicationSource can call terminate on itself and deadlock
HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363571)

Result = FAILURE
jdcryans :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java

> ReplicationSource can call terminate on itself and deadlock
> -----------------------------------------------------------
>
> Key: HBASE-6319
> URL: https://issues.apache.org/jira/browse/HBASE-6319
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.6, 0.92.1, 0.94.0
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.94.1
>
> Attachments: HBASE-6319-0.92.patch
>
>
> In a few places in the ReplicationSource code calls terminate on itself which
> is a problem since in terminate() we wait on that thread to die.
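The hazard in HBASE-6319 is general to Java threads: if terminate() joins the worker thread and is called from that same thread, the join can never return. One common way to avoid this, shown below as a sketch, is to skip the join when the caller is the thread itself. Whether the committed patch does exactly this is not stated in this message, and the class and helper names here are hypothetical.

```java
// Hypothetical worker modelled loosely on a thread with a terminate() method.
class SelfTerminatingWorker extends Thread {
    private volatile boolean running = true;
    private final boolean terminateFromInside;

    SelfTerminatingWorker(boolean terminateFromInside) {
        this.terminateFromInside = terminateFromInside;
    }

    @Override
    public void run() {
        while (running) {
            if (terminateFromInside) {
                // The problematic pattern: the thread asks itself to terminate.
                terminate("fatal error seen inside run()");
            } else {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // Woken by terminate(); the loop re-checks the flag.
                }
            }
        }
    }

    /** Stops the worker; the reason is kept only to mirror a typical signature. */
    void terminate(String reason) {
        running = false;
        interrupt();
        // Joining ourselves would block forever, so only wait when the
        // caller is a different thread.
        if (Thread.currentThread() != this) {
            try {
                join(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Starts a worker, stops it from inside or outside, and reports whether it died. */
    static boolean stopsCleanly(boolean fromInside) {
        SelfTerminatingWorker worker = new SelfTerminatingWorker(fromInside);
        worker.start();
        if (!fromInside) {
            worker.terminate("stopped by caller");
        }
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }
}
```

Without the `Thread.currentThread() != this` check, an unbounded join() in the self-terminate case would wait on a thread that cannot exit until the join returns.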
[jira] [Updated] (HBASE-6363) HBaseConfiguration can carry a main method that dumps XML output for debug purposes
[ https://issues.apache.org/jira/browse/HBASE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shengsheng Huang updated HBASE-6363:
------------------------------------

Attachment: HBASE-6363.2.patch

Updated the patch according to @Harsh's comments. Actually we did the patch for automation purposes. Http master/dump contains much more information than we needed.

> HBaseConfiguration can carry a main method that dumps XML output for debug
> purposes
> ---------------------------------------------------------------------------
>
> Key: HBASE-6363
> URL: https://issues.apache.org/jira/browse/HBASE-6363
> Project: HBase
> Issue Type: Improvement
> Components: util
> Affects Versions: 0.94.0
> Reporter: Harsh J
> Priority: Trivial
> Labels: conf, newbie, noob
> Attachments: HBASE-6363.2.patch, HBASE-6363.patch
>
>
> Just like the Configuration class carries a main() method in it, that simply
> loads itself and writes XML out to System.out, HBaseConfiguration can use the
> same kinda method.
> That way we can do "hbase org.apache.hadoop.….HBaseConfiguration" to get an
> Xml dump of things HBaseConfiguration has properly loaded. Nifty in checking
> app classpaths sometimes.
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866 ]

Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 1:41 AM:
----------------------------------------------------------------

I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec <<< FAILURE!
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do *R*~i~, C and *F*~i~ represent in the formula above ?

was (Author: zhi...@ebaysf.com):
I ran test suite with latest patch on trunk and got:
{code}
Running org.apache.hadoop.hbase.client.TestHCM
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< FAILURE!
--
Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine
Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec <<< FAILURE!
{code}
There was one hanging test:
{code}
at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183)
{code}
BTW what do R~i~, C and F~i~ represent in the formula above ?

> Modify the conditions to ensure that Master waits for sufficient number of
> Region Servers before starting region assignments
> --------------------------------------------------------------------------
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.94.0, 0.96.0
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch,
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt,
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from
> default of 1) can help prevent assignment of all regions to one (or a small
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not
> been reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *   there have been no new region server in for
>    *      'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> 
> 
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is
> reached even if a lesser number of RSes have checked in with the Master, and
> the master will proceed with the region assignment among these RSes alone.
> As mentioned in
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
> and I concur,
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866 ] Zhihong Ted Yu edited comment on HBASE-6389 at 7/20/12 1:37 AM: I ran test suite with latest patch on trunk and got: {code} Running org.apache.hadoop.hbase.client.TestHCM Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< FAILURE! -- Running org.apache.hadoop.hbase.client.TestAdmin Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec <<< FAILURE! -- Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< FAILURE! -- Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec <<< FAILURE! {code} There was one hanging test: {code} at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183) {code} BTW what do R~i~, C and F~i~ represent in the formula above ? was (Author: zhi...@ebaysf.com): I ran test suite with latest patch on trunk and got: {code} Running org.apache.hadoop.hbase.client.TestHCM Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< FAILURE! -- Running org.apache.hadoop.hbase.client.TestAdmin Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec <<< FAILURE! -- Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< FAILURE! -- Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec <<< FAILURE! {code} There was one hanging test: {code} at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183) {code} BTW what do R sub i, C and F sub i represent in the formula above ? 
> Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, > testReplication.jstack > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. 
> 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concu
[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6389:
----------------------------------

Attachment: testReplication.jstack

jstack for the hanging TestReplication

> Modify the conditions to ensure that Master waits for sufficient number of
> Region Servers before starting region assignments
> --------------------------------------------------------------------------
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.94.0, 0.96.0
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch,
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt,
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from
> default of 1) can help prevent assignment of all regions to one (or a small
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has
> lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not
> been reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *   there have been no new region server in for
>    *      'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
> 
> 
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
> 
> {code}
> So with the current conditions, the wait will end as soon as timeout is
> reached even if a lesser number of RSes have checked in with the Master, and
> the master will proceed with the region assignment among these RSes alone.
> As mentioned in
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
> and I concur, this could have disastrous effect in large cluster especially
> now that MSLAB is turned on.
> To enforce the required quorum as specified by
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout,
> these conditions need to be modified as following
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *   there have been no new region server in for
>    *      'hbase.master.wait.on.regionservers.interval' time AND
>    *   the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}
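To make the difference between the two loop conditions quoted in HBASE-6389 concrete, they can be written as pure predicates and evaluated on the same inputs. This is a sketch for comparison only: the boolean expressions are copied from the two quoted snippets, while the class name, method names, parameter list and the sample values are hypothetical.

```java
// Hypothetical helper that isolates the two "keep waiting" conditions.
class WaitConditionSketch {

    /** Condition quoted from trunk rev 1360470: the timeout alone ends the wait. */
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
                                      int count, int minToStart, int maxToStart,
                                      long lastCountChange, long interval, long now) {
        return !stopped
                && slept < timeout
                && count < maxToStart
                && (lastCountChange + interval > now || count < minToStart);
    }

    /** Condition proposed in the issue: minToStart must be met whatever the timeout. */
    static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
                                       int count, int minToStart, int maxToStart,
                                       long lastCountChange, long interval, long now) {
        return !stopped
                && count < maxToStart
                && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Sample case: timeout already exceeded, only 1 of the required 3 servers reported.
        boolean current = keepWaitingCurrent(false, 5000, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 5000);
        boolean proposed = keepWaitingProposed(false, 5000, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 5000);
        System.out.println("current keeps waiting:  " + current);
        System.out.println("proposed keeps waiting: " + proposed);
    }
}
```

In the sample case the current condition stops waiting because `slept < timeout` is false, so assignment would start with a single region server, while the proposed condition keeps waiting because `count < minToStart` still holds. Once the count reaches `minToStart`, the timeout has passed and no new server has checked in for `interval`, both conditions end the wait.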
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418866#comment-13418866 ] Zhihong Ted Yu commented on HBASE-6389: --- I ran test suite with latest patch on trunk and got: {code} Running org.apache.hadoop.hbase.client.TestHCM Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 37.265 sec <<< FAILURE! -- Running org.apache.hadoop.hbase.client.TestAdmin Tests run: 40, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 322.872 sec <<< FAILURE! -- Running org.apache.hadoop.hbase.catalog.TestMetaReaderEditor Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 134.193 sec <<< FAILURE! -- Running org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine Tests run: 20, Failures: 5, Errors: 2, Skipped: 0, Time elapsed: 669.588 sec <<< FAILURE! {code} There was one hanging test: {code} at org.apache.hadoop.hbase.replication.TestReplication.setUp(TestReplication.java:183) {code} BTW what do R sub i, C and F sub i represent in the formula above ? > Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). 
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. > 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfi
[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418863#comment-13418863 ]

Gregory Chanan commented on HBASE-5843:
---------------------------------------

Looks great so far, nkeywal. Some questions:

{quote}
2) Kill -9 of a RS; wait for all regions to become online again:
0.92: 980s
0.96: ~13s
=> The 180s gap comes from HBASE-5844. For master, HBASE-5926 is not tested but should bring similar results.
{quote}

I'm confused as to what the 180s gap refers to. I see 980 (test 2) - 800 (test1) = 180, but that is against 0.92, which doesn't have HBASE-5970, right? Could you clarify?

{quote}
3) Start of the cluster after a clean stop; wait for all regions to become online.
0.92: ~1020s
0.94: ~1023s (tested once only)
0.96: ~31s
=> The benefit is visible at startup
=> This does not come from something implemented for 0.94
{quote}

Awesome.. We think this is also due to HBASE-5970 and HBASE-6109? (since I assume HBASE-5844 and HBASE-5926 do not apply in this case).

{quote}
7) With 2 RS, Insert 20M simple puts; then kill -9 the second one. See how long it takes to have all the regions available.
0.92) 180s detection time+ then hangs twice out of 2 tests.
0.96) 14s (hangs once out of 3)
=> There's a bug
{quote}

Has a JIRA been filed?

{quote}
Test to be changed to get a real difference when we need to replay the wal.
{quote}

Could you clarify what you mean here?

> Improve HBase MTTR - Mean Time To Recover
> -----------------------------------------
>
> Key: HBASE-5843
> URL: https://issues.apache.org/jira/browse/HBASE-5843
> Project: HBase
> Issue Type: Umbrella
> Affects Versions: 0.96.0
> Reporter: nkeywal
> Assignee: nkeywal
>
> A part of the approach is described here:
> https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit
> The ideal target is:
> - failure impact client applications only by an added delay to execute a
> query, whatever the failure.
> - this delay is always inferior to 1 second.
> We're not going to achieve that immediately...
> Priority will be given to the most frequent issues.
> Short term:
> - software crash
> - standard administrative tasks as stop/start of a cluster.
[jira] [Commented] (HBASE-6386) Audit log messages do not include column family / qualifier information consistently
[ https://issues.apache.org/jira/browse/HBASE-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418844#comment-13418844 ]

Marcelo Vanzin commented on HBASE-6386:
---------------------------------------

Other methods also seem to suffer from similar issues; for example, preIncrementColumnValue does this:

{code}
requirePermission(TablePermission.Action.WRITE, c.getEnvironment(),
    Arrays.asList(new byte[][]{family}));
{code}

Even though there is a "qualifier" argument; so the qualifier information never makes it to the audit log.

It also kinda sucks that there's no standard "family map" type for all these operations, so to come up with one common type for auditing, you'd have to make copies of that data (or use ugly wrapper objects).

> Audit log messages do not include column family / qualifier information
> consistently
> ------------------------------------------------------------------------
>
> Key: HBASE-6386
> URL: https://issues.apache.org/jira/browse/HBASE-6386
> Project: HBase
> Issue Type: Improvement
> Components: security
> Reporter: Marcelo Vanzin
>
> The code related to this issue is in
> AccessController.java:permissionGranted().
> When creating audit logs, that method will do one of the following:
> * grant access, create audit log with table name only
> * deny access because of table permission, create audit log with table name
> only
> * deny access because of column family / qualifier permission, create audit
> log with specific family / qualifier
> So, in the case where more than one column family and/or qualifier are in the
> same request, there will be a loss of information. Even in the case where
> only one column family and/or qualifier is involved, information may be lost.
> It would be better if this behavior consistently included all the information
> in the request; regardless of access being granted or denied, and regardless
> which permission caused the denial, the column family and qualifier info
> should be part of the audit log message.
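The behavior asked for in HBASE-6386 (every audit line carries the full family/qualifier set of the request, whether access was granted or denied) can be sketched with a single formatting helper fed by one common "family map" type. Everything below is hypothetical: the class, the method names, the use of strings instead of byte arrays, and the message layout are illustration choices, not the AccessController code or its actual log format.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: one family -> qualifiers map shared by all operations.
class AuditMessageSketch {

    /** Renders the family map deterministically, e.g. {cf1:[q1, q2], cf2:[]}. */
    static String describeFamilies(Map<String, List<String>> families) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        // Sort by family name so log lines are stable and easy to compare.
        for (Map.Entry<String, List<String>> e : new TreeMap<>(families).entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            first = false;
            sb.append(e.getKey()).append(":").append(e.getValue());
        }
        return sb.append("}").toString();
    }

    /** Builds the same message shape for both granted and denied requests. */
    static String auditLine(boolean allowed, String user, String action,
                            String table, Map<String, List<String>> families) {
        return (allowed ? "Access allowed" : "Access denied")
                + "; user=" + user
                + "; action=" + action
                + "; table=" + table
                + "; families=" + describeFamilies(families);
    }
}
```

In this model a call such as the preIncrementColumnValue example from the comment would pass a map from the family to a one-element qualifier list, so the qualifier is no longer dropped before it reaches the audit log.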
[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418842#comment-13418842 ]

Hudson commented on HBASE-5966:
-------------------------------

Integrated in HBase-0.94 #344 (See [https://builds.apache.org/job/HBase-0.94/344/])
HBASE-5966 MapReduce based tests broken on Hadoop 2.0.0-alpha (Gregory Chanan) (Revision 1363586)

Result = FAILURE
jxiang :
Files :
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java

> MapReduce based tests broken on Hadoop 2.0.0-alpha
> --------------------------------------------------
>
> Key: HBASE-5966
> URL: https://issues.apache.org/jira/browse/HBASE-5966
> Project: HBase
> Issue Type: Bug
> Components: mapred, mapreduce, test
> Affects Versions: 0.94.0, 0.96.0
> Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT,
> Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
> Reporter: Andrew Purtell
> Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch,
> HBASE-5966.patch, hbase-5966.patch
>
>
> Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test
> rigging. Below is a representative error, can be easily reproduced with:
> {noformat}
> mvn -PlocalTests -Psecurity \
>   -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
>   clean test \
>   -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> {noformat}
> And the result:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> ---
> Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec <<< FAILURE!
> testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) Time elapsed: 21.935 sec <<< ERROR!
> java.lang.reflect.UndeclaredThrowableException
>     at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>     at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
>     at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
>     at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
>     at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
>     at org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>     at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
>     at org.junit.rules.RunRules.evaluate(RunRules.java:18)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>     at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
>     at o
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-6431: --- Component/s: filters Affects Version/s: 0.92.1 0.94.0 > Some FilterList Constructors break addFilter > > > Key: HBASE-6431 > URL: https://issues.apache.org/jira/browse/HBASE-6431 > Project: HBase > Issue Type: Bug > Components: filters >Affects Versions: 0.92.1, 0.94.0 >Reporter: Alex Newman >Assignee: Alex Newman >Priority: Minor > Attachments: > 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch > > > Some of the constructors for FilterList set the internal list of filters to > list types which don't support the add operation. As a result > FilterList(final List rowFilters) > FilterList(final Filter... rowFilters) > FilterList(final Operator operator, final List rowFilters) > FilterList(final Operator operator, final Filter... rowFilters) > may init private List filters = new ArrayList(); incorrectly.
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-6431: --- Priority: Minor (was: Major) > Some FilterList Constructors break addFilter > > > Key: HBASE-6431 > URL: https://issues.apache.org/jira/browse/HBASE-6431 > Project: HBase > Issue Type: Bug >Reporter: Alex Newman >Assignee: Alex Newman >Priority: Minor > Attachments: > 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch > > > Some of the constructors for FilterList set the internal list of filters to > list types which don't support the add operation. As a result > FilterList(final List rowFilters) > FilterList(final Filter... rowFilters) > FilterList(final Operator operator, final List rowFilters) > FilterList(final Operator operator, final Filter... rowFilters) > may init private List filters = new ArrayList(); incorrectly.
[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418830#comment-13418830 ] Jie Huang commented on HBASE-6429: -- Oops. I will fix those 2 failures and regenerate the patch soon. Thanks Ted. > Filter with filterRow() returning true is also incompatible with scan with > limit > > > Key: HBASE-6429 > URL: https://issues.apache.org/jira/browse/HBASE-6429 > Project: HBase > Issue Type: Bug > Components: filters >Affects Versions: 0.96.0 >Reporter: Jason Dai > Attachments: hbase-6429_0_94_0.patch > > > Currently if we scan with both a limit and a Filter with > filterRow(List) implemented, an IncompatibleFilterException will > be thrown. The same exception should also be thrown if the filter has its > filterRow() implemented.
[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6411: - Attachment: HBASE-6411-0.patch Here's a working implementation of master with metrics2. It includes some tests but not a whole lot. I plan to include a lot more once I am able to inject test metricsources (HBASE-6407). It doesn't include histograms of the split size (HBASE-6409). > Move Master Metrics to metrics 2 > > > Key: HBASE-6411 > URL: https://issues.apache.org/jira/browse/HBASE-6411 > Project: HBase > Issue Type: Sub-task >Reporter: Elliott Clark >Assignee: Alex Baranau > Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch > > > Move Master Metrics to metrics 2
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-6431: --- Status: Patch Available (was: Open) > Some FilterList Constructors break addFilter > > > Key: HBASE-6431 > URL: https://issues.apache.org/jira/browse/HBASE-6431 > Project: HBase > Issue Type: Bug >Reporter: Alex Newman >Assignee: Alex Newman > Attachments: > 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch > > > Some of the constructors for FilterList set the internal list of filters to > list types which don't support the add operation. As a result > FilterList(final List rowFilters) > FilterList(final Filter... rowFilters) > FilterList(final Operator operator, final List rowFilters) > FilterList(final Operator operator, final Filter... rowFilters) > may init private List filters = new ArrayList(); incorrectly.
[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2
[ https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6411: - Assignee: Elliott Clark (was: Alex Baranau) Status: Patch Available (was: Open) > Move Master Metrics to metrics 2 > > > Key: HBASE-6411 > URL: https://issues.apache.org/jira/browse/HBASE-6411 > Project: HBase > Issue Type: Sub-task >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch > > > Move Master Metrics to metrics 2
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-6431: --- Attachment: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch > Some FilterList Constructors break addFilter > > > Key: HBASE-6431 > URL: https://issues.apache.org/jira/browse/HBASE-6431 > Project: HBase > Issue Type: Bug >Reporter: Alex Newman >Assignee: Alex Newman > Attachments: > 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch > > > Some of the constructors for FilterList set the internal list of filters to > list types which don't support the add operation. As a result > FilterList(final List rowFilters) > FilterList(final Filter... rowFilters) > FilterList(final Operator operator, final List rowFilters) > FilterList(final Operator operator, final Filter... rowFilters) > may init private List filters = new ArrayList(); incorrectly.
[jira] [Created] (HBASE-6431) Some FilterList Constructors break addFilter
Alex Newman created HBASE-6431: -- Summary: Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Reporter: Alex Newman Assignee: Alex Newman Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result FilterList(final List rowFilters) FilterList(final Filter... rowFilters) FilterList(final Operator operator, final List rowFilters) FilterList(final Operator operator, final Filter... rowFilters) may init private List filters = new ArrayList(); incorrectly.
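The failure mode described in HBASE-6431 can be illustrated without any HBase classes. The sketch below is an assumption about the likely mechanism, not the actual FilterList code: it supposes a constructor keeps the fixed-size view returned by Arrays.asList, so a later add fails, and it contrasts that with a defensive copy into an ArrayList. The class and method names (FixedSizeListDemo, wrapWithoutCopy, wrapWithCopy, canAdd) are hypothetical, and plain strings stand in for filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedSizeListDemo {

    /** Mimics a constructor that keeps the varargs array as a fixed-size list view. */
    static List<String> wrapWithoutCopy(String... items) {
        return Arrays.asList(items);
    }

    /** Mimics a constructor that copies the arguments into a growable ArrayList. */
    static List<String> wrapWithCopy(String... items) {
        return new ArrayList<String>(Arrays.asList(items));
    }

    /** Returns true if add() succeeds on the list, false if the list rejects it. */
    static boolean canAdd(List<String> list, String item) {
        try {
            list.add(item);
            return true;
        } catch (UnsupportedOperationException e) {
            // Fixed-size lists such as the one from Arrays.asList end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("add on fixed-size view works: "
            + canAdd(wrapWithoutCopy("a", "b"), "c"));
        System.out.println("add on defensive copy works: "
            + canAdd(wrapWithCopy("a", "b"), "c"));
    }
}
```

If the real constructors behave like wrapWithoutCopy, copying the caller's list or array into a fresh ArrayList is one plausible fix; whether the attached patch does this is not stated in the message.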
[jira] [Updated] (HBASE-6405) Create Hadoop compatibilty modules and Metrics2 implementation of replication metrics
[ https://issues.apache.org/jira/browse/HBASE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6405: - Resolution: Fixed Status: Resolved (was: Patch Available) > Create Hadoop compatibilty modules and Metrics2 implementation of replication > metrics > - > > Key: HBASE-6405 > URL: https://issues.apache.org/jira/browse/HBASE-6405 > Project: HBase > Issue Type: Sub-task >Reporter: Zhihong Ted Yu >Assignee: Elliott Clark > Fix For: 0.96.0 > > Attachments: 6405.txt, HBASE-6405-ADD.patch, > hbase-6405-addendum-2-v2.patch, hbase-6405-addendum-2.patch > >
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418815#comment-13418815 ] Lars Hofhansl commented on HBASE-6389: -- :) didn't pick up on the "was" > Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. 
> 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.mintostart", 1); > int maxToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.maxtostart", > Integer.MAX_VALUE); > if (maxToStart < minToStart) { > maxToStart = minToStart; > } > .. > .. > while ( > !this.master.isStopped() && > count < maxToStart && > (lastCountChange+interval > now || timeout > slept || count < > minToStart) > ){ > .. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
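The difference between the two loop conditions quoted in the HBASE-6389 description can be checked with a small standalone sketch. The two boolean expressions below are copied from the quoted snippets; everything else (the class name WaitConditionDemo, the method names, and the sample values in main) is hypothetical and only serves to evaluate the expressions outside of ServerManager.

```java
public class WaitConditionDemo {

    /** Condition quoted from trunk: waiting stops as soon as the timeout has elapsed. */
    static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    /** Proposed condition: the timeout moves into the OR group, so minToStart still holds. */
    static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
            int count, int minToStart, int maxToStart,
            long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Hypothetical scenario: the timeout has elapsed, but only 1 of the
        // required 3 region servers has checked in.
        long slept = 5000, timeout = 4500, lastCountChange = 0, interval = 1500, now = 5000;
        int count = 1, minToStart = 3, maxToStart = Integer.MAX_VALUE;

        System.out.println("current keeps waiting:  " + keepWaitingCurrent(
            false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
        System.out.println("proposed keeps waiting: " + keepWaitingProposed(
            false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
    }
}
```

In this scenario the trunk condition ends the wait with a single region server, while the proposed one keeps waiting until minToStart is met, which is the behaviour the description asks for.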
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418812#comment-13418812 ] Aditya Kishore commented on HBASE-6389: --- @Lars Completely agree and definitely would not want to hold 0.94.1 for this. (That's why "My vote *was*... :) ). Documentation can take care of this in 0.94.1 > Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. 
> 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.mintostart", 1); > int maxToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.maxtostart", > Integer.MAX_VALUE); > if (maxToStart < minToStart) { > maxToStart = minToStart; > } > .. > .. > while ( > !this.master.isStopped() && > count < maxToStart && > (lastCountChange+interval > now || timeout > slept || count < > minToStart) > ){ > .. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.
[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418813#comment-13418813 ] Hudson commented on HBASE-6325: --- Integrated in HBase-TRUNK #3154 (See [https://builds.apache.org/job/HBase-TRUNK/3154/]) HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363573) Result = SUCCESS jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java > [replication] Race in ReplicationSourceManager.init can initiate a failover > even if the node is alive > - > > Key: HBASE-6325 > URL: https://issues.apache.org/jira/browse/HBASE-6325 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.6, 0.92.1, 0.94.0 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.92.2, 0.96.0, 0.94.1 > > Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch > > > Yet another bug found during the leap second madness, it's possible to miss > the registration of new region servers so that in > ReplicationSourceManager.init we start the failover of a live and replicating > region server. 
I don't think there's data loss but the RS that's being failed > over will die on: > {noformat} > 2012-07-01 06:25:15,604 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server > sv4r23s48,10304,1341112194623: Writing replication status > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for > /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369 > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697) > at > org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368) > {noformat} > It seems to me that just refreshing {{otherRegionServers}} after getting the > list of {{currentReplicators}} would be enough to fix this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418808#comment-13418808 ] Lars Hofhansl edited comment on HBASE-6389 at 7/19/12 11:47 PM: @Aditya: I do agree. (see my comment about how I'm sure the logic of this change is correct). It now seems, though, that it is the default timeout that is too short (4.5s). Folks with 5k regions should know to increase the minToStart parameter and the timeout. We should document that better. I can also see to change the timeout to failure condition (as discussed above). I'm not opposed. It's just that 0.94.1 needs to go out because of HBASE-6311, I do not want to risk delaying this further. It also seems this can use further discussion. (Sometimes it is amazing how much discussion a two line change can cause :) ) @Ted and @Stack: What do you guys think? Edit: Spelling. was (Author: lhofhansl): @Aditya: I do agree. (see my comment about how I'm sure the logic of this change is correct). It now seems, though, that it is the default timeout that is too short (4.5s). Folks with 5k regions should know to increase the minToStart parameter and the timeout. We should document that better. I can also see to change the timeout to failure condition (as discussed above). I'm not opposed. It's just that 0.94.1 needs to go out because of HBASE-6311, I do not want to risk delaying this further. It also seems this can use further discussion. (Sometimes it is amazing how much discussion as two change can cause :) ) @Ted and @Stack: What do you guys think? 
> Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. 
> 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. >* We will wait until one of
[jira] [Resolved] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-5966. Resolution: Fixed Integrated to 0.94. Thanks to Greg for the patch and Lars for the review. > MapReduce based tests broken on Hadoop 2.0.0-alpha > -- > > Key: HBASE-5966 > URL: https://issues.apache.org/jira/browse/HBASE-5966 > Project: HBase > Issue Type: Bug > Components: mapred, mapreduce, test >Affects Versions: 0.94.0, 0.96.0 > Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, > Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) >Reporter: Andrew Purtell >Assignee: Jimmy Xiang > Fix For: 0.96.0, 0.94.1 > > Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, > HBASE-5966.patch, hbase-5966.patch > > > Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test > rigging. Below is a representative error, can be easily reproduced with: > {noformat} > mvn -PlocalTests -Psecurity \ > -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \ > clean test \ > -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > {noformat} > And the result: > {noformat} > --- > T E S T S > --- > Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > --- > Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) > Time elapsed: 21.935 sec <<< ERROR! 
> java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) > at > org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183) > at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:416) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) > at 
org.junit.rules.RunRules.evaluate(RunRules.java:18) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) > at > org.junit.internal.runners.statements.RunBe
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418808#comment-13418808 ]

Lars Hofhansl commented on HBASE-6389:
--

@Aditya: I do agree (see my comment about how I'm sure the logic of this change is correct). It now seems, though, that it is the default timeout that is too short (4.5s). Folks with 5k regions should know to increase the minToStart parameter and the timeout. We should document that better.

I can also see changing the timeout into a failure condition (as discussed above). I'm not opposed. It's just that 0.94.1 needs to go out because of HBASE-6311, and I do not want to risk delaying this further. It also seems this can use further discussion. (Sometimes it is amazing how much discussion a two-line change can cause :) )

@Ted and @Stack: What do you guys think?

> Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.94.0, 0.96.0
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Critical
> Fix For: 0.96.0, 0.94.2
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been reached.
> Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. > 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.mintostart", 1); > int maxToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.maxtostart", > I
[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418803#comment-13418803 ] Hudson commented on HBASE-6325: --- Integrated in HBase-0.94 #343 (See [https://builds.apache.org/job/HBase-0.94/343/]) HBASE-6319 ReplicationSource can call terminate on itself and deadlock HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363570) Result = SUCCESS jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java > [replication] Race in ReplicationSourceManager.init can initiate a failover > even if the node is alive > - > > Key: HBASE-6325 > URL: https://issues.apache.org/jira/browse/HBASE-6325 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.6, 0.92.1, 0.94.0 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.92.2, 0.96.0, 0.94.1 > > Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch > > > Yet another bug found during the leap second madness, it's possible to miss > the registration of new region servers so that in > ReplicationSourceManager.init we start the failover of a live and replicating > region server. 
I don't think there's data loss but the RS that's being failed > over will die on: > {noformat} > 2012-07-01 06:25:15,604 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server > sv4r23s48,10304,1341112194623: Writing replication status > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for > /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369 > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697) > at > org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368) > {noformat} > It seems to me that just refreshing {{otherRegionServers}} after getting the > list of {{currentReplicators}} would be enough to fix this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
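A minimal sketch of the ordering issue described in HBASE-6325, assuming the failover decision boils down to "replicators that are not in the live-server list". The class and method names are hypothetical and the sets are plain strings; this is not the ReplicationSourceManager code or the committed patch, only an illustration of why refreshing {{otherRegionServers}} after reading {{currentReplicators}} avoids treating a newly registered server as dead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the startup check; not the actual HBase code. */
final class FailoverCheckSketch {

    /**
     * Returns the replicators that look dead: present in the replicator
     * snapshot but missing from the live-server snapshot.
     */
    static Set<String> findDeadReplicators(Set<String> currentReplicators,
                                           Set<String> otherRegionServers) {
        Set<String> dead = new HashSet<String>(currentReplicators);
        dead.removeAll(otherRegionServers);
        return dead;
    }

    public static void main(String[] args) {
        Set<String> replicators = new HashSet<String>(Arrays.asList("rs1", "rs2"));

        // Racy order: the live list was read before "rs2" registered, and the
        // replicator list was read afterwards, so rs2 is alive but looks dead.
        Set<String> staleLive = new HashSet<String>(Arrays.asList("rs1"));
        System.out.println("stale live list:     " + findDeadReplicators(replicators, staleLive));

        // Suggested order: read the replicators first, then refresh the live
        // list, so anything that registered in between is seen as alive.
        Set<String> refreshedLive = new HashSet<String>(Arrays.asList("rs1", "rs2"));
        System.out.println("refreshed live list: " + findDeadReplicators(replicators, refreshedLive));
    }
}
```

With the stale list the live server rs2 would be picked for failover; with the refreshed list nothing is.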
[jira] [Reopened] (HBASE-6276) TestClassLoading is racy
[ https://issues.apache.org/jira/browse/HBASE-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-6276: --- Assignee: (was: Andrew Purtell) > TestClassLoading is racy > > > Key: HBASE-6276 > URL: https://issues.apache.org/jira/browse/HBASE-6276 > Project: HBase > Issue Type: Bug > Components: coprocessors, test >Affects Versions: 0.92.2, 0.96.0, 0.94.1 >Reporter: Andrew Purtell >Priority: Minor > Attachments: HBASE-6276-0.94.patch, HBASE-6276.patch
[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock
[ https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418802#comment-13418802 ]

Hudson commented on HBASE-6319:
---

Integrated in HBase-0.94 #343 (See [https://builds.apache.org/job/HBase-0.94/343/])
HBASE-6319 ReplicationSource can call terminate on itself and deadlock
HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363570)

Result = SUCCESS
jdcryans :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java

> ReplicationSource can call terminate on itself and deadlock
> ---
>
> Key: HBASE-6319
> URL: https://issues.apache.org/jira/browse/HBASE-6319
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.6, 0.92.1, 0.94.0
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.94.1
> Attachments: HBASE-6319-0.92.patch
>
> In a few places, the ReplicationSource code calls terminate() on itself, which is a problem since in terminate() we wait on that thread to die.
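The self-join hazard described in HBASE-6319 can be sketched as follows. This is a hypothetical worker thread, not the ReplicationSource class, and the guard shown is one common way to avoid the problem rather than necessarily what the committed patch does: terminate() only waits for the thread when it is called from a different thread.

```java
/** Hypothetical worker thread; illustrates the self-join hazard only. */
final class SelfTerminatingWorker extends Thread {
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            // Pretend an unrecoverable error was hit: the worker stops itself.
            terminate("fatal error in worker");
        }
    }

    /** Stops the worker; waits for it to exit only when called from another thread. */
    void terminate(String reason) {
        running = false;
        if (Thread.currentThread() == this) {
            // Joining ourselves would block forever, so just return and
            // let run() fall out of its loop.
            return;
        }
        interrupt();
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts a worker and reports whether it exited within the given time. */
    static boolean exitsWithin(long millis) {
        SelfTerminatingWorker worker = new SelfTerminatingWorker();
        worker.start();
        try {
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker exited: " + exitsWithin(5000));
    }
}
```

Without the `Thread.currentThread() == this` check, the call to join() inside run() would wait on the calling thread itself and never return.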
[jira] [Commented] (HBASE-4956) Control direct memory buffer consumption by HBaseClient
[ https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418797#comment-13418797 ]

Hudson commented on HBASE-4956:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #100 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/100/])
HBASE-4956 Control direct memory buffer consumption by HBaseClient (Bob Copeland) (Revision 1363526)

Result = FAILURE
tedyu :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Result.java

> Control direct memory buffer consumption by HBaseClient
> ---
>
> Key: HBASE-4956
> URL: https://issues.apache.org/jira/browse/HBASE-4956
> Project: HBase
> Issue Type: New Feature
> Reporter: Ted Yu
> Assignee: Bob Copeland
> Fix For: 0.96.0, 0.94.1
> Attachments: 4956.txt, thread_get.rb
>
> As Jonathan explained here https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1 , the standard HBase client inadvertently consumes a large amount of direct memory.
> We should consider using netty for NIO-related tasks.
[jira] [Commented] (HBASE-6312) Make BlockCache eviction thresholds configurable
[ https://issues.apache.org/jira/browse/HBASE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418798#comment-13418798 ] Hudson commented on HBASE-6312: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #100 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/100/]) HBASE-6312 Make BlockCache eviction thresholds configurable (Jie Huang) (Revision 1363468) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/DoubleBlockCache.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java > Make BlockCache eviction thresholds configurable > > > Key: HBASE-6312 > URL: https://issues.apache.org/jira/browse/HBASE-6312 > Project: HBase > Issue Type: Improvement > Components: io >Affects Versions: 0.94.0 >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Fix For: 0.96.0 > > Attachments: hbase-6312.patch, hbase-6312_v2.patch, > hbase-6312_v3.patch > > > Some of our customers found that tuning the BlockCache eviction thresholds > made test results different in their test environment. However, those > thresholds are not configurable in the current implementation. The only way > to change those values is to re-compile the HBase source code. We wonder if > it is possible to make them configurable.
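As an illustration of what "configurable thresholds" means in HBASE-6312, here is a minimal sketch that reads two eviction factors from a configuration object and falls back to compiled-in defaults. Everything in it is an assumption made for the example: it uses java.util.Properties rather than the Hadoop Configuration class, and the key names and default values are placeholders, not the ones introduced by the patch.

```java
import java.util.Properties;

/** Hypothetical holder for eviction thresholds; keys and defaults are placeholders. */
final class EvictionThresholds {
    static final String MIN_FACTOR_KEY = "example.blockcache.min.factor";
    static final String ACCEPTABLE_FACTOR_KEY = "example.blockcache.acceptable.factor";
    static final float DEFAULT_MIN_FACTOR = 0.75f;
    static final float DEFAULT_ACCEPTABLE_FACTOR = 0.85f;

    final float minFactor;        // evict until usage drops to this fraction
    final float acceptableFactor; // start evicting once usage exceeds this fraction

    EvictionThresholds(Properties conf) {
        minFactor = readFactor(conf, MIN_FACTOR_KEY, DEFAULT_MIN_FACTOR);
        acceptableFactor = readFactor(conf, ACCEPTABLE_FACTOR_KEY, DEFAULT_ACCEPTABLE_FACTOR);
        if (minFactor > acceptableFactor) {
            throw new IllegalArgumentException(
                "min factor " + minFactor + " exceeds acceptable factor " + acceptableFactor);
        }
    }

    /** Reads a fraction in (0, 1] from the configuration, or returns the default. */
    private static float readFactor(Properties conf, String key, float defaultValue) {
        String raw = conf.getProperty(key);
        float value = (raw == null) ? defaultValue : Float.parseFloat(raw.trim());
        if (value <= 0f || value > 1f) {
            throw new IllegalArgumentException(key + " must be in (0, 1], got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MIN_FACTOR_KEY, "0.5"); // override one value, keep the other default
        EvictionThresholds t = new EvictionThresholds(conf);
        System.out.println("min=" + t.minFactor + " acceptable=" + t.acceptableFactor);
    }
}
```

Validating the values at construction time keeps a bad setting from surfacing later as odd eviction behavior.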
[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418799#comment-13418799 ] Hudson commented on HBASE-6325: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #100 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/100/]) HBASE-6325 [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive (Revision 1363573) Result = FAILURE jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418793#comment-13418793 ] Lars Hofhansl commented on HBASE-5966: -- +1 > MapReduce based tests broken on Hadoop 2.0.0-alpha > -- > > Key: HBASE-5966 > URL: https://issues.apache.org/jira/browse/HBASE-5966 > Project: HBase > Issue Type: Bug > Components: mapred, mapreduce, test >Affects Versions: 0.94.0, 0.96.0 > Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, > Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) >Reporter: Andrew Purtell >Assignee: Jimmy Xiang > Fix For: 0.96.0, 0.94.1 > > Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, > HBASE-5966.patch, hbase-5966.patch > > > Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test > rigging. Below is a representative error, can be easily reproduced with: > {noformat} > mvn -PlocalTests -Psecurity \ > -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \ > clean test \ > -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > {noformat} > And the result: > {noformat} > --- > T E S T S > --- > Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > --- > Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) > Time elapsed: 21.935 sec <<< ERROR! 
> java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) > at > org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183) > at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:416) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) > at 
org.junit.rules.RunRules.evaluate(RunRules.java:18) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.ja
[jira] [Commented] (HBASE-3432) [hbck] Add "remove table" switch
[ https://issues.apache.org/jira/browse/HBASE-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418792#comment-13418792 ]

Jonathan Hsieh commented on HBASE-3432:
---

[juneng603] Eventually, after region assignments are completed and the region is opened on the target RS, information is updated in the META table so that other clients can go to the proper RS.

> [hbck] Add "remove table" switch
>
> Key: HBASE-3432
> URL: https://issues.apache.org/jira/browse/HBASE-3432
> Project: HBase
> Issue Type: New Feature
> Components: util
> Affects Versions: 0.89.20100924
> Reporter: Lars George
> Priority: Minor
>
> This happened before and I am not sure how the new Master improves on it (this stuff is only available between the lines or buried in some people's heads - one other thing I wish for is a better place to communicate what each patch improves). Just so we do not miss it, there is an issue that sometimes disabling large tables simply times out and the table gets stuck in limbo.
> From the CDH User list:
> {quote}
> On Fri, Jan 7, 2011 at 1:57 PM, Sean Sechrist wrote:
> To get them out of META, you can just scan '.META.' for that table name, and delete those rows. We had to do that a few months ago.
> -Sean
> That did it. For the benefit of others, here's code. Beware the literal table names, run at your own peril.
> {quote}
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.MetaScanner;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class CleanFromMeta {
>   // Visitor that deletes every .META. row belonging to the hard-coded table.
>   public static class Cleaner implements MetaScanner.MetaScannerVisitor {
>     public HTable meta = null;
>
>     public Cleaner(Configuration conf) throws IOException {
>       meta = new HTable(conf, ".META.");
>     }
>
>     public boolean processRow(Result rowResult) throws IOException {
>       String r = Bytes.toString(rowResult.getRow());
>       // Rows for a table start with the table name followed by a comma.
>       if (r.startsWith("webtable,")) {
>         meta.delete(new Delete(rowResult.getRow()));
>         System.out.println("Deleting row " + rowResult);
>       }
>       return true; // keep scanning
>     }
>   }
>
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     MetaScanner.metaScan(conf, new Cleaner(conf), Bytes.toBytes("webtable"));
>   }
> }
> {code}
> I suggest moving this into HBaseFsck. Personally, I do not like having these JRuby scripts floating around that may or may not help. This should be available if a user gets stuck and knows what he is doing (they can delete from .META. anyway). Maybe a "\-\-disable-table \-\-force" or so? But since disable is already in the shell we could add an "\-\-force" there? Or add a "\-\-delete-table " to the hbck?
[jira] [Commented] (HBASE-3432) [hbck] Add "remove table" switch
[ https://issues.apache.org/jira/browse/HBASE-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418790#comment-13418790 ]

Jonathan Hsieh commented on HBASE-3432:
---

[~vamshi] root and meta are special regions but regions nonetheless. They get assigned to arbitrary (possibly different) region servers, and are hit on every new client's read and write path.

[~juneng603] /hbase/unassigned is where regions-in-transition information is kept. These are modified as regions are being assigned to particular region servers. They coordinate the state between the assigning master and the assignee RS.

> [hbck] Add "remove table" switch
>
> Key: HBASE-3432
> URL: https://issues.apache.org/jira/browse/HBASE-3432
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418789#comment-13418789 ]

Aditya Kishore commented on HBASE-6389:
---

My vote was for its inclusion for 2 reasons.
# This was a behavior change in 0.94.0 and I am not sure we have completely understood its impact.
# In a large MSLAB enabled cluster, I have repeatedly seen all the regions (in excess of 5K with *Σ*~i=1..n~(*R*~i~*CF*~i~) > 8K; with MSLAB on, RS needs > 16G just to open) being assigned to a single region server, leading it to an OOM crash and creating quite a few HBCK inconsistencies on subsequent recovery.

Lastly, so far all the test failures seem to be due to errors in the test code unmasked by this change.

> Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.94.0, 0.96.0
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Critical
> Fix For: 0.96.0, 0.94.2
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed in 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been reached.
> Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. > 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.mintostart", 1); > int maxToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.maxtostart", > Integer.MAX_VALUE); > if (maxToStart < minToStart) { > maxToStart = minToStart; > } > .
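The 16G figure in the comment at the top of this message is consistent with one MSLAB chunk per memstore. A back-of-the-envelope check follows, assuming a 2 MB MSLAB chunk size and reading the sum as the total number of region/column-family pairs. Both are assumptions rather than statements from the comment, and `MslabEstimate` is a hypothetical helper name.

```java
// Back-of-the-envelope sketch. The 2 MB chunk size and the one-chunk-per-memstore
// reading are assumptions; MslabEstimate is a hypothetical helper.
final class MslabEstimate {
  static final long CHUNK_BYTES = 2L * 1024 * 1024; // assumed MSLAB chunk size

  /** Minimum heap, in bytes, if every memstore holds exactly one MSLAB chunk. */
  static long minHeapBytes(long memstores) {
    return memstores * CHUNK_BYTES;
  }

  public static void main(String[] args) {
    long memstores = 8L * 1024; // "8K" region/column-family pairs
    long gib = minHeapBytes(memstores) / (1024L * 1024 * 1024);
    System.out.println(memstores + " memstores x 2 MB = " + gib + " GB");
  }
}
```

Under these assumptions, 8192 memstores already account for 16 GB before any data is written, which matches the order of magnitude reported above.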
[jira] [Commented] (HBASE-6310) -ROOT- corruption when .META. is using the old encoding scheme
[ https://issues.apache.org/jira/browse/HBASE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418777#comment-13418777 ] Jonathan Hsieh commented on HBASE-6310: --- hbck writes directly to .META. but I don't think it ever writes to root unless you put the -metaonly flag on. It may be possible that if there were two .META. region dirs, hbck tried to pull in the old .META. dir. This would probably write something goofy to .META. though. If you just used the -repair option, it would have first tried to merge regions before modifying meta. (but also would likely have not modified ROOT). > -ROOT- corruption when .META. is using the old encoding scheme > -- > > Key: HBASE-6310 > URL: https://issues.apache.org/jira/browse/HBASE-6310 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.94.0 >Reporter: Jean-Daniel Cryans >Priority: Blocker > Fix For: 0.96.0, 0.94.2 > > > We're still working on the root cause here, but after the leap second > armageddon we had a hard time getting our 0.94 cluster back up. 
This is what > we saw in the logs until the master died by itself: > {noformat} > 2012-07-01 23:01:52,149 DEBUG > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: > locateRegionInMeta parentTable=-ROOT-, > metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28, > port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000 > because: HRegionInfo was null or empty in -ROOT-, > row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0, > .META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0} > {noformat} > (it's strange that we retry this) > This was really misleading because I could see the regioninfo in a scan: > {noformat} > hbase(main):002:0> scan '-ROOT-' > ROW COLUMN+CELL > .META.,,1 column=info:regioninfo, > timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '', > ENDKEY => '', ENCODED => 1028785192,} > .META.,,1 column=info:server, > timestamp=1341183448693, value=sfor3s40:10304 > .META.,,1 > column=info:serverstartcode, timestamp=1341183448693, > value=1341183444689 > .META.,,1 column=info:v, > timestamp=1331755419291, value=\x00\x00 > .META.,,1259448304806 column=info:server, > timestamp=1341124914705, value=sfor3s24:10304 > .META.,,1259448304806 > column=info:serverstartcode, timestamp=1341124914705, > value=1341124455863 > {noformat} > Except that the devil is in the details, ".META.,,1" is not > ".META.,,1259448304806". Basically something writes to .META. by directly > creating the row key without caring if the row is in the old format. I did a > deleteall in the shell and it fixed the issue... until some time later it was > stuck again because the edits reappeared (still not sure why). This time the > PostOpenDeployTasksThread were stuck in the RS trying to update .META. but > there was no logging (saw it with a jstack). I deleted the row again to make > it work. 
> I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1 > out, but I wouldn't recommend upgrading to 0.94 if your cluster was created > before 0.89 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
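To make the row-key mismatch described in this report concrete, here is a small sketch that splits the two keys from the scan output into their parts. It assumes the plain `table,startkey,regionid` layout visible in that output; `CatalogRowKeySketch` is a hypothetical helper and not an HBase class.

```java
// Hypothetical helper. It assumes the plain "table,startkey,regionid" row layout
// visible in the scan output of the report above.
final class CatalogRowKeySketch {

  /** Splits a row key into {table, startKey, regionId} using the first and last comma. */
  static String[] parse(String rowKey) {
    int first = rowKey.indexOf(',');
    int last = rowKey.lastIndexOf(',');
    if (first < 0 || first == last) {
      throw new IllegalArgumentException("not a table,startkey,regionid key: " + rowKey);
    }
    return new String[] {
        rowKey.substring(0, first),
        rowKey.substring(first + 1, last),
        rowKey.substring(last + 1)
    };
  }

  public static void main(String[] args) {
    String[] current = parse(".META.,,1");
    String[] leftover = parse(".META.,,1259448304806");
    System.out.println("same table:     " + current[0].equals(leftover[0]));
    System.out.println("same start key: " + current[1].equals(leftover[1]));
    System.out.println("same region id: " + current[2].equals(leftover[2]));
  }
}
```

The two keys agree on table and (empty) start key and differ only in the region id, so they are distinct rows in -ROOT- even though both describe a .META. region starting at the same key.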
[jira] [Commented] (HBASE-6417) hbck merges .META. regions if there's an old leftover
[ https://issues.apache.org/jira/browse/HBASE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418774#comment-13418774 ] Jean-Daniel Cryans commented on HBASE-6417: --- No, but I can reproduce. > hbck merges .META. regions if there's an old leftover > - > > Key: HBASE-6417 > URL: https://issues.apache.org/jira/browse/HBASE-6417 > Project: HBase > Issue Type: Bug >Reporter: Jean-Daniel Cryans > Fix For: 0.96.0, 0.94.2 > > Attachments: hbck.log > > > Trying to see what caused HBASE-6310, one of the things I figured is that the > bad .META. row is actually one from the time that we were permitting meta > splitting and that folder had just been staying there for a while. > So I tried to recreate the issue with -repair and it merged my good .META. > region with the one that's 3 years old that also has the same start key. I > ended up with a brand new .META. region! > I'll be attaching the full log in a separate file.
[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418772#comment-13418772 ] Jimmy Xiang commented on HBASE-5966: looks good to me, will commit to 0.94 tonight if no objection. > MapReduce based tests broken on Hadoop 2.0.0-alpha > -- > > Key: HBASE-5966 > URL: https://issues.apache.org/jira/browse/HBASE-5966 > Project: HBase > Issue Type: Bug > Components: mapred, mapreduce, test >Affects Versions: 0.94.0, 0.96.0 > Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, > Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) >Reporter: Andrew Purtell >Assignee: Jimmy Xiang > Fix For: 0.96.0, 0.94.1 > > Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, > HBASE-5966.patch, hbase-5966.patch > > > Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test > rigging. Below is a representative error, can be easily reproduced with: > {noformat} > mvn -PlocalTests -Psecurity \ > -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \ > clean test \ > -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > {noformat} > And the result: > {noformat} > --- > T E S T S > --- > Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > --- > Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) > Time elapsed: 21.935 sec <<< ERROR! 
> java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) > at > org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183) > at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:416) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) > at 
org.junit.rules.RunRules.evaluate(RunRules.java:18) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) > at > org.junit.inter
[jira] [Commented] (HBASE-6417) hbck merges .META. regions if there's an old leftover
[ https://issues.apache.org/jira/browse/HBASE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418769#comment-13418769 ] Jonathan Hsieh commented on HBASE-6417: --- Did you keep a copy of the hbck details before you ran the -repair option?
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418771#comment-13418771 ] Lars Hofhansl commented on HBASE-6389: -- I'd like to leave this with 0.94.2. Unless you think this must go into 0.94.1 > Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > been reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. 
> 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as the timeout is > reached, even if fewer RS than required have checked in with the Master, and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have a disastrous effect in a large cluster, especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, > these conditions need to be modified as follows: > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.mintostart", 1); > int maxToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.maxtostart", > Integer.MAX_VALUE); > if (maxToStart < minToStart) { > maxToStart = minToStart; > } > .. > .. > while ( > !this.master.isStopped() && > count < maxToStart && > (lastCountChange+interval > now || timeout > slept || count < > minToStart) > ){ > .. > {code}
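The difference between the two loop conditions quoted in this issue can be checked in isolation. The sketch below is not the HBase code: it copies only the two boolean expressions into a hypothetical class, `WaitPredicateSketch`, passes the master state in as plain parameters, and evaluates both for a case where the timeout has lapsed but fewer than mintostart region servers have checked in. The concrete numbers are made up for illustration.

```java
// Hypothetical stand-alone sketch. Only the two boolean expressions are taken
// from the quoted ServerManager.java snippets; everything else is illustrative.
final class WaitPredicateSketch {

  /** Condition quoted from trunk rev 1360470: an elapsed timeout alone ends the wait. */
  static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  /** Condition proposed in the issue: the timeout alone no longer ends the wait. */
  static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept || count < minToStart);
  }

  public static void main(String[] args) {
    // Timeout (4500 ms) has lapsed and only 1 of the required 3 servers has reported.
    boolean current = keepWaitingCurrent(
        false, 5000L, 4500L, 1, 3, Integer.MAX_VALUE, 1000L, 1500L, 10000L);
    boolean proposed = keepWaitingProposed(
        false, 5000L, 4500L, 1, 3, Integer.MAX_VALUE, 1000L, 1500L, 10000L);
    System.out.println("current keeps waiting:  " + current);
    System.out.println("proposed keeps waiting: " + proposed);
  }
}
```

For this input the current condition evaluates to false (the master stops waiting and assigns regions to the single server), while the proposed condition evaluates to true because `count < minToStart` still holds.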
[jira] [Commented] (HBASE-6393) Decouple audit event creation from storage in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418768#comment-13418768 ] Hadoop QA commented on HBASE-6393: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537256/hbase-6393-v1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 15 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2414//console This message is automatically generated. 
> Decouple audit event creation from storage in AccessController > -- > > Key: HBASE-6393 > URL: https://issues.apache.org/jira/browse/HBASE-6393 > Project: HBase > Issue Type: Brainstorming > Components: security >Affects Versions: 0.96.0 >Reporter: Marcelo Vanzin > Attachments: hbase-6393-v1.patch > > > Currently, AccessController takes care of both generating audit events (by > performing access checks) and storing them (by creating a log message and > writing it to the AUDITLOG logger). > This makes the logging system the only way to catch audit events. It means > that if someone wants to do something fancier (like writing these records to > a database somewhere), they need to hack through the logging system, and > parse the messages generated by AccessController, which is not optimal. > The attached patch decouples generation and storage by introducing a new > interface, used by AccessController, to log the audit events. The current, > log-based storage is kept in place so that current users won't be affected by > the change. > I'm filing this as an RFC at this point, so the patch is not totally clean; > it's on top of HBase 0.92 (which is easier for me to test) and doesn't have > any unit tests, for starters. But the changes should be very similar on trunk > - I don't remember changes in this particular area of the code between those > versions.
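The shape of the decoupling described in this issue can be sketched as follows. The patch itself is not quoted here, so none of these names come from hbase-6393-v1.patch: `AuditSink`, `LogAuditSink`, `CollectingAuditSink`, and `AccessCheckSketch` are hypothetical, and the access rule is a placeholder.

```java
// Hypothetical sketch of the decoupling described above. None of these names
// come from hbase-6393-v1.patch; they only illustrate the shape of the idea.
interface AuditSink {
  void record(String user, String action, String table, boolean allowed);
}

/** Stand-in for the existing behaviour: format a message and hand it to a logger. */
final class LogAuditSink implements AuditSink {
  private static final java.util.logging.Logger AUDITLOG =
      java.util.logging.Logger.getLogger("AUDITLOG.sketch");

  @Override
  public void record(String user, String action, String table, boolean allowed) {
    AUDITLOG.info((allowed ? "allowed" : "denied") + " user=" + user
        + " action=" + action + " table=" + table);
  }
}

/** Stand-in for a fancier consumer (for example a database writer). */
final class CollectingAuditSink implements AuditSink {
  final java.util.List<String> events = new java.util.ArrayList<String>();

  @Override
  public void record(String user, String action, String table, boolean allowed) {
    events.add(user + "|" + action + "|" + table + "|" + allowed);
  }
}

/** Generates audit events but no longer decides how they are stored. */
final class AccessCheckSketch {
  private final AuditSink sink;

  AccessCheckSketch(AuditSink sink) {
    this.sink = sink;
  }

  boolean checkRead(String user, String table) {
    boolean allowed = !"guest".equals(user); // placeholder rule for the sketch
    sink.record(user, "read", table, allowed);
    return allowed;
  }
}
```

Constructing `AccessCheckSketch` with a `LogAuditSink` keeps the log-based behaviour, while passing another implementation receives the structured event directly, without parsing log messages.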
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418767#comment-13418767 ] Hadoop QA commented on HBASE-6389: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537258/org.apache.hadoop.hbase.TestZooKeeper-output.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 10 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2415//console This message is automatically generated.
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418766#comment-13418766 ] stack commented on HBASE-6389: -- @Aditya Makes sense. You got what you needed from Ted? Let us know. Thanks.
[jira] [Updated] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-5966: -- Attachment: HBASE-5966-94.patch Attached patch for 0.94. Ran TestTableMapReduce against both 1.0 and 2.0 hadoop profiles, both passed: mvn test -PlocalTests -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce --- T E S T S --- Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 188.087 sec Results : Tests run: 1, Failures: 0, Errors: 0, Skipped: 0 mvn test -PlocalTests -Dhadoop.profile=2.0 -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce --- T E S T S --- Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 167.49 sec Results : Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[jira] [Resolved] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock
[ https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-6319.
---
    Resolution: Fixed
    Fix Version/s: (was: 0.90.8)
    Hadoop Flags: Reviewed

Committed to 0.92 and 0.94, skipping 0.90 like HBASE-6325. Trunk was already fixed.

> ReplicationSource can call terminate on itself and deadlock
> ---
>
> Key: HBASE-6319
> URL: https://issues.apache.org/jira/browse/HBASE-6319
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.6, 0.92.1, 0.94.0
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.94.1
>
> Attachments: HBASE-6319-0.92.patch
>
>
> In a few places, the ReplicationSource code calls terminate() on itself, which is a problem since terminate() waits on that thread to die.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
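The deadlock described in this issue can be illustrated with a minimal Java sketch. This is not the ReplicationSource code and not the patch committed for HBASE-6319 (the message does not show it). The class SelfTerminateSketch, its fields, and runDemo() are hypothetical names. The sketch only assumes what the description states, namely that terminate() waits for the worker thread to die, and shows one common guard: skip the wait when terminate() is called from the worker thread itself.

```java
// Hypothetical sketch; not the ReplicationSource code or the committed fix.
final class SelfTerminateSketch extends Thread {
    private volatile boolean running = true;
    private volatile boolean skippedSelfJoin = false;

    @Override
    public void run() {
        // Simulate the worker deciding to shut itself down after a fatal error.
        if (running) {
            terminate();
        }
    }

    /** Stops the worker and waits for it to die, unless the worker is the caller. */
    void terminate() {
        running = false;
        if (Thread.currentThread() == this) {
            // join() on the current thread would block until this thread dies,
            // which can never happen while it is blocked: a self-deadlock.
            skippedSelfJoin = true;
            return;
        }
        try {
            join(); // safe here: the caller is a different thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean skippedSelfJoin() {
        return skippedSelfJoin;
    }

    /** Runs the scenario; true if the worker exited without joining itself. */
    static boolean runDemo() {
        SelfTerminateSketch worker = new SelfTerminateSketch();
        worker.start();
        try {
            worker.join(5000); // bounded wait so the demo cannot hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && worker.skippedSelfJoin();
    }

    public static void main(String[] args) {
        System.out.println("worker exited cleanly: " + runDemo());
    }
}
```

Without the currentThread() check, the call made from run() would reach join() and the worker would wait on itself indefinitely.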
[jira] [Commented] (HBASE-4470) ServerNotRunningException coming out of assignRootAndMeta kills the Master
[ https://issues.apache.org/jira/browse/HBASE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418753#comment-13418753 ] Hadoop QA commented on HBASE-4470: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537246/HBASE-4470-v2-trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2413//console This message is automatically generated. 
> ServerNotRunningException coming out of assignRootAndMeta kills the Master > -- > > Key: HBASE-4470 > URL: https://issues.apache.org/jira/browse/HBASE-4470 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.4 >Reporter: Jean-Daniel Cryans >Assignee: Gregory Chanan >Priority: Critical > Fix For: 0.90.7 > > Attachments: HBASE-4470-90.patch, HBASE-4470-v2-90.patch, > HBASE-4470-v2-92_94.patch, HBASE-4470-v2-trunk.patch > > > I'm surprised we still have issues like that and I didn't get a hit while > googling so forgive me if there's already a jira about it. > When the master starts it verifies the locations of root and meta before > assigning them, if the server is started but not running you'll get this: > {quote} > 2011-09-23 04:47:44,859 WARN > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: > RemoteException connecting to RS > org.apache.hadoop.ipc.RemoteException: > org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running > yet > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038) > at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771) > at > org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257) > at $Proxy6.getProtocolVersion(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419) > at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393) > at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444) > at > org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287) > at > 
org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282) > {quote} > I hit that 3-4 times this week while debugging something else. The worst is > that when you restart the master it sees that as a failover, but none of the > regions are assigned so it takes an eternity to get back fully online. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA adminis
[jira] [Resolved] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-6325. --- Resolution: Fixed Fix Version/s: (was: 0.90.8) Hadoop Flags: Reviewed Committed to 0.92, 0.94 and trunk. Not caring about 0.90 either. > [replication] Race in ReplicationSourceManager.init can initiate a failover > even if the node is alive > - > > Key: HBASE-6325 > URL: https://issues.apache.org/jira/browse/HBASE-6325 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.6, 0.92.1, 0.94.0 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.92.2, 0.96.0, 0.94.1 > > Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch > > > Yet another bug found during the leap second madness, it's possible to miss > the registration of new region servers so that in > ReplicationSourceManager.init we start the failover of a live and replicating > region server. I don't think there's data loss but the RS that's being failed > over will die on: > {noformat} > 2012-07-01 06:25:15,604 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server > sv4r23s48,10304,1341112194623: Writing replication status > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for > /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369 > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697) > at > org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470) > at > 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368) > {noformat} > It seems to me that just refreshing {{otherRegionServers}} after getting the > list of {{currentReplicators}} would be enough to fix this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418752#comment-13418752 ]

Zhihong Ted Yu commented on HBASE-6389:
---

Looking at https://builds.apache.org/job/PreCommit-HBASE-Build/2406/console, a test was still hanging, although I wasn't able to find which one.

> Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
>
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.94.0, 0.96.0
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed from 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, the Master will proceed immediately after the timeout has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not been reached.
> Reading the current conditions of waitForRegionServers() clarifies this:
> {code:title=ServerManager.java (trunk rev:1360470)}
>
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
>   throws InterruptedException {
>
>
>     while (
>       !this.master.isStopped() &&
>         slept < timeout &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || count < minToStart)
>       ){
>
> {code}
> So with the current conditions, the wait will end as soon as the timeout is reached, even if fewer RSes have checked in with the Master, and the master will proceed with the region assignment among these RSes alone.
> As mentioned in -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, and I concur, this could have a disastrous effect in a large cluster, especially now that MSLAB is turned on.
> To enforce the required quorum as specified by "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>    * Wait for the region servers to report in.
>    * We will wait until one of this condition is met:
>    *  - the master is stopped
>    *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>    *    region servers is reached
>    *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>    *    there have been no new region server in for
>    *    'hbase.master.wait.on.regionservers.interval' time AND
>    *    the 'hbase.master.wait.on.regionservers.timeout' is reached
>    *
>    * @throws InterruptedException
>    */
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
>     int minToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.mintostart", 1);
>     int maxToStart = this.master.getConfiguration().
>     getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
>     if (maxToStart < minToStart) {
>       maxToStart = minToStart;
>     }
> ..
> ..
>     while (
>       !this.master.isStopped() &&
>         count < maxToStart &&
>         (lastCountChange+interval > now || timeout > slept || count < minToStart)
>       ){
> ..
> {code}
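The difference between the two loop conditions quoted in this issue description is easier to see when they are evaluated side by side. The sketch below copies both boolean expressions into pure functions; the class WaitConditionSketch, the method names, and the sample numbers are hypothetical and are not part of ServerManager. A result of true means "keep waiting".

```java
// Hypothetical sketch: the two while-conditions from the description as pure functions.
final class WaitConditionSketch {

    /** Condition quoted from trunk rev 1360470: gives up once the timeout has elapsed. */
    static boolean currentKeepWaiting(boolean stopped, long slept, long timeout,
                                      int count, int minToStart, int maxToStart,
                                      long lastCountChange, long interval, long now) {
        return !stopped
            && slept < timeout
            && count < maxToStart
            && (lastCountChange + interval > now || count < minToStart);
    }

    /** Condition proposed in the description: the timeout alone never ends the wait. */
    static boolean proposedKeepWaiting(boolean stopped, long slept, long timeout,
                                       int count, int minToStart, int maxToStart,
                                       long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now || timeout > slept || count < minToStart);
    }

    public static void main(String[] args) {
        // Made-up scenario: the timeout (4500 ms) has elapsed, but only 1 of the
        // 3 required region servers has checked in and none arrived recently.
        boolean current = currentKeepWaiting(false, 5000, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 5000);
        boolean proposed = proposedKeepWaiting(false, 5000, 4500, 1, 3,
                Integer.MAX_VALUE, 0, 1500, 5000);
        System.out.println("current condition keeps waiting:  " + current);
        System.out.println("proposed condition keeps waiting: " + proposed);
    }
}
```

In this scenario the current condition stops waiting because slept is no longer below timeout, while the proposed condition keeps waiting because count is still below minToStart, which is the behavior the issue asks for.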
[jira] [Commented] (HBASE-6430) Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418749#comment-13418749 ]

Mohammad Tariq Iqbal commented on HBASE-6430:
-

Thanks a lot for the support, Stack. I'll go through the link you provided. In case the attachment was ambiguous, I have made the following changes (I should have listed them beforehand, my bad):

1- Addition of the 'core-site.xml' file to show how to choose the value of the 'hbase.rootdir' property so that HMaster can contact the NameNode properly.
2- Modification of the /etc/hosts file to avoid the loopback problem (proper DNS resolution is very important for HBase to work properly).
3- Modification of the hbase-env.sh file to enable the use of HBase's ZooKeeper.
4- Addition of the 'hbase.cluster.distributed' and 'hbase.zookeeper.property.clientPort' properties in conf/hbase-site.xml.
5- Copying hadoop-core-*.jar and commons-collections-3.2.1.jar from the HADOOP_HOME/lib folder into the HBASE_HOME/lib folder to avoid any compatibility issues between Hadoop and HBase.

Apologies for my ignorance. Many thanks.

> Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
>
>
> Key: HBASE-6430
> URL: https://issues.apache.org/jira/browse/HBASE-6430
> Project: HBase
> Issue Type: Improvement
> Reporter: Mohammad Tariq Iqbal
> Priority: Minor
> Attachments: HBASE-6430.txt
>
>
> Quite often, newbies face some issues while configuring HBase in pseudo-distributed mode. I was no exception. I would like to propose some solutions for these problems which worked for me. If the community finds it appropriate, I would like to apply the patch for the same. This is the first time I am trying to do something like this, so please pardon me if I have put it in an inappropriate manner.
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418750#comment-13418750 ]

Aditya Kishore commented on HBASE-6389:
---

@Stack No, the current patch does not modify the way a live RS is evaluated, but it ensures that the dying RS's thread is actually dead before moving forward.

{quote}
What is the change below doing?
-conf.setInt("hbase.master.wait.on.regionservers.mintostart", numSlaves);
-conf.setInt("hbase.master.wait.on.regionservers.maxtostart", numSlaves);
+String count = String.valueOf(numSlaves);
+conf.setIfUnset("hbase.master.wait.on.regionservers.mintostart", count);
+conf.setIfUnset("hbase.master.wait.on.regionservers.maxtostart", count);
{quote}

This change preserves the values of 'mintostart' and 'maxtostart' in the configuration if the caller of HBaseTestingUtility.startMiniHBaseCluster(int, int) has already set them (which was the case in the TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS failure).

> Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
>
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.94.0, 0.96.0
> Reporter: Aditya Kishore
> Assignee: Aditya Kishore
> Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from the default of 1) can help prevent assignment of all regions to one (or a small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed from 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. > 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). >
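The setInt versus setIfUnset change explained in the comment at the top of this message comes down to "overwrite" versus "write only when absent". The sketch below models that with a plain java.util.Map standing in for the Hadoop Configuration object, so it runs without Hadoop on the classpath. The class SetIfUnsetSketch and its method names are hypothetical; only the property name comes from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the Hadoop Configuration object.
final class SetIfUnsetSketch {
    static final String MIN_TO_START = "hbase.master.wait.on.regionservers.mintostart";

    /** Models the old behavior: the caller's value is always overwritten. */
    static void applyOverwrite(Map<String, String> conf, int numSlaves) {
        conf.put(MIN_TO_START, String.valueOf(numSlaves));
    }

    /** Models the patched behavior: a value set by the caller is preserved. */
    static void applyIfUnset(Map<String, String> conf, int numSlaves) {
        conf.putIfAbsent(MIN_TO_START, String.valueOf(numSlaves));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MIN_TO_START, "1"); // a test sets its own value before starting the cluster

        applyIfUnset(conf, 3);
        System.out.println("after applyIfUnset:   " + conf.get(MIN_TO_START));

        applyOverwrite(conf, 3);
        System.out.println("after applyOverwrite: " + conf.get(MIN_TO_START));
    }
}
```

With the write-only-when-absent variant, a test that configures its own 'mintostart' before starting the mini cluster keeps that value, which is the effect the comment describes.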
[jira] [Commented] (HBASE-5985) TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0
[ https://issues.apache.org/jira/browse/HBASE-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418747#comment-13418747 ] Hudson commented on HBASE-5985: --- Integrated in HBase-0.94 #342 (See [https://builds.apache.org/job/HBase-0.94/342/]) HBASE-5985 TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0 (Revision 1363561) Result = SUCCESS jxiang : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigrationRemovingHTD.java > TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0 > - > > Key: HBASE-5985 > URL: https://issues.apache.org/jira/browse/HBASE-5985 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 0.96.0 >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > Fix For: 0.96.0, 0.94.1 > > Attachments: hbase-5985.patch > > > --- > Test set: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.448 sec <<< > FAILURE! > org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD Time elapsed: 0 > sec <<< ERROR! 
> java.io.IOException: Failed put; errcode=1 > at > org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.doFsCommand(TestMetaMigrationRemovingHTD.java:124) > at > org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.setUpBeforeClass(TestMetaMigrationRemovingHTD.java:80) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) > at org.junit.runners.ParentRunner.run(ParentRunner.java:300) > at > org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Ted Yu updated HBASE-6389: -- Attachment: org.apache.hadoop.hbase.TestZooKeeper-output.txt Here was the test output from yesterday. > Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. 
> 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.mintostart", 1); > int maxToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.maxtostart", > Integer.MAX_VALUE); > if (maxToStart < minToStart) { > maxToStart = minToStart; > } > .. > .. > while ( > !this.master.isStopped() && > count < maxToStart && > (lastCountChange+interval > now || timeout > slept || count < > minToStart) > ){ > .. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418744#comment-13418744 ] Aditya Kishore commented on HBASE-6389: --- Unfortunately, even after repeated attempts, I am not able to fail the test after applying the last patch. But I do have a theory. Could you please test the last patch once again with debug logging enabled and send me the log. > Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. 
> 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.mintostart", 1); > int maxToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.maxtostart", > Integer.MAX_VALUE); > if (maxToStart < minToStart) { > maxToStart = minToStart; > } > .. > .. > while ( > !this.master.isStopped() && > count < maxToStart && > (lastCountChange+interval > now || timeout > slept || count < > minToStart) > ){ > .. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdmini
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418742#comment-13418742 ] stack commented on HBASE-6389: --
@Aditya Thanks for the debugging that went into figuring out the above.
bq. The precondition of each test evaluates and ensures that a minimum of two region servers are online (by testing if their threads are "alive" and not by testing their ZK node or connecting to it).
Does this patch change how we evaluate "alive" regionservers? (If not, it should; given your debugging above, it seems like a good change for HTU.)
What is the change below doing?
{code}
-conf.setInt("hbase.master.wait.on.regionservers.mintostart", numSlaves);
-conf.setInt("hbase.master.wait.on.regionservers.maxtostart", numSlaves);
+String count = String.valueOf(numSlaves);
+conf.setIfUnset("hbase.master.wait.on.regionservers.mintostart", count);
+conf.setIfUnset("hbase.master.wait.on.regionservers.maxtostart", count);
{code}
Thanks.
> Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. > 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.mintostart", 1); > int maxToSta
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418738#comment-13418738 ] Zhihong Ted Yu commented on HBASE-6389: --- Thanks for your explanation. Have you seen the test failure that I described above @ 19/Jul/12 03:34 ? > Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. 
> 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-, > and I concur, this could have disastrous effect in large cluster especially > now that MSLAB is turned on. > To enforce the required quorum as specified by > "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, > these conditions need to be modified as following > {code:title=ServerManager.java} > .. > /** >* Wait for the region servers to report in. 
>* We will wait until one of this condition is met: >* - the master is stopped >* - the 'hbase.master.wait.on.regionservers.maxtostart' number of >*region servers is reached >* - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND >* there have been no new region server in for >* 'hbase.master.wait.on.regionservers.interval' time AND >* the 'hbase.master.wait.on.regionservers.timeout' is reached >* >* @throws InterruptedException >*/ > public void waitForRegionServers(MonitoredTask status) > .. > .. > int minToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.mintostart", 1); > int maxToStart = this.master.getConfiguration(). > getInt("hbase.master.wait.on.regionservers.maxtostart", > Integer.MAX_VALUE); > if (maxToStart < minToStart) { > maxToStart = minToStart; > } > .. > .. > while ( > !this.master.isStopped() && > count < maxToStart && > (lastCountChange+interval > now || timeout > slept || count < > minToStart) > ){ > .. > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
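The difference between the two loop conditions quoted in the description above can be checked with a small standalone sketch. This is not the ServerManager code itself: the class name WaitConditionSketch and both method names are hypothetical, the parameters simply mirror the local variables that appear in the quoted snippets, and the numbers in main are made up for illustration.

```java
/** Illustrative only: compares the two while-loop conditions quoted in HBASE-6389. */
public final class WaitConditionSketch {

  /** Condition quoted from trunk rev 1360470: reaching the timeout alone ends the wait. */
  public static boolean keepWaitingCurrent(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && slept < timeout
        && count < maxToStart
        && (lastCountChange + interval > now || count < minToStart);
  }

  /** Condition proposed in the description: mintostart is enforced regardless of the timeout. */
  public static boolean keepWaitingProposed(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        && (lastCountChange + interval > now || timeout > slept || count < minToStart);
  }

  public static void main(String[] args) {
    // Hypothetical scenario: the timeout has elapsed, but only 1 of the
    // required 3 region servers has checked in with the master.
    long timeout = 4500, slept = 5000, interval = 1500, lastCountChange = 0, now = 5000;
    int count = 1, minToStart = 3, maxToStart = Integer.MAX_VALUE;

    System.out.println("current keeps waiting:  " + keepWaitingCurrent(
        false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
    System.out.println("proposed keeps waiting: " + keepWaitingProposed(
        false, slept, timeout, count, minToStart, maxToStart, lastCountChange, interval, now));
  }
}
```

In this scenario the current condition stops waiting because `slept < timeout` is false, while the proposed condition keeps waiting because `count < minToStart` still holds. The flip side, under the same assumptions, is that the proposed loop never gives up on its own if too few region servers ever report in.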
[jira] [Commented] (HBASE-6430) Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418735#comment-13418735 ] stack commented on HBASE-6430: -- Thank you for helping out, Mohammad. You might want to give this a review: http://hbase.apache.org/book.html#submitting.patches It tries to help you with submitting patches. (It's hard to tell from what you have attached what has been changed, and it's also not a 'patch' file. Let us know if the doc is not sufficient and we'll help you out making a patch.) Good on you, Mohammad. > Few modifications in section 2.4.2.1 of Apache HBase Reference Guide > > > Key: HBASE-6430 > URL: https://issues.apache.org/jira/browse/HBASE-6430 > Project: HBase > Issue Type: Improvement >Reporter: Mohammad Tariq Iqbal >Priority: Minor > Attachments: HBASE-6430.txt > > > Quite often, newbies face some issues while configuring HBase in pseudo > distributed mode. I was no exception. I would like to propose some solutions > for these problems which worked for me. If the community finds it > appropriate, I would like to apply the patch for the same. This is the first > time I am trying to do something like this, so please pardon me if I have put > it in an inappropriate manner.
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments
[ https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418731#comment-13418731 ] Aditya Kishore commented on HBASE-6389: ---
@Ted The last patch addresses the exact issue listed at https://issues.apache.org/jira/browse/HBASE-6406?focusedCommentId=13417665&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13417665
What happened is that in most test runs, testRegionServerSessionExpired() gets launched before testMasterSessionExpired() or testMasterZKSessionRecoveryFailure(). testRegionServerSessionExpired() brings down one of the two region servers, but this RS is not yet dead by the time testMasterSessionExpired() or testMasterZKSessionRecoveryFailure() starts.
The precondition of each test evaluates and ensures that a minimum of two region servers are online (by testing if their threads are "alive" and not by testing their ZK node or connecting to it). So while one of the RSes is shutting itself down (and its thread is still alive), either testMasterSessionExpired() or testMasterZKSessionRecoveryFailure() can start because the test precondition is satisfied.
However, both of these test cases result in Master recovery and reinitialization, which checks for a quorum of 2 online region servers. Since only one region server is online at this point, the initialization fails with a timeout and the master is killed.
By this time the dying region server's thread is dead, and the precondition of the next test sees that it needs to create one region server. But since no master is running at this point, the newly created region server's run thread blocks in HRegionServer.blockAndCheckIfStopped() and the RS does not come online. As a result, the test thread that is waiting for the RS to come online keeps waiting, which is why you see the test hung in setup().
My last patch ensured that the dying RS is completely stopped before testRegionServerSessionExpired() completes so that the subsequent tests' precondition does not get fooled into thinking that the minimum server count is met and start the testcase. > Modify the conditions to ensure that Master waits for sufficient number of > Region Servers before starting region assignments > > > Key: HBASE-6389 > URL: https://issues.apache.org/jira/browse/HBASE-6389 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.94.0, 0.96.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 0.96.0, 0.94.2 > > Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, > HBASE-6389_trunk.patch > > > Continuing from HBASE-6375. > It seems I was mistaken in my assumption that changing the value of > "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from > default of 1) can help prevent assignment of all regions to one (or a small > number of) region server(s). > While this was the case in 0.90.x and 0.92.x, the behavior has changed in > 0.94.0 onwards to address HBASE-4993. > From 0.94.0 onwards, Master will proceed immediately after the timeout has > lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not > reached. > Reading the current conditions of waitForRegionServers() clarifies it > {code:title=ServerManager.java (trunk rev:1360470)} > > 581 /** > 582 * Wait for the region servers to report in. 
> 583 * We will wait until one of this condition is met: > 584 * - the master is stopped > 585 * - the 'hbase.master.wait.on.regionservers.timeout' is reached > 586 * - the 'hbase.master.wait.on.regionservers.maxtostart' number of > 587 *region servers is reached > 588 * - the 'hbase.master.wait.on.regionservers.mintostart' is reached > AND > 589 * there have been no new region server in for > 590 * 'hbase.master.wait.on.regionservers.interval' time > 591 * > 592 * @throws InterruptedException > 593 */ > 594 public void waitForRegionServers(MonitoredTask status) > 595 throws InterruptedException { > > > 612 while ( > 613 !this.master.isStopped() && > 614 slept < timeout && > 615 count < maxToStart && > 616 (lastCountChange+interval > now || count < minToStart) > 617 ){ > > {code} > So with the current conditions, the wait will end as soon as timeout is > reached even lesser number of RS have checked-in with the Master and the > master will proceed with the region assignment among these RSes alone. > As mentioned in > -[HBASE-4993|https:/
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418730#comment-13418730 ] Zhihong Ted Yu commented on HBASE-5547: --- I ran TestRegionServerCoprocessorExceptionWithAbort based on patch v16 and it passed. > Don't delete HFiles when in "backup mode" > - > > Key: HBASE-5547 > URL: https://issues.apache.org/jira/browse/HBASE-5547 > Project: HBase > Issue Type: New Feature >Reporter: Lars Hofhansl >Assignee: Jesse Yates > Fix For: 0.94.2 > > Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, > hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, > java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, > java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, > java_HBASE-5547_v7.patch > > > This came up in a discussion I had with Stack. > It would be nice if HBase could be notified that a backup is in progress (via > a znode for example) and in that case either: > 1. rename HFiles to be delete to .bck > 2. rename the HFiles into a special directory > 3. rename them to a general trash directory (which would not need to be tied > to backup mode). > That way it should be able to get a consistent backup based on HFiles (HDFS > snapshots or hard links would be better options here, but we do not have > those). > #1 makes cleanup a bit harder. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6393) Decouple audit event creation from storage in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HBASE-6393: -- Affects Version/s: 0.96.0 Status: Patch Available (was: Open) > Decouple audit event creation from storage in AccessController > -- > > Key: HBASE-6393 > URL: https://issues.apache.org/jira/browse/HBASE-6393 > Project: HBase > Issue Type: Brainstorming > Components: security >Affects Versions: 0.96.0 >Reporter: Marcelo Vanzin > Attachments: hbase-6393-v1.patch > > > Currently, AccessController takes care of both generating audit events (by > performing access checks) and storing them (by creating a log message and > writing it to the AUDITLOG logger). > This makes the logging system the only way to catch audit events. It means > that if someone wants to do something fancier (like writing these records to > a database somewhere), they need to hack through the logging system, and > parse the messages generated by AccessController, which is not optimal. > The attached patch decouples generation and storage by introducing a new > interface, used by AccessController, to log the audit events. The current, > log-based storage is kept in place so that current users won't be affected by > the change. > I'm filing this as an RFC at this point, so the patch is not totally clean; > it's on top of HBase 0.92 (which is easier for me to test) and doesn't have > any unit tests, for starters. But the changes should be very similar on trunk > - I don't remember changes in this particular area of the code between those > versions.
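The decoupling described in this issue can be pictured with a small sketch. It is not the code in hbase-6393-v1.patch: the names AuditSinkSketch, AuditEvent, AuditSink, LogAuditSink, InMemoryAuditSink and AccessChecker are all hypothetical, and java.util.logging stands in for whatever logger the real AUDITLOG uses. The only point is that the access check produces an event and hands it to an interface, so storage can be swapped without parsing log messages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

/** Illustrative only: separates creating an audit event from storing it. */
public final class AuditSinkSketch {

  /** Hypothetical event record; the fields are chosen for illustration. */
  public static final class AuditEvent {
    public final String user;
    public final String action;
    public final boolean allowed;

    public AuditEvent(String user, String action, boolean allowed) {
      this.user = user;
      this.action = action;
      this.allowed = allowed;
    }
  }

  /** Hypothetical pluggable storage for audit events. */
  public interface AuditSink {
    void record(AuditEvent event);
  }

  /** Log-based sink, standing in for the existing log-based behavior. */
  public static final class LogAuditSink implements AuditSink {
    private static final Logger AUDITLOG = Logger.getLogger("audit-sketch");

    @Override
    public void record(AuditEvent event) {
      AUDITLOG.info("user=" + event.user + " action=" + event.action
          + " allowed=" + event.allowed);
    }
  }

  /** Alternative sink that keeps events in memory, e.g. for a test or a custom store. */
  public static final class InMemoryAuditSink implements AuditSink {
    private final List<AuditEvent> events = new ArrayList<>();

    @Override
    public void record(AuditEvent event) {
      events.add(event);
    }

    public List<AuditEvent> events() {
      return Collections.unmodifiableList(events);
    }
  }

  /** Stand-in for the component that performs access checks. */
  public static final class AccessChecker {
    private final AuditSink sink;

    public AccessChecker(AuditSink sink) {
      this.sink = sink;
    }

    /** Records the outcome of a check through the sink and returns it unchanged. */
    public boolean check(String user, String action, boolean permitted) {
      sink.record(new AuditEvent(user, action, permitted));
      return permitted;
    }
  }

  public static void main(String[] args) {
    InMemoryAuditSink sink = new InMemoryAuditSink();
    new AccessChecker(sink).check("alice", "scan", false);
    System.out.println("recorded events: " + sink.events().size());
  }
}
```

Passing a LogAuditSink instead of the in-memory one would keep the log-based behavior, which is how a default could stay in place for existing users.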
[jira] [Updated] (HBASE-6393) Decouple audit event creation from storage in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HBASE-6393: -- Attachment: hbase-6393-v1.patch Patch against trunk, modeled after a similar change in HDFS (HDFS-3680). > Decouple audit event creation from storage in AccessController > -- > > Key: HBASE-6393 > URL: https://issues.apache.org/jira/browse/HBASE-6393 > Project: HBase > Issue Type: Brainstorming > Components: security >Affects Versions: 0.96.0 >Reporter: Marcelo Vanzin > Attachments: hbase-6393-v1.patch > > > Currently, AccessController takes care of both generating audit events (by > performing access checks) and storing them (by creating a log message and > writing it to the AUDITLOG logger). > This makes the logging system the only way to catch audit events. It means > that if someone wants to do something fancier (like writing these records to > a database somewhere), they need to hack through the logging system, and > parse the messages generated by AccessController, which is not optimal. > The attached patch decouples generation and storage by introducing a new > interface, used by AccessController, to log the audit events. The current, > log-based storage is kept in place so that current users won't be affected by > the change. > I'm filing this as an RFC at this point, so the patch is not totally clean; > it's on top of HBase 0.92 (which is easier for me to test) and doesn't have > any unit tests, for starters. But the changes should be very similar on trunk > - I don't remember changes in this particular area of the code between those > versions.
[jira] [Updated] (HBASE-6393) Decouple audit event creation from storage in AccessController
[ https://issues.apache.org/jira/browse/HBASE-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HBASE-6393: -- Attachment: (was: accesslogger-v1.patch) > Decouple audit event creation from storage in AccessController > -- > > Key: HBASE-6393 > URL: https://issues.apache.org/jira/browse/HBASE-6393 > Project: HBase > Issue Type: Brainstorming > Components: security >Reporter: Marcelo Vanzin > > Currently, AccessController takes care of both generating audit events (by > performing access checks) and storing them (by creating a log message and > writing it to the AUDITLOG logger). > This makes the logging system the only way to catch audit events. It means > that if someone wants to do something fancier (like writing these records to > a database somewhere), they need to hack through the logging system, and > parse the messages generated by AccessController, which is not optimal. > The attached patch decouples generation and storage by introducing a new > interface, used by AccessController, to log the audit events. The current, > log-based storage is kept in place so that current users won't be affected by > the change. > I'm filing this as an RFC at this point, so the patch is not totally clean; > it's on top of HBase 0.92 (which is easier for me to test) and doesn't have > any unit tests, for starters. But the changes should be very similar on trunk > - I don't remember changes in this particular area of the code between those > versions.
[jira] [Updated] (HBASE-2877) Unnecessary byte written when serializing a Writable RPC parameter
[ https://issues.apache.org/jira/browse/HBASE-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoit Sigoure updated HBASE-2877: -- Affects Version/s: 0.90.0 0.90.1 0.90.2 0.90.3 0.90.4 0.90.5 0.90.6 0.92.0 0.92.1 0.94.0 > Unnecessary byte written when serializing a Writable RPC parameter > -- > > Key: HBASE-2877 > URL: https://issues.apache.org/jira/browse/HBASE-2877 > Project: HBase > Issue Type: Bug > Components: ipc >Affects Versions: 0.20.5, 0.89.20100621, 0.90.0, 0.90.1, 0.90.2, 0.90.3, > 0.90.4, 0.90.5, 0.90.6, 0.92.0, 0.92.1, 0.94.0 >Reporter: Benoit Sigoure >Priority: Minor > > When {{HbaseObjectWritable#writeObject}} serializes a {{Writable}} RPC > parameter, it writes its "class code" twice to the wire. {{writeClassCode}} > is already called once unconditionally at the beginning of the method, and > for {{Writable}} arguments, it's called a second time towards the end of the > method. It seems that the code is trying to deal with the "declared type" > vs. "actual type" of a parameter. The Hadoop RPC code was already doing this > before Stack changed it to use codes in r608738 for HADOOP-2519. It's not > documented when this is useful though, and I couldn't find any use case. > Every RPC I've seen so far just ends up with the same byte sent twice to the > wire. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
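The redundant byte described in this issue can be illustrated with a toy serializer. This is not HbaseObjectWritable: the class ClassCodeSketch, its serialize method and the code value 42 are hypothetical, and the sketch only assumes what the description states, namely that one class code is written unconditionally and a second one is written for Writable parameters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative only: shows the cost of writing a class code twice. */
public final class ClassCodeSketch {

  /**
   * Writes a one-byte code for the declared type, optionally a second
   * one-byte code for the actual type, then the payload bytes.
   */
  public static byte[] serialize(byte declaredCode, byte actualCode,
      byte[] payload, boolean writeActualCode) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeByte(declaredCode);   // written unconditionally at the start
      if (writeActualCode) {
        out.writeByte(actualCode);   // second write for Writable parameters
      }
      out.write(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) {
    byte code = 42;                  // hypothetical class code
    byte[] payload = {1, 2, 3};
    // When declared and actual types match, the second code repeats the first.
    byte[] twice = serialize(code, code, payload, true);
    byte[] once = serialize(code, code, payload, false);
    System.out.println("bytes with both codes: " + twice.length);
    System.out.println("bytes with one code:   " + once.length);
  }
}
```

Dropping the second write would change the wire format, so, under the assumptions above, both ends of the RPC would have to agree on it.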
[jira] [Updated] (HBASE-6430) Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Tariq Iqbal updated HBASE-6430: Attachment: HBASE-6430.txt Please have a look at the attachment and let me know if it requires any modification or if it is not eligible to be submitted. Many thanks. > Few modifications in section 2.4.2.1 of Apache HBase Reference Guide > > > Key: HBASE-6430 > URL: https://issues.apache.org/jira/browse/HBASE-6430 > Project: HBase > Issue Type: Improvement >Reporter: Mohammad Tariq Iqbal >Priority: Minor > Attachments: HBASE-6430.txt > > > Quite often, newbies face some issues while configuring HBase in pseudo > distributed mode. I was no exception. I would like to propose some solutions > for these problems which worked for me. If the community finds it > appropriate, I would like to apply the patch for the same. This is the first > time I am trying to do something like this, so please pardon me if I have put > it in an inappropriate manner.
[jira] [Updated] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5966: - Fix Version/s: 0.94.1 Discussed with Jimmy. Let's have this in 0.94.1 > MapReduce based tests broken on Hadoop 2.0.0-alpha > -- > > Key: HBASE-5966 > URL: https://issues.apache.org/jira/browse/HBASE-5966 > Project: HBase > Issue Type: Bug > Components: mapred, mapreduce, test >Affects Versions: 0.94.0, 0.96.0 > Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, > Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) >Reporter: Andrew Purtell >Assignee: Jimmy Xiang > Fix For: 0.96.0, 0.94.1 > > Attachments: HBASE-5966-1.patch, HBASE-5966.patch, hbase-5966.patch > > > Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test > rigging. Below is a representative error, can be easily reproduced with: > {noformat} > mvn -PlocalTests -Psecurity \ > -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \ > clean test \ > -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > {noformat} > And the result: > {noformat} > --- > T E S T S > --- > Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > --- > Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) > Time elapsed: 21.935 sec <<< ERROR! 
> java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) > at > org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183) > at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:416) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) > at 
org.junit.rules.RunRules.evaluate(RunRules.java:18) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) >
[jira] [Reopened] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reopened HBASE-5966: -- > MapReduce based tests broken on Hadoop 2.0.0-alpha > -- > > Key: HBASE-5966 > URL: https://issues.apache.org/jira/browse/HBASE-5966 > Project: HBase > Issue Type: Bug > Components: mapred, mapreduce, test >Affects Versions: 0.94.0, 0.96.0 > Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, > Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64) >Reporter: Andrew Purtell >Assignee: Jimmy Xiang > Fix For: 0.96.0, 0.94.1 > > Attachments: HBASE-5966-1.patch, HBASE-5966.patch, hbase-5966.patch > > > Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test > rigging. Below is a representative error, can be easily reproduced with: > {noformat} > mvn -PlocalTests -Psecurity \ > -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \ > clean test \ > -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > {noformat} > And the result: > {noformat} > --- > T E S T S > --- > Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > --- > Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec > <<< FAILURE! > testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce) > Time elapsed: 21.935 sec <<< ERROR! 
> java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) > at > org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183) > at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:416) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223) > at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151) > at > org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) > at 
org.junit.rules.RunRules.evaluate(RunRules.java:18) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:3
[jira] [Updated] (HBASE-5985) TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0
[ https://issues.apache.org/jira/browse/HBASE-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5985: --- Fix Version/s: 0.94.1 > TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0 > - > > Key: HBASE-5985 > URL: https://issues.apache.org/jira/browse/HBASE-5985 > Project: HBase > Issue Type: Test > Components: test >Affects Versions: 0.96.0 >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > Fix For: 0.96.0, 0.94.1 > > Attachments: hbase-5985.patch > > > --- > Test set: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.448 sec <<< > FAILURE! > org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD Time elapsed: 0 > sec <<< ERROR! > java.io.IOException: Failed put; errcode=1 > at > org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.doFsCommand(TestMetaMigrationRemovingHTD.java:124) > at > org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.setUpBeforeClass(TestMetaMigrationRemovingHTD.java:80) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) > at org.junit.runners.ParentRunner.run(ParentRunner.java:300) > at > org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive
[ https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6325: - Fix Version/s: (was: 0.94.2) 0.94.1 +1 on patch > [replication] Race in ReplicationSourceManager.init can initiate a failover > even if the node is alive > - > > Key: HBASE-6325 > URL: https://issues.apache.org/jira/browse/HBASE-6325 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.6, 0.92.1, 0.94.0 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.92.2, 0.96.0, 0.94.1, 0.90.8 > > Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch > > > Yet another bug found during the leap second madness, it's possible to miss > the registration of new region servers so that in > ReplicationSourceManager.init we start the failover of a live and replicating > region server. I don't think there's data loss but the RS that's being failed > over will die on: > {noformat} > 2012-07-01 06:25:15,604 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server > sv4r23s48,10304,1341112194623: Writing replication status > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for > /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369 > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697) > at > org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470) > at > 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368) > {noformat} > It seems to me that just refreshing {{otherRegionServers}} after getting the > list of {{currentReplicators}} would be enough to fix this.
[jira] [Updated] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock
[ https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6319: - Fix Version/s: (was: 0.94.2) 0.94.1 +1 on patch. > ReplicationSource can call terminate on itself and deadlock > --- > > Key: HBASE-6319 > URL: https://issues.apache.org/jira/browse/HBASE-6319 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.6, 0.92.1, 0.94.0 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.92.2, 0.94.1, 0.90.8 > > Attachments: HBASE-6319-0.92.patch > > > In a few places the ReplicationSource code calls terminate() on itself, which > is a problem because terminate() waits for that thread to die.
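The failure mode described in HBASE-6319 is a thread waiting for its own death. The following is a minimal sketch of one common guard against it, not the attached patch and not HBase code: the class `SelfTerminatingWorker`, its fields, and the `demo()` helper are all hypothetical names made up for illustration.

```java
// Hypothetical illustration only; this is not ReplicationSource.
class SelfTerminatingWorker extends Thread {
    private volatile boolean stopRequested = false;
    private volatile boolean selfJoinSkipped = false;

    @Override
    public void run() {
        // Simulate an error path inside the worker that shuts the worker down.
        terminate();
    }

    void terminate() {
        stopRequested = true;
        if (Thread.currentThread() == this) {
            // join() on ourselves would wait for this thread to die,
            // which cannot happen while it is blocked in join().
            selfJoinSkipped = true;
            return;
        }
        try {
            join(); // Safe: the caller is some other thread.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts a worker that terminates itself and reports whether it exited.
    static boolean demo() {
        SelfTerminatingWorker worker = new SelfTerminatingWorker();
        worker.start();
        try {
            worker.join(5000); // Bounded wait so a regression cannot hang the caller.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && worker.selfJoinSkipped && worker.stopRequested;
    }

    public static void main(String[] args) {
        System.out.println("worker exited cleanly: " + demo());
    }
}
```

Checking `Thread.currentThread() == this` before joining is only one possible design; another is to have the worker request shutdown and let a different thread do the waiting.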
[jira] [Updated] (HBASE-6312) Make BlockCache eviction thresholds configurable
[ https://issues.apache.org/jira/browse/HBASE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-6312: -- Resolution: Fixed Release Note: From now on, the block cache will use nearly all the memory it is given, because its upper bound was raised from 85% to 99%. The lower bound for evictions, called "minimum factor", was raised from 75% to 95% and is now configurable via "hbase.lru.blockcache.min.factor". This means that 4% of the block cache is evicted at a time instead of 10%, so evictions may run more often but each will be less disruptive. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Closed the jira and added a release note. > Make BlockCache eviction thresholds configurable > > > Key: HBASE-6312 > URL: https://issues.apache.org/jira/browse/HBASE-6312 > Project: HBase > Issue Type: Improvement > Components: io >Affects Versions: 0.94.0 >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Fix For: 0.96.0 > > Attachments: hbase-6312.patch, hbase-6312_v2.patch, > hbase-6312_v3.patch > > > Some of our customers found that tuning the BlockCache eviction thresholds > made test results different in their test environment. However, those > thresholds are not configurable in the current implementation. The only way > to change those values is to re-compile the HBase source code. We wonder if > it is possible to make them configurable.
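The 10% and 4% figures in the release note follow directly from the gap between the upper and lower bounds. The following is a minimal sketch of that arithmetic, not HBase code: the class `EvictionMath` and its method are hypothetical, and it assumes an eviction starts when usage reaches the upper bound and frees blocks until usage falls to the lower bound ("minimum factor").

```java
// Hypothetical helper for illustration; not part of HBase.
class EvictionMath {
    // Amount freed by one eviction run, in the same unit as capacity,
    // assuming eviction runs from the upper bound down to the lower bound.
    static long evictedPerRun(long capacity, int upperPercent, int lowerPercent) {
        long upper = capacity * upperPercent / 100;
        long lower = capacity * lowerPercent / 100;
        return upper - lower;
    }

    public static void main(String[] args) {
        long capacity = 1000; // e.g. a 1000 MB block cache
        System.out.println("old bounds (85%/75%): " + evictedPerRun(capacity, 85, 75));
        System.out.println("new bounds (99%/95%): " + evictedPerRun(capacity, 99, 95));
    }
}
```

For a 1000 MB cache this gives 100 MB per eviction under the old bounds and 40 MB under the new ones, i.e. 10% versus 4% of the cache.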
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418707#comment-13418707 ] Jesse Yates commented on HBASE-5547: Assuming changes between v15 and v16 are just what Ted mentioned in his last post on RB, then I'm good. Let's give it a day or so before we integrate, so people have time to look at RB if they haven't yet. Failed test doesn't apply to this code. > Don't delete HFiles when in "backup mode" > - > > Key: HBASE-5547 > URL: https://issues.apache.org/jira/browse/HBASE-5547 > Project: HBase > Issue Type: New Feature >Reporter: Lars Hofhansl >Assignee: Jesse Yates > Fix For: 0.94.2 > > Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, > hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, > java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, > java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, > java_HBASE-5547_v7.patch > > > This came up in a discussion I had with Stack. > It would be nice if HBase could be notified that a backup is in progress (via > a znode for example) and in that case either: > 1. rename HFiles to be deleted to .bck > 2. rename the HFiles into a special directory > 3. rename them to a general trash directory (which would not need to be tied > to backup mode). > That way it should be able to get a consistent backup based on HFiles (HDFS > snapshots or hard links would be better options here, but we do not have > those). > #1 makes cleanup a bit harder.
[jira] [Commented] (HBASE-4956) Control direct memory buffer consumption by HBaseClient
[ https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418705#comment-13418705 ] Hudson commented on HBASE-4956: --- Integrated in HBase-0.94 #341 (See [https://builds.apache.org/job/HBase-0.94/341/]) HBASE-4956 Control direct memory buffer consumption by HBaseClient (Bob Copeland) (Revision 1363533) Result = SUCCESS tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java > Control direct memory buffer consumption by HBaseClient > --- > > Key: HBASE-4956 > URL: https://issues.apache.org/jira/browse/HBASE-4956 > Project: HBase > Issue Type: New Feature >Reporter: Ted Yu >Assignee: Bob Copeland > Fix For: 0.96.0, 0.94.1 > > Attachments: 4956.txt, thread_get.rb > > > As Jonathan explained here > https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1 > , the standard HBase client inadvertently consumes a large amount of direct memory. > We should consider using netty for NIO-related tasks.
[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"
[ https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418690#comment-13418690 ] Hadoop QA commented on HBASE-5547: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537240/5547-v16.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 22 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 13 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2412//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2412//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2412//console This message is 
automatically generated. > Don't delete HFiles when in "backup mode" > - > > Key: HBASE-5547 > URL: https://issues.apache.org/jira/browse/HBASE-5547 > Project: HBase > Issue Type: New Feature >Reporter: Lars Hofhansl >Assignee: Jesse Yates > Fix For: 0.94.2 > > Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, > hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, > java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, > java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, > java_HBASE-5547_v7.patch > > > This came up in a discussion I had with Stack. > It would be nice if HBase could be notified that a backup is in progress (via > a znode for example) and in that case either: > 1. rename HFiles to be deleted to .bck > 2. rename the HFiles into a special directory > 3. rename them to a general trash directory (which would not need to be tied > to backup mode). > That way it should be able to get a consistent backup based on HFiles (HDFS > snapshots or hard links would be better options here, but we do not have > those). > #1 makes cleanup a bit harder.