[jira] [Updated] (HBASE-6049) Serializing List containing null elements will cause NullPointerException in HbaseObjectWritable.writeObject()
[ https://issues.apache.org/jira/browse/HBASE-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maryann Xue updated HBASE-6049:
-------------------------------
    Attachment: HBASE-6049-v2.patch

@Zhihong updated the patch with a modification to the test case. How does this look?

Serializing List containing null elements will cause NullPointerException in HbaseObjectWritable.writeObject()
--------------------------------------------------------------------------------------------------------------
                 Key: HBASE-6049
                 URL: https://issues.apache.org/jira/browse/HBASE-6049
             Project: HBase
          Issue Type: Bug
          Components: io
    Affects Versions: 0.94.0
            Reporter: Maryann Xue
         Attachments: HBASE-6049-v2.patch, HBASE-6049.patch

An error case occurs in the coprocessor AggregationClient: the median() function handles an empty region and returns a List object whose first element is null. The NPE then occurs in the RPC response stage and the response never gets sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
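The NPE arises because the serializer assumes every list element is non-null. A minimal, self-contained sketch of the usual fix (this is an illustration, not HBase's actual HbaseObjectWritable code, and it uses String elements for simplicity): write a per-element null marker before each value, and read it back on deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullSafeListIO {

    // Write each element preceded by a boolean null marker, so null entries
    // survive the round trip instead of triggering an NPE in the writer.
    static void writeList(DataOutput out, List<String> list) throws IOException {
        out.writeInt(list.size());
        for (String e : list) {
            out.writeBoolean(e != null);    // null marker
            if (e != null) {
                out.writeUTF(e);
            }
        }
    }

    static List<String> readList(DataInput in) throws IOException {
        int n = in.readInt();
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(in.readBoolean() ? in.readUTF() : null);
        }
        return list;
    }

    public static void main(String[] args) throws IOException {
        // Mirrors the AggregationClient case: first element is null.
        List<String> original = Arrays.asList(null, "median");
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeList(new DataOutputStream(buf), original);
        List<String> copy =
            readList(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(original.equals(copy)); // true: null element preserved
    }
}
```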
[jira] [Updated] (HBASE-6059) Replaying recovered edits would make deleted data exist again
[ https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-6059:
--------------------------------
    Attachment: HBASE-6059-testcase.patch

I have written a test case to reproduce the issue.

Replaying recovered edits would make deleted data exist again
-------------------------------------------------------------
                 Key: HBASE-6059
                 URL: https://issues.apache.org/jira/browse/HBASE-6059
             Project: HBase
          Issue Type: Bug
          Components: regionserver
            Reporter: chunhui shen
            Assignee: chunhui shen
         Attachments: HBASE-6059-testcase.patch

When we replay recovered edits we use the minSeqId of the Store, which may cause deleted data to appear again. Here is how it happens. Suppose a region with two families (cf1, cf2):
1. Put one row into the region (put r1,cf1:q1,v1).
2. Move the region from server A to server B.
3. Delete the data put in step 1 (delete r1).
4. Flush this region.
5. Major-compact this region.
6. Move the region from server B back to server A.
7. Abort server A.
8. After the region comes back online, we can read the deleted data (r1,cf1:q1,v1) again.
(When we replay recovered edits we use the minSeqId of the Store; because cf2 has no store files its seqId is 0, so the edit log of the put is replayed into the region.)
[jira] [Updated] (HBASE-6059) Replaying recovered edits would make deleted data exist again
[ https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-6059:
--------------------------------
    Attachment: HBASE-6059.patch

In the solution patch, I use a Map<byte[], Long> maxSeqIdInStores to save each store's maxSeqId. When replaying edit logs, we then skip the edits of each store according to its own maxSeqId.
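A toy illustration of the per-store maxSeqId idea. This is a simplified sketch, not the patch itself: it uses String family names and plain Java collections where the real code uses Map<byte[], Long> and actual WAL edit types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReplayEditsSketch {

    // Hypothetical WAL edit: which store (column family) it touches,
    // its sequence id, and a description of the mutation.
    record Edit(String family, long seqId, String mutation) {}

    // Replay only edits newer than the *per-store* max sequence id,
    // mirroring the maxSeqIdInStores map, instead of comparing every
    // edit against one region-wide minSeqId.
    static List<Edit> replay(List<Edit> log, Map<String, Long> maxSeqIdInStores) {
        List<Edit> applied = new ArrayList<>();
        for (Edit e : log) {
            long maxSeqId = maxSeqIdInStores.getOrDefault(e.family(), 0L);
            if (e.seqId() > maxSeqId) {   // edits at or below maxSeqId are already persisted
                applied.add(e);
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        // cf1 was flushed through seqId 5 (the delete of r1 is persisted);
        // cf2 has no store files, so its maxSeqId is 0.
        Map<String, Long> maxSeqIdInStores = Map.of("cf1", 5L, "cf2", 0L);
        List<Edit> log = List.of(
            new Edit("cf1", 3, "put r1"),   // older than cf1's maxSeqId: skipped
            new Edit("cf2", 7, "put r9"));  // newer than cf2's maxSeqId: replayed
        System.out.println(replay(log, maxSeqIdInStores).size()); // 1
    }
}
```

With a single region-wide minSeqId (here 0, because of cf2), the stale "put r1" would have been replayed and the deleted row would reappear.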
[jira] [Commented] (HBASE-6049) Serializing List containing null elements will cause NullPointerException in HbaseObjectWritable.writeObject()
[ https://issues.apache.org/jira/browse/HBASE-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280038#comment-13280038 ]

Hadoop QA commented on HBASE-6049:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12528398/HBASE-6049-v2.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.io.TestHbaseObjectWritable

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1943//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1943//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1943//console

This message is automatically generated.
[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again
[ https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280077#comment-13280077 ]

ramkrishna.s.vasudevan commented on HBASE-6059:
-----------------------------------------------

@Chunhui
This is a damn good one. But I still see one problem here, of a similar type to the one you reported. Please correct me if I am wrong. In your test case, at the point where you delete the row 'r1', suppose I delete the row 'r2' as well:
{code}
del = new Delete(Bytes.toBytes(r));
htable.delete(del);
resultScanner = htable.getScanner(new Scan());
count = 0;
while (resultScanner.next() != null) {
  count++;
}
{code}
Now the seqId from the store files will still be 0, because nothing is left after the major compaction, so the same problem occurs. I tried to simulate this with the test case you added. Maybe we need some other way to know that the edit has been removed by a major compaction? Because as far as I can see, without a major compaction there is no issue at all.
[jira] [Comment Edited] (HBASE-6059) Replaying recovered edits would make deleted data exist again
[ https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280077#comment-13280077 ]

ramkrishna.s.vasudevan edited comment on HBASE-6059 at 5/21/12 10:59 AM:
-------------------------------------------------------------------------

@Chunhui
This is a damn good one. But I still see one problem here, of a similar type to the one you reported. Please correct me if I am wrong. In your test case, at the point where you delete the row 'r1', suppose I delete the row 'r' as well:
{code}
del = new Delete(Bytes.toBytes(r));
htable.delete(del);
resultScanner = htable.getScanner(new Scan());
count = 0;
while (resultScanner.next() != null) {
  count++;
}
{code}
Now the seqId from the store files will still be 0, because nothing is left after the major compaction, so the same problem occurs. I tried to simulate this with the test case you added. Maybe we need some other way to know that the edit has been removed by a major compaction? Because as far as I can see, without a major compaction there is no issue at all.

(The edit changed "if i delete the row 'r2' also" to "if i delete the row 'r' also".)
[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again
[ https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280101#comment-13280101 ]

chunhui shen commented on HBASE-6059:
-------------------------------------

@ram
Yes, I have also considered the case where all the entries in the store file are deleted and we don't write any new store file. But could we generate an empty store file carrying only its metadata? Let me give it a try first.
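The idea of an empty store file that carries only metadata can be sketched as follows. This is a hypothetical, heavily simplified model (a map standing in for store-file metadata, String family names, no real HFile writing): the point is that the flush records the store's max sequence id unconditionally, even when every cell was deleted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmptyStoreFileSketch {

    // Stand-in for store-file metadata: family -> max sequence id seen.
    // In HBase this would live in the HFile's file-info block.
    static final Map<String, Long> storeFileMaxSeqId = new HashMap<>();

    // Flush: even if the set of cells to write is empty (everything was
    // deleted before the major compaction), still persist the seqId the
    // store has seen, so recovered edits up to it can be skipped on restart.
    static void flush(String family, List<String> cells, long flushSeqId) {
        // A real implementation would write the cells here when non-empty;
        // the metadata is written in either case.
        storeFileMaxSeqId.put(family, flushSeqId);
    }

    public static void main(String[] args) {
        flush("cf1", Collections.emptyList(), 9L); // nothing left after the delete
        // On restart, any recovered edit for cf1 with seqId <= 9 would be skipped.
        System.out.println(storeFileMaxSeqId.get("cf1")); // 9
    }
}
```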
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280157#comment-13280157 ]

Zhihong Yu commented on HBASE-5757:
-----------------------------------

@Jan:
Neither patch applies to trunk as of today. Can you attach a patch for trunk and name it accordingly?
Thanks

TableInputFormat should handle as many errors as possible
---------------------------------------------------------
                 Key: HBASE-5757
                 URL: https://issues.apache.org/jira/browse/HBASE-5757
             Project: HBase
          Issue Type: Bug
          Components: mapred, mapreduce
    Affects Versions: 0.90.6
            Reporter: Jan Lukavsky
         Attachments: HBASE-5757.patch, HBASE-5757.patch

Prior to HBASE-4196 the mapred and mapreduce APIs handled IOExceptions thrown from the scanner differently. The patch for HBASE-4196 unified this handling so that if an exception is caught, a reconnect is attempted (without bothering the mapred client). After that, HBASE-4269 changed this behavior back, but in both the mapred and mapreduce APIs. The question is: is there any reason not to handle all errors that the input format can handle? In other words, why not try to reissue the request after *any* IOException? I see the following disadvantages of the current approach:
* the client may see exceptions like LeaseException and ScannerTimeoutException if it fails to process all fetched data within the timeout
* to avoid ScannerTimeoutException the client must raise hbase.regionserver.lease.period
* timeouts for tasks are already configured in mapred.task.timeout, so this seems a bit redundant, because typically one needs to update both of these parameters
* I don't see any possibility of getting rid of LeaseException (this is configured on the server side)
I think all of these issues would be gone if the DoNotRetryIOException were not rethrown.
-On the other hand, handling errors in the InputFormat has the disadvantage that it may hide some inefficiency from the user. E.g. if I have a very big scanner.caching and manage to process only a few rows within the timeout, I will end up with a single row being fetched many times (and will not be explicitly notified about this). Could we solve this problem by adding some counter to the InputFormat?-
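The behaviour the issue proposes, retrying after any IOException instead of surfacing it to the mapred client, can be sketched as below. This is a simplified illustration, not the actual TableRecordReader code; the scanner class and retry count are invented for the example, and a real reader would also reopen the scanner at the last seen row before retrying.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

public class RetryingReaderSketch {

    // Stand-in for a region scanner that throws once (e.g. a lease
    // timeout after slow client-side processing) before serving rows.
    static class FlakyScanner {
        private final Iterator<String> rows;
        private boolean failedOnce = false;

        FlakyScanner(List<String> data) { this.rows = data.iterator(); }

        String next() throws IOException {
            if (!failedOnce) {
                failedOnce = true;
                throw new IOException("lease expired");
            }
            return rows.hasNext() ? rows.next() : null;
        }
    }

    // Retry after *any* IOException rather than rethrowing it to the
    // mapred client; only give up after maxRetries failed attempts.
    static String nextWithRetry(FlakyScanner scanner, int maxRetries) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return scanner.next();
            } catch (IOException e) {
                last = e; // a real reader would log this and restart the scanner here
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        FlakyScanner scanner = new FlakyScanner(List.of("row1"));
        System.out.println(nextWithRetry(scanner, 2)); // row1, despite one failure
    }
}
```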
[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Lukavsky updated HBASE-5757:
--------------------------------
    Attachment: HBASE-5757-trunk-r1341041.patch

There was a conflicting commit from the patch for HBASE-6004. I merged this patch; the new one should apply to revision 1341041.
[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5757:
------------------------------
    Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280240#comment-13280240 ]

Hadoop QA commented on HBASE-5757:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12528434/HBASE-5757-trunk-r1341041.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.coprocessor.TestClassLoading
        org.apache.hadoop.hbase.replication.TestReplication
        org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
        org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
        org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1944//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1944//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1944//console

This message is automatically generated.
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280249#comment-13280249 ]

Zhihong Yu commented on HBASE-5757:
-----------------------------------

I ran the following two tests and they passed with the latest patch:
{code}
518 mt -Dtest=TestClassLoading
519 mt -Dtest=TestSplitTransactionOnCluster
{code}
The replication tests have been failing and are not related to this change.

Minor comment:
{code}
+// try to handle exceptions all possible exceptions by restarting
{code}
The first 'exceptions' should be removed.
[jira] [Resolved] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor
[ https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan resolved HBASE-5882.
-------------------------------------------
    Resolution: Fixed

Committed the patch, hence resolving this.

Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor
-----------------------------------------------------------------------------------------------------------------------------------------
                 Key: HBASE-5882
                 URL: https://issues.apache.org/jira/browse/HBASE-5882
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.90.6, 0.92.1, 0.94.0
            Reporter: ramkrishna.s.vasudevan
            Assignee: Ashutosh Jindal
             Fix For: 0.96.0, 0.94.1
         Attachments: HBASE-5882_v5.patch, HBASE-5882_v6.patch, hbase_5882.patch, hbase_5882_V2.patch, hbase_5882_V3.patch, hbase_5882_V4.patch

Currently, if the master tries to process regions in transition (RIT) on restart and finds a region on a dead server, it avoids a new assignment so that the timeout monitor can take care of it. This case is most prominent when the node is found in the RS_ZK_REGION_OPENING state. I think we can handle this by triggering a new assignment with a new plan.
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280285#comment-13280285 ] Hadoop QA commented on HBASE-5757: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528448/5757-trunk-v2.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestClassLoading org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.replication.TestMultiSlaveReplication org.apache.hadoop.hbase.regionserver.wal.TestHLog org.apache.hadoop.hbase.replication.TestMasterReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1945//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1945//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1945//console This message is automatically generated. 
TableInputFormat should handle as many errors as possible - Key: HBASE-5757 URL: https://issues.apache.org/jira/browse/HBASE-5757 Project: HBase Issue Type: Bug Components: mapred, mapreduce Affects Versions: 0.90.6 Reporter: Jan Lukavsky Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch Prior to HBASE-4196 there was different handling of IOExceptions thrown from scanner in mapred and mapreduce API. The patch to HBASE-4196 unified this handling so that if exception is caught a reconnect is attempted (without bothering the mapred client). After that, HBASE-4269 changed this behavior back, but in both mapred and mapreduce APIs. The question is, is there any reason not to handle all errors that the input format can handle? In other words, why not try to reissue the request after *any* IOException? I see the following disadvantages of current approach * the client may see exceptions like LeaseException and ScannerTimeoutException if he fails to process all fetched data in timeout * to avoid ScannerTimeoutException the client must raise hbase.regionserver.lease.period * timeouts for tasks is aready configured in mapred.task.timeout, so this seems to me a bit redundant, because typically one needs to update both these parameters * I don't see any possibility to get rid of LeaseException (this is configured on server side) I think all of these issues would be gone, if the DoNotRetryIOException would not be rethrown. -On the other hand, handling errors in InputFormat has disadvantage, that it may hide from the user some inefficiency. Eg. if I have very big scanner.caching, and I manage to process only a few rows in timeout, I will end up with single row being fetched many times (and will not be explicitly notified about this). Could we solve this problem by adding some counter to the InputFormat?- -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
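The retry-after-any-IOException behavior the report asks for can be sketched as a reader that reopens its scanner once and reissues the request. This is an illustrative toy under assumed names (RowSource, RowSourceFactory, restart), not the actual TableRecordReaderImpl code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative toy, not TableRecordReaderImpl: on any IOException the
// reader reopens its "scanner" once, restarting after the last row it
// returned, so lease/scanner timeouts never reach the mapred client.
public class RetryingReader {
    interface RowSource { String next() throws IOException; }          // null = end of scan
    interface RowSourceFactory { RowSource restart(String afterRow) throws IOException; }

    private final RowSourceFactory factory;
    private RowSource source;
    private String lastRow;

    RetryingReader(RowSourceFactory f) throws IOException {
        factory = f;
        source = f.restart(null);
    }

    String next() throws IOException {
        try {
            lastRow = source.next();
        } catch (IOException e) {                  // e.g. LeaseException, ScannerTimeoutException
            source = factory.restart(lastRow);     // reopen, skipping rows already returned
            lastRow = source.next();               // reissue the request once
        }
        return lastRow;
    }

    // Demo: a source that "times out" once mid-scan; the job still sees every row.
    static List<String> demo() {
        final String[] rows = {"r1", "r2", "r3"};
        try {
            RetryingReader reader = new RetryingReader(new RowSourceFactory() {
                boolean failedOnce = false;
                public RowSource restart(String afterRow) {
                    final int start = afterRow == null ? 0 : Arrays.asList(rows).indexOf(afterRow) + 1;
                    return new RowSource() {
                        int i = start;
                        public String next() throws IOException {
                            if (i == 1 && !failedOnce) { failedOnce = true; throw new IOException("lease expired"); }
                            return i < rows.length ? rows[i++] : null;
                        }
                    };
                }
            });
            List<String> seen = new ArrayList<String>();
            for (String r; (r = reader.next()) != null; ) seen.add(r);
            return seen;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [r1, r2, r3]
    }
}
```

Note how the inefficiency the strikethrough paragraph worries about shows up here too: if the source failed on every call, the same rows before the failure point would be fetched repeatedly with no notification to the client.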
[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again
[ https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280295#comment-13280295 ] Zhihong Yu commented on HBASE-6059: --- If majorCompaction is false, we still need to check !kvs.isEmpty(), right ? Replaying recovered edits would make deleted data exist again - Key: HBASE-6059 URL: https://issues.apache.org/jira/browse/HBASE-6059 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch When we replay recovered edits, we use the minSeqId of the Store, which may cause deleted data to appear again. Let's see how it happens. Suppose a region with two families (cf1, cf2): 1. put one row to the region (put r1,cf1:q1,v1) 2. move the region from server A to server B 3. delete the data put in step 1 (delete r1) 4. flush this region 5. run a major compaction on this region 6. move the region from server B to server A 7. abort server A 8. after the region is back online, we can get the deleted data (r1,cf1:q1,v1) (When we replay recovered edits, we use the minSeqId of the Store; because cf2 has no store files, its seqId is 0, so the edit log of the put will be replayed into the region.)
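The danger in the scenario above is the choice of replay threshold: an edit is replayed when its log seqId exceeds the minimum seqId across all stores, and cf2, having no store files, contributes 0. A toy model of just that decision (hypothetical names, not HRegion's actual replay code):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of the bug, not HRegion's actual replay code: an edit is
// replayed when its log seqId exceeds the chosen threshold.
public class ReplayThreshold {
    // Buggy choice: the minimum across stores. An empty family (no store
    // files) reports seqId 0 and drags the threshold down to 0.
    static long minAcrossStores(List<Long> storeSeqIds) {
        return Collections.min(storeSeqIds);
    }

    static boolean replays(long editSeqId, long threshold) {
        return editSeqId > threshold;
    }

    public static void main(String[] args) {
        // cf1 was flushed up to seqId 5 (the delete + compaction removed r1);
        // cf2 has no store files, so its seqId is 0.
        List<Long> storeSeqIds = Arrays.asList(5L, 0L);
        long putSeqId = 1; // the step-1 put of r1,cf1:q1,v1
        // The already-deleted put is replayed, resurrecting r1:
        System.out.println(replays(putSeqId, minAcrossStores(storeSeqIds))); // prints true
    }
}
```

Against cf1's own seqId of 5 the put would be skipped (1 > 5 is false), which is why replaying per store, or against a per-store threshold, avoids the resurrection.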
[jira] [Commented] (HBASE-5778) Turn on WAL compression by default
[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280311#comment-13280311 ] Jean-Daniel Cryans commented on HBASE-5778: --- I don't see how, in theory, the seek can be a problem when tailing a log from the start, since we read the whole file. The only case that needs to be handled differently is when a region server needs to replicate a log that another RS started working on but died. In that case we can just read the file up to the last seek position but not replicate anything. Turn on WAL compression by default -- Key: HBASE-5778 URL: https://issues.apache.org/jira/browse/HBASE-5778 Project: HBase Issue Type: Improvement Reporter: Jean-Daniel Cryans Assignee: Lars Hofhansl Priority: Blocker Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch I ran some tests to verify whether WAL compression should be turned on by default. For a use case where it's not very useful (values two orders of magnitude bigger than the keys), the insert time wasn't different and the CPU usage was 15% higher (150% CPU usage vs 130% when not compressing the WAL). When values are smaller than the keys, I saw a 38% improvement in the insert run time, and CPU usage was 33% higher (600% CPU usage vs 450%). I'm not sure WAL compression accounts for all the additional CPU usage; it might just be that we're able to insert faster and we spend more time in the MemStore per second (because our MemStores are bad when they contain tens of thousands of values). Those are two extremes, but they show that for the price of some CPU we can save a lot. My machines have 2 quads with HT, so I still had a lot of idle CPUs.
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280315#comment-13280315 ] Jonathan Hsieh commented on HBASE-5757: --- Zhihong, thanks for pinging me about this. Jan, thanks for being patient with me on this. The changes look good. Patch applies to 0.94 and trunk. I believe the request was for getting this into 0.90 -- I'll look into backporting this behavior into that version.
[jira] [Commented] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor
[ https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280316#comment-13280316 ] Hudson commented on HBASE-5882: --- Integrated in HBase-TRUNK #2910 (See [https://builds.apache.org/job/HBase-TRUNK/2910/]) HBASE-5882 Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor (Ashutosh) (Revision 1341110) Result = FAILURE ramkrishna : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor - Key: HBASE-5882 URL: https://issues.apache.org/jira/browse/HBASE-5882 Project: HBase Issue Type: Improvement Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: Ashutosh Jindal Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5882_v5.patch, HBASE-5882_v6.patch, hbase_5882.patch, hbase_5882_V2.patch, hbase_5882_V3.patch, hbase_5882_V4.patch Currently, when the master does processRIT on restart, any region found on a dead server avoids a new assignment so that the timeout monitor can take care of it. This case is more prominent when the node is found in the RS_ZK_REGION_OPENING state. I think we can handle this by triggering a new assignment with a new plan.
[jira] [Assigned] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-5757: - Assignee: Jan Lukavsky
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280318#comment-13280318 ] Zhihong Yu commented on HBASE-5757: --- TestHLog failure was caused by: {code} java.net.BindException: Problem binding to localhost/127.0.0.1:41331 : Address already in use at org.apache.hadoop.ipc.Server.bind(Server.java:227) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301) {code} I ran it locally and it passed.
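The BindException above is the classic fixed-test-port collision between concurrent builds on a shared machine. A common remedy, sketched generically here (this is not HBase's actual test code), is to bind to port 0 so the OS hands out a free ephemeral port:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Generic sketch of avoiding "Address already in use" in tests: bind to
// port 0 so the OS assigns a free ephemeral port, then read it back.
public class EphemeralPort {
    static int pickFreePort() {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();   // the OS-chosen port, > 0
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("test server can bind to port " + pickFreePort());
    }
}
```

There is a small race between closing the probe socket and the server reopening the port, but in practice it makes parallel Jenkins runs far less flaky than a hardcoded port like 41331.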
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280328#comment-13280328 ] Hudson commented on HBASE-5757: --- Integrated in HBase-TRUNK #2911 (See [https://builds.apache.org/job/HBase-TRUNK/2911/]) HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341132) Result = FAILURE jmhsieh : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java TableInputFormat should handle as many errors as possible - Key: HBASE-5757 URL: https://issues.apache.org/jira/browse/HBASE-5757 Project: HBase Issue Type: Bug Components: mapred, mapreduce Affects Versions: 0.90.6 Reporter: Jan Lukavsky Assignee: Jan Lukavsky Fix For: 0.96.0, 0.94.1 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch
[jira] [Created] (HBASE-6060) Regions in OPENING state from failed regionservers take a long time to recover
Enis Soztutar created HBASE-6060: Summary: Regions in OPENING state from failed regionservers take a long time to recover Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: Enis Soztutar We have seen a pattern in tests: regions are stuck in the OPENING state for a very long time when the region server that is opening the region fails. My understanding of the process: - The master calls the rs to open the region. If the rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign() - The RegionServer starts opening the region and changes the state in the znode. But that znode is not ephemeral. (see ZkAssign) - The rs transitions the zk node from OFFLINE to OPENING. See OpenRegionHandler.process() - The rs then opens the region and changes the znode from OPENING to OPENED - When the rs is killed between the OPENING and OPENED states, zk shows OPENING, and the master just waits for the rs to change the region state; since the rs is down, that won't happen. - There is an AssignmentManager.TimeoutMonitor, which guards against exactly this kind of condition. It periodically checks (every 10 sec by default) the regions in transition to see whether they have timed out (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, which explains what you and I are seeing. - ServerShutdownHandler in the Master does not reassign regions in the OPENING state, although it handles other states. Lowering that threshold in the configuration is one option, but I still think we can do better. Will investigate more.
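The "lowering that threshold" option mentioned above is an hbase-site.xml change; the property name comes from the comment itself, while the value below is only an illustration, not a recommended setting:

```xml
<!-- hbase-site.xml: lower the assignment timeout monitor threshold from
     its 30-minute default; 180000 ms (3 minutes) is an illustrative value. -->
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <value>180000</value>
</property>
```

The trade-off is that too low a value makes the TimeoutMonitor fire on regions that are merely slow to open, not stuck, which is why the comment argues a structural fix is preferable.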
[jira] [Commented] (HBASE-5970) Improve the AssignmentManager#updateTimer and speed up handling opened event
[ https://issues.apache.org/jira/browse/HBASE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280367#comment-13280367 ] nkeywal commented on HBASE-5970: Hi, Could you share the logs of the tests? I would be interested to have a look at them. The javadoc for updateTimers says it's not used for bulk assignment; is there a mix of regions 'bulk assigned' and other regions? I see as well in the description that the time was measured once with 'retainAssignment=true' and once without. Are the results comparable in both cases? Thank you! Improve the AssignmentManager#updateTimer and speed up handling opened event Key: HBASE-5970 URL: https://issues.apache.org/jira/browse/HBASE-5970 Project: HBase Issue Type: Improvement Components: master Reporter: chunhui shen Assignee: chunhui shen Attachments: 5970v3.patch, HBASE-5970.patch, HBASE-5970v2.patch, HBASE-5970v3.patch We found handling opened events very slow in an environment with lots of regions. The problem is the slow AssignmentManager#updateTimer. We tested bulk assigning 10w (i.e. 100k) regions; the whole process of bulk assigning took 1 hour. 2012-05-06 20:31:49,201 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) round-robin across 5 server(s) 2012-05-06 21:26:32,103 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done I think we could improve AssignmentManager#updateTimer: make a thread do this work. After the improvement, it took only 4.5 mins 2012-05-07 11:03:36,581 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 region(s) across 5 server(s), retainAssignment=true 2012-05-07 11:07:57,073 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
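The "make a thread do this work" idea can be sketched as handing the timer bookkeeping to a single background thread so the opened-event handler returns immediately. All names here are hypothetical stand-ins, not the actual AssignmentManager patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the proposed improvement: the event handler no longer updates
// timers inline; it enqueues the work for one background worker thread.
public class AsyncTimerUpdater {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger updates = new AtomicInteger();

    // Called from the opened-event handler; returns immediately instead of
    // blocking the handler on the (slow) timer bookkeeping.
    void regionOpened(String regionName) {
        worker.execute(() -> updates.incrementAndGet()); // stand-in for updateTimers()
    }

    // Flush the queue and report how many updates ran (for the demo only).
    int drain() {
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return updates.get();
    }

    public static void main(String[] args) {
        AsyncTimerUpdater u = new AsyncTimerUpdater();
        for (int i = 0; i < 1000; i++) u.regionOpened("region-" + i); // 1000 fast enqueues
        System.out.println(u.drain()); // prints 1000
    }
}
```

A single worker keeps the updates ordered, which matters if the real bookkeeping is not commutative; the speedup comes purely from taking it off the event handler's critical path.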
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280385#comment-13280385 ] Hudson commented on HBASE-5757: --- Integrated in HBase-0.94 #205 (See [https://builds.apache.org/job/HBase-0.94/205/]) HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341133) Result = FAILURE jmhsieh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java
[jira] [Commented] (HBASE-1749) If RS loses lease, we used to restart by default; reinstitute
[ https://issues.apache.org/jira/browse/HBASE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280387#comment-13280387 ] nkeywal commented on HBASE-1749: Yes, because of HBASE-5844 and HBASE-5939, we now: - immediately delete the znode when we exit - restart after a non-planned stop. This is safer than trying to reinstitute a region server in the same JVM, as it removes any memory or static-variable effect. In both cases we trigger a reassignment of the regions, however. If RS loses lease, we used to restart by default; reinstitute -- Key: HBASE-1749 URL: https://issues.apache.org/jira/browse/HBASE-1749 Project: HBase Issue Type: Bug Reporter: stack Assignee: nkeywal
[jira] [Work started] (HBASE-1749) If RS loses lease, we used to restart by default; reinstitute
[ https://issues.apache.org/jira/browse/HBASE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-1749 started by nkeywal.
[jira] [Resolved] (HBASE-1749) If RS loses lease, we used to restart by default; reinstitute
[ https://issues.apache.org/jira/browse/HBASE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal resolved HBASE-1749. Resolution: Duplicate
[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5757: -- Attachment: hbase-5757-92.patch hbase-5757-92.patch is for the 0.92 and 0.90 versions. The underlying metrics have changed, so it does not update metrics as in 0.94 or trunk/0.96. It does, however, include the updated tests that demonstrate the updated semantics.
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280398#comment-13280398 ] Jonathan Hsieh commented on HBASE-5757: --- Zhihong, Jan, if the 0.92/0.90 versions look good to you I will commit.
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280412#comment-13280412 ] Hadoop QA commented on HBASE-5757: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528472/hbase-5757-92.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1946//console This message is automatically generated.
[jira] [Created] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
Matteo Bertozzi created HBASE-6061: -- Summary: Fix ACL Admin Table inconsistent permission check Key: HBASE-6061 URL: https://issues.apache.org/jira/browse/HBASE-6061 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0, 0.92.1, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.92.2, 0.96.0, 0.94.1 The requirePermission() check for admin operations on a table is currently inconsistent: a table owner with CREATE rights (that is, the owner who created the table) can enable/disable and delete the table, but needs ADMIN rights to add/remove/modify a column.
[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-6061: --- Attachment: HBASE-6061-v0.patch
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280433#comment-13280433 ] Zhihong Yu commented on HBASE-5757: --- TestTableInputFormat passed in 0.92 with the 0.92 patch. +1 from me.
[jira] [Commented] (HBASE-6036) Add Cluster-level PB-based calls to HMasterInterface (minus file-format related calls)
[ https://issues.apache.org/jira/browse/HBASE-6036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280436#comment-13280436 ] Gregory Chanan commented on HBASE-6036: --- These replication tests fail even without this patch applied, so I think this is good to go. Add Cluster-level PB-based calls to HMasterInterface (minus file-format related calls) -- Key: HBASE-6036 URL: https://issues.apache.org/jira/browse/HBASE-6036 Project: HBase Issue Type: Task Components: ipc, master, migration Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-6036-v2.patch, HBASE-6036.patch This should be a subtask of HBASE-5445, but since that is itself a subtask, I can't also make this one a subtask (apparently). Convert the cluster-level calls that do not touch the file-format related calls (see HBASE-5453). These are: IsMasterRunning, Shutdown, StopMaster, Balance, LoadBalancerIs (was synchronousBalanceSwitch/balanceSwitch).
[jira] [Commented] (HBASE-6043) Add Increment Coalescing in thrift.
[ https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280440#comment-13280440 ] Elliott Clark commented on HBASE-6043: -- Not sure why Phabricator isn't posting diffs, but the review is up at https://reviews.facebook.net/D3315. Add Increment Coalescing in thrift. --- Key: HBASE-6043 URL: https://issues.apache.org/jira/browse/HBASE-6043 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Since the thrift server uses the client API, reducing the number of RPCs greatly speeds up increments.
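The coalescing idea can be sketched roughly as below. This is an illustration of the general technique, not the attached patch: increments arriving through thrift are merged in a map keyed by cell, so a later flush issues one increment RPC per distinct cell instead of one per call. The class and method names here are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

class IncrementCoalescer {
    private final ConcurrentHashMap<String, Long> pending = new ConcurrentHashMap<>();

    /** Queue an increment; increments against the same cell are summed, not sent. */
    void queue(String table, String row, String column, long amount) {
        pending.merge(table + "/" + row + "/" + column, amount, Long::sum);
    }

    /** Distinct cells currently buffered: the number of RPCs a flush would issue. */
    int distinctCells() { return pending.size(); }

    /** Buffered amount for one cell (what a flush would send in a single RPC). */
    long pendingFor(String table, String row, String column) {
        return pending.getOrDefault(table + "/" + row + "/" + column, 0L);
    }
}
```

A hot counter incremented a thousand times between flushes then costs one RPC rather than a thousand, which is where the speedup comes from.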
[jira] [Updated] (HBASE-6043) Add Increment Coalescing in thrift.
[ https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6043: - Attachment: HBASE-6043-0.patch
[jira] [Updated] (HBASE-6043) Add Increment Coalescing in thrift.
[ https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6043: - Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280442#comment-13280442 ] Zhihong Yu commented on HBASE-6061: --- Minor comment: {code} + * If current user is the table owner, and has CREATE permission is a table admin, {code} ', and has CREATE permission is a table admin' - ' and has CREATE permission, then he/she has table admin permission.' (wrap if line is too long)
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280448#comment-13280448 ] Andrew Purtell commented on HBASE-6061: --- +1. Yes, this is better: since the direction here is to let the creator take any action on the table, pulling the logic up into a small helper method is cleaner, fixes the issue, and will avoid errors going forward.
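The "small helper method" direction endorsed above might look something like this. A hedged sketch only, assuming the rule stated in the issue (owner holding CREATE is treated as a table admin; everyone else needs an explicit ADMIN grant); `Action`, `TableAclSketch`, and `isTableAdmin` are simplified stand-ins for the real AccessController types, not the patch itself.

```java
import java.util.Set;

class TableAclSketch {
    enum Action { READ, WRITE, CREATE, ADMIN }

    /** True if 'user' may perform admin operations (add/remove/modify columns, etc.). */
    static boolean isTableAdmin(String user, String owner, Set<Action> granted) {
        // Owner with CREATE: the creator may take any action on the table.
        if (user.equals(owner) && granted.contains(Action.CREATE)) return true;
        // Anyone else needs an explicit ADMIN grant.
        return granted.contains(Action.ADMIN);
    }
}
```

Centralizing the decision in one method is what removes the inconsistency: enable/disable/delete and column operations all consult the same rule.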
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280453#comment-13280453 ] Andrew Purtell commented on HBASE-6060: --- The TimeoutMonitor timeout was increased to 30 minutes in HBASE-4126.

Regions's in OPENING state from failed regionservers takes a long time to recover - Key: HBASE-6060 URL: https://issues.apache.org/jira/browse/HBASE-6060 Project: HBase Issue Type: Bug Components: master, regionserver Reporter: Enis Soztutar Assignee: Enis Soztutar

We have seen a pattern in tests where regions are stuck in the OPENING state for a very long time when the region server that is opening the region fails. My understanding of the process:
- The master calls the rs to open the region. If the rs is offline, a new plan is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), HMaster.assign()
- The RegionServer starts opening a region and changes the state in the znode. But that znode is not ephemeral. (see ZkAssign)
- The rs transitions the zk node from OFFLINE to OPENING. See OpenRegionHandler.process()
- The rs then opens the region and changes the znode from OPENING to OPENED
- When the rs is killed between the OPENING and OPENED states, zk shows the OPENING state, and the master just waits for the rs to change the region state, but since the rs is down, that won't happen.
- There is an AssignmentManager.TimeoutMonitor, which guards exactly against this kind of condition. It periodically checks (every 10 sec by default) the regions in transition to see whether they have timed out (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, which explains what you and I are seeing.
- ServerShutdownHandler in the Master does not reassign regions in the OPENING state, although it handles other states.
Lowering that threshold from the configuration is one option, but I still think we can do better. Will investigate more.
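The 30-minute default discussed above comes from hbase.master.assignment.timeoutmonitor.timeout. As a stop-gap only (with the caveat raised in the comments that lowering it can have deeper impacts), a site can shorten it in hbase-site.xml; the 180000 ms value here is just an example, not a recommendation:

```xml
<!-- hbase-site.xml: how long a region may sit in transition before the
     AssignmentManager TimeoutMonitor re-triggers assignment.
     Default is 1800000 (30 min); 180000 (3 min) is an example value. -->
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <value>180000</value>
</property>
```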
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280459#comment-13280459 ] Matteo Bertozzi commented on HBASE-6061: Not related, but maybe we can squeeze it into this one... preCheckAndPut() and preCheckAndDelete() check for READ when they also want to WRITE... to me, checking for WRITE permission is the right thing. What do you say: keep READ, replace with WRITE, or open a new jira?
[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-6061: --- Attachment: HBASE-6061-v1.patch
[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-6061: --- Attachment: (was: HBASE-6061-v1.patch)
[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-6061: --- Attachment: HBASE-6061-v1.patch
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280466#comment-13280466 ] Andrew Purtell commented on HBASE-6061: --- bq. Not related but maybe we can squeeze into this one... preCheckAndPut() and preCheckAndDelete() checks for READ when they also want to WRITE Yes, new jira.
[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover
[ https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280469#comment-13280469 ] Enis Soztutar commented on HBASE-6060: -- Thanks Andrew for the pointer. Agreed that lowering the timeout can have deeper impacts. We should fix the issue properly instead.
[jira] [Created] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed
Matteo Bertozzi created HBASE-6062: -- Summary: preCheckAndPut/Delete() checks for READ when also a WRITE is performed Key: HBASE-6062 URL: https://issues.apache.org/jira/browse/HBASE-6062 Project: HBase Issue Type: Sub-task Affects Versions: 0.94.0, 0.92.1, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.92.2, 0.96.0, 0.94.1 preCheckAndPut() and preCheckAndDelete() check for READ when they also want to WRITE... to me, checking for WRITE permission is the right thing. What do you say: keep READ, or replace with WRITE?
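The point of the issue can be sketched in a few lines. checkAndPut/checkAndDelete read the current cell *and* apply a mutation, so gating them on READ alone is too weak; the sketch below requires WRITE, while whether READ should also still be required is exactly the open question above. `Action` and `requireCheckAndMutate` are hypothetical names for illustration, not the AccessController API.

```java
import java.util.Set;

class CheckAndMutateAcl {
    enum Action { READ, WRITE }

    /** Deny checkAndPut/checkAndDelete unless the caller holds WRITE. */
    static void requireCheckAndMutate(Set<Action> granted) {
        if (!granted.contains(Action.WRITE)) {
            throw new SecurityException("checkAndPut/Delete requires WRITE permission");
        }
    }
}
```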
[jira] [Updated] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed
[ https://issues.apache.org/jira/browse/HBASE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-6062: --- Attachment: HBASE-6062-v0.patch
[jira] [Updated] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed
[ https://issues.apache.org/jira/browse/HBASE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-6062: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-6044) copytable: remove rs.* parameters
[ https://issues.apache.org/jira/browse/HBASE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-6044: -- Attachment: hbase-6044-92.patch Minor tweak for 0.92. copytable: remove rs.* parameters - Key: HBASE-6044 URL: https://issues.apache.org/jira/browse/HBASE-6044 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-6044-92.patch, hbase-6044-v2.patch, hbase-6044-v3.patch, hbase-6044-v4.patch, hbase-6044.patch In the discussion of HBASE-6013 it was suggested that we remove these arguments from 0.92+ (but keep them in 0.90).
[jira] [Updated] (HBASE-6044) copytable: remove rs.* parameters
[ https://issues.apache.org/jira/browse/HBASE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-6044: -- Resolution: Fixed Fix Version/s: 0.94.1 0.96.0 0.92.2 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.92/0.94/0.96-trunk. Thanks for the review, Stack!
[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5757: -- Resolution: Fixed Fix Version/s: 0.92.2 0.90.7 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Commited the 0.92 version to 0.92/0.90 branches. Thanks for review Ted, thanks for patches Jan! TableInputFormat should handle as many errors as possible - Key: HBASE-5757 URL: https://issues.apache.org/jira/browse/HBASE-5757 Project: HBase Issue Type: Bug Components: mapred, mapreduce Affects Versions: 0.90.6 Reporter: Jan Lukavsky Assignee: Jan Lukavsky Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch Prior to HBASE-4196 there was different handling of IOExceptions thrown from scanner in mapred and mapreduce API. The patch to HBASE-4196 unified this handling so that if exception is caught a reconnect is attempted (without bothering the mapred client). After that, HBASE-4269 changed this behavior back, but in both mapred and mapreduce APIs. The question is, is there any reason not to handle all errors that the input format can handle? In other words, why not try to reissue the request after *any* IOException? I see the following disadvantages of current approach * the client may see exceptions like LeaseException and ScannerTimeoutException if he fails to process all fetched data in timeout * to avoid ScannerTimeoutException the client must raise hbase.regionserver.lease.period * timeouts for tasks is aready configured in mapred.task.timeout, so this seems to me a bit redundant, because typically one needs to update both these parameters * I don't see any possibility to get rid of LeaseException (this is configured on server side) I think all of these issues would be gone, if the DoNotRetryIOException would not be rethrown. 
-On the other hand, handling errors in the InputFormat has the disadvantage that it may hide some inefficiency from the user. E.g. if I have a very big scanner.caching and manage to process only a few rows within the timeout, I will end up with a single row being fetched many times (and will not be explicitly notified about this). Could we solve this problem by adding some counter to the InputFormat?- -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
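Jan's proposal, reissuing the request after *any* IOException by restarting the scan at the last-seen row, can be sketched in plain Java. Everything here (RetryingReader, Source, SourceFactory) is an illustrative stand-in, not the actual TableRecordReader API:

```java
import java.io.IOException;

/**
 * Hypothetical sketch: retry a fetch after any IOException by re-opening
 * the underlying source past the last-seen key, mirroring the idea of
 * restarting the scanner instead of surfacing LeaseException or
 * ScannerTimeoutException to the mapred client.
 */
public class RetryingReader {
    public interface Source {
        String next() throws IOException;   // returns null at end of data
    }
    public interface SourceFactory {
        Source reopen(String lastSeenKey) throws IOException;
    }

    private final SourceFactory factory;
    private Source source;
    private String lastSeenKey;

    public RetryingReader(SourceFactory factory) throws IOException {
        this.factory = factory;
        this.source = factory.reopen(null);
    }

    /** One retry after any IOException, not just the "retryable" ones. */
    public String next() throws IOException {
        try {
            return remember(source.next());
        } catch (IOException e) {
            // Reconnect transparently: reopen past the last-seen key.
            source = factory.reopen(lastSeenKey);
            return remember(source.next());
        }
    }

    private String remember(String row) {
        if (row != null) lastSeenKey = row;
        return row;
    }
}
```

A counter of retries (as suggested in the struck-through comment) could be incremented in the catch block so the inefficiency stays visible.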
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280508#comment-13280508 ] Hadoop QA commented on HBASE-6061: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528475/HBASE-6061-v0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.replication.TestMultiSlaveReplication org.apache.hadoop.hbase.replication.TestMasterReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1947//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1947//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1947//console This message is automatically generated. 
Fix ACL Admin Table inconsistent permission check --- Key: HBASE-6061 URL: https://issues.apache.org/jira/browse/HBASE-6061 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Labels: acl, security Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6061-v0.patch, HBASE-6061-v1.patch the requirePermission() check for admin operation on a table is currently inconsistent. Table Owner with CREATE rights (that means, the owner has created that table) can enable/disable and delete the table but needs ADMIN rights to add/remove/modify a column.
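The inconsistency can be summarized as: one rule for enable/disable/delete, a stricter one for column changes. A consistent check, sketched below with illustrative names (this is not the AccessController API), would apply a single rule, e.g. "ADMIN, or owner with CREATE", to all table admin operations:

```java
import java.util.Set;

/**
 * Illustrative sketch of a consistent table-admin permission rule.
 * Names (TableAclCheck, canAdminister) are hypothetical, not HBase code.
 */
public class TableAclCheck {
    public enum Permission { READ, WRITE, CREATE, ADMIN }

    /** Same rule for enable/disable/delete table AND add/modify/delete column. */
    public static boolean canAdminister(boolean isOwner, Set<Permission> perms) {
        return perms.contains(Permission.ADMIN)
            || (isOwner && perms.contains(Permission.CREATE));
    }
}
```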
[jira] [Commented] (HBASE-6041) NullPointerException prevents the master from starting up
[ https://issues.apache.org/jira/browse/HBASE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280510#comment-13280510 ] Zhihong Yu commented on HBASE-6041: --- Patch looks good. Do all tests pass ? NullPointerException prevents the master from starting up - Key: HBASE-6041 URL: https://issues.apache.org/jira/browse/HBASE-6041 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.90.7 Attachments: hbase-6041.patch This is 0.90 only. 2012-05-04 14:27:57,913 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731) at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:419) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293) 2012-05-04 14:27:57,914 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-05-04 14:27:57,915 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 1433
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280516#comment-13280516 ] Zhihong Yu commented on HBASE-6061: --- @Matteo: Do you mind providing patch for 0.92 / 0.94 ? The directory structure has changed.
[jira] [Commented] (HBASE-6033) Adding some function to check if a table/region is in compaction
[ https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280518#comment-13280518 ] Jimmy Xiang commented on HBASE-6033: Here is the review request: https://reviews.apache.org/r/5167/ Adding some function to check if a table/region is in compaction --- Key: HBASE-6033 URL: https://issues.apache.org/jira/browse/HBASE-6033 Project: HBase Issue Type: New Feature Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: table_ui.png This feature will be helpful to find out if a major compaction is going on. We can show if it is in any minor compaction too.
[jira] [Updated] (HBASE-6057) Change some tests categories to optimize build time
[ https://issues.apache.org/jira/browse/HBASE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-6057: -- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Ran the small tests with the patch, works ok. Committed. Change some tests categories to optimize build time --- Key: HBASE-6057 URL: https://issues.apache.org/jira/browse/HBASE-6057 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6057.v1.patch Some tests categorized as small take more than 15s: it's better if they are executed in parallel with the medium tests. Some medium tests last less than 2s: it's better to have them executed with the small tests: we save a fork.
[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-6061: --- Attachment: HBASE-6061-0.92.patch Attached the 0.92 patch, also good for 0.94
[jira] [Commented] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed
[ https://issues.apache.org/jira/browse/HBASE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280534#comment-13280534 ] Hadoop QA commented on HBASE-6062: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528497/HBASE-6062-v0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestMasterObserver org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.replication.TestMultiSlaveReplication org.apache.hadoop.hbase.replication.TestMasterReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1949//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1949//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1949//console This message is automatically generated. 
preCheckAndPut/Delete() checks for READ when also a WRITE is performed -- Key: HBASE-6062 URL: https://issues.apache.org/jira/browse/HBASE-6062 Project: HBase Issue Type: Sub-task Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Labels: acl, security Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6062-v0.patch preCheckAndPut() and preCheckAndDelete() check for READ when they also want to WRITE... for me, checking for WRITE permission is the right thing... what do you say about that? keep READ, or replace with WRITE?
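The reason WRITE must be involved is visible in the semantics of checkAndPut itself: it performs a read (the compare) and then a write (the put) atomically. A minimal sketch, with a plain map standing in for a row (illustrative, not the HRegion API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Sketch of checkAndPut semantics: the operation both reads (the compare)
 * and writes (the put), which is why a pre-hook that requires only READ
 * is inconsistent. CheckAndPut and its methods are illustrative names.
 */
public class CheckAndPut {
    private final Map<String, String> row = new HashMap<>();

    /** Atomically: if the current value of col equals expected, write newValue. */
    public synchronized boolean checkAndPut(String col, String expected, String newValue) {
        if (!Objects.equals(row.get(col), expected)) return false; // the READ
        row.put(col, newValue);                                    // the WRITE
        return true;
    }

    public synchronized String get(String col) { return row.get(col); }
}
```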
[jira] [Commented] (HBASE-6044) copytable: remove rs.* parameters
[ https://issues.apache.org/jira/browse/HBASE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280535#comment-13280535 ] Hudson commented on HBASE-6044: --- Integrated in HBase-TRUNK #2912 (See [https://builds.apache.org/job/HBase-TRUNK/2912/]) HBASE-6044 copytable: remove rs.* parameters (Revision 1341200) Result = FAILURE jmhsieh : Files : * /hbase/trunk/src/docbkx/ops_mgt.xml * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/CopyTable.java copytable: remove rs.* parameters - Key: HBASE-6044 URL: https://issues.apache.org/jira/browse/HBASE-6044 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: hbase-6044-92.patch, hbase-6044-v2.patch, hbase-6044-v3.patch, hbase-6044-v4.patch, hbase-6044.patch In discussion of HBASE-6013 it was suggested that we remove these arguments from 0.92+ (but keep in 0.90)
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280540#comment-13280540 ] Hadoop QA commented on HBASE-6061: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528491/HBASE-6061-v1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.replication.TestMultiSlaveReplication org.apache.hadoop.hbase.replication.TestMasterReplication org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1950//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1950//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1950//console This message is automatically generated. 
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280583#comment-13280583 ] Hadoop QA commented on HBASE-6061: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528508/HBASE-6061-0.92.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.replication.TestMultiSlaveReplication org.apache.hadoop.hbase.replication.TestMasterReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1951//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1951//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1951//console This message is automatically generated. 
[jira] [Commented] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed
[ https://issues.apache.org/jira/browse/HBASE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280594#comment-13280594 ] Andrew Purtell commented on HBASE-6062: --- Patch looks good but please make sure TestAccessController includes tests for the change.
[jira] [Commented] (HBASE-6041) NullPointerException prevents the master from starting up
[ https://issues.apache.org/jira/browse/HBASE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280595#comment-13280595 ] Jimmy Xiang commented on HBASE-6041: Yes, all tests pass. Thanks.
[jira] [Commented] (HBASE-6041) NullPointerException prevents the master from starting up
[ https://issues.apache.org/jira/browse/HBASE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280600#comment-13280600 ] Zhihong Yu commented on HBASE-6041: --- Integrated to 0.90 branch. Thanks for the patch, Jimmy.
[jira] [Updated] (HBASE-6043) Add Increment Coalescing in thrift.
[ https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-6043: - Attachment: HBASE-6043-1.patch Add Increment Coalescing in thrift. --- Key: HBASE-6043 URL: https://issues.apache.org/jira/browse/HBASE-6043 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-6043-0.patch, HBASE-6043-1.patch Since the Thrift server uses the client API, reducing the number of RPCs greatly speeds up increments.
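The coalescing idea can be sketched independently of the Thrift server: sum pending increments per cell in memory and flush them as one batch, so many client increments collapse into few RPCs. The class and the flush hook below are illustrative assumptions, not Elliott's patch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of increment coalescing: instead of one RPC per increment,
 * pending amounts are summed per cell and drained in batches.
 * IncrementCoalescer and its methods are hypothetical names.
 */
public class IncrementCoalescer {
    private final ConcurrentHashMap<String, Long> pending = new ConcurrentHashMap<>();

    /** Cheap in-memory add; repeated calls collapse into one pending entry. */
    public void increment(String cell, long amount) {
        pending.merge(cell, amount, Long::sum);
    }

    /** Drain pending counts; a real server would issue one batched RPC here. */
    public Map<String, Long> flush() {
        Map<String, Long> batch = new HashMap<>();
        for (String cell : pending.keySet()) {
            Long v = pending.remove(cell);
            if (v != null) batch.put(cell, v);
        }
        return batch;
    }
}
```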
[jira] [Commented] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers
[ https://issues.apache.org/jira/browse/HBASE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280634#comment-13280634 ] Kannan Muthukkaruppan commented on HBASE-5979: -- Todd: If we always use positional reads, we don't get the benefit of HDFS sending the rest of the HDFS block, correct? So I didn't quite catch your recent suggestion. Did you mean, issue positional reads, but explicitly read a much larger chunk (in the Scan case) than just the current block? Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers --- Key: HBASE-5979 URL: https://issues.apache.org/jira/browse/HBASE-5979 Project: HBase Issue Type: Improvement Components: performance, regionserver Reporter: Todd Lipcon Currently, every HFile.Reader has a single DFSInputStream, which it uses to service all gets and scans. For gets, we use the positional read API (aka pread) and for scans we use a synchronized block to seek, then read. The advantage of pread is that it doesn't hold any locks, so multiple gets can proceed at the same time. The advantage of seek+read for scans is that the datanode starts to send the entire rest of the HDFS block, rather than just the single hfile block necessary. So, in a single thread, pread is faster for gets, and seek+read is faster for scans since you get a strong pipelining effect. However, in a multi-threaded case where there are multiple scans (including scans which are actually part of compactions), the seek+read strategy falls apart, since only one scanner may be reading at a time. Additionally, a large amount of wasted IO is generated on the datanode side, and we get none of the earlier-mentioned advantages. In one test, I switched scans to always use pread, and saw a 5x improvement in throughput of the YCSB scan-only workload, since it previously was completely blocked by contention on the DFSIS lock.
[jira] [Commented] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers
[ https://issues.apache.org/jira/browse/HBASE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280640#comment-13280640 ] Todd Lipcon commented on HBASE-5979: Hey Kannan, Sorry, let me elaborate on that suggestion: The idea is to make a new FSReader implementation, which only has one API. That API would look like the current positional read call (i.e. take a position and length). Internally, it would have a pool of cached DFSInputStreams, and remember the position for each of them. Each of the input streams would be referencing the same file. When a read request comes in, it is matched against the pooled streams: if it is within N bytes forward from the current position of one of the streams, then a seek and read would be issued, synchronized on that stream. Otherwise, any random stream would be chosen and a positional read would be issued. Separately, we can track the last N positional reads: if we detect a sequential pattern in the positional reads, we can take one of the pooled input streams and seek to the next predicted offset, so that future reads get the sequential benefit.
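The core of Todd's pooled-stream proposal, matching a read against the pool by forward proximity, can be sketched with the stream I/O elided. All names here (PooledReaderSketch, PosStream, pick, WINDOW) are illustrative, not an HDFS or HBase API:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the pooled-reader idea: a pool of streams over one file, each
 * remembering its position. A read within WINDOW bytes ahead of some
 * stream's position reuses that stream (seek+read, preserving the
 * datanode's sequential pipelining); otherwise fall back to pread.
 */
public class PooledReaderSketch {
    static final long WINDOW = 64 * 1024;   // "N bytes forward" threshold

    static class PosStream {
        long pos;
        PosStream(long p) { pos = p; }
    }

    final List<PosStream> pool = new ArrayList<>();

    /** Returns the pooled stream chosen for a read at offset, or null to pread. */
    PosStream pick(long offset, int len) {
        for (PosStream s : pool) {
            long ahead = offset - s.pos;
            if (ahead >= 0 && ahead <= WINDOW) {
                s.pos = offset + len;       // advance: sequential seek+read
                return s;
            }
        }
        return null;                        // no nearby stream: positional read
    }
}
```

Detecting a sequential pattern among recent preads, and pre-positioning a pooled stream at the predicted next offset, would layer on top of this.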
[jira] [Resolved] (HBASE-4686) [89-fb] Fix per-store metrics aggregation
[ https://issues.apache.org/jira/browse/HBASE-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin resolved HBASE-4686. --- Resolution: Fixed This has already been committed to trunk. [89-fb] Fix per-store metrics aggregation -- Key: HBASE-4686 URL: https://issues.apache.org/jira/browse/HBASE-4686 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D87.1.patch, D87.2.patch, D87.3.patch, D87.4.patch, HBASE-4686-TestRegionServerMetics-and-Store-metric-a-20111027134023-cc718144.patch, HBASE-4686-jira-89-fb-Fix-per-store-metrics-aggregat-20111027152723-05bea421.patch In r1182034 per-Store metrics were broken, because the aggregation of StoreFile metrics over all stores in a region was replaced by overriding them every time. We saw these metrics drop by a factor of numRegions on a production cluster -- thanks to Kannan for noticing this! We need to fix the metrics and add a unit test to ensure regressions like this don't happen in the future.
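The bug class described (aggregation replaced by overwriting, so the last store visited wins) is easy to see in miniature. StoreMetrics and its methods are illustrative names, not the HBase metrics code:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the regression: per-store metrics must be summed across all
 * stores in a region; put() overwrites, so only the last store survives
 * and the reported value drops by roughly the number of stores.
 */
public class StoreMetrics {
    final Map<String, Long> metrics = new HashMap<>();

    void recordBroken(String name, long value) {
        metrics.put(name, value);              // overwrites: last store wins
    }

    void recordFixed(String name, long value) {
        metrics.merge(name, value, Long::sum); // aggregates across stores
    }
}
```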
[jira] [Commented] (HBASE-6043) Add Increment Coalescing in thrift.
[ https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280643#comment-13280643 ] Hadoop QA commented on HBASE-6043: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528531/HBASE-6043-1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 35 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.replication.TestMultiSlaveReplication org.apache.hadoop.hbase.replication.TestMasterReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1952//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1952//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1952//console This message is automatically generated.
[jira] [Updated] (HBASE-6063) Replication related failures on trunk after HBASE-5453
[ https://issues.apache.org/jira/browse/HBASE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-6063: -- Status: Patch Available (was: Open) Replication related failures on trunk after HBASE-5453 -- Key: HBASE-6063 URL: https://issues.apache.org/jira/browse/HBASE-6063 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Gregory Chanan Assignee: Gregory Chanan Attachments: HBASE-6063.patch HBASE-5453 added this line: {code} return ClusterId.parseFrom(data).toString(); {code} in the function public static String readClusterIdZNode(ZooKeeperWatcher watcher), but ClusterId does not override toString(), so you get log messages like:
2012-05-21 16:46:31,256 ERROR [RegionServer:0;cloudera-vm,60456,1337643971995-EventThread] zookeeper.ClientCnxn$EventThread(523): Error while calling watcher
java.lang.IllegalArgumentException: Invalid UUID string: org.apache.hadoop.hbase.ClusterId@5563d208
at java.util.UUID.fromString(UUID.java:204)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.init(ReplicationSource.java:192)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:328)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:206)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$PeersWatcher.nodeChildrenChanged(ReplicationSourceManager.java:505)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:300)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2012-05-21 16:46:31,256 ERROR [RegionServer:0;cloudera-vm,50926,1337643981835-EventThread] zookeeper.ClientCnxn$EventThread(523): Error while calling watcher
and replication fails because the ClusterId does not match what is expected. Patch coming soon.
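The "Invalid UUID string: org.apache.hadoop.hbase.ClusterId@5563d208" in the stack trace is the classic symptom of calling toString() on a class that never overrides it: you get Object's ClassName@hashCode form instead of the UUID string. A minimal, self-contained sketch of the pitfall; ClusterId here is a hypothetical stand-in, not the real org.apache.hadoop.hbase.ClusterId:

```java
import java.util.UUID;

// Hypothetical stand-in for org.apache.hadoop.hbase.ClusterId: it carries the
// cluster UUID as a string but (like the class at the time of the bug) does
// not override toString().
public class ClusterIdDemo {
    static class ClusterId {
        private final String id;
        ClusterId(String id) { this.id = id; }
        static ClusterId parseFrom(byte[] data) { return new ClusterId(new String(data)); }
        String getClusterIdString() { return id; }
        // note: no toString() override, so Object.toString() ("ClusterId@<hex>") is used
    }

    // Buggy pattern from the report: yields "ClusterId@<hex>", not the UUID.
    static String buggy(byte[] data) { return ClusterId.parseFrom(data).toString(); }

    // Fixed pattern: read the id through an accessor that returns the UUID string.
    static String fixed(byte[] data) { return ClusterId.parseFrom(data).getClusterIdString(); }

    public static void main(String[] args) {
        byte[] data = "b2a5f2f2-8c4c-4f6e-9b0a-0c5d3c1f2a11".getBytes();
        System.out.println(buggy(data));                  // ClusterIdDemo$ClusterId@...
        System.out.println(UUID.fromString(fixed(data))); // parses cleanly
    }
}
```

This is why ReplicationSource's UUID.fromString() throws: it is handed the default Object representation rather than a UUID string.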
[jira] [Commented] (HBASE-6044) copytable: remove rs.* parameters
[ https://issues.apache.org/jira/browse/HBASE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280649#comment-13280649 ] Hudson commented on HBASE-6044: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/]) HBASE-6044 copytable: remove rs.* parameters (Revision 1341200) Result = FAILURE jmhsieh : Files : * /hbase/trunk/src/docbkx/ops_mgt.xml * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/CopyTable.java copytable: remove rs.* parameters - Key: HBASE-6044 URL: https://issues.apache.org/jira/browse/HBASE-6044 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: hbase-6044-92.patch, hbase-6044-v2.patch, hbase-6044-v3.patch, hbase-6044-v4.patch, hbase-6044.patch In discussion of HBASE-6013 it was suggested that we remove these arguments from 0.92+ (but keep in 0.90)
[jira] [Commented] (HBASE-6057) Change some tests categories to optimize build time
[ https://issues.apache.org/jira/browse/HBASE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280651#comment-13280651 ] Hudson commented on HBASE-6057: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/]) HBASE-6057 Change some tests categories to optimize build time (nkeywal via JD) (Revision 1341211) Result = FAILURE jdcryans : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/encoding/TestBufferedDataBlockEncoder.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/encoding/TestEncodedSeekers.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/ipc/TestPBOnWritableRpc.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestClockSkewDetection.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDefaultLoadBalancer.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsMBeanBase.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/monitoring/TestMemoryBoundedLogMessageBuffer.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/monitoring/TestTaskMonitor.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestGetClosestAtOrBefore.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java * 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestPoolMap.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestHQuorumPeer.java Change some tests categories to optimize build time --- Key: HBASE-6057 URL: https://issues.apache.org/jira/browse/HBASE-6057 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 6057.v1.patch Some tests categorized as small take more than 15s: it's better if they are executed in parallel with the medium tests. Some medium tests last less than 2s: it's better to have them executed with the small tests: we save a fork.
[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible
[ https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280650#comment-13280650 ] Hudson commented on HBASE-5757: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/]) HBASE-5757 TableInputFormat should handle as many errors as possible (Jan Lukavsky) (Revision 1341132) Result = FAILURE jmhsieh : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java TableInputFormat should handle as many errors as possible - Key: HBASE-5757 URL: https://issues.apache.org/jira/browse/HBASE-5757 Project: HBase Issue Type: Bug Components: mapred, mapreduce Affects Versions: 0.90.6 Reporter: Jan Lukavsky Assignee: Jan Lukavsky Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch Prior to HBASE-4196 there was different handling of IOExceptions thrown from scanner in mapred and mapreduce API. The patch to HBASE-4196 unified this handling so that if exception is caught a reconnect is attempted (without bothering the mapred client). After that, HBASE-4269 changed this behavior back, but in both mapred and mapreduce APIs. The question is, is there any reason not to handle all errors that the input format can handle? In other words, why not try to reissue the request after *any* IOException? 
I see the following disadvantages of the current approach:
* the client may see exceptions like LeaseException and ScannerTimeoutException if it fails to process all fetched data within the timeout
* to avoid ScannerTimeoutException the client must raise hbase.regionserver.lease.period
* the timeout for tasks is already configured in mapred.task.timeout, so this seems a bit redundant, because typically one needs to update both of these parameters
* I don't see any possibility to get rid of LeaseException (this is configured on the server side)
I think all of these issues would be gone if the DoNotRetryIOException were not rethrown. -On the other hand, handling errors in the InputFormat has the disadvantage that it may hide some inefficiency from the user. E.g. if I have a very big scanner.caching and I manage to process only a few rows within the timeout, I will end up with a single row being fetched many times (and will not be explicitly notified about this). Could we solve this problem by adding some counter to the InputFormat?-
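The reissue-after-IOException idea can be sketched independently of the HBase client. Everything below (Scanner, restart, demo) is a hypothetical stand-in, not the TableRecordReaderImpl API: on failure, reopen the scanner just past the last successfully returned row and continue, instead of surfacing the exception to the mapred client.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class RetryingReader {
    interface Scanner { String next() throws IOException; }

    private Scanner scanner;
    private final Function<String, Scanner> restart; // reopens after a given row
    private String lastSuccessfulRow;

    RetryingReader(Scanner initial, Function<String, Scanner> restart) {
        this.scanner = initial;
        this.restart = restart;
    }

    // Returns the next row, reissuing the scan once after any IOException
    // (the real reader would see e.g. lease or scanner-timeout errors here).
    String next() throws IOException {
        try {
            return lastSuccessfulRow = scanner.next();
        } catch (IOException e) {
            scanner = restart.apply(lastSuccessfulRow); // resume past last good row
            return lastSuccessfulRow = scanner.next();
        }
    }

    // Small self-check: a scanner that fails once mid-scan still yields every row.
    static List<String> demo() {
        List<String> rows = Arrays.asList("r1", "r2", "r3");
        class Iter implements Scanner {
            int i; boolean failOnce;
            Iter(int i, boolean failOnce) { this.i = i; this.failOnce = failOnce; }
            public String next() throws IOException {
                if (failOnce && i == 1) { failOnce = false; throw new IOException("lease expired"); }
                return i < rows.size() ? rows.get(i++) : null;
            }
        }
        try {
            RetryingReader r = new RetryingReader(new Iter(0, true),
                last -> new Iter(rows.indexOf(last) + 1, false));
            List<String> out = new ArrayList<>();
            for (String row; (row = r.next()) != null; ) out.add(row);
            return out;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [r1, r2, r3]
    }
}
```

The inefficiency concern in the struck-through paragraph maps to this sketch directly: if the restart happens often, the same rows between lastSuccessfulRow and the failure point are re-fetched silently, which is what a restart counter would expose.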
[jira] [Commented] (HBASE-5882) Prcoess RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor
[ https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280653#comment-13280653 ] Hudson commented on HBASE-5882: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/]) HBASE-5882 Prcoess RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor (Ashutosh) (Revision 1341110) Result = FAILURE ramkrishna : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java Prcoess RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor - Key: HBASE-5882 URL: https://issues.apache.org/jira/browse/HBASE-5882 Project: HBase Issue Type: Improvement Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: ramkrishna.s.vasudevan Assignee: Ashutosh Jindal Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5882_v5.patch, HBASE-5882_v6.patch, hbase_5882.patch, hbase_5882_V2.patch, hbase_5882_V3.patch, hbase_5882_V4.patch Currently on master restart, when processing RIT, if a region is found on a dead server the master avoids a new assignment so that the timeout monitor can take care of it. This case is more prominent if the node is found in RS_ZK_REGION_OPENING state. I think we can handle this by triggering a new assignment with a new plan.
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280652#comment-13280652 ] Hudson commented on HBASE-6061: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/]) HBASE-6061 Fix ACL Admin Table inconsistent permission check (Matteo Bertozzi) (Revision 1341265) Result = FAILURE tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java Fix ACL Admin Table inconsistent permission check --- Key: HBASE-6061 URL: https://issues.apache.org/jira/browse/HBASE-6061 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Labels: acl, security Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-6061-0.92.patch, HBASE-6061-v0.patch, HBASE-6061-v1.patch The requirePermission() check for an admin operation on a table is currently inconsistent: a table owner with CREATE rights (meaning the owner created that table) can enable/disable and delete the table, but needs ADMIN rights to add/remove/modify a column.
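The inconsistency is easy to state as a predicate. A hedged sketch, not the real AccessController code (names and the Perm enum are hypothetical): the point is that every admin-style table operation should consult the same rule, instead of enable/disable/delete accepting owner+CREATE while column operations demand ADMIN.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of a consistent permission check for table admin ops.
public class AclCheckDemo {
    enum Perm { READ, WRITE, CREATE, ADMIN }

    // One shared rule for enable/disable/delete AND add/remove/modify column:
    // ADMIN always suffices; CREATE suffices only for the table's owner.
    static boolean allowTableAdminOp(boolean callerIsOwner, Set<Perm> perms) {
        return perms.contains(Perm.ADMIN)
            || (callerIsOwner && perms.contains(Perm.CREATE));
    }

    public static void main(String[] args) {
        // owner who created the table may disable it and alter a column alike
        System.out.println(allowTableAdminOp(true, EnumSet.of(Perm.CREATE)));  // true
        // a non-owner with only CREATE may do neither
        System.out.println(allowTableAdminOp(false, EnumSet.of(Perm.CREATE))); // false
    }
}
```

Whether owner+CREATE should be sufficient at all is a policy choice; the fix in the patch is about applying whichever choice is made uniformly.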
[jira] [Created] (HBASE-6064) Add timestamp to Mutation Thrift API
Mikhail Bautin created HBASE-6064: - Summary: Add timestamp to Mutation Thrift API Key: HBASE-6064 URL: https://issues.apache.org/jira/browse/HBASE-6064 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin We need to be able to specify per-mutation timestamps in the HBase Thrift API. If the timestamp is not specified, the timestamp passed to the Thrift API method itself (mutateRowTs/mutateRowsTs) should be used.
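The fallback rule proposed above is simple to pin down. A sketch under assumed names (this Mutation class and its sentinel are hypothetical; the real Thrift struct would model the timestamp as an optional field): use the per-mutation timestamp when set, otherwise the method-level one.

```java
// Hypothetical model of the proposed Thrift semantics: a Mutation may carry its
// own timestamp; a negative value stands in here for "not specified".
public class MutationTsDemo {
    static final long UNSET = -1L;

    static class Mutation {
        final long timestamp;
        Mutation(long timestamp) { this.timestamp = timestamp; }
    }

    // Timestamp actually applied by mutateRowTs/mutateRowsTs for this mutation.
    static long effectiveTimestamp(Mutation m, long methodLevelTs) {
        return m.timestamp != UNSET ? m.timestamp : methodLevelTs;
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimestamp(new Mutation(UNSET), 100L)); // 100
        System.out.println(effectiveTimestamp(new Mutation(42L), 100L));   // 42
    }
}
```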
[jira] [Commented] (HBASE-6043) Add Increment Coalescing in thrift.
[ https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280657#comment-13280657 ] Elliott Clark commented on HBASE-6043: -- Looks like those tests are failing on trunk right now.
[jira] [Commented] (HBASE-6063) Replication related failures on trunk after HBASE-5453
[ https://issues.apache.org/jira/browse/HBASE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280673#comment-13280673 ] Hadoop QA commented on HBASE-6063: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528539/HBASE-6063.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1954//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1954//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1954//console This message is automatically generated. 
[jira] [Commented] (HBASE-6043) Add Increment Coalescing in thrift.
[ https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280676#comment-13280676 ] Hadoop QA commented on HBASE-6043: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528534/HBASE-6043-2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestMasterReplication org.apache.hadoop.hbase.replication.TestMultiSlaveReplication org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol org.apache.hadoop.hbase.replication.TestReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1953//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1953//console This message is automatically generated.
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280680#comment-13280680 ] Hudson commented on HBASE-6061: --- Integrated in HBase-TRUNK #2914 (See [https://builds.apache.org/job/HBase-TRUNK/2914/]) HBASE-6061 Fix ACL Admin Table inconsistent permission check (Matteo Bertozzi) (Revision 1341265) Result = FAILURE tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280681#comment-13280681 ] Hudson commented on HBASE-6061: --- Integrated in HBase-0.94 #207 (See [https://builds.apache.org/job/HBase-0.94/207/]) HBASE-6061 Fix ACL Admin Table inconsistent permission check (Matteo Bertozzi) (Revision 1341267) Result = FAILURE tedyu : Files : * /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check
[ https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280694#comment-13280694 ] Hudson commented on HBASE-6061: --- Integrated in HBase-0.92 #416 (See [https://builds.apache.org/job/HBase-0.92/416/]) HBASE-6061 Fix ACL Admin Table inconsistent permission check (Matteo Bertozzi) (Revision 1341268) Result = FAILURE tedyu : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/security/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
[jira] [Updated] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-6065: Component/s: wal Assignee: chunhui shen Summary: Log for flush would append a non-sequential edit in the hlog, may cause data loss (was: Log for flush would append a non-sequential edit in the hlog, may cause data los) Log for flush would append a non-sequential edit in the hlog, may cause data loss - Key: HBASE-6065 URL: https://issues.apache.org/jira/browse/HBASE-6065 Project: HBase Issue Type: Bug Components: wal Reporter: chunhui shen Assignee: chunhui shen
[jira] [Created] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data los
chunhui shen created HBASE-6065: --- Summary: Log for flush would append a non-sequential edit in the hlog, may cause data los Key: HBASE-6065 URL: https://issues.apache.org/jira/browse/HBASE-6065 Project: HBase Issue Type: Bug Reporter: chunhui shen
[jira] [Updated] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-6065: Attachment: HBASE-6065.patch In the patch, I obtainSeqNum() for the flush log edit rather than using the seqId from the parameter. So we can ensure the log seq id is always sequential in the file. BTW, do we use the flush log edit anywhere? There is another solution: change the split log file's name to the real max seq id, rather than the last seq id. Log for flush would append a non-sequential edit in the hlog, may cause data loss - Key: HBASE-6065 URL: https://issues.apache.org/jira/browse/HBASE-6065 Project: HBase Issue Type: Bug Components: wal Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-6065.patch After completing a region flush, we append a log edit to the hlog file through HLog#completeCacheFlush:
{code}
public void completeCacheFlush(final byte [] encodedRegionName,
    final byte [] tableName, final long logSeqId, final boolean isMetaRegion) {
  ...
  HLogKey key = makeKey(encodedRegionName, tableName, logSeqId,
      System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID);
  ...
}
{code}
When we make the hlog key, we use the seqId from the parameter, which was generated earlier by HLog#startCacheFlush. So we may append an edit with a lower seq id than the last edit already in the hlog file. If it is the last edit in the file, it may cause data loss, because:
{code}
HRegion#replayRecoveredEditsIfAny {
  ...
  maxSeqId = Math.abs(Long.parseLong(fileName));
  if (maxSeqId <= minSeqId) {
    String msg = "Maximum sequenceid for this log is " + maxSeqId
        + " and minimum sequenceid for the region is " + minSeqId
        + ", skipped the whole file, path=" + edits;
    LOG.debug(msg);
    continue;
  }
  ...
}
{code}
We may skip the split log file, because we use the last edit's seq id as its file name and treat that seqId as the max seq id in the log file.
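The skip decision can be reproduced in isolation. A hedged sketch with plain arrays rather than HBase types: a recovered-edits file is named after its last edit's seq id, so an out-of-order flush edit at the end makes the name underestimate the file's true maximum, and the whole file can be skipped even though it holds newer edits.

```java
import java.util.Arrays;

// Self-contained model of the bug: no HBase types, just seq ids.
public class RecoveredEditsSkipDemo {
    // Naming convention under discussion: the file is named by its LAST edit's id.
    static long fileNameSeqId(long[] seqIdsInFile) {
        return seqIdsInFile[seqIdsInFile.length - 1];
    }

    // The check in HRegion#replayRecoveredEditsIfAny: skip the whole file when
    // its name (taken as the max seq id) is <= the region's min seq id.
    static boolean wouldSkip(long nameSeqId, long regionMinSeqId) {
        return nameSeqId <= regionMinSeqId;
    }

    static long trueMax(long[] seqIds) {
        return Arrays.stream(seqIds).max().getAsLong();
    }

    public static void main(String[] args) {
        // Edits 5,6,7 followed by a flush edit whose seq id (3) predates them.
        long[] file = {5, 6, 7, 3};
        long regionMinSeqId = 4;
        // Named by the last edit: 3 <= 4, so the file is skipped...
        System.out.println(wouldSkip(fileNameSeqId(file), regionMinSeqId)); // true
        // ...even though it really contains edits up to 7 that needed replay.
        System.out.println(trueMax(file)); // 7
        // The alternative fix from the comment: name the file by the true max.
        System.out.println(wouldSkip(trueMax(file), regionMinSeqId)); // false
    }
}
```

Both proposed fixes close the same gap: either keep seq ids monotonically increasing in the file (so the last id is the max), or name the file by the actual maximum.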
[jira] [Updated] (HBASE-6033) Adding some fuction to check if a table/region is in compaction
[ https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6033: --- Status: Patch Available (was: Open) Adding some fuction to check if a table/region is in compaction --- Key: HBASE-6033 URL: https://issues.apache.org/jira/browse/HBASE-6033 Project: HBase Issue Type: New Feature Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: hbase-6033_v2.patch, table_ui.png This feature will be helpful to find out if a major compaction is going on. We can show if it is in any minor compaction too.
[jira] [Updated] (HBASE-6033) Adding some fuction to check if a table/region is in compaction
[ https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6033: --- Attachment: hbase-6033_v2.patch
[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again
[ https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280704#comment-13280704 ] chunhui shen commented on HBASE-6059: - bq. If majorCompaction is false, we still need to check !kvs.isEmpty(), right? Yes. I think this only matters for major compaction; a minor compaction will retain the delete type, so there is no problem there. Replaying recovered edits would make deleted data exist again - Key: HBASE-6059 URL: https://issues.apache.org/jira/browse/HBASE-6059 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch When we replay recovered edits, we use the minSeqId of the Store, which may cause deleted data to appear again. Let's see how it happens, for a region with two families (cf1, cf2):
1. put one row to the region (put r1,cf1:q1,v1)
2. move the region from server A to server B
3. delete the data put in step 1 (delete r1)
4. flush the region
5. run a major compaction on the region
6. move the region from server B back to server A
7. abort server A
8. after the region comes back online, we can get the deleted data (r1,cf1:q1,v1)
(When we replay recovered edits we use the minSeqId of the Store; because cf2 has no store files its seqId is 0, so the edit log of the put is replayed into the region.)
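The eight-step scenario boils down to which sequence id gates replay. A hedged sketch with plain maps, not HRegion code: gating on the minimum seq id across all stores resurrects the deleted put, while gating on the owning store's own seq id correctly drops it.

```java
import java.util.Map;

// Minimal model: each family maps to the max seq id already durable in its
// store files; an edit carries the family it targets and its own seq id.
public class ReplayGateDemo {
    // Region-wide gate (the buggy behavior): replay if the edit is newer than
    // the MINIMUM across all stores. cf2 has no files, so its seq id is 0 and
    // almost everything replays.
    static boolean replayRegionMin(Map<String, Long> storeMaxSeqIds, long editSeqId) {
        long minSeqId = storeMaxSeqIds.values().stream()
            .mapToLong(Long::longValue).min().orElse(0L);
        return editSeqId > minSeqId;
    }

    // Per-store gate: replay only if the edit is newer than what ITS family
    // has already persisted.
    static boolean replayPerStore(Map<String, Long> storeMaxSeqIds,
                                  String family, long editSeqId) {
        return editSeqId > storeMaxSeqIds.getOrDefault(family, 0L);
    }

    public static void main(String[] args) {
        // cf1 flushed (and major-compacted) through seq id 10; cf2 never flushed.
        Map<String, Long> stores = Map.of("cf1", 10L, "cf2", 0L);
        long putSeqId = 5; // the old put of r1,cf1:q1,v1, already deleted on disk
        System.out.println(replayRegionMin(stores, putSeqId));       // true: resurrected
        System.out.println(replayPerStore(stores, "cf1", putSeqId)); // false: dropped
    }
}
```

The major compaction in step 5 is what makes the bug visible: it removes the delete marker from cf1's files, so once the stale put is replayed there is nothing left to mask it.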
[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96
[ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280706#comment-13280706 ] Zhihong Yu commented on HBASE-6055: --- The design document is very good. Will get back to reviewing HBASE-5547 first. Snapshots in HBase 0.96 --- Key: HBASE-6055 URL: https://issues.apache.org/jira/browse/HBASE-6055 Project: HBase Issue Type: New Feature Components: client, master, regionserver, zookeeper Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: Snapshots in HBase.docx Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.
[jira] [Commented] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280707#comment-13280707 ] ramkrishna.s.vasudevan commented on HBASE-6065: --- So this applies to 0.94 and above only, right? Log for flush would append a non-sequential edit in the hlog, may cause data loss - Key: HBASE-6065 URL: https://issues.apache.org/jira/browse/HBASE-6065 Project: HBase Issue Type: Bug Components: wal Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-6065.patch After completing a region flush, we append a log edit to the hlog file through HLog#completeCacheFlush. {code} public void completeCacheFlush(final byte [] encodedRegionName, final byte [] tableName, final long logSeqId, final boolean isMetaRegion) { ... HLogKey key = makeKey(encodedRegionName, tableName, logSeqId, System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID); ... } {code} When we make the hlog key, we use the seqId from the parameter, which was generated earlier by HLog#startCacheFlush, so we may append an edit with a lower seq id than the last edit already in the hlog file. If this lower-seq edit is the last edit in the file, it may cause data loss, because: {code} HRegion#replayRecoveredEditsIfAny { ... maxSeqId = Math.abs(Long.parseLong(fileName)); if (maxSeqId <= minSeqId) { String msg = "Maximum sequenceid for this log is " + maxSeqId + " and minimum sequenceid for the region is " + minSeqId + ", skipped the whole file, path=" + edits; LOG.debug(msg); continue; } ... } {code} We may skip the split log file, because we use the last edit's seq id as its file name and consider this seqId to be the max seq id in the log file.
[jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect
[ https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280709#comment-13280709 ] Anoop Sam John commented on HBASE-5974: --- Thanks for the review Todd {quote} why do we need the new RegionScannerWithCookie class? why not add the cookie to RegionScanner itself? {quote} I was also thinking along those lines initially. There are 2 reasons why I avoided doing the seqNo work within the RegionScanner: 1. When caching > 1, there will be more than one call to RegionScanner.next() per client call. Passing the client-sent seqNo (I am avoiding "cookie" as I agree with you on renaming it) down to the RegionScanner would change the interface, and this interface is exposed. 2. This is the main reason. With the CP usage we have exposed the RegionScanner, and using the preScannerOpen() and postScannerOpen() impls a user can now return his own RegionScanner impl. If we put the seqNo maintenance and check logic in RegionScanner, every such user would have to worry about it. I feel this should be handled by the HBase core code. What do you say? {quote} this isn't currently compatible with 0.94, since a new client wouldn't be able to scan an old server. {quote} Agree.. I can fix this {quote} let's rename cookie to callSequenceNumber {quote} Already agreed.. :) {quote} In the test, I think you should use HRegionInterface directly, so you don't have to actually generate an RPC timeout. {quote} I thought of an E2E FT case.. Yes, as you said, I can write the other one as well. So what is your recommendation? Should I change it? {quote} As is, I think it's also not guaranteed to trigger the issue unless you set scanner caching to 1, right? {quote} Maybe in that case I can explicitly set caching=1 for this test case. 
I can do that Scanner retry behavior with RPC timeout on next() seems incorrect - Key: HBASE-5974 URL: https://issues.apache.org/jira/browse/HBASE-5974 Project: HBase Issue Type: Bug Components: client, regionserver Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0 Reporter: Todd Lipcon Priority: Critical Attachments: HBASE-5974_0.94.patch I'm seeing the following behavior: - set RPC timeout to a short value - call next() for some batch of rows, big enough so the client times out before the result is returned - the HConnectionManager stuff will retry the next() call to the same server. At this point, one of two things can happen: 1) the previous next() call will still be processing, in which case you get a LeaseException, because it was removed from the map during the processing, or 2) the next() call will succeed but skip the prior batch of rows.
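The callSequenceNumber idea discussed in this thread can be sketched as follows. This is a minimal toy model, not the actual patch: the class and method names are illustrative assumptions, and it only shows how a sequence-number check turns the silent row-skipping of a timed-out retry into a loud failure.

```java
// Hypothetical sketch of a call-sequence-number check on scanner next():
// a retried call whose sequence number was already served is rejected
// instead of silently returning the NEXT batch (which loses rows).
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScannerCallSeqSketch {
    static class ServerScanner {
        private long expectedCallSeq = 0;
        private int cursor = 0;
        private final List<String> rows = Arrays.asList("r1", "r2", "r3", "r4");

        List<String> next(long clientCallSeq, int batch) {
            if (clientCallSeq != expectedCallSeq) {
                // The client timed out and retried a call the server already
                // processed: fail loudly so the client can restart the scan.
                throw new IllegalStateException("out-of-order next() call");
            }
            expectedCallSeq++;
            int end = Math.min(cursor + batch, rows.size());
            List<String> out = new ArrayList<>(rows.subList(cursor, end));
            cursor = end;
            return out;
        }
    }

    public static void main(String[] args) {
        ServerScanner scanner = new ServerScanner();
        System.out.println(scanner.next(0, 2)); // [r1, r2]
        try {
            scanner.next(0, 2); // timed-out retry of call 0: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected");
        }
    }
}
```

Without the check, the retry of call 0 would be treated as a fresh call and return r3/r4, skipping the batch the client never received, which is failure mode 2) in the issue description.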
[jira] [Commented] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280717#comment-13280717 ] ramkrishna.s.vasudevan commented on HBASE-6065: --- @Chunhui What type of data loss do you see here? Is it the edit with HBASE::CACHEFLUSH that gets missed? Ideally, by design, that edit is needed to mark up to what point the flush has been done, and it is added as an entry in the HLog. Even while recovering we tend to skip this entry. {code} // Check this edit is for me. Also, guard against writing the special // METACOLUMN info such as HBASE::CACHEFLUSH entries if (kv.matchingFamily(HLog.METAFAMILY) || !Bytes.equals(key.getEncodedRegionName(), this.regionInfo.getEncodedNameAsBytes())) { skippedEdits++; continue; } {code} Did you find any other type of data loss which I am not able to foresee here? Correct me if I am wrong. Log for flush would append a non-sequential edit in the hlog, may cause data loss - Key: HBASE-6065 URL: https://issues.apache.org/jira/browse/HBASE-6065 Project: HBase Issue Type: Bug Components: wal Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-6065.patch After completing a region flush, we append a log edit to the hlog file through HLog#completeCacheFlush. {code} public void completeCacheFlush(final byte [] encodedRegionName, final byte [] tableName, final long logSeqId, final boolean isMetaRegion) { ... HLogKey key = makeKey(encodedRegionName, tableName, logSeqId, System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID); ... } {code} When we make the hlog key, we use the seqId from the parameter, which was generated earlier by HLog#startCacheFlush, so we may append an edit with a lower seq id than the last edit already in the hlog file. If this lower-seq edit is the last edit in the file, it may cause data loss, because: {code} HRegion#replayRecoveredEditsIfAny { ... maxSeqId = Math.abs(Long.parseLong(fileName)); if (maxSeqId <= minSeqId) { String msg = "Maximum sequenceid for this log is " + maxSeqId + " and minimum sequenceid for the region is " + minSeqId + ", skipped the whole file, path=" + edits; LOG.debug(msg); continue; } ... } {code} We may skip the split log file, because we use the last edit's seq id as its file name and consider this seqId to be the max seq id in the log file.
[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again
[ https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280718#comment-13280718 ] ramkrishna.s.vasudevan commented on HBASE-6059: --- I think only a major compaction could lead us to this problem, since it is what actually deletes the entries. In case of TTL expiry of all the entries in a store file, can we have this scenario of an empty StoreFile getting created on minor or major compaction? I think creating an empty store file should be fine. Let's take others' input on this too. Replaying recovered edits would make deleted data exist again - Key: HBASE-6059 URL: https://issues.apache.org/jira/browse/HBASE-6059 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch When we replay recovered edits, we use the minSeqId of the Store; this may cause deleted data to appear again. Let's see how it happens. Suppose a region with two families (cf1, cf2): 1. put one KV to the region (put r1,cf1:q1,v1) 2. move the region from server A to server B. 3. delete the data put in step 1 (delete r1) 4. flush this region. 5. run a major compaction on this region 6. move the region from server B to server A. 7. abort server A 8. after the region comes back online, we can read the deleted data (r1,cf1:q1,v1) again (when we replay recovered edits, we use the minSeqId of the Store; because cf2 has no store files, its seqId is 0, so the edit log of the put is replayed to the region)
[jira] [Commented] (HBASE-6033) Adding some fuction to check if a table/region is in compaction
[ https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280719#comment-13280719 ] Hadoop QA commented on HBASE-6033: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12528550/hbase-6033_v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestCompactionState org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.replication.TestMultiSlaveReplication org.apache.hadoop.hbase.replication.TestMasterReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1955//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1955//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1955//console This message is automatically generated. Adding some fuction to check if a table/region is in compaction --- Key: HBASE-6033 URL: https://issues.apache.org/jira/browse/HBASE-6033 Project: HBase Issue Type: New Feature Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: hbase-6033_v2.patch, table_ui.png This feature will be helpful to find out if a major compaction is going on. We can show if it is in any minor compaction too. 
[jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect
[ https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280728#comment-13280728 ] Anoop Sam John commented on HBASE-5974: --- Thanks for the review Jieshan {quote} So what's your suggestion, Anoop? call CP hooks in the finally section? {quote} I mean that whenever we close the scanner, we need to call the CP hooks. Before this patch we were not doing so when getting an NSRE: {code} catch (Throwable t) { if (t instanceof NotServingRegionException) { this.scanners.remove(scannerName); } throw convertThrowableToIOE(cleanup(t)); } {code} Here we can see it is not calling the CP hooks. As of now, in the out-of-order cookie case, I am also not calling the CP hooks. {quote} RegionScanner scanner = scanners.get(scannerIdString).s; {quote} Oh yes. Thanks for pointing it out. I will fix.. This was not in the direct next() call flow.. That is why I missed it..:( Scanner retry behavior with RPC timeout on next() seems incorrect - Key: HBASE-5974 URL: https://issues.apache.org/jira/browse/HBASE-5974 Project: HBase Issue Type: Bug Components: client, regionserver Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0 Reporter: Todd Lipcon Priority: Critical Attachments: HBASE-5974_0.94.patch I'm seeing the following behavior: - set RPC timeout to a short value - call next() for some batch of rows, big enough so the client times out before the result is returned - the HConnectionManager stuff will retry the next() call to the same server. At this point, one of two things can happen: 1) the previous next() call will still be processing, in which case you get a LeaseException, because it was removed from the map during the processing, or 2) the next() call will succeed but skip the prior batch of rows.
[jira] [Commented] (HBASE-6033) Adding some fuction to check if a table/region is in compaction
[ https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280729#comment-13280729 ] Zhihong Yu commented on HBASE-6033: --- @Jimmy: Can you check why TestCompactionState failed ? Adding some fuction to check if a table/region is in compaction --- Key: HBASE-6033 URL: https://issues.apache.org/jira/browse/HBASE-6033 Project: HBase Issue Type: New Feature Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: hbase-6033_v2.patch, table_ui.png This feature will be helpful to find out if a major compaction is going on. We can show if it is in any minor compaction too.
[jira] [Commented] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280731#comment-13280731 ] chunhui shen commented on HBASE-6065: - Suppose region A is on regionserver B. The issue can be reproduced with the following steps: 1. put one KV to region A (append seq 1 in the hlog) 2. put one KV to region A (append seq 2 in the hlog) 3. region A starts a flush, which calls HLog#startCacheFlush (the current seq num in the hlog is 3) 4. put one KV to region A (append seq 4 in the hlog) 5. region A completes the flush, which calls HLog#completeCacheFlush (append seq 3 in the hlog) 6. kill regionserver B. So the hlog file has four edits: seq 1, seq 2, seq 4, seq 3. When splitting this hlog file, we generate the recovered.edits file for region A, which is named 3. (About the name, see HLogSplitter#splitLogFileToTemp.) Now, when replaying the recovered.edits file for region A, we will skip this file and cause data loss. Log for flush would append a non-sequential edit in the hlog, may cause data loss - Key: HBASE-6065 URL: https://issues.apache.org/jira/browse/HBASE-6065 Project: HBase Issue Type: Bug Components: wal Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-6065.patch After completing a region flush, we append a log edit to the hlog file through HLog#completeCacheFlush. {code} public void completeCacheFlush(final byte [] encodedRegionName, final byte [] tableName, final long logSeqId, final boolean isMetaRegion) { ... HLogKey key = makeKey(encodedRegionName, tableName, logSeqId, System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID); ... } {code} When we make the hlog key, we use the seqId from the parameter, which was generated earlier by HLog#startCacheFlush, so we may append an edit with a lower seq id than the last edit already in the hlog file. If this lower-seq edit is the last edit in the file, it may cause data loss, because: {code} HRegion#replayRecoveredEditsIfAny { ... maxSeqId = Math.abs(Long.parseLong(fileName)); if (maxSeqId <= minSeqId) { String msg = "Maximum sequenceid for this log is " + maxSeqId + " and minimum sequenceid for the region is " + minSeqId + ", skipped the whole file, path=" + edits; LOG.debug(msg); continue; } ... } {code} We may skip the split log file, because we use the last edit's seq id as its file name and consider this seqId to be the max seq id in the log file.
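The reproduction above boils down to a small arithmetic fact that can be sketched directly. This is an illustrative toy model, not HBase source; the region's recovered minSeqId of 3 is an assumption taken from the steps above.

```java
// Illustrative sketch (not HBase source) of the HBASE-6065 failure: the
// recovered-edits file is named after the LAST edit's seq id, but
// completeCacheFlush appended seq 3 after seq 4, so the file name
// understates the real maximum and the replay check skips the file.
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OutOfOrderSeqSketch {
    public static void main(String[] args) {
        List<Long> hlogSeqIds = Arrays.asList(1L, 2L, 4L, 3L); // write order

        long fileNameSeqId = hlogSeqIds.get(hlogSeqIds.size() - 1); // 3
        long realMaxSeqId = Collections.max(hlogSeqIds);            // 4

        long regionMinSeqId = 3L; // assumed: the flush persisted up to seq 3

        // The check in HRegion#replayRecoveredEditsIfAny:
        boolean skippedToday = fileNameSeqId <= regionMinSeqId;      // true: seq 4 lost
        boolean skippedWithRealMax = realMaxSeqId <= regionMinSeqId; // false: replayed
        System.out.println(skippedToday + " " + skippedWithRealMax);
    }
}
```

Naming the file by the true maximum (4) would make the skip check fail and the seq-4 put would be replayed rather than lost.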
[jira] [Commented] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280734#comment-13280734 ] chunhui shen commented on HBASE-6065: - I have tried to write a test, but it is a little hard. We could also fix the issue with another solution (patch v2): in the current logic, we consider the last edit's seq id to be the maximal seq id in the recovered.edits file, but that is wrong because we cannot guarantee sequential edits in the hlog. So we should change the logic that finds the maximal seq id for the recovered.edits file; we only need a small change to the method HLogSplitter#updateRegionMaximumEditLogSeqNum. Log for flush would append a non-sequential edit in the hlog, may cause data loss - Key: HBASE-6065 URL: https://issues.apache.org/jira/browse/HBASE-6065 Project: HBase Issue Type: Bug Components: wal Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-6065.patch, HBASE-6065v2.patch After completing a region flush, we append a log edit to the hlog file through HLog#completeCacheFlush. {code} public void completeCacheFlush(final byte [] encodedRegionName, final byte [] tableName, final long logSeqId, final boolean isMetaRegion) { ... HLogKey key = makeKey(encodedRegionName, tableName, logSeqId, System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID); ... } {code} When we make the hlog key, we use the seqId from the parameter, which was generated earlier by HLog#startCacheFlush, so we may append an edit with a lower seq id than the last edit already in the hlog file. If this lower-seq edit is the last edit in the file, it may cause data loss, because: {code} HRegion#replayRecoveredEditsIfAny { ... maxSeqId = Math.abs(Long.parseLong(fileName)); if (maxSeqId <= minSeqId) { String msg = "Maximum sequenceid for this log is " + maxSeqId + " and minimum sequenceid for the region is " + minSeqId + ", skipped the whole file, path=" + edits; LOG.debug(msg); continue; } ... } {code} We may skip the split log file, because we use the last edit's seq id as its file name and consider this seqId to be the max seq id in the log file.
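The fix direction described in the patch v2 comment can be sketched as a running per-region maximum. Only the method name updateRegionMaximumEditLogSeqNum comes from the comment above; everything else here is an illustrative assumption, not the actual patch.

```java
// Hypothetical sketch of the patch v2 idea: track the running maximum
// seq id per region while splitting the hlog, instead of trusting the
// seq id of the last edit written to the recovered.edits file.
import java.util.HashMap;
import java.util.Map;

public class RegionMaxSeqIdSketch {
    private final Map<String, Long> maxSeqIdPerRegion = new HashMap<>();

    // Called once per edit as the splitter writes it to the temp file.
    void updateRegionMaximumEditLogSeqNum(String encodedRegionName, long seqId) {
        maxSeqIdPerRegion.merge(encodedRegionName, seqId, Math::max);
    }

    long maximumFor(String encodedRegionName) {
        return maxSeqIdPerRegion.getOrDefault(encodedRegionName, -1L);
    }

    public static void main(String[] args) {
        RegionMaxSeqIdSketch tracker = new RegionMaxSeqIdSketch();
        for (long seq : new long[] {1, 2, 4, 3}) { // out-of-order hlog edits
            tracker.updateRegionMaximumEditLogSeqNum("regionA", seq);
        }
        System.out.println(tracker.maximumFor("regionA")); // 4, not 3
    }
}
```

Naming the recovered.edits file after this running maximum makes the file name correct even when completeCacheFlush appends a lower seq id last.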
[jira] [Updated] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-6065: Attachment: HBASE-6065v2.patch Log for flush would append a non-sequential edit in the hlog, may cause data loss - Key: HBASE-6065 URL: https://issues.apache.org/jira/browse/HBASE-6065 Project: HBase Issue Type: Bug Components: wal Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-6065.patch, HBASE-6065v2.patch After completing a region flush, we append a log edit to the hlog file through HLog#completeCacheFlush. {code} public void completeCacheFlush(final byte [] encodedRegionName, final byte [] tableName, final long logSeqId, final boolean isMetaRegion) { ... HLogKey key = makeKey(encodedRegionName, tableName, logSeqId, System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID); ... } {code} When we make the hlog key, we use the seqId from the parameter, which was generated earlier by HLog#startCacheFlush, so we may append an edit with a lower seq id than the last edit already in the hlog file. If this lower-seq edit is the last edit in the file, it may cause data loss, because: {code} HRegion#replayRecoveredEditsIfAny { ... maxSeqId = Math.abs(Long.parseLong(fileName)); if (maxSeqId <= minSeqId) { String msg = "Maximum sequenceid for this log is " + maxSeqId + " and minimum sequenceid for the region is " + minSeqId + ", skipped the whole file, path=" + edits; LOG.debug(msg); continue; } ... } {code} We may skip the split log file, because we use the last edit's seq id as its file name and consider this seqId to be the max seq id in the log file.