[jira] [Updated] (HBASE-6049) Serializing List containing null elements will cause NullPointerException in HbaseObjectWritable.writeObject()

2012-05-21 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue updated HBASE-6049:
---

Attachment: HBASE-6049-v2.patch

@Zhihong I updated the patch with modifications to the test case. How does this 
look?

 Serializing List containing null elements will cause NullPointerException 
 in HbaseObjectWritable.writeObject()
 

 Key: HBASE-6049
 URL: https://issues.apache.org/jira/browse/HBASE-6049
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.94.0
Reporter: Maryann Xue
 Attachments: HBASE-6049-v2.patch, HBASE-6049.patch


 One error case is in the coprocessor AggregationClient: the median() 
 function handles an empty region by returning a List whose first 
 element is null. The NPE then occurs in the RPC response stage, and the 
 response never gets sent.
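The failure mode and a possible fix can be sketched outside HBase. This is a hypothetical, minimal illustration of the presence-flag pattern for serializing lists that may contain nulls; NullSafeListWriter and its methods are made-up names, not the actual HbaseObjectWritable code:

```java
// Hypothetical sketch (not the real HbaseObjectWritable): write a presence
// flag before each element so null entries survive the round trip instead
// of triggering a NullPointerException during serialization.
import java.io.*;
import java.util.*;

class NullSafeListWriter {
    static void writeList(DataOutput out, List<String> list) throws IOException {
        out.writeInt(list.size());
        for (String e : list) {
            out.writeBoolean(e != null);      // presence flag
            if (e != null) out.writeUTF(e);   // only non-null payloads are written
        }
    }

    static List<String> readList(DataInput in) throws IOException {
        int n = in.readInt();
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add(in.readBoolean() ? in.readUTF() : null);
        }
        return list;
    }
}
```

With this pattern, a list whose first element is null (the shape median() returns for an empty region) round-trips cleanly instead of throwing.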

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-21 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6059:


Attachment: HBASE-6059-testcase.patch

I have written a test case to reproduce the issue.

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6059-testcase.patch


 When we replay recovered edits, we use the minSeqId of the Store. This may 
 cause deleted data to appear again.
 Let's see how it happens. Suppose a region with two families (cf1, cf2):
 1. Put one row to the region (put r1,cf1:q1,v1).
 2. Move the region from server A to server B.
 3. Delete the data put in step 1 (delete r1).
 4. Flush this region.
 5. Run a major compaction on this region.
 6. Move the region from server B to server A.
 7. Abort server A.
 8. After the region comes online, we can get the deleted data (r1,cf1:q1,v1) 
 back.
 (When we replay recovered edits, we use the minSeqId of the Store; because 
 cf2 has no store files, its seqId is 0, so the put in the edit log is 
 replayed into the region.)





[jira] [Updated] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-21 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6059:


Attachment: HBASE-6059.patch

In the solution patch, I use a Map<byte[], Long> maxSeqIdInStores to save each 
store's maxSeqId.
So, when replaying edit logs, we skip the edits for each store according 
to its own maxSeqId.
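The per-store skip described above can be sketched as follows. This is a simplified illustration with made-up names, not the actual HRegion replay code; family names are plain strings here instead of byte[]:

```java
// Illustrative sketch of per-store replay filtering: each store remembers
// its own maxSeqId, and a recovered edit is replayed only if it is newer
// than what that store has already persisted.
import java.util.Map;

class PerStoreReplay {
    // maxSeqIdInStores: family -> highest seqId already persisted in that store
    static boolean shouldReplay(Map<String, Long> maxSeqIdInStores,
                                String family, long editSeqId) {
        long maxSeqId = maxSeqIdInStores.getOrDefault(family, -1L);
        return editSeqId > maxSeqId;  // skip edits the store already contains
    }
}
```

In the scenario from the description, cf1's store would carry the seqId of the flushed delete, so the old put (with a smaller seqId) is skipped even though cf2 has no store files.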

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch







[jira] [Commented] (HBASE-6049) Serializing List containing null elements will cause NullPointerException in HbaseObjectWritable.writeObject()

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280038#comment-13280038
 ] 

Hadoop QA commented on HBASE-6049:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528398/HBASE-6049-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.TestHbaseObjectWritable

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1943//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1943//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1943//console

This message is automatically generated.

 Serializing List containing null elements will cause NullPointerException 
 in HbaseObjectWritable.writeObject()
 

 Key: HBASE-6049
 URL: https://issues.apache.org/jira/browse/HBASE-6049
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.94.0
Reporter: Maryann Xue
 Attachments: HBASE-6049-v2.patch, HBASE-6049.patch







[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280077#comment-13280077
 ] 

ramkrishna.s.vasudevan commented on HBASE-6059:
---

@Chunhui
This is a damn good one. But I still see one problem here, similar to the 
one you reported. Please correct me if I am wrong.
In the same test case, at the point where you delete the row 'r1', suppose I 
also delete the row 'r2':
{code}
del = new Delete(Bytes.toBytes(r));
htable.delete(del);
resultScanner = htable.getScanner(new Scan());
count = 0;
while (resultScanner.next() != null) {
  count++;
}
{code}
Now the seqId from the store files will be 0, since there is nothing left to 
write after the major compaction, so the same problem still occurs. I 
simulated this with the same test case that you added.
Maybe we need some other way to know that an edit has been deleted by a 
major compaction? Because as far as I can see, without major compaction 
there is no issue at all.

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch







[jira] [Comment Edited] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280077#comment-13280077
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-6059 at 5/21/12 10:59 AM:
-

@Chunhui
This is a damn good one. But I still see one problem here, similar to the 
one you reported. Please correct me if I am wrong.
In the same test case, at the point where you delete the row 'r1', suppose I 
also delete the row 'r':
{code}
del = new Delete(Bytes.toBytes(r));
htable.delete(del);
resultScanner = htable.getScanner(new Scan());
count = 0;
while (resultScanner.next() != null) {
  count++;
}
{code}
Now the seqId from the store files will be 0, since there is nothing left to 
write after the major compaction, so the same problem still occurs. I 
simulated this with the same test case that you added.
Maybe we need some other way to know that an edit has been deleted by a 
major compaction? Because as far as I can see, without major compaction 
there is no issue at all.

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch







[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-21 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280101#comment-13280101
 ] 

chunhui shen commented on HBASE-6059:
-

@ram
Yes, I have also considered the case where all the entries in the store file 
are deleted and we don't write any new store file.
But could we generate an empty store file with its metadata alone? Let me 
give it a try first.
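The idea of an empty store file that still carries metadata can be sketched like this. It is a toy format with made-up names, not HFile's real layout:

```java
// Toy illustration: even when a major compaction deletes every cell, an
// "empty" store file can still record metadata such as the max sequence id,
// so edit-log replay knows which edits are already obsolete.
import java.io.*;

class EmptyStoreFile {
    static void writeMeta(DataOutput out, long maxSeqId) throws IOException {
        out.writeInt(0);          // zero data entries in this file
        out.writeLong(maxSeqId);  // metadata survives even with no cells
    }

    static long readMaxSeqId(DataInput in) throws IOException {
        in.readInt();             // skip the (empty) data section
        return in.readLong();
    }
}
```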

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch







[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280157#comment-13280157
 ] 

Zhihong Yu commented on HBASE-5757:
---

@Jan:
Neither patch applies to trunk as of today.
Can you attach a patch for trunk and name it accordingly?

Thanks

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
 Attachments: HBASE-5757.patch, HBASE-5757.patch


 Prior to HBASE-4196 there was different handling of IOExceptions thrown from 
 the scanner in the mapred and mapreduce APIs. The patch for HBASE-4196 
 unified this handling so that if an exception is caught, a reconnect is 
 attempted (without bothering the mapred client). After that, HBASE-4269 
 changed this behavior back, but in both the mapred and mapreduce APIs. The 
 question is: is there any reason not to handle all errors that the input 
 format can handle? In other words, why not try to reissue the request after 
 *any* IOException? I see the following disadvantages of the current approach:
  * the client may see exceptions like LeaseException and 
 ScannerTimeoutException if it fails to process all fetched data within the 
 timeout
  * to avoid ScannerTimeoutException the client must raise 
 hbase.regionserver.lease.period
  * timeouts for tasks are already configured in mapred.task.timeout, so this 
 seems a bit redundant, because typically one needs to update both of these 
 parameters
  * I don't see any way to get rid of LeaseException (this is configured on 
 the server side)
 I think all of these issues would be gone if the DoNotRetryIOException were 
 not rethrown. -On the other hand, handling errors in the InputFormat has 
 the disadvantage that it may hide some inefficiency from the user. E.g. if I 
 have a very big scanner.caching and I manage to process only a few rows 
 within the timeout, I will end up with a single row being fetched many times 
 (and will not be explicitly notified about this). Could we solve this 
 problem by adding some counter to the InputFormat?-
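The retry-on-any-IOException idea can be sketched with simplified stand-ins for the scanner types. RetryingReader, ScannerFactory, and Scanner are hypothetical names, not the actual TableRecordReader API:

```java
// Illustrative sketch: on any IOException from the scanner, reopen it at
// the last successfully returned row instead of surfacing the error to
// the mapred client.
import java.io.IOException;

class RetryingReader {
    interface Scanner { String next() throws IOException; }
    interface ScannerFactory { Scanner open(String startRow) throws IOException; }

    private final ScannerFactory factory;
    private Scanner scanner;
    private String lastRow = "";

    RetryingReader(ScannerFactory factory) throws IOException {
        this.factory = factory;
        this.scanner = factory.open(lastRow);
    }

    String nextRow() throws IOException {
        try {
            return remember(scanner.next());
        } catch (IOException e) {
            // e.g. a lease or scanner timeout: reopen and retry once
            scanner = factory.open(lastRow);
            return remember(scanner.next());
        }
    }

    private String remember(String row) {
        if (row != null) lastRow = row;
        return row;
    }
}
```

Whether the retry should be bounded, and whether a counter should expose repeated refetches (as the struck-through paragraph asks), remain open design questions.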





[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Jan Lukavsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavsky updated HBASE-5757:


Attachment: HBASE-5757-trunk-r1341041.patch

There was a conflicting commit from the HBASE-6004 patch. I merged this patch 
accordingly; the new one should apply to revision 1341041.

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
 Attachments: HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, 
 HBASE-5757.patch







[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5757:
--

Status: Patch Available  (was: Open)

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
 Attachments: HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, 
 HBASE-5757.patch







[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280240#comment-13280240
 ] 

Hadoop QA commented on HBASE-5757:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12528434/HBASE-5757-trunk-r1341041.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestClassLoading
  org.apache.hadoop.hbase.replication.TestReplication
  
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1944//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1944//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1944//console

This message is automatically generated.

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
 Attachments: HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, 
 HBASE-5757.patch







[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280249#comment-13280249
 ] 

Zhihong Yu commented on HBASE-5757:
---

I ran the following two tests and they passed with the latest patch:
{code}
mt -Dtest=TestClassLoading
mt -Dtest=TestSplitTransactionOnCluster
{code}
The replication tests have been failing and are not related to this change.

Minor comments:
{code}
+// try to handle exceptions all possible exceptions by restarting
{code}
The first 'exceptions ' should be removed.

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
 Attachments: HBASE-5757-trunk-r1341041.patch, HBASE-5757.patch, 
 HBASE-5757.patch







[jira] [Resolved] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor

2012-05-21 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-5882.
---

Resolution: Fixed

Committed the patch. Hence resolving this.

 Process RIT on master restart can try assigning the region if the region is 
 found on a dead server instead of waiting for Timeout Monitor
 -

 Key: HBASE-5882
 URL: https://issues.apache.org/jira/browse/HBASE-5882
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: Ashutosh Jindal
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5882_v5.patch, HBASE-5882_v6.patch, 
 hbase_5882.patch, hbase_5882_V2.patch, hbase_5882_V3.patch, 
 hbase_5882_V4.patch


 Currently, on master restart, when processing regions in transition (RIT), 
 any region found on a dead server avoids a new assignment so that the 
 timeout monitor can take care of it.
 This case is more prominent if the node is found in the RS_ZK_REGION_OPENING 
 state. I think we can handle this by triggering a new assignment with a new 
 plan.





[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280285#comment-13280285
 ] 

Hadoop QA commented on HBASE-5757:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528448/5757-trunk-v2.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestClassLoading
  org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.regionserver.wal.TestHLog
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1945//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1945//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1945//console

This message is automatically generated.

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, 
 HBASE-5757.patch, HBASE-5757.patch


 Prior to HBASE-4196 the mapred and mapreduce APIs handled IOExceptions 
 thrown from the scanner differently. The patch for HBASE-4196 unified this 
 handling so that if an exception is caught, a reconnect is attempted (without 
 bothering the mapred client). After that, HBASE-4269 changed this behavior 
 back, in both the mapred and mapreduce APIs. The question is: is there any 
 reason not to handle all errors that the input format can handle? In other 
 words, why not reissue the request after *any* IOException? I see the 
 following disadvantages of the current approach:
  * the client may see exceptions like LeaseException and 
 ScannerTimeoutException if it fails to process all fetched data within the 
 timeout
  * to avoid ScannerTimeoutException the client must raise 
 hbase.regionserver.lease.period
  * timeouts for tasks are already configured in mapred.task.timeout, so this 
 seems a bit redundant, because typically one needs to update both of these 
 parameters
  * I don't see any way to get rid of LeaseException (this is configured on 
 the server side)
 I think all of these issues would be gone if the DoNotRetryIOException were 
 not rethrown. -On the other hand, handling errors in the InputFormat has the 
 disadvantage that it may hide some inefficiency from the user. E.g., if I 
 have a very big scanner.caching and I manage to process only a few rows 
 within the timeout, I will end up with a single row being fetched many times 
 (and will not be explicitly notified about this). Could we solve this problem 
 by adding some counter to the InputFormat?-





[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280295#comment-13280295
 ] 

Zhihong Yu commented on HBASE-6059:
---

If majorCompaction is false, we still need to check !kvs.isEmpty(), right?

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch


 When we replay recovered edits we use the minSeqId of the Store, which may 
 cause deleted data to appear again.
 Let's see how it happens. Suppose a region with two families (cf1, cf2):
 1. put one row into the region (put r1,cf1:q1,v1)
 2. move the region from server A to server B
 3. delete the data put in step 1 (delete r1)
 4. flush this region
 5. run a major compaction on this region
 6. move the region from server B to server A
 7. abort server A
 8. after the region comes back online, we can get the deleted data 
 (r1,cf1:q1,v1) again
 (When we replay recovered edits we use the minSeqId of the Store; because cf2 
 has no store files, its seqId is 0, so the edit log of the put is replayed 
 into the region.)





[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-05-21 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280311#comment-13280311
 ] 

Jean-Daniel Cryans commented on HBASE-5778:
---

I don't see how, in theory, the seek can be a problem when tailing a log from 
the start, since we read the whole file. The only case that needs to be 
handled differently is when a region server has to replicate a log that 
another RS started working on but died. In that case we can just read the file 
up to the last seek position without replicating anything.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch


 I ran some tests to verify whether WAL compression should be turned on by 
 default.
 For a use case where it's not very useful (values two orders of magnitude 
 bigger than the keys), the insert time wasn't different and the CPU usage was 
 15% higher (150% CPU usage vs 130% when not compressing the WAL).
 When values are smaller than the keys, I saw a 38% improvement in insert run 
 time, and CPU usage was 33% higher (600% CPU usage vs 450%). I'm not sure WAL 
 compression accounts for all the additional CPU usage; it might just be that 
 we're able to insert faster and so spend more time in the MemStore per second 
 (because our MemStores are bad when they contain tens of thousands of 
 values).
 Those are two extremes, but it shows that for the price of some CPU we can 
 save a lot. My machines have 2 quads with HT, so I still had a lot of idle 
 CPUs.
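Most of the saving in the key-dominated case comes from dictionary-style encoding of key components that repeat across consecutive WAL entries (row keys, family and qualifier names). A toy Python illustration of the idea only; this is not the actual HBase WAL compression format:

```python
class DictEncoder:
    """Toy dictionary encoder: the first time a byte string is seen it is
    emitted in full; every later occurrence is replaced by a small index.
    Repeated keys therefore shrink dramatically, large unique values don't."""

    def __init__(self):
        self.index = {}

    def encode(self, value):
        if value in self.index:
            return ("ref", self.index[value])   # cheap back-reference
        self.index[value] = len(self.index)
        return ("literal", value)               # first occurrence, full bytes
```

This is why the win is large when keys dominate the entry size and negligible when values are two orders of magnitude bigger than the keys.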





[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280315#comment-13280315
 ] 

Jonathan Hsieh commented on HBASE-5757:
---

Zhihong, thanks for pinging me about this.  Jan, thanks for being patient with 
me on this.

The changes look good. The patch applies to 0.94 and trunk. I believe the 
request was for getting this into 0.90 -- I'll look into backporting this 
behavior to that version.



 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, 
 HBASE-5757.patch, HBASE-5757.patch







[jira] [Commented] (HBASE-5882) Process RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280316#comment-13280316
 ] 

Hudson commented on HBASE-5882:
---

Integrated in HBase-TRUNK #2910 (See 
[https://builds.apache.org/job/HBase-TRUNK/2910/])
HBASE-5882 Process RIT on master restart can try assigning the region if 
the region is found on a dead server instead of waiting for Timeout Monitor 
(Ashutosh) (Revision 1341110)

 Result = FAILURE
ramkrishna : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java


 Process RIT on master restart can try assigning the region if the region is 
 found on a dead server instead of waiting for Timeout Monitor
 -

 Key: HBASE-5882
 URL: https://issues.apache.org/jira/browse/HBASE-5882
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: Ashutosh Jindal
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5882_v5.patch, HBASE-5882_v6.patch, 
 hbase_5882.patch, hbase_5882_V2.patch, hbase_5882_V3.patch, 
 hbase_5882_V4.patch







[jira] [Assigned] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh reassigned HBASE-5757:
-

Assignee: Jan Lukavsky

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
Assignee: Jan Lukavsky
 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, 
 HBASE-5757.patch, HBASE-5757.patch







[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280318#comment-13280318
 ] 

Zhihong Yu commented on HBASE-5757:
---

TestHLog failure was caused by:
{code}
java.net.BindException: Problem binding to localhost/127.0.0.1:41331 : Address 
already in use
at org.apache.hadoop.ipc.Server.bind(Server.java:227)
at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301)
{code}
I ran it locally and it passed.
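This class of test flakiness (a fixed test port colliding with another process) is commonly avoided by binding to port 0 and letting the OS choose a free ephemeral port. A generic Python sketch of the pattern, not HBase test code:

```python
import socket

def bind_ephemeral(host="127.0.0.1"):
    """Bind to port 0 so the kernel assigns a currently free port; return
    the bound socket and the port it actually got. This sidesteps
    'Address already in use' races from hard-coded test ports."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, 0))
    port = s.getsockname()[1]
    return s, port
```

The server under test is then started on the returned port instead of a constant like 41331.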

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
Assignee: Jan Lukavsky
 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, 
 HBASE-5757.patch, HBASE-5757.patch







[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280328#comment-13280328
 ] 

Hudson commented on HBASE-5757:
---

Integrated in HBase-TRUNK #2911 (See 
[https://builds.apache.org/job/HBase-TRUNK/2911/])
HBASE-5757 TableInputFormat should handle as many errors as possible (Jan 
Lukavsky) (Revision 1341132)

 Result = FAILURE
jmhsieh : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java


 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
Assignee: Jan Lukavsky
 Fix For: 0.96.0, 0.94.1

 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, 
 HBASE-5757.patch, HBASE-5757.patch







[jira] [Created] (HBASE-6060) Regions in OPENING state from failed regionservers take a long time to recover

2012-05-21 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-6060:


 Summary: Regions in OPENING state from failed regionservers 
take a long time to recover
 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar


We have seen a pattern in tests where regions are stuck in the OPENING state 
for a very long time when the region server that is opening the region fails. 
My understanding of the process: 

 - The master asks an RS to open the region. If the RS is offline, a new plan 
is generated (a new RS is chosen). RegionState is set to PENDING_OPEN (only in 
master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), 
HMaster.assign().
 - The RegionServer starts opening the region and changes the state in the 
znode. But that znode is not ephemeral. (See ZkAssign.)
 - The RS transitions the zk node from OFFLINE to OPENING. See 
OpenRegionHandler.process().
 - The RS then opens the region and changes the znode from OPENING to OPENED.
 - When the RS is killed between the OPENING and OPENED states, zk shows the 
OPENING state and the master just waits for the RS to change the region state; 
but since the RS is down, that won't happen.
 - There is an AssignmentManager.TimeoutMonitor, which guards against exactly 
these kinds of conditions. It periodically checks (every 10 sec by default) 
the regions in transition to see whether they have timed out 
(hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 
min, which explains what you and I are seeing.
 - The ServerShutdownHandler in the master does not reassign regions in the 
OPENING state, although it handles other states.

Lowering that threshold in the configuration is one option, but I still think 
we can do better. 

Will investigate more. 
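The gap described above can be modeled as: regions on a dead server are reassigned promptly in most states, but OPENING regions are skipped and only rescued once the (30-minute by default) timeout monitor notices them. A schematic Python sketch with hypothetical names, not the AssignmentManager code:

```python
DEFAULT_TIMEOUT = 30 * 60  # seconds; hbase.master.assignment.timeoutmonitor.timeout

def regions_to_reassign(rit, dead_servers, now,
                        timeout=DEFAULT_TIMEOUT, handle_opening=False):
    """rit: region -> (state, server, since_epoch_seconds).
    Regions on dead servers are reassigned immediately, except OPENING ones
    (unless handle_opening is set); those fall through to the timeout check,
    which is what makes them take up to `timeout` seconds to recover."""
    out = []
    for region, (state, server, since) in rit.items():
        if server in dead_servers and (state != "OPENING" or handle_opening):
            out.append(region)                 # shutdown-handler path
        elif now - since >= timeout:
            out.append(region)                 # timeout-monitor path
    return out
```

Setting the hypothetical handle_opening flag corresponds to teaching the shutdown handler about OPENING regions rather than lowering the timeout.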





[jira] [Commented] (HBASE-5970) Improve the AssignmentManager#updateTimer and speed up handling opened event

2012-05-21 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280367#comment-13280367
 ] 

nkeywal commented on HBASE-5970:


Hi,

Could you share the logs of the tests? I would be interested to have a look at 
them.
The javadoc for updateTimers says it's not used for bulk assignment; is there a 
mix of 'bulk assigned' regions and other regions?
I also see in the description that the time was measured once with 
'retainAssignment=true' and once without. Are the results comparable in both 
cases?

Thank you!

 Improve the AssignmentManager#updateTimer and speed up handling opened event
 

 Key: HBASE-5970
 URL: https://issues.apache.org/jira/browse/HBASE-5970
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 5970v3.patch, HBASE-5970.patch, HBASE-5970v2.patch, 
 HBASE-5970v3.patch


 We found that handling of the opened event is very slow in an environment 
 with lots of regions.
 The problem is the slow AssignmentManager#updateTimer.
 We ran a test bulk-assigning 10w (i.e. 100k) regions; the whole bulk 
 assignment took about 1 hour:
 2012-05-06 20:31:49,201 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 
 region(s) round-robin across 5 server(s)
 2012-05-06 21:26:32,103 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
 I think we could improve AssignmentManager#updateTimer by making a separate 
 thread do this work.
 After the improvement, it took only 4.5 mins:
 2012-05-07 11:03:36,581 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 10 
 region(s) across 5 server(s), retainAssignment=true 
 2012-05-07 11:07:57,073 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done 
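The fix described (a dedicated thread doing the updateTimer work instead of doing it inline in the opened-event handler) can be sketched as a queue-draining worker. This is an illustrative Python sketch with made-up names, not the actual AssignmentManager change:

```python
import queue
import threading

class AsyncTimerUpdater:
    """Sketch: opened-event handlers enqueue the timer update and return
    immediately; a single background thread drains the queue, so the slow
    bookkeeping no longer serializes bulk assignment."""

    def __init__(self):
        self.q = queue.Queue()
        self.updated = []                       # stands in for timer bookkeeping
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def on_region_opened(self, region):
        self.q.put(region)                      # cheap; handler is not blocked

    def _drain(self):
        while True:
            region = self.q.get()
            if region is None:                  # shutdown sentinel
                break
            self.updated.append(region)         # the formerly-inline slow work

    def close(self):
        self.q.put(None)
        self.worker.join()
```

The opened-event path now costs one queue put rather than the full timer update, which is the effect the 1 hour vs 4.5 minutes numbers above illustrate.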





[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280385#comment-13280385
 ] 

Hudson commented on HBASE-5757:
---

Integrated in HBase-0.94 #205 (See 
[https://builds.apache.org/job/HBase-0.94/205/])
HBASE-5757 TableInputFormat should handle as many errors as possible (Jan 
Lukavsky) (Revision 1341133)

 Result = FAILURE
jmhsieh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java


 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
Assignee: Jan Lukavsky
 Fix For: 0.96.0, 0.94.1

 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, 
 HBASE-5757.patch, HBASE-5757.patch







[jira] [Commented] (HBASE-1749) If RS loses lease, we used to restart by default; reinstitute

2012-05-21 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280387#comment-13280387
 ] 

nkeywal commented on HBASE-1749:


Yes, because of HBASE-5844 and HBASE-5939, we now:
- delete the znode immediately when we exit
- restart after a non-planned stop.

This is safer than trying to reinstitute a region server in the same JVM, as 
it removes any memory or static-variable effects. In both cases we trigger a 
reassignment of the regions, however.


 If RS loses lease, we used to restart by default; reinstitute
 --

 Key: HBASE-1749
 URL: https://issues.apache.org/jira/browse/HBASE-1749
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: nkeywal







[jira] [Work started] (HBASE-1749) If RS loses lease, we used to restart by default; reinstitute

2012-05-21 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-1749 started by nkeywal.

 If RS loses lease, we used to restart by default; reinstitute
 --

 Key: HBASE-1749
 URL: https://issues.apache.org/jira/browse/HBASE-1749
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: nkeywal







[jira] [Resolved] (HBASE-1749) If RS loses lease, we used to restart by default; reinstitute

2012-05-21 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal resolved HBASE-1749.


Resolution: Duplicate

 If RS loses lease, we used to restart by default; reinstitute
 --

 Key: HBASE-1749
 URL: https://issues.apache.org/jira/browse/HBASE-1749
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: nkeywal







[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5757:
--

Attachment: hbase-5757-92.patch

hbase-5757-92.patch is for the 0.92 and 0.90 versions. The underlying metrics 
have changed, so it does not update metrics the way the 0.94 or trunk/0.96 
patches do. It does, however, include the updated tests that demonstrate the 
updated semantics.

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
Assignee: Jan Lukavsky
 Fix For: 0.96.0, 0.94.1

 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, 
 HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch


 Prior to HBASE-4196 there was different handling of IOExceptions thrown from 
 the scanner in the mapred and mapreduce APIs. The patch to HBASE-4196 unified this 
 handling so that if an exception is caught, a reconnect is attempted (without 
 bothering the mapred client). After that, HBASE-4269 changed this behavior 
 back, but in both the mapred and mapreduce APIs. The question is, is there any 
 reason not to handle all errors that the input format can handle? In other 
 words, why not try to reissue the request after *any* IOException? I see the 
 following disadvantages of the current approach:
  * the client may see exceptions like LeaseException and 
 ScannerTimeoutException if it fails to process all fetched data within the timeout
  * to avoid ScannerTimeoutException the client must raise 
 hbase.regionserver.lease.period
  * timeouts for tasks are already configured in mapred.task.timeout, so this 
 seems a bit redundant to me, because typically one needs to update both these 
 parameters
  * I don't see any possibility to get rid of LeaseException (this is 
 configured on the server side)
 I think all of these issues would be gone if the DoNotRetryIOException were 
 not rethrown. -On the other hand, handling errors in the InputFormat has the 
 disadvantage that it may hide some inefficiency from the user. E.g. if I have 
 a very big scanner.caching, and I manage to process only a few rows within the 
 timeout, I will end up with a single row being fetched many times (and will not 
 be explicitly notified about this). Could we solve this problem by adding some 
 counter to the InputFormat?-
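
 The retry-after-any-IOException idea can be sketched, purely illustratively 
 and with JDK-only types rather than the actual TableRecordReader code, as a 
 small wrapper that restarts the scanner once instead of rethrowing 
 LeaseException/ScannerTimeoutException to the mapred client:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch, not HBase's actual record-reader code: run the read,
// and on any IOException re-create the scanner (e.g. at the last seen row)
// and retry once, so transient lease/timeout errors never reach the job.
public class RetryingReader {
    public static <T> T readWithRestart(Callable<T> read, Runnable restart)
            throws Exception {
        try {
            return read.call();
        } catch (IOException e) {
            restart.run();      // stand-in for restarting the scanner
            return read.call(); // second failure propagates to the caller
        }
    }
}
```

 A real implementation would also track the last successfully returned row so 
 the restarted scanner resumes just past it, avoiding duplicate rows.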





[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280398#comment-13280398
 ] 

Jonathan Hsieh commented on HBASE-5757:
---

Zhihong, Jan, if the 0.92/0.90 versions look good to you I will commit.





[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280412#comment-13280412
 ] 

Hadoop QA commented on HBASE-5757:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528472/hbase-5757-92.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1946//console

This message is automatically generated.





[jira] [Created] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Matteo Bertozzi (JIRA)
Matteo Bertozzi created HBASE-6061:
--

 Summary: Fix ACL Admin Table inconsistent permission check
 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.94.0, 0.92.1, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: 0.92.2, 0.96.0, 0.94.1


The requirePermission() check for admin operations on a table is currently 
inconsistent.

A table owner with CREATE rights (meaning the owner created that table) 
can enable/disable and delete the table, but needs ADMIN rights to 
add/remove/modify a column.
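
The intended rule can be illustrated with a minimal, hypothetical helper (the 
enum and method names are invented for this sketch and are not HBase's actual 
AccessController API): one place decides whether a user counts as a table 
admin, so enable/disable/delete and column operations can't drift apart.

```java
import java.util.Set;

// Illustrative only: the "owner with CREATE acts as table admin" rule,
// pulled into a single helper used by every table-admin code path.
public class TablePermissionCheck {
    public enum Action { READ, WRITE, CREATE, ADMIN }

    /** True if the user may perform table-admin operations on the table. */
    public static boolean canAdministerTable(String user, String owner,
                                             Set<Action> granted) {
        if (granted.contains(Action.ADMIN)) {
            return true;  // an explicit ADMIN grant always suffices
        }
        // The owner who created the table (holds CREATE) is treated as admin.
        return user.equals(owner) && granted.contains(Action.CREATE);
    }
}
```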





[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-6061:
---

Attachment: HBASE-6061-v0.patch





[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280433#comment-13280433
 ] 

Zhihong Yu commented on HBASE-5757:
---

TestTableInputFormat passed in 0.92 with 0.92 patch.

+1 from me.





[jira] [Commented] (HBASE-6036) Add Cluster-level PB-based calls to HMasterInterface (minus file-format related calls)

2012-05-21 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280436#comment-13280436
 ] 

Gregory Chanan commented on HBASE-6036:
---

These replication tests fail even without this patch applied, so I think this 
is good to go.

 Add Cluster-level PB-based calls to HMasterInterface (minus file-format 
 related calls)
 --

 Key: HBASE-6036
 URL: https://issues.apache.org/jira/browse/HBASE-6036
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Fix For: 0.96.0

 Attachments: HBASE-6036-v2.patch, HBASE-6036.patch


 This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
 also make this a subtask (apparently).
 Convert the cluster-level calls that do not touch the file-format related 
 calls (see HBASE-5453).  These are:
 IsMasterRunning
 Shutdown
 StopMaster
 Balance
 LoadBalancerIs (was synchronousBalanceSwitch/balanceSwitch)





[jira] [Commented] (HBASE-6043) Add Increment Coalescing in thrift.

2012-05-21 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280440#comment-13280440
 ] 

Elliott Clark commented on HBASE-6043:
--

Not sure why Phabricator isn't posting diffs but the review is up at 
https://reviews.facebook.net/D3315.

 Add Increment Coalescing in thrift.
 ---

 Key: HBASE-6043
 URL: https://issues.apache.org/jira/browse/HBASE-6043
 Project: HBase
  Issue Type: Improvement
Reporter: Elliott Clark
Assignee: Elliott Clark

 Since the Thrift server uses the client API, reducing the number of RPCs 
 greatly speeds up increments.
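
 The coalescing idea can be sketched with plain JDK types (this is an 
 illustration of the approach, not the attached patch): increments for the 
 same (row, column) key are accumulated in memory and periodically drained so 
 that N Thrift increment calls cost a single batched RPC.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of increment coalescing. A real flusher would run on a
// timer/threshold and send the drained batch as one increment RPC; this
// version just demonstrates the accumulate-then-drain structure.
public class IncrementCoalescer {
    private final ConcurrentHashMap<String, Long> pending =
            new ConcurrentHashMap<>();

    /** Queue an increment instead of issuing an RPC immediately. */
    public void queueIncrement(String rowColumn, long amount) {
        pending.merge(rowColumn, amount, Long::sum);
    }

    /** Drain the pending counts; the caller sends them as one batch. */
    public Map<String, Long> flush() {
        Map<String, Long> batch = new ConcurrentHashMap<>(pending);
        pending.clear();  // note: a production version must drain atomically
        return batch;
    }
}
```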





[jira] [Updated] (HBASE-6043) Add Increment Coalescing in thrift.

2012-05-21 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6043:
-

Attachment: HBASE-6043-0.patch





[jira] [Updated] (HBASE-6043) Add Increment Coalescing in thrift.

2012-05-21 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6043:
-

Status: Patch Available  (was: Open)





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280442#comment-13280442
 ] 

Zhihong Yu commented on HBASE-6061:
---

Minor comment:
{code}
+   * If current user is the table owner, and has CREATE permission is a table 
admin,
{code}
', and has CREATE permission is a table admin' should read ' and has CREATE 
permission, then he/she has table admin permission.' (wrap if the line is too long)





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280448#comment-13280448
 ] 

Andrew Purtell commented on HBASE-6061:
---

+1. Yes, this is better: since the direction here is to let the creator take any 
action on the table, pulling the logic up into a small helper method is cleaner, 
fixes the issue, and will avoid errors going forward.





[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-05-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280453#comment-13280453
 ] 

Andrew Purtell commented on HBASE-6060:
---

The TimeoutMonitor timeout was increased to 30 minutes in HBASE-4126.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar

 We have seen a pattern in tests where regions are stuck in the OPENING state 
 for a very long time when the region server that is opening the region fails. 
 My understanding of the process: 
  
  - The master calls the rs to open the region. If the rs is offline, a new plan is 
 generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
 master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), 
 HMaster.assign()
  - The RegionServer starts opening the region and changes the state in the znode, 
 but that znode is not ephemeral. (see ZkAssign)
  - The rs transitions the zk node from OFFLINE to OPENING. See 
 OpenRegionHandler.process()
  - The rs then opens the region and changes the znode from OPENING to OPENED.
  - When the rs is killed between the OPENING and OPENED states, zk shows the 
 OPENING state, and the master just waits for the rs to change the region state; 
 since the rs is down, that won't happen. 
  - There is an AssignmentManager.TimeoutMonitor, which guards against exactly 
 these kinds of conditions. It periodically checks (every 10 sec by 
 default) the regions in transition to see whether they timed out 
 (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
 which explains what you and I are seeing. 
  - ServerShutdownHandler in the Master does not reassign regions in the OPENING 
 state, although it handles other states. 
 Lowering that threshold in the configuration is one option, but I still 
 think we can do better. 
 Will investigate more. 
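
 For reference, the threshold the description mentions lives in hbase-site.xml; 
 a stopgap under the current behavior would look like the following (the value 
 here is only illustrative, and lowering it has side effects of its own):

```xml
<!-- hbase-site.xml: lower the assignment timeout from the 30-minute
     default introduced by HBASE-4126, so regions stuck in OPENING after
     a regionserver death are retried sooner. Illustrative value only. -->
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <value>180000</value> <!-- 3 minutes, in milliseconds -->
</property>
```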





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280459#comment-13280459
 ] 

Matteo Bertozzi commented on HBASE-6061:


Not related, but maybe we can squeeze it into this one... preCheckAndPut() and 
preCheckAndDelete() check for READ when they also want to WRITE. 
For me, checking for the WRITE permission is the right thing. What do you say 
about that? Keep READ, replace with WRITE, or open a new jira?





[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-6061:
---

Attachment: HBASE-6061-v1.patch





[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-6061:
---

Attachment: (was: HBASE-6061-v1.patch)





[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-6061:
---

Attachment: HBASE-6061-v1.patch





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280466#comment-13280466
 ] 

Andrew Purtell commented on HBASE-6061:
---

bq. Not related but maybe we can squeeze into this one... preCheckAndPut() and 
preCheckAndDelete() checks for READ when they also want to WRITE

Yes, new jira.





[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-05-21 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280469#comment-13280469
 ] 

Enis Soztutar commented on HBASE-6060:
--

Thanks Andrew for the pointer. Agreed that lowering the timeout can have deeper 
impacts. We should fix the issue properly instead. 





[jira] [Created] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed

2012-05-21 Thread Matteo Bertozzi (JIRA)
Matteo Bertozzi created HBASE-6062:
--

 Summary: preCheckAndPut/Delete() checks for READ when also a WRITE 
is performed
 Key: HBASE-6062
 URL: https://issues.apache.org/jira/browse/HBASE-6062
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.94.0, 0.92.1, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: 0.92.2, 0.96.0, 0.94.1


preCheckAndPut() and preCheckAndDelete() check for READ when they also want to 
WRITE... 
For me, checking for WRITE permission is the right thing. 
What do you say about that? Keep READ, or replace with WRITE?
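The inconsistency can be illustrated with a minimal sketch. The method name preCheckAndPut mirrors the coprocessor hook under discussion, but everything else here (AclSketch, the grant map) is hypothetical, not the AccessController implementation.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

enum TablePermission { READ, WRITE }

// Illustrative ACL check: user -> granted permissions.
class AclSketch {
    private final Map<String, EnumSet<TablePermission>> grants = new HashMap<>();

    void grant(String user, TablePermission p) {
        grants.computeIfAbsent(user, u -> EnumSet.noneOf(TablePermission.class)).add(p);
    }

    boolean has(String user, TablePermission p) {
        EnumSet<TablePermission> ps = grants.get(user);
        return ps != null && ps.contains(p);
    }

    // checkAndPut both reads the current cell and writes a new value, so the
    // proposal is to require WRITE here instead of READ.
    boolean preCheckAndPut(String user) {
        return has(user, TablePermission.WRITE);
    }
}
```

Under the proposed rule, a READ-only user can no longer mutate data through checkAndPut, which is the hole the issue describes.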





[jira] [Updated] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed

2012-05-21 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-6062:
---

Attachment: HBASE-6062-v0.patch

 preCheckAndPut/Delete() checks for READ when also a WRITE is performed
 --

 Key: HBASE-6062
 URL: https://issues.apache.org/jira/browse/HBASE-6062
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6062-v0.patch


 preCheckAndPut() and preCheckAndDelete() check for READ when they also want 
 to WRITE... 
 For me, checking for WRITE permission is the right thing. 
 What do you say about that? Keep READ, or replace with WRITE?





[jira] [Updated] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed

2012-05-21 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-6062:
---

Status: Patch Available  (was: Open)

 preCheckAndPut/Delete() checks for READ when also a WRITE is performed
 --

 Key: HBASE-6062
 URL: https://issues.apache.org/jira/browse/HBASE-6062
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.94.0, 0.92.1, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6062-v0.patch


 preCheckAndPut() and preCheckAndDelete() check for READ when they also want 
 to WRITE... 
 For me, checking for WRITE permission is the right thing. 
 What do you say about that? Keep READ, or replace with WRITE?





[jira] [Updated] (HBASE-6044) copytable: remove rs.* parameters

2012-05-21 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-6044:
--

Attachment: hbase-6044-92.patch

Minor tweak for 0.92.

 copytable: remove rs.* parameters
 -

 Key: HBASE-6044
 URL: https://issues.apache.org/jira/browse/HBASE-6044
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-6044-92.patch, hbase-6044-v2.patch, 
 hbase-6044-v3.patch, hbase-6044-v4.patch, hbase-6044.patch


 In discussion of HBASE-6013 it was suggested that we remove these arguments 
 from 0.92+ (but keep in 0.90)





[jira] [Updated] (HBASE-6044) copytable: remove rs.* parameters

2012-05-21 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-6044:
--

   Resolution: Fixed
Fix Version/s: 0.94.1
   0.96.0
   0.92.2
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to 0.92/0.94/0.96-trunk.  Thanks for the review, Stack!

 copytable: remove rs.* parameters
 -

 Key: HBASE-6044
 URL: https://issues.apache.org/jira/browse/HBASE-6044
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: hbase-6044-92.patch, hbase-6044-v2.patch, 
 hbase-6044-v3.patch, hbase-6044-v4.patch, hbase-6044.patch


 In discussion of HBASE-6013 it was suggested that we remove these arguments 
 from 0.92+ (but keep in 0.90)





[jira] [Updated] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5757:
--

   Resolution: Fixed
Fix Version/s: 0.92.2
   0.90.7
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed the 0.92 version to the 0.92/0.90 branches.  Thanks for the review, 
Ted, and thanks for the patches, Jan!

 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
Assignee: Jan Lukavsky
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, 
 HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch


 Prior to HBASE-4196 there was different handling of IOExceptions thrown from 
 the scanner in the mapred and mapreduce APIs. The patch to HBASE-4196 unified 
 this handling so that if an exception is caught, a reconnect is attempted 
 (without bothering the mapred client). After that, HBASE-4269 changed this 
 behavior back, in both the mapred and mapreduce APIs. The question is: is 
 there any reason not to handle all errors that the input format can handle? 
 In other words, why not try to reissue the request after *any* IOException? I 
 see the following disadvantages of the current approach: 
  * the client may see exceptions like LeaseException and 
 ScannerTimeoutException if it fails to process all fetched data within the 
 timeout 
  * to avoid ScannerTimeoutException the client must raise 
 hbase.regionserver.lease.period 
  * a timeout for tasks is already configured in mapred.task.timeout, so this 
 seems a bit redundant, because typically one needs to update both of these 
 parameters 
  * I don't see any possibility of getting rid of LeaseException (this is 
 configured on the server side) 
 I think all of these issues would be gone if the DoNotRetryIOException were 
 not rethrown. -On the other hand, handling errors in the InputFormat has the 
 disadvantage that it may hide some inefficiency from the user. E.g. if I have 
 a very big scanner.caching, and I manage to process only a few rows within 
 the timeout, I will end up with a single row being fetched many times (and 
 will not be explicitly notified about this). Could we solve this problem by 
 adding some counter to the InputFormat?-
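The "reissue the request after any IOException" idea can be sketched as a retry wrapper around the scanner call. This is a standalone illustration (the class name RetryingScanner and the Callable-based shape are assumptions, not TableInputFormat's actual code); production code would likely still exempt truly fatal errors.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class RetryingScanner {
    // Reissue the request after any IOException, up to maxAttempts times,
    // instead of surfacing scanner errors to the mapred client.
    static <T> T withRetries(Callable<T> request, int maxAttempts) throws Exception {
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return request.call();
            } catch (IOException e) {
                last = e; // e.g. a lease/scanner timeout: reopen and retry
            }
        }
        throw last; // give up after maxAttempts, rethrowing the last failure
    }
}
```

The record reader would wrap each next() call this way, reopening the scanner at the last-seen row before retrying; that is what hides lease expirations from the client, at the cost of possibly re-fetching rows.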





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280508#comment-13280508
 ] 

Hadoop QA commented on HBASE-6061:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528475/HBASE-6061-v0.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1947//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1947//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1947//console

This message is automatically generated.

 Fix ACL Admin Table inconsistent permission check
 ---

 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6061-v0.patch, HBASE-6061-v1.patch


 The requirePermission() check for admin operations on a table is currently 
 inconsistent.
 A table owner with CREATE rights (meaning the owner has created that table) 
 can enable/disable and delete the table, but needs ADMIN rights to 
 add/remove/modify a column.





[jira] [Commented] (HBASE-6041) NullPointerException prevents the master from starting up

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280510#comment-13280510
 ] 

Zhihong Yu commented on HBASE-6041:
---

Patch looks good.
Do all tests pass ?

 NullPointerException prevents the master from starting up
 -

 Key: HBASE-6041
 URL: https://issues.apache.org/jira/browse/HBASE-6041
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.90.7

 Attachments: hbase-6041.patch


 This is 0.90 only.
 2012-05-04 14:27:57,913 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:419)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293)
 2012-05-04 14:27:57,914 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
 2012-05-04 14:27:57,915 INFO org.apache.hadoop.ipc.HBaseServer: Stopping 
 server on 1433





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280516#comment-13280516
 ] 

Zhihong Yu commented on HBASE-6061:
---

@Matteo:
Do you mind providing patch for 0.92 / 0.94 ?
The directory structure has changed.

 Fix ACL Admin Table inconsistent permission check
 ---

 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6061-v0.patch, HBASE-6061-v1.patch


 The requirePermission() check for admin operations on a table is currently 
 inconsistent.
 A table owner with CREATE rights (meaning the owner has created that table) 
 can enable/disable and delete the table, but needs ADMIN rights to 
 add/remove/modify a column.





[jira] [Commented] (HBASE-6033) Adding some function to check if a table/region is in compaction

2012-05-21 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280518#comment-13280518
 ] 

Jimmy Xiang commented on HBASE-6033:


Here is the review request:

https://reviews.apache.org/r/5167/

 Adding some function to check if a table/region is in compaction
 ---

 Key: HBASE-6033
 URL: https://issues.apache.org/jira/browse/HBASE-6033
 Project: HBase
  Issue Type: New Feature
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: table_ui.png


 This feature will be helpful to find out if a major compaction is going on.
 We can show if it is in any minor compaction too.





[jira] [Updated] (HBASE-6057) Change some tests categories to optimize build time

2012-05-21 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6057:
--

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Ran the small tests with the patch, works ok. Committed.

 Change some tests categories to optimize build time
 ---

 Key: HBASE-6057
 URL: https://issues.apache.org/jira/browse/HBASE-6057
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6057.v1.patch


 Some tests categorized as small take more than 15s; it's better if they are 
 executed in parallel with the medium tests.
 Some medium tests last less than 2s; it's better to have them executed with 
 the small tests: we save a fork.





[jira] [Updated] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-6061:
---

Attachment: HBASE-6061-0.92.patch

Attached the 0.92 patch, also good for 0.94

 Fix ACL Admin Table inconsistent permission check
 ---

 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6061-0.92.patch, HBASE-6061-v0.patch, 
 HBASE-6061-v1.patch


 The requirePermission() check for admin operations on a table is currently 
 inconsistent.
 A table owner with CREATE rights (meaning the owner has created that table) 
 can enable/disable and delete the table, but needs ADMIN rights to 
 add/remove/modify a column.





[jira] [Commented] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280534#comment-13280534
 ] 

Hadoop QA commented on HBASE-6062:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528497/HBASE-6062-v0.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestMasterObserver
  org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1949//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1949//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1949//console

This message is automatically generated.

 preCheckAndPut/Delete() checks for READ when also a WRITE is performed
 --

 Key: HBASE-6062
 URL: https://issues.apache.org/jira/browse/HBASE-6062
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6062-v0.patch


 preCheckAndPut() and preCheckAndDelete() check for READ when they also want 
 to WRITE... 
 For me, checking for WRITE permission is the right thing. 
 What do you say about that? Keep READ, or replace with WRITE?





[jira] [Commented] (HBASE-6044) copytable: remove rs.* parameters

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280535#comment-13280535
 ] 

Hudson commented on HBASE-6044:
---

Integrated in HBase-TRUNK #2912 (See 
[https://builds.apache.org/job/HBase-TRUNK/2912/])
HBASE-6044 copytable: remove rs.* parameters (Revision 1341200)

 Result = FAILURE
jmhsieh : 
Files : 
* /hbase/trunk/src/docbkx/ops_mgt.xml
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/CopyTable.java


 copytable: remove rs.* parameters
 -

 Key: HBASE-6044
 URL: https://issues.apache.org/jira/browse/HBASE-6044
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: hbase-6044-92.patch, hbase-6044-v2.patch, 
 hbase-6044-v3.patch, hbase-6044-v4.patch, hbase-6044.patch


 In discussion of HBASE-6013 it was suggested that we remove these arguments 
 from 0.92+ (but keep in 0.90)





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280540#comment-13280540
 ] 

Hadoop QA commented on HBASE-6061:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528491/HBASE-6061-v1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1950//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1950//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1950//console

This message is automatically generated.

 Fix ACL Admin Table inconsistent permission check
 ---

 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6061-0.92.patch, HBASE-6061-v0.patch, 
 HBASE-6061-v1.patch


 The requirePermission() check for admin operations on a table is currently 
 inconsistent.
 A table owner with CREATE rights (meaning the owner has created that table) 
 can enable/disable and delete the table, but needs ADMIN rights to 
 add/remove/modify a column.





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280583#comment-13280583
 ] 

Hadoop QA commented on HBASE-6061:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528508/HBASE-6061-0.92.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1951//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1951//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1951//console

This message is automatically generated.

 Fix ACL Admin Table inconsistent permission check
 ---

 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6061-0.92.patch, HBASE-6061-v0.patch, 
 HBASE-6061-v1.patch


 The requirePermission() check for admin operations on a table is currently 
 inconsistent.
 A table owner with CREATE rights (meaning the owner has created that table) 
 can enable/disable and delete the table, but needs ADMIN rights to 
 add/remove/modify a column.





[jira] [Commented] (HBASE-6062) preCheckAndPut/Delete() checks for READ when also a WRITE is performed

2012-05-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280594#comment-13280594
 ] 

Andrew Purtell commented on HBASE-6062:
---

Patch looks good but please make sure TestAccessController includes tests for 
the change. 

 preCheckAndPut/Delete() checks for READ when also a WRITE is performed
 --

 Key: HBASE-6062
 URL: https://issues.apache.org/jira/browse/HBASE-6062
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6062-v0.patch


 preCheckAndPut() and preCheckAndDelete() check for READ when they also want 
 to WRITE... 
 For me, checking for WRITE permission is the right thing. 
 What do you say about that? Keep READ, or replace with WRITE?





[jira] [Commented] (HBASE-6041) NullPointerException prevents the master from starting up

2012-05-21 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280595#comment-13280595
 ] 

Jimmy Xiang commented on HBASE-6041:


Yes, all tests pass. Thanks.

 NullPointerException prevents the master from starting up
 -

 Key: HBASE-6041
 URL: https://issues.apache.org/jira/browse/HBASE-6041
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.90.7

 Attachments: hbase-6041.patch


 This is 0.90 only.
 2012-05-04 14:27:57,913 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:419)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293)
 2012-05-04 14:27:57,914 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
 2012-05-04 14:27:57,915 INFO org.apache.hadoop.ipc.HBaseServer: Stopping 
 server on 1433





[jira] [Commented] (HBASE-6041) NullPointerException prevents the master from starting up

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280600#comment-13280600
 ] 

Zhihong Yu commented on HBASE-6041:
---

Integrated to 0.90 branch.

Thanks for the patch, Jimmy.

 NullPointerException prevents the master from starting up
 -

 Key: HBASE-6041
 URL: https://issues.apache.org/jira/browse/HBASE-6041
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.90.7

 Attachments: hbase-6041.patch


 This is 0.90 only.
 2012-05-04 14:27:57,913 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:419)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293)
 2012-05-04 14:27:57,914 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
 2012-05-04 14:27:57,915 INFO org.apache.hadoop.ipc.HBaseServer: Stopping 
 server on 1433





[jira] [Updated] (HBASE-6043) Add Increment Coalescing in thrift.

2012-05-21 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6043:
-

Attachment: HBASE-6043-1.patch

 Add Increment Coalescing in thrift.
 ---

 Key: HBASE-6043
 URL: https://issues.apache.org/jira/browse/HBASE-6043
 Project: HBase
  Issue Type: Improvement
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6043-0.patch, HBASE-6043-1.patch


 Since the Thrift server uses the client API, reducing the number of RPCs 
 greatly speeds up increments.
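
The coalescing idea can be sketched as follows — a simplified, hypothetical model in Python (the actual patch is Java inside the Thrift server, and all names here are invented for illustration): increments to the same (table, row, column) are summed locally and flushed as one RPC per distinct cell.

```python
from collections import defaultdict

class IncrementCoalescer:
    """Toy model of increment coalescing (hypothetical name): pending
    increments to the same (table, row, column) are summed locally and
    flushed later, one RPC per distinct cell."""

    def __init__(self, send_increment):
        self.pending = defaultdict(int)       # (table, row, column) -> amount
        self.send_increment = send_increment  # stands in for the HBase client call

    def increment(self, table, row, column, amount=1):
        self.pending[(table, row, column)] += amount

    def flush(self):
        # One RPC per distinct cell instead of one RPC per increment.
        rpcs = 0
        for (table, row, column), amount in self.pending.items():
            self.send_increment(table, row, column, amount)
            rpcs += 1
        self.pending.clear()
        return rpcs

sent = []
c = IncrementCoalescer(lambda t, r, q, a: sent.append((t, r, q, a)))
for _ in range(1000):
    c.increment("t1", "row1", "f:ctr")
print(c.flush())  # prints 1: a thousand increments became a single RPC
```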

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers

2012-05-21 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280634#comment-13280634
 ] 

Kannan Muthukkaruppan commented on HBASE-5979:
--

Todd: If we always use positional reads, we don't get the benefit of HDFS 
sending the rest of the HDFS block, correct? So I didn't quite catch your recent 
suggestion. Did you mean: issue positional reads, but explicitly read a much 
larger chunk (in the Scan case) than just the current block?

 Non-pread DFSInputStreams should be associated with scanners, not 
 HFile.Readers
 ---

 Key: HBASE-5979
 URL: https://issues.apache.org/jira/browse/HBASE-5979
 Project: HBase
  Issue Type: Improvement
  Components: performance, regionserver
Reporter: Todd Lipcon

 Currently, every HFile.Reader has a single DFSInputStream, which it uses to 
 service all gets and scans. For gets, we use the positional read API (aka 
 pread) and for scans we use a synchronized block to seek, then read. The 
 advantage of pread is that it doesn't hold any locks, so multiple gets can 
 proceed at the same time. The advantage of seek+read for scans is that the 
 datanode starts to send the entire rest of the HDFS block, rather than just 
 the single hfile block necessary. So, in a single thread, pread is faster for 
 gets, and seek+read is faster for scans since you get a strong pipelining 
 effect.
 However, in a multi-threaded case where there are multiple scans (including 
 scans which are actually part of compactions), the seek+read strategy falls 
 apart, since only one scanner may be reading at a time. Additionally, a large 
 amount of wasted IO is generated on the datanode side, and we get none of the 
 earlier-mentioned advantages.
 In one test, I switched scans to always use pread, and saw a 5x improvement 
 in throughput of the YCSB scan-only workload, since it previously was 
 completely blocked by contention on the DFSIS lock.
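
The contention described above can be illustrated with a toy model (Python for brevity; the names are invented and are not HBase/HDFS APIs): seek+read mutates a shared cursor and therefore needs a lock, while pread carries its offset in the call and touches no shared state.

```python
import threading

class SharedStream:
    """Toy stand-in (invented name) for the single DFSInputStream shared by
    every reader of an HFile: seek+read mutates a shared cursor, so callers
    must serialize on a lock; pread carries its offset in the call."""

    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.lock = threading.Lock()

    def seek_and_read(self, pos, length):
        with self.lock:                    # only one scanner at a time
            self.pos = pos
            chunk = self.data[self.pos:self.pos + length]
            self.pos += len(chunk)
            return chunk

    def pread(self, pos, length):
        # Positional read: no shared state touched, no lock needed,
        # so concurrent gets proceed in parallel.
        return self.data[pos:pos + length]

s = SharedStream(bytes(range(256)))
assert s.pread(10, 4) == bytes([10, 11, 12, 13])   # cursor untouched
assert s.pos == 0
assert s.seek_and_read(10, 4) == bytes([10, 11, 12, 13])
assert s.pos == 14
```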

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5979) Non-pread DFSInputStreams should be associated with scanners, not HFile.Readers

2012-05-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280640#comment-13280640
 ] 

Todd Lipcon commented on HBASE-5979:


Hey Kannan,

Sorry, let me elaborate on that suggestion:

The idea is to make a new FSReader implementation, which only has one API. That 
API would look like the current positional read call (i.e. take a position and 
a length).

Internally, it would have a pool of cached DFSInputStreams, and remember the 
position for each of them. Each of the input streams would be referencing the 
same file. When a read request comes in, it is matched against the pooled 
streams: if it is within N bytes forward from the current position of one of 
the streams, then a seek and read would be issued, synchronized on that stream. 
Otherwise, any random stream would be chosen and a positional read would be 
issued. Separately, we can track the last N positional reads: if we detect a 
sequential pattern in the position reads, we can take one of the pooled input 
streams and seek to the next predicted offset, so that future reads get the 
sequential benefit.
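
A rough sketch of this pooled-reader idea (Python, invented names; a simplification of what an actual Java FSReader implementation would do): each pooled stream remembers its position; a request starting within N bytes ahead of some stream's position is served as seek+read on that stream, anything else falls back to pread. The predictive seeding of streams from recent pread patterns is omitted.

```python
class PooledReader:
    """Toy sketch (invented name) of the proposed FSReader: a pool of
    streams over the same file, each remembering its last position. A read
    starting within max_gap bytes ahead of a pooled stream's cursor reuses
    that stream (seek+read); anything else becomes a positional read."""

    def __init__(self, data, pool_size=3, max_gap=64 * 1024):
        self.data = data
        self.max_gap = max_gap
        self.positions = [0] * pool_size   # remembered cursor per pooled stream

    def read(self, pos, length):
        for i, p in enumerate(self.positions):
            if 0 <= pos - p <= self.max_gap:
                # Looks sequential: reuse this stream and advance its cursor.
                self.positions[i] = pos + length
                return self.data[pos:pos + length], "seek+read"
        # Looks random: serve it as pread, leaving every cursor in place.
        return self.data[pos:pos + length], "pread"

r = PooledReader(bytes(1 << 20))
assert r.read(0, 100)[1] == "seek+read"       # matches a stream at offset 0
assert r.read(100, 100)[1] == "seek+read"     # continues the same stream
assert r.read(900_000, 100)[1] == "pread"     # far ahead of every cursor
```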

 Non-pread DFSInputStreams should be associated with scanners, not 
 HFile.Readers
 ---

 Key: HBASE-5979
 URL: https://issues.apache.org/jira/browse/HBASE-5979
 Project: HBase
  Issue Type: Improvement
  Components: performance, regionserver
Reporter: Todd Lipcon

 Currently, every HFile.Reader has a single DFSInputStream, which it uses to 
 service all gets and scans. For gets, we use the positional read API (aka 
 pread) and for scans we use a synchronized block to seek, then read. The 
 advantage of pread is that it doesn't hold any locks, so multiple gets can 
 proceed at the same time. The advantage of seek+read for scans is that the 
 datanode starts to send the entire rest of the HDFS block, rather than just 
 the single hfile block necessary. So, in a single thread, pread is faster for 
 gets, and seek+read is faster for scans since you get a strong pipelining 
 effect.
 However, in a multi-threaded case where there are multiple scans (including 
 scans which are actually part of compactions), the seek+read strategy falls 
 apart, since only one scanner may be reading at a time. Additionally, a large 
 amount of wasted IO is generated on the datanode side, and we get none of the 
 earlier-mentioned advantages.
 In one test, I switched scans to always use pread, and saw a 5x improvement 
 in throughput of the YCSB scan-only workload, since it previously was 
 completely blocked by contention on the DFSIS lock.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-4686) [89-fb] Fix per-store metrics aggregation

2012-05-21 Thread Mikhail Bautin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin resolved HBASE-4686.
---

Resolution: Fixed

This has already been committed to trunk.

 [89-fb] Fix per-store metrics aggregation 
 --

 Key: HBASE-4686
 URL: https://issues.apache.org/jira/browse/HBASE-4686
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D87.1.patch, D87.2.patch, D87.3.patch, D87.4.patch, 
 HBASE-4686-TestRegionServerMetics-and-Store-metric-a-20111027134023-cc718144.patch,
  
 HBASE-4686-jira-89-fb-Fix-per-store-metrics-aggregat-20111027152723-05bea421.patch


 In r1182034 per-Store metrics were broken, because the aggregation of 
 StoreFile metrics over all stores in a region was replaced by overwriting them 
 on every update. We saw these metrics drop by a factor of numRegions on a 
 production cluster -- thanks to Kannan for noticing this!  We need to fix the 
 metrics and add a unit test to ensure regressions like this don't happen in 
 the future.
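
The difference between aggregating and overwriting can be shown with a minimal sketch (Python for brevity; the metric name and function names are hypothetical):

```python
def aggregate_store_metrics(stores):
    """Intended behavior: sum StoreFile metrics across all stores in a region."""
    totals = {}
    for store in stores:
        for name, value in store.items():
            totals[name] = totals.get(name, 0) + value
    return totals

def overwrite_store_metrics(stores):
    """The regression: each store overwrites the previous value, so the final
    number is only the last store's share of the true total."""
    totals = {}
    for store in stores:
        for name, value in store.items():
            totals[name] = value
    return totals

stores = [{"storefileSizeMB": 100}, {"storefileSizeMB": 100}, {"storefileSizeMB": 100}]
assert aggregate_store_metrics(stores)["storefileSizeMB"] == 300
assert overwrite_store_metrics(stores)["storefileSizeMB"] == 100  # 1/3 of the truth
```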

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6043) Add Increment Coalescing in thrift.

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280643#comment-13280643
 ] 

Hadoop QA commented on HBASE-6043:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528531/HBASE-6043-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 35 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1952//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1952//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1952//console

This message is automatically generated.

 Add Increment Coalescing in thrift.
 ---

 Key: HBASE-6043
 URL: https://issues.apache.org/jira/browse/HBASE-6043
 Project: HBase
  Issue Type: Improvement
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6043-0.patch, HBASE-6043-1.patch, 
 HBASE-6043-2.patch


 Since the Thrift server uses the client API, reducing the number of RPCs 
 greatly speeds up increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6063) Replication related failures on trunk after HBASE-5453

2012-05-21 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-6063:
--

Status: Patch Available  (was: Open)

 Replication related failures on trunk after HBASE-5453
 --

 Key: HBASE-6063
 URL: https://issues.apache.org/jira/browse/HBASE-6063
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-6063.patch


 HBASE-5453 added this line:
 {code}
 return ClusterId.parseFrom(data).toString();
 {code}
 in function:
 public static String readClusterIdZNode(ZooKeeperWatcher watcher)
 but ClusterId does not implement toString(), so you get log messages like:
 2012-05-21 16:46:31,256 ERROR 
 [RegionServer:0;cloudera-vm,60456,1337643971995-EventThread] 
 zookeeper.ClientCnxn$EventThread(523): Error while calling watcher 
 java.lang.IllegalArgumentException: Invalid UUID string: 
 org.apache.hadoop.hbase.ClusterId@5563d208
   at java.util.UUID.fromString(UUID.java:204)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.init(ReplicationSource.java:192)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:328)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:206)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$PeersWatcher.nodeChildrenChanged(ReplicationSourceManager.java:505)
   at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:300)
   at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 2012-05-21 16:46:31,256 ERROR 
 [RegionServer:0;cloudera-vm,50926,1337643981835-EventThread] 
 zookeeper.ClientCnxn$EventThread(523): Error while calling watcher 
 and replication fails because the ClusterId does not match what is expected.  
 Patch coming soon.
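
The failure mode is the default Object.toString() output ('ClassName@hex') being handed to UUID.fromString(). A Python analogue — assuming nothing beyond the log above; the default repr plays the role of Java's default toString():

```python
import uuid

class ClusterId:
    """Python analogue of the bug: without a proper string conversion, the
    default repr (like Java's default Object.toString()) yields
    'ClassName@...'-style text that UUID parsing rejects."""
    def __init__(self, uid):
        self.uid = uid

cid = ClusterId(uuid.uuid4())
bad = repr(cid)          # e.g. '<__main__.ClusterId object at 0x7f...>'
try:
    uuid.UUID(bad)
    parsed = True
except ValueError:       # mirrors java.lang.IllegalArgumentException
    parsed = False
assert parsed is False

good = str(cid.uid)      # what a real toString()/serialization should emit
assert uuid.UUID(good) == cid.uid
```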

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6044) copytable: remove rs.* parameters

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280649#comment-13280649
 ] 

Hudson commented on HBASE-6044:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/])
HBASE-6044 copytable: remove rs.* parameters (Revision 1341200)

 Result = FAILURE
jmhsieh : 
Files : 
* /hbase/trunk/src/docbkx/ops_mgt.xml
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/CopyTable.java


 copytable: remove rs.* parameters
 -

 Key: HBASE-6044
 URL: https://issues.apache.org/jira/browse/HBASE-6044
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: hbase-6044-92.patch, hbase-6044-v2.patch, 
 hbase-6044-v3.patch, hbase-6044-v4.patch, hbase-6044.patch


 In discussion of HBASE-6013 it was suggested that we remove these arguments 
 from 0.92+ (but keep them in 0.90).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6057) Change some tests categories to optimize build time

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280651#comment-13280651
 ] 

Hudson commented on HBASE-6057:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/])
HBASE-6057  Change some tests categories to optimize build time (nkeywal 
via JD) (Revision 1341211)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/encoding/TestBufferedDataBlockEncoder.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/encoding/TestEncodedSeekers.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestChecksum.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestFixedFileTrailer.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockCompatibility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestLruBlockCache.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/ipc/TestPBOnWritableRpc.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestClockSkewDetection.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDefaultLoadBalancer.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsMBeanBase.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/monitoring/TestMemoryBoundedLogMessageBuffer.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/monitoring/TestTaskMonitor.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestGetClosestAtOrBefore.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestPoolMap.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestHQuorumPeer.java


 Change some tests categories to optimize build time
 ---

 Key: HBASE-6057
 URL: https://issues.apache.org/jira/browse/HBASE-6057
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6057.v1.patch


 Some tests categorized as small take more than 15s: it's better if they are 
 executed in parallel with the medium tests.
 Some medium tests last less than 2s: it's better to have them executed with 
 the small tests: we save a fork.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5757) TableInputFormat should handle as many errors as possible

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280650#comment-13280650
 ] 

Hudson commented on HBASE-5757:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/])
HBASE-5757 TableInputFormat should handle as many errors as possible (Jan 
Lukavsky) (Revision 1341132)

 Result = FAILURE
jmhsieh : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapred/TableRecordReaderImpl.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java


 TableInputFormat should handle as many errors as possible
 -

 Key: HBASE-5757
 URL: https://issues.apache.org/jira/browse/HBASE-5757
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.6
Reporter: Jan Lukavsky
Assignee: Jan Lukavsky
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: 5757-trunk-v2.txt, HBASE-5757-trunk-r1341041.patch, 
 HBASE-5757.patch, HBASE-5757.patch, hbase-5757-92.patch


 Prior to HBASE-4196 there was different handling of IOExceptions thrown from 
 scanner in mapred and mapreduce API. The patch to HBASE-4196 unified this 
 handling so that if exception is caught a reconnect is attempted (without 
 bothering the mapred client). After that, HBASE-4269 changed this behavior 
 back, but in both mapred and mapreduce APIs. The question is, is there any 
 reason not to handle all errors that the input format can handle? In other 
 words, why not try to reissue the request after *any* IOException? I see the 
 following disadvantages of current approach
  * the client may see exceptions like LeaseException and 
 ScannerTimeoutException if he fails to process all fetched data in timeout
  * to avoid ScannerTimeoutException the client must raise 
 hbase.regionserver.lease.period
  * timeouts for tasks are already configured in mapred.task.timeout, so this 
 seems to me a bit redundant, because typically one needs to update both these 
 parameters
  * I don't see any possibility to get rid of LeaseException (this is 
 configured on server side)
 I think all of these issues would be gone, if the DoNotRetryIOException would 
 not be rethrown. -On the other hand, handling errors in InputFormat has 
 disadvantage, that it may hide from the user some inefficiency. Eg. if I have 
 very big scanner.caching, and I manage to process only a few rows in timeout, 
 I will end up with single row being fetched many times (and will not be 
 explicitly notified about this). Could we solve this problem by adding some 
 counter to the InputFormat?-
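
The approach committed here — restart the scan just after the last successfully returned row instead of surfacing the exception — can be sketched as follows (Python, invented names; the real Java code in TableRecordReaderImpl additionally rethrows DoNotRetryIOException and bounds retries):

```python
class RetryingRecordReader:
    """Sketch (invented name) of restart-after-last-good-row: on a scanner
    IOError, reopen the scan just past the last row already returned instead
    of failing the map task."""

    def __init__(self, open_scanner, start_row=""):
        self.open_scanner = open_scanner   # callable: start_row -> row iterator
        self.last_row = start_row
        self.scanner = open_scanner(start_row)

    def next_row(self):
        try:
            row = next(self.scanner)
        except IOError:
            # Transparent single retry from the last successfully read row.
            self.scanner = self.open_scanner(self.last_row)
            row = next(self.scanner)
        except StopIteration:
            return None
        self.last_row = row
        return row

rows = ["a", "b", "c", "d"]
state = {"failed": False}

def open_scanner(start):
    def gen():
        for r in rows:
            if r <= start:
                continue                  # resume strictly after start row
            if r == "c" and not state["failed"]:
                state["failed"] = True
                raise IOError("simulated scanner lease expiry")
            yield r
    return gen()

reader = RetryingRecordReader(open_scanner)
out = []
while (row := reader.next_row()) is not None:
    out.append(row)
assert out == ["a", "b", "c", "d"]   # the mid-scan failure is invisible
```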

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5882) Prcoess RIT on master restart can try assigning the region if the region is found on a dead server instead of waiting for Timeout Monitor

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280653#comment-13280653
 ] 

Hudson commented on HBASE-5882:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/])
HBASE-5882 Prcoess RIT on master restart can try assigning the region if 
the region is found on a dead server instead of waiting for Timeout Monitor 
(Ashutosh) (Revision 1341110)

 Result = FAILURE
ramkrishna : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java


 Prcoess RIT on master restart can try assigning the region if the region is 
 found on a dead server instead of waiting for Timeout Monitor
 -

 Key: HBASE-5882
 URL: https://issues.apache.org/jira/browse/HBASE-5882
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: ramkrishna.s.vasudevan
Assignee: Ashutosh Jindal
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5882_v5.patch, HBASE-5882_v6.patch, 
 hbase_5882.patch, hbase_5882_V2.patch, hbase_5882_V3.patch, 
 hbase_5882_V4.patch


 Currently on master restart, if processRIT finds a region on a dead server, it 
 avoids a new assignment so that the timeout monitor can take care of it.
 This case is more prominent if the node is found in RS_ZK_REGION_OPENING 
 state. I think we can handle this by triggering a new assignment with a new 
 plan.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280652#comment-13280652
 ] 

Hudson commented on HBASE-6061:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #13 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/13/])
HBASE-6061 Fix ACL Admin Table inconsistent permission check (Matteo 
Bertozzi) (Revision 1341265)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java


 Fix ACL Admin Table inconsistent permission check
 ---

 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6061-0.92.patch, HBASE-6061-v0.patch, 
 HBASE-6061-v1.patch


 The requirePermission() check for admin operations on a table is currently 
 inconsistent.
 A table owner with CREATE rights (meaning the owner created that table) 
 can enable/disable and delete the table, but needs ADMIN rights to 
 add/remove/modify a column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6064) Add timestamp to Mutation Thrift API

2012-05-21 Thread Mikhail Bautin (JIRA)
Mikhail Bautin created HBASE-6064:
-

 Summary: Add timestamp to Mutation Thrift API
 Key: HBASE-6064
 URL: https://issues.apache.org/jira/browse/HBASE-6064
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin


We need to be able to specify per-mutation timestamps in the HBase Thrift API. 
If the timestamp is not specified, the timestamp passed to the Thrift API 
method itself (mutateRowTs/mutateRowsTs) should be used.
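
The proposed fallback can be stated as a one-line rule (Python sketch; the helper name is hypothetical, not part of the Thrift API):

```python
def effective_timestamp(mutation_ts, call_ts):
    """Hypothetical helper: a per-mutation timestamp, when present, overrides
    the timestamp passed to mutateRowTs/mutateRowsTs; None models 'not set'."""
    return call_ts if mutation_ts is None else mutation_ts

assert effective_timestamp(None, 1337000000000) == 1337000000000
assert effective_timestamp(1337000000042, 1337000000000) == 1337000000042
```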

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6043) Add Increment Coalescing in thrift.

2012-05-21 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280657#comment-13280657
 ] 

Elliott Clark commented on HBASE-6043:
--

Looks like those tests are failing on trunk right now.

 Add Increment Coalescing in thrift.
 ---

 Key: HBASE-6043
 URL: https://issues.apache.org/jira/browse/HBASE-6043
 Project: HBase
  Issue Type: Improvement
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6043-0.patch, HBASE-6043-1.patch, 
 HBASE-6043-2.patch


 Since the Thrift server uses the client API, reducing the number of RPCs 
 greatly speeds up increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6063) Replication related failures on trunk after HBASE-5453

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280673#comment-13280673
 ] 

Hadoop QA commented on HBASE-6063:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528539/HBASE-6063.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1954//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1954//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1954//console

This message is automatically generated.

 Replication related failures on trunk after HBASE-5453
 --

 Key: HBASE-6063
 URL: https://issues.apache.org/jira/browse/HBASE-6063
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-6063.patch


 HBASE-5453 added this line:
 {code}
 return ClusterId.parseFrom(data).toString();
 {code}
 in function:
 public static String readClusterIdZNode(ZooKeeperWatcher watcher)
 but ClusterId does not implement toString(), so you get log messages like:
 2012-05-21 16:46:31,256 ERROR 
 [RegionServer:0;cloudera-vm,60456,1337643971995-EventThread] 
 zookeeper.ClientCnxn$EventThread(523): Error while calling watcher 
 java.lang.IllegalArgumentException: Invalid UUID string: 
 org.apache.hadoop.hbase.ClusterId@5563d208
   at java.util.UUID.fromString(UUID.java:204)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.init(ReplicationSource.java:192)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:328)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:206)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$PeersWatcher.nodeChildrenChanged(ReplicationSourceManager.java:505)
   at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:300)
   at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 2012-05-21 16:46:31,256 ERROR 
 [RegionServer:0;cloudera-vm,50926,1337643981835-EventThread] 
 zookeeper.ClientCnxn$EventThread(523): Error while calling watcher 
 and replication fails because the ClusterId does not match what is expected.  
 Patch coming soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6043) Add Increment Coalescing in thrift.

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280676#comment-13280676
 ] 

Hadoop QA commented on HBASE-6043:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528534/HBASE-6043-2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestMasterReplication
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
  org.apache.hadoop.hbase.replication.TestReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1953//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1953//console

This message is automatically generated.

 Add Increment Coalescing in thrift.
 ---

 Key: HBASE-6043
 URL: https://issues.apache.org/jira/browse/HBASE-6043
 Project: HBase
  Issue Type: Improvement
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6043-0.patch, HBASE-6043-1.patch, 
 HBASE-6043-2.patch


 Since the Thrift server uses the client API, reducing the number of RPCs 
 greatly speeds up increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280680#comment-13280680
 ] 

Hudson commented on HBASE-6061:
---

Integrated in HBase-TRUNK #2914 (See 
[https://builds.apache.org/job/HBase-TRUNK/2914/])
HBASE-6061 Fix ACL Admin Table inconsistent permission check (Matteo 
Bertozzi) (Revision 1341265)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java


 Fix ACL Admin Table inconsistent permission check
 ---

 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6061-0.92.patch, HBASE-6061-v0.patch, 
 HBASE-6061-v1.patch


 the requirePermission() check for admin operation on a table is currently 
 inconsistent.
 Table Owner with CREATE rights (that means, the owner has created that table) 
 can enable/disable and delete the table but needs ADMIN rights to 
 add/remove/modify a column.





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280681#comment-13280681
 ] 

Hudson commented on HBASE-6061:
---

Integrated in HBase-0.94 #207 (See 
[https://builds.apache.org/job/HBase-0.94/207/])
HBASE-6061 Fix ACL Admin Table inconsistent permission check (Matteo 
Bertozzi) (Revision 1341267)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java


 Fix ACL Admin Table inconsistent permission check
 ---

 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6061-0.92.patch, HBASE-6061-v0.patch, 
 HBASE-6061-v1.patch


 the requirePermission() check for admin operation on a table is currently 
 inconsistent.
 Table Owner with CREATE rights (that means, the owner has created that table) 
 can enable/disable and delete the table but needs ADMIN rights to 
 add/remove/modify a column.





[jira] [Commented] (HBASE-6061) Fix ACL Admin Table inconsistent permission check

2012-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280694#comment-13280694
 ] 

Hudson commented on HBASE-6061:
---

Integrated in HBase-0.92 #416 (See 
[https://builds.apache.org/job/HBase-0.92/416/])
HBASE-6061 Fix ACL Admin Table inconsistent permission check (Matteo 
Bertozzi) (Revision 1341268)

 Result = FAILURE
tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/security/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java


 Fix ACL Admin Table inconsistent permission check
 ---

 Key: HBASE-6061
 URL: https://issues.apache.org/jira/browse/HBASE-6061
 Project: HBase
  Issue Type: Sub-task
  Components: security
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
  Labels: acl, security
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6061-0.92.patch, HBASE-6061-v0.patch, 
 HBASE-6061-v1.patch


 the requirePermission() check for admin operation on a table is currently 
 inconsistent.
 Table Owner with CREATE rights (that means, the owner has created that table) 
 can enable/disable and delete the table but needs ADMIN rights to 
 add/remove/modify a column.





[jira] [Updated] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss

2012-05-21 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6065:


Component/s: wal
   Assignee: chunhui shen
Summary: Log for flush would append a non-sequential edit in the hlog, 
may cause data loss  (was: Log for flush would append a non-sequential edit in 
the hlog, may cause data los)

 Log for flush would append a non-sequential edit in the hlog, may cause data 
 loss
 -

 Key: HBASE-6065
 URL: https://issues.apache.org/jira/browse/HBASE-6065
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen







[jira] [Created] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data los

2012-05-21 Thread chunhui shen (JIRA)
chunhui shen created HBASE-6065:
---

 Summary: Log for flush would append a non-sequential edit in the 
hlog, may cause data los
 Key: HBASE-6065
 URL: https://issues.apache.org/jira/browse/HBASE-6065
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen








[jira] [Updated] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss

2012-05-21 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6065:


Attachment: HBASE-6065.patch

In the patch, I call obtainSeqNum() for the flush log edit rather than using the 
seqId from the parameter.
So we can ensure the log seq ids in the file are always sequential.
BTW, do we use the flush log edit anywhere?

There is another solution: name the split log file after the real max seq id 
rather than the last edit's seq id.
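The monotonicity point above can be sketched in a few lines. This is a hypothetical simplification, not the patch code: the WalSketch class and its method names are illustrative, and only obtainSeqNum mirrors a real HLog method.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the fix idea: the flush-complete marker obtains a fresh sequence
// number at append time instead of reusing the one captured when the flush
// started, so WAL entries stay monotonic. WalSketch is hypothetical.
class WalSketch {
    private final AtomicLong logSeqNum = new AtomicLong(0);

    long obtainSeqNum() {                // mirrors HLog#obtainSeqNum
        return logSeqNum.incrementAndGet();
    }

    // Before the patch: completeCacheFlush appended the seq id captured at
    // startCacheFlush, which can be lower than edits written during the flush.
    long flushMarkerSeqOld(long seqIdFromStartCacheFlush) {
        return seqIdFromStartCacheFlush;
    }

    // With the patch: obtain a new seq id when the marker is appended.
    long flushMarkerSeqNew() {
        return obtainSeqNum();
    }
}
```

With the sequence from the scenario in this issue (two puts, flush start, a put during the flush), the old marker lands below the last edit while the new one lands above it.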

 Log for flush would append a non-sequential edit in the hlog, may cause data 
 loss
 -

 Key: HBASE-6065
 URL: https://issues.apache.org/jira/browse/HBASE-6065
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6065.patch


 After completing flush region, we will append a log edit in the hlog file 
 through HLog#completeCacheFlush.
 {code}
 public void completeCacheFlush(final byte [] encodedRegionName,
   final byte [] tableName, final long logSeqId, final boolean 
 isMetaRegion)
 {
 ...
 HLogKey key = makeKey(encodedRegionName, tableName, logSeqId,
 System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID);
 ...
 }
 {code}
 When we make the hlog key, we use the seqId from the parameter, which is 
 generated by HLog#startCacheFlush.
 Here, we may append an edit with a lower seq id than the last edit in the 
 hlog file. If it is the last edit in the file, it may cause data loss, 
 because:
 {code}
 HRegion#replayRecoveredEditsIfAny {
 ...
 maxSeqId = Math.abs(Long.parseLong(fileName));
 if (maxSeqId <= minSeqId) {
   String msg = "Maximum sequenceid for this log is " + maxSeqId
       + " and minimum sequenceid for the region is " + minSeqId
       + ", skipped the whole file, path=" + edits;
   LOG.debug(msg);
   continue;
 }
 ...
 }
 {code}
 We may skip the split log file, because we use the last edit's seq id as 
 its file name and consider that seqId the max seq id in the log file.





[jira] [Updated] (HBASE-6033) Adding some fuction to check if a table/region is in compaction

2012-05-21 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6033:
---

Status: Patch Available  (was: Open)

 Adding some fuction to check if a table/region is in compaction
 ---

 Key: HBASE-6033
 URL: https://issues.apache.org/jira/browse/HBASE-6033
 Project: HBase
  Issue Type: New Feature
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-6033_v2.patch, table_ui.png


 This feature will be helpful to find out if a major compaction is going on.
 We can show if it is in any minor compaction too.





[jira] [Updated] (HBASE-6033) Adding some fuction to check if a table/region is in compaction

2012-05-21 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6033:
---

Attachment: hbase-6033_v2.patch

 Adding some fuction to check if a table/region is in compaction
 ---

 Key: HBASE-6033
 URL: https://issues.apache.org/jira/browse/HBASE-6033
 Project: HBase
  Issue Type: New Feature
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-6033_v2.patch, table_ui.png


 This feature will be helpful to find out if a major compaction is going on.
 We can show if it is in any minor compaction too.





[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-21 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280704#comment-13280704
 ] 

chunhui shen commented on HBASE-6059:
-

bq.If majorCompaction is false, we still need to check !kvs.isEmpty(), right?
Yes, I think this concerns only major compaction; minor compaction retains the 
delete markers, so there is no problem there.
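The replay bug discussed here can be illustrated with a toy model. This is a hypothetical sketch, not the patch code: ReplaySketch and both method names are invented, and the map simply stands in for the per-store maximum sequence ids.

```java
import java.util.Map;

// Toy model of the replay decision. maxStoreSeqId maps family -> largest seq
// id already persisted in that family's store files. Using the minimum across
// families (the buggy behaviour) replays edits that a flushed and compacted
// family has already seen, resurrecting a put whose delete was compacted away.
class ReplaySketch {
    // Buggy variant: compare against the minimum store seq id of the region.
    static boolean replayedWithMin(Map<String, Long> maxStoreSeqId, long editSeq) {
        long min = maxStoreSeqId.values().stream()
                .mapToLong(Long::longValue).min().orElse(0L);
        return editSeq > min;
    }

    // Per-family variant: compare against the edit's own family.
    static boolean replayedPerFamily(Map<String, Long> maxStoreSeqId,
                                     String family, long editSeq) {
        return editSeq > maxStoreSeqId.getOrDefault(family, 0L);
    }
}
```

In the scenario from this issue, cf2 has no store files (seq id 0), so the region-wide minimum is 0 and the old put on cf1 is replayed even though cf1 has long since flushed past it.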


 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch


 When we replay recovered edits, we use the minSeqId of the Store, which may 
 cause deleted data to appear again.
 Let's see how it happens. Suppose the region with two families(cf1,cf2)
 1.put one data to the region (put r1,cf1:q1,v1)
 2.move the region from server A to server B.
 3.delete the data put by step 1(delete r1)
 4.flush this region.
 5.make major compaction for this region
 6.move the region from server B to server A.
 7.Abort server A
 8.After the region is online, we could get the deleted data(r1,cf1:q1,v1)
 (When we replay recovered edits, we used the minSeqId of Store, because cf2 
 has no store files, so its seqId is 0, so the edit log of put data will be 
 replayed to the region)





[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280706#comment-13280706
 ] 

Zhihong Yu commented on HBASE-6055:
---

The design document is very good.
Will get back to reviewing HBASE-5547 first.

 Snapshots in HBase 0.96
 ---

 Key: HBASE-6055
 URL: https://issues.apache.org/jira/browse/HBASE-6055
 Project: HBase
  Issue Type: New Feature
  Components: client, master, regionserver, zookeeper
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.96.0

 Attachments: Snapshots in HBase.docx


 Continuation of HBASE-50 for the current trunk. Since the implementation has 
 drastically changed, opening as a new ticket.





[jira] [Commented] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss

2012-05-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280707#comment-13280707
 ] 

ramkrishna.s.vasudevan commented on HBASE-6065:
---

So this applies only to 0.94 and above, right?

 Log for flush would append a non-sequential edit in the hlog, may cause data 
 loss
 -

 Key: HBASE-6065
 URL: https://issues.apache.org/jira/browse/HBASE-6065
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6065.patch


 After completing flush region, we will append a log edit in the hlog file 
 through HLog#completeCacheFlush.
 {code}
 public void completeCacheFlush(final byte [] encodedRegionName,
   final byte [] tableName, final long logSeqId, final boolean 
 isMetaRegion)
 {
 ...
 HLogKey key = makeKey(encodedRegionName, tableName, logSeqId,
 System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID);
 ...
 }
 {code}
 When we make the hlog key, we use the seqId from the parameter, which is 
 generated by HLog#startCacheFlush.
 Here, we may append an edit with a lower seq id than the last edit in the 
 hlog file. If it is the last edit in the file, it may cause data loss, 
 because:
 {code}
 HRegion#replayRecoveredEditsIfAny {
 ...
 maxSeqId = Math.abs(Long.parseLong(fileName));
 if (maxSeqId <= minSeqId) {
   String msg = "Maximum sequenceid for this log is " + maxSeqId
       + " and minimum sequenceid for the region is " + minSeqId
       + ", skipped the whole file, path=" + edits;
   LOG.debug(msg);
   continue;
 }
 ...
 }
 {code}
 We may skip the split log file, because we use the last edit's seq id as 
 its file name and consider that seqId the max seq id in the log file.





[jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect

2012-05-21 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280709#comment-13280709
 ] 

Anoop Sam John commented on HBASE-5974:
---

Thanks for the review Todd
{quote}
why do we need the new RegionScannerWithCookie class? why not add the cookie to 
RegionScanner itself? 
{quote}
I was also thinking along those lines initially. There are two reasons why I 
avoided doing the seqNo work within the RegionScanner:
1. In case of caching > 1 there will be more than one call to 
RegionScanner.next(). Passing the client-sent seqNo (I am avoiding "cookie" as 
I agree with you about renaming it) to the RegionScanner would change the 
interface, and this interface is exposed.
2. This is the main reason. Through the CP usage we have exposed the 
RegionScanner, and via the preScannerOpen() and postScannerOpen() impls a user 
can now return his own RegionScanner impl. If we put the seqNo maintenance and 
check logic in RegionScanner, every such user would have to worry about it. I 
feel this should be handled by HBase core code. What do you say?

{quote}
this isn't currently compatible with 0.94, since a new client wouldn't be able 
to scan an old server.
{quote}
Agree.. I can fix this
{quote}
let's rename cookie to callSequenceNumber 
{quote}
Already agreed.. :) 
{quote}
In the test, I think you should use HRegionInterface directly, so you don't 
have to actually generate an RPC timeout.
{quote}
I thought of it as an end-to-end fault-tolerance case. Yes, as you said, I can 
write the other one as well. What is your recommendation? Should I change it?
{quote}
 As is, I think it's also not guaranteed to trigger the issue unless you set 
scanner caching to 1, right? 
{quote}
Maybe in that case I can explicitly set caching=1 for this test case. I can do 
that.
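The guard being discussed (a per-scanner call sequence number that rejects retried next() calls instead of silently skipping a batch) can be sketched as follows. The class and field names are illustrative, not the actual patch code.

```java
import java.io.IOException;

// Sketch of the retry guard: each client next() call carries a call sequence
// number. A retried call whose batch the server already served arrives with a
// stale number and is rejected, so the client can reopen the scanner at the
// right position rather than losing the in-flight batch. Names are hypothetical.
class ScannerHolderSketch {
    private long nextCallSeq = 0;
    private int servedBatches = 0;

    int next(long callSeq) throws IOException {
        if (callSeq != nextCallSeq) {
            throw new IOException("Expected call seq " + nextCallSeq
                + " but got " + callSeq);
        }
        nextCallSeq++;
        return ++servedBatches; // stand-in for returning a batch of rows
    }
}
```

A retry of an already-served call now fails loudly instead of returning the next batch and skipping rows.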

 Scanner retry behavior with RPC timeout on next() seems incorrect
 -

 Key: HBASE-5974
 URL: https://issues.apache.org/jira/browse/HBASE-5974
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
Reporter: Todd Lipcon
Priority: Critical
 Attachments: HBASE-5974_0.94.patch


 I'm seeing the following behavior:
 - set RPC timeout to a short value
 - call next() for some batch of rows, big enough so the client times out 
 before the result is returned
 - the HConnectionManager stuff will retry the next() call to the same server. 
 At this point, one of two things can happen: 1) the previous next() call will 
 still be processing, in which case you get a LeaseException, because it was 
 removed from the map during the processing, or 2) the next() call will 
 succeed but skip the prior batch of rows.





[jira] [Commented] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss

2012-05-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280717#comment-13280717
 ] 

ramkrishna.s.vasudevan commented on HBASE-6065:
---

@Chunhui
What type of data loss do you see here? Is it the edit with HBASE::CACHEFLUSH 
that gets missed?
Ideally, by design, that edit is needed to show up to what point the flush has 
been done, and it is added as an entry in the HLog.
Even while recovering we tend to skip this entry:
{code}
// Check this edit is for me. Also, guard against writing the special
// METACOLUMN info such as HBASE::CACHEFLUSH entries
if (kv.matchingFamily(HLog.METAFAMILY) ||
    !Bytes.equals(key.getEncodedRegionName(),
        this.regionInfo.getEncodedNameAsBytes())) {
  skippedEdits++;
  continue;
}
{code}
Did you find any other type of data loss that I am not able to foresee here? 
Correct me if I am wrong.

 Log for flush would append a non-sequential edit in the hlog, may cause data 
 loss
 -

 Key: HBASE-6065
 URL: https://issues.apache.org/jira/browse/HBASE-6065
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6065.patch


 After completing flush region, we will append a log edit in the hlog file 
 through HLog#completeCacheFlush.
 {code}
 public void completeCacheFlush(final byte [] encodedRegionName,
   final byte [] tableName, final long logSeqId, final boolean 
 isMetaRegion)
 {
 ...
 HLogKey key = makeKey(encodedRegionName, tableName, logSeqId,
 System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID);
 ...
 }
 {code}
 When we make the hlog key, we use the seqId from the parameter, which is 
 generated by HLog#startCacheFlush.
 Here, we may append an edit with a lower seq id than the last edit in the 
 hlog file. If it is the last edit in the file, it may cause data loss, 
 because:
 {code}
 HRegion#replayRecoveredEditsIfAny {
 ...
 maxSeqId = Math.abs(Long.parseLong(fileName));
 if (maxSeqId <= minSeqId) {
   String msg = "Maximum sequenceid for this log is " + maxSeqId
       + " and minimum sequenceid for the region is " + minSeqId
       + ", skipped the whole file, path=" + edits;
   LOG.debug(msg);
   continue;
 }
 ...
 }
 {code}
 We may skip the split log file, because we use the last edit's seq id as 
 its file name and consider that seqId the max seq id in the log file.





[jira] [Commented] (HBASE-6059) Replaying recovered edits would make deleted data exist again

2012-05-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280718#comment-13280718
 ] 

ramkrishna.s.vasudevan commented on HBASE-6059:
---

I think only major compaction could lead us to this problem, since it is what 
actually removes the delete markers.
In case of TTL expiry of all the entries in a store file, can we have this 
scenario of an empty StoreFile getting created on minor or major compaction? I 
think creating an empty store file should be fine. Let's take others' input on 
this too.

 Replaying recovered edits would make deleted data exist again
 -

 Key: HBASE-6059
 URL: https://issues.apache.org/jira/browse/HBASE-6059
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6059-testcase.patch, HBASE-6059.patch


 When we replay recovered edits, we use the minSeqId of the Store, which may 
 cause deleted data to appear again.
 Let's see how it happens. Suppose the region with two families(cf1,cf2)
 1.put one data to the region (put r1,cf1:q1,v1)
 2.move the region from server A to server B.
 3.delete the data put by step 1(delete r1)
 4.flush this region.
 5.make major compaction for this region
 6.move the region from server B to server A.
 7.Abort server A
 8.After the region is online, we could get the deleted data(r1,cf1:q1,v1)
 (When we replay recovered edits, we used the minSeqId of Store, because cf2 
 has no store files, so its seqId is 0, so the edit log of put data will be 
 replayed to the region)





[jira] [Commented] (HBASE-6033) Adding some fuction to check if a table/region is in compaction

2012-05-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280719#comment-13280719
 ] 

Hadoop QA commented on HBASE-6033:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528550/hbase-6033_v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 33 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestCompactionState
  org.apache.hadoop.hbase.replication.TestReplication
  
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
  org.apache.hadoop.hbase.replication.TestMasterReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1955//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1955//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1955//console

This message is automatically generated.

 Adding some fuction to check if a table/region is in compaction
 ---

 Key: HBASE-6033
 URL: https://issues.apache.org/jira/browse/HBASE-6033
 Project: HBase
  Issue Type: New Feature
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-6033_v2.patch, table_ui.png


 This feature will be helpful to find out if a major compaction is going on.
 We can show if it is in any minor compaction too.





[jira] [Commented] (HBASE-5974) Scanner retry behavior with RPC timeout on next() seems incorrect

2012-05-21 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280728#comment-13280728
 ] 

Anoop Sam John commented on HBASE-5974:
---

Thanks for the review Jieshan
{quote}
So what's your suggestion, Anoop? call CP hooks in the finally section?
{quote}
I mean that whenever we close the scanner we need to call the CP hooks. 
Currently, before this patch, we were not doing this when getting an NSRE:
{code}
catch (Throwable t) {
  if (t instanceof NotServingRegionException) {
this.scanners.remove(scannerName);
  }
  throw convertThrowableToIOE(cleanup(t));
}
{code}
Here we can see it is not calling the CP hooks. As of now, in the out-of-order 
cookie case I am also not calling the CP hooks.

{quote}
RegionScanner scanner = scanners.get(scannerIdString).s;
{quote}
Oh yes, thanks for pointing it out. I will fix it. This was not in the direct 
next() call flow; that is why I missed it. :(

 Scanner retry behavior with RPC timeout on next() seems incorrect
 -

 Key: HBASE-5974
 URL: https://issues.apache.org/jira/browse/HBASE-5974
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.90.7, 0.92.1, 0.94.0, 0.96.0
Reporter: Todd Lipcon
Priority: Critical
 Attachments: HBASE-5974_0.94.patch


 I'm seeing the following behavior:
 - set RPC timeout to a short value
 - call next() for some batch of rows, big enough so the client times out 
 before the result is returned
 - the HConnectionManager stuff will retry the next() call to the same server. 
 At this point, one of two things can happen: 1) the previous next() call will 
 still be processing, in which case you get a LeaseException, because it was 
 removed from the map during the processing, or 2) the next() call will 
 succeed but skip the prior batch of rows.





[jira] [Commented] (HBASE-6033) Adding some fuction to check if a table/region is in compaction

2012-05-21 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280729#comment-13280729
 ] 

Zhihong Yu commented on HBASE-6033:
---

@Jimmy:
Can you check why TestCompactionState failed ?

 Adding some fuction to check if a table/region is in compaction
 ---

 Key: HBASE-6033
 URL: https://issues.apache.org/jira/browse/HBASE-6033
 Project: HBase
  Issue Type: New Feature
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-6033_v2.patch, table_ui.png


 This feature will be helpful to find out if a major compaction is going on.
 We can show if it is in any minor compaction too.





[jira] [Commented] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss

2012-05-21 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280731#comment-13280731
 ] 

chunhui shen commented on HBASE-6065:
-

Suppose region A is on regionserver B.
The issue can be reproduced with the following steps:

1.put one data to region A (append seq 1 in the hlog)
2.put one data to region A (append seq 2 in the hlog)
3.region A starts a flush; it calls HLog#startCacheFlush (the current seq num 
is 3 in the hlog)
4.put one data to region A (append seq 4 in the hlog)
5.region A completes the flush; it calls HLog#completeCacheFlush (append seq 3 
in the hlog)
6.kill regionserver B.

So the hlog file has four edits:
seq 1
seq 2
seq 4
seq 3

When splitting this hlog file, we generate the recovered.edits file for region 
A, which is named 3. (About the name, see HLogSplitter#splitLogFileToTemp.)

Now, when replaying the recovered.edits file for region A, we will skip this 
file and cause data loss.
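The loss in these steps comes down to a single comparison at replay time. A minimal sketch, with the numbers mirroring the scenario (file named "3" but containing seq 4); SkipCheckSketch and its method are hypothetical names:

```java
// The recovered.edits file is named after the LAST edit's seq id (3), but it
// actually contains seq 4. If the region's stores were flushed through seq 3,
// the replay check treats the whole file as already applied and skips it.
class SkipCheckSketch {
    static boolean fileSkipped(String recoveredEditsFileName, long regionMinSeqId) {
        // mirrors the maxSeqId <= minSeqId check in replayRecoveredEditsIfAny
        long assumedMaxSeqId = Math.abs(Long.parseLong(recoveredEditsFileName));
        return assumedMaxSeqId <= regionMinSeqId;
    }
}
```

Naming the file after the true maximum (4) instead would make the check replay it.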





 Log for flush would append a non-sequential edit in the hlog, may cause data 
 loss
 -

 Key: HBASE-6065
 URL: https://issues.apache.org/jira/browse/HBASE-6065
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6065.patch


 After completing a region flush, we append a log edit to the hlog file 
 through HLog#completeCacheFlush:
 {code}
 public void completeCacheFlush(final byte[] encodedRegionName,
     final byte[] tableName, final long logSeqId, final boolean isMetaRegion)
 {
   ...
   HLogKey key = makeKey(encodedRegionName, tableName, logSeqId,
       System.currentTimeMillis(), HConstants.DEFAULT_CLUSTER_ID);
   ...
 }
 {code}
 When we make the hlog key, we use the seqId from the parameter, which was 
 generated by HLog#startCacheFlush.
 So we may append an edit with a lower seq id than the last edit already in 
 the hlog file. If it becomes the last edit in the file, it may cause data 
 loss, because:
 {code}
 HRegion#replayRecoveredEditsIfAny {
   ...
   maxSeqId = Math.abs(Long.parseLong(fileName));
   if (maxSeqId <= minSeqId) {
     String msg = "Maximum sequenceid for this log is " + maxSeqId
         + " and minimum sequenceid for the region is " + minSeqId
         + ", skipped the whole file, path=" + edits;
     LOG.debug(msg);
     continue;
   }
   ...
 }
 {code}
 We may skip the whole split log file, because we use the last edit's seq id 
 as its file name and treat that seqId as the max seq id in the log file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss

2012-05-21 Thread chunhui shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13280734#comment-13280734
 ] 

chunhui shen commented on HBASE-6065:
-

I have tried to write a test, but it is a little hard.

We could also fix the issue with another solution (patch v2):
In the current logic, we treat the last edit's seq id as the maximal seq id in 
the recovered.edits file, but that is wrong because we cannot guarantee that 
edits in the hlog are sequential.
So we should change the logic that finds the maximal seq id for the 
recovered.edits file; only a small change to 
HLogSplitter#updateRegionMaximumEditLogSeqNum is needed.
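A sketch of what that change amounts to, in illustrative Java (not the actual patch): derive the file's maximum seq id by scanning every edit instead of trusting the last one.

```java
// Illustrative sketch of the patch-v2 idea; not the real
// HLogSplitter#updateRegionMaximumEditLogSeqNum implementation.
public class MaxEditSeqNum {

    // Track the maximum seq id seen across all edits written for a region,
    // rather than assuming the last edit carries the maximum.
    static long maxSeqId(long[] editSeqIds) {
        long max = Long.MIN_VALUE;
        for (long id : editSeqIds) {
            if (id > max) {
                max = id;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // The hlog from the reproduction steps: 1, 2, 4, then the flush edit 3.
        // Naming the recovered.edits file after this maximum (4, not 3) keeps
        // edit 4 from being skipped on replay.
        System.out.println(maxSeqId(new long[] {1, 2, 4, 3})); // prints "4"
    }
}
```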

 Log for flush would append a non-sequential edit in the hlog, may cause data 
 loss
 -

 Key: HBASE-6065
 URL: https://issues.apache.org/jira/browse/HBASE-6065
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6065.patch, HBASE-6065v2.patch







[jira] [Updated] (HBASE-6065) Log for flush would append a non-sequential edit in the hlog, may cause data loss

2012-05-21 Thread chunhui shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-6065:


Attachment: HBASE-6065v2.patch

 Log for flush would append a non-sequential edit in the hlog, may cause data 
 loss
 -

 Key: HBASE-6065
 URL: https://issues.apache.org/jira/browse/HBASE-6065
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-6065.patch, HBASE-6065v2.patch






