[jira] [Updated] (HBASE-6316) Confirm can upgrade to 0.96 from 0.94 by just stopping and restarting
[ https://issues.apache.org/jira/browse/HBASE-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6316: - Attachment: 6316.txt Here is a fix for the failed parse of Reference files. Confirm can upgrade to 0.96 from 0.94 by just stopping and restarting - Key: HBASE-6316 URL: https://issues.apache.org/jira/browse/HBASE-6316 Project: HBase Issue Type: Bug Reporter: stack Priority: Blocker Fix For: 0.96.0 Attachments: 6316.txt Over in HBASE-6294, LarsH says that you currently have to clear zk to get 0.96 to start over data written by 0.94. We need to fix this so that you don't have to; the zk state left over should get auto-migrated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6316) Confirm can upgrade to 0.96 from 0.94 by just stopping and restarting
[ https://issues.apache.org/jira/browse/HBASE-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468366#comment-13468366 ] stack commented on HBASE-6316: -- Also get this when trying to look at the UIs: {code} HTTP ERROR 500 Problem accessing /rs-status. Reason: Unresolved compilation problems: The import org.apache.hadoop.hbase.tmpl cannot be resolved RSStatusTmpl cannot be resolved to a type RSStatusTmpl cannot be resolved to a type Caused by: java.lang.Error: Unresolved compilation problems: The import org.apache.hadoop.hbase.tmpl cannot be resolved RSStatusTmpl cannot be resolved to a type RSStatusTmpl cannot be resolved to a type at org.apache.hadoop.hbase.regionserver.RSStatusServlet.init(RSStatusServlet.java:29) ... {code}
[jira] [Created] (HBASE-6931) Refine WAL interface
Flavio Junqueira created HBASE-6931: --- Summary: Refine WAL interface Key: HBASE-6931 URL: https://issues.apache.org/jira/browse/HBASE-6931 Project: HBase Issue Type: Improvement Reporter: Flavio Junqueira In HBASE-5937, we transformed HLog into an interface and created FSHLog to contain the current implementation of HLog. In that patch, we essentially exposed the public methods, moved method implementations to FSHLog, created a factory for HLog, and moved static methods to HLogUtil. In this umbrella jira, the idea is to refine the WAL interface so that it no longer depends on a file system, as it currently does. The high-level idea is to revisit the methods in HLog and HLogUtil and come up with an interface that can accommodate other backends, such as BookKeeper. Another major task here is to decide what to do with the splitter.
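To make the goal concrete, here is a minimal Java sketch of what a file-system-agnostic WAL contract could look like. It is an illustration under assumptions, not the HLog/FSHLog API: all names (`WalSketch`, `Wal`, `WalReader`, `Entry`, `InMemoryWal`, `readDurable`) are hypothetical. The contract talks only about regions, sequence ids and durability, never paths; the in-memory class merely stands in for an HDFS- or BookKeeper-backed implementation, and splitting is deliberately left out of the interface.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types exist in HBase. The contract
// mentions regions, sequence ids and durability, never files or paths.
final class WalSketch {

  /** One logged edit: owning region, assigned sequence id, raw payload. */
  static final class Entry {
    final String region;
    final long sequenceId;
    final byte[] edit;

    Entry(String region, long sequenceId, byte[] edit) {
      this.region = region;
      this.sequenceId = sequenceId;
      this.edit = edit;
    }
  }

  /** Write side of the log, independent of the storage backend. */
  interface Wal extends AutoCloseable {
    long append(String region, byte[] edit) throws IOException; // returns the sequence id
    void sync() throws IOException;                              // makes prior appends durable
    @Override
    void close() throws IOException;
  }

  /** Read side, e.g. for replaying edits during recovery. */
  interface WalReader {
    List<Entry> readDurable() throws IOException;
  }

  /** Toy backend standing in for an HDFS- or BookKeeper-based one. */
  static final class InMemoryWal implements Wal, WalReader {
    private final List<Entry> entries = new ArrayList<>();
    private long nextSeq = 1;
    private int durable = 0; // number of entries covered by the last sync()
    private boolean closed = false;

    @Override
    public synchronized long append(String region, byte[] edit) throws IOException {
      if (closed) throw new IOException("WAL is closed");
      long seq = nextSeq++;
      entries.add(new Entry(region, seq, edit.clone()));
      return seq;
    }

    @Override
    public synchronized void sync() throws IOException {
      if (closed) throw new IOException("WAL is closed");
      durable = entries.size();
    }

    @Override
    public synchronized void close() {
      closed = true;
    }

    /** Only entries covered by a sync() are returned for replay. */
    @Override
    public synchronized List<Entry> readDurable() {
      return Collections.unmodifiableList(new ArrayList<>(entries.subList(0, durable)));
    }
  }
}
```

Keeping the write and read sides as separate interfaces is one design choice among several; it lets a replay tool depend only on `WalReader`, whatever the backend.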
[jira] [Created] (HBASE-6933) Revisit methods of HLogUtil
Flavio Junqueira created HBASE-6933: --- Summary: Revisit methods of HLogUtil Key: HBASE-6933 URL: https://issues.apache.org/jira/browse/HBASE-6933 Project: HBase Issue Type: Sub-task Reporter: Flavio Junqueira
[jira] [Created] (HBASE-6932) Revisit methods of HLog
Flavio Junqueira created HBASE-6932: --- Summary: Revisit methods of HLog Key: HBASE-6932 URL: https://issues.apache.org/jira/browse/HBASE-6932 Project: HBase Issue Type: Sub-task Reporter: Flavio Junqueira
[jira] [Created] (HBASE-6934) Revisit methods of HLogMetrics
Flavio Junqueira created HBASE-6934: --- Summary: Revisit methods of HLogMetrics Key: HBASE-6934 URL: https://issues.apache.org/jira/browse/HBASE-6934 Project: HBase Issue Type: Sub-task Reporter: Flavio Junqueira
[jira] [Created] (HBASE-6935) Rename HLog interface to WAL
Flavio Junqueira created HBASE-6935: --- Summary: Rename HLog interface to WAL Key: HBASE-6935 URL: https://issues.apache.org/jira/browse/HBASE-6935 Project: HBase Issue Type: Sub-task Reporter: Flavio Junqueira
[jira] [Created] (HBASE-6936) Remove splitter from the wal interface
Flavio Junqueira created HBASE-6936: --- Summary: Remove splitter from the wal interface Key: HBASE-6936 URL: https://issues.apache.org/jira/browse/HBASE-6936 Project: HBase Issue Type: Sub-task Reporter: Flavio Junqueira
[jira] [Updated] (HBASE-6937) Remove synchronization around closeLogSyncer (findbugs warning)
[ https://issues.apache.org/jira/browse/HBASE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated HBASE-6937: Priority: Minor (was: Major) Remove synchronization around closeLogSyncer (findbugs warning) --- Key: HBASE-6937 URL: https://issues.apache.org/jira/browse/HBASE-6937 Project: HBase Issue Type: Sub-task Reporter: Flavio Junqueira Priority: Minor
[jira] [Created] (HBASE-6937) Remove synchronization around closeLogSyncer (findbugs warning)
Flavio Junqueira created HBASE-6937: --- Summary: Remove synchronization around closeLogSyncer (findbugs warning) Key: HBASE-6937 URL: https://issues.apache.org/jira/browse/HBASE-6937 Project: HBase Issue Type: Sub-task Reporter: Flavio Junqueira
[jira] [Created] (HBASE-6938) Move resetLogReaderClass to TestHLogSplit
Flavio Junqueira created HBASE-6938: --- Summary: Move resetLogReaderClass to TestHLogSplit Key: HBASE-6938 URL: https://issues.apache.org/jira/browse/HBASE-6938 Project: HBase Issue Type: Sub-task Reporter: Flavio Junqueira
[jira] [Updated] (HBASE-6938) Move resetLogReaderClass to TestHLogSplit
[ https://issues.apache.org/jira/browse/HBASE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated HBASE-6938: Priority: Minor (was: Major) Move resetLogReaderClass to TestHLogSplit - Key: HBASE-6938 URL: https://issues.apache.org/jira/browse/HBASE-6938 Project: HBase Issue Type: Sub-task Reporter: Flavio Junqueira Priority: Minor
[jira] [Commented] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468439#comment-13468439 ] ramkrishna.s.vasudevan commented on HBASE-6912: --- Lazy seeking scenarios will be broken, right, Lars?
bq. I have a patch, which fixes RowFilter.
Can you upload this patch? Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.3, 0.96.0 Attachments: minimalTest.java Steps to reproduce: Create a table and load data into it. Flush the table. Do a scan with 1. some filter which should not match the first entry in the scan, and 2. a specified family and column. You will notice that the first entry is returned even though it doesn't match the filter. It looks like, when a column is specified, the first KeyValue of the scan, as seen by this code in HRegion.java
{code}
} else if (kv != null && !kv.isInternal() && filterRowKey(currentRow)) {
{code}
is generated by
{code}
public static KeyValue createLastOnRow(final byte [] row, final int roffset, final int rlength,
    final byte [] family, final int foffset, final int flength,
    final byte [] qualifier, final int qoffset, final int qlength) {
  return new KeyValue(row, roffset, rlength, family, foffset, flength,
      qualifier, qoffset, qlength, HConstants.OLDEST_TIMESTAMP, Type.Minimum, null, 0, 0);
}
{code}
So it is always internal at that point in the code. Only later, within StoreScanner.java
{code}
public synchronized boolean next(List<KeyValue> outResult, int limit, String metric) throws IOException {
  LOOP: while((kv = this.heap.peek()) != null) {
{code}
(the second time through), do we get the actual kv, with a proper type and timestamp. This seems to mess with filtering.
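The effect described in the report can be reproduced in a toy model. This is a hedged sketch, not HBase code: `RowFilterGuard`, `Cell` and its `internal` flag are hypothetical stand-ins for `KeyValue` and `isInternal()`, and the predicate follows the `filterRowKey` convention of returning true when a row should be excluded. Because `&&` short-circuits, a synthetic first key (like the one `createLastOnRow` builds) means the row filter is never consulted for that row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of the guard quoted from HRegion.java; not HBase code.
final class RowFilterGuard {

  static final class Cell {
    final String row;
    final boolean internal; // true for synthetic keys such as the one from createLastOnRow(...)

    Cell(String row, boolean internal) {
      this.row = row;
      this.internal = internal;
    }
  }

  /** Mirrors: kv != null && !kv.isInternal() && filterRowKey(currentRow). */
  static boolean skipRow(Cell kv, Predicate<String> filterRowKey) {
    // If kv is internal, evaluation stops here and the filter is never asked.
    return kv != null && !kv.internal && filterRowKey.test(kv.row);
  }

  /** Returns the rows a scan would emit, given the first cell seen for each row. */
  static List<String> scan(List<Cell> firstCellPerRow, Predicate<String> filterRowKey) {
    List<String> emitted = new ArrayList<>();
    for (Cell kv : firstCellPerRow) {
      if (kv == null) {
        continue; // nothing to emit for a missing key
      }
      if (!skipRow(kv, filterRowKey)) {
        emitted.add(kv.row);
      }
    }
    return emitted;
  }
}
```

With real first keys, a filter that rejects row `a` removes it; when the first key for row `a` is synthetic, row `a` is emitted anyway, which matches the symptom in the report.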
[jira] [Commented] (HBASE-6928) TestStoreFile sometimes fails with 'Column family prefix used twice'
[ https://issues.apache.org/jira/browse/HBASE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468497#comment-13468497 ] Hudson commented on HBASE-6928: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #205 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/205/]) HBASE-6928 TestStoreFile sometimes fails with 'Column family prefix used twice' (Revision 1393284) Result = FAILURE stack : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java TestStoreFile sometimes fails with 'Column family prefix used twice' Key: HBASE-6928 URL: https://issues.apache.org/jira/browse/HBASE-6928 Project: HBase Issue Type: Bug Reporter: Ted Yu Attachments: 6928-debug.txt In build #3406, I saw: {code} java.lang.AssertionError: Column family prefix used twice: cf.cf.bt.Data.fsReadnumops at org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.validateMetricChanges(SchemaMetrics.java:822) at org.apache.hadoop.hbase.regionserver.TestStoreFile.tearDown(TestStoreFile.java:89) {code}
[jira] [Commented] (HBASE-6388) Avoid potential data loss if the flush fails during regionserver shutdown
[ https://issues.apache.org/jira/browse/HBASE-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468532#comment-13468532 ] ramkrishna.s.vasudevan commented on HBASE-6388: ---
{code}
if (!this.killed && this.fsOk) {
  waitOnAllRegionsToClose(abortRequested);
  LOG.info("stopping server " + this.serverNameFromMasterPOV + "; all regions closed.");
}
//fsOk flag may be changed when closing regions throws exception.
if (!this.killed && this.fsOk) {
  closeWAL(abortRequested ? false : true);
}
{code}
I think the WAL closing is fine, but the closing is not done in parallel here. Do we then only need to address parallelizing the closes? What do you think, Stack? Avoid potential data loss if the flush fails during regionserver shutdown - Key: HBASE-6388 URL: https://issues.apache.org/jira/browse/HBASE-6388 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Critical Fix For: 0.96.0 Attachments: 0001-HBASE-6388-89-fb-parallelize-close-and-avoid-deletin.patch During a controlled shutdown, the regionserver deletes HLogs even if HRegion.close() fails. We should not be doing this.
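The attached patch name suggests two ideas: close regions in parallel, and avoid deleting logs when a close fails. Below is a minimal Java sketch of that combination, assuming a hypothetical `Region` interface whose `close()` flushes and may throw. `ShutdownSketch`, `closeAllInParallel` and the `deleteWal` callback are illustrative names, not HRegionServer methods, and the pool size of 4 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical shutdown sequence, not HRegionServer code.
final class ShutdownSketch {

  interface Region {
    String name();

    void close() throws Exception; // flushes the memstore; may fail
  }

  /** Closes all regions concurrently; returns true only if every close succeeded. */
  static boolean closeAllInParallel(List<Region> regions, int threads) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Object>> pending = new ArrayList<>();
      for (Region r : regions) {
        pending.add(pool.submit(() -> {
          r.close();
          return null;
        }));
      }
      boolean allClosed = true;
      for (Future<Object> f : pending) {
        try {
          f.get(); // wait for every close, even after a failure
        } catch (ExecutionException e) {
          allClosed = false;
        }
      }
      return allClosed;
    } finally {
      pool.shutdown();
    }
  }

  /** Deletes the WAL only if every region closed (and hence flushed) cleanly. */
  static boolean shutdown(List<Region> regions, Runnable deleteWal) throws InterruptedException {
    if (closeAllInParallel(regions, 4)) {
      deleteWal.run();
      return true;
    }
    return false; // keep the WAL so the edits can be replayed on recovery
  }
}
```

Waiting on every future, rather than stopping at the first failure, means no close is still running when the decision about the WAL is made.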
[jira] [Commented] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468593#comment-13468593 ] Lars Hofhansl commented on HBASE-6912: -- I am inclined to revert HBASE-6562 for now, and add Alex' test at the same time to guard against this in the future.
[jira] [Commented] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468595#comment-13468595 ] Ted Yu commented on HBASE-6912: --- +1 to Lars' plan.
[jira] [Created] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
nkeywal created HBASE-6939: -- Summary: Add the possibility to set the ZK port in HBaseTestingUtility Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial It's useful when embedding the HBaseTestingUtility into another test server: fixing the ZK port makes it simple to put it into a shared instance.
[jira] [Commented] (HBASE-6738) Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468611#comment-13468611 ] nkeywal commented on HBASE-6738: Committed revision 1393537. Too aggressive task resubmission from the distributed log manager - Key: HBASE-6738 URL: https://issues.apache.org/jira/browse/HBASE-6738 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.94.1, 0.96.0 Environment: 3-node cluster test, but it can occur on a much bigger one as well. It's all luck! Reporter: nkeywal Assignee: nkeywal Priority: Critical Fix For: 0.96.0 Attachments: 6738.v1.patch With the default settings hbase.splitlog.manager.timeout = 25s and hbase.splitlog.max.resubmit = 3, the tests mentioned in HBASE-5843 (0.94 + HDFS 1.0.3) show variations around this scenario: the regionserver in charge of the split does not answer within 25s, so it gets interrupted, but it actually continues. Sometimes we run out of retries, sometimes not; sometimes we run out of retries but, as the interrupts were ignored, we finish nicely anyway. In the meantime, the same single task is executed in parallel by multiple nodes, increasing the probability of hitting race conditions. Details: t0: unplug a box with DN+RS. t + x: the other boxes are already connected to it, so their connections start to die. Nevertheless, they don't consider this node as suspect. t + 180s: ZooKeeper -> master detects the node as dead; recovery starts. It can be less than 180s; sometimes it's around 150s. t + 180s: distributed split starts. There is only 1 task, and it's immediately acquired by one RS. t + 205s: the RS has multiple errors when splitting, because a datanode is missing as well. The master decides to give the task to someone else, but often the task continues in the first RS. 
Interrupts are often ignored, as is clearly stated in the code (// TODO interrupt often gets swallowed, do what else?)
{code} 2012-09-04 18:27:30,404 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread {code} t + 211s: two regionservers are processing the same task. They fight for the leases: {code} 2012-09-04 18:27:32,004 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hbase/TABLE/4d1c1a4695b1df8c58d13382b834332e/recovered.edits/037.temp owned by DFSClient_hb_rs_BOX2,60020,1346775882980 but is accessed by DFSClient_hb_rs_BOX1,60020,1346775719125 {code} They can fight like this for many files, until the tasks finally get interrupted or finished. The task on the second box can be cancelled as well. In this case, the task is created again for a new box. The master seems to stop after 3 attempts. It can also give up on splitting the files. Sometimes the tasks were not cancelled on the RS side, so the split is finished despite what the master thinks and logs. In that case, the assignment starts. In the other case, we've got a problem. {code} 2012-09-04 18:43:52,724 INFO org.apache.hadoop.hbase.master.SplitLogManager: Skipping resubmissions of task /hbase/splitlog/hdfs%3A%2F%2FBOX1%3A9000%2Fhbase%2F.logs%2FBOX0%2C60020%2C1346776587640-splitting%2FBOX0%252C60020%252C1346776587640.1346776587832 because threshold 3 reached {code} t + 300s: split is finished; assignment starts. t + 330s: assignment is finished; regions are available again. There are a lot of possible subcases depending on the number of log files, of region servers, and so on. The issues are: 1) it's difficult, especially in HBase but not only, to interrupt a task.
The pattern is often
{code}
void f() throws IOException {
  try {
    Thread.sleep(1000); // whatever: any call that may throw InterruptedException
  } catch (InterruptedException e) {
    throw new InterruptedIOException();
  }
}

boolean g() {
  int nbRetry = 0;
  for (;;) {
    try {
      f();
      return true;
    } catch (IOException e) { // InterruptedIOException is an IOException, so it is retried too
      nbRetry++;
      if (nbRetry > maxRetry) return false;
    }
  }
}
{code}
This typically swallows the interrupt. There are other variations, but this one seems to be the standard. Even if we fix this in HBase, we need the other layers to be interruptible as well. That's not proven. 2) 25s is very aggressive, considering that we have a default timeout of 180s for ZooKeeper. In other words, we give 180s to a regionserver before acting, but when it comes to splitting, it's only 25s. There may be reasons for this, but it seems dangerous, as during a
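One way to avoid swallowing the interrupt in the `g()` pattern is to treat `InterruptedIOException` as non-retryable and to check the thread's interrupt flag before retrying. The sketch below is an assumption about how that could look, not the fix that was committed; `InterruptAwareRetry`, `Op` and `callWithRetries` are hypothetical names. As noted above, this only helps if the lower layers actually surface the interrupt.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical helper, not HBase code: a retry loop that lets interrupts through.
final class InterruptAwareRetry {

  interface Op {
    void run() throws IOException;
  }

  /**
   * Runs op, retrying on IOException up to maxRetry times. An interruption is
   * never retried: it is propagated and the thread's interrupt flag is left set.
   */
  static boolean callWithRetries(Op op, int maxRetry) throws InterruptedIOException {
    int nbRetry = 0;
    for (;;) {
      try {
        op.run();
        return true;
      } catch (InterruptedIOException e) {
        // Must be caught before IOException, which it extends.
        Thread.currentThread().interrupt(); // re-assert the flag for callers
        throw e;
      } catch (IOException e) {
        if (Thread.currentThread().isInterrupted()) {
          // Interrupted while failing for another reason: stop retrying (flag stays set).
          throw new InterruptedIOException("interrupted after " + nbRetry + " retries");
        }
        nbRetry++;
        if (nbRetry > maxRetry) {
          return false;
        }
      }
    }
  }
}
```

The key design choice is the catch ordering: because `InterruptedIOException` extends `IOException`, a single `catch (IOException e)` like the one in `g()` silently turns an interrupt into one more retry.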
[jira] [Updated] (HBASE-6738) Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6738: --- Resolution: Fixed Release Note: The Split Log Manager now takes into account the state of the region server doing the split. If this region server is marked as dead (i.e. its ZooKeeper connection expires), its task is immediately resubmitted. If the region server is still in the alive state, then we wait for 2 minutes before resubmitting, instead of 25 seconds previously. This delay can be changed with the parameter hbase.splitlog.manager.timeout (milliseconds, new default since 0.96: 120000). was: The Split Log Manager now takes into account the state of the region server doing the split. If this region server is marked as dead (i.e. its ZooKeeper connection expires), its task is immediately resubmitted. If the region server is still in the alive state, then we wait for 2 minutes before resubmitting, instead of 25 seconds previously. This delay can be changed with the parameter hbase.splitlog.manager.timeout (milliseconds, new default: 120000). Status: Resolved (was: Patch Available)
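The rule in the release note can be summarized as a small decision function. This is a hedged toy model, not `SplitLogManager` code: `ResubmitPolicy`, `Decision` and `decide` are hypothetical. The 2-minute timeout (120000 ms) comes from the release note and the threshold of 3 from the issue description; the assumption that the threshold only limits timeout-driven resubmissions, and not those triggered by a dead worker, belongs to the sketch, not to the source.

```java
// Toy model of the resubmission rule from the HBASE-6738 release note.
// Names are hypothetical; this is not SplitLogManager code.
final class ResubmitPolicy {

  static final long DEFAULT_TIMEOUT_MS = 2 * 60 * 1000L; // "2 minutes", i.e. 120000 ms
  static final int RESUBMIT_THRESHOLD = 3;               // hbase.splitlog.max.resubmit value from the description

  enum Decision { WAIT, RESUBMIT, GIVE_UP }

  /**
   * @param workerDead       true once the worker's ZooKeeper connection has expired
   * @param msSinceProgress  time since the worker last reported progress
   * @param timeoutMs        the hbase.splitlog.manager.timeout value in milliseconds
   * @param timeoutResubmits timeout-driven resubmissions already done for this task
   */
  static Decision decide(boolean workerDead, long msSinceProgress, long timeoutMs, int timeoutResubmits) {
    if (workerDead) {
      // Dead worker: no reason to wait (assumed here not to count against the threshold).
      return Decision.RESUBMIT;
    }
    if (msSinceProgress < timeoutMs) {
      return Decision.WAIT; // worker alive and still within its grace period
    }
    return timeoutResubmits < RESUBMIT_THRESHOLD ? Decision.RESUBMIT : Decision.GIVE_UP;
  }
}
```

Passing the old 25000 ms as `timeoutMs` makes a worker that has been silent for 25 s lose its task, which is the behavior the issue calls too aggressive; with the new default, the same worker is left alone.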
[jira] [Commented] (HBASE-6738) Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468642#comment-13468642 ] Hudson commented on HBASE-6738: --- Integrated in HBase-TRUNK #3413 (See [https://builds.apache.org/job/HBase-TRUNK/3413/]) HBASE-6738 Too aggressive task resubmission from the distributed log manager (Revision 1393537) Result = FAILURE nkeywal : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java
[jira] [Updated] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6939: --- Attachment: 6939.v1.patch Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.v1.patch It's useful when embedding the HBaseTestingUtility into another test server: fixing the ZK port makes it simple to put it into a shared instance.
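The motivation is that a fixed, known ZK client port lets other processes join a shared mini cluster. A port chosen at random is only known to the process that started it. The sketch below illustrates that choice in plain Java; it is not the patch. The key hbase.zookeeper.property.clientPort is the real HBase client-port setting. Whether and how HBaseTestingUtility honors it after this change is an assumption, and ZkPortChooser is a hypothetical name.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Map;

final class ZkPortChooser {
  // Real HBase configuration key (HConstants.ZOOKEEPER_CLIENT_PORT); that the
  // testing utility reads it is an assumption of this sketch.
  static final String ZK_CLIENT_PORT_KEY = "hbase.zookeeper.property.clientPort";

  /** Uses the configured port if it is > 0, otherwise asks the OS for a free one. */
  static int chooseClientPort(Map<String, String> conf) throws IOException {
    String configured = conf.get(ZK_CLIENT_PORT_KEY);
    if (configured != null) {
      int port = Integer.parseInt(configured.trim());
      if (port > 0) {
        return port; // fixed: other processes sharing the instance can find it
      }
    }
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort(); // random free port, known only to this process
    }
  }
}
```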
[jira] [Updated] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6939: --- Status: Patch Available (was: Open) Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.v1.patch
[jira] [Updated] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6939: --- Attachment: 6939.094.v1.patch Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.094.v1.patch, 6939.v1.patch
[jira] [Commented] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468660#comment-13468660 ] nkeywal commented on HBASE-6939: There's one patch for trunk and one for 0.94. Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.094.v1.patch, 6939.v1.patch
[jira] [Commented] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468664#comment-13468664 ] Hadoop QA commented on HBASE-6939: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12547552/6939.094.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2992//console This message is automatically generated. Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.094.v1.patch, 6939.v1.patch
[jira] [Commented] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468663#comment-13468663 ] ramkrishna.s.vasudevan commented on HBASE-6912: --- OK Lars, sounds good. Alex's test needs a few small changes, such as starting and stopping the mini cluster. Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.3, 0.96.0 Attachments: minimalTest.java Steps to reproduce: Create a table and load data into it. Flush the table. Do a scan with 1. some filter which should not match the first entry in the scan, and 2. a family and column specified. You will notice that the first entry is returned even though it doesn't match the filter. It looks like, when a column is specified, the first KeyValue of the scan, as seen by this code in HRegion.java {code} } else if (kv != null && !kv.isInternal() && filterRowKey(currentRow)) { {code} is generated by {code} public static KeyValue createLastOnRow(final byte [] row, final int roffset, final int rlength, final byte [] family, final int foffset, final int flength, final byte [] qualifier, final int qoffset, final int qlength) { return new KeyValue(row, roffset, rlength, family, foffset, flength, qualifier, qoffset, qlength, HConstants.OLDEST_TIMESTAMP, Type.Minimum, null, 0, 0); } {code} so at that point in the code it is always internal. Only later, within StoreScanner.java {code} public synchronized boolean next(List<KeyValue> outResult, int limit, String metric) throws IOException { LOOP: while((kv = this.heap.peek()) != null) { {code} (the second time through), do we get the actual kv, with a proper type and timestamp. This seems to interfere with filtering.
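The mechanism described for HBASE-6912 can be modeled in a few lines. The HRegion guard only consults the row filter for a non-internal KV. So when the first KV seen is a synthetic createLastOnRow key, the filter is skipped and the row slips through. This is a simplified model, not HBase code. Kv, Type and rowSkipped are hypothetical, and the definition of isInternal() (Minimum or Maximum type) is an assumption. Long.MIN_VALUE stands in for HConstants.OLDEST_TIMESTAMP.

```java
import java.util.function.Predicate;

final class FakeKvFilterDemo {
  /** Simplified subset of KeyValue.Type. */
  enum Type { MINIMUM, PUT, MAXIMUM }

  /** Hypothetical, minimal stand-in for KeyValue. */
  static final class Kv {
    final String row;
    final Type type;
    final long timestamp;

    Kv(String row, Type type, long timestamp) {
      this.row = row;
      this.type = type;
      this.timestamp = timestamp;
    }

    // Assumption: "internal" marks synthetic seek keys rather than user data.
    boolean isInternal() {
      return type == Type.MINIMUM || type == Type.MAXIMUM;
    }
  }

  /** Mirrors createLastOnRow: oldest possible timestamp and the Minimum type. */
  static Kv createLastOnRow(String row) {
    return new Kv(row, Type.MINIMUM, Long.MIN_VALUE);
  }

  /**
   * Same shape as the HRegion guard: the row filter is consulted only for a
   * non-null, non-internal KV. Returns true if the row is skipped.
   */
  static boolean rowSkipped(Kv peeked, Predicate<String> filterRowKey) {
    return peeked != null && !peeked.isInternal() && filterRowKey.test(peeked.row);
  }
}
```

With a filter that rejects "row1", a real Put KV for that row is skipped. The fake KV from createLastOnRow("row1") is not skipped, which matches the reported symptom of the first entry being returned.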
[jira] [Commented] (HBASE-6388) Avoid potential data loss if the flush fails during regionserver shutdown
[ https://issues.apache.org/jira/browse/HBASE-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468666#comment-13468666 ] stack commented on HBASE-6388: -- Then I must be thinking of another issue, Ram. Do you find merit in this patch? If so, let's forward port it and get it in. Thanks. Avoid potential data loss if the flush fails during regionserver shutdown - Key: HBASE-6388 URL: https://issues.apache.org/jira/browse/HBASE-6388 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Critical Fix For: 0.96.0 Attachments: 0001-HBASE-6388-89-fb-parallelize-close-and-avoid-deletin.patch During a controlled shutdown, Regionserver deletes HLogs even if HRegion.close() fails. We should not be doing this.
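The fix requested in HBASE-6388 amounts to making log deletion conditional on every region closing cleanly. Below is a minimal sequential sketch. Region and LogCleaner are hypothetical stand-ins for HRegion and the HLog cleanup. The attached 89-fb patch, which also parallelizes the close, may structure this differently.

```java
import java.io.IOException;
import java.util.List;

final class SafeShutdown {
  /** Hypothetical stand-in for HRegion: close() flushes the memstore first. */
  interface Region {
    void close() throws IOException;
  }

  /** Hypothetical stand-in for whatever removes the server's HLogs. */
  interface LogCleaner {
    void deleteLogs();
  }

  /**
   * Closes every region, continuing past failures, and deletes the logs only
   * if every close succeeded. Returns whether the logs were deleted.
   */
  static boolean closeRegionsThenMaybeDeleteLogs(List<Region> regions, LogCleaner cleaner) {
    boolean allClosed = true;
    for (Region region : regions) {
      try {
        region.close();
      } catch (IOException e) {
        allClosed = false; // unflushed edits may now exist only in the logs
      }
    }
    if (allClosed) {
      cleaner.deleteLogs();
    }
    return allClosed;
  }
}
```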
[jira] [Updated] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6939: --- Attachment: 6939.v1.patch Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.094.v1.patch, 6939.v1.patch, 6939.v1.patch
[jira] [Updated] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6939: --- Status: Open (was: Patch Available) Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.094.v1.patch, 6939.v1.patch, 6939.v1.patch
[jira] [Updated] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-6939: --- Status: Patch Available (was: Open) Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.094.v1.patch, 6939.v1.patch, 6939.v1.patch
[jira] [Commented] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468691#comment-13468691 ] Hadoop QA commented on HBASE-6939: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12547551/6939.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 83 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2991//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2991//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2991//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2991//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2991//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2991//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2991//console This message is automatically generated. Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.094.v1.patch, 6939.v1.patch, 6939.v1.patch
[jira] [Commented] (HBASE-6785) Convert AggregateProtocol to protobuf defined coprocessor service
[ https://issues.apache.org/jira/browse/HBASE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468697#comment-13468697 ] Ted Yu commented on HBASE-6785: --- The castToReturnType(Object) signature and its javadoc aren't shown above. FYI. Convert AggregateProtocol to protobuf defined coprocessor service - Key: HBASE-6785 URL: https://issues.apache.org/jira/browse/HBASE-6785 Project: HBase Issue Type: Sub-task Components: Coprocessors Reporter: Gary Helmling Assignee: Devaraj Das Fix For: 0.96.0 Attachments: Aggregate.proto, Aggregate.proto With coprocessor endpoints now exposed as protobuf defined services, we should convert over all of our built-in endpoints to PB services.
[jira] [Commented] (HBASE-6939) Add the possibility to set the ZK port in HBaseTestingUtility
[ https://issues.apache.org/jira/browse/HBASE-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468717#comment-13468717 ] Hadoop QA commented on HBASE-6939: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12547555/6939.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 83 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestAtomicOperation Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2993//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2993//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2993//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2993//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2993//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2993//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2993//console This message is automatically generated. Add the possibility to set the ZK port in HBaseTestingUtility - Key: HBASE-6939 URL: https://issues.apache.org/jira/browse/HBASE-6939 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.1, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial Attachments: 6939.094.v1.patch, 6939.v1.patch, 6939.v1.patch
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468728#comment-13468728 ] Jean-Daniel Cryans commented on HBASE-6758: --- I really don't like that we have to pass down another instance of HRS (through RegionServerServices). The fact that we're now doing this: {code} -new Replication(this, this.fs, logdir, oldLogDir): null; +new Replication(this, this.fs, logdir, oldLogDir, this): null; {code} is making me sad. Also it leaks all over the code. It seems to me that there should be another way to handle this just in ReplicationSource. At the moment I'd be +1 for commit only to trunk, and on commit this logging will need to be cleaned up: {code} LOG.info("File " + getCurrentPath() + " in use"); {code} Is that OK with you, [~devaraj]? [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file --- Key: HBASE-6758 URL: https://issues.apache.org/jira/browse/HBASE-6758 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.96.0 Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 6758-trunk-1.patch, TEST-org.apache.hadoop.hbase.replication.TestReplication.xml I have seen cases where the replication-executor would lose data to replicate because the file hadn't been closed yet. Upon closing, the new data becomes visible. Until that happens, the ZK node shouldn't be deleted in ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made in ReplicationSource.processEndOfFile as well (currentPath related).
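The invariant in HBASE-6758 is that reaching EOF on a log that is still open must not count as finishing it. The sketch below models that with an in-memory log. Wal, WalTailer and shipAvailable are hypothetical names. The sketch deliberately leaves open how the real ReplicationSource learns that a file is closed, which is the part the comment objects to routing through RegionServerServices.

```java
import java.util.ArrayList;
import java.util.List;

final class WalTailer {
  /** Hypothetical in-memory log: entries are appended until the file is closed. */
  static final class Wal {
    final List<String> entries = new ArrayList<>();
    boolean closed;
  }

  private final Wal wal;
  private int position; // index of the next entry to ship
  final List<String> shipped = new ArrayList<>();

  WalTailer(Wal wal) {
    this.wal = wal;
  }

  /**
   * Ships everything currently readable. Returns true only when the file was
   * closed and is now fully shipped, i.e. when it would be safe to drop its
   * position (the ZK node) and move to the next log. EOF on an open file returns false.
   */
  boolean shipAvailable() {
    // Read the flag before draining: if the writer had already closed the file,
    // every entry is present, so the drain below is complete.
    boolean closedBeforeDrain = wal.closed;
    while (position < wal.entries.size()) {
      shipped.add(wal.entries.get(position++));
    }
    return closedBeforeDrain;
  }
}
```

If the tailer had declared the file done at the first EOF, the entry appended afterwards ("e2" in the test) would never be shipped. That is the loss the issue describes.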
[jira] [Commented] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468729#comment-13468729 ] Lars Hofhansl commented on HBASE-6912: -- Yeah, I modified the test and integrated it into TestFromClientSide. OK... So. I'll reopen HBASE-6562, revert that change, and move to 0.94.3 or even 0.96. As part of this jira I'll just commit Alex' test. Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.3, 0.96.0 Attachments: minimalTest.java
[jira] [Reopened] (HBASE-6562) Fake KVs are sometimes passed to filters
[ https://issues.apache.org/jira/browse/HBASE-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reopened HBASE-6562: -- See HBASE-6912. I am going to revert this change. Fake KVs are sometimes passed to filters Key: HBASE-6562 URL: https://issues.apache.org/jira/browse/HBASE-6562 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.3, 0.96.0 Attachments: 6562.txt, 6562-v2.txt, 6562-v3.txt, minimalTest.java In internal tests at Salesforce we found that fake row keys are sometimes passed to filters (Filter.filterRowKey(...) specifically). The KVs are eventually filtered by the StoreScanner/ScanQueryMatcher, but the row key is passed to filterRowKey in RegionScannerImpl *before* that happens.
[jira] [Updated] (HBASE-6562) Fake KVs are sometimes passed to filters
[ https://issues.apache.org/jira/browse/HBASE-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6562: - Fix Version/s: (was: 0.94.2) 0.94.3 Fake KVs are sometimes passed to filters Key: HBASE-6562 URL: https://issues.apache.org/jira/browse/HBASE-6562 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.3, 0.96.0 Attachments: 6562.txt, 6562-v2.txt, 6562-v3.txt, minimalTest.java
[jira] [Updated] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6912: - Attachment: 6912-0.94.txt 0.94 revert of HBASE-6562, including Alex' test. (Leaving isInternal() on KeyValue, though, because that is useful to have) Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.3, 0.96.0 Attachments: 6912-0.94.txt, minimalTest.java
[jira] [Updated] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6912: - Attachment: 6912-0.96.txt Same for 0.96. Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.3, 0.96.0 Attachments: 6912-0.94.txt, minimalTest.java
[jira] [Updated] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6912: - Attachment: (was: 6912-0.94.txt) Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.3, 0.96.0 Attachments: 6912-0.94.txt, minimalTest.java
[jira] [Updated] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6912: - Attachment: 6912-0.94.txt Right patch. Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.3, 0.96.0 Attachments: 6912-0.94.txt, minimalTest.java
[jira] [Updated] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6912: - Attachment: (was: 6912-0.96.txt) Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.3, 0.96.0 Attachments: 6912-0.94.txt, minimalTest.java
[jira] [Updated] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6912: - Fix Version/s: (was: 0.94.3) 0.94.2 Status: Patch Available (was: Open) Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.2, 0.96.0 Attachments: 6912-0.94.txt, 6912-0.96.txt, minimalTest.java
[jira] [Updated] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6912: - Attachment: 6912-0.96.txt real 0.96 version Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.2, 0.96.0 Attachments: 6912-0.94.txt, 6912-0.96.txt, minimalTest.java
[jira] [Created] (HBASE-6940) Enable GC logging by default
stack created HBASE-6940: Summary: Enable GC logging by default Key: HBASE-6940 URL: https://issues.apache.org/jira/browse/HBASE-6940 Project: HBase Issue Type: Improvement Components: Admin Reporter: stack Priority: Critical Fix For: 0.96.0 I think we should enable GC logging by default. It's pretty frictionless, apparently, and could help folks who are just getting off the ground.
[jira] [Updated] (HBASE-6871) HFileBlockIndex Write Error in HFile V2 due to incorrect split into intermediate index blocks
[ https://issues.apache.org/jira/browse/HBASE-6871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6871: - Fix Version/s: (was: 0.94.3) 0.94.2 HFileBlockIndex Write Error in HFile V2 due to incorrect split into intermediate index blocks - Key: HBASE-6871 URL: https://issues.apache.org/jira/browse/HBASE-6871 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.1 Environment: redhat 5u4 Reporter: Fenng Wang Assignee: Mikhail Bautin Priority: Critical Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: 428a400628ae412ca45d39fce15241fd.hfile, 6871.094.addendum2.txt, 6871.094.addendum.txt, 6871-0.94.txt, 6871-0.94v2.txt, 6871-hfile-index-0.92.txt, 6871-hfile-index-0.92-v2.txt, 6871.txt, 6871v2.txt, 787179746cc347ce9bb36f1989d17419.hfile, 960a026ca370464f84903ea58114bc75.hfile, d0026fa8d59b4df291718f59dd145aad.hfile, D5703.1.patch, D5703.2.patch, D5703.3.patch, D5703.4.patch, D5703.5.patch, hbase-6871-0.94.patch, ImportHFile.java, test_hfile_block_index.sh After writing some data, both compaction and scan operations fail with the following exception: 2012-09-18 06:32:26,227 ERROR org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: Compaction failed regionName=hfile_test,,1347778722498.d220df43fb9d8af4633bd7f547613f9e., storeName=page_info, fileCount=7, fileSize=1.3m (188.0k, 188.0k, 188.0k, 188.0k, 188.0k, 185.8k, 223.3k), priority=9, time=45826250816757428 java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader reader=hdfs://hadoopdev1.cm6:9000/hbase/hfile_test/d220df43fb9d8af4633bd7f547613f9e/page_info/b0f6118f58de47ad9d87cac438ee0895, compression=lzo, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], 
firstKey=http://com.truereligionbrandjeans.www/Womens_Dresses/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/Womens_Shirts/pl/c/Womens_Sweaters/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/Womens_Shirts/pl/c/Womens_Shirts/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/Womens_Sweaters/pl/c/4010.html/page_info:anchor_sig/1347764439449/DeleteColumn, lastKey=http://com.trura.www//page_info:page_type/1347763395089/Put, avgKeyLen=776, avgValueLen=4, entries=12853, length=228611, cur=http://com.truereligionbrandjeans.www/Womens_Exclusive_Details/pl/c/4970.html/page_info:is_deleted/1347764003865/Put/vlen=1/ts=0] to key http://com.truereligionbrandjeans.www/Womens_Exclusive_Details/pl/c/4970.html/page_info:is_deleted/OLDEST_TIMESTAMP/Minimum/vlen=0/ts=0 at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:178) at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402) at org.apache.hadoop.hbase.regionserver.Store.compactStore(Store.java:1570) at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:997) at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1216) at 
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest.run(CompactionRequest.java:250) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Expected block type LEAF_INDEX, but got INTERMEDIATE_INDEX: blockType=INTERMEDIATE_INDEX, onDiskSizeWithoutHeader=8514, uncompressedSizeWithoutHeader=131837, prevBlockOffset=-1,
[jira] [Updated] (HBASE-6906) TestHBaseFsck#testQuarantine* tests are flakey due to TableNotEnabledException
[ https://issues.apache.org/jira/browse/HBASE-6906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6906: - Fix Version/s: (was: 0.94.3) 0.94.2 TestHBaseFsck#testQuarantine* tests are flakey due to TableNotEnabledException -- Key: HBASE-6906 URL: https://issues.apache.org/jira/browse/HBASE-6906 Project: HBase Issue Type: Bug Components: hbck, test Affects Versions: 0.92.3, 0.94.2, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: hbase-6906-94.patch, hbase-6906.patch This test fails intermittently (about 1 run in 10) on our internal Jenkins instance. {code} FAILED TESTS 1 tests failed. REGRESSION: org.apache.hadoop.hbase.util.TestHBaseFsck.testQuarantineMissingRegionDir Error Message: org.apache.hadoop.hbase.TableNotEnabledException: testQuarantineMissingRegionDir at org.apache.hadoop.hbase.master.handler.DisableTableHandler.<init>(DisableTableHandler.java:75) at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1170) at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345) Stack Trace: org.apache.hadoop.hbase.TableNotEnabledException: org.apache.hadoop.hbase.TableNotEnabledException: testQuarantineMissingRegionDir at org.apache.hadoop.hbase.master.handler.DisableTableHandler.<init>(DisableTableHandler.java:75) at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1170) at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79) at org.apache.hadoop.hbase.client.HBaseAdmin.disableTableAsync(HBaseAdmin.java:766) at org.apache.hadoop.hbase.util.TestHBaseFsck.deleteTable(TestHBaseFsck.java:344) at org.apache.hadoop.hbase.util.TestHBaseFsck.doQuarantineTest(TestHBaseFsck.java:1351) at org.apache.hadoop.hbase.util.TestHBaseFsck.testQuarantineMissingRegionDir(TestHBaseFsck.java:1433) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hbase.TableNotEnabledException): org.apache.hadoop.hbase.TableNotEnabledException: testQuarantineMissingRegionDir at 
org.apache.hadoop.hbase.master.handler.DisableTableHandler.<init>(DisableTableHandler.java:75) at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1170) at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1345) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:918) at
[jira] [Updated] (HBASE-6854) Deletion of SPLITTING node on split rollback should clear the region from RIT
[ https://issues.apache.org/jira/browse/HBASE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6854: - Fix Version/s: (was: 0.94.3) 0.94.2 Deletion of SPLITTING node on split rollback should clear the region from RIT - Key: HBASE-6854 URL: https://issues.apache.org/jira/browse/HBASE-6854 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.2 Attachments: HBASE-6854.patch, HBASE-6854.patch If a failure happens during a split before OFFLINING_PARENT, we roll back the split, which includes deleting the znodes it created. When the RS_ZK_SPLITTING node is deleted we get a callback, but we do not remove the region from RIT (regions in transition). We need to remove it from RIT; the SSH (ServerShutdownHandler) logic is already well guarded for the case where the delete event is caused by an RS going down.
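A rough Java sketch of the proposed behaviour, assuming the master keeps regions in transition in a map keyed by encoded region name. The class, enum and method names are hypothetical and do not correspond to AssignmentManager's actual API. ConcurrentMap.remove(key, value) is used so that only an entry that is still in the SPLITTING state gets cleared.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SplitRollbackRitSketch {

    // Hypothetical subset of region-in-transition states.
    enum RitState { SPLITTING, OPENING, CLOSING }

    // Hypothetical RIT map: encoded region name -> current state.
    final ConcurrentMap<String, RitState> regionsInTransition = new ConcurrentHashMap<>();

    // Hypothetical callback for deletion of a region's unassigned znode.
    void onUnassignedNodeDeleted(String encodedRegionName) {
        // Atomically remove the entry only if it is still SPLITTING, so a rolled-back
        // split no longer leaves the region stuck in RIT; other states are untouched.
        regionsInTransition.remove(encodedRegionName, RitState.SPLITTING);
    }

    public static void main(String[] args) {
        SplitRollbackRitSketch am = new SplitRollbackRitSketch();
        am.regionsInTransition.put("parent", RitState.SPLITTING);
        am.regionsInTransition.put("other", RitState.OPENING);

        am.onUnassignedNodeDeleted("parent");
        am.onUnassignedNodeDeleted("other");

        System.out.println(am.regionsInTransition.keySet()); // [other]
    }
}
```

The conditional remove is a design choice of the sketch: a deletion event for a region that has since moved to another state is ignored rather than clobbering that state.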
[jira] [Updated] (HBASE-6679) RegionServer aborts due to race between compaction and split
[ https://issues.apache.org/jira/browse/HBASE-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6679: - Fix Version/s: (was: 0.94.3) 0.94.2 RegionServer aborts due to race between compaction and split Key: HBASE-6679 URL: https://issues.apache.org/jira/browse/HBASE-6679 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: 6679-1.094.patch, 6679-1.patch, rs-crash-parallel-compact-split.log In our nightlies, we have seen RS aborts caused by compaction and split racing. The original parent file gets deleted after the compaction, so the daughters can't find the parent data file, and the RS kills itself when this happens. Will attach a snippet of the relevant RS logs.
[jira] [Updated] (HBASE-4565) Maven HBase build broken on cygwin with copynativelib.sh call.
[ https://issues.apache.org/jira/browse/HBASE-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4565: - Fix Version/s: (was: 0.94.3) 0.94.2 Maven HBase build broken on cygwin with copynativelib.sh call. -- Key: HBASE-4565 URL: https://issues.apache.org/jira/browse/HBASE-4565 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.92.0 Environment: cygwin (on xp and win7) Reporter: Suraj Varma Assignee: Suraj Varma Labels: build, maven Fix For: 0.92.3, 0.94.2 Attachments: HBASE-4565-0.92.patch, HBASE-4565.patch, HBASE-4565-v2.patch, HBASE-4565-v3-0.92.patch, HBASE-4565-v3.patch, HBASE-4565-v4-0.92.patch, HBASE-4565-v4-0.94.patch This is broken in both the 0.92 and the trunk pom.xml. Here's a sample Maven log snippet from trunk (from Mayuresh on the user mailing list): [INFO] [antrun:run {execution: package}] [INFO] Executing tasks main: [mkdir] Created dir: D:\workspace\mkshirsa\hbase-trunk\target\hbase-0.93-SNAPSHOT\hbase-0.93-SNAPSHOT\lib\native\${build.platform} [exec] ls: cannot access D:workspacemkshirsahbase-trunktarget/nativelib: No such file or directory [exec] tar (child): Cannot connect to D: resolve failed [INFO] [ERROR] BUILD ERROR [INFO] [INFO] An Ant BuildException has occured: exec returned: 3328 There are two issues: 1) The antrun task below doesn't handle the Windows file separators in the path returned by project.build.directory; this causes the "resolve failed" error above. <!-- Using Unix cp to preserve symlinks, using script to handle wildcards --> <echo file="${project.build.directory}/copynativelibs.sh"> if [ `ls ${project.build.directory}/nativelib | wc -l` -ne 0]; then 2) The tar argument value below has a similar issue: the path argument doesn't resolve correctly.
<!-- Using Unix tar to preserve symlinks --> <exec executable="tar" failonerror="yes" dir="${project.build.directory}/${project.artifactId}-${project.version}"> <arg value="czf"/> <arg value="/cygdrive/c/workspaces/hbase-0.92-svn/target/${project.artifactId}-${project.version}.tar.gz"/> <arg value="."/> </exec> In both cases, the fix would probably be to handle the directory locations in a cross-platform way.
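As a rough illustration of what handling the directory locations "in a cross-platform way" involves, the Java sketch below rewrites a Windows drive path, such as the D:\workspace\... path in the log above, into the /cygdrive/<drive>/... form that Cygwin's ls and tar understand. CygwinPathSketch and toCygwinPath are hypothetical names, and the code is not taken from the attached patches; inside a Cygwin shell, the cygpath utility performs the same kind of conversion.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CygwinPathSketch {

    // Matches "D:\rest" or "D:/rest"; group 1 is the drive letter, group 2 the remainder.
    private static final Pattern DRIVE_PATH = Pattern.compile("^([A-Za-z]):[\\\\/](.*)$");

    // Converts a Windows drive path to Cygwin's /cygdrive form; any other path
    // only has its backslashes turned into forward slashes.
    static String toCygwinPath(String path) {
        Matcher m = DRIVE_PATH.matcher(path);
        if (!m.matches()) {
            return path.replace('\\', '/');
        }
        String drive = m.group(1).toLowerCase(Locale.ROOT);
        String rest = m.group(2).replace('\\', '/');
        return "/cygdrive/" + drive + "/" + rest;
    }

    public static void main(String[] args) {
        // prints /cygdrive/d/workspace/mkshirsa/hbase-trunk/target
        System.out.println(toCygwinPath("D:\\workspace\\mkshirsa\\hbase-trunk\\target"));
    }
}
```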
[jira] [Updated] (HBASE-6299) RS starting region open while failing ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems
[ https://issues.apache.org/jira/browse/HBASE-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6299: - Fix Version/s: (was: 0.94.3) 0.94.2 RS starting region open while failing ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems Key: HBASE-6299 URL: https://issues.apache.org/jira/browse/HBASE-6299 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.94.0 Reporter: Maryann Xue Assignee: Maryann Xue Priority: Critical Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: 6299v4.txt, 6299v4.txt, 6299v4.txt, HBASE-6299.patch, HBASE-6299-v2.patch, HBASE-6299-v3.patch 1. HMaster tries to assign a region to an RS. 2. HMaster creates a RegionState for this region and puts it into regionsInTransition. 3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives the open-region request and starts processing it, eventually succeeding. However, due to network problems, HMaster never receives the response to the openRegion() call, and the call times out. 4. HMaster attempts to assign the region a second time, choosing another RS. 5. But since HMaster's OpenedRegionHandler has already been triggered by the region open on the previous RS, and the RegionState has already been removed from regionsInTransition, HMaster treats the unassigned ZK node RS_ZK_REGION_OPENING updated by the second attempt as invalid and ignores it. 6. The unassigned ZK node stays, and a later unassign fails because RS_ZK_REGION_CLOSING cannot be created.
{code} 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.; plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568., src=swbss-hadoop-004,60020,1340890123243, dest=swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to swbss-hadoop-006,60020,1340890678078 2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:6, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=swbss-hadoop-006,60020,1340890678078, region=b713fd655fa02395496c5a6e39ddf568 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. 
from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node 2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Deleting existing unassigned node for b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x2377fee2ae80007 Successfully deleted unassigned node for region b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED 2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. that was online on serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301) 2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568. to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, regions=575, usedHeap=0, maxHeap=0), trying to assign
[jira] [Updated] (HBASE-6901) Store file compactSelection throws ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HBASE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6901: - Fix Version/s: (was: 0.94.3) 0.94.2 Store file compactSelection throws ArrayIndexOutOfBoundsException - Key: HBASE-6901 URL: https://issues.apache.org/jira/browse/HBASE-6901 Project: HBase Issue Type: Bug Components: HFile Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.94.2, 0.96.0 Attachments: trunk-6901.patch When hbase.mapreduce.hfileoutputformat.compaction.exclude is set to true, running a compaction that excludes bulk-loaded files can throw an ArrayIndexOutOfBoundsException when all files end up excluded.
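A minimal sketch of the guard this implies, assuming compaction selection first drops bulk-loaded files and then indexes into whatever remains. StoreFileStub, selectCandidates and newestCandidate are hypothetical names, not the Store API; the point is only that an empty candidate list has to be handled before any index-based access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompactSelectionSketch {

    // Hypothetical stand-in for a store file.
    static final class StoreFileStub {
        final String name;
        final boolean bulkLoaded;

        StoreFileStub(String name, boolean bulkLoaded) {
            this.name = name;
            this.bulkLoaded = bulkLoaded;
        }
    }

    // Drops bulk-loaded files when excludeBulkLoaded is set; the result may be empty.
    static List<StoreFileStub> selectCandidates(List<StoreFileStub> files, boolean excludeBulkLoaded) {
        List<StoreFileStub> candidates = new ArrayList<>();
        for (StoreFileStub f : files) {
            if (excludeBulkLoaded && f.bulkLoaded) {
                continue;
            }
            candidates.add(f);
        }
        return candidates;
    }

    // Returns the last candidate, or null when nothing is left to compact.
    // Without the isEmpty() check, get(size() - 1) would be called with index -1.
    static StoreFileStub newestCandidate(List<StoreFileStub> candidates) {
        if (candidates.isEmpty()) {
            return null; // nothing to compact: skip instead of indexing
        }
        return candidates.get(candidates.size() - 1);
    }

    public static void main(String[] args) {
        List<StoreFileStub> allBulk = Arrays.asList(
                new StoreFileStub("a", true), new StoreFileStub("b", true));
        List<StoreFileStub> selected = selectCandidates(allBulk, true);
        System.out.println(selected.size());           // 0
        System.out.println(newestCandidate(selected)); // null
    }
}
```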
[jira] [Updated] (HBASE-6688) folder referred by thrift demo app instructions is outdated
[ https://issues.apache.org/jira/browse/HBASE-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6688: - Fix Version/s: (was: 0.94.3) 0.94.2 folder referred by thrift demo app instructions is outdated --- Key: HBASE-6688 URL: https://issues.apache.org/jira/browse/HBASE-6688 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: stack Priority: Minor Fix For: 0.94.2, 0.96.0 Attachments: thrift094.txt, thrift.txt Due to the source tree module change for 0.96, the instructions in the thrift demo example no longer match the folder structure. The instructions refer to: ../../../src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift but it should be ../../hbase-server/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
[jira] [Updated] (HBASE-6888) HBase scripts ignore any HBASE_OPTS set in the environment
[ https://issues.apache.org/jira/browse/HBASE-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6888: - Fix Version/s: (was: 0.94.3) 0.94.2 HBase scripts ignore any HBASE_OPTS set in the environment -- Key: HBASE-6888 URL: https://issues.apache.org/jira/browse/HBASE-6888 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.94.0, 0.96.0 Reporter: Aditya Kishore Assignee: Aditya Kishore Priority: Minor Fix For: 0.94.2, 0.96.0 Attachments: HBASE-6888_trunk.patch hbase-env.sh, which is sourced by hbase-config.sh, which in turn is sourced by the main 'hbase' script, defines HBASE_OPTS from scratch, ignoring any value previously set in the environment. This prevents passing additional JVM parameters to HBase programs (shell, hbck, etc.) launched through these scripts.
[jira] [Updated] (HBASE-6914) Scans/Gets/Mutations don't give a good error if the table is disabled.
[ https://issues.apache.org/jira/browse/HBASE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6914: - Fix Version/s: (was: 0.94.3) 0.94.2 Scans/Gets/Mutations don't give a good error if the table is disabled. -- Key: HBASE-6914 URL: https://issues.apache.org/jira/browse/HBASE-6914 Project: HBase Issue Type: Improvement Components: Client Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 0.92.3, 0.94.2, 0.96.0 Attachments: HBASE-6914-092-3.patch, HBASE-6914-092-ADD.patch, HBASE-6914-094-3.patch, HBASE-6914-0.patch, HBASE-6914-1.patch, HBASE-6914-2.patch, HBASE-6914-3.patch Scanning a disabled table makes the client retry multiple times and then fail with a NotServingRegionException. If the table is disabled there's no need to retry, and the error message should state explicitly that the table is disabled.
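A minimal sketch of the fail-fast behaviour described above, under the assumption that the client can cheaply look up whether a table is disabled after a failed call. TableDisabledException, TableStateLookup, RegionCall and callWithRetries are all hypothetical names chosen for illustration; they are not the client API or the attached patches.

```java
import java.io.IOException;

public class DisabledTableFailFastSketch {

    // Hypothetical exception carrying an explicit "table is disabled" message.
    static class TableDisabledException extends IOException {
        TableDisabledException(String table, Throwable cause) {
            super("Table " + table + " is disabled", cause);
        }
    }

    // Hypothetical lookup of table state.
    interface TableStateLookup {
        boolean isDisabled(String table);
    }

    // Hypothetical RPC to a region server.
    interface RegionCall<T> {
        T run() throws IOException;
    }

    // Retries a failing call, but stops immediately with an explicit error
    // once the table is known to be disabled.
    static <T> T callWithRetries(String table, TableStateLookup states,
                                 RegionCall<T> call, int maxRetries) throws IOException {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.run();
            } catch (IOException e) {
                if (states.isDisabled(table)) {
                    throw new TableDisabledException(table, e); // retrying cannot help
                }
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        try {
            callWithRetries("t1", table -> true,
                    () -> { attempts[0]++; throw new IOException("NotServingRegionException"); }, 10);
        } catch (TableDisabledException e) {
            System.out.println(e.getMessage() + " after " + attempts[0] + " attempt(s)");
        } catch (IOException e) {
            System.out.println("unexpected: " + e);
        }
    }
}
```

Checking the table state only after a failure keeps the common path free of an extra lookup; whether the real client does it that way is not stated in the issue.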
[jira] [Updated] (HBASE-6927) WrongFS using HRegionInfo.getTableDesc() and different fs for hbase.root and fs.defaultFS
[ https://issues.apache.org/jira/browse/HBASE-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6927: - Fix Version/s: (was: 0.94.3) 0.94.2 WrongFS using HRegionInfo.getTableDesc() and different fs for hbase.root and fs.defaultFS - Key: HBASE-6927 URL: https://issues.apache.org/jira/browse/HBASE-6927 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.1, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.94.2, 0.96.0 Attachments: 6927.094.txt, HBASE-6927-v0.patch Calling HRegionInfo.getTableDesc() with different fs schemes for hbase.root and fs.defaultFS raises IllegalArgumentException: Wrong FS. HRegionInfo.getTableDesc() is called only by bin/region_mover.rb to get the table name, and that call can easily be replaced; getTableDesc() is also deprecated. The main problem is that getTableDesc() doesn't replace fs.defaultFS with hbase.root the way the rest of the HBase code does (all the code does this, except getTableDesc): {code} Configuration c = HBaseConfiguration.create(); c.set("fs.defaultFS", c.get(HConstants.HBASE_DIR)); c.set("fs.default.name", c.get(HConstants.HBASE_DIR)); {code}
[jira] [Updated] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6920: - Fix Version/s: 0.94.2 On timeout connecting to master, client can get stuck and never make progress - Key: HBASE-6920 URL: https://issues.apache.org/jira/browse/HBASE-6920 Project: HBase Issue Type: Bug Affects Versions: 0.94.2 Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Critical Fix For: 0.94.2 Attachments: HBASE-6920.patch, HBASE-6920-v2.patch HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can leave the client permanently unable to connect to the master. So, for example, an HBaseAdmin object can never be successfully initialized. The issue is here: {code} if (tryMaster.isMasterRunning()) { this.master = tryMaster; this.masterLock.notifyAll(); break; } {code} If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal because it can be returned to the application. But if the first call to getMaster succeeds, it sets masterChecked = true, which means we never try to reconnect; that is, we set this.master = null and just throw MasterNotRunningExceptions without even trying to connect. I tried out a 0.94 client (actually a 0.92 client with some 0.94 patches) on a cluster with some network issues, and it would constantly get stuck as described above.
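A minimal sketch of the behaviour the description asks for, written against a hypothetical probe rather than the real HConnection internals. There is no sticky "already checked" flag, so every call made while no master is cached probes again, and a probe failure such as a timeout is reported as a checked exception instead of escaping as an UndeclaredThrowableException. MasterReconnectSketch and MasterNotRunning are hypothetical names; this is not the attached patch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

public class MasterReconnectSketch {

    // Hypothetical checked exception standing in for MasterNotRunningException.
    static class MasterNotRunning extends Exception {
        MasterNotRunning(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    private final Object masterLock = new Object();
    private final Callable<Boolean> isMasterRunning; // probe; may time out
    private Object master; // hypothetical stand-in for the connected master proxy

    MasterReconnectSketch(Callable<Boolean> isMasterRunning) {
        this.isMasterRunning = isMasterRunning;
    }

    Object getMaster() throws MasterNotRunning {
        synchronized (masterLock) {
            if (master != null) {
                return master;
            }
            // No "already checked" flag: a missing master always triggers a fresh probe.
            boolean running;
            try {
                running = Boolean.TRUE.equals(isMasterRunning.call());
            } catch (Exception e) {
                // Report timeouts as a checked, retryable error.
                throw new MasterNotRunning("master probe failed", e);
            }
            if (!running) {
                throw new MasterNotRunning("master is not running", null);
            }
            master = new Object();
            masterLock.notifyAll(); // wake any threads waiting for the master
            return master;
        }
    }

    public static void main(String[] args) throws Exception {
        int[] probes = {0};
        MasterReconnectSketch conn = new MasterReconnectSketch(() -> {
            probes[0]++;
            if (probes[0] == 1) {
                throw new TimeoutException("simulated RPC timeout");
            }
            return true;
        });
        try {
            conn.getMaster();
        } catch (MasterNotRunning e) {
            System.out.println("first call failed: " + e.getMessage());
        }
        System.out.println("second call connected: " + (conn.getMaster() != null));
    }
}
```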
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468745#comment-13468745 ] Lars Hofhansl commented on HBASE-6920: -- Actually, let's just fix the issue you discovered here. +1 on your patch. We can think about the other change I suggest for 0.94.3. It is time to get 0.94.2 out the door.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468751#comment-13468751 ] Devaraj Das commented on HBASE-6758: Thanks [~jdcryans] for looking at the patch. Actually, looking more closely at the RegionServerServices interface, I see that it extends the Server interface. So the problem you pointed out could be addressed by changing the affected constructors and methods (the ones I changed to take the new RegionServerServices argument) to take only a RegionServerServices instead of Server/Stoppable instances. Will submit a patch soon; hopefully that will look better.
[replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file --- Key: HBASE-6758 URL: https://issues.apache.org/jira/browse/HBASE-6758 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.96.0 Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 6758-trunk-1.patch, TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
I have seen cases where the replication-executor would lose data to replicate because the file hasn't been closed yet. Once the file is closed, the new data becomes visible. Until that happens, the ZK node shouldn't be deleted in ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made in ReplicationSource.processEndOfFile as well (related to currentPath).
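The invariant the description asks for can be sketched as a single predicate. This assumes a hypothetical LogQueueEntry that tracks whether the writer has closed the log, the length last observed, and how far the reader has shipped edits; the real ReplicationSource and ReplicationSourceManager code is structured differently.

```java
// Hypothetical model of one entry in a replication queue; names are illustrative only.
final class LogQueueEntry {
    final String path;
    boolean closed;          // has the writer closed (finalized) this log?
    long lengthAtLastCheck;  // length seen by the reader; may lag while the file is open
    long readPosition;       // how far the replication reader has shipped edits

    LogQueueEntry(String path) {
        this.path = path;
    }

    // Only safe to drop the entry (and delete its ZK node) once the file is closed
    // and everything up to its final length has been read.
    boolean canRemoveFromQueue() {
        return closed && readPosition >= lengthAtLastCheck;
    }
}
```

The caller is assumed to refresh lengthAtLastCheck after the file closes, so data that only became visible on close is still read before the entry is removed.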
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468762#comment-13468762 ] stack commented on HBASE-6758: -- Can we not pass down RegionServerServices? Can we pass a narrow Interface instead?
[jira] [Commented] (HBASE-6316) Confirm can upgrade to 0.96 from 0.94 by just stopping and restarting
[ https://issues.apache.org/jira/browse/HBASE-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468764#comment-13468764 ] stack commented on HBASE-6316: -- Hmm... trying again, I don't get the 500 building on this machine.
Confirm can upgrade to 0.96 from 0.94 by just stopping and restarting - Key: HBASE-6316 URL: https://issues.apache.org/jira/browse/HBASE-6316 Project: HBase Issue Type: Bug Reporter: stack Priority: Blocker Fix For: 0.96.0 Attachments: 6316.txt
Over in HBASE-6294, LarsH says you currently have to clear zk to get 0.96 to start over data written by 0.94. Need to fix it so you don't have to do this, i.e. so that leftover zk state gets auto-migrated.
[jira] [Commented] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468789#comment-13468789 ] Hadoop QA commented on HBASE-6912: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12547577/6912-0.96.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 83 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2994//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2994//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2994//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2994//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2994//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2994//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2994//console
This message is automatically generated.
Filters are not properly applied in certain cases - Key: HBASE-6912 URL: https://issues.apache.org/jira/browse/HBASE-6912 Project: HBase Issue Type: Bug Affects Versions: 0.94.1 Reporter: Alex Newman Fix For: 0.94.2, 0.96.0 Attachments: 6912-0.94.txt, 6912-0.96.txt, minimalTest.java
Steps to reproduce: Create a table and load data into it. Flush the table. Do a scan with
1. a filter that should not match the first entry in the scan, and
2. a specified family and column.
You will notice that the first entry is returned even though it doesn't match the filter.
It looks like the first KeyValue of a scan on that column, as seen by this code in HRegion.java,
{code}
} else if (kv != null && !kv.isInternal() && filterRowKey(currentRow)) {
{code}
is generated by
{code}
public static KeyValue createLastOnRow(final byte [] row,
    final int roffset, final int rlength, final byte [] family,
    final int foffset, final int flength, final byte [] qualifier,
    final int qoffset, final int qlength) {
  return new KeyValue(row, roffset, rlength, family,
      foffset, flength, qualifier, qoffset, qlength,
      HConstants.OLDEST_TIMESTAMP, Type.Minimum, null, 0, 0);
}
{code}
so at that point in the code it is always internal. Only later, within StoreScanner.java (the second time through),
{code}
public synchronized boolean next(List<KeyValue> outResult, int limit, String metric) throws IOException {
  LOOP: while((kv = this.heap.peek()) != null) {
{code}
do we get the actual kv, with a proper type and timestamp. This seems to mess with filtering.
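The effect described above can be reproduced in a toy model of the quoted condition. Everything here is hypothetical (a two-value enum standing in for KeyValue.Type, a String row key, a Predicate standing in for filterRowKey, which returns true when the row should be skipped): when the peeked KeyValue is the synthetic Type.Minimum marker, isInternal() is true and the row filter is never consulted. The second method is only an illustration of deciding on the row key alone; it is not necessarily the fix that was committed for this issue.

```java
import java.util.function.Predicate;

// Hypothetical, stripped-down model of the HRegion check quoted above; not HBase code.
final class RowFilterCheck {
    enum KvType { MINIMUM, PUT } // MINIMUM stands in for Type.Minimum, a synthetic seek marker

    static final class Kv {
        final String row;
        final KvType type;

        Kv(String row, KvType type) {
            this.row = row;
            this.type = type;
        }

        boolean isInternal() {
            return type == KvType.MINIMUM;
        }
    }

    // Shape of the quoted condition: true means "skip this row".
    // For the synthetic marker, the row filter is never consulted.
    static boolean skipRowAsQuoted(Kv kv, Predicate<String> filterRowKey) {
        return kv != null && !kv.isInternal() && filterRowKey.test(kv.row);
    }

    // Illustrative alternative: decide on the row key alone, whatever the KV type.
    static boolean skipRowByKeyOnly(Kv kv, Predicate<String> filterRowKey) {
        return kv != null && filterRowKey.test(kv.row);
    }
}
```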
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468800#comment-13468800 ] Devaraj Das commented on HBASE-6758:
bq. Can we not pass down RegionServerServices? Can we pass a narrow Interface instead?
I think we can (I can pull the getWAL() method out of the RegionServerServices interface into a new interface and have RegionServerServices extend that). But in that case we will still pass two instances of HRS (as JD pointed out earlier). Thinking about it, though, that probably makes the downstream methods' abstractions cleaner than having them accept a fat interface.
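The refactoring proposed in this comment can be sketched as follows. All names are hypothetical (Wal, WalSupplier, StopSignal, RegionServerServicesLike, ReplicationSourceSketch), not the real HBase interfaces: getWAL() moves into a narrow interface, the fat interface extends it, and a downstream consumer depends only on the two narrow pieces it uses. The caller then passes the same region server object twice, which is the "two instances of HRS" trade-off mentioned above.

```java
// Hypothetical names throughout; this is not the HBase RegionServerServices API.
interface Wal {
    void append(String edit);
}

// The narrow interface: only the getWAL() accessor.
interface WalSupplier {
    Wal getWAL();
}

interface StopSignal {
    boolean isStopped();
}

// The fat interface keeps its contract by extending the narrow ones.
interface RegionServerServicesLike extends StopSignal, WalSupplier {
}

// A downstream consumer depends only on what it actually uses.
final class ReplicationSourceSketch {
    private final StopSignal stopper;
    private final WalSupplier walSupplier;

    ReplicationSourceSketch(StopSignal stopper, WalSupplier walSupplier) {
        this.stopper = stopper;
        this.walSupplier = walSupplier;
    }

    boolean canShip() {
        return !stopper.isStopped() && walSupplier.getWAL() != null;
    }
}
```

A test double now only has to implement isStopped() and getWAL(), rather than the whole fat interface.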
[jira] [Assigned] (HBASE-6912) Filters are not properly applied in certain cases
[ https://issues.apache.org/jira/browse/HBASE-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reassigned HBASE-6912: Assignee: Lars Hofhansl
[jira] [Commented] (HBASE-6667) TestCatalogJanitor occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468820#comment-13468820 ] Jesse Yates commented on HBASE-6667: This test hasn't failed on trunk in the last 100+ builds and has run in under a second in every build (generally under 100ms, with some variance). See https://builds.apache.org/job/HBase-TRUNK/3414/testReport/junit/org.apache.hadoop.hbase.master/TestCatalogJanitor/testArchiveOldRegion/history/
I'd like to close this as won't fix. I have no idea what went wrong with the original test; it's all single-threaded and a fairly simple test. It might have been some weird GC issue where the cleanup ran early, or cleanup bled over from another test. However, I ran the test again 20x on trunk locally without issue.
TestCatalogJanitor occasionally fails - Key: HBASE-6667 URL: https://issues.apache.org/jira/browse/HBASE-6667 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Jesse Yates Fix For: 0.96.0 Attachments: java_6667-v0.txt, testCatalogJanitor-output.txt
Here is the OS: Linux sea0 2.6.38-11-generic #48-Ubuntu SMP Fri Jul 29 19:02:55 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
{code}
testArchiveOldRegion(org.apache.hadoop.hbase.master.TestCatalogJanitor) Time elapsed: 0.007 sec FAILURE!
java.lang.AssertionError: Not the same number of current files Expected (2): Gotten (0): Not Found: _store0 _store1 Extra:
  at org.junit.Assert.fail(Assert.java:93)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertNull(Assert.java:551)
  at org.apache.hadoop.hbase.util.HFileArchiveTestingUtil.assertArchiveEqualToOriginal(HFileArchiveTestingUtil.java:132)
  at org.apache.hadoop.hbase.util.HFileArchiveTestingUtil.assertArchiveEqualToOriginal(HFileArchiveTestingUtil.java:95)
  at org.apache.hadoop.hbase.master.TestCatalogJanitor.testArchiveOldRegion(TestCatalogJanitor.java:623)
{code}
[jira] [Commented] (HBASE-6667) TestCatalogJanitor occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468824#comment-13468824 ] Ted Yu commented on HBASE-6667: --- We can resolve this and tackle more recent test failures.
[jira] [Resolved] (HBASE-6667) TestCatalogJanitor occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-6667. --- Resolution: Cannot Reproduce
[jira] [Commented] (HBASE-6920) On timeout connecting to master, client can get stuck and never make progress
[ https://issues.apache.org/jira/browse/HBASE-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468835#comment-13468835 ] Gregory Chanan commented on HBASE-6920: --- Sounds good. I'm doing some cluster testing today, I'll commit if all looks good.
[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468868#comment-13468868 ] Jean-Daniel Cryans commented on HBASE-6733: --- The log switching code really needs to be cleaned up, but my understanding is that this patch won't do anything. {{processEndOfFile}} always sets {{currentPath}} to {{null}}, so this:
{code}
+ Path oldPath = getCurrentPath();
{code}
would always return null in the case where we're switching logs?
[0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2] --- Key: HBASE-6733 URL: https://issues.apache.org/jira/browse/HBASE-6733 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6733-1.patch, 6733-2.patch
The failure is in TestReplication.queueFailover (it fails due to unreplicated rows). I have come across two problems:
1. The sleepMultiplier is not properly reset when the currentPath changes (in ReplicationSource.java).
2. ReplicationExecutor sometimes removes files to replicate from the queue too early, so the corresponding edits go missing. The problem is that the log-file length the replication executor sees is not the most up-to-date one, so it doesn't read anything from the file; when the log rolls, the replication queue gets a new entry and the executor drops the old entry from the queue.
[jira] [Updated] (HBASE-3793) HBASE-3468 Broke checkAndPut with null value
[ https://issues.apache.org/jira/browse/HBASE-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-3793: --- Attachment: D5835.1.patch
mbautin requested code review of [jira] [HBASE-3793] [89-fb] Fix TestHRegion failure with zero-byte expected array in compare-and-put. Reviewers: Liyin, Kannan, JIRA
Passing a zero-byte expected value to checkAndPut and similar methods now means we expect to see a zero-byte value, not a non-existent value. This should have been part of rHBASEEIGHTNINEFBBRANCH1391219.
TEST PLAN TestHRegion
REVISION DETAIL https://reviews.facebook.net/D5835
AFFECTED FILES src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/
WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/13821/
To: Liyin, Kannan, JIRA, mbautin
HBASE-3468 Broke checkAndPut with null value Key: HBASE-3793 URL: https://issues.apache.org/jira/browse/HBASE-3793 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: Lars George Assignee: Ming Ma Priority: Blocker Fix For: 0.92.0 Attachments: D5835.1.patch, HBASE-3793.patch, HBASE-3793-TRUNK.patch
The previous code called Bytes.equals(), which checks for null on either argument. Now the comparator calls Bytes.compareTo(), which has no null check. But null is a valid input here: it checks for existence.
I actually noticed this while running https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/client/CheckAndPutExample.java. It used to work; now it throws an NPE:
{noformat}
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hbase.util.Bytes.compareTo(Bytes.java:854)
  at org.apache.hadoop.hbase.filter.WritableByteArrayComparable.compareTo(WritableByteArrayComparable.java:63)
  at org.apache.hadoop.hbase.regionserver.HRegion.checkAndMutate(HRegion.java:1681)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.checkAndMutate(HRegionServer.java:1693)
  ... 6 more
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1026)
  at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:750)
  at client.CheckAndPutExample.main(CheckAndPutExample.java:33)
{noformat}
Easily fixable: it just needs to handle the null value before calling comparator.compareTo().
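A minimal sketch of "handle the null value before calling the comparator", written as a hypothetical helper rather than the real HRegion.checkAndMutate code. A null expected value means the check passes only if the cell does not exist. A non-null expected value is compared byte for byte, so a zero-length expected value matches only an existing zero-length value, which is the 89-fb rule described in the Phabricator review above. Other branches may treat an empty expected value differently.

```java
import java.util.Arrays;

// Hypothetical helper modelling the check-and-put comparison; not HRegion.checkAndMutate.
final class CheckAndPutMatch {
    // expected == null : passes only if the cell does not exist (actual == null).
    // otherwise        : byte-for-byte equality, so a zero-length expected value
    //                    matches only an existing zero-length value.
    static boolean matches(byte[] expected, byte[] actual) {
        if (expected == null) {
            return actual == null; // decided before any comparator is invoked, so no NPE
        }
        return actual != null && Arrays.equals(expected, actual);
    }
}
```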
[jira] [Commented] (HBASE-3793) HBASE-3468 Broke checkAndPut with null value
[ https://issues.apache.org/jira/browse/HBASE-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468879#comment-13468879 ] Phabricator commented on HBASE-3793: Kannan has accepted the revision [jira] [HBASE-3793] [89-fb] Fix TestHRegion failure with zero-length expected value in compare-and-put. REVISION DETAIL https://reviews.facebook.net/D5835 BRANCH fix_test_hregion To: Liyin, Kannan, JIRA, mbautin
[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468895#comment-13468895 ] Devaraj Das commented on HBASE-6733:
bq. would always return null in the case where we're switching log?
That's true, but the patch still works :-) The check _if (getCurrentPath() != null && !getCurrentPath().equals(oldPath))_ evaluates to true (after a call to getNextPath()), so the sleepMultiplier is reset.
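The argument in this comment can be checked with a small, hypothetical model of a reader's backoff state (BackoffState and its methods are made up; this is not the ReplicationSource code). The reset uses the same shape as the quoted check: a non-null new path that differs from the old one resets the multiplier, and because the old path is null while switching logs, that comparison still succeeds.

```java
// Hypothetical model of a replication reader's backoff; not ReplicationSource.
final class BackoffState {
    private final int maxMultiplier;
    private int sleepMultiplier = 1;
    private String currentPath; // null while switching logs

    BackoffState(int maxMultiplier) {
        this.maxMultiplier = maxMultiplier;
    }

    // A read that finds nothing new backs off further, up to the cap.
    void onEmptyRead() {
        if (sleepMultiplier < maxMultiplier) {
            sleepMultiplier++;
        }
    }

    // Called after selecting the next log (or clearing it). Same shape as the patch's check:
    // a non-null path that differs from the old one (which may be null) resets the backoff.
    void onPathSelected(String newPath) {
        String oldPath = currentPath;
        currentPath = newPath;
        if (newPath != null && !newPath.equals(oldPath)) {
            sleepMultiplier = 1;
        }
    }

    long sleepMillis(long baseSleepMillis) {
        return baseSleepMillis * sleepMultiplier;
    }

    int multiplier() {
        return sleepMultiplier;
    }
}
```

This also shows why the check would keep working if currentPath were never null: a genuinely new path still differs from the old one.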
[jira] [Created] (HBASE-6941) LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs
Harsh J created HBASE-6941: -- Summary: LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs Key: HBASE-6941 URL: https://issues.apache.org/jira/browse/HBASE-6941 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.90.6 Reporter: Harsh J Assignee: Harsh J
The LoadIncrementalHFiles tool has fairly complex config-loading logic built into it, which seems unnecessary and also causes problems, because it ignores any settings passed to it via Tool's -Dprop=value parameters. This makes integration with tools such as Oozie harder, since it won't pick up a different ZK address, etc. unless there's an hbase-site.xml on the classpath to load from (which is painful to achieve on Oozie).
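The expected behaviour can be sketched with a plain Map standing in for Hadoop's Configuration: -D overrides are folded into the configuration the tool receives, and the tool reads its settings from that same object instead of building a fresh one. In real Hadoop code the -D parsing is done by GenericOptionsParser when a Tool is launched through ToolRunner, and the tool should then use getConf(); the class and method names below are hypothetical and only model that flow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Tool/ToolRunner flow using a Map as the configuration.
final class DefineAwareTool {
    private final Map<String, String> conf;

    DefineAwareTool(Map<String, String> conf) {
        this.conf = conf;
    }

    // Applies "-D key=value" and "-Dkey=value" overrides to conf; returns the remaining args.
    static List<String> applyDefines(String[] args, Map<String, String> conf) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            String kv = null;
            if (a.equals("-D") && i + 1 < args.length) {
                kv = args[++i];
            } else if (a.startsWith("-D") && a.length() > 2) {
                kv = a.substring(2);
            }
            if (kv == null) {
                rest.add(a); // not a define: pass through to the tool
            } else {
                int eq = kv.indexOf('=');
                if (eq > 0) {
                    conf.put(kv.substring(0, eq), kv.substring(eq + 1));
                } // malformed defines (no '=') are ignored in this sketch
            }
        }
        return rest;
    }

    // Reads from the configuration that was handed in, so -D overrides take effect.
    String zookeeperQuorum() {
        return conf.getOrDefault("hbase.zookeeper.quorum", "localhost");
    }
}
```

Because the tool never constructs its own configuration, an Oozie action can supply the ZK quorum purely on the command line.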
[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468909#comment-13468909 ] Devaraj Das commented on HBASE-6733: The patch should continue to work if the log-switching behavior is ever changed so that currentPath always points to a valid non-null path. But for now, yes, null works as well (I have checked the Hadoop code, and its equals() implementation handles a null argument).
[jira] [Updated] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice
[ https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-6228: --- Resolution: Implemented Status: Resolved (was: Patch Available) Yes, I think the issue is not there any more. Since we are using SSH to handle dead servers in failover mode, this piece of code in HMaster to fixup daughter is not needed any more. I will remove it in HBASE-6611. Fixup daughters twice cause daughter region assigned twice --- Key: HBASE-6228 URL: https://issues.apache.org/jira/browse/HBASE-6228 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6228.patch, HBASE-6228v2.patch, HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch First, how does fixing up daughters twice happen? 1. We call fixupDaughters at the end of HMaster#finishInitialization. 2. ServerShutdownHandler calls fixupDaughters when reassigning regions through ServerShutdownHandler#processDeadRegion. When fixing up daughters we add the daughters to .META., but that can't prevent the above case because of how FindDaughterVisitor works. The details are as follows. Suppose region A is a split parent region and its daughter region B is missing: 1. First, the ServerShutdownHandler thread fixes up the daughter: it adds daughter region B to .META. with serverName=null and assigns it. 2. Then, the master's initialization thread also finds that daughter region B is missing and assigns it again. This is because FindDaughterVisitor considers a daughter missing if its serverName=null.
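The sequence above is a check-then-act race: both the ServerShutdownHandler path and the master-initialization path treat serverName=null as "missing" and assign the daughter. The sketch below is a minimal in-memory model of that race plus one generic guard; DaughterFixupModel and its methods are hypothetical. Note that the resolution recorded in this issue was different, namely removing the HMaster fixup code in HBASE-6611.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory model of the double assignment described above.
 * meta maps region -> server; a present key with a null value means
 * "in .META. with serverName=null".
 */
class DaughterFixupModel {
    final Map<String, String> meta = new HashMap<>();
    final Map<String, Integer> assignCount = new HashMap<>();
    private final Set<String> assignmentsInFlight = new HashSet<>();

    /** The FindDaughterVisitor rule as described: missing if serverName is null. */
    boolean looksMissing(String daughter) {
        return meta.get(daughter) == null;
    }

    /** Unguarded fixup, as run by both ServerShutdownHandler and master initialization. */
    void naiveFixup(String daughter) {
        if (looksMissing(daughter)) {
            meta.put(daughter, null); // added to .META. with serverName=null
            assign(daughter);
        }
    }

    /** One generic guard: a second caller skips a daughter already being assigned. */
    synchronized void guardedFixup(String daughter) {
        if (looksMissing(daughter) && assignmentsInFlight.add(daughter)) {
            meta.put(daughter, null);
            assign(daughter);
        }
    }

    private void assign(String region) {
        assignCount.merge(region, 1, Integer::sum);
    }
}
```

Because writing serverName=null to .META. does not change what looksMissing returns, the naive path assigns twice; the guard only works if every caller goes through it.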
[jira] [Updated] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly
[ https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-6930: --- Attachment: D5841.1.patch mbautin requested code review of [jira] [HBASE-6930] [89-fb] Fix TestThriftServerLegacy: notifyAll should be inside synchronized block. Reviewers: Kannan, Liyin, Karthik, JIRA There were a couple of reasons why TestThriftServerLegacy has been failing recently in the HBase 89-fb branch: - rHBASEEIGHTNINEFBBRANCH1393468 was calling notifyAll outside a synchronized block - rHBASEEIGHTNINEFBBRANCH1391219 changed the meaning of a null expected value passed to checkAndMutate, but that change was not reflected in the Thrift handler TEST PLAN Run TestThriftServerLegacy REVISION DETAIL https://reviews.facebook.net/D5841 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/13833/ To: Kannan, Liyin, Karthik, JIRA, mbautin [89-fb] Avoid acquiring the same row lock repeatedly Key: HBASE-6930 URL: https://issues.apache.org/jira/browse/HBASE-6930 Project: HBase Issue Type: Bug Reporter: Liyin Tang Attachments: D5841.1.patch, D5841.2.patch When processing multiPut, multiMutations, or multiDelete operations, each IPC handler thread tries to acquire a lock for each row key in the batch. If a batch contains duplicate row keys, the IPC handler thread previously acquired the same row lock again and again. The optimization is therefore to sort each batch by row key on the client side and skip re-acquiring the same row lock on the server side.
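The optimization described above can be sketched as follows, assuming a simplified batch model: Op, sortByRow and countLockAcquisitions are hypothetical names, the counter stands in for real row-lock calls, and this is not HRegion's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative batch operation: a row key plus an opaque payload (hypothetical type). */
final class Op {
    final byte[] row;
    final String payload;

    Op(byte[] row, String payload) {
        this.row = row;
        this.payload = payload;
    }
}

final class RowLockDedup {
    private RowLockDedup() {}

    /** Client side: a stable sort by unsigned byte order makes duplicate rows adjacent. */
    static List<Op> sortByRow(List<Op> batch) {
        List<Op> sorted = new ArrayList<>(batch);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a.row, b.row));
        return sorted;
    }

    /**
     * Server side: a lock is (notionally) taken only when the row differs from
     * the previous op's row. Returns the number of acquisitions.
     */
    static int countLockAcquisitions(List<Op> batch) {
        int acquisitions = 0;
        byte[] lockedRow = null;
        for (Op op : batch) {
            if (lockedRow == null || !Arrays.equals(lockedRow, op.row)) {
                acquisitions++; // stand-in for releasing the old lock and taking a new one
                lockedRow = op.row;
            }
            // ... apply op while holding the lock for op.row ...
        }
        return acquisitions;
    }
}
```

List.sort is stable, so operations on the same row keep their original relative order. A server that cannot rely on client-side ordering could instead track the rows it already holds in a set.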
[jira] [Updated] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly
[ https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-6930: --- Attachment: D5841.2.patch mbautin updated the revision [jira] [HBASE-6930] [89-fb] Fix TestThriftServerLegacy: notifyAll should be inside synchronized block, and a null checkAndMutate expected value should be handled correctly. Reviewers: Kannan, Liyin, Karthik, JIRA Adding ThriftServerRunner fixes REVISION DETAIL https://reviews.facebook.net/D5841 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerLegacy.java To: Kannan, Liyin, Karthik, JIRA, mbautin [89-fb] Avoid acquiring the same row lock repeatedly Key: HBASE-6930 URL: https://issues.apache.org/jira/browse/HBASE-6930 Project: HBase Issue Type: Bug Reporter: Liyin Tang Attachments: D5841.1.patch, D5841.2.patch
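The revision title above refers to a basic Java rule: Object.notifyAll() (like wait()) may only be called by a thread that owns the object's monitor; otherwise it throws IllegalMonitorStateException. A minimal, hypothetical example (ReadySignal is invented for illustration and is not the HRegion code changed in D5841):

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical example of the wait/notify rule; not the code changed in D5841. */
class ReadySignal {
    private final Object lock = new Object();
    private boolean ready; // guarded by lock

    /** Buggy: notifyAll without owning the monitor throws IllegalMonitorStateException. */
    void signalOutsideLock() {
        lock.notifyAll();
    }

    /** Correct: change the condition and notify while holding the monitor. */
    void signal() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    /** Waits up to timeoutMillis for signal(); the loop handles spurious wakeups. */
    boolean await(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (!ready) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(lock, remaining);
            }
            return true;
        }
    }
}
```

Setting the flag inside the same synchronized block as notifyAll also means a waiter cannot miss the signal between checking ready and calling wait.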
[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468920#comment-13468920 ] Jean-Daniel Cryans commented on HBASE-6733: --- You are right, but I'd rather have the code expose what it's really doing. Also, reading more, this looks weird: {code} + boolean pathNull = getNextPath(); ... - if (!getNextPath()) { + if (!pathNull) { {code} {{getNextPath}} returns true if the path was not null so shouldn't the variable be named pathNotNull or hasCurrentPath and then remove the exclamation point? [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2] --- Key: HBASE-6733 URL: https://issues.apache.org/jira/browse/HBASE-6733 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6733-1.patch, 6733-2.patch
[jira] [Updated] (HBASE-6439) Ignore .archive directory as a table
[ https://issues.apache.org/jira/browse/HBASE-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-6439: --- Status: Patch Available (was: Open) Ignore .archive directory as a table Key: HBASE-6439 URL: https://issues.apache.org/jira/browse/HBASE-6439 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Labels: newbie Attachments: hbase-6439-r0.patch From a recent test run: {quote} 2012-07-22 02:27:30,699 WARN [IPC Server handler 0 on 47087] util.FSTableDescriptors(168): The following folder is in HBase's root directory and doesn't contain a table descriptor, do consider deleting it: .archive {quote} With the addition of HBASE-5547, not every table-level folder is a table folder any more. FSTableDescriptors therefore needs a 'gold-list' of directories that aren't tables, which we can update, so that this kind of message doesn't show up in the logs. Currently, we have the following block: {quote} invocations++; if (HTableDescriptor.ROOT_TABLEDESC.getNameAsString().equals(tablename)) { cachehits++; return HTableDescriptor.ROOT_TABLEDESC; } if (HTableDescriptor.META_TABLEDESC.getNameAsString().equals(tablename)) { cachehits++; return HTableDescriptor.META_TABLEDESC; } {quote} to handle special cases, but that's a bit clunky and doesn't cleanly handle table-level directories that need to be ignored.
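The 'gold-list' idea above amounts to a set lookup before treating a root-level directory as a table. A minimal sketch, assuming directory names arrive as plain strings: TableDirFilter and its methods are hypothetical, ".archive" comes from the log message above, and the other entries are placeholders assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a "gold-list" of non-table directories under the HBase root. */
final class TableDirFilter {
    private TableDirFilter() {}

    // ".archive" comes from the log message above; the other entries are placeholders.
    static final Set<String> NON_TABLE_DIRS = Set.of(".archive", ".logs", ".oldlogs");

    static boolean isNonTableDir(String dirName) {
        return NON_TABLE_DIRS.contains(dirName);
    }

    /** Keeps only the root's children that may hold a table. */
    static List<String> candidateTableDirs(List<String> rootChildren) {
        List<String> tables = new ArrayList<>();
        for (String name : rootChildren) {
            if (!isNonTableDir(name)) {
                tables.add(name);
            }
        }
        return tables;
    }
}
```

Registering a new non-table directory then becomes a one-entry change to the set rather than another special case like the ROOT/META checks quoted above.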
[jira] [Commented] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly
[ https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468929#comment-13468929 ] Phabricator commented on HBASE-6930: Liyin has accepted the revision [jira] [HBASE-6930] [89-fb] Fix TestThriftServerLegacy: notifyAll should be inside synchronized block, and a null checkAndMutate expected value should be handled correctly. Thanks Mikhail! REVISION DETAIL https://reviews.facebook.net/D5841 BRANCH fix_locked_rows_v2 To: Kannan, Liyin, Karthik, JIRA, mbautin [89-fb] Avoid acquiring the same row lock repeatedly Key: HBASE-6930 URL: https://issues.apache.org/jira/browse/HBASE-6930 Project: HBase Issue Type: Bug Reporter: Liyin Tang Attachments: D5841.1.patch, D5841.2.patch
[jira] [Updated] (HBASE-6941) LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs
[ https://issues.apache.org/jira/browse/HBASE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-6941: --- Attachment: HBASE-6941.patch LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs - Key: HBASE-6941 URL: https://issues.apache.org/jira/browse/HBASE-6941 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.90.6 Reporter: Harsh J Assignee: Harsh J Attachments: HBASE-6941.patch
[jira] [Commented] (HBASE-6941) LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs
[ https://issues.apache.org/jira/browse/HBASE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468932#comment-13468932 ] Harsh J commented on HBASE-6941: - Unified the configuration so the tool uses only getConf() from Configured, with the HBase configs added to it upon construction. This is the right way to use Tool + HBaseConfiguration. - Moved 3 of the config params used in the tool into HConstants as constants, and updated their references in the tool. LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs - Key: HBASE-6941 URL: https://issues.apache.org/jira/browse/HBASE-6941 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.90.6 Reporter: Harsh J Assignee: Harsh J Attachments: HBASE-6941.patch
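The fix described above follows the usual Hadoop pattern: a class that extends Configured and implements Tool, launched with ToolRunner.run(HBaseConfiguration.create(), tool, args), reads every setting from getConf() inside run(), so the -Dprop=value options that ToolRunner parses are visible to it. Since the real classes need Hadoop on the classpath, the block below is only a dependency-free model of the difference; SimpleConf, ToolConfigModel and their methods are hypothetical, hbase.zookeeper.quorum is used purely as an example key, and this is not the content of HBASE-6941.patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Hadoop Configuration: a mutable key/value map. */
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    SimpleConf(Map<String, String> defaults) {
        props.putAll(defaults);
    }

    String get(String key) { return props.get(key); }

    void set(String key, String value) { props.put(key, value); }
}

/** Dependency-free model of how -Dprop=value overrides reach (or miss) a tool. */
final class ToolConfigModel {
    private ToolConfigModel() {}

    /** Stand-in for values a tool would otherwise load from hbase-site.xml. */
    static final Map<String, String> SITE_DEFAULTS = Map.of("hbase.zookeeper.quorum", "localhost");

    /** Simplified -Dkey=value handling, applied to the conf handed to the tool. */
    static void applyGenericOptions(SimpleConf conf, String[] args) {
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                conf.set(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
    }

    /** Buggy pattern: builds a fresh conf, so the overrides on 'injected' are lost. */
    static String quorumFromFreshConf(SimpleConf injected) {
        SimpleConf fresh = new SimpleConf(SITE_DEFAULTS);
        return fresh.get("hbase.zookeeper.quorum");
    }

    /** Fixed pattern: reads only the injected conf, the analogue of getConf(). */
    static String quorumFromInjectedConf(SimpleConf injected) {
        return injected.get("hbase.zookeeper.quorum");
    }
}
```

The buggy variant is what makes an hbase-site.xml on the classpath the only way to point the tool at a different ZK quorum.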
[jira] [Updated] (HBASE-6941) LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs
[ https://issues.apache.org/jira/browse/HBASE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-6941: --- Status: Patch Available (was: Open) LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs - Key: HBASE-6941 URL: https://issues.apache.org/jira/browse/HBASE-6941 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.90.6 Reporter: Harsh J Assignee: Harsh J Attachments: HBASE-6941.patch
[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468933#comment-13468933 ] Devaraj Das commented on HBASE-6733: bq. but I'd rather have the code expose what it's really doing. Do you want me to put a comment or something? bq. the variable be named pathNotNull or hasCurrentPath and then remove the exclamation point? Agree. I'll rename pathNull to hasCurrentPath (but the check will remain the same - if (!hasCurrentPath) ..) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2] --- Key: HBASE-6733 URL: https://issues.apache.org/jira/browse/HBASE-6733 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6733-1.patch, 6733-2.patch
[jira] [Comment Edited] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice
[ https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468913#comment-13468913 ] Jimmy Xiang edited comment on HBASE-6228 at 10/4/12 9:34 AM: - Yes, I think the issue is not there any more. was (Author: jxiang): Yes, I think the issue is not there any more. Since we are using SSH to handle dead servers in failover mode, this piece of code in HMaster to fixup daughter is not needed any more. I will remove it in HBASE-6611. Fixup daughters twice cause daughter region assigned twice --- Key: HBASE-6228 URL: https://issues.apache.org/jira/browse/HBASE-6228 Project: HBase Issue Type: Bug Components: master Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0 Attachments: HBASE-6228.patch, HBASE-6228v2.patch, HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch
[jira] [Commented] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468936#comment-13468936 ] Jean-Daniel Cryans commented on HBASE-6733: --- bq. (but the check will remain the same - if (!hasCurrentPath) ..) Ah geez yeah keep that. Damn double negations. bq. Do you want me to put a comment or something? Check for null if that's what you expect I'd say. [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2] --- Key: HBASE-6733 URL: https://issues.apache.org/jira/browse/HBASE-6733 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6733-1.patch, 6733-2.patch
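The naming the reviewers settled on (a positively named hasCurrentPath, a single negation in the check, and an explicit null test where null is what is expected) can be sketched on a hypothetical queue model. LogQueueReader and step() are invented names; only getNextPath and currentPath come from the thread, and this is not the code in 6733-3.patch.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical model of the naming discussed above; not ReplicationSource itself. */
class LogQueueReader {
    private final Queue<String> queue = new ArrayDeque<>();
    private String currentPath; // null while there is nothing to read

    void enqueue(String path) {
        queue.add(path);
    }

    /** Picks up the next queued log if needed; returns true iff a current path exists. */
    boolean getNextPath() {
        if (currentPath == null) {
            currentPath = queue.poll(); // poll() returns null on an empty queue
        }
        return currentPath != null;
    }

    /** One loop iteration; the positive name reads naturally under a single negation. */
    String step() {
        boolean hasCurrentPath = getNextPath();
        if (!hasCurrentPath) {
            return "sleep";
        }
        return "read " + currentPath;
    }
}
```

With a name like pathNull, the same check would read as "not null-path", which is the double negation the thread objects to.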
[jira] [Updated] (HBASE-6941) LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs
[ https://issues.apache.org/jira/browse/HBASE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HBASE-6941: --- Status: Open (was: Patch Available) Missed test-reliance. Cancelling, until I complete that. LoadIncrementalHFiles uses the Tool interface incorrectly for loading configs - Key: HBASE-6941 URL: https://issues.apache.org/jira/browse/HBASE-6941 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.90.6 Reporter: Harsh J Assignee: Harsh J Attachments: HBASE-6941.patch
[jira] [Updated] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6733: --- Attachment: 6733-3.patch This should address your comments, [~jdcryans] [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2] --- Key: HBASE-6733 URL: https://issues.apache.org/jira/browse/HBASE-6733 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.92.3 Attachments: 6733-1.patch, 6733-2.patch, 6733-3.patch
[jira] [Commented] (HBASE-6738) Too aggressive task resubmission from the distributed log manager
[ https://issues.apache.org/jira/browse/HBASE-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468966#comment-13468966 ] Hudson commented on HBASE-6738: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #206 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/206/]) HBASE-6738 Too aggressive task resubmission from the distributed log manager (Revision 1393537) Result = FAILURE nkeywal : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java Too aggressive task resubmission from the distributed log manager - Key: HBASE-6738 URL: https://issues.apache.org/jira/browse/HBASE-6738 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.94.1, 0.96.0 Environment: 3-node cluster test, but it can occur on a much bigger one as well. It's all luck! Reporter: nkeywal Assignee: nkeywal Priority: Critical Fix For: 0.96.0 Attachments: 6738.v1.patch With the default settings, hbase.splitlog.manager.timeout = 25s and hbase.splitlog.max.resubmit = 3. In the tests mentioned on HBASE-5843, I see variations of this scenario, 0.94 + HDFS 1.0.3: The regionserver in charge of the split does not answer within 25s, so it gets interrupted but actually continues. Sometimes we exceed the number of retries, sometimes not; sometimes we run out of retries, but as the interrupts were ignored, we finish nicely. In the meantime, the same single task is executed in parallel by multiple nodes, increasing the probability of hitting race conditions. Details: t0: unplug a box with DN+RS. t + x: the other boxes are already connected to it, so their connections start to die. Nevertheless, they don't consider this node suspect. t + 180s: ZooKeeper - master detects the node as dead.
Recovery starts. It can take less than 180s; sometimes it's around 150s. t + 180s: distributed split starts. There is only 1 task; it's immediately acquired by one RS. t + 205s: the RS hits multiple errors when splitting, because a datanode is missing as well. The master decides to give the task to someone else, but often the task continues on the first RS. Interrupts are often ignored, as is clearly stated in the code (// TODO interrupt often gets swallowed, do what else?) {code} 2012-09-04 18:27:30,404 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: Sending interrupt to stop the worker thread {code} t + 211s: two regionservers are processing the same task. They fight for the leases: {code} 2012-09-04 18:27:32,004 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hbase/TABLE/4d1c1a4695b1df8c58d13382b834332e/recovered.edits/037.temp owned by DFSClient_hb_rs_BOX2,60020,1346775882980 but is accessed by DFSClient_hb_rs_BOX1,60020,1346775719125 {code} They can fight like this over many files, until the tasks are finally interrupted or finished. The task on the second box can be cancelled as well. In this case, the task is created again for a new box. The master seems to stop after 3 attempts. It can also give up on splitting the files. Sometimes the tasks were not cancelled on the RS side, so the split finishes despite what the master thinks and logs. In this case, the assignment starts. In the other case, we've got a problem. {code} 2012-09-04 18:43:52,724 INFO org.apache.hadoop.hbase.master.SplitLogManager: Skipping resubmissions of task /hbase/splitlog/hdfs%3A%2F%2FBOX1%3A9000%2Fhbase%2F.logs%2FBOX0%2C60020%2C1346776587640-splitting%2FBOX0%252C60020%252C1346776587640.1346776587832 because threshold 3 reached {code} t + 300s: split is finished. Assignment starts. t + 330s: assignment is finished, regions are available again.
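The timeline above follows from a simple manager-side rule with the defaults hbase.splitlog.manager.timeout = 25s and hbase.splitlog.max.resubmit = 3. A minimal model of that rule, assuming a single task and a caller that supplies the time since the worker last reported progress (ResubmitPolicy and onCheck are hypothetical names, not SplitLogManager's API):

```java
/**
 * Hypothetical model of the manager-side resubmission rule; not SplitLogManager's code.
 * Nothing here stops the previous worker: a resubmitted task can keep running on the
 * old regionserver if its interrupt is swallowed.
 */
class ResubmitPolicy {
    private final long timeoutMillis;
    private final int maxResubmit;
    private int resubmits;

    ResubmitPolicy(long timeoutMillis, int maxResubmit) {
        this.timeoutMillis = timeoutMillis;
        this.maxResubmit = maxResubmit;
    }

    /** Decides what to do given how long ago the current worker last reported progress. */
    String onCheck(long millisSinceLastProgress) {
        if (millisSinceLastProgress < timeoutMillis) {
            return "wait";
        }
        if (resubmits >= maxResubmit) {
            return "skip: threshold " + maxResubmit + " reached";
        }
        resubmits++;
        return "resubmit #" + resubmits;
    }

    int resubmits() {
        return resubmits;
    }
}
```

The model makes the core problem visible: a resubmission only gives the task a new owner. Unless the old worker actually stops, two regionservers end up splitting the same log, as in the lease-mismatch log above.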
Many subcases are possible, depending on the number of log files, of region servers, and so on. The issues are: 1) it's difficult, especially (but not only) in HBase, to interrupt a task. The pattern is often {code} void f() throws IOException{ try { // whatever throw InterruptedException }catch(InterruptedException){ throw new InterruptedIOException(); } } boolean g(){ int nbRetry= 0; for(;;) try{ f(); return true; }catch(IOException e){ nbRetry++; if ( nbRetry maxRetry) return
[jira] [Commented] (HBASE-6439) Ignore .archive directory as a table
[ https://issues.apache.org/jira/browse/HBASE-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468969#comment-13468969 ] Hadoop QA commented on HBASE-6439: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12547613/hbase-6439-r0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 81 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2995//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2995//console This message is automatically generated. Ignore .archive directory as a table Key: HBASE-6439 URL: https://issues.apache.org/jira/browse/HBASE-6439 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Labels: newbie Attachments: hbase-6439-r0.patch
[jira] [Updated] (HBASE-6916) HBA logs at info level errors that won't show in the shell
[ https://issues.apache.org/jira/browse/HBASE-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-6916: -- Attachment: HBASE-6916.patch Attaching the trunk patch. HBA logs at info level errors that won't show in the shell -- Key: HBASE-6916 URL: https://issues.apache.org/jira/browse/HBASE-6916 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.90.6, 0.92.1, 0.94.1, 0.96.0 Reporter: Jean-Daniel Cryans Priority: Minor Fix For: 0.92.2, 0.94.3, 0.96.0 Attachments: HBASE-6916-0.94.patch, HBASE-6916.patch There is a weird interaction between the shell and HBA. When you try to close a region that doesn't exist, no error is thrown: {noformat} hbase(main):029:0> close_region 'thisisaninvalidregion' 0 row(s) in 0.0580 seconds {noformat} Normally one should get an UnknownRegionException. Starting the shell with -d, I see what a non-shell user would see, along with a ton of logging from ZK (skipped here): {noformat} INFO client.HBaseAdmin: No server in .META. for thisisaninvalidregion; pair=null {noformat} But again, this is not the right message; it should have shown {noformat} INFO client.HBaseAdmin: No server in .META. for thisisaninvalidregion; pair=null {noformat} This is because that part of the code treats UnknownRegionException and NoServerForRegionException as if they were the same thing. There is also some ugliness in flush, compact, and split, but it normally doesn't show since the code treats everything as a table and throws a TableNotFoundException. This jira is about making sure that the correct exceptions come out.
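Keeping the two failure cases apart means checking "is the region known at all?" before "does it have a server?". A minimal sketch, assuming .META. is modelled as an in-memory map: CloseRegionModel and serverFor are hypothetical, and the two nested exception classes are local stand-ins named after the HBase exceptions, not the org.apache.hadoop.hbase classes.

```java
import java.util.Map;

/** Hypothetical sketch; the nested exceptions are local stand-ins, not HBase classes. */
class CloseRegionModel {
    static class UnknownRegionException extends Exception {
        UnknownRegionException(String region) {
            super(region);
        }
    }

    static class NoServerForRegionException extends Exception {
        NoServerForRegionException(String region) {
            super("No server in .META. for " + region);
        }
    }

    /** region name -> server name; a present key with a null value means "unassigned". */
    private final Map<String, String> meta;

    CloseRegionModel(Map<String, String> meta) {
        this.meta = meta;
    }

    /** Surfaces the two failure cases separately instead of logging and returning. */
    String serverFor(String region) throws UnknownRegionException, NoServerForRegionException {
        if (!meta.containsKey(region)) {
            throw new UnknownRegionException(region);
        }
        String server = meta.get(region);
        if (server == null) {
            throw new NoServerForRegionException(region);
        }
        return server;
    }
}
```

A caller such as the shell can then report "unknown region" for a typo like 'thisisaninvalidregion' and "no server" only for a region that exists but is unassigned.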
[jira] [Updated] (HBASE-6916) HBA logs at info level errors that won't show in the shell
[ https://issues.apache.org/jira/browse/HBASE-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-6916: -- Fix Version/s: (was: 0.90.7) Assignee: Jean-Daniel Cryans Status: Patch Available (was: Open) HBA logs at info level errors that won't show in the shell -- Key: HBASE-6916 URL: https://issues.apache.org/jira/browse/HBASE-6916 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.94.1, 0.92.1, 0.90.6, 0.96.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Minor Fix For: 0.94.3, 0.96.0, 0.92.2 Attachments: HBASE-6916-0.94.patch, HBASE-6916.patch There is a weird interaction between the shell and HBA. When you try to close a region that doesn't exist, it doesn't throw any error: {noformat} hbase(main):029:0> close_region 'thisisaninvalidregion' 0 row(s) in 0.0580 seconds {noformat} Normally one should get UnknownRegionException. Starting the shell with -d, I see what a non-shell user would see along with a ton of logging from ZK (skipped here): {noformat} INFO client.HBaseAdmin: No server in .META. for thisisaninvalidregion; pair=null {noformat} But again this is not the right message, it should have shown {noformat} INFO client.HBaseAdmin: No server in .META. for thisisaninvalidregion; pair=null {noformat} This is because that part of the code treats UnknownRegionException and NoServerForRegionException as if they were the same thing. There is also some ugliness in flush, compact, and split, but it normally doesn't show since the code treats everything like it's a table and sends a TableNotFoundException. This jira is about making sure that the correct exceptions come out.
[jira] [Resolved] (HBASE-6883) CleanerChore treats .archive as a table and throws TableInfoMissingException
[ https://issues.apache.org/jira/browse/HBASE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-6883. Resolution: Duplicate Duplicate of HBASE-6439 CleanerChore treats .archive as a table and throws TableInfoMissingException Key: HBASE-6883 URL: https://issues.apache.org/jira/browse/HBASE-6883 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang {noformat} 2012-09-25 14:52:21,902 DEBUG org.apache.hadoop.hbase.util.FSTableDescriptors: Exception during readTableDecriptor. Current table name = .archive org.apache.hadoop.hbase.TableInfoMissingException: No .tableinfo file under hdfs://c0322.hal.cloudera.com:56020/hbase/.archive at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptor(FSTableDescriptors.java:417) at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptor(FSTableDescriptors.java:408) at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:170) at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:201) at org.apache.hadoop.hbase.master.HMaster.getTableDescriptors(HMaster.java:2205) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.ProtobufRpcEngine$Server.call(ProtobufRpcEngine.java:357) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1816) {noformat}
[jira] [Commented] (HBASE-6439) Ignore .archive directory as a table
[ https://issues.apache.org/jira/browse/HBASE-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468982#comment-13468982 ] stack commented on HBASE-6439: -- [~jesse_yates] Did this test org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient fail? Ignore .archive directory as a table Key: HBASE-6439 URL: https://issues.apache.org/jira/browse/HBASE-6439 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Labels: newbie Attachments: hbase-6439-r0.patch From a recent test run: {quote} 2012-07-22 02:27:30,699 WARN [IPC Server handler 0 on 47087] util.FSTableDescriptors(168): The following folder is in HBase's root directory and doesn't contain a table descriptor, do consider deleting it: .archive {quote} With the addition of HBASE-5547, table-level folders are no longer all table folders. FSTableDescriptors then needs to have a 'gold-list' that we can update with directories that aren't tables, so we don't have this kind of thing showing up in the logs. Currently, we have the following block: {quote} invocations++; if (HTableDescriptor.ROOT_TABLEDESC.getNameAsString().equals(tablename)) { cachehits++; return HTableDescriptor.ROOT_TABLEDESC; } if (HTableDescriptor.META_TABLEDESC.getNameAsString().equals(tablename)) { cachehits++; return HTableDescriptor.META_TABLEDESC; } {quote} to handle special cases, but that's a bit clunky and not clean in terms of table-level directories that need to be ignored.
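The 'gold-list' idea can be sketched as a set of directory names that are skipped before any attempt to read a .tableinfo file, so `.archive` never reaches the descriptor-reading path. This assumes the HBase root is a local directory read through java.nio.file rather than an HDFS `FileSystem`; `TableDirScanner` and `NON_TABLE_DIRS` are hypothetical names, and only `.archive` comes from this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

/** Sketch of a 'gold-list' of root-level directories known not to be tables. */
final class TableDirScanner {
    // Only ".archive" comes from this issue; other non-table dirs would be added here.
    static final Set<String> NON_TABLE_DIRS = Set.of(".archive");

    /** Lists candidate table directory names under rootDir, skipping gold-listed ones. */
    static List<String> tableDirNames(Path rootDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> children = Files.list(rootDir)) {
            children.filter(Files::isDirectory)               // plain files are never tables
                    .map(p -> p.getFileName().toString())
                    .filter(name -> !NON_TABLE_DIRS.contains(name))
                    .forEach(names::add);
        }
        Collections.sort(names);                              // deterministic order
        return names;
    }
}
```

Updating the set is then the only change needed when a feature like HBASE-5547 adds another root-level directory.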
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468986#comment-13468986 ] Devaraj Das commented on HBASE-6758: In the trunk case, I think something better can be done (and the interface changes can be avoided). Replication.postLogRoll could enqueue the new path in the ReplicationSource's queue. Replication.preLogRoll would do everything else (creating ZK entries, etc.) except enqueuing the path. postLogRoll is currently called before the writer is reset (to _nextWriter_) in FSHLog.rollWriter. I propose that it be called after the writer is reset. In my opinion, that is a more precise place for calling postLogRoll. Thoughts? [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file --- Key: HBASE-6758 URL: https://issues.apache.org/jira/browse/HBASE-6758 Project: HBase Issue Type: Bug Reporter: Devaraj Das Assignee: Devaraj Das Priority: Critical Fix For: 0.96.0 Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 6758-trunk-1.patch, TEST-org.apache.hadoop.hbase.replication.TestReplication.xml I have seen cases where the replication-executor would lose data to replicate because the file hadn't been closed yet. Upon closing, the new data becomes visible. Before that happens, the ZK node shouldn't be deleted in ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made in ReplicationSource.processEndOfFile as well (currentPath related).
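A rough model of the ordering proposed in the comment: preLogRoll does the bookkeeping (standing in for creating ZK entries), the writer is reset to the next writer, and only then does postLogRoll enqueue the new path for the replication source. Every class and method here is hypothetical, not FSHLog, Replication, or ReplicationSource; where the old writer is closed is also an assumption of this sketch, chosen so that once the new path is visible in the queue, the previous file is already closed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical listener mirroring the preLogRoll/postLogRoll hooks. */
interface LogRollListener {
    void preLogRoll(String oldPath, String newPath);
    void postLogRoll(String oldPath, String newPath);
}

/** Hypothetical replication side: bookkeeping in pre, enqueue only in post. */
class ReplicationSketch implements LogRollListener {
    final List<String> zkEntries = new ArrayList<>();
    final Deque<String> queue = new ArrayDeque<>();

    @Override public void preLogRoll(String oldPath, String newPath) {
        zkEntries.add(newPath);              // stands in for creating the ZK entry
    }

    @Override public void postLogRoll(String oldPath, String newPath) {
        queue.addLast(newPath);              // new log becomes visible to the source here
    }
}

/** Hypothetical WAL that records the order of the roll steps. */
class WalSketch {
    private final LogRollListener listener;
    final List<String> events = new ArrayList<>();
    final Set<String> openFiles = new HashSet<>();
    private String currentPath = "wal.0";
    private int seq = 0;

    WalSketch(LogRollListener listener) {
        this.listener = listener;
        openFiles.add(currentPath);
    }

    void rollWriter() {
        String oldPath = currentPath;
        String nextPath = "wal." + (++seq);          // stands in for creating _nextWriter_
        openFiles.add(nextPath);
        listener.preLogRoll(oldPath, nextPath);
        events.add("preLogRoll");
        currentPath = nextPath;                      // writer reset to _nextWriter_
        events.add("reset");
        openFiles.remove(oldPath);                   // old writer closed (assumed placement)
        events.add("closed " + oldPath);
        listener.postLogRoll(oldPath, nextPath);     // proposed: only after the reset
        events.add("postLogRoll");
    }
}
```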
[jira] [Commented] (HBASE-6916) HBA logs at info level errors that won't show in the shell
[ https://issues.apache.org/jira/browse/HBASE-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468988#comment-13468988 ] Jimmy Xiang commented on HBASE-6916: +1 HBA logs at info level errors that won't show in the shell -- Key: HBASE-6916 URL: https://issues.apache.org/jira/browse/HBASE-6916 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.90.6, 0.92.1, 0.94.1, 0.96.0 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Minor Fix For: 0.92.2, 0.94.3, 0.96.0 Attachments: HBASE-6916-0.94.patch, HBASE-6916.patch There is a weird interaction between the shell and HBA. When you try to close a region that doesn't exist, it doesn't throw any error: {noformat} hbase(main):029:0> close_region 'thisisaninvalidregion' 0 row(s) in 0.0580 seconds {noformat} Normally one should get UnknownRegionException. Starting the shell with -d, I see what a non-shell user would see along with a ton of logging from ZK (skipped here): {noformat} INFO client.HBaseAdmin: No server in .META. for thisisaninvalidregion; pair=null {noformat} But again this is not the right message, it should have shown {noformat} INFO client.HBaseAdmin: No server in .META. for thisisaninvalidregion; pair=null {noformat} This is because that part of the code treats UnknownRegionException and NoServerForRegionException as if they were the same thing. There is also some ugliness in flush, compact, and split, but it normally doesn't show since the code treats everything like it's a table and sends a TableNotFoundException. This jira is about making sure that the correct exceptions come out.
[jira] [Updated] (HBASE-6872) Number of records written/read per second on regionserver level
[ https://issues.apache.org/jira/browse/HBASE-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-6872: --- Attachment: D5853.1.patch mbautin requested code review of [jira] [HBASE-6872] [89-fb] Fix TestRegionServerMetrics.testNumReadsAndWrites. Reviewers: Kannan, Karthik, JIRA rHBASEEIGHTNINEFBBRANCH1389841 introduced an unstable test in TestRegionServerMetrics: testNumReadsAndWrites. Read and write counters should be reset to zero before starting the test. TEST PLAN Run TestRegionServerMetrics REVISION DETAIL https://reviews.facebook.net/D5853 AFFECTED FILES src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java To: Kannan, Karthik, JIRA, mbautin Number of records written/read per second on regionserver level --- Key: HBASE-6872 URL: https://issues.apache.org/jira/browse/HBASE-6872 Project: HBase Issue Type: New Feature Components: regionserver Reporter: Adela Maznikar Priority: Minor Attachments: D5853.1.patch Regionserver-level metrics that show the number of records written/read per second.
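The fix described in the review, resetting the read and write counters to zero before the test starts, can be sketched with a hypothetical counter holder. `ReadWriteCounters` and its methods are invented for illustration and are not the real RegionServerMetrics or TestRegionServerMetrics code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical region-server-wide read/write counters (not the real metrics API). */
final class ReadWriteCounters {
    final AtomicLong reads = new AtomicLong();
    final AtomicLong writes = new AtomicLong();

    void recordRead()  { reads.incrementAndGet(); }
    void recordWrite() { writes.incrementAndGet(); }

    /** Zeroes both counters so a test observes only its own operations. */
    void reset() {
        reads.set(0);
        writes.set(0);
    }
}
```

A test would call reset() first, issue a known number of reads and writes, then assert exact counts; without the reset, activity from earlier in the run could presumably leak into the totals and make the assertion flaky.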
[jira] [Commented] (HBASE-6439) Ignore .archive directory as a table
[ https://issues.apache.org/jira/browse/HBASE-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468985#comment-13468985 ] stack commented on HBASE-6439: -- Otherwise, +1 on patch. Jesse, does this get rid of this issue I see when I start up hbase? {code} 2012-10-03 12:34:28,515 DEBUG org.apache.hadoop.hbase.util.FSTableDescriptors: Exception during readTableDecriptor. Current table name = .archive org.apache.hadoop.hbase.TableInfoMissingException: No .tableinfo file under file:/Users/Stack/Downloads/hbase-stack/hbase/.archive {code} What do you reckon of the test failure? Ignore .archive directory as a table Key: HBASE-6439 URL: https://issues.apache.org/jira/browse/HBASE-6439 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.96.0 Reporter: Jesse Yates Assignee: Jesse Yates Labels: newbie Attachments: hbase-6439-r0.patch From a recent test run: {quote} 2012-07-22 02:27:30,699 WARN [IPC Server handler 0 on 47087] util.FSTableDescriptors(168): The following folder is in HBase's root directory and doesn't contain a table descriptor, do consider deleting it: .archive {quote} With the addition of HBASE-5547, table-level folders are no longer all table folders. FSTableDescriptors then needs to have a 'gold-list' that we can update with directories that aren't tables, so we don't have this kind of thing showing up in the logs. Currently, we have the following block: {quote} invocations++; if (HTableDescriptor.ROOT_TABLEDESC.getNameAsString().equals(tablename)) { cachehits++; return HTableDescriptor.ROOT_TABLEDESC; } if (HTableDescriptor.META_TABLEDESC.getNameAsString().equals(tablename)) { cachehits++; return HTableDescriptor.META_TABLEDESC; } {quote} to handle special cases, but that's a bit clunky and not clean in terms of table-level directories that need to be ignored.
[jira] [Commented] (HBASE-6872) Number of records written/read per second on regionserver level
[ https://issues.apache.org/jira/browse/HBASE-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468995#comment-13468995 ] Phabricator commented on HBASE-6872: Kannan has added CCs to the revision [jira] [HBASE-6872] [89-fb] Fix TestRegionServerMetrics.testNumReadsAndWrites. Added CCs: adela, Liyin, aaiyer, avf REVISION DETAIL https://reviews.facebook.net/D5853 To: Kannan, Karthik, JIRA, mbautin Cc: adela, Liyin, aaiyer, avf Number of records written/read per second on regionserver level --- Key: HBASE-6872 URL: https://issues.apache.org/jira/browse/HBASE-6872 Project: HBase Issue Type: New Feature Components: regionserver Reporter: Adela Maznikar Priority: Minor Attachments: D5853.1.patch Regionserver-level metrics that show the number of records written/read per second.
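For the feature itself (records written/read per second), one common approach is to sample a monotonically increasing counter periodically and divide the delta by the elapsed time. This is a sketch with hypothetical names, not how HBase's metrics framework computes its rates; timestamps are passed in explicitly so the arithmetic is easy to check.

```java
/**
 * Sketch: converts a monotonically increasing counter into a per-second rate
 * by periodic sampling. Class and method names are hypothetical.
 */
final class PerSecondRate {
    private long lastCount;
    private long lastMillis;
    private boolean primed;

    /** Returns records/second since the previous sample, or 0.0 on the first sample. */
    double sample(long count, long nowMillis) {
        double rate = 0.0;
        if (primed && nowMillis > lastMillis) {
            rate = (count - lastCount) * 1000.0 / (nowMillis - lastMillis);
        }
        lastCount = count;
        lastMillis = nowMillis;
        primed = true;
        return rate;
    }
}
```

For example, a write counter that goes from 100 to 600 over a 10-second sampling interval yields 50.0 records per second.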