[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621838#comment-13621838 ] Hudson commented on HBASE-8229: --- Integrated in HBase-TRUNK #4010 (See [https://builds.apache.org/job/HBase-TRUNK/4010/]) HBASE-8229 Replication code logs like crazy if a target table cannot be found. (Revision 1464278) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621859#comment-13621859 ] Hudson commented on HBASE-8229: --- Integrated in hbase-0.95 #122 (See [https://builds.apache.org/job/hbase-0.95/122/]) HBASE-8229 Replication code logs like crazy if a target table cannot be found. (Revision 1464273) Result = FAILURE larsh : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621919#comment-13621919 ] Hudson commented on HBASE-8229: --- Integrated in HBase-0.94 #939 (See [https://builds.apache.org/job/HBase-0.94/939/]) HBASE-8229 Replication code logs like crazy if a target table cannot be found. (Revision 1464275) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622180#comment-13622180 ] Hudson commented on HBASE-8229: --- Integrated in hbase-0.95-on-hadoop2 #54 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/54/]) HBASE-8229 Replication code logs like crazy if a target table cannot be found. (Revision 1464273) Result = FAILURE larsh : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622128#comment-13622128 ] Hudson commented on HBASE-8229: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #476 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/476/]) HBASE-8229 Replication code logs like crazy if a target table cannot be found. (Revision 1464278) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622799#comment-13622799 ] Hudson commented on HBASE-8229: --- Integrated in HBase-0.94-security #130 (See [https://builds.apache.org/job/HBase-0.94-security/130/]) HBASE-8229 Replication code logs like crazy if a target table cannot be found. (Revision 1464275) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13623187#comment-13623187 ] Hudson commented on HBASE-8229: --- Integrated in HBase-0.94-security-on-Hadoop-23 #13 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/13/]) HBASE-8229 Replication code logs like crazy if a target table cannot be found. (Revision 1464275) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620666#comment-13620666 ] stack commented on HBASE-8229: -- +1 on first version of patch. Its an improvment (What you think [~jeason]?) [~himan...@cloudera.com] Nice idea on exposing replication state in UI. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620667#comment-13620667 ] Jieshan Bean commented on HBASE-8229: - +1 on first version of patch. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.1, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619722#comment-13619722 ] Jieshan Bean commented on HBASE-8229: - bq.The idea is to return back into the run() loop of ReplicationSource, so that the edits are rechecked (and not shipped to the peer if the local table's status has changed). I didn't see anywhere do this re-check, hope I misread the code:). Even if local tabls' replication status has been changed, ReplicationSource still has the responsibility to replicate all the edits before the time of table got changed, right? So I prefer to not return back directly. Just let it retry and sleep until that table be created. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620484#comment-13620484 ] Lars Hofhansl commented on HBASE-8229: -- Wouldn't it recreate the set of edits to ship in readAllEntriesToReplicateOrNextFile(...) called from run(). To just last point. Hmm... I see your point. If the table did not exist in the peer cluster *and* the local replication_scope was changed to 0, do we still want to ship the data? I am just concerned that replication will not be able to make any progress for any table, until the table is created on the peer cluster and there is no other way out of this situation. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620499#comment-13620499 ] Jieshan Bean commented on HBASE-8229: - bq.Wouldn't it recreate the set of edits to ship in readAllEntriesToReplicateOrNextFile(...) called from run(). Yes, it will read and recreate the set again. But it's the same set as the previous one. The current logic in removeNonReplicableEdits only check the scope property which owned by the edit itself, not the table scope. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620560#comment-13620560 ] Lars Hofhansl commented on HBASE-8229: -- You are right. OK. In that case I would like to propose just the first version of this patch. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618946#comment-13618946 ] Chris Trezzo commented on HBASE-8229: - bq. For this issue, I'll just add the same waiting we do when the peer is down (which is the same logical behavior we currently have, but without the insane busy retrying). +1 Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618948#comment-13618948 ] Himanshu Vashishtha commented on HBASE-8229: Yea, I wonder if we show a replication tab on master UI or somewhere which shows some kind of replication state. Basic errors like table not existing on slave, missing family, or may be exception thrown by the slave at peer level are shown. That should give some basic idea to the user. What others think about this? Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618956#comment-13618956 ] Chris Trezzo commented on HBASE-8229: - I like the idea of a basic admin tab a lot. Also, maybe the list of peer clusters being replicated to, the ability to dump a list of hlogs in a queue, etc. etc. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618964#comment-13618964 ] Himanshu Vashishtha commented on HBASE-8229: Hey Chris: Thanks for response :) I'll create a jira for that and add it to the description. For a dump of list of hlogs, we do have it in zkdump link on the master UI (HBASE-7540). Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618970#comment-13618970 ] Chris Trezzo commented on HBASE-8229: - Good point, I guess that would be sort of redundant. Although, I was thinking of something that isn't tied to ZK (i.e. a way to view the outstanding hlogs that need to be replicated without having to understand the structure of the ZK nodes or how replication queues are stored). Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619486#comment-13619486 ] Lars Hofhansl commented on HBASE-8229: -- +1 on some kind of replication status info on the RS UI page. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619487#comment-13619487 ] Jieshan Bean commented on HBASE-8229: - Yes, it's really a good idea. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618313#comment-13618313 ] Jean-Marc Spaggiari commented on HBASE-8229: [~jeason], I should have put a question mark at the end ;) Was more a question than an affirmation. I was thinking like if the node which is trying to create the table on the other side fails (power suply, etc.), is another one going to take is place? anyway, adding the delay will fix the crazy loggin issue. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618559#comment-13618559 ] Jieshan Bean commented on HBASE-8229: - bq.For this issue, I'll just add the same waiting we do when the peer is down (which is the same logical behavior we currently have, but without the insane busy retrying). +1 Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Components: Replication Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618003#comment-13618003 ] Jieshan Bean commented on HBASE-8229: - I suggest to let ReplicationSource wait if one replicating table is not present, likes the scenario of peer cluster is unavailable. Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618070#comment-13618070 ] Jean-Marc Spaggiari commented on HBASE-8229: Or should it not skip it? To make sure the rest is replicated correctly? And add a timeout for this specific table so it's not retried before a certain time? Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618164#comment-13618164 ] Lars Hofhansl commented on HBASE-8229: -- If we skip, we can never go back a reapply these edits; I don't think that would be a good idea. Like Jieshan I was thinking we should treat like the peer cluster is not available (at least in terms waiting - we do not have the rechoose the sinks in that case). Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618187#comment-13618187 ] Jean-Marc Spaggiari commented on HBASE-8229: I see. But can we simply wait again and again until the table is created on the other side? At some point, if there is any failure, we will still miss the edits. Also, what if finally the table is deleted in the source cluster? Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618207#comment-13618207 ] Jieshan Bean commented on HBASE-8229: - bq.But can we simply wait again and again until the table is created on the other side? I'm afraid we should do that. Unless we add a mechanism to check whether a table has already been deleted. But I think ReplicationSource still has the responsibility to finish all the rest edits. Any skip may cause data-loss. I think the most probable scenario of this problem is we forgot to create table for sink side. bq. At some point, if there is any failure, we will still miss the edits. [~jmspaggi] Can you show me one scenario? :) Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.
[ https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618218#comment-13618218 ] Lars Hofhansl commented on HBASE-8229: -- yeah, I think that is out of the question. The replication marks the last edit it sent to the peer(s); if we skip edits we would have to keep track of ranges of edit that are still outstanding, or lose replication data. Presumably if you marked a table with REPLCATION_SCOPE = 1 you want that table's data to replicated. An admin can fix this by dropping the table or to setting REPLICATION_SCOPE back to 0 (at least that is how should work - need to look at the code again). For this issue, I'll just add the same waiting we do when the peer is down (which is the same logical behavior we currently have, but without the insane busy retrying). Replication code logs like crazy if a target table cannot be found. --- Key: HBASE-8229 URL: https://issues.apache.org/jira/browse/HBASE-8229 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0, 0.94.7 One of our RS/DN machines ran out of diskspace on the partition to which we write the log files. It turns out we still had a table in our source cluster with REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster. In then logged a long stack trace every 50ms or so, over a few days that filled up our log partition. Since ReplicationSource cannot make any progress in this case anyway, it should probably sleep a bit before retrying (or at least limit the rate at which it spews out these exceptions to the log). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira