[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621838#comment-13621838
 ] 

Hudson commented on HBASE-8229:
---

Integrated in HBase-TRUNK #4010 (See 
[https://builds.apache.org/job/HBase-TRUNK/4010/])
HBASE-8229 Replication code logs like crazy if a target table cannot be 
found. (Revision 1464278)

 Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt


 One of our RS/DN machines ran out of disk space on the partition to which we 
 write the log files.
 It turns out we still had a table in our source cluster with 
 REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster.
 It then logged a long stack trace every 50 ms or so, which over a few days 
 filled up our log partition.
 Since ReplicationSource cannot make any progress in this case anyway, it 
 should probably sleep a bit before retrying (or at least limit the rate at 
 which it spews out these exceptions to the log).
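The second option in the description is to limit the rate at which the exception is logged. A minimal standalone sketch of that idea follows. It is not the committed patch: `RateLimitedLogger` and its parameters are hypothetical, and the clock is injected so the behavior can be checked without real sleeping. On its own it only caps log volume. The source would still busy-retry.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of HBase): lets at most one message through per
 * interval and reports how many were suppressed since the last emitted one.
 */
final class RateLimitedLogger {
  private final long intervalMs;
  private final LongSupplier clockMs; // injected clock, e.g. System::currentTimeMillis
  private long lastEmittedAtMs;
  private boolean emittedBefore = false;
  private long suppressed = 0;

  RateLimitedLogger(long intervalMs, LongSupplier clockMs) {
    this.intervalMs = intervalMs;
    this.clockMs = clockMs;
  }

  /** Returns the line the caller should log, or null if this occurrence is suppressed. */
  String maybeLog(String message) {
    long now = clockMs.getAsLong();
    if (emittedBefore && now - lastEmittedAtMs < intervalMs) {
      suppressed++;
      return null;
    }
    String line = (suppressed == 0)
        ? message
        : message + " (suppressed " + suppressed + " similar messages)";
    emittedBefore = true;
    lastEmittedAtMs = now;
    suppressed = 0;
    return line;
  }
}
```

With a one-minute interval, a retry loop that fails every 50 ms would emit about one line per minute instead of roughly 1,200.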

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621859#comment-13621859
 ] 

Hudson commented on HBASE-8229:
---

Integrated in hbase-0.95 #122 (See 
[https://builds.apache.org/job/hbase-0.95/122/])
HBASE-8229 Replication code logs like crazy if a target table cannot be 
found. (Revision 1464273)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621919#comment-13621919
 ] 

Hudson commented on HBASE-8229:
---

Integrated in HBase-0.94 #939 (See 
[https://builds.apache.org/job/HBase-0.94/939/])
HBASE-8229 Replication code logs like crazy if a target table cannot be 
found. (Revision 1464275)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622180#comment-13622180
 ] 

Hudson commented on HBASE-8229:
---

Integrated in hbase-0.95-on-hadoop2 #54 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/54/])
HBASE-8229 Replication code logs like crazy if a target table cannot be 
found. (Revision 1464273)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622128#comment-13622128
 ] 

Hudson commented on HBASE-8229:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #476 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/476/])
HBASE-8229 Replication code logs like crazy if a target table cannot be 
found. (Revision 1464278)

 Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622799#comment-13622799
 ] 

Hudson commented on HBASE-8229:
---

Integrated in HBase-0.94-security #130 (See 
[https://builds.apache.org/job/HBase-0.94-security/130/])
HBASE-8229 Replication code logs like crazy if a target table cannot be 
found. (Revision 1464275)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13623187#comment-13623187
 ] 

Hudson commented on HBASE-8229:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #13 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/13/])
HBASE-8229 Replication code logs like crazy if a target table cannot be 
found. (Revision 1464275)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620666#comment-13620666
 ] 

stack commented on HBASE-8229:
--

+1 on first version of patch. It's an improvement (what do you think, [~jeason]?)

[~himan...@cloudera.com] Nice idea on exposing replication state in the UI.

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-03 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620667#comment-13620667
 ] 

Jieshan Bean commented on HBASE-8229:
-

+1 on first version of patch.

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-02 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619722#comment-13619722
 ] 

Jieshan Bean commented on HBASE-8229:
-

bq. The idea is to return back into the run() loop of ReplicationSource, so that 
the edits are rechecked (and not shipped to the peer if the local table's 
status has changed).
I didn't see this re-check done anywhere; I hope I misread the code :). Even if 
the local table's replication status has changed, ReplicationSource is still 
responsible for replicating all the edits made before the table was changed, 
right? So I'd prefer not to return directly. Just let it retry and sleep until 
that table is created.

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620484#comment-13620484
 ] 

Lars Hofhansl commented on HBASE-8229:
--

Wouldn't it recreate the set of edits to ship in 
readAllEntriesToReplicateOrNextFile(...), called from run()?
To your last point: Hmm... I see your point. If the table did not exist in the 
peer cluster *and* the local replication_scope was changed to 0, do we still 
want to ship the data?

I am just concerned that replication will not be able to make any progress for 
any table until the table is created on the peer cluster, and there is no other 
way out of this situation.

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-02 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620499#comment-13620499
 ] 

Jieshan Bean commented on HBASE-8229:
-

bq. Wouldn't it recreate the set of edits to ship in 
readAllEntriesToReplicateOrNextFile(...) called from run().
Yes, it will read and recreate the set again, but it's the same set as the 
previous one. The current logic in removeNonReplicableEdits only checks the 
scope property owned by the edit itself, not the table's scope.
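The point about re-reading the same set can be shown with a simplified model. The model assumes two things: each WAL edit carries the scope map recorded when it was written, and the filter consults only that map. The classes below are hypothetical and are not HBase's `removeNonReplicableEdits`. Under these assumptions, re-reading the WAL in run() produces the same replicable set even after the table's scope is later changed to 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified WAL entry: one column family plus the scopes recorded at write time. */
final class WalEdit {
  final String family;
  final Map<String, Integer> scopes; // family -> replication scope, snapshotted when the edit was written

  WalEdit(String family, Map<String, Integer> scopesAtWriteTime) {
    this.family = family;
    this.scopes = Collections.unmodifiableMap(new HashMap<>(scopesAtWriteTime));
  }
}

/** Hypothetical stand-in for the edit-filtering step discussed above. */
final class ScopeFilter {
  static final int SCOPE_GLOBAL = 1; // corresponds to REPLICATION_SCOPE=1

  /** Keeps only edits whose own recorded scope for their family is GLOBAL. */
  static List<WalEdit> replicable(List<WalEdit> edits) {
    List<WalEdit> out = new ArrayList<>();
    for (WalEdit e : edits) {
      Integer scope = e.scopes.get(e.family);
      if (scope != null && scope == SCOPE_GLOBAL) {
        out.add(e);
      }
    }
    return out;
  }
}
```

In this model, only edits written after the scope change are dropped. To make a table-level change affect already-written edits, the filter would have to look up the table's current scope as well.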


 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620560#comment-13620560
 ] 

Lars Hofhansl commented on HBASE-8229:
--

You are right. OK. In that case I would like to propose just the first version 
of this patch.

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-01 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618946#comment-13618946
 ] 

Chris Trezzo commented on HBASE-8229:
-

bq. For this issue, I'll just add the same waiting we do when the peer is down 
(which is the same logical behavior we currently have, but without the insane 
busy retrying).

+1
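"The same waiting we do when the peer is down" means sleeping between failed attempts instead of spinning. A minimal sketch follows. It assumes a base sleep and a linear multiplier with a cap. `ShipWithBackoff`, `Shipper`, `Sleeper`, and any parameter values are hypothetical. The sleeper is injected so the retry schedule can be checked without real delays.

```java
/**
 * Hypothetical sketch: retry shipping a batch until it succeeds, sleeping
 * baseSleepMs * multiplier between failures, with the multiplier capped.
 */
final class ShipWithBackoff {
  /** Performs one shipping attempt; throws if, e.g., the peer table is missing. */
  interface Shipper { void ship() throws Exception; }

  /** Injected so tests can record sleeps instead of blocking. */
  interface Sleeper { void sleep(long ms) throws InterruptedException; }

  private final long baseSleepMs;
  private final int maxMultiplier;
  private final Sleeper sleeper;

  ShipWithBackoff(long baseSleepMs, int maxMultiplier, Sleeper sleeper) {
    this.baseSleepMs = baseSleepMs;
    this.maxMultiplier = maxMultiplier;
    this.sleeper = sleeper;
  }

  /** Returns the number of failed attempts before success, or -1 if interrupted while sleeping. */
  int shipUntilSuccess(Shipper shipper) {
    int multiplier = 1;
    int failures = 0;
    while (true) {
      try {
        shipper.ship();
        return failures;
      } catch (Exception e) {
        failures++; // a real source would log here (ideally rate-limited)
      }
      try {
        sleeper.sleep(baseSleepMs * multiplier);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return -1;
      }
      if (multiplier < maxMultiplier) {
        multiplier++;
      }
    }
  }
}
```

In production the sleeper could simply be `Thread::sleep`. With a 1 s base and a cap of 10, a permanently missing table costs one attempt every 10 s rather than one every 50 ms.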

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-01 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618948#comment-13618948
 ] 

Himanshu Vashishtha commented on HBASE-8229:


Yeah, I wonder if we could show a replication tab on the master UI (or somewhere) 
that shows some kind of replication state, where basic errors like the table not 
existing on the slave, a missing family, or maybe exceptions thrown by the slave 
at the peer level are shown. That should give the user a basic idea. What do 
others think about this?
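A status tab like this needs a per-peer record of the last error for the page to render. A minimal sketch of such a record follows. `ReplicationErrorSummary`, its methods, and the error labels are hypothetical and are not an existing HBase API. Repeats of the same error are collapsed into one counter, so a fast retry loop does not flood the page either.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-peer error summary that a master or RS status page could render. */
final class ReplicationErrorSummary {
  private static final class PeerError {
    final String kind;  // e.g. "TableNotFound", "NoSuchColumnFamily"
    String detail;      // e.g. the table or family name
    long count = 1;
    long lastSeenMs;

    PeerError(String kind, String detail, long lastSeenMs) {
      this.kind = kind;
      this.detail = detail;
      this.lastSeenMs = lastSeenMs;
    }
  }

  private final Map<String, PeerError> lastErrorByPeer = new LinkedHashMap<>();

  /** Collapses repeats of the same error kind into one entry with a counter. */
  synchronized void recordError(String peerId, String kind, String detail, long nowMs) {
    PeerError prev = lastErrorByPeer.get(peerId);
    if (prev != null && prev.kind.equals(kind)) {
      prev.count++;
      prev.detail = detail;
      prev.lastSeenMs = nowMs;
    } else {
      lastErrorByPeer.put(peerId, new PeerError(kind, detail, nowMs));
    }
  }

  /** A successful shipment clears the peer's error entry. */
  synchronized void recordSuccess(String peerId) {
    lastErrorByPeer.remove(peerId);
  }

  /** Plain-text rendering; a UI page would format the same fields. */
  synchronized String render() {
    if (lastErrorByPeer.isEmpty()) {
      return "all peers OK\n";
    }
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, PeerError> e : lastErrorByPeer.entrySet()) {
      PeerError p = e.getValue();
      sb.append("peer ").append(e.getKey()).append(": ").append(p.kind)
          .append(" (").append(p.detail).append(") x").append(p.count)
          .append(", last at ").append(p.lastSeenMs).append(" ms\n");
    }
    return sb.toString();
  }
}
```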

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-01 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618956#comment-13618956
 ] 

Chris Trezzo commented on HBASE-8229:
-

I like the idea of a basic admin tab a lot. Also, maybe the list of peer 
clusters being replicated to, the ability to dump a list of hlogs in a queue, 
etc. etc.

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-01 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618964#comment-13618964
 ] 

Himanshu Vashishtha commented on HBASE-8229:


Hey Chris: Thanks for the response :) I'll create a jira for that and add it to 
the description.
For a dump of the list of hlogs, we already have it in the zkdump link on the 
master UI (HBASE-7540).

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-01 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618970#comment-13618970
 ] 

Chris Trezzo commented on HBASE-8229:
-

Good point, I guess that would be sort of redundant. Although, I was thinking 
of something that isn't tied to ZK (i.e. a way to view the outstanding hlogs 
that need to be replicated without having to understand the structure of the ZK 
nodes or how replication queues are stored).

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619486#comment-13619486
 ] 

Lars Hofhansl commented on HBASE-8229:
--

+1 on some kind of replication status info on the RS UI page.

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-01 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619487#comment-13619487
 ] 

Jieshan Bean commented on HBASE-8229:
-

Yes, it's really a good idea. 


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-31 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618313#comment-13618313
 ] 

Jean-Marc Spaggiari commented on HBASE-8229:


[~jeason], I should have put a question mark at the end ;) It was more a question 
than an affirmation. I was thinking: if the node that is trying to create the 
table on the other side fails (power supply, etc.), is another one going to 
take its place?

Anyway, adding the delay will fix the crazy logging issue.
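To put a rough number on the "crazy logging": at one stack trace every 50ms 
(the rate from the report), one day comes to 86,400,000 / 50 = 1,728,000 traces. 
The sketch below compares that with a loop that sleeps between retries. The 
class is hypothetical, and the 1s base sleep and the cap of 10x are assumed 
values chosen for illustration only.

```java
// Hypothetical sketch: estimates how many stack traces per day a failing
// replication loop writes, with and without a sleep between retries.
public class LogVolumeEstimate {

    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Assumed values for illustration only.
    static final long BUSY_RETRY_MS = 50;    // "every 50ms or so" from the report
    static final long BASE_SLEEP_MS = 1000;  // first sleep after a failure
    static final int MAX_MULTIPLIER = 10;    // sleep grows to at most 10x the base

    /** Sleep after the n-th consecutive failure (n >= 1), growing linearly up to a cap. */
    static long backoffSleep(int failures) {
        return BASE_SLEEP_MS * Math.min(failures, MAX_MULTIPLIER);
    }

    /** Counts failed attempts (one logged stack trace each) started within one day. */
    static long tracesPerDay(boolean withBackoff) {
        long elapsed = 0;
        int failures = 0;
        while (elapsed < MILLIS_PER_DAY) {
            failures++; // one attempt, one stack trace in the log
            elapsed += withBackoff ? backoffSleep(failures) : BUSY_RETRY_MS;
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("busy retry:   " + tracesPerDay(false));
        System.out.println("with backoff: " + tracesPerDay(true));
    }
}
```

With these assumptions the backoff loop writes 8,645 traces per day instead of 
1,728,000; the exact figure depends entirely on the assumed sleep and cap.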


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-31 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618559#comment-13618559
 ] 

Jieshan Bean commented on HBASE-8229:
-

bq. For this issue, I'll just add the same waiting we do when the peer is down 
(which is the same logical behavior we currently have, but without the insane 
busy retrying).
+1


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618003#comment-13618003
 ] 

Jieshan Bean commented on HBASE-8229:
-

I suggest letting ReplicationSource wait if a replicated table is not 
present, as in the scenario where the peer cluster is unavailable.


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-30 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618070#comment-13618070
 ] 

Jean-Marc Spaggiari commented on HBASE-8229:


Or shouldn't it skip that table, to make sure the rest is replicated correctly? 
And add a timeout for this specific table so it's not retried before a certain time?
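A minimal sketch of the per-table timeout suggested here, assuming the source 
can tell which table a failed edit belongs to. `TableRetryGate` and its methods 
are hypothetical names, not HBase classes; times are passed in explicitly so the 
logic is easy to test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: after a table fails, it is not retried until its
// deadline has passed. Other tables are unaffected.
public class TableRetryGate {
    private final long timeoutMs;
    private final Map<String, Long> retryNotBefore = new HashMap<>();

    public TableRetryGate(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records a failure at nowMs; the table may be retried at nowMs + timeoutMs. */
    public void recordFailure(String table, long nowMs) {
        retryNotBefore.put(table, nowMs + timeoutMs);
    }

    /** Clears the table's timeout after a successful shipment. */
    public void recordSuccess(String table) {
        retryNotBefore.remove(table);
    }

    /** True if edits for the table may be shipped at nowMs. */
    public boolean mayRetry(String table, long nowMs) {
        Long deadline = retryNotBefore.get(table);
        return deadline == null || nowMs >= deadline;
    }

    public static void main(String[] args) {
        TableRetryGate gate = new TableRetryGate(60_000);
        gate.recordFailure("missing_table", 0);
        System.out.println(gate.mayRetry("missing_table", 30_000)); // false
        System.out.println(gate.mayRetry("missing_table", 60_000)); // true
        System.out.println(gate.mayRetry("other_table", 30_000));   // true
    }
}
```

The gate only decides when a table may be retried; the held-back edits for that 
table would still have to be kept somewhere, which is the data-loss concern 
raised about skipping.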


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618164#comment-13618164
 ] 

Lars Hofhansl commented on HBASE-8229:
--

If we skip, we can never go back and reapply these edits; I don't think that 
would be a good idea.
Like Jieshan, I was thinking we should treat it like the peer cluster being 
unavailable (at least in terms of waiting; we do not have to rechoose the sinks 
in that case).


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-30 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618187#comment-13618187
 ] 

Jean-Marc Spaggiari commented on HBASE-8229:


I see. But can we simply wait again and again until the table is created on the 
other side? At some point, if there is any failure, we will still miss the 
edits. Also, what if the table is eventually deleted in the source cluster?


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618207#comment-13618207
 ] 

Jieshan Bean commented on HBASE-8229:
-

bq. But can we simply wait again and again until the table is created on the 
other side?
I'm afraid we should, unless we add a mechanism to check whether a table has 
already been deleted. Even then, I think ReplicationSource is still responsible 
for shipping all the remaining edits; any skip may cause data loss.
I think the most probable cause of this problem is that we forgot to create the 
table on the sink side.

bq. At some point, if there is any failure, we will still miss the edits.
[~jmspaggi] Can you show me one scenario? :)


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618218#comment-13618218
 ] 

Lars Hofhansl commented on HBASE-8229:
--

Yeah, I think that is out of the question. Replication marks the last edit 
it sent to the peer(s); if we skip edits we would have to keep track of ranges 
of edits that are still outstanding, or lose replication data.
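A simplified model of the point above, assuming edits are numbered sequentially 
(the real source tracks a log position, not an edit counter). With only a single 
"last shipped" marker, skipped edits look exactly like shipped ones; a separate 
list of outstanding ranges is needed to remember the gap. `ShippedPosition` is a 
hypothetical class for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: the source remembers only the position of the last edit
// it shipped. Skipping edits in the middle leaves a gap that one marker cannot
// describe, so the skipped ranges must be tracked separately.
public class ShippedPosition {
    private long lastShipped = -1;                               // single position marker
    private final List<long[]> outstanding = new ArrayList<>();  // skipped [from, to] ranges

    /** Edits up to and including `to` were shipped successfully. */
    void shipped(long to) {
        lastShipped = to;
    }

    /** Edits [from, to] were skipped, but the marker still moves past them. */
    void skipped(long from, long to) {
        outstanding.add(new long[] {from, to});
        lastShipped = to;
    }

    /** What the marker alone claims: everything up to it was replicated. */
    boolean looksReplicated(long pos) {
        return pos <= lastShipped;
    }

    /** The truth, which needs the list of outstanding ranges as well. */
    boolean reallyReplicated(long pos) {
        if (pos > lastShipped) {
            return false;
        }
        for (long[] range : outstanding) {
            if (pos >= range[0] && pos <= range[1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ShippedPosition p = new ShippedPosition();
        p.shipped(99);          // edits 0..99 shipped
        p.skipped(100, 149);    // edits for the missing table skipped
        p.shipped(199);         // later edits shipped
        System.out.println(p.looksReplicated(120));   // true: the marker hides the gap
        System.out.println(p.reallyReplicated(120));  // false: only the range list knows
    }
}
```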

Presumably if you marked a table with REPLICATION_SCOPE = 1 you want that 
table's data to be replicated. An admin can fix this by dropping the table or by 
setting REPLICATION_SCOPE back to 0 (at least that is how it should work; need to 
look at the code again).

For this issue, I'll just add the same waiting we do when the peer is down 
(which is the same logical behavior we currently have, but without the insane 
busy retrying).
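A minimal sketch of the waiting behavior described above: keep retrying the 
same batch, sleep between attempts with a multiplier that grows up to a cap, and 
log one short line per attempt instead of a full stack trace. `Sink`, 
`MissingTableException` and `Sleeper` are hypothetical stand-ins, not HBase 
APIs, and the base sleep and cap are assumed values; the sleeper is injected so 
the example runs without real delays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: retry the same batch, sleeping base * multiplier between
// attempts (multiplier grows by one per failure, up to a cap). No edit is skipped.
public class RetryingShipper {

    /** Hypothetical stand-in for "target table not found on the peer". */
    static class MissingTableException extends Exception {
        MissingTableException(String table) { super("table not found: " + table); }
    }

    /** Hypothetical sink that ships a batch of edits to the peer cluster. */
    interface Sink {
        void ship(List<String> batch) throws MissingTableException;
    }

    /** Injected so tests can record sleeps instead of actually sleeping. */
    interface Sleeper {
        void sleep(long ms) throws InterruptedException;
    }

    private final long baseSleepMs;
    private final int maxMultiplier;
    private final Sleeper sleeper;

    RetryingShipper(long baseSleepMs, int maxMultiplier, Sleeper sleeper) {
        this.baseSleepMs = baseSleepMs;
        this.maxMultiplier = maxMultiplier;
        this.sleeper = sleeper;
    }

    /** Ships the batch, retrying until it succeeds; returns the number of failed attempts. */
    int shipUntilSuccess(Sink sink, List<String> batch) throws InterruptedException {
        int multiplier = 1;
        int failures = 0;
        while (true) {
            try {
                sink.ship(batch);
                return failures; // only now may the caller advance its position marker
            } catch (MissingTableException e) {
                failures++;
                long sleepMs = baseSleepMs * multiplier;
                // One short line per attempt instead of a full stack trace.
                System.err.println(e.getMessage() + ", sleeping " + sleepMs + " ms");
                sleeper.sleep(sleepMs);
                if (multiplier < maxMultiplier) {
                    multiplier++;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Long> sleeps = new ArrayList<>();
        RetryingShipper shipper = new RetryingShipper(1000, 10, sleeps::add);
        int[] calls = {0};
        // Fake sink: the table "appears" on the peer at the fourth attempt.
        Sink sink = batch -> {
            if (++calls[0] < 4) {
                throw new MissingTableException("t1");
            }
        };
        int failures = shipper.shipUntilSuccess(sink, Arrays.asList("edit-1", "edit-2"));
        System.out.println(failures + " failed attempts, sleeps: " + sleeps);
    }
}
```

Because the batch is never dropped, the position only advances after a 
successful ship, which keeps the no-skip guarantee; the cost is that this source 
stalls until the table exists or an admin intervenes.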

