[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13994146#comment-13994146 ]

Lars Hofhansl commented on HBASE-11143:
---

And while I'm at it, I might as well add a new metric for how many bytes were shipped, in addition to how many batches and how many rows.

ageOfLastShippedOp metric is confusing
--

Key: HBASE-11143
URL: https://issues.apache.org/jira/browse/HBASE-11143
Project: HBase
Issue Type: Bug
Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.94.20
Attachments: 11143-0.94-v2.txt, 11143-0.94.txt

We are trying to report on replication lag and find that there is no good single metric for it. ageOfLastShippedOp is close, but unfortunately it keeps increasing even when there is nothing to ship on a particular RegionServer.

I would like to discuss a few options here:

1. Add a new metric, replicationQueueTime (or something), with this meaning: if we have something to ship, we set it to the age of the last shipped edit; if shipping fails, it keeps growing from that last edit's time (just like it does now); but if there is nothing to replicate, we reset it to the current time (so the metric is reported as close to 0).
2. Alternatively, we could change the meaning of ageOfLastShippedOp to do that. That might lead to surprises, but the current behavior is clearly weird when there is nothing to replicate.

Comments? [~jdcryans], [~stack]. If the approach sounds good, I'll make a patch for all branches.

--
This message was sent by Atlassian JIRA (v6.2#6252)
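The replicationQueueTime semantics proposed in the issue description, plus the shipped-bytes counter suggested in the first comment, can be sketched in Java. This is a minimal, hypothetical sketch: the class `ReplicationLagMetrics`, its methods, and the injectable clock are illustrative assumptions, not HBase's replication metrics API or the committed patch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the replicationQueueTime idea from HBASE-11143.
 * All names are illustrative only; this is not HBase's metrics API.
 */
final class ReplicationLagMetrics {
    private final LongSupplier clockMs; // wall clock in ms, injectable for tests

    // Timestamp the reported age is measured from: the write time of the
    // last shipped edit, or "now" whenever the source found nothing to ship.
    private final AtomicLong referenceTsMs;

    // Counters for the extra metric suggested in the comment.
    private final AtomicLong shippedBatches = new AtomicLong();
    private final AtomicLong shippedOps = new AtomicLong();
    private final AtomicLong shippedBytes = new AtomicLong();

    ReplicationLagMetrics(LongSupplier clockMs) {
        this.clockMs = clockMs;
        // Assumption: a freshly started source counts as caught up.
        this.referenceTsMs = new AtomicLong(clockMs.getAsLong());
    }

    /** A batch was shipped; lastEditTsMs is the write time of its last edit. */
    void onBatchShipped(long lastEditTsMs, int ops, long bytes) {
        referenceTsMs.set(lastEditTsMs);
        shippedBatches.incrementAndGet();
        shippedOps.addAndGet(ops);
        shippedBytes.addAndGet(bytes);
    }

    /** Nothing to replicate: the source is caught up, so the age drops to ~0. */
    void onNothingToShip() {
        referenceTsMs.set(clockMs.getAsLong());
    }

    // A failed ship attempt deliberately records nothing: the reference
    // timestamp stays put, so the reported age keeps growing, as it does today.

    /** Proposed replicationQueueTime in ms since the reference timestamp. */
    long replicationQueueTimeMs() {
        // Clamp at 0 in case an edit's timestamp is ahead of the local clock.
        return Math.max(0L, clockMs.getAsLong() - referenceTsMs.get());
    }

    long getShippedBatches() { return shippedBatches.get(); }
    long getShippedOps() { return shippedOps.get(); }
    long getShippedBytes() { return shippedBytes.get(); }
}
```

In this sketch the age is computed when the metric is read rather than stored at ship time, so a stalled source reports a growing lag without a separate periodic refresh; whether HBase structures it this way is not established by the thread.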
[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995276#comment-13995276 ]

Lars Hofhansl commented on HBASE-11143:
---

Yep, I'd add the new metric to 0.96 and later. Cool... I'll commit this to all branches in a bit (0.96+ will only get the new metric).

Fix For: 0.94.20
Attachments: 11143-0.94-v2.txt, 11143-0.94.txt
[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996647#comment-13996647 ]

Lars Hofhansl commented on HBASE-11143:
---

[~apurtell] ping :)

Fix For: 0.99.0, 0.94.20, 0.98.3
Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt
[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995242#comment-13995242 ]

Jean-Daniel Cryans commented on HBASE-11143:

+1, and can we get the new metric in 0.96+?

Fix For: 0.94.20
Attachments: 11143-0.94-v2.txt, 11143-0.94.txt
[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995525#comment-13995525 ]

Jean-Daniel Cryans commented on HBASE-11143:

I'm +1 for the trunk patch too.

Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt
[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995638#comment-13995638 ]

stack commented on HBASE-11143:
---

Suggest not putting it in 0.96 unless someone explicitly asks for it.

Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt
[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995661#comment-13995661 ]

Lars Hofhansl commented on HBASE-11143:
---

[~stack], the new metric? Sure. I suppose our message at this point is to upgrade from 0.94 to 0.98, right? OK... Let me put it in 0.94, 0.98, and trunk.

Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt
[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995895#comment-13995895 ]

Lars Hofhansl commented on HBASE-11143:
---

[~apurtell], you OK with this in 0.98? It's just a new metric.

Fix For: 0.99.0, 0.94.20, 0.98.3
Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt