[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-15 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994146#comment-13994146 ]

Lars Hofhansl commented on HBASE-11143:
---

And while I'm at it, I might as well add a new metric for how many bytes were 
shipped, in addition to how many batches and how many rows.
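The comment suggests tracking shipped bytes alongside shipped batches and rows. Below is a minimal Java sketch of such per-source counters. The class `ShippedCounters`, its methods, and the choice to measure bytes as the serialized size of each shipped entry are hypothetical and exist only for illustration; this is not HBase's metrics API. Counting entries as a stand-in for rows is also a simplifying assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical per-source shipping counters (illustrative names, not the HBase API).
 * Each shipped batch bumps three counters: batches, entries (standing in for rows),
 * and bytes (assumed here to be the serialized size of each entry).
 */
final class ShippedCounters {
    private final AtomicLong shippedBatches = new AtomicLong();
    private final AtomicLong shippedEntries = new AtomicLong();
    private final AtomicLong shippedBytes = new AtomicLong();

    /** Record one successfully shipped batch, given the serialized form of each entry in it. */
    void recordBatch(List<byte[]> serializedEntries) {
        long bytes = 0;
        for (byte[] entry : serializedEntries) {
            bytes += entry.length;
        }
        shippedBatches.incrementAndGet();
        shippedEntries.addAndGet(serializedEntries.size());
        shippedBytes.addAndGet(bytes);
    }

    long batches() { return shippedBatches.get(); }
    long entries() { return shippedEntries.get(); }
    long bytes()   { return shippedBytes.get(); }

    public static void main(String[] args) {
        ShippedCounters counters = new ShippedCounters();
        counters.recordBatch(Arrays.asList(new byte[100], new byte[50]));
        counters.recordBatch(Arrays.asList(new byte[20]));
        // Two batches, three entries, 170 bytes in total: prints "2 3 170".
        System.out.println(counters.batches() + " " + counters.entries() + " " + counters.bytes());
    }
}
```

Atomic counters are used because a shipping thread updates them while a metrics reporter may read them concurrently.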


 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.20

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt


 We are trying to report on replication lag and find that there is no good 
 single metric to do that.
 ageOfLastShippedOp is close, but unfortunately it is increased even when 
 there is nothing to ship on a particular RegionServer.
 I would like to discuss a few options here:
 Add a new metric, replicationQueueTime (or something similar), that reports 
 the replication lag described above. That is, if we have something to ship, we 
 set it to the age of the last shipped edit; if shipping fails, we keep 
 incrementing it from that last time (just like we do now). But if there is 
 nothing to replicate, we set it to the current time (so the metric is 
 reported as close to 0).
 Alternatively, we could change the meaning of ageOfLastShippedOp to do 
 that. That might lead to surprises, but the current behavior is clearly weird 
 when there is nothing to replicate.
 Comments? [~jdcryans], [~stack].
 If this approach sounds good, I'll make a patch for all branches.
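The proposed replicationQueueTime semantics can be sketched as a small gauge. This is a minimal, hypothetical Java sketch, not the patch attached to this issue: the class `ReplicationLagGauge`, its method names, and the injectable millisecond clock are assumptions for illustration. A successful ship records the write time of the last shipped edit. A failed ship leaves that timestamp alone, so the reported age keeps growing. An empty queue resets the timestamp to the current time, so the age reads close to 0.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the proposed replicationQueueTime semantics
 * (illustrative names; not the actual HBase implementation).
 */
final class ReplicationLagGauge {
    private final LongSupplier clock;   // current time in ms; injectable so the logic is testable
    private long lastTimestampForAge;   // write time of the edit the age is measured from

    ReplicationLagGauge(LongSupplier clock) {
        this.clock = clock;
        this.lastTimestampForAge = clock.getAsLong();
    }

    /** A batch shipped successfully: measure age from the write time of its last edit. */
    void onShipped(long lastEditWriteTimeMs) {
        lastTimestampForAge = lastEditWriteTimeMs;
    }

    /** Shipping failed: keep the old timestamp, so the age grows as time passes. */
    void onShipFailed() {
        // Intentionally unchanged: the reported age keeps increasing until a ship succeeds.
    }

    /** Nothing to replicate: the source is caught up, so the age resets to about 0. */
    void onNothingToShip() {
        lastTimestampForAge = clock.getAsLong();
    }

    /** The value that would be reported as the metric. */
    long ageMs() {
        return clock.getAsLong() - lastTimestampForAge;
    }

    public static void main(String[] args) {
        long[] now = {10_000L};                               // fake clock, in ms
        ReplicationLagGauge gauge = new ReplicationLagGauge(() -> now[0]);

        gauge.onShipped(9_500L);                               // shipped an edit written 500 ms ago
        System.out.println(gauge.ageMs());                     // 500

        now[0] = 12_000L;
        gauge.onShipFailed();                                  // retrying; lag keeps growing
        System.out.println(gauge.ageMs());                     // 2500

        gauge.onNothingToShip();                               // queue drained
        System.out.println(gauge.ageMs());                     // 0
    }
}
```

Under this sketch, the only difference from the ageOfLastShippedOp behavior described above is `onNothingToShip`: without that reset, an idle RegionServer keeps reporting a growing age even though it has no lag.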



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-13 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995276#comment-13995276 ]

Lars Hofhansl commented on HBASE-11143:
---

Yep, I'd add the new metric to 0.96 and later.
Cool... I'll commit this to all branches in a bit (0.96+ will only get the new 
metric).

 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.20

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt




[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-13 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996647#comment-13996647 ]

Lars Hofhansl commented on HBASE-11143:
---

[~apurtell] ping :)

 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.99.0, 0.94.20, 0.98.3

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt




[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-12 Thread Jean-Daniel Cryans (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995242#comment-13995242 ]

Jean-Daniel Cryans commented on HBASE-11143:


+1, and can we get the new metric in 0.96+?

 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.20

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt




[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-12 Thread Jean-Daniel Cryans (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995525#comment-13995525 ]

Jean-Daniel Cryans commented on HBASE-11143:


I'm +1 for the trunk patch too.

 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt




[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-12 Thread stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995638#comment-13995638 ]

stack commented on HBASE-11143:
---

Suggest not putting it in 0.96 unless someone explicitly asks for it.

 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt




[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-12 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995661#comment-13995661 ]

Lars Hofhansl commented on HBASE-11143:
---

[~stack], the new metric? Sure. I suppose our message at this point is to 
upgrade from 0.94 to 0.98, right?
OK... Let me put it in 0.94, 0.98, and trunk.

 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt




[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-12 Thread Lars Hofhansl (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995895#comment-13995895 ]

Lars Hofhansl commented on HBASE-11143:
---

[~apurtell], you OK with this in 0.98? It's just a new metric.

 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.99.0, 0.94.20, 0.98.3

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt

