Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-18 Thread Matthias J. Sax
Thanks for the background John.

I am happy with `Windowed` and `Cumulative`, too.


-Matthias

On 7/17/19 11:22 AM, Ryanne Dolan wrote:
> John, makes sense to me! Thanks.
> 
> Ryanne
> 
> On Wed, Jul 17, 2019, 1:16 PM John Roesler  wrote:
> 
>> Agreed. I think the names are actually not ambiguous once you recall
>> that the stats summarize measurements and each measurement is a
>> floating point number, but there's enough overlap that I also was
>> initially confused as well. I do plan to make this super clear in the
>> documentation.
>>
>> Thanks,
>> -John
>>
>> On Wed, Jul 17, 2019 at 1:08 PM Sophie Blee-Goldman 
>> wrote:
>>>
>>> Sounds good to me
>>>
>>> By the way, while I agree that we can't really do better than Sum and
>> Count
>>> I will say I also found the distinction a bit unclear at first glance. We
>>> should at least document clearly that "Sum" is a "sum of values" whereas
>>> "Count" is a "number of things" -- but that doesn't need to be part of
>> the
>>> KIP
>>>
>>> On Wed, Jul 17, 2019 at 11:00 AM John Roesler  wrote:
>>>
 Thanks for the replies.

 I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
 something, it should still be pretty obvious that WindowedX is the
 unweighted version? In that case, I buy the argument that we don't
 need "Simple" and we can just go with:

 WindowedSum, WindowedCount
 CumulativeSum, CumulativeCount

 Sound good?
 Thanks,
 -John

 On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
  wrote:
>
> Thanks for the crash course in statistical terms :)
>
> In light of this I definitely support Cumulative{Sum,Count}, but I'm
 really
> not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so
>> much
> because of its unfortunate length (although that is unfortunate it
> shouldn't be a deciding factor) but because it seems to have the
 potential
> to confuse further. I'm not sure what we gain by adding "Simple"
>> since to
> me at least, the unweighted-ness is obvious and the definition of
>> simple
 is
> not. To those who haven't been exposed to the finer details of
 statistical
> definitions, I think they are more likely to read "SimpleXX" and
>> wonder
 "is
> there an 'advanced' or non-simple kind of Windowed?" than they are to
> wonder what is the weighting behind these metrics.
>
> Sophie
>
> On Wed, Jul 17, 2019 at 8:18 AM John Roesler 
>> wrote:
>
>> Thanks for the discussion, all.
>>
>> I've done a little more research into the statistical terminology.
>> Matthias is correct, "running" and "moving" appear to be synonyms.
>> Unfortunately, both can be computed either over a window of the
>> last N
>> measurements or over all prior measurements. "Moving" just
>> signifies
>> that the statistic is computed over a "live" data set, i.e., a
>> continuous stream of measurements, and the expectation is that the
>> stat would be updated in response to new measurements.
>>
>> I found https://en.wikipedia.org/wiki/Moving_average to have a
>> pretty
>> good overview of the whole picture.
>>
>> After considering the discussion so far and some light reading, it
>> seems like "Cumulative" is truly the correct term for the all-time
>> metrics:
>>
>>> In a cumulative moving average, the data arrive in an
>>> ordered datum stream, and the user would like to get
>>> the average of all of the data up until the current datum
>>> point.
>>
>> I know that we previously felt that "cumulative" was too much of a
>> mouthful, but it seems like our quest for a terser term led us
>> into a
>> briar patch. Also, now there is an independent source (the wiki
>> page)
>> indicating that this is indeed the correct term, and it doesn't
>> offer
>> any synonyms to choose from. Maybe we can take comfort in the fact
>> that we'll rarely be saying the name of the classes out loud.
>>
>> As far as moving stats that operate over a window of the last N
>> measurements, there are multiple options, including Simple
>> (unweighted), Weighted, and Exponentially Weighted, and presumably
>> infinite variations with other weighting functions. In our domain,
>> there is only one weighting function available, but it's still more
>> self-documenting and future-proof to specify the type of windowed
>> statistic. Therefore, I'm proposing "Simple" as the term for the
>> windowed (aka sampled) stats, while keeping Windowed in the name to
>> distinguish it from the all-time metrics.
>>
>>> In financial applications a simple moving average (SMA)
>>> is the unweighted mean of the previous n data.
>>
>> Therefore, we would have the proposed matrix:
>>
>> SimpleWindowedSum, SimpleWindowedCount
>> CumulativeSum, CumulativeCount
>>
>> Again, all 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread John Roesler
Cool. I'll update the KIP, then, and we can re-start the vote. (looks
like you've already cast a vote :) )

-John


On Wed, Jul 17, 2019 at 1:23 PM Ryanne Dolan  wrote:
>
> John, makes sense to me! Thanks.
>
> Ryanne
>
> On Wed, Jul 17, 2019, 1:16 PM John Roesler  wrote:
>
> > Agreed. I think the names are actually not ambiguous once you recall
> > that the stats summarize measurements and each measurement is a
> > floating point number, but there's enough overlap that I also was
> > initially confused as well. I do plan to make this super clear in the
> > documentation.
> >
> > Thanks,
> > -John
> >
> > On Wed, Jul 17, 2019 at 1:08 PM Sophie Blee-Goldman 
> > wrote:
> > >
> > > Sounds good to me
> > >
> > > By the way, while I agree that we can't really do better than Sum and
> > Count
> > > I will say I also found the distinction a bit unclear at first glance. We
> > > should at least document clearly that "Sum" is a "sum of values" whereas
> > > "Count" is a "number of things" -- but that doesn't need to be part of
> > the
> > > KIP
> > >
> > > On Wed, Jul 17, 2019 at 11:00 AM John Roesler  wrote:
> > >
> > > > Thanks for the replies.
> > > >
> > > > I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
> > > > something, it should still be pretty obvious that WindowedX is the
> > > > unweighted version? In that case, I buy the argument that we don't
> > > > need "Simple" and we can just go with:
> > > >
> > > > WindowedSum, WindowedCount
> > > > CumulativeSum, CumulativeCount
> > > >
> > > > Sound good?
> > > > Thanks,
> > > > -John
> > > >
> > > > On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
> > > >  wrote:
> > > > >
> > > > > Thanks for the crash course in statistical terms :)
> > > > >
> > > > > In light of this I definitely support Cumulative{Sum,Count}, but I'm
> > > > really
> > > > > not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so
> > much
> > > > > because of its unfortunate length (although that is unfortunate it
> > > > > shouldn't be a deciding factor) but because it seems to have the
> > > > potential
> > > > > to confuse further. I'm not sure what we gain by adding "Simple"
> > since to
> > > > > me at least, the unweighted-ness is obvious and the definition of
> > simple
> > > > is
> > > > > not. To those who haven't been exposed to the finer details of
> > > > statistical
> > > > > definitions, I think they are more likely to read "SimpleXX" and
> > wonder
> > > > "is
> > > > > there an 'advanced' or non-simple kind of Windowed?" than they are to
> > > > > wonder what is the weighting behind these metrics.
> > > > >
> > > > > Sophie
> > > > >
> > > > > On Wed, Jul 17, 2019 at 8:18 AM John Roesler 
> > wrote:
> > > > >
> > > > > > Thanks for the discussion, all.
> > > > > >
> > > > > > I've done a little more research into the statistical terminology.
> > > > > > Matthias is correct, "running" and "moving" appear to be synonyms.
> > > > > > Unfortunately, both can be computed either over a window of the
> > last N
> > > > > > measurements or over all prior measurements. "Moving" just
> > signifies
> > > > > > that the statistic is computed over a "live" data set, i.e., a
> > > > > > continuous stream of measurements, and the expectation is that the
> > > > > > stat would be updated in response to new measurements.
> > > > > >
> > > > > > I found https://en.wikipedia.org/wiki/Moving_average to have a
> > pretty
> > > > > > good overview of the whole picture.
> > > > > >
> > > > > > After considering the discussion so far and some light reading, it
> > > > > > seems like "Cumulative" is truly the correct term for the all-time
> > > > > > metrics:
> > > > > >
> > > > > > > In a cumulative moving average, the data arrive in an
> > > > > > > ordered datum stream, and the user would like to get
> > > > > > > the average of all of the data up until the current datum
> > > > > > > point.
> > > > > >
> > > > > > I know that we previously felt that "cumulative" was too much of a
> > > > > > mouthful, but it seems like our quest for a terser term led us
> > into a
> > > > > > briar patch. Also, now there is an independent source (the wiki
> > page)
> > > > > > indicating that this is indeed the correct term, and it doesn't
> > offer
> > > > > > any synonyms to choose from. Maybe we can take comfort in the fact
> > > > > > that we'll rarely be saying the name of the classes out loud.
> > > > > >
> > > > > > As far as moving stats that operate over a window of the last N
> > > > > > measurements, there are multiple options, including Simple
> > > > > > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > > > > > infinite variations with other weighting functions. In our domain,
> > > > > > there is only one weighting function available, but it's still more
> > > > > > self-documenting and future-proof to specify the type of windowed
> > > > > > statistic. Therefore, I'm proposing "Simple" as the term for the
> > > > > > windowed (aka 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Ryanne Dolan
John, makes sense to me! Thanks.

Ryanne

On Wed, Jul 17, 2019, 1:16 PM John Roesler  wrote:

> Agreed. I think the names are actually not ambiguous once you recall
> that the stats summarize measurements and each measurement is a
> floating point number, but there's enough overlap that I also was
> initially confused as well. I do plan to make this super clear in the
> documentation.
>
> Thanks,
> -John
>
> On Wed, Jul 17, 2019 at 1:08 PM Sophie Blee-Goldman 
> wrote:
> >
> > Sounds good to me
> >
> > By the way, while I agree that we can't really do better than Sum and
> Count
> > I will say I also found the distinction a bit unclear at first glance. We
> > should at least document clearly that "Sum" is a "sum of values" whereas
> > "Count" is a "number of things" -- but that doesn't need to be part of
> the
> > KIP
> >
> > On Wed, Jul 17, 2019 at 11:00 AM John Roesler  wrote:
> >
> > > Thanks for the replies.
> > >
> > > I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
> > > something, it should still be pretty obvious that WindowedX is the
> > > unweighted version? In that case, I buy the argument that we don't
> > > need "Simple" and we can just go with:
> > >
> > > WindowedSum, WindowedCount
> > > CumulativeSum, CumulativeCount
> > >
> > > Sound good?
> > > Thanks,
> > > -John
> > >
> > > On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
> > >  wrote:
> > > >
> > > > Thanks for the crash course in statistical terms :)
> > > >
> > > > In light of this I definitely support Cumulative{Sum,Count}, but I'm
> > > really
> > > > not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so
> much
> > > > because of its unfortunate length (although that is unfortunate it
> > > > shouldn't be a deciding factor) but because it seems to have the
> > > potential
> > > > to confuse further. I'm not sure what we gain by adding "Simple"
> since to
> > > > me at least, the unweighted-ness is obvious and the definition of
> simple
> > > is
> > > > not. To those who haven't been exposed to the finer details of
> > > statistical
> > > > definitions, I think they are more likely to read "SimpleXX" and
> wonder
> > > "is
> > > > there an 'advanced' or non-simple kind of Windowed?" than they are to
> > > > wonder what is the weighting behind these metrics.
> > > >
> > > > Sophie
> > > >
> > > > On Wed, Jul 17, 2019 at 8:18 AM John Roesler 
> wrote:
> > > >
> > > > > Thanks for the discussion, all.
> > > > >
> > > > > I've done a little more research into the statistical terminology.
> > > > > Matthias is correct, "running" and "moving" appear to be synonyms.
> > > > > Unfortunately, both can be computed either over a window of the
> last N
> > > > > measurements or over all prior measurements. "Moving" just
> signifies
> > > > > that the statistic is computed over a "live" data set, i.e., a
> > > > > continuous stream of measurements, and the expectation is that the
> > > > > stat would be updated in response to new measurements.
> > > > >
> > > > > I found https://en.wikipedia.org/wiki/Moving_average to have a
> pretty
> > > > > good overview of the whole picture.
> > > > >
> > > > > After considering the discussion so far and some light reading, it
> > > > > seems like "Cumulative" is truly the correct term for the all-time
> > > > > metrics:
> > > > >
> > > > > > In a cumulative moving average, the data arrive in an
> > > > > > ordered datum stream, and the user would like to get
> > > > > > the average of all of the data up until the current datum
> > > > > > point.
> > > > >
> > > > > I know that we previously felt that "cumulative" was too much of a
> > > > > mouthful, but it seems like our quest for a terser term led us
> into a
> > > > > briar patch. Also, now there is an independent source (the wiki
> page)
> > > > > indicating that this is indeed the correct term, and it doesn't
> offer
> > > > > any synonyms to choose from. Maybe we can take comfort in the fact
> > > > > that we'll rarely be saying the name of the classes out loud.
> > > > >
> > > > > As far as moving stats that operate over a window of the last N
> > > > > measurements, there are multiple options, including Simple
> > > > > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > > > > infinite variations with other weighting functions. In our domain,
> > > > > there is only one weighting function available, but it's still more
> > > > > self-documenting and future-proof to specify the type of windowed
> > > > > statistic. Therefore, I'm proposing "Simple" as the term for the
> > > > > windowed (aka sampled) stats, while keeping Windowed in the name to
> > > > > distinguish it from the all-time metrics.
> > > > >
> > > > > > In financial applications a simple moving average (SMA)
> > > > > > is the unweighted mean of the previous n data.
> > > > >
> > > > > Therefore, we would have the proposed matrix:
> > > > >
> > > > > SimpleWindowedSum, SimpleWindowedCount
> > > > > CumulativeSum, CumulativeCount

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread John Roesler
Agreed. I think the names are actually not ambiguous once you recall
that the stats summarize measurements and each measurement is a
floating point number, but there's enough overlap that I also was
initially confused as well. I do plan to make this super clear in the
documentation.

Thanks,
-John

On Wed, Jul 17, 2019 at 1:08 PM Sophie Blee-Goldman  wrote:
>
> Sounds good to me
>
> By the way, while I agree that we can't really do better than Sum and Count
> I will say I also found the distinction a bit unclear at first glance. We
> should at least document clearly that "Sum" is a "sum of values" whereas
> "Count" is a "number of things" -- but that doesn't need to be part of the
> KIP
>
> On Wed, Jul 17, 2019 at 11:00 AM John Roesler  wrote:
>
> > Thanks for the replies.
> >
> > I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
> > something, it should still be pretty obvious that WindowedX is the
> > unweighted version? In that case, I buy the argument that we don't
> > need "Simple" and we can just go with:
> >
> > WindowedSum, WindowedCount
> > CumulativeSum, CumulativeCount
> >
> > Sound good?
> > Thanks,
> > -John
> >
> > On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
> >  wrote:
> > >
> > > Thanks for the crash course in statistical terms :)
> > >
> > > In light of this I definitely support Cumulative{Sum,Count}, but I'm
> > really
> > > not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so much
> > > because of its unfortunate length (although that is unfortunate it
> > > shouldn't be a deciding factor) but because it seems to have the
> > potential
> > > to confuse further. I'm not sure what we gain by adding "Simple" since to
> > > me at least, the unweighted-ness is obvious and the definition of simple
> > is
> > > not. To those who haven't been exposed to the finer details of
> > statistical
> > > definitions, I think they are more likely to read "SimpleXX" and wonder
> > "is
> > > there an 'advanced' or non-simple kind of Windowed?" than they are to
> > > wonder what is the weighting behind these metrics.
> > >
> > > Sophie
> > >
> > > On Wed, Jul 17, 2019 at 8:18 AM John Roesler  wrote:
> > >
> > > > Thanks for the discussion, all.
> > > >
> > > > I've done a little more research into the statistical terminology.
> > > > Matthias is correct, "running" and "moving" appear to be synonyms.
> > > > Unfortunately, both can be computed either over a window of the last N
> > > > measurements or over all prior measurements. "Moving" just signifies
> > > > that the statistic is computed over a "live" data set, i.e., a
> > > > continuous stream of measurements, and the expectation is that the
> > > > stat would be updated in response to new measurements.
> > > >
> > > > I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> > > > good overview of the whole picture.
> > > >
> > > > After considering the discussion so far and some light reading, it
> > > > seems like "Cumulative" is truly the correct term for the all-time
> > > > metrics:
> > > >
> > > > > In a cumulative moving average, the data arrive in an
> > > > > ordered datum stream, and the user would like to get
> > > > > the average of all of the data up until the current datum
> > > > > point.
> > > >
> > > > I know that we previously felt that "cumulative" was too much of a
> > > > mouthful, but it seems like our quest for a terser term led us into a
> > > > briar patch. Also, now there is an independent source (the wiki page)
> > > > indicating that this is indeed the correct term, and it doesn't offer
> > > > any synonyms to choose from. Maybe we can take comfort in the fact
> > > > that we'll rarely be saying the name of the classes out loud.
> > > >
> > > > As far as moving stats that operate over a window of the last N
> > > > measurements, there are multiple options, including Simple
> > > > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > > > infinite variations with other weighting functions. In our domain,
> > > > there is only one weighting function available, but it's still more
> > > > self-documenting and future-proof to specify the type of windowed
> > > > statistic. Therefore, I'm proposing "Simple" as the term for the
> > > > windowed (aka sampled) stats, while keeping Windowed in the name to
> > > > distinguish it from the all-time metrics.
> > > >
> > > > > In financial applications a simple moving average (SMA)
> > > > > is the unweighted mean of the previous n data.
> > > >
> > > > Therefore, we would have the proposed matrix:
> > > >
> > > > SimpleWindowedSum, SimpleWindowedCount
> > > > CumulativeSum, CumulativeCount
> > > >
> > > > Again, all these proposed names are less pithy than we might wish, but
> > > > the whole point of this exercise is to demystify and disambiguate
> > > > them. It seems like the discussion so far illustrates the futility of
> > > > trying to find names that are both short and descriptive.
> > > >
> > > > How does 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Sophie Blee-Goldman
Sounds good to me

By the way, while I agree that we can't really do better than Sum and Count
I will say I also found the distinction a bit unclear at first glance. We
should at least document clearly that "Sum" is a "sum of values" whereas
"Count" is a "number of things" -- but that doesn't need to be part of the
KIP

On Wed, Jul 17, 2019 at 11:00 AM John Roesler  wrote:

> Thanks for the replies.
>
> I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
> something, it should still be pretty obvious that WindowedX is the
> unweighted version? In that case, I buy the argument that we don't
> need "Simple" and we can just go with:
>
> WindowedSum, WindowedCount
> CumulativeSum, CumulativeCount
>
> Sound good?
> Thanks,
> -John
>
> On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
>  wrote:
> >
> > Thanks for the crash course in statistical terms :)
> >
> > In light of this I definitely support Cumulative{Sum,Count}, but I'm
> really
> > not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so much
> > because of its unfortunate length (although that is unfortunate it
> > shouldn't be a deciding factor) but because it seems to have the
> potential
> > to confuse further. I'm not sure what we gain by adding "Simple" since to
> > me at least, the unweighted-ness is obvious and the definition of simple
> is
> > not. To those who haven't been exposed to the finer details of
> statistical
> > definitions, I think they are more likely to read "SimpleXX" and wonder
> "is
> > there an 'advanced' or non-simple kind of Windowed?" than they are to
> > wonder what is the weighting behind these metrics.
> >
> > Sophie
> >
> > On Wed, Jul 17, 2019 at 8:18 AM John Roesler  wrote:
> >
> > > Thanks for the discussion, all.
> > >
> > > I've done a little more research into the statistical terminology.
> > > Matthias is correct, "running" and "moving" appear to be synonyms.
> > > Unfortunately, both can be computed either over a window of the last N
> > > measurements or over all prior measurements. "Moving" just signifies
> > > that the statistic is computed over a "live" data set, i.e., a
> > > continuous stream of measurements, and the expectation is that the
> > > stat would be updated in response to new measurements.
> > >
> > > I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> > > good overview of the whole picture.
> > >
> > > After considering the discussion so far and some light reading, it
> > > seems like "Cumulative" is truly the correct term for the all-time
> > > metrics:
> > >
> > > > In a cumulative moving average, the data arrive in an
> > > > ordered datum stream, and the user would like to get
> > > > the average of all of the data up until the current datum
> > > > point.
> > >
> > > I know that we previously felt that "cumulative" was too much of a
> > > mouthful, but it seems like our quest for a terser term led us into a
> > > briar patch. Also, now there is an independent source (the wiki page)
> > > indicating that this is indeed the correct term, and it doesn't offer
> > > any synonyms to choose from. Maybe we can take comfort in the fact
> > > that we'll rarely be saying the name of the classes out loud.
> > >
> > > As far as moving stats that operate over a window of the last N
> > > measurements, there are multiple options, including Simple
> > > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > > infinite variations with other weighting functions. In our domain,
> > > there is only one weighting function available, but it's still more
> > > self-documenting and future-proof to specify the type of windowed
> > > statistic. Therefore, I'm proposing "Simple" as the term for the
> > > windowed (aka sampled) stats, while keeping Windowed in the name to
> > > distinguish it from the all-time metrics.
> > >
> > > > In financial applications a simple moving average (SMA)
> > > > is the unweighted mean of the previous n data.
> > >
> > > Therefore, we would have the proposed matrix:
> > >
> > > SimpleWindowedSum, SimpleWindowedCount
> > > CumulativeSum, CumulativeCount
> > >
> > > Again, all these proposed names are less pithy than we might wish, but
> > > the whole point of this exercise is to demystify and disambiguate
> > > them. It seems like the discussion so far illustrates the futility of
> > > trying to find names that are both short and descriptive.
> > >
> > > How does that sound?
> > > -John
> > >
> > > On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax  >
> > > wrote:
> > > >
> > > > It's a fair point that Ryanne raises. However, "running sum" is the
> same
> > > > as "moving sum" from my understanding.
> > > >
> > > > The issue is still, that `Sum` and `Count` which seem to be the
> cleanest
> > > > names cannot be used. While I agree that `TotalSum` and `TotalCount`
> is
> > > > somewhat redundant, I still think it the best suggestion so far.
> > > >
> > > > For the "sampled" version, I am personally fine with either
> 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread John Roesler
Thanks for the replies.

I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
something, it should still be pretty obvious that WindowedX is the
unweighted version? In that case, I buy the argument that we don't
need "Simple" and we can just go with:

WindowedSum, WindowedCount
CumulativeSum, CumulativeCount

Sound good?
Thanks,
-John

On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
 wrote:
>
> Thanks for the crash course in statistical terms :)
>
> In light of this I definitely support Cumulative{Sum,Count}, but I'm really
> not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so much
> because of its unfortunate length (although that is unfortunate it
> shouldn't be a deciding factor) but because it seems to have the potential
> to confuse further. I'm not sure what we gain by adding "Simple" since to
> me at least, the unweighted-ness is obvious and the definition of simple is
> not. To those who haven't been exposed to the finer details of statistical
> definitions, I think they are more likely to read "SimpleXX" and wonder "is
> there an 'advanced' or non-simple kind of Windowed?" than they are to
> wonder what is the weighting behind these metrics.
>
> Sophie
>
> On Wed, Jul 17, 2019 at 8:18 AM John Roesler  wrote:
>
> > Thanks for the discussion, all.
> >
> > I've done a little more research into the statistical terminology.
> > Matthias is correct, "running" and "moving" appear to be synonyms.
> > Unfortunately, both can be computed either over a window of the last N
> > measurements or over all prior measurements. "Moving" just signifies
> > that the statistic is computed over a "live" data set, i.e., a
> > continuous stream of measurements, and the expectation is that the
> > stat would be updated in response to new measurements.
> >
> > I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> > good overview of the whole picture.
> >
> > After considering the discussion so far and some light reading, it
> > seems like "Cumulative" is truly the correct term for the all-time
> > metrics:
> >
> > > In a cumulative moving average, the data arrive in an
> > > ordered datum stream, and the user would like to get
> > > the average of all of the data up until the current datum
> > > point.
> >
> > I know that we previously felt that "cumulative" was too much of a
> > mouthful, but it seems like our quest for a terser term led us into a
> > briar patch. Also, now there is an independent source (the wiki page)
> > indicating that this is indeed the correct term, and it doesn't offer
> > any synonyms to choose from. Maybe we can take comfort in the fact
> > that we'll rarely be saying the name of the classes out loud.
> >
> > As far as moving stats that operate over a window of the last N
> > measurements, there are multiple options, including Simple
> > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > infinite variations with other weighting functions. In our domain,
> > there is only one weighting function available, but it's still more
> > self-documenting and future-proof to specify the type of windowed
> > statistic. Therefore, I'm proposing "Simple" as the term for the
> > windowed (aka sampled) stats, while keeping Windowed in the name to
> > distinguish it from the all-time metrics.
> >
> > > In financial applications a simple moving average (SMA)
> > > is the unweighted mean of the previous n data.
> >
> > Therefore, we would have the proposed matrix:
> >
> > SimpleWindowedSum, SimpleWindowedCount
> > CumulativeSum, CumulativeCount
> >
> > Again, all these proposed names are less pithy than we might wish, but
> > the whole point of this exercise is to demystify and disambiguate
> > them. It seems like the discussion so far illustrates the futility of
> > trying to find names that are both short and descriptive.
> >
> > How does that sound?
> > -John
> >
> > On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax 
> > wrote:
> > >
> > > It's a fair point that Ryanne raises. However, "running sum" is the same
> > > as "moving sum" from my understanding.
> > >
> > > The issue is still, that `Sum` and `Count` which seem to be the cleanest
> > > names cannot be used. While I agree that `TotalSum` and `TotalCount` is
> > > somewhat redundant, I still think it the best suggestion so far.
> > >
> > > For the "sampled" version, I am personally fine with either `MovingXxx`,
> > > `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
> > > describe the semantics.
> > >
> > >
> > >
> > > -Matthias
> > >
> > >
> > >
> > > On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> > > > I'm +1 on Windowed, was about to suggest that as I was catching up on
> > the
> > > > discussion but Bill beat me to it :)
> > > >
> > > > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:
> > > >
> > > >> Hi John,
> > > >>
> > > >> Thanks for the updates.
> > > >>
> > > >> I like RunningCount and RunningSum.
> > > >>
> > > >> What about WindowedCount, 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Sophie Blee-Goldman
Thanks for the crash course in statistical terms :)

In light of this I definitely support Cumulative{Sum,Count}, but I'm really
not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so much
because of its unfortunate length (although that is unfortunate it
shouldn't be a deciding factor) but because it seems to have the potential
to confuse further. I'm not sure what we gain by adding "Simple" since to
me at least, the unweighted-ness is obvious and the definition of simple is
not. To those who haven't been exposed to the finer details of statistical
definitions, I think they are more likely to read "SimpleXX" and wonder "is
there an 'advanced' or non-simple kind of Windowed?" than they are to
wonder what is the weighting behind these metrics.

Sophie

On Wed, Jul 17, 2019 at 8:18 AM John Roesler  wrote:

> Thanks for the discussion, all.
>
> I've done a little more research into the statistical terminology.
> Matthias is correct, "running" and "moving" appear to be synonyms.
> Unfortunately, both can be computed either over a window of the last N
> measurements or over all prior measurements. "Moving" just signifies
> that the statistic is computed over a "live" data set, i.e., a
> continuous stream of measurements, and the expectation is that the
> stat would be updated in response to new measurements.
>
> I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> good overview of the whole picture.
>
> After considering the discussion so far and some light reading, it
> seems like "Cumulative" is truly the correct term for the all-time
> metrics:
>
> > In a cumulative moving average, the data arrive in an
> > ordered datum stream, and the user would like to get
> > the average of all of the data up until the current datum
> > point.
>
> I know that we previously felt that "cumulative" was too much of a
> mouthful, but it seems like our quest for a terser term led us into a
> briar patch. Also, now there is an independent source (the wiki page)
> indicating that this is indeed the correct term, and it doesn't offer
> any synonyms to choose from. Maybe we can take comfort in the fact
> that we'll rarely be saying the name of the classes out loud.
>
> As far as moving stats that operate over a window of the last N
> measurements, there are multiple options, including Simple
> (unweighted), Weighted, and Exponentially Weighted, and presumably
> infinite variations with other weighting functions. In our domain,
> there is only one weighting function available, but it's still more
> self-documenting and future-proof to specify the type of windowed
> statistic. Therefore, I'm proposing "Simple" as the term for the
> windowed (aka sampled) stats, while keeping Windowed in the name to
> distinguish it from the all-time metrics.
>
> > In financial applications a simple moving average (SMA)
> > is the unweighted mean of the previous n data.
>
> Therefore, we would have the proposed matrix:
>
> SimpleWindowedSum, SimpleWindowedCount
> CumulativeSum, CumulativeCount
>
> Again, all these proposed names are less pithy than we might wish, but
> the whole point of this exercise is to demystify and disambiguate
> them. It seems like the discussion so far illustrates the futility of
> trying to find names that are both short and descriptive.
>
> How does that sound?
> -John
>
> On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax 
> wrote:
> >
> > It's a fair point that Ryanne raises. However, "running sum" is the same
> > as "moving sum" from my understanding.
> >
> > The issue is still, that `Sum` and `Count` which seem to be the cleanest
> > names cannot be used. While I agree that `TotalSum` and `TotalCount` is
> > somewhat redundant, I still think it the best suggestion so far.
> >
> > For the "sampled" version, I am personally fine with either `MovingXxx`,
> > `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
> > describe the semantics.
> >
> >
> >
> > -Matthias
> >
> >
> >
> > On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> > > I'm +1 on Windowed, was about to suggest that as I was catching up on
> the
> > > discussion but Bill beat me to it :)
> > >
> > > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:
> > >
> > >> Hi John,
> > >>
> > >> Thanks for the updates.
> > >>
> > >> I like RunningCount and RunningSum.
> > >>
> > >> What about WindowedCount, WindowedSum instead of Moving?
> > >> I'm just throwing this out there as Windowed seems more intuitive to
> me,
> > >> but I'm not married to the idea.
> > >>
> > >> -Bill
> > >>
> > >> On Tue, Jul 16, 2019 at 5:09 PM John Roesler 
> wrote:
> > >>
> > >>> No worries! Choosing good public API names is a high-impact design
> > >>> activity.
> > >>>
> > >>> Matthias, Bruno, Bill, and Stanislav,
> > >>>
> > >>> You've all contributed to this discussion or the vote so far... How
> do
> > >>> you feel about the proposed name change:
> > >>>
> > >>> MovingCount, MovingSum   (instead of Sampled)
> > >>> RunningCount, RunningSum   

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Bill Bejeck
I'm fine with the new names.

While I was previously in the "brevity" camp when it came to the Cumulative
name, I'll take clarity over brevity.

Thanks for the updates, John.

-Bill

On Wed, Jul 17, 2019 at 11:18 AM John Roesler  wrote:

> Thanks for the discussion, all.
>
> I've done a little more research into the statistical terminology.
> Matthias is correct, "running" and "moving" appear to be synonyms.
> Unfortunately, both can be computed either over a window of the last N
> measurements or over all prior measurements. "Moving" just signifies
> that the statistic is computed over a "live" data set, i.e., a
> continuous stream of measurements, and the expectation is that the
> stat would be updated in response to new measurements.
>
> I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> good overview of the whole picture.
>
> After considering the discussion so far and some light reading, it
> seems like "Cumulative" is truly the correct term for the all-time
> metrics:
>
> > In a cumulative moving average, the data arrive in an
> > ordered datum stream, and the user would like to get
> > the average of all of the data up until the current datum
> > point.
>
> I know that we previously felt that "cumulative" was too much of a
> mouthful, but it seems like our quest for a terser term led us into a
> briar patch. Also, now there is an independent source (the wiki page)
> indicating that this is indeed the correct term, and it doesn't offer
> any synonyms to choose from. Maybe we can take comfort in the fact
> that we'll rarely be saying the name of the classes out loud.
>
> As far as moving stats that operate over a window of the last N
> measurements, there are multiple options, including Simple
> (unweighted), Weighted, and Exponentially Weighted, and presumably
> infinite variations with other weighting functions. In our domain,
> there is only one weighting function available, but it's still more
> self-documenting and future-proof to specify the type of windowed
> statistic. Therefore, I'm proposing "Simple" as the term for the
> windowed (aka sampled) stats, while keeping Windowed in the name to
> distinguish it from the all-time metrics.
>
> > In financial applications a simple moving average (SMA)
> > is the unweighted mean of the previous n data.
>
> Therefore, we would have the proposed matrix:
>
> SimpleWindowedSum, SimpleWindowedCount
> CumulativeSum, CumulativeCount
>
> Again, all these proposed names are less pithy than we might wish, but
> the whole point of this exercise is to demystify and disambiguate
> them. It seems like the discussion so far illustrates the futility of
> trying to find names that are both short and descriptive.
>
> How does that sound?
> -John
>
> On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax 
> wrote:
> >
> > It's a fair point that Ryanne raises. However, "running sum" is the same
> > as "moving sum" from my understanding.
> >
> > The issue is still, that `Sum` and `Count` which seem to be the cleanest
> > names cannot be used. While I agree that `TotalSum` and `TotalCount` is
> > somewhat redundant, I still think it the best suggestion so far.
> >
> > For the "sampled" version, I am personally fine with either `MovingXxx`,
> > `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
> > describe the semantics.
> >
> >
> >
> > -Matthias
> >
> >
> >
> > On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> > > I'm +1 on Windowed, was about to suggest that as I was catching up on
> the
> > > discussion but Bill beat me to it :)
> > >
> > > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:
> > >
> > >> Hi John,
> > >>
> > >> Thanks for the updates.
> > >>
> > >> I like RunningCount and RunningSum.
> > >>
> > >> What about WindowedCount, WindowedSum instead of Moving?
> > >> I'm just throwing this out there as Windowed seems more intuitive to
> me,
> > >> but I'm not married to the idea.
> > >>
> > >> -Bill
> > >>
> > >> On Tue, Jul 16, 2019 at 5:09 PM John Roesler 
> wrote:
> > >>
> > >>> No worries! Choosing good public API names is a high-impact design
> > >>> activity.
> > >>>
> > >>> Matthias, Bruno, Bill, and Stanislav,
> > >>>
> > >>> You've all contributed to this discussion or the vote so far... How
> do
> > >>> you feel about the proposed name change:
> > >>>
> > >>> MovingCount, MovingSum   (instead of Sampled)
> > >>> RunningCount, RunningSum   (Instead of Total)
> > >>>
> > >>> Thanks,
> > >>> -John
> > >>>
> > >>> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan 
> > >>> wrote:
> > 
> >  John, that makes sense to me. Sorry for the bikeshedding.
> > 
> >  Ryanne
> > 
> >  On Tue, Jul 16, 2019 at 12:49 PM John Roesler 
> > >> wrote:
> > 
> > > Thanks for the explanation and the suggestion, Ryanne,
> > >
> > > I went with "sampled" just because these are instances of
> > >> SampledStat,
> > > which in the Kafka Metrics ecosystem are computed from a window of
> > > recent samples. 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread John Roesler
Thanks for the discussion, all.

I've done a little more research into the statistical terminology.
Matthias is correct, "running" and "moving" appear to be synonyms.
Unfortunately, both can be computed either over a window of the last N
measurements or over all prior measurements. "Moving" just signifies
that the statistic is computed over a "live" data set, i.e., a
continuous stream of measurements, and the expectation is that the
stat would be updated in response to new measurements.

I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
good overview of the whole picture.

After considering the discussion so far and some light reading, it
seems like "Cumulative" is truly the correct term for the all-time
metrics:

> In a cumulative moving average, the data arrive in an
> ordered datum stream, and the user would like to get
> the average of all of the data up until the current datum
> point.

I know that we previously felt that "cumulative" was too much of a
mouthful, but it seems like our quest for a terser term led us into a
briar patch. Also, now there is an independent source (the wiki page)
indicating that this is indeed the correct term, and it doesn't offer
any synonyms to choose from. Maybe we can take comfort in the fact
that we'll rarely be saying the name of the classes out loud.

As far as moving stats that operate over a window of the last N
measurements, there are multiple options, including Simple
(unweighted), Weighted, and Exponentially Weighted, and presumably
infinite variations with other weighting functions. In our domain,
there is only one weighting function available, but it's still more
self-documenting and future-proof to specify the type of windowed
statistic. Therefore, I'm proposing "Simple" as the term for the
windowed (aka sampled) stats, while keeping Windowed in the name to
distinguish it from the all-time metrics.

> In financial applications a simple moving average (SMA)
> is the unweighted mean of the previous n data.

Therefore, we would have the proposed matrix:

SimpleWindowedSum, SimpleWindowedCount
CumulativeSum, CumulativeCount

Again, all these proposed names are less pithy than we might wish, but
the whole point of this exercise is to demystify and disambiguate
them. It seems like the discussion so far illustrates the futility of
trying to find names that are both short and descriptive.

How does that sound?
-John

On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax  wrote:
>
> It's a fair point that Ryanne raises. However, "running sum" is the same
> as "moving sum" from my understanding.
>
> The issue is still, that `Sum` and `Count` which seem to be the cleanest
> names cannot be used. While I agree that `TotalSum` and `TotalCount` is
> somewhat redundant, I still think it the best suggestion so far.
>
> For the "sampled" version, I am personally fine with either `MovingXxx`,
> `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
> describe the semantics.
>
>
>
> -Matthias
>
>
>
> On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> > I'm +1 on Windowed, was about to suggest that as I was catching up on the
> > discussion but Bill beat me to it :)
> >
> > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:
> >
> >> Hi John,
> >>
> >> Thanks for the updates.
> >>
> >> I like RunningCount and RunningSum.
> >>
> >> What about WindowedCount, WindowedSum instead of Moving?
> >> I'm just throwing this out there as Windowed seems more intuitive to me,
> >> but I'm not married to the idea.
> >>
> >> -Bill
> >>
> >> On Tue, Jul 16, 2019 at 5:09 PM John Roesler  wrote:
> >>
> >>> No worries! Choosing good public API names is a high-impact design
> >>> activity.
> >>>
> >>> Matthias, Bruno, Bill, and Stanislav,
> >>>
> >>> You've all contributed to this discussion or the vote so far... How do
> >>> you feel about the proposed name change:
> >>>
> >>> MovingCount, MovingSum   (instead of Sampled)
> >>> RunningCount, RunningSum   (Instead of Total)
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan 
> >>> wrote:
> 
>  John, that makes sense to me. Sorry for the bikeshedding.
> 
>  Ryanne
> 
>  On Tue, Jul 16, 2019 at 12:49 PM John Roesler 
> >> wrote:
> 
> > Thanks for the explanation and the suggestion, Ryanne,
> >
> > I went with "sampled" just because these are instances of
> >> SampledStat,
> > which in the Kafka Metrics ecosystem are computed from a window of
> > recent samples. Thinking more about it, the fact that they are
> >> sampled
> > and the fact that they are windowed are orthogonal, which is what
> > you're pointing out... sampling by itself doesn't indicate that it's
> >> a
> > moving average.
> >
> > Since there is no way in Kafka Metrics for a metric to be sampled and
> > not windowed/moving/decaying, calling them Sampled would never be
> > incorrect. But to someone unfamiliar with the code, it wouldn't
> > 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-16 Thread Matthias J. Sax
It's a fair point that Ryanne raises. However, "running sum" is the same
as "moving sum" from my understanding.

The issue is still, that `Sum` and `Count` which seem to be the cleanest
names cannot be used. While I agree that `TotalSum` and `TotalCount` is
somewhat redundant, I still think it the best suggestion so far.

For the "sampled" version, I am personally fine with either `MovingXxx`,
`WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
describe the semantics.



-Matthias



On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> I'm +1 on Windowed, was about to suggest that as I was catching up on the
> discussion but Bill beat me to it :)
> 
> On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:
> 
>> Hi John,
>>
>> Thanks for the updates.
>>
>> I like RunningCount and RunningSum.
>>
>> What about WindowedCount, WindowedSum instead of Moving?
>> I'm just throwing this out there as Windowed seems more intuitive to me,
>> but I'm not married to the idea.
>>
>> -Bill
>>
>> On Tue, Jul 16, 2019 at 5:09 PM John Roesler  wrote:
>>
>>> No worries! Choosing good public API names is a high-impact design
>>> activity.
>>>
>>> Matthias, Bruno, Bill, and Stanislav,
>>>
>>> You've all contributed to this discussion or the vote so far... How do
>>> you feel about the proposed name change:
>>>
>>> MovingCount, MovingSum   (instead of Sampled)
>>> RunningCount, RunningSum   (Instead of Total)
>>>
>>> Thanks,
>>> -John
>>>
>>> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan 
>>> wrote:

 John, that makes sense to me. Sorry for the bikeshedding.

 Ryanne

 On Tue, Jul 16, 2019 at 12:49 PM John Roesler 
>> wrote:

> Thanks for the explanation and the suggestion, Ryanne,
>
> I went with "sampled" just because these are instances of
>> SampledStat,
> which in the Kafka Metrics ecosystem are computed from a window of
> recent samples. Thinking more about it, the fact that they are
>> sampled
> and the fact that they are windowed are orthogonal, which is what
> you're pointing out... sampling by itself doesn't indicate that it's
>> a
> moving average.
>
> Since there is no way in Kafka Metrics for a metric to be sampled and
> not windowed/moving/decaying, calling them Sampled would never be
> incorrect. But to someone unfamiliar with the code, it wouldn't
> immediately suggest the behavior of the metric that actually matters.
> That is, the behavior that distinguishes the two classes of metrics
>> we
> want to disambiguate here.
>
> It sounds like you'd suggest a new matrix of names:
> MovingCount, MovingSum
> RunningCount, RunningSum
>
> Are these names unambiguous and self explanatory?
>
> Thanks,
> -John
>
> On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan >>
> wrote:
>>
>>> measurements, which decay/expire over time
>>
>> Thanks John for the clarification. This was my (re-)reading of the
>>> code,
>> but this is not what I think of when I hear "sampled". In fact,
>>> you'll
>> notice that the Wikipedia pages for "Sample (statistics)" and
>> "Sample
>> (signal processing)" do not contain the words decay, expire,
>> recent,
>> history, or anything similar.
>>
>> Similar to "running", I'd suggest the more correct "moving", as in
> "moving
>> average" and "moving sum", which involve looking back N samples,
> applying a
>> sliding window, decaying over time etc.
>>
>> Ryanne
>>
>> On Tue, Jul 16, 2019, 11:58 AM John Roesler 
>>> wrote:
>>
>>> Thanks for raising this concern, Ryanne,
>>>
>>> "Sampled" indicates that the metrics is sampled, namely that we
>>> maintain a set of samples from recent value measurements, which
>>> decay/expire over time. So, the metric value is only
>>> representative of
>>> the recent past.
>>>
>>> "Total" indicates that the metric value contains all the
>>> information
>>> from the creation of the metric. For example., the total sum
>> would
>>> include all measurements since the app started up.
>>>
>>> It seems like your concern is that the word "total" doesn't
>> really
>>> pinpoint this meaning, which is true. It's especially confusing
>>> that
>>> another meaning of "total" is synonymous with "sum", rendering
>> the
>>> name "TotalSum" sort of absurd.
>>>
>>> We previously considered "cumulative", which was rejected as a
>>> mouthful (it's four syllables) .
>>>
>>> You mentioned "running", which might be a more appropriate
>> modifier
>>> (RunningSum and RunningCount).
>>>
>>> What would everyone think about that?
>>>
>>> Thanks,
>>> -John
>>>
>>> On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan <
>>> ryannedo...@gmail.com>
>>> wrote:

 John, I mentioned on the VOTE thread that the proposed names
>> are
>>> a
> bit
 confusing,


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-16 Thread Sophie Blee-Goldman
I'm +1 on Windowed, was about to suggest that as I was catching up on the
discussion but Bill beat me to it :)

On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:

> Hi John,
>
> Thanks for the updates.
>
> I like RunningCount and RunningSum.
>
> What about WindowedCount, WindowedSum instead of Moving?
> I'm just throwing this out there as Windowed seems more intuitive to me,
> but I'm not married to the idea.
>
> -Bill
>
> On Tue, Jul 16, 2019 at 5:09 PM John Roesler  wrote:
>
> > No worries! Choosing good public API names is a high-impact design
> > activity.
> >
> > Matthias, Bruno, Bill, and Stanislav,
> >
> > You've all contributed to this discussion or the vote so far... How do
> > you feel about the proposed name change:
> >
> > MovingCount, MovingSum   (instead of Sampled)
> > RunningCount, RunningSum   (Instead of Total)
> >
> > Thanks,
> > -John
> >
> > On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan 
> > wrote:
> > >
> > > John, that makes sense to me. Sorry for the bikeshedding.
> > >
> > > Ryanne
> > >
> > > On Tue, Jul 16, 2019 at 12:49 PM John Roesler 
> wrote:
> > >
> > > > Thanks for the explanation and the suggestion, Ryanne,
> > > >
> > > > I went with "sampled" just because these are instances of
> SampledStat,
> > > > which in the Kafka Metrics ecosystem are computed from a window of
> > > > recent samples. Thinking more about it, the fact that they are
> sampled
> > > > and the fact that they are windowed are orthogonal, which is what
> > > > you're pointing out... sampling by itself doesn't indicate that it's
> a
> > > > moving average.
> > > >
> > > > Since there is no way in Kafka Metrics for a metric to be sampled and
> > > > not windowed/moving/decaying, calling them Sampled would never be
> > > > incorrect. But to someone unfamiliar with the code, it wouldn't
> > > > immediately suggest the behavior of the metric that actually matters.
> > > > That is, the behavior that distinguishes the two classes of metrics
> we
> > > > want to disambiguate here.
> > > >
> > > > It sounds like you'd suggest a new matrix of names:
> > > > MovingCount, MovingSum
> > > > RunningCount, RunningSum
> > > >
> > > > Are these names unambiguous and self explanatory?
> > > >
> > > > Thanks,
> > > > -John
> > > >
> > > > On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan  >
> > > > wrote:
> > > > >
> > > > > > measurements, which decay/expire over time
> > > > >
> > > > > Thanks John for the clarification. This was my (re-)reading of the
> > code,
> > > > > but this is not what I think of when I hear "sampled". In fact,
> > you'll
> > > > > notice that the Wikipedia pages for "Sample (statistics)" and
> "Sample
> > > > > (signal processing)" do not contain the words decay, expire,
> recent,
> > > > > history, or anything similar.
> > > > >
> > > > > Similar to "running", I'd suggest the more correct "moving", as in
> > > > "moving
> > > > > average" and "moving sum", which involve looking back N samples,
> > > > applying a
> > > > > sliding window, decaying over time etc.
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Tue, Jul 16, 2019, 11:58 AM John Roesler 
> > wrote:
> > > > >
> > > > > > Thanks for raising this concern, Ryanne,
> > > > > >
> > > > > > "Sampled" indicates that the metrics is sampled, namely that we
> > > > > > maintain a set of samples from recent value measurements, which
> > > > > > decay/expire over time. So, the metric value is only
> > representative of
> > > > > > the recent past.
> > > > > >
> > > > > > "Total" indicates that the metric value contains all the
> > information
> > > > > > from the creation of the metric. For example., the total sum
> would
> > > > > > include all measurements since the app started up.
> > > > > >
> > > > > > It seems like your concern is that the word "total" doesn't
> really
> > > > > > pinpoint this meaning, which is true. It's especially confusing
> > that
> > > > > > another meaning of "total" is synonymous with "sum", rendering
> the
> > > > > > name "TotalSum" sort of absurd.
> > > > > >
> > > > > > We previously considered "cumulative", which was rejected as a
> > > > > > mouthful (it's four syllables) .
> > > > > >
> > > > > > You mentioned "running", which might be a more appropriate
> modifier
> > > > > > (RunningSum and RunningCount).
> > > > > >
> > > > > > What would everyone think about that?
> > > > > >
> > > > > > Thanks,
> > > > > > -John
> > > > > >
> > > > > > On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan <
> > ryannedo...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > John, I mentioned on the VOTE thread that the proposed names
> are
> > a
> > > > bit
> > > > > > > confusing,
> > > > > > >
> > > > > > > > given that "sum", "total", and "count" are roughly
> > synonymous...
> > > > > > >
> > > > > > > In particular, TotalSum is, I think, a "running total", though
> > the
> > > > naming
> > > > > > > doesn't really capture that.
> > > > > > >
> > > > > > > I think to avoid confusion, we should define exactly what
> > 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-16 Thread Bill Bejeck
Hi John,

Thanks for the updates.

I like RunningCount and RunningSum.

What about WindowedCount, WindowedSum instead of Moving?
I'm just throwing this out there as Windowed seems more intuitive to me,
but I'm not married to the idea.

-Bill

On Tue, Jul 16, 2019 at 5:09 PM John Roesler  wrote:

> No worries! Choosing good public API names is a high-impact design
> activity.
>
> Matthias, Bruno, Bill, and Stanislav,
>
> You've all contributed to this discussion or the vote so far... How do
> you feel about the proposed name change:
>
> MovingCount, MovingSum   (instead of Sampled)
> RunningCount, RunningSum   (Instead of Total)
>
> Thanks,
> -John
>
> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan 
> wrote:
> >
> > John, that makes sense to me. Sorry for the bikeshedding.
> >
> > Ryanne
> >
> > On Tue, Jul 16, 2019 at 12:49 PM John Roesler  wrote:
> >
> > > Thanks for the explanation and the suggestion, Ryanne,
> > >
> > > I went with "sampled" just because these are instances of SampledStat,
> > > which in the Kafka Metrics ecosystem are computed from a window of
> > > recent samples. Thinking more about it, the fact that they are sampled
> > > and the fact that they are windowed are orthogonal, which is what
> > > you're pointing out... sampling by itself doesn't indicate that it's a
> > > moving average.
> > >
> > > Since there is no way in Kafka Metrics for a metric to be sampled and
> > > not windowed/moving/decaying, calling them Sampled would never be
> > > incorrect. But to someone unfamiliar with the code, it wouldn't
> > > immediately suggest the behavior of the metric that actually matters.
> > > That is, the behavior that distinguishes the two classes of metrics we
> > > want to disambiguate here.
> > >
> > > It sounds like you'd suggest a new matrix of names:
> > > MovingCount, MovingSum
> > > RunningCount, RunningSum
> > >
> > > Are these names unambiguous and self explanatory?
> > >
> > > Thanks,
> > > -John
> > >
> > > On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan 
> > > wrote:
> > > >
> > > > > measurements, which decay/expire over time
> > > >
> > > > Thanks John for the clarification. This was my (re-)reading of the
> code,
> > > > but this is not what I think of when I hear "sampled". In fact,
> you'll
> > > > notice that the Wikipedia pages for "Sample (statistics)" and "Sample
> > > > (signal processing)" do not contain the words decay, expire, recent,
> > > > history, or anything similar.
> > > >
> > > > Similar to "running", I'd suggest the more correct "moving", as in
> > > "moving
> > > > average" and "moving sum", which involve looking back N samples,
> > > applying a
> > > > sliding window, decaying over time etc.
> > > >
> > > > Ryanne
> > > >
> > > > On Tue, Jul 16, 2019, 11:58 AM John Roesler 
> wrote:
> > > >
> > > > > Thanks for raising this concern, Ryanne,
> > > > >
> > > > > "Sampled" indicates that the metrics is sampled, namely that we
> > > > > maintain a set of samples from recent value measurements, which
> > > > > decay/expire over time. So, the metric value is only
> representative of
> > > > > the recent past.
> > > > >
> > > > > "Total" indicates that the metric value contains all the
> information
> > > > > from the creation of the metric. For example., the total sum would
> > > > > include all measurements since the app started up.
> > > > >
> > > > > It seems like your concern is that the word "total" doesn't really
> > > > > pinpoint this meaning, which is true. It's especially confusing
> that
> > > > > another meaning of "total" is synonymous with "sum", rendering the
> > > > > name "TotalSum" sort of absurd.
> > > > >
> > > > > We previously considered "cumulative", which was rejected as a
> > > > > mouthful (it's four syllables) .
> > > > >
> > > > > You mentioned "running", which might be a more appropriate modifier
> > > > > (RunningSum and RunningCount).
> > > > >
> > > > > What would everyone think about that?
> > > > >
> > > > > Thanks,
> > > > > -John
> > > > >
> > > > > On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan <
> ryannedo...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > John, I mentioned on the VOTE thread that the proposed names are
> a
> > > bit
> > > > > > confusing,
> > > > > >
> > > > > > > given that "sum", "total", and "count" are roughly
> synonymous...
> > > > > >
> > > > > > In particular, TotalSum is, I think, a "running total", though
> the
> > > naming
> > > > > > doesn't really capture that.
> > > > > >
> > > > > > I think to avoid confusion, we should define exactly what
> "total" and
> > > > > > "sampled" are supposed to indicate, and perhaps pick appropriate
> > > naming
> > > > > > from there.
> > > > > >
> > > > > > Ryanne
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 12, 2019 at 1:42 PM John Roesler 
> > > wrote:
> > > > > >
> > > > > > > Hey, thanks Matthias and Bruno,
> > > > > > >
> > > > > > > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to
> me.
> > > > > > >
> > > > > > > Also, yes, I would have 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-16 Thread John Roesler
No worries! Choosing good public API names is a high-impact design activity.

Matthias, Bruno, Bill, and Stanislav,

You've all contributed to this discussion or the vote so far... How do
you feel about the proposed name change:

MovingCount, MovingSum   (instead of Sampled)
RunningCount, RunningSum   (Instead of Total)

Thanks,
-John

On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan  wrote:
>
> John, that makes sense to me. Sorry for the bikeshedding.
>
> Ryanne
>
> On Tue, Jul 16, 2019 at 12:49 PM John Roesler  wrote:
>
> > Thanks for the explanation and the suggestion, Ryanne,
> >
> > I went with "sampled" just because these are instances of SampledStat,
> > which in the Kafka Metrics ecosystem are computed from a window of
> > recent samples. Thinking more about it, the fact that they are sampled
> > and the fact that they are windowed are orthogonal, which is what
> > you're pointing out... sampling by itself doesn't indicate that it's a
> > moving average.
> >
> > Since there is no way in Kafka Metrics for a metric to be sampled and
> > not windowed/moving/decaying, calling them Sampled would never be
> > incorrect. But to someone unfamiliar with the code, it wouldn't
> > immediately suggest the behavior of the metric that actually matters.
> > That is, the behavior that distinguishes the two classes of metrics we
> > want to disambiguate here.
> >
> > It sounds like you'd suggest a new matrix of names:
> > MovingCount, MovingSum
> > RunningCount, RunningSum
> >
> > Are these names unambiguous and self explanatory?
> >
> > Thanks,
> > -John
> >
> > On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan 
> > wrote:
> > >
> > > > measurements, which decay/expire over time
> > >
> > > Thanks John for the clarification. This was my (re-)reading of the code,
> > > but this is not what I think of when I hear "sampled". In fact, you'll
> > > notice that the Wikipedia pages for "Sample (statistics)" and "Sample
> > > (signal processing)" do not contain the words decay, expire, recent,
> > > history, or anything similar.
> > >
> > > Similar to "running", I'd suggest the more correct "moving", as in
> > "moving
> > > average" and "moving sum", which involve looking back N samples,
> > applying a
> > > sliding window, decaying over time etc.
> > >
> > > Ryanne
> > >
> > > On Tue, Jul 16, 2019, 11:58 AM John Roesler  wrote:
> > >
> > > > Thanks for raising this concern, Ryanne,
> > > >
> > > > "Sampled" indicates that the metrics is sampled, namely that we
> > > > maintain a set of samples from recent value measurements, which
> > > > decay/expire over time. So, the metric value is only representative of
> > > > the recent past.
> > > >
> > > > "Total" indicates that the metric value contains all the information
> > > > from the creation of the metric. For example., the total sum would
> > > > include all measurements since the app started up.
> > > >
> > > > It seems like your concern is that the word "total" doesn't really
> > > > pinpoint this meaning, which is true. It's especially confusing that
> > > > another meaning of "total" is synonymous with "sum", rendering the
> > > > name "TotalSum" sort of absurd.
> > > >
> > > > We previously considered "cumulative", which was rejected as a
> > > > mouthful (it's four syllables) .
> > > >
> > > > You mentioned "running", which might be a more appropriate modifier
> > > > (RunningSum and RunningCount).
> > > >
> > > > What would everyone think about that?
> > > >
> > > > Thanks,
> > > > -John
> > > >
> > > > On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan 
> > > > wrote:
> > > > >
> > > > > John, I mentioned on the VOTE thread that the proposed names are a
> > bit
> > > > > confusing,
> > > > >
> > > > > > given that "sum", "total", and "count" are roughly synonymous...
> > > > >
> > > > > In particular, TotalSum is, I think, a "running total", though the
> > naming
> > > > > doesn't really capture that.
> > > > >
> > > > > I think to avoid confusion, we should define exactly what "total" and
> > > > > "sampled" are supposed to indicate, and perhaps pick appropriate
> > naming
> > > > > from there.
> > > > >
> > > > > Ryanne
> > > > >
> > > > >
> > > > > On Fri, Jul 12, 2019 at 1:42 PM John Roesler 
> > wrote:
> > > > >
> > > > > > Hey, thanks Matthias and Bruno,
> > > > > >
> > > > > > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to me.
> > > > > >
> > > > > > Also, yes, I would have liked to not have any modifier for
> > > > > > "non-sampled", but there is a name conflict with Sum.
> > > > > >
> > > > > > I'll update the KIP to reflect "TotalX" and then start the vote
> > thread.
> > > > > >
> > > > > > Thanks again,
> > > > > > -John
> > > > > >
> > > > > > On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna  > >
> > > > wrote:
> > > > > > >
> > > > > > > OK, makes sense. Then, I am in favour of TotalCount and TotalSum.
> > > > > > >
> > > > > > > Best,
> > > > > > > Bruno
> > > > > > >
> > > > > > > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax <
> > > > 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-16 Thread Ryanne Dolan
John, that makes sense to me. Sorry for the bikeshedding.

Ryanne

On Tue, Jul 16, 2019 at 12:49 PM John Roesler  wrote:

> Thanks for the explanation and the suggestion, Ryanne,
>
> I went with "sampled" just because these are instances of SampledStat,
> which in the Kafka Metrics ecosystem are computed from a window of
> recent samples. Thinking more about it, the fact that they are sampled
> and the fact that they are windowed are orthogonal, which is what
> you're pointing out... sampling by itself doesn't indicate that it's a
> moving average.
>
> Since there is no way in Kafka Metrics for a metric to be sampled and
> not windowed/moving/decaying, calling them Sampled would never be
> incorrect. But to someone unfamiliar with the code, it wouldn't
> immediately suggest the behavior of the metric that actually matters.
> That is, the behavior that distinguishes the two classes of metrics we
> want to disambiguate here.
>
> It sounds like you'd suggest a new matrix of names:
> MovingCount, MovingSum
> RunningCount, RunningSum
>
> Are these names unambiguous and self explanatory?
>
> Thanks,
> -John
>
> On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan 
> wrote:
> >
> > > measurements, which decay/expire over time
> >
> > Thanks John for the clarification. This was my (re-)reading of the code,
> > but this is not what I think of when I hear "sampled". In fact, you'll
> > notice that the Wikipedia pages for "Sample (statistics)" and "Sample
> > (signal processing)" do not contain the words decay, expire, recent,
> > history, or anything similar.
> >
> > Similar to "running", I'd suggest the more correct "moving", as in
> "moving
> > average" and "moving sum", which involve looking back N samples,
> applying a
> > sliding window, decaying over time etc.
> >
> > Ryanne
> >
> > On Tue, Jul 16, 2019, 11:58 AM John Roesler  wrote:
> >
> > > Thanks for raising this concern, Ryanne,
> > >
> > > "Sampled" indicates that the metrics is sampled, namely that we
> > > maintain a set of samples from recent value measurements, which
> > > decay/expire over time. So, the metric value is only representative of
> > > the recent past.
> > >
> > > "Total" indicates that the metric value contains all the information
> > > from the creation of the metric. For example., the total sum would
> > > include all measurements since the app started up.
> > >
> > > It seems like your concern is that the word "total" doesn't really
> > > pinpoint this meaning, which is true. It's especially confusing that
> > > another meaning of "total" is synonymous with "sum", rendering the
> > > name "TotalSum" sort of absurd.
> > >
> > > We previously considered "cumulative", which was rejected as a
> > > mouthful (it's four syllables) .
> > >
> > > You mentioned "running", which might be a more appropriate modifier
> > > (RunningSum and RunningCount).
> > >
> > > What would everyone think about that?
> > >
> > > Thanks,
> > > -John
> > >
> > > On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan 
> > > wrote:
> > > >
> > > > John, I mentioned on the VOTE thread that the proposed names are a
> bit
> > > > confusing,
> > > >
> > > > > given that "sum", "total", and "count" are roughly synonymous...
> > > >
> > > > In particular, TotalSum is, I think, a "running total", though the
> naming
> > > > doesn't really capture that.
> > > >
> > > > I think to avoid confusion, we should define exactly what "total" and
> > > > "sampled" are supposed to indicate, and perhaps pick appropriate
> naming
> > > > from there.
> > > >
> > > > Ryanne
> > > >
> > > >
> > > > On Fri, Jul 12, 2019 at 1:42 PM John Roesler 
> wrote:
> > > >
> > > > > Hey, thanks Matthias and Bruno,
> > > > >
> > > > > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to me.
> > > > >
> > > > > Also, yes, I would have liked to not have any modifier for
> > > > > "non-sampled", but there is a name conflict with Sum.
> > > > >
> > > > > I'll update the KIP to reflect "TotalX" and then start the vote
> thread.
> > > > >
> > > > > Thanks again,
> > > > > -John
> > > > >
> > > > > On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna  >
> > > wrote:
> > > > > >
> > > > > > OK, makes sense. Then, I am in favour of TotalCount and TotalSum.
> > > > > >
> > > > > > Best,
> > > > > > Bruno
> > > > > >
> > > > > > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax <
> > > matth...@confluent.io>
> > > > > wrote:
> > > > > > >
> > > > > > > `Sum` is an existing name, for the "sampled sum" metric, that
> gets
> > > > > > > deprecated. Hence, we cannot use it.
> > > > > > >
> > > > > > > If we cannot use `Sum` and use `TotalSum`, we should also not
> use
> > > > > > > `Count` but `TotalCount` for consistency.
> > > > > > >
> > > > > > >
> > > > > > > -Matthias
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > > > > > > Hi John,
> > > > > > > >
> > > > > > > > Thank you for the KIP.
> > > > > > > >
> > > > > > > > LGTM
> > > > > > > >
> > > > > > > > I also do 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-16 Thread John Roesler
Thanks for the explanation and the suggestion, Ryanne,

I went with "sampled" just because these are instances of SampledStat,
which in the Kafka Metrics ecosystem are computed from a window of
recent samples. Thinking more about it, the fact that they are sampled
and the fact that they are windowed are orthogonal, which is what
you're pointing out... sampling by itself doesn't indicate that it's a
moving average.

Since there is no way in Kafka Metrics for a metric to be sampled and
not windowed/moving/decaying, calling them Sampled would never be
incorrect. But to someone unfamiliar with the code, it wouldn't
immediately suggest the behavior of the metric that actually matters.
That is, the behavior that distinguishes the two classes of metrics we
want to disambiguate here.

It sounds like you'd suggest a new matrix of names:
MovingCount, MovingSum
RunningCount, RunningSum

Are these names unambiguous and self explanatory?

Thanks,
-John

On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan  wrote:
>
> > measurements, which decay/expire over time
>
> Thanks John for the clarification. This was my (re-)reading of the code,
> but this is not what I think of when I hear "sampled". In fact, you'll
> notice that the Wikipedia pages for "Sample (statistics)" and "Sample
> (signal processing)" do not contain the words decay, expire, recent,
> history, or anything similar.
>
> Similar to "running", I'd suggest the more correct "moving", as in "moving
> average" and "moving sum", which involve looking back N samples, applying a
> sliding window, decaying over time etc.
>
> Ryanne
>
> On Tue, Jul 16, 2019, 11:58 AM John Roesler  wrote:
>
> > Thanks for raising this concern, Ryanne,
> >
> > "Sampled" indicates that the metrics is sampled, namely that we
> > maintain a set of samples from recent value measurements, which
> > decay/expire over time. So, the metric value is only representative of
> > the recent past.
> >
> > "Total" indicates that the metric value contains all the information
> > from the creation of the metric. For example., the total sum would
> > include all measurements since the app started up.
> >
> > It seems like your concern is that the word "total" doesn't really
> > pinpoint this meaning, which is true. It's especially confusing that
> > another meaning of "total" is synonymous with "sum", rendering the
> > name "TotalSum" sort of absurd.
> >
> > We previously considered "cumulative", which was rejected as a
> > mouthful (it's four syllables) .
> >
> > You mentioned "running", which might be a more appropriate modifier
> > (RunningSum and RunningCount).
> >
> > What would everyone think about that?
> >
> > Thanks,
> > -John
> >
> > On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan 
> > wrote:
> > >
> > > John, I mentioned on the VOTE thread that the proposed names are a bit
> > > confusing,
> > >
> > > > given that "sum", "total", and "count" are roughly synonymous...
> > >
> > > In particular, TotalSum is, I think, a "running total", though the naming
> > > doesn't really capture that.
> > >
> > > I think to avoid confusion, we should define exactly what "total" and
> > > "sampled" are supposed to indicate, and perhaps pick appropriate naming
> > > from there.
> > >
> > > Ryanne
> > >
> > >
> > > On Fri, Jul 12, 2019 at 1:42 PM John Roesler  wrote:
> > >
> > > > Hey, thanks Matthias and Bruno,
> > > >
> > > > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to me.
> > > >
> > > > Also, yes, I would have liked to not have any modifier for
> > > > "non-sampled", but there is a name conflict with Sum.
> > > >
> > > > I'll update the KIP to reflect "TotalX" and then start the vote thread.
> > > >
> > > > Thanks again,
> > > > -John
> > > >
> > > > On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna 
> > wrote:
> > > > >
> > > > > OK, makes sense. Then, I am in favour of TotalCount and TotalSum.
> > > > >
> > > > > Best,
> > > > > Bruno
> > > > >
> > > > > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax <
> > matth...@confluent.io>
> > > > wrote:
> > > > > >
> > > > > > `Sum` is an existing name, for the "sampled sum" metric, that gets
> > > > > > deprecated. Hence, we cannot use it.
> > > > > >
> > > > > > If we cannot use `Sum` and use `TotalSum`, we should also not use
> > > > > > `Count` but `TotalCount` for consistency.
> > > > > >
> > > > > >
> > > > > > -Matthias
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > > > > > Hi John,
> > > > > > >
> > > > > > > Thank you for the KIP.
> > > > > > >
> > > > > > > LGTM
> > > > > > >
> > > > > > > I also do not like CumulativeSum/Count so much. I propose to just
> > > > call
> > > > > > > it Sum and Count.
> > > > > > >
> > > > > > > I understand that you want to unequivocally distinguish the two
> > > > metric
> > > > > > > functions by their names, but I have the feeling the names become
> > > > > > > artificially complex. The exact semantics can also be documented
> > in
> > > > > > > the javadocs, 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-16 Thread Ryanne Dolan
> measurements, which decay/expire over time

Thanks John for the clarification. This was my (re-)reading of the code,
but this is not what I think of when I hear "sampled". In fact, you'll
notice that the Wikipedia pages for "Sample (statistics)" and "Sample
(signal processing)" do not contain the words decay, expire, recent,
history, or anything similar.

Similar to "running", I'd suggest the more correct "moving", as in "moving
average" and "moving sum", which involve looking back N samples, applying a
sliding window, decaying over time etc.

Ryanne

On Tue, Jul 16, 2019, 11:58 AM John Roesler  wrote:

> Thanks for raising this concern, Ryanne,
>
> "Sampled" indicates that the metrics is sampled, namely that we
> maintain a set of samples from recent value measurements, which
> decay/expire over time. So, the metric value is only representative of
> the recent past.
>
> "Total" indicates that the metric value contains all the information
> from the creation of the metric. For example., the total sum would
> include all measurements since the app started up.
>
> It seems like your concern is that the word "total" doesn't really
> pinpoint this meaning, which is true. It's especially confusing that
> another meaning of "total" is synonymous with "sum", rendering the
> name "TotalSum" sort of absurd.
>
> We previously considered "cumulative", which was rejected as a
> mouthful (it's four syllables) .
>
> You mentioned "running", which might be a more appropriate modifier
> (RunningSum and RunningCount).
>
> What would everyone think about that?
>
> Thanks,
> -John
>
> On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan 
> wrote:
> >
> > John, I mentioned on the VOTE thread that the proposed names are a bit
> > confusing,
> >
> > > given that "sum", "total", and "count" are roughly synonymous...
> >
> > In particular, TotalSum is, I think, a "running total", though the naming
> > doesn't really capture that.
> >
> > I think to avoid confusion, we should define exactly what "total" and
> > "sampled" are supposed to indicate, and perhaps pick appropriate naming
> > from there.
> >
> > Ryanne
> >
> >
> > On Fri, Jul 12, 2019 at 1:42 PM John Roesler  wrote:
> >
> > > Hey, thanks Matthias and Bruno,
> > >
> > > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to me.
> > >
> > > Also, yes, I would have liked to not have any modifier for
> > > "non-sampled", but there is a name conflict with Sum.
> > >
> > > I'll update the KIP to reflect "TotalX" and then start the vote thread.
> > >
> > > Thanks again,
> > > -John
> > >
> > > On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna 
> wrote:
> > > >
> > > > OK, makes sense. Then, I am in favour of TotalCount and TotalSum.
> > > >
> > > > Best,
> > > > Bruno
> > > >
> > > > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > > > >
> > > > > `Sum` is an existing name, for the "sampled sum" metric, that gets
> > > > > deprecated. Hence, we cannot use it.
> > > > >
> > > > > If we cannot use `Sum` and use `TotalSum`, we should also not use
> > > > > `Count` but `TotalCount` for consistency.
> > > > >
> > > > >
> > > > > -Matthias
> > > > >
> > > > >
> > > > >
> > > > > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > > > > Hi John,
> > > > > >
> > > > > > Thank you for the KIP.
> > > > > >
> > > > > > LGTM
> > > > > >
> > > > > > I also do not like CumulativeSum/Count so much. I propose to just
> > > call
> > > > > > it Sum and Count.
> > > > > >
> > > > > > I understand that you want to unequivocally distinguish the two
> > > metric
> > > > > > functions by their names, but I have the feeling the names become
> > > > > > artificially complex. The exact semantics can also be documented
> in
> > > > > > the javadocs, which btw could also be improved in those classes.
> > > > > >
> > > > > > Best,
> > > > > > Bruno
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax <
> > > matth...@confluent.io> wrote:
> > > > > >>
> > > > > >> Thanks for the KIP. Overall LGTM.
> > > > > >>
> > > > > >> The only though I have is, if we may want to use `TotalSum` and
> > > > > >> `TotalCount` instead of `CumulativeSum/Count` as names?
> > > > > >>
> > > > > >>
> > > > > >> -Matthias
> > > > > >>
> > > > > >>
> > > > > >> On 7/11/19 9:31 AM, John Roesler wrote:
> > > > > >>> Hi Kafka devs,
> > > > > >>>
> > > > > >>> I'd like to propose KIP-488 as a minor cleanup of some of our
> > > metric
> > > > > >>> implementations.
> > > > > >>>
> > > > > >>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> > > > > >>>
> > > > > >>> Over time, iterative updates to these metrics has resulted in a
> > > pretty
> > > > > >>> confusing little collection of classes, and I've personally
> been
> > > > > >>> involved in three separate moderately time-consuming
> iterations of
> > > me
> > > > > >>> or someone else trying to work out which metrics are
> available, and
> > > > > >>> which ones are desired for a given use 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-16 Thread John Roesler
Thanks for raising this concern, Ryanne,

"Sampled" indicates that the metrics is sampled, namely that we
maintain a set of samples from recent value measurements, which
decay/expire over time. So, the metric value is only representative of
the recent past.

"Total" indicates that the metric value contains all the information
from the creation of the metric. For example., the total sum would
include all measurements since the app started up.

It seems like your concern is that the word "total" doesn't really
pinpoint this meaning, which is true. It's especially confusing that
another meaning of "total" is synonymous with "sum", rendering the
name "TotalSum" sort of absurd.

We previously considered "cumulative", which was rejected as a
mouthful (it's four syllables) .

You mentioned "running", which might be a more appropriate modifier
(RunningSum and RunningCount).

What would everyone think about that?

Thanks,
-John

On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan  wrote:
>
> John, I mentioned on the VOTE thread that the proposed names are a bit
> confusing,
>
> > given that "sum", "total", and "count" are roughly synonymous...
>
> In particular, TotalSum is, I think, a "running total", though the naming
> doesn't really capture that.
>
> I think to avoid confusion, we should define exactly what "total" and
> "sampled" are supposed to indicate, and perhaps pick appropriate naming
> from there.
>
> Ryanne
>
>
> On Fri, Jul 12, 2019 at 1:42 PM John Roesler  wrote:
>
> > Hey, thanks Matthias and Bruno,
> >
> > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to me.
> >
> > Also, yes, I would have liked to not have any modifier for
> > "non-sampled", but there is a name conflict with Sum.
> >
> > I'll update the KIP to reflect "TotalX" and then start the vote thread.
> >
> > Thanks again,
> > -John
> >
> > On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna  wrote:
> > >
> > > OK, makes sense. Then, I am in favour of TotalCount and TotalSum.
> > >
> > > Best,
> > > Bruno
> > >
> > > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax 
> > wrote:
> > > >
> > > > `Sum` is an existing name, for the "sampled sum" metric, that gets
> > > > deprecated. Hence, we cannot use it.
> > > >
> > > > If we cannot use `Sum` and use `TotalSum`, we should also not use
> > > > `Count` but `TotalCount` for consistency.
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > > >
> > > > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > > > Hi John,
> > > > >
> > > > > Thank you for the KIP.
> > > > >
> > > > > LGTM
> > > > >
> > > > > I also do not like CumulativeSum/Count so much. I propose to just
> > call
> > > > > it Sum and Count.
> > > > >
> > > > > I understand that you want to unequivocally distinguish the two
> > metric
> > > > > functions by their names, but I have the feeling the names become
> > > > > artificially complex. The exact semantics can also be documented in
> > > > > the javadocs, which btw could also be improved in those classes.
> > > > >
> > > > > Best,
> > > > > Bruno
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax <
> > matth...@confluent.io> wrote:
> > > > >>
> > > > >> Thanks for the KIP. Overall LGTM.
> > > > >>
> > > > >> The only though I have is, if we may want to use `TotalSum` and
> > > > >> `TotalCount` instead of `CumulativeSum/Count` as names?
> > > > >>
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >>
> > > > >> On 7/11/19 9:31 AM, John Roesler wrote:
> > > > >>> Hi Kafka devs,
> > > > >>>
> > > > >>> I'd like to propose KIP-488 as a minor cleanup of some of our
> > metric
> > > > >>> implementations.
> > > > >>>
> > > > >>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> > > > >>>
> > > > >>> Over time, iterative updates to these metrics has resulted in a
> > pretty
> > > > >>> confusing little collection of classes, and I've personally been
> > > > >>> involved in three separate moderately time-consuming iterations of
> > me
> > > > >>> or someone else trying to work out which metrics are available, and
> > > > >>> which ones are desired for a given use case. One of these was
> > actually
> > > > >>> a long-running bug in Kafka Streams' metrics, so not only has this
> > > > >>> confusion been a time sink, but it has also led to bugs.
> > > > >>>
> > > > >>> I'm hoping this change won't be too controversial.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> -John
> > > > >>>
> > > > >>
> > > >
> >


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-16 Thread Ryanne Dolan
John, I mentioned on the VOTE thread that the proposed names are a bit
confusing,

> given that "sum", "total", and "count" are roughly synonymous...

In particular, TotalSum is, I think, a "running total", though the naming
doesn't really capture that.

I think to avoid confusion, we should define exactly what "total" and
"sampled" are supposed to indicate, and perhaps pick appropriate naming
from there.

Ryanne


On Fri, Jul 12, 2019 at 1:42 PM John Roesler  wrote:

> Hey, thanks Matthias and Bruno,
>
> I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to me.
>
> Also, yes, I would have liked to not have any modifier for
> "non-sampled", but there is a name conflict with Sum.
>
> I'll update the KIP to reflect "TotalX" and then start the vote thread.
>
> Thanks again,
> -John
>
> On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna  wrote:
> >
> > OK, makes sense. Then, I am in favour of TotalCount and TotalSum.
> >
> > Best,
> > Bruno
> >
> > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax 
> wrote:
> > >
> > > `Sum` is an existing name, for the "sampled sum" metric, that gets
> > > deprecated. Hence, we cannot use it.
> > >
> > > If we cannot use `Sum` and use `TotalSum`, we should also not use
> > > `Count` but `TotalCount` for consistency.
> > >
> > >
> > > -Matthias
> > >
> > >
> > >
> > > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > > Hi John,
> > > >
> > > > Thank you for the KIP.
> > > >
> > > > LGTM
> > > >
> > > > I also do not like CumulativeSum/Count so much. I propose to just
> call
> > > > it Sum and Count.
> > > >
> > > > I understand that you want to unequivocally distinguish the two
> metric
> > > > functions by their names, but I have the feeling the names become
> > > > artificially complex. The exact semantics can also be documented in
> > > > the javadocs, which btw could also be improved in those classes.
> > > >
> > > > Best,
> > > > Bruno
> > > >
> > > >
> > > >
> > > > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax <
> matth...@confluent.io> wrote:
> > > >>
> > > >> Thanks for the KIP. Overall LGTM.
> > > >>
> > > >> The only though I have is, if we may want to use `TotalSum` and
> > > >> `TotalCount` instead of `CumulativeSum/Count` as names?
> > > >>
> > > >>
> > > >> -Matthias
> > > >>
> > > >>
> > > >> On 7/11/19 9:31 AM, John Roesler wrote:
> > > >>> Hi Kafka devs,
> > > >>>
> > > >>> I'd like to propose KIP-488 as a minor cleanup of some of our
> metric
> > > >>> implementations.
> > > >>>
> > > >>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> > > >>>
> > > >>> Over time, iterative updates to these metrics has resulted in a
> pretty
> > > >>> confusing little collection of classes, and I've personally been
> > > >>> involved in three separate moderately time-consuming iterations of
> me
> > > >>> or someone else trying to work out which metrics are available, and
> > > >>> which ones are desired for a given use case. One of these was
> actually
> > > >>> a long-running bug in Kafka Streams' metrics, so not only has this
> > > >>> confusion been a time sink, but it has also led to bugs.
> > > >>>
> > > >>> I'm hoping this change won't be too controversial.
> > > >>>
> > > >>> Thanks,
> > > >>> -John
> > > >>>
> > > >>
> > >
>


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-15 Thread John Roesler
Hey Stanislav,

Thanks for the feedback! I do think we should try to keep the scope of
this KIP small. The idea you proposed is orthogonal, though, so if you
wanted to write up a quick KIP concurrently, I'd be happy to review
it.

Regarding deprecation: yes, the metrics are in the public interface,
because they are not inside a package named `internal`. Therefore, we
do need to do deprecations and migrations when considering changes to
them. I wish we didn't, since it could be a cleaner transition without
them, but we shouldn't risk breaking source compatibility.

Thanks,
-John

On Sat, Jul 13, 2019 at 1:30 PM Stanislav Kozlovski
 wrote:
>
> Hello,
>
> Thanks for the KIP, I'm happy we're converging on implementations.
>
> I was wondering whether we need to deprecate the old metrics - are those
> classes public interfaces? I see we don't build a JavaDoc for them (
> https://kafka.apache.org/22/javadoc/allclasses-noframe.html) and, as far as
> I know, that means we don't necessarily need to go with the deprecation
> route.
>
> I think we also have the Measurable and Gauge classes which seem to do the
> same thing. From the names, I intuitively thought that Measurable indicates
> something that might require some amount of processing to get the value of
> and Gauge to be something that is instantly read. Reading their usages,
> though, I see that is not the case. Measurable is inherited by
> MeasurableStat which itself is inherited by most of the classes you
> mentioned in the KIP. Is it worth it converging on those as well (perhaps
> deprecating/removing Measurable and opting for Gauge) or do we prefer to
> keep the scope of this KIP small?
>
> Thanks,
> Stanislav
>
> On Fri, Jul 12, 2019 at 7:42 PM John Roesler  wrote:
>
> > Hey, thanks Matthias and Bruno,
> >
> > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to me.
> >
> > Also, yes, I would have liked to not have any modifier for
> > "non-sampled", but there is a name conflict with Sum.
> >
> > I'll update the KIP to reflect "TotalX" and then start the vote thread.
> >
> > Thanks again,
> > -John
> >
> > On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna  wrote:
> > >
> > > OK, makes sense. Then, I am in favour of TotalCount and TotalSum.
> > >
> > > Best,
> > > Bruno
> > >
> > > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax 
> > wrote:
> > > >
> > > > `Sum` is an existing name, for the "sampled sum" metric, that gets
> > > > deprecated. Hence, we cannot use it.
> > > >
> > > > If we cannot use `Sum` and use `TotalSum`, we should also not use
> > > > `Count` but `TotalCount` for consistency.
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > > >
> > > > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > > > Hi John,
> > > > >
> > > > > Thank you for the KIP.
> > > > >
> > > > > LGTM
> > > > >
> > > > > I also do not like CumulativeSum/Count so much. I propose to just
> > call
> > > > > it Sum and Count.
> > > > >
> > > > > I understand that you want to unequivocally distinguish the two
> > metric
> > > > > functions by their names, but I have the feeling the names become
> > > > > artificially complex. The exact semantics can also be documented in
> > > > > the javadocs, which btw could also be improved in those classes.
> > > > >
> > > > > Best,
> > > > > Bruno
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax <
> > matth...@confluent.io> wrote:
> > > > >>
> > > > >> Thanks for the KIP. Overall LGTM.
> > > > >>
> > > > >> The only though I have is, if we may want to use `TotalSum` and
> > > > >> `TotalCount` instead of `CumulativeSum/Count` as names?
> > > > >>
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >>
> > > > >> On 7/11/19 9:31 AM, John Roesler wrote:
> > > > >>> Hi Kafka devs,
> > > > >>>
> > > > >>> I'd like to propose KIP-488 as a minor cleanup of some of our
> > metric
> > > > >>> implementations.
> > > > >>>
> > > > >>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> > > > >>>
> > > > >>> Over time, iterative updates to these metrics has resulted in a
> > pretty
> > > > >>> confusing little collection of classes, and I've personally been
> > > > >>> involved in three separate moderately time-consuming iterations of
> > me
> > > > >>> or someone else trying to work out which metrics are available, and
> > > > >>> which ones are desired for a given use case. One of these was
> > actually
> > > > >>> a long-running bug in Kafka Streams' metrics, so not only has this
> > > > >>> confusion been a time sink, but it has also led to bugs.
> > > > >>>
> > > > >>> I'm hoping this change won't be too controversial.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> -John
> > > > >>>
> > > > >>
> > > >
> >


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-13 Thread Stanislav Kozlovski
Hello,

Thanks for the KIP, I'm happy we're converging on implementations.

I was wondering whether we need to deprecate the old metrics - are those
classes public interfaces? I see we don't build a JavaDoc for them (
https://kafka.apache.org/22/javadoc/allclasses-noframe.html) and, as far as
I know, that means we don't necessarily need to go with the deprecation
route.

I think we also have the Measurable and Gauge classes which seem to do the
same thing. From the names, I intuitively thought that Measurable indicates
something that might require some amount of processing to get the value of
and Gauge to be something that is instantly read. Reading their usages,
though, I see that is not the case. Measurable is inherited by
MeasurableStat which itself is inherited by most of the classes you
mentioned in the KIP. Is it worth it converging on those as well (perhaps
deprecating/removing Measurable and opting for Gauge) or do we prefer to
keep the scope of this KIP small?

Thanks,
Stanislav

On Fri, Jul 12, 2019 at 7:42 PM John Roesler  wrote:

> Hey, thanks Matthias and Bruno,
>
> I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to me.
>
> Also, yes, I would have liked to not have any modifier for
> "non-sampled", but there is a name conflict with Sum.
>
> I'll update the KIP to reflect "TotalX" and then start the vote thread.
>
> Thanks again,
> -John
>
> On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna  wrote:
> >
> > OK, makes sense. Then, I am in favour of TotalCount and TotalSum.
> >
> > Best,
> > Bruno
> >
> > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax 
> wrote:
> > >
> > > `Sum` is an existing name, for the "sampled sum" metric, that gets
> > > deprecated. Hence, we cannot use it.
> > >
> > > If we cannot use `Sum` and use `TotalSum`, we should also not use
> > > `Count` but `TotalCount` for consistency.
> > >
> > >
> > > -Matthias
> > >
> > >
> > >
> > > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > > Hi John,
> > > >
> > > > Thank you for the KIP.
> > > >
> > > > LGTM
> > > >
> > > > I also do not like CumulativeSum/Count so much. I propose to just
> call
> > > > it Sum and Count.
> > > >
> > > > I understand that you want to unequivocally distinguish the two
> metric
> > > > functions by their names, but I have the feeling the names become
> > > > artificially complex. The exact semantics can also be documented in
> > > > the javadocs, which btw could also be improved in those classes.
> > > >
> > > > Best,
> > > > Bruno
> > > >
> > > >
> > > >
> > > > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax <
> matth...@confluent.io> wrote:
> > > >>
> > > >> Thanks for the KIP. Overall LGTM.
> > > >>
> > > >> The only though I have is, if we may want to use `TotalSum` and
> > > >> `TotalCount` instead of `CumulativeSum/Count` as names?
> > > >>
> > > >>
> > > >> -Matthias
> > > >>
> > > >>
> > > >> On 7/11/19 9:31 AM, John Roesler wrote:
> > > >>> Hi Kafka devs,
> > > >>>
> > > >>> I'd like to propose KIP-488 as a minor cleanup of some of our
> metric
> > > >>> implementations.
> > > >>>
> > > >>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> > > >>>
> > > >>> Over time, iterative updates to these metrics has resulted in a
> pretty
> > > >>> confusing little collection of classes, and I've personally been
> > > >>> involved in three separate moderately time-consuming iterations of
> me
> > > >>> or someone else trying to work out which metrics are available, and
> > > >>> which ones are desired for a given use case. One of these was
> actually
> > > >>> a long-running bug in Kafka Streams' metrics, so not only has this
> > > >>> confusion been a time sink, but it has also led to bugs.
> > > >>>
> > > >>> I'm hoping this change won't be too controversial.
> > > >>>
> > > >>> Thanks,
> > > >>> -John
> > > >>>
> > > >>
> > >
>


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-12 Thread John Roesler
Hey, thanks Matthias and Bruno,

I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to me.

Also, yes, I would have liked to not have any modifier for
"non-sampled", but there is a name conflict with Sum.

I'll update the KIP to reflect "TotalX" and then start the vote thread.

Thanks again,
-John

On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna  wrote:
>
> OK, makes sense. Then, I am in favour of TotalCount and TotalSum.
>
> Best,
> Bruno
>
> On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax  
> wrote:
> >
> > `Sum` is an existing name, for the "sampled sum" metric, that gets
> > deprecated. Hence, we cannot use it.
> >
> > If we cannot use `Sum` and use `TotalSum`, we should also not use
> > `Count` but `TotalCount` for consistency.
> >
> >
> > -Matthias
> >
> >
> >
> > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > Hi John,
> > >
> > > Thank you for the KIP.
> > >
> > > LGTM
> > >
> > > I also do not like CumulativeSum/Count so much. I propose to just call
> > > it Sum and Count.
> > >
> > > I understand that you want to unequivocally distinguish the two metric
> > > functions by their names, but I have the feeling the names become
> > > artificially complex. The exact semantics can also be documented in
> > > the javadocs, which btw could also be improved in those classes.
> > >
> > > Best,
> > > Bruno
> > >
> > >
> > >
> > > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax  
> > > wrote:
> > >>
> > >> Thanks for the KIP. Overall LGTM.
> > >>
> > >> The only though I have is, if we may want to use `TotalSum` and
> > >> `TotalCount` instead of `CumulativeSum/Count` as names?
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >> On 7/11/19 9:31 AM, John Roesler wrote:
> > >>> Hi Kafka devs,
> > >>>
> > >>> I'd like to propose KIP-488 as a minor cleanup of some of our metric
> > >>> implementations.
> > >>>
> > >>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> > >>>
> > >>> Over time, iterative updates to these metrics has resulted in a pretty
> > >>> confusing little collection of classes, and I've personally been
> > >>> involved in three separate moderately time-consuming iterations of me
> > >>> or someone else trying to work out which metrics are available, and
> > >>> which ones are desired for a given use case. One of these was actually
> > >>> a long-running bug in Kafka Streams' metrics, so not only has this
> > >>> confusion been a time sink, but it has also led to bugs.
> > >>>
> > >>> I'm hoping this change won't be too controversial.
> > >>>
> > >>> Thanks,
> > >>> -John
> > >>>
> > >>
> >


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-12 Thread Bruno Cadonna
OK, makes sense. Then, I am in favour of TotalCount and TotalSum.

Best,
Bruno

On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax  wrote:
>
> `Sum` is an existing name, for the "sampled sum" metric, that gets
> deprecated. Hence, we cannot use it.
>
> If we cannot use `Sum` and use `TotalSum`, we should also not use
> `Count` but `TotalCount` for consistency.
>
>
> -Matthias
>
>
>
> On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > Hi John,
> >
> > Thank you for the KIP.
> >
> > LGTM
> >
> > I also do not like CumulativeSum/Count so much. I propose to just call
> > it Sum and Count.
> >
> > I understand that you want to unequivocally distinguish the two metric
> > functions by their names, but I have the feeling the names become
> > artificially complex. The exact semantics can also be documented in
> > the javadocs, which btw could also be improved in those classes.
> >
> > Best,
> > Bruno
> >
> >
> >
> > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax  
> > wrote:
> >>
> >> Thanks for the KIP. Overall LGTM.
> >>
> >> The only though I have is, if we may want to use `TotalSum` and
> >> `TotalCount` instead of `CumulativeSum/Count` as names?
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 7/11/19 9:31 AM, John Roesler wrote:
> >>> Hi Kafka devs,
> >>>
> >>> I'd like to propose KIP-488 as a minor cleanup of some of our metric
> >>> implementations.
> >>>
> >>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> >>>
> >>> Over time, iterative updates to these metrics has resulted in a pretty
> >>> confusing little collection of classes, and I've personally been
> >>> involved in three separate moderately time-consuming iterations of me
> >>> or someone else trying to work out which metrics are available, and
> >>> which ones are desired for a given use case. One of these was actually
> >>> a long-running bug in Kafka Streams' metrics, so not only has this
> >>> confusion been a time sink, but it has also led to bugs.
> >>>
> >>> I'm hoping this change won't be too controversial.
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>
>


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-11 Thread Matthias J. Sax
`Sum` is an existing name, for the "sampled sum" metric, that gets
deprecated. Hence, we cannot use it.

If we cannot use `Sum` and use `TotalSum`, we should also not use
`Count` but `TotalCount` for consistency.


-Matthias



On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> Hi John,
> 
> Thank you for the KIP.
> 
> LGTM
> 
> I also do not like CumulativeSum/Count so much. I propose to just call
> it Sum and Count.
> 
> I understand that you want to unequivocally distinguish the two metric
> functions by their names, but I have the feeling the names become
> artificially complex. The exact semantics can also be documented in
> the javadocs, which btw could also be improved in those classes.
> 
> Best,
> Bruno
> 
> 
> 
> On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax  wrote:
>>
>> Thanks for the KIP. Overall LGTM.
>>
>> The only though I have is, if we may want to use `TotalSum` and
>> `TotalCount` instead of `CumulativeSum/Count` as names?
>>
>>
>> -Matthias
>>
>>
>> On 7/11/19 9:31 AM, John Roesler wrote:
>>> Hi Kafka devs,
>>>
>>> I'd like to propose KIP-488 as a minor cleanup of some of our metric
>>> implementations.
>>>
>>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
>>>
>>> Over time, iterative updates to these metrics has resulted in a pretty
>>> confusing little collection of classes, and I've personally been
>>> involved in three separate moderately time-consuming iterations of me
>>> or someone else trying to work out which metrics are available, and
>>> which ones are desired for a given use case. One of these was actually
>>> a long-running bug in Kafka Streams' metrics, so not only has this
>>> confusion been a time sink, but it has also led to bugs.
>>>
>>> I'm hoping this change won't be too controversial.
>>>
>>> Thanks,
>>> -John
>>>
>>



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-11 Thread Bruno Cadonna
Hi John,

Thank you for the KIP.

LGTM

I also do not like CumulativeSum/Count so much. I propose to just call
it Sum and Count.

I understand that you want to unequivocally distinguish the two metric
functions by their names, but I have the feeling the names become
artificially complex. The exact semantics can also be documented in
the javadocs, which btw could also be improved in those classes.

Best,
Bruno



On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax  wrote:
>
> Thanks for the KIP. Overall LGTM.
>
> The only though I have is, if we may want to use `TotalSum` and
> `TotalCount` instead of `CumulativeSum/Count` as names?
>
>
> -Matthias
>
>
> On 7/11/19 9:31 AM, John Roesler wrote:
> > Hi Kafka devs,
> >
> > I'd like to propose KIP-488 as a minor cleanup of some of our metric
> > implementations.
> >
> > KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> >
> > Over time, iterative updates to these metrics has resulted in a pretty
> > confusing little collection of classes, and I've personally been
> > involved in three separate moderately time-consuming iterations of me
> > or someone else trying to work out which metrics are available, and
> > which ones are desired for a given use case. One of these was actually
> > a long-running bug in Kafka Streams' metrics, so not only has this
> > confusion been a time sink, but it has also led to bugs.
> >
> > I'm hoping this change won't be too controversial.
> >
> > Thanks,
> > -John
> >
>


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-11 Thread Matthias J. Sax
Thanks for the KIP. Overall LGTM.

The only though I have is, if we may want to use `TotalSum` and
`TotalCount` instead of `CumulativeSum/Count` as names?


-Matthias


On 7/11/19 9:31 AM, John Roesler wrote:
> Hi Kafka devs,
> 
> I'd like to propose KIP-488 as a minor cleanup of some of our metric
> implementations.
> 
> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> 
> Over time, iterative updates to these metrics has resulted in a pretty
> confusing little collection of classes, and I've personally been
> involved in three separate moderately time-consuming iterations of me
> or someone else trying to work out which metrics are available, and
> which ones are desired for a given use case. One of these was actually
> a long-running bug in Kafka Streams' metrics, so not only has this
> confusion been a time sink, but it has also led to bugs.
> 
> I'm hoping this change won't be too controversial.
> 
> Thanks,
> -John
> 



signature.asc
Description: OpenPGP digital signature


[DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-11 Thread John Roesler
Hi Kafka devs,

I'd like to propose KIP-488 as a minor cleanup of some of our metric
implementations.

KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw

Over time, iterative updates to these metrics has resulted in a pretty
confusing little collection of classes, and I've personally been
involved in three separate moderately time-consuming iterations of me
or someone else trying to work out which metrics are available, and
which ones are desired for a given use case. One of these was actually
a long-running bug in Kafka Streams' metrics, so not only has this
confusion been a time sink, but it has also led to bugs.

I'm hoping this change won't be too controversial.

Thanks,
-John