Hi Ted,
I need percentiles, ideally arbitrary rather than predefined ones, because one
person may want, e.g., the 70th percentile while somebody else wants the 75th
percentile for the same metric.
Deal breakers:
- High memory footprint. ("High" means "higher than QDigest from stream-lib"
  for us, and we could test and compare with QDigest relatively easily with
  live data.)
- Algos that create data structures that cannot be merged.
- Loss of accuracy that is not predictably small or configurable.
Thank you,
Otis
----
Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase -
http://sematext.com/spm
>________________________________
> From: Ted Dunning <[email protected]>
>To: "[email protected]" <[email protected]>; Otis Gospodnetic
><[email protected]>
>Sent: Wednesday, August 7, 2013 11:48 PM
>Subject: Re: Is OnlineSummarizer mergeable?
>
>
>
>Otis,
>
>
>What statistics do you need?
>
>
>What guarantees?
>
>On Wed, Aug 7, 2013 at 1:26 PM, Otis Gospodnetic <[email protected]> wrote:
>
>>Hi Ted,
>>
>>I'm actually trying to find an alternative to QDigest (the stream-lib impl
>>specifically) because even though it seems good, we have to deal with crazy
>>volumes of data in SPM (performance monitoring service, see signature)... I'm
>>hoping we can find something that has both a lower memory footprint than
>>QDigest AND that is mergeable a la QDigest. Utopia?
>>
>>Thanks,
>>Otis
>>----
>>Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase -
>>http://sematext.com/spm
>>
>>
>>
>>
>>>________________________________
>>> From: Ted Dunning <[email protected]>
>>>To: "[email protected]" <[email protected]>
>>>Sent: Wednesday, August 7, 2013 4:51 PM
>>>Subject: Re: Is OnlineSummarizer mergeable?
>>>
>>>
>>>It isn't as mergeable as I would like. If you have randomized record
>>>selection, it should be possible, but perverse ordering can cause serious
>>>errors.
>>>
>>>It would be better to use something like a Q-digest.
>>>
>>>http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
>>>
>>>
>>>
>>>
>>>On Wed, Aug 7, 2013 at 4:21 AM, Otis Gospodnetic <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is the OnlineSummarizer algorithm "mergeable"?
>>>>
>>>> Say that we compute a percentile for some metric for time 12:00-12:01
>>>> and store that somewhere, then we compute it for 12:01-12:02 and store
>>>> that separately, and so on.
>>>>
>>>> Can we then later merge these computed and previously stored
>>>> percentile "instances" and get an accurate value?
>>>>
>>>> Thanks,
>>>> Otis
>>>> --
>>>> Performance Monitoring -- http://sematext.com/spm
>>>> Solr & ElasticSearch Support -- http://sematext.com/
>>>>
>>>
>>>
>>>
>
>
>
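
To illustrate the mergeability requirement discussed in this thread, here is a
minimal Python sketch of a summary that can be built per minute, stored, and
merged later. It is neither Q-digest nor Mahout's OnlineSummarizer: it swaps in
a simple histogram with log-spaced buckets, and the class name LogHistogram and
the rel_err parameter are made up for this example. It assumes strictly
positive values (latencies, for instance) and that all summaries being merged
were created with the same rel_err.

```python
import math
from collections import Counter


class LogHistogram:
    """Hypothetical mergeable quantile summary using log-spaced buckets.

    Illustration only; this is not Q-digest and not OnlineSummarizer.
    """

    def __init__(self, rel_err=0.01):
        if not 0 < rel_err < 1:
            raise ValueError("rel_err must be in (0, 1)")
        self.rel_err = rel_err
        # Bucket i covers (gamma**(i-1), gamma**i].
        self.gamma = (1 + rel_err) / (1 - rel_err)
        self.counts = Counter()
        self.n = 0

    def add(self, x):
        if x <= 0:
            raise ValueError("this sketch assumes strictly positive values")
        self.counts[math.ceil(math.log(x, self.gamma))] += 1
        self.n += 1

    def merge(self, other):
        """Fold another summary into this one by adding bucket counts."""
        if other.rel_err != self.rel_err:
            raise ValueError("summaries must share the same rel_err")
        self.counts.update(other.counts)  # Counter.update adds counts
        self.n += other.n

    def quantile(self, q):
        """Estimate the value at quantile q, with q in [0, 1]."""
        if self.n == 0:
            raise ValueError("empty summary")
        if not 0 <= q <= 1:
            raise ValueError("q must be in [0, 1]")
        rank = q * (self.n - 1)
        seen = 0
        for idx in sorted(self.counts):
            seen += self.counts[idx]
            if seen > rank:
                # Point inside the bucket whose relative distance to every
                # value in that bucket is at most rel_err.
                return 2 * self.gamma ** idx / (self.gamma + 1)


# One summary per minute, merged afterwards.
minute_1 = LogHistogram(rel_err=0.01)
minute_2 = LogHistogram(rel_err=0.01)
for latency in (12.0, 15.0, 11.0, 90.0):
    minute_1.add(latency)
for latency in (14.0, 13.0, 250.0):
    minute_2.add(latency)

minute_1.merge(minute_2)
p70 = minute_1.quantile(0.70)
p75 = minute_1.quantile(0.75)
```

Merging only adds bucket counts, so the result does not depend on the order in
which the per-minute summaries arrive, and any percentile can be read from the
same merged structure. The number of buckets grows with the logarithm of the
ratio between the largest and smallest value seen, not with the number of
samples; whether that beats QDigest on real data would have to be measured.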