I just looked at the source for QDigest from stream-lib. I think its memory usage could be trimmed substantially, possibly by as much as 5:1, by using more primitive-friendly data structures.
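To make "primitive-friendly" concrete, here is a minimal sketch of the kind of structure I have in mind. It assumes the digest's per-node counts sit in a boxed map such as HashMap<Long, Long>, which is an assumption for illustration, not a description of the stream-lib code. The class name PrimitiveCounts and its methods are hypothetical, and the sketch only covers count storage and merging, not the Q-digest compression step.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not stream-lib code): per-node counts held in two
 * parallel primitive arrays instead of a boxed HashMap<Long, Long>.
 */
final class PrimitiveCounts {
    private long[] keys = new long[16];   // node ids, kept in ascending order
    private long[] counts = new long[16]; // counts[i] belongs to keys[i]
    private int size = 0;                 // number of slots in use

    /** Adds delta to the count for key, inserting the key if it is absent. */
    void add(long key, long delta) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) {
            counts[i] += delta;
            return;
        }
        int pos = -(i + 1); // insertion point that keeps keys sorted
        if (size == keys.length) {
            // Grow both arrays together so they stay aligned.
            keys = Arrays.copyOf(keys, size * 2);
            counts = Arrays.copyOf(counts, size * 2);
        }
        // Shift the tail one slot to the right to open a gap at pos.
        System.arraycopy(keys, pos, keys, pos + 1, size - pos);
        System.arraycopy(counts, pos, counts, pos + 1, size - pos);
        keys[pos] = key;
        counts[pos] = delta;
        size++;
    }

    /** Returns the stored count for key, or 0 if the key is absent. */
    long get(long key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 ? counts[i] : 0L;
    }

    /** Folds another set of counts into this one by summing matching keys. */
    void merge(PrimitiveCounts other) {
        for (int i = 0; i < other.size; i++) {
            add(other.keys[i], other.counts[i]);
        }
    }

    int size() {
        return size;
    }
}
```

Each entry here costs two longs with no per-entry object headers or boxed values. The price is an O(n) shift on insert, so a primitive open-addressing hash map would be the obvious alternative if inserts dominate.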

On Wed, Aug 7, 2013 at 3:04 PM, Otis Gospodnetic <[email protected]> wrote:

> Hi Ted,
>
> I need percentiles. Ideally not pre-defined ones, because one person may
> want e.g. 70th pctile, while somebody else might want 75th pctile for the
> same metric.
>
> Deal breakers:
> High memory footprint. ("high" means "higher than QDigest from stream-lib"
> for us.... and we could test and compare with QDigest relatively easily
> with live data)
> Algos that create data structures that cannot be merged
> Loss of accuracy that is not predictably small or configurable
>
> Thank you,
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase -
> http://sematext.com/spm
>
> >________________________________
> > From: Ted Dunning <[email protected]>
> >To: "[email protected]" <[email protected]>; Otis Gospodnetic <[email protected]>
> >Sent: Wednesday, August 7, 2013 11:48 PM
> >Subject: Re: Is OnlineSummarizer mergeable?
> >
> >Otis,
> >
> >What statistics do you need?
> >
> >What guarantees?
> >
> >On Wed, Aug 7, 2013 at 1:26 PM, Otis Gospodnetic <[email protected]> wrote:
> >
> >Hi Ted,
> >>
> >>I'm actually trying to find an alternative to QDigest (the stream-lib
> >>impl specifically) because even though it seems good, we have to deal with
> >>crazy volumes of data in SPM (performance monitoring service, see
> >>signature)... I'm hoping we can find something that has both a lower memory
> >>footprint than QDigest AND that is mergeable a la QDigest. Utopia?
> >>
> >>Thanks,
> >>Otis
> >>----
> >>Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase -
> >>http://sematext.com/spm
> >>
> >>>________________________________
> >>> From: Ted Dunning <[email protected]>
> >>>To: "[email protected]" <[email protected]>
> >>>Sent: Wednesday, August 7, 2013 4:51 PM
> >>>Subject: Re: Is OnlineSummarizer mergeable?
> >>>
> >>>It isn't as mergeable as I would like. If you have randomized record
> >>>selection, it should be possible, but perverse ordering can cause serious
> >>>errors.
> >>>
> >>>It would be better to use something like a Q-digest.
> >>>
> >>>http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
> >>>
> >>>On Wed, Aug 7, 2013 at 4:21 AM, Otis Gospodnetic <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Is OnlineSummarizer algo "mergeable"?
> >>>>
> >>>> Say that we compute a percentile for some metric for time 12:00-12:01
> >>>> and store that somewhere, then we compute it for 1201-12:02 and store
> >>>> that separately, and so on.
> >>>>
> >>>> Can we then later merge these computed and previously stored
> >>>> percentile "instances" and get an accurate value?
> >>>>
> >>>> Thanks,
> >>>> Otis
> >>>> --
> >>>> Performance Monitoring -- http://sematext.com/spm
> >>>> Solr & ElasticSearch Support -- http://sematext.com/
