I was about to point you at that pull request. How droll. Didn't know it was from you guys.
On Thu, Aug 8, 2013 at 3:35 PM, Otis Gospodnetic <[email protected]> wrote:

> Hi Ted,
>
> Yes, that's what we did recently, too:
> https://github.com/clearspring/stream-lib/pull/47
>
> ... but it's still a little too phat... which is what made me think of your OnlineSummarizer as a possible, slimmer alternative.
>
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - http://sematext.com/spm
>
> > ________________________________
> > From: Ted Dunning <[email protected]>
> > To: "[email protected]" <[email protected]>; Otis Gospodnetic <[email protected]>
> > Sent: Thursday, August 8, 2013 8:27 AM
> > Subject: Re: Is OnlineSummarizer mergeable?
> >
> > I just looked at the source for QDigest from streamlib.
> >
> > I think that the memory usage could be trimmed substantially, possibly by as much as 5:1, by using more primitive-friendly structures.
> >
> > On Wed, Aug 7, 2013 at 3:04 PM, Otis Gospodnetic <[email protected]> wrote:
> >
> >> Hi Ted,
> >>
> >> I need percentiles. Ideally not pre-defined ones, because one person may want e.g. 70th pctile, while somebody else might want 75th pctile for the same metric.
> >>
> >> Deal breakers:
> >> High memory footprint. ("high" means "higher than QDigest from stream-lib" for us... and we could test and compare with QDigest relatively easily with live data)
> >> Algos that create data structures that cannot be merged
> >> Loss of accuracy that is not predictably small or configurable
> >>
> >> Thank you,
> >> Otis
> >> ----
> >> Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - http://sematext.com/spm
> >>
> >>> ________________________________
> >>> From: Ted Dunning <[email protected]>
> >>> To: "[email protected]" <[email protected]>; Otis Gospodnetic <[email protected]>
> >>> Sent: Wednesday, August 7, 2013 11:48 PM
> >>> Subject: Re: Is OnlineSummarizer mergeable?
> >>>
> >>> Otis,
> >>>
> >>> What statistics do you need?
> >>>
> >>> What guarantees?
> >>>
> >>> On Wed, Aug 7, 2013 at 1:26 PM, Otis Gospodnetic <[email protected]> wrote:
> >>>
> >>>> Hi Ted,
> >>>>
> >>>> I'm actually trying to find an alternative to QDigest (the stream-lib impl specifically) because even though it seems good, we have to deal with crazy volumes of data in SPM (performance monitoring service, see signature)... I'm hoping we can find something that has both a lower memory footprint than QDigest AND that is mergeable à la QDigest. Utopia?
> >>>>
> >>>> Thanks,
> >>>> Otis
> >>>> ----
> >>>> Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - http://sematext.com/spm
> >>>>
> >>>>> ________________________________
> >>>>> From: Ted Dunning <[email protected]>
> >>>>> To: "[email protected]" <[email protected]>
> >>>>> Sent: Wednesday, August 7, 2013 4:51 PM
> >>>>> Subject: Re: Is OnlineSummarizer mergeable?
> >>>>>
> >>>>> It isn't as mergeable as I would like. If you have randomized record selection, it should be possible, but perverse ordering can cause serious errors.
> >>>>>
> >>>>> It would be better to use something like a Q-digest.
> >>>>>
> >>>>> http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
> >>>>>
> >>>>> On Wed, Aug 7, 2013 at 4:21 AM, Otis Gospodnetic <[email protected]> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Is OnlineSummarizer algo "mergeable"?
> >>>>>>
> >>>>>> Say that we compute a percentile for some metric for time 12:00-12:01 and store that somewhere, then we compute it for 12:01-12:02 and store that separately, and so on.
> >>>>>>
> >>>>>> Can we then later merge these computed and previously stored percentile "instances" and get an accurate value?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Otis
> >>>>>> --
> >>>>>> Performance Monitoring -- http://sematext.com/spm
> >>>>>> Solr & ElasticSearch Support -- http://sematext.com/
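Otis's original question hinges on what gets stored per window. If only the computed percentile value is kept, no combination of those stored numbers recovers the percentile of the merged period in general. The hypothetical Python snippet below (the sample data and the `nearest_rank_percentile` helper are illustrative assumptions, not from any library mentioned in the thread) shows that neither a plain average nor a count-weighted average of two per-minute medians equals the median of the combined two minutes.

```python
import math


def nearest_rank_percentile(values, p):
    """Illustrative helper: nearest-rank percentile (0 < p <= 100),
    i.e. the smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latency samples (ms) for two one-minute windows.
window_1 = list(range(1, 11))  # 12:00-12:01: ten fast requests, 1..10 ms
window_2 = [100] * 30          # 12:01-12:02: thirty slow requests, 100 ms each

# What a per-window job would store: one number per window.
p50_1 = nearest_rank_percentile(window_1, 50)  # 5
p50_2 = nearest_rank_percentile(window_2, 50)  # 100

# Two ways of "merging" the stored numbers; both are wrong here.
naive_mean = (p50_1 + p50_2) / 2  # 52.5
count_weighted = (p50_1 * len(window_1) + p50_2 * len(window_2)) / (
    len(window_1) + len(window_2)
)  # 76.25

# The real median over 12:00-12:02 needs the data itself (or a mergeable summary of it).
true_p50 = nearest_rank_percentile(window_1 + window_2, 50)  # 100
```

This is why the thread turns to summaries such as Q-digest, where what is stored per window is a mergeable data structure rather than a single percentile value.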
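Otis's deal breakers (percentiles chosen at query time, summaries that can be merged, accuracy that is predictably small or configurable, a bounded memory footprint) can be illustrated with a structure simpler than either Q-digest or OnlineSummarizer: a histogram whose bucket boundaries grow geometrically. This is a different technique from both, sketched in Python under the assumption that all values are positive (e.g. latencies); `LogHistogram` and `alpha` are hypothetical names made up for the illustration, not stream-lib's or OnlineSummarizer's API. Per-window histograms merge by adding bucket counts, and the estimate for the nearest-rank quantile lies within relative error `alpha` of the exact value.

```python
import math
from collections import Counter


class LogHistogram:
    """Hypothetical mergeable quantile summary for positive values.

    A value v goes into bucket i = ceil(log(v) / log(gamma)), which covers
    (gamma**(i-1), gamma**i]. With gamma = (1 + alpha) / (1 - alpha), reporting
    2 * gamma**i / (gamma + 1) for a bucket is within relative error alpha of
    every value in it (up to floating-point rounding).
    """

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.counts = Counter()  # bucket index -> number of values
        self.total = 0

    def add(self, value, count=1):
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        index = math.ceil(math.log(value) / self.log_gamma)
        self.counts[index] += count
        self.total += count

    def merge(self, other):
        """Fold another histogram into this one; merging is just adding bucket counts."""
        if other.alpha != self.alpha:
            raise ValueError("histograms must share the same alpha to be merged")
        self.counts.update(other.counts)  # Counter.update adds counts
        self.total += other.total
        return self

    def quantile(self, q):
        """Estimate the nearest-rank q-quantile, for any 0 < q <= 1 chosen at query time."""
        if self.total == 0:
            raise ValueError("empty histogram")
        if not 0 < q <= 1:
            raise ValueError("q must be in (0, 1]")
        rank = max(1, math.ceil(q * self.total))
        seen = 0
        for index in sorted(self.counts):
            seen += self.counts[index]
            if seen >= rank:
                return 2 * self.gamma**index / (self.gamma + 1)
        raise RuntimeError("unreachable: counts do not add up to total")


# Usage with hypothetical windows: one histogram per minute, merged later.
minute_1 = LogHistogram(alpha=0.01)
for v in range(1, 11):
    minute_1.add(v)  # 1..10 ms
minute_2 = LogHistogram(alpha=0.01)
minute_2.add(100, count=30)  # thirty 100 ms requests

combined = LogHistogram(alpha=0.01).merge(minute_1).merge(minute_2)
estimated_p50 = combined.quantile(0.5)  # within 1% of the exact combined median, 100 ms
estimated_p75 = combined.quantile(0.75)
```

Any percentile (70th, 75th, ...) can be requested after merging, accuracy is fixed up front by `alpha`, and memory depends on the value range rather than the number of samples: with `alpha = 0.01`, values from 1 ms to 1,000,000 ms span roughly 700 buckets. The trade-off against Q-digest is that this bound is relative to the value, whereas Q-digest bounds the error in rank. A Java version would naturally keep the counts in a primitive `long[]` indexed by bucket offset, the kind of primitive-friendly layout Ted mentions for QDigest.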
