Wow, I just realized I misstated things rather significantly. We very much have tuple sketches in C++, but the Python wrapper for them is a work in progress. I thought I had it ready, but it turns out the wrapper library we're using (pybind11) has some pretty significant limitations that I now need to design around. I'm now trying to determine the least-bad way to do that.

  jon

On Tue, Jan 17, 2023 at 10:38 AM Alexander Saydakov via users <users@datasketches.apache.org> wrote:

> Yes, Druid does this on top of the specialized Tuple sketch called
> ArrayOfDoublesSketch (in Java).
> Each key in the sketch has an array of floating-point values associated
> with it.
> PostAggregator functions can convert these columns into means and
> variances using org.apache.commons.math3.stat.descriptive.SummaryStatistics.
> Here is the code for variance:
>
> https://github.com/apache/druid/blob/master/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchToVariancesPostAggregator.java
>
> On Mon, Jan 16, 2023 at 5:54 AM Tomer B <tomer...@gmail.com> wrote:
>
>> Thanks yeah ! (tuple sketch and not theta as you said!).
>> I have another question please I looked at the tuple sketch I looked at:
>> https://datasketches.apache.org/api/java/snapshot/apidocs/org/apache/datasketches/tuple/aninteger/IntegerSummary.Mode.html
>> and I see possible values of mode are: Sum, Min, Max, AlwaysOne so I don't
>> see there is 'Variance'. So is tuple sketch not supporting variance out of
>> the box? I looked at druid and I see it does support variance sketch
>> https://druid.apache.org/docs/latest/development/extensions-core/datasketches-tuple.html#variance-values-for-each-column
>> does this means the following: Tuple sketches do not support variance out
>> of the box, but as druid supports it on top of the tuple sketches it's
>> probably going to be possible for me to add similar implementation on top
>> of DataSketches TupleSketches ?
>>
>> Thanks!
>>
>> On Sun, Jan 1, 2023 at 2:03 AM Jon Malkin <jon.mal...@gmail.com> wrote:
>>
>>> I believe you're looking at the tuple sketch code in java, not theta
>>> sketch. We don't yet have tuple support in C++ (on which python is based).
>>> It's planned, but I haven't yet had time to sit down and figure out how to
>>> do it -- and specifically how to do so with a reasonable API.
>>>
>>> jon
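
To make the per-column variance idea from the quoted Druid explanation concrete, here is a minimal sketch in plain Python. It does not use the DataSketches library or its Python wrapper at all: `retained_values` is hypothetical data standing in for the arrays of doubles stored with each retained key, and `column_variances` is an illustrative helper name, not part of any API. It computes the sample variance of each column, which I believe is what SummaryStatistics reports as its variance.

```python
from statistics import variance


def column_variances(rows):
    """Return the sample variance of each column across all rows."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to compute a sample variance")
    # zip(*rows) transposes the row-major values into columns.
    return [variance(column) for column in zip(*rows)]


# Hypothetical data: each inner list stands in for the array of doubles
# associated with one retained key in the sketch.
retained_values = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
]

print(column_variances(retained_values))
```

With a real sketch the rows would come from iterating over the retained entries, so the result describes the sampled keys rather than the full input stream.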