Wow, I just realized I misstated things rather significantly.

We very much have tuple sketches in C++, but the Python wrapper for them is
a work in progress. I thought I had it ready, but it turns out there are
some pretty significant limitations with the wrapper we're using (pybind11)
that I now need to design around. I'm now trying to determine the least-bad
way to do that.

  jon

On Tue, Jan 17, 2023 at 10:38 AM Alexander Saydakov via users <
users@datasketches.apache.org> wrote:

> Yes, Druid does this on top of the specialized Tuple sketch called
> ArrayOfDoublesSketch (in Java).
> Each key in the sketch has an array of floating-point values associated
> with it.
> PostAggregator functions can convert these columns into means and
> variances using org.apache.commons.math3.stat.descriptive.SummaryStatistics.
> Here is the code for variance:
>
> https://github.com/apache/druid/blob/master/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchToVariancesPostAggregator.java
>
>
>
>
> On Mon, Jan 16, 2023 at 5:54 AM Tomer B <tomer...@gmail.com> wrote:
>
>> Thanks, yes! (Tuple sketch, not theta, as you said.)
>> I have another question, please. I looked at the tuple sketch docs at:
>> https://datasketches.apache.org/api/java/snapshot/apidocs/org/apache/datasketches/tuple/aninteger/IntegerSummary.Mode.html
>> and I see that the possible values of Mode are Sum, Min, Max, and AlwaysOne;
>> there is no 'Variance'. So does the tuple sketch not support variance out of
>> the box? I looked at Druid and I see that it does support variance:
>> https://druid.apache.org/docs/latest/development/extensions-core/datasketches-tuple.html#variance-values-for-each-column
>> Does this mean the following: tuple sketches do not support variance out
>> of the box, but since Druid supports it on top of tuple sketches, it should
>> be possible for me to add a similar implementation on top of the
>> DataSketches tuple sketches?
>>
>> Thanks!
>>
>>
>> On Sun, Jan 1, 2023 at 2:03 AM Jon Malkin <jon.mal...@gmail.com> wrote:
>>
>>> I believe you're looking at the tuple sketch code in Java, not theta
>>> sketch. We don't yet have tuple support in C++ (on which Python is based).
>>> It's planned, but I haven't yet had time to sit down and figure out how to
>>> do it -- and specifically how to do so with a reasonable API.
>>>
>>>   jon
>>>
>>
>>

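For anyone following the variance question above, here is a minimal sketch of the approach Alexander describes: walk the entries a sketch retains and feed each column of values into a streaming statistics accumulator. It is plain Python, not the DataSketches API (the Python tuple wrapper is still in progress), so `RunningStats`, `column_variances`, and the sample rows are all hypothetical stand-ins. The Welford update is an assumption about a reasonable accumulator, not a copy of the Druid or commons-math code.

```python
import statistics


class RunningStats:
    """Hypothetical streaming accumulator for count, mean and variance
    (Welford's update); it does not store the individual values."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Bias-corrected sample variance; this sketch returns NaN when
        # fewer than two values have been added.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def column_variances(retained_rows, num_columns):
    """Compute one variance per column over the per-key value arrays.

    `retained_rows` is an assumed stand-in for iterating a sketch's
    retained entries, each carrying an array of doubles.
    """
    stats = [RunningStats() for _ in range(num_columns)]
    for row in retained_rows:
        for i in range(num_columns):
            stats[i].add(row[i])
    return [s.variance() for s in stats]


# Made-up data standing in for the value arrays attached to retained keys.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
variances = column_variances(rows, 2)

# Cross-check against the standard library's sample variance.
expected = [statistics.variance(col) for col in zip(*rows)]
print(variances, expected)
```

With a real sketch, the rows would come from iterating its retained entries. The result describes the values attached to the retained sample of keys, not every value that was ever fed in.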