Hi Jon,
The first sketch is the one where I see the jump. The exact count without
the first sketch is 24765.

The result for lgK=12 without the first sketch is 11% off; with lgK=5 it is
within 2%.

Thanks,
Marko
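
A minimal sketch of the leave-one-out check discussed in this thread, assuming
the raw item IDs behind each sketch are still available as in-memory sets. The
class name, method names, and toy data below are hypothetical and are not taken
from the gists. It computes the exact distinct count with one group left out,
plus the signed relative error of an estimate against that count.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the DataSketches library.
class LeaveOneOutCheck {

    // Exact number of distinct items across all groups except the one at index `skip`.
    static long exactCountWithout(List<Set<Long>> groups, int skip) {
        Set<Long> merged = new HashSet<>();
        for (int i = 0; i < groups.size(); i++) {
            if (i != skip) {
                merged.addAll(groups.get(i));
            }
        }
        return merged.size();
    }

    // Signed relative error of an estimate against the exact count
    // (positive means the estimate is too high).
    static double relativeError(double estimate, long exact) {
        return (estimate - exact) / exact;
    }

    public static void main(String[] args) {
        // Toy data for illustration only, not the sets from this thread.
        List<Set<Long>> groups = List.of(
            Set.of(1L, 2L, 3L),
            Set.of(3L, 4L),
            Set.of(4L, 5L, 6L));
        for (int skip = 0; skip < groups.size(); skip++) {
            System.out.println("exact count without group " + skip + ": "
                + exactCountWithout(groups, skip));
        }
    }
}
```

Comparing each leave-one-out exact count with the estimate of the union built
without the same sketch should show whether a single input accounts for the
jump.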

On Fri, 14 Aug 2020 at 00:24, Jon Malkin <jon.mal...@gmail.com> wrote:

> Hi Marko,
>
> Could you please let us know two more things:
> 1) Which is the one particular sketch that causes the estimate to jump?
> 2) What is the exact unique count of the others without that sketch?
>
> It sort of seems like the first sketch, but it's hard to know for sure
> since we don't know the true leave-one-out exact counts.
>
> Thanks,
>   jon
>
> On Thu, Aug 13, 2020 at 8:41 AM Marko Mušnjak <marko.musn...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Could someone help me understand a behavior I see when trying to union
>> some HLL sketches?
>>
>> I have 14 HLL sketches, and I know the exact unique counts for each of
>> them. All the individual sketches give estimates within 2% of the exact
>> counts.
>>
>> When I try to create a union, using the default lgMaxK parameter results
>> in a total estimate that is way off (25% larger than the exact count).
>>
>> However, reducing the lgMaxK parameter in the union to 4 or 5 gives
>> results that are within 2.5% of the exact counts.
>>
>> Also, one particular sketch seems to cause the final estimate to jump -
>> not adding that sketch to the union keeps the result close to the exact
>> count.
>>
>> Am I just seeing a very bad random error, or is there anything I'm doing
>> wrong with the unions?
>>
>> Running on Java, using version 1.3.0. Just in case, the sketches are in
>> the linked gist (hex encoded, one per line):
>> https://gist.github.com/mmusnjak/c00a72b3dfbc52e780c2980acfd98351
>> and the exact counts:
>> https://gist.github.com/mmusnjak/dcbff67101be6cfc28ba01e63e41f73c
>>
>> Thank you!
>> Marko Musnjak
>>
>>
