Hi Jon,

The first sketch is the one where I see the jump. The exact count without the first sketch is 24765.
The result for lgK=12 without the first sketch is 11% off; lgK=5 is within 2%.

Thanks,
Marko

On Fri, 14 Aug 2020 at 00:24, Jon Malkin <jon.mal...@gmail.com> wrote:

> Hi Marko,
>
> Could you please let us know two more things:
> 1) Which is the one particular sketch that causes the estimate to jump?
> 2) What is the exact unique count of the others without that sketch?
>
> It sort of seems like the first sketch, but it's hard to know for sure
> since we don't know the true leave-one-out exact counts.
>
> Thanks,
> jon
>
> On Thu, Aug 13, 2020 at 8:41 AM Marko Mušnjak <marko.musn...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Could someone help me understand a behavior I see when trying to union
>> some HLL sketches?
>>
>> I have 14 HLL sketches, and I know the exact unique counts for each of
>> them. All the individual sketches give estimates within 2% of the exact
>> counts.
>>
>> When I try to create a union, using the default lgMaxK parameter results
>> in a total estimate that is way off (25% larger than the exact count).
>>
>> However, reducing the lgMaxK parameter in the union to 4 or 5 gives
>> results that are within 2.5% of the exact counts.
>>
>> Also, one particular sketch seems to cause the final estimate to jump -
>> not adding that sketch to the union keeps the result close to the exact
>> count.
>>
>> Am I just seeing a very bad random error, or is there anything I'm doing
>> wrong with the unions?
>>
>> Running on Java, using version 1.3.0. Just in case, the sketches are in
>> the linked gist (hex encoded, one per line):
>> https://gist.github.com/mmusnjak/c00a72b3dfbc52e780c2980acfd98351
>> and the exact counts:
>> https://gist.github.com/mmusnjak/dcbff67101be6cfc28ba01e63e41f73c
>>
>> Thank you!
>> Marko Musnjak