Hi Vijay,
Please ignore parts of my previous email. The solution is a bit more
complicated.
Of the three metrics, only Adspend is truly additive. Summing the
category fields makes no sense. This means you have to design the
implementation of the SummarySetOperations class so that it makes
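To make the point concrete, here is a minimal Python sketch (not the library's API) of a per-key summary in which Adspend is merged by addition while a categorical field is merged by set union. The class `AdSummary`, the function `union_summaries`, and the choice of set union for the category are all hypothetical; the right merge rule for the category fields depends on what they mean.

```python
from dataclasses import dataclass


@dataclass
class AdSummary:
    """Hypothetical per-key summary: one additive and one categorical field."""
    adspend: float = 0.0
    categories: frozenset = frozenset()

    def update(self, adspend: float, category: str) -> None:
        # Adspend is additive, so repeated updates simply accumulate.
        self.adspend += adspend
        # Category is NOT additive; by assumption we keep the set of
        # distinct values seen instead of summing anything.
        self.categories = self.categories | {category}


def union_summaries(a: AdSummary, b: AdSummary) -> AdSummary:
    """Merge two summaries for the same key without mutating either input."""
    return AdSummary(
        adspend=a.adspend + b.adspend,           # additive field: sum
        categories=a.categories | b.categories,  # categorical field: set union (assumed)
    )


# Usage: two partial summaries for the same key, e.g. from two partitions.
left = AdSummary()
left.update(10.0, "sports")
left.update(5.0, "news")
right = AdSummary()
right.update(2.5, "sports")
merged = union_summaries(left, right)
print(merged.adspend, sorted(merged.categories))  # 17.5 ['news', 'sports']
```

The design choice that matters is that each field gets its own merge rule; a summary that blindly sums every field would silently produce meaningless category values.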
Vijay,
Sorry about the delay in getting back to you.
There is some critical information missing from your description: the
domain of what you are sketching.
I presume it is User-IDs; otherwise it doesn't make sense.
If this is the case, I think the solution can be achieved in a
I'd have to think about it more. But the FDT sketch was put in the library
as an example. With tuple sketches you would have to write the code that
encapsulates the tuple summary cells to do what you want and then extend
the summary aggregator to do the proper merge operations. So in a sense
Not directly. But the FDT sketch is really pretty simple to code yourself,
and it is in the library primarily as an example.
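As an illustration of how small such a sketch can be, here is a toy Python version of the idea. It assumes the approach is to theta-sample hashes of whole (primary, secondary) tuples and then group the retained tuples by primary key; the names `unit_hash` and `toy_fdt` are hypothetical, and this is not the library's FDT implementation.

```python
import hashlib
from collections import Counter


def unit_hash(*fields: str) -> float:
    """Map a tuple of strings to a pseudo-random value in [0, 1)."""
    data = "\x1f".join(fields).encode("utf-8")  # \x1f separates the fields
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") / 2**64


def toy_fdt(pairs, k: int) -> dict:
    """Estimate, per primary key, the number of distinct secondary values.

    Keeps at most k of the smallest tuple hashes (a KMV/theta-style sample).
    """
    theta = 1.0
    retained = {}  # hash -> primary key
    for primary, secondary in pairs:
        h = unit_hash(primary, secondary)
        if h >= theta or h in retained:
            continue  # outside the sample, or a duplicate tuple
        retained[h] = primary
        if len(retained) > k:
            # Evict the largest hash and lower theta to it.
            largest = max(retained)
            del retained[largest]
            theta = largest
    # Each retained tuple stands for 1/theta distinct tuples.
    counts = Counter(retained.values())
    return {primary: n / theta for primary, n in counts.items()}


pairs = [("a", "x"), ("a", "y"), ("a", "x"), ("a", "z"), ("b", "x")]
print(toy_fdt(pairs, k=100))  # exact mode: {'a': 3.0, 'b': 1.0}
```

Because the whole tuple is hashed, duplicates of a (primary, secondary) pair collapse to one retained entry, and grouping happens only at query time.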
Nonetheless, one of the reasons that only a few of our sketches have been
adapted for Druid is that Druid requires that all sketches be capable of
operating off-heap.
Which is
>> sample of the raw records (random uniform sampling) and
>> then extrapolate the query results from the sample by multiplying them by 20
>>
>> I would like to note that the above 2 queries are only the initial set of
>> queries that I found interesting and probably there would be
Hi Karik,
The problem you describe is typical for on-line advertising and similar to
ones we have worked on before. Solving this problem with sketches will
provide approximate results in near-real time. However, doing so even with
sketches may require considerably more resources than you may be
Hi Karl,
I just want to explain the reasons you cannot create an UpdateSketch
directly from a CompactSketch:
The CompactSketch is by definition immutable and has the smallest footprint
and simplest structure. It is produced as the result of all of the set
operations because the set operations
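The distinction can be illustrated with a toy Python model (hypothetical names, not the DataSketches API): a mutable update sketch that keeps the k smallest hash values below theta, and a frozen compact form that holds only theta and the sorted retained hashes. The set operation here consumes compact inputs and returns a new compact result rather than modifying anything in place.

```python
import hashlib
from dataclasses import dataclass


def unit_hash(item: str) -> float:
    """Map an item to a pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


@dataclass(frozen=True)
class ToyCompactSketch:
    """Immutable result: just theta and the sorted retained hashes."""
    theta: float
    hashes: tuple

    def estimate(self) -> float:
        return len(self.hashes) / self.theta


class ToyUpdateSketch:
    """Mutable sketch that accepts new items."""

    def __init__(self, k: int):
        self.k = k
        self.theta = 1.0
        self._hashes = set()

    def update(self, item: str) -> None:
        h = unit_hash(item)
        if h >= self.theta:
            return  # not sampled
        self._hashes.add(h)
        if len(self._hashes) > self.k:
            # Evict the largest hash; it becomes the new theta.
            largest = max(self._hashes)
            self._hashes.remove(largest)
            self.theta = largest

    def compact(self) -> ToyCompactSketch:
        return ToyCompactSketch(self.theta, tuple(sorted(self._hashes)))


def union(a: ToyCompactSketch, b: ToyCompactSketch, k: int) -> ToyCompactSketch:
    """Set operation: reads compact inputs, returns a new compact sketch."""
    theta = min(a.theta, b.theta)
    merged = sorted({h for h in a.hashes + b.hashes if h < theta})
    if len(merged) > k:
        theta = merged[k]  # the (k+1)-th smallest hash becomes the new theta
        merged = merged[:k]
    return ToyCompactSketch(theta, tuple(merged))


s1, s2 = ToyUpdateSketch(k=1024), ToyUpdateSketch(k=1024)
for i in range(60):
    s1.update(f"item-{i}")
for i in range(40, 100):
    s2.update(f"item-{i}")
u = union(s1.compact(), s2.compact(), k=1024)
print(u.estimate())  # 100.0 (exact mode; the overlap is counted once)
```

The compact form drops everything needed for further updates (the capacity k and a mutable structure), which is why going back to an update sketch means building a new one, for example by unioning the compact result into it.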
Folks,
Now that we have been approved for graduation by the ASF Board, the URLs to
some of our assets will be changing as we transition to a Top-Level Project
(TLP).
For example:
- GitHub repositories:
https://github.com/apache/incubator-datasketches-java will become
> On Thu, Nov 19, 2020 at 9:57 AM leerho wrote:
>
>> Hi Justin, the site you referenced returns an error 500 (internal server
>> error). It might be down, or out-of-service. You might also check
>>> implemented in the library is implicitly performing a type of downsampling
>>> internally and then summarizing the sample (this is a little bit of a
>>> simplification).
>>>
>>> Something similar is true for frequent items. However, it is not true
>>> for "
Sorry, if you presample your data, all bets are off in terms of accuracy.
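The difference between presampling records and hash-based sampling (the kind of downsampling a sketch can do internally) shows up in a small Python simulation. All numbers are made up for the illustration: 10,000 hypothetical users with 20 events each, and a 5% sampling rate, so the scale-up factor is 20. Scaling a record count from a uniform presample works reasonably well, but scaling a distinct count does not: a user with many events is likely to appear in the sample even though only 5% of events are kept, so multiplying the number of distinct users seen by 20 greatly overstates it. Sampling by the hash of the item keeps or drops every occurrence of a user together, so that count can be scaled.

```python
import hashlib
import random


def unit_hash(item: str) -> float:
    """Map an item to a pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def make_records(num_users: int = 10_000, events_per_user: int = 20) -> list:
    """Hypothetical event log: each user ID appears events_per_user times."""
    return [f"user-{u}" for u in range(num_users) for _ in range(events_per_user)]


def presample_estimates(records: list, rate: float, seed: int = 42):
    """Uniformly sample records, then scale both results by 1/rate."""
    rng = random.Random(seed)
    sample = [r for r in records if rng.random() < rate]
    scale = 1 / rate
    return len(sample) * scale, len(set(sample)) * scale


def hash_sample_distinct(records: list, rate: float) -> float:
    """Keep an item iff its hash < rate, so duplicates are kept or dropped together."""
    kept = {r for r in records if unit_hash(r) < rate}
    return len(kept) / rate


records = make_records()
count_est, distinct_est = presample_estimates(records, rate=0.05)
print(f"records:  true 200000, presampled estimate {count_est:.0f}")
print(f"distinct: true 10000, presampled estimate {distinct_est:.0f}")
print(f"distinct: true 10000, hash-sampled estimate {hash_sample_distinct(records, 0.05):.0f}")
```

The presampled distinct estimate lands an order of magnitude too high, while the record count and the hash-sampled distinct estimate stay close to the truth.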
On Wed, Nov 18, 2020 at 10:55 AM Sergio Castro wrote:
> Hi, I am new to DataSketches.
>
> I know Datasketches provides an *approximate* calculation of statistics
> with *mathematically proven error bounds*.
>
> My
I have placed a [DISCUSS] thread on our d...@datasketches.apache.org list if
you wish to suggest some ideas! :)
On Fri, Aug 14, 2020 at 4:06 PM leerho wrote:
> The other option would be to deprecate the Hive SketchState update(...)
> method and create a "newUpdate(...)" method that
The other option would be to deprecate the Hive SketchState update(...)
method and create a "newUpdate(...)" method that encodes strings with
UTF-8, and also to document the reason why. Any other ideas?
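Why the encoding matters can be shown in a few lines of Python. This assumes, as the thread suggests, that one pipeline hashed strings as UTF-8 bytes while another hashed them as Java-style char[] data (modeled here as UTF-16LE code units); SHA-256 and exact sets stand in for the library's hash function and sketches. Identical strings then produce different hashes, so a union counts every item twice.

```python
import hashlib


def hash_utf8(s: str) -> bytes:
    """Hash the UTF-8 encoding of s (one pipeline's convention)."""
    return hashlib.sha256(s.encode("utf-8")).digest()


def hash_utf16(s: str) -> bytes:
    """Hash s as 16-bit code units (a stand-in for hashing a Java char[])."""
    return hashlib.sha256(s.encode("utf-16-le")).digest()


items = [f"user-{i}" for i in range(1000)]

# Exact "distinct counters" built from the same items by two pipelines.
pipeline_a = {hash_utf8(s) for s in items}
pipeline_b = {hash_utf16(s) for s in items}

print(len(pipeline_a), len(pipeline_b))  # 1000 1000: each side alone is right
print(len(pipeline_a | pipeline_b))      # 2000: the merge double-counts
```

Encoding every string the same way on both sides, for example always as UTF-8, makes the union count each item once.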
On Fri, Aug 14, 2020 at 4:03 PM leerho wrote:
> Yep! It turns out that there is
the Kafka Streams app to char[] will be a good first step.
>
> I'll give that a try and report back.
>
> Thanks everyone for your help in finding the source of this!
>
> Kind regards,
> Marko
>
> On Fri, 14 Aug 2020 at 20:58, leerho wrote:
>
>> Hi Marko,
e care of local times, etc..., these should be the correct
>> values with excluded days:
>> Without first day: 24890
>> Without first and second day: 22989
>>
>> Thanks,
>> Marko
>>
>>
>> On Fri, 14 Aug 2020 at 17:08, leerho wrote:
>>
Hi Marko,
I notice that the first two sketches are the result of union operations,
while the remaining sketches are pure streaming sketches.
Could you repeat Jon's request, but this time exclude the first two
sketches?
Just to cover the bases, could you explain the types of the data items that
Marko,
We are working to understand this problem. Thank you for sending us the
actual sketches. That helps us a great deal!
Cheers,
Lee.
On Thu, Aug 13, 2020 at 3:24 PM Jon Malkin wrote:
> Hi Marko,
>
> Could you please let us know two more things:
> 1) Which is the one particular sketch
Csaba,
These are some very thoughtful suggestions and I can see that some
recommendations in this area would be useful.
Our focus in our DataSketches team is really on the sketching algorithms
and designing the core sketches to be very high performing, robust,
accurate, and easy to integrate
> <sayda...@verizonmedia.com> wrote:
>
>> Adding the original poster just in case he is not subscribed to the list
>>
>> On Mon, Jun 22, 2020 at 7:18 PM leerho wrote:
>>
>>> I see a typo: What I called the Omega relation is actually Omicron (big
read this over at some point and double-check both of
> our work :-)
>
> On Mon, Jun 22, 2020 at 9:14 PM leerho wrote:
>
>> Hello Gourav, welcome to this forum!
>>
>> I want to make sure you have access to and have read the code
>> documentation for the K
Hello Gourav, welcome to this forum!
I want to make sure you have access to and have read the code documentation
for the KLL sketch in addition to the papers. Although the code
documentation exists for both Java and C++, it is a little easier to access
the Javadocs as they are accessible from
Hi David,
Thank you for reaching out to us. We are always interested in learning
about new users and new uses of the library, especially with Tuple
sketches, which we do not hear much feedback about. Let me try to address
some of your questions:
The Tuple Sketch is an "extension" of the Theta
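Conceptually, a Tuple sketch can be pictured as a Theta-style sample of hashed keys in which every retained key also carries a summary object. The toy Python model below is a hypothetical illustration, not the library's classes: it uses a double-valued sum as the summary, updates the summary when a retained key is seen again, and scales both the distinct count and the summed values by 1/theta.

```python
import hashlib


def unit_hash(key: str) -> float:
    """Map a key to a pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class ToyTupleSketch:
    """Theta-style sample of keys; each retained key carries a summary (a sum)."""

    def __init__(self, k: int):
        self.k = k
        self.theta = 1.0
        self.summaries = {}  # hash -> summed value for that key

    def update(self, key: str, value: float) -> None:
        h = unit_hash(key)
        if h >= self.theta:
            return  # key is outside the sample
        # New keys start a summary; repeated keys update the existing one.
        self.summaries[h] = self.summaries.get(h, 0.0) + value
        if len(self.summaries) > self.k:
            # Evict the largest hash (with its summary) and lower theta to it.
            largest = max(self.summaries)
            del self.summaries[largest]
            self.theta = largest

    def estimate_distinct(self) -> float:
        return len(self.summaries) / self.theta

    def estimate_sum(self) -> float:
        # Each retained key represents 1/theta keys of the full stream.
        return sum(self.summaries.values()) / self.theta


sketch = ToyTupleSketch(k=64)
for key, value in [("u1", 2.0), ("u2", 3.0), ("u1", 4.0)]:
    sketch.update(key, value)
print(sketch.estimate_distinct(), sketch.summaries[unit_hash("u1")])  # 2.0 6.0
```

Dropping the summaries leaves exactly a Theta-style sample, which is the sense in which the Tuple sketch extends it.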
There is something wrong with that link. Meanwhile, I have added your email
and name to the #datasketches channel in the-asf.slack.com workspace on
your behalf.
Lee.
On Wed, May 20, 2020 at 2:50 AM David Cromberge <
david.crombe...@permutive.com> wrote:
> Hello,
>
> I would like to join the
Hi Gabor,
> My quick question would be that taking into account that the order of the
> items provided to datasketches:hll_sketch is not deterministic is it normal
> behaviour that for the same dataset I get a different estimate each time I
> run my query?
> I'm trying to figure out if this is due
>
> Ron
>
> On Apr 24, 2020, at 3:12 PM, leerho wrote:
>
> Hi Ron,
>
> Our mission is to develop a robust sketch library *product* that can be
> used in production systems in many different environments and be high
> performing and binary compatible across langu