Re: ItemsSketch Aggregator in druid-datasketches extension

2021-09-02 Thread leerho
We are busy at the moment trying to get new releases of datasketches-memory followed by a new release of datasketches-java out the door. This pair of releases will give us compatibility with JDK 8 through JDK 13. After that, we will work with the Druid team to update the datasketches adaptor

Re: ItemsSketch Aggregator in druid-datasketches extension

2021-09-02 Thread David Glasser
On Thu, Aug 12, 2021 at 8:46 PM leerho wrote: > The non-generic container sketches include the Theta sketch, the double > valued Tuple sketch, the double valued Quantiles sketch, the float valued > KLL and REQ sketches and the long valued FrequentItemsSketch. Of these, > the Theta family and our

Re: ItemsSketch Aggregator in druid-datasketches extension

2021-08-12 Thread leerho
a VectorAggregator implementation, then the whole > > query will run non-vectorized, which will slow it down. > > > > The main reason that BufferAggregators exist is to support off-heap > > aggregation, which minimizes GC pressure and prevents out-of-memory > errors >

Re: ItemsSketch Aggregator in druid-datasketches extension

2021-07-23 Thread Michael Schiff
eason that BufferAggregators exist is to support off-heap > aggregation, which minimizes GC pressure and prevents out-of-memory errors > during the aggregation process. > > > Assuming it is, we can begin talking with the datasketches team about the > possibility of a Direct implem
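As a rough illustration of the off-heap aggregation discussed in this thread, the sketch below keeps a running sum inside a direct ByteBuffer instead of in Java objects. The class OffHeapLongSum and its method names are hypothetical and heavily simplified; they are not Druid's actual BufferAggregator interface. The sketch also assumes every aggregation slot has a fixed byte size known up front.

```java
import java.nio.ByteBuffer;

/** Hypothetical, simplified buffer-backed aggregator; not Druid's actual interface. */
final class OffHeapLongSum {
    /** Bytes of state per aggregation slot (assumed fixed and known in advance). */
    static final int STATE_BYTES = Long.BYTES;

    /** Zero the slot that starts at the given byte position. */
    static void init(ByteBuffer buf, int position) {
        buf.putLong(position, 0L);
    }

    /** Fold one value into the slot by reading and rewriting the buffer in place. */
    static void aggregate(ByteBuffer buf, int position, long value) {
        buf.putLong(position, buf.getLong(position) + value);
    }

    /** Read the current sum back out of the slot. */
    static long get(ByteBuffer buf, int position) {
        return buf.getLong(position);
    }
}

final class OffHeapLongSumDemo {
    public static void main(String[] args) {
        // allocateDirect requests memory outside the Java heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(2 * OffHeapLongSum.STATE_BYTES);
        int slotA = 0;
        int slotB = OffHeapLongSum.STATE_BYTES;
        OffHeapLongSum.init(buf, slotA);
        OffHeapLongSum.init(buf, slotB);
        OffHeapLongSum.aggregate(buf, slotA, 5L);
        OffHeapLongSum.aggregate(buf, slotA, 7L);
        OffHeapLongSum.aggregate(buf, slotB, 1L);
        System.out.println(OffHeapLongSum.get(buf, slotA));
        System.out.println(OffHeapLongSum.get(buf, slotB));
    }
}
```

Because the state lives in the buffer, no per-row objects are created for the garbage collector to track. The fixed STATE_BYTES is what makes the slot layout simple here; state whose size varies would not fit this layout without extra work.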

Re: ItemsSketch Aggregator in druid-datasketches extension

2021-07-23 Thread Gian Merlino
aggregation process. > > > Assuming it is, we can begin talking with the datasketches team about > the possibility of a Direct implementation. > > With ItemsSketch, the biggest roadblock you're likely to run into is the > fact that the items may be of variable size. Currently in Druid eac

Re: ItemsSketch Aggregator in druid-datasketches extension

2021-07-23 Thread Gian Merlino
during the aggregation process. > Assuming it is, we can begin talking with the datasketches team about the possibility of a Direct implementation. With ItemsSketch, the biggest roadblock you're likely to run into is the fact that the items may be of variable size. Currently in Druid e

ItemsSketch Aggregator in druid-datasketches extension

2021-07-23 Thread Michael Schiff
I am looking into implementing a new Aggregator in the datasketches extension using the ItemsSketch in the frequencies package: https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches
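For readers unfamiliar with frequent-items summaries, the sketch below shows the general idea with a textbook Misra-Gries counter table. This is a swapped-in technique for illustration only: it is not the DataSketches implementation, and the class MisraGries and its methods are hypothetical names, not part of any library API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Textbook Misra-Gries summary; an illustration, not the DataSketches ItemsSketch. */
final class MisraGries<T> {
    private final int maxCounters;
    private final Map<T, Long> counters = new HashMap<>();

    MisraGries(int maxCounters) {
        if (maxCounters < 1) {
            throw new IllegalArgumentException("maxCounters must be >= 1");
        }
        this.maxCounters = maxCounters;
    }

    void update(T item) {
        Long current = counters.get(item);
        if (current != null) {
            // Already tracked: bump its counter.
            counters.put(item, current + 1);
        } else if (counters.size() < maxCounters) {
            // Free counter available: start tracking the item.
            counters.put(item, 1L);
        } else {
            // No free counter: decrement every counter, drop those that reach zero,
            // and discard the incoming item.
            Iterator<Map.Entry<T, Long>> it = counters.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<T, Long> entry = it.next();
                if (entry.getValue() == 1L) {
                    it.remove();
                } else {
                    entry.setValue(entry.getValue() - 1);
                }
            }
        }
    }

    /** Lower bound on the item's true count; 0 if the item is not tracked. */
    long estimate(T item) {
        return counters.getOrDefault(item, 0L);
    }
}

final class MisraGriesDemo {
    public static void main(String[] args) {
        MisraGries<String> summary = new MisraGries<>(2);
        for (String s : new String[] {"a", "a", "a", "a", "a", "b", "c", "d"}) {
            summary.update(s);
        }
        System.out.println("a >= " + summary.estimate("a"));
    }
}
```

The summary is generic in the item type, so the tracked items (strings here) can differ in size from one another.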

[ANNOUNCE] Apache DataSketches Java 1.3.0-incubating released!

2020-05-07 Thread leerho
### Bug Fix for Druid Issue #9736 ### Hello All, 1. The Apache DataSketches Java 1.3.0-incubating has been released! NOTE 1: This is the core Java component of the DataSketches library that includes all the sketch algorithms in production-ready packages. These sketches can be called

Re: Discussion: Moving DataSketches to core

2019-11-14 Thread Chi Cao Minh
I tested moving datasketches to core and it doesn’t look like it brings additional dependencies: > [INFO] < org.apache.druid:druid-core > >- > [INFO] Building druid-core 0.17.0-incubating-SNAPSHOT > [INFO] ---

Re: Discussion: Moving DataSketches to core

2019-10-31 Thread Charles Allen
Any time we discuss moving things into core Druid I would love to see a list of dependencies that comes with it. On Wed, Oct 30, 2019, 6:08 PM Jihoon Son wrote: > +1 on moving too. > > On Mon, Oct 28, 2019 at 12:46 PM Fangjin Yang wrote: > > > +1 on moving datasketches to c

Re: Discussion: Moving DataSketches to core

2019-10-28 Thread Fangjin Yang
+1 on moving datasketches to core On Mon, Oct 28, 2019 at 12:36 PM Chi Cao Minh wrote: > To support range partitioning for native parallel batch indexing, I’m > considering moving DataSketches from extensions to core (see > https://github.com/apache/incubator-druid/issues/8769

Discussion: Moving DataSketches to core

2019-10-28 Thread Chi Cao Minh
To support range partitioning for native parallel batch indexing, I’m considering moving DataSketches from extensions to core (see https://github.com/apache/incubator-druid/issues/8769 for details). Having DataSketches in core woul

Re: Datasketches

2019-02-25 Thread Roman Leventov
There is also an important sub-project in DataSketches - Memory (currently https://github.com/DataSketches/memory) that originated from this issue: https://github.com/apache/incubator-druid/issues/3892 and there is a plan to eventually move Druid from ByteBuffer to Memory, at least in some parts

Re: Datasketches

2019-02-25 Thread Charles Allen
Basically there are a LOT of issues and PRs that show up when searching for datasketches in the druid PR list: https://github.com/apache/incubator-druid/pulls?utf8=%E2%9C%93=datasketches Maybe just have a label called Area - Sketches ? On Mon, Feb 25, 2019 at 11:01 AM Gian Merlino wrote

Re: Datasketches

2019-02-25 Thread Gian Merlino
What scope would you suggest for the label or github project? There seem to be discussions going on around making DataSketches HLL and/or Quantiles more 'default' options for their respective areas -- are you thinking that kind of thing? On Mon, Feb 25, 2019 at 9:57 AM Charles Allen wrote

Re: Datasketches

2019-02-25 Thread Julian Hyde
I don’t know how a project can formally track another project, but individuals certainly can. If any Druid committers are ASF members then they could volunteer to help as mentors of the Data Sketches podling. If any Druid committers are past or current contributors to the DataSketches

Datasketches

2019-02-25 Thread Charles Allen
There are a lot of scattered discussions on how to handle sketching / hll / histograms / other-stats, and it is getting kind of hard to keep track of them all. In addition, it looks like DataSketches is at the incubation proposal stage at Apache http://mail-archives.apache.org/mod_mbox

Re: synchronization question about datasketches aggregator

2018-07-23 Thread Anastasia Braginsky
Will Lauer wrote: > A colleague recently pointed out to me that all the sketch operations that > take place in SketchAggregator (in the datasketches module) use a > SynchronizedUnion class that basically wraps a normal sketch Union and > synchronizes all operations. From what I can tel
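The wrapping pattern described in the quoted message can be sketched as follows. All class names here are hypothetical stand-ins: ExactUnion is a plain set-based counter rather than a real sketch Union, and SynchronizedExactUnion only illustrates the idea of delegating every call under a lock. It is not the class from the Druid datasketches module.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a sketch union; counts distinct values exactly and is not thread-safe. */
final class ExactUnion {
    private final Set<Long> seen = new HashSet<>();

    void update(long value) {
        seen.add(value);
    }

    int distinctCount() {
        return seen.size();
    }
}

/** Wrapper that delegates every operation while holding its own monitor. */
final class SynchronizedExactUnion {
    private final ExactUnion delegate = new ExactUnion();

    synchronized void update(long value) {
        delegate.update(value);
    }

    synchronized int distinctCount() {
        return delegate.distinctCount();
    }
}

final class SynchronizedUnionDemo {
    /** Two threads feed disjoint ranges into one shared wrapper; returns the final distinct count. */
    static int run(int perThread) {
        SynchronizedExactUnion union = new SynchronizedExactUnion();
        Thread first = new Thread(() -> {
            for (long i = 0; i < perThread; i++) {
                union.update(i);
            }
        });
        Thread second = new Thread(() -> {
            for (long i = perThread; i < 2L * perThread; i++) {
                union.update(i);
            }
        });
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return union.distinctCount();
    }

    public static void main(String[] args) {
        System.out.println(run(1000));
    }
}
```

Every call on the wrapper acquires a lock, including calls made from a single thread, so the wrapper only earns its cost when one instance is actually shared between threads.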

Re: synchronization question about datasketches aggregator

2018-07-19 Thread Gian Merlino
tch operations that > take place in SketchAggregator (in the datasketches module) use a > SynchronizedUnion class that basically wraps a normal sketch Union and > synchronizes all operations. From what I can tell with other aggregators in > the Druid code base, there doesn't appe

Re: synchronization question about datasketches aggregator

2018-07-19 Thread Roman Leventov
tch operations that > take place in SketchAggregator (in the datasketches module) use a > SynchronizedUnion class that basically wraps a normal sketch Union and > synchronizes all operations. From what I can tell with other aggregators in > the Druid code base, there doesn't appe