We are busy at the moment trying to get new releases of datasketches-memory
followed by a new release of datasketches-java out the door. This pair of
releases will give us compatibility with JDK 8 through JDK 13. After that,
we will work with the Druid team to update the datasketches adaptor
On Thu, Aug 12, 2021 at 8:46 PM leerho wrote:
> The non-generic container sketches include the Theta sketch, the double
> valued Tuple sketch, the double valued Quantiles sketch, the float valued
> KLL and REQ sketches and the long valued FrequentItemsSketch. Of these,
> the Theta family and our
a VectorAggregator implementation, then the whole
> > query will run non-vectorized, which will slow it down.
> >
> > The main reason that BufferAggregators exist is to support off-heap
> > aggregation, which minimizes GC pressure and prevents out-of-memory errors
> > during the aggregation process.
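To make the off-heap point concrete, here is a minimal sketch of the pattern, assuming a simple long-sum aggregation. Per-group state lives at a fixed byte offset (`position`) inside one shared direct `ByteBuffer` rather than in per-row Java objects, so the aggregation state itself adds nothing to heap or GC pressure. The `init`/`aggregate`/`get` names loosely mirror Druid's `BufferAggregator`, but the class is hypothetical and the real interface differs in its details.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-alone illustration; NOT Druid's BufferAggregator interface.
// Each slot occupies SLOT_BYTES at a caller-chosen offset ("position") inside
// one shared direct (off-heap) buffer, so no per-row objects are allocated.
final class LongSumBufferAggregator {
    static final int SLOT_BYTES = Long.BYTES; // fixed size of one slot's state

    // Initialize the slot at the given offset.
    void init(ByteBuffer buf, int position) {
        buf.putLong(position, 0L);
    }

    // Fold one value into the slot using absolute (offset-based) access.
    void aggregate(ByteBuffer buf, int position, long value) {
        buf.putLong(position, buf.getLong(position) + value);
    }

    // Read the current result stored in the slot.
    long get(ByteBuffer buf, int position) {
        return buf.getLong(position);
    }
}
```

A caller would allocate `numGroups * SLOT_BYTES` bytes once with `ByteBuffer.allocateDirect` and address group `g` at offset `g * SLOT_BYTES`. A "Direct" sketch implementation would follow the same idea, with the sketch's own data structure laid out inside the slot.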
>
> > Assuming it is, we can begin talking with the datasketches team about
> > the possibility of a Direct implementation.
>
> With ItemsSketch, the biggest roadblock you're likely to run into is the
> fact that the items may be of variable size. Currently in Druid eac
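The variable-size roadblock can be illustrated with a small sketch. It assumes, for illustration only and not as a description of Druid internals, that an aggregator is handed a slot of a fixed number of bytes. Long items always take 8 bytes each, so the space for k items is known in advance. UTF-8 strings have no fixed width, so the space depends on the data. `SlotSizing` and its methods are hypothetical helpers.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper illustrating why variable-size items complicate
// fixed-size, pre-allocated buffer slots.
final class SlotSizing {
    // Bytes needed for k long items: fixed and known before seeing any data.
    static long bytesForLongItems(int k) {
        return (long) k * Long.BYTES;
    }

    // Bytes needed for string items stored as length-prefixed UTF-8:
    // depends on the actual items, so it cannot be known up front.
    static long bytesForStringItems(List<String> items) {
        long total = 0;
        for (String s : items) {
            total += Integer.BYTES + s.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // True if the serialized items fit in a slot of the given fixed size.
    static boolean fitsInSlot(List<String> items, int slotBytes) {
        return bytesForStringItems(items) <= slotBytes;
    }
}
```

The same number of string items can need very different amounts of space (a single 100-character string needs 104 bytes here, while two one-character strings need 10). This is why a fixed per-slot size that works for primitive-valued sketches does not carry over directly to a generic `ItemsSketch`.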
I am looking into implementing a new Aggregator in the datasketches extension
using the ItemsSketch in the frequencies package:
https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html
https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches
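For readers new to the frequencies package, the underlying idea is a bounded-size summary of heavy hitters. Below is a minimal sketch of the classic Misra-Gries algorithm. It belongs to the same family as the frequent-items sketches in the overview linked above, but it is not the library's implementation, which uses its own hash map and error-bound bookkeeping. With k counters over n updates, every retained count underestimates the true count by at most n/(k+1), and any item occurring more than n/(k+1) times is guaranteed to be retained.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Minimal Misra-Gries frequent-items summary (illustration only; this is
// NOT the DataSketches ItemsSketch implementation). Keeps at most k counters.
final class MisraGries<T> {
    private final int k;
    private final Map<T, Long> counters = new HashMap<>();
    private long n; // total number of updates seen

    MisraGries(int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        this.k = k;
    }

    void update(T item) {
        n++;
        Long c = counters.get(item);
        if (c != null) {
            counters.put(item, c + 1);
        } else if (counters.size() < k) {
            counters.put(item, 1L);
        } else {
            // No free counter: decrement every counter and drop those that reach
            // zero. The new item is discarded along with one occurrence of each
            // tracked item.
            Iterator<Map.Entry<T, Long>> it = counters.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<T, Long> e = it.next();
                if (e.getValue() == 1L) {
                    it.remove();
                } else {
                    e.setValue(e.getValue() - 1);
                }
            }
        }
    }

    // Lower-bound estimate; the true count is at most estimate + maxError().
    long estimate(T item) {
        return counters.getOrDefault(item, 0L);
    }

    // Upper bound on the undercount of any item: n / (k + 1).
    long maxError() {
        return n / (k + 1);
    }
}
```

For example, with k = 2 and the stream a, a, a, b, c, a, b, a, the summary reports 4 for "a" (true count 5) with a maximum error of 8 / 3 = 2.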
### Bug Fix for Druid Issue #9736 ###
Hello All,
1. The Apache DataSketches Java 1.3.0-incubating has been released!
NOTE 1: This is the core Java component of the DataSketches library
that includes all the sketch algorithms in production-ready packages. These
sketches can be called
I tested moving datasketches to core and it doesn’t look like it brings
additional dependencies:
> [INFO] < org.apache.druid:druid-core >
> [INFO] Building druid-core 0.17.0-incubating-SNAPSHOT
> [INFO] ---
Any time we discuss moving things into core Druid I would love to see a
list of dependencies that comes with it.
On Wed, Oct 30, 2019, 6:08 PM Jihoon Son wrote:
> +1 on moving too.
>
> On Mon, Oct 28, 2019 at 12:46 PM Fangjin Yang wrote:
>
> > +1 on moving datasketches to core
+1 on moving datasketches to core
On Mon, Oct 28, 2019 at 12:36 PM Chi Cao Minh wrote:
> To support range partitioning for native parallel batch indexing, I’m
> considering moving DataSketches from extensions to core (see
> https://github.com/apache/incubator-druid/issues/8769
To support range partitioning for native parallel batch indexing, I’m
considering moving DataSketches from extensions to core (see
https://github.com/apache/incubator-druid/issues/8769
for details). Having
DataSketches in core woul
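As background on where a sketch fits into range partitioning: the job needs split points that divide a dimension's values into roughly equal-sized ranges, which is a quantiles problem. The minimal sketch below computes those split points exactly by sorting every value. In a distributed ingestion job one would instead merge per-task quantiles sketches so that no single task has to hold all values. `RangeBoundaries` is a hypothetical name and this is not Druid's partitioning code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: choose range-partition boundaries as quantiles.
// Exact sorting stands in for a mergeable quantiles sketch.
final class RangeBoundaries {
    // Returns numPartitions - 1 split points. Partition i holds values v with
    // split[i-1] <= v < split[i], open-ended at both extremes.
    static List<String> boundaries(List<String> values, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splits;
        }
        for (int i = 1; i < numPartitions; i++) {
            // Value at rank floor(i * n / numPartitions), i.e. the i/numPartitions quantile.
            int rank = (int) ((long) i * sorted.size() / numPartitions);
            splits.add(sorted.get(rank));
        }
        return splits;
    }
}
```

With heavily repeated values, adjacent split points can coincide, so a real implementation also has to deduplicate them.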
There is also an important sub-project in DataSketches - Memory (currently
https://github.com/DataSketches/memory) that originated from this issue:
https://github.com/apache/incubator-druid/issues/3892 and there is a plan
to eventually move Druid from ByteBuffer to Memory, at least in some parts
Basically there are a LOT of issues and PRs that show up when searching for
datasketches in the druid PR list:
https://github.com/apache/incubator-druid/pulls?utf8=%E2%9C%93=datasketches
Maybe just have a label called "Area - Sketches"?
On Mon, Feb 25, 2019 at 11:01 AM Gian Merlino wrote:
What scope would you suggest for the label or github project?
There seem to be discussions going on around making DataSketches HLL and/or
Quantiles more 'default' options for their respective areas -- are you
thinking that kind of thing?
On Mon, Feb 25, 2019 at 9:57 AM Charles Allen wrote:
I don’t know how a project can formally track another project, but individuals
certainly can.
If any Druid committers are ASF members then they could volunteer to help as
mentors of the Data Sketches podling.
If any Druid committers are past or current contributors to the DataSketches
There are a lot of discussions here and there on how to handle sketching /
hll / histograms / other-stats, and it is getting kind of hard to keep
track of them all.
In addition, it looks like DataSketches is in an incubating proposal stage for
Apache:
http://mail-archives.apache.org/mod_mbox
Will Lauer wrote:
> A colleague recently pointed out to me that all the sketch operations that
> take place in SketchAggregator (in the datasketches module) use a
> SynchronizedUnion class that basically wraps a normal sketch Union and
> synchronizes all operations. From what I can tell with other aggregators in
> the Druid code base, there doesn't appe
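The wrapper described here is the decorator pattern: every method of the underlying union is delegated while holding a single lock. Below is a minimal sketch under stated assumptions. `UnionLike`, `SetUnion`, and `SynchronizedUnionWrapper` are hypothetical stand-ins (an exact set instead of a Theta sketch). They are modeled on the description above, not copied from the Druid or DataSketches classes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a sketch union (NOT the DataSketches API).
interface UnionLike {
    void update(long item);
    long estimate();
}

// Exact, unsynchronized "union" of long items, for illustration only.
final class SetUnion implements UnionLike {
    private final Set<Long> items = new HashSet<>();

    @Override public void update(long item) { items.add(item); }

    @Override public long estimate() { return items.size(); }
}

// Decorator: delegates every call while holding this object's monitor, so
// concurrent callers are serialized. This is correct, but every call pays
// for the lock even when only one thread ever uses the union.
final class SynchronizedUnionWrapper implements UnionLike {
    private final UnionLike delegate;

    SynchronizedUnionWrapper(UnionLike delegate) { this.delegate = delegate; }

    @Override public synchronized void update(long item) { delegate.update(item); }

    @Override public synchronized long estimate() { return delegate.estimate(); }
}
```

Whether that per-call cost is worth paying depends on whether the aggregator can actually be invoked from several threads at once.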