Re: Druid + Theta Sketches performance

2018-10-22 Thread Charles Allen
Honestly I do not remember how the dimension exclusion vs dimension inclusion stuff works. I have to look it up every time. If you look at any segment for that datasource in the Coordinator Console, it should give you a list of dimensions and metrics. Do they match what you expect? On Sun, Oct

Re: Druid + Theta Sketches performance

2018-10-21 Thread alex . rnv . ru
On 2018/10/19 14:42:18, Charles Allen wrote:
> This is a good callout. Those numbers still seem very slow. One item I'm curious of is if you are dropping the id when you index, or if the id is also being indexed into the druid segments.
>
> With how druid does indexing, it dictionary

Re: Druid + Theta Sketches performance

2018-10-19 Thread Charles Allen
This is a good callout. Those numbers still seem very slow. One item I'm curious of is if you are dropping the id when you index, or if the id is also being indexed into the druid segments. With how druid does indexing, it dictionary encodes all the dimension values. So the cardinality of rows is

Druid + Theta Sketches performance

2018-10-19 Thread alex . rnv . ru
Hi Druid devs, I am testing Druid for our specific count-distinct estimation case. Data was ingested via the Hadoop indexer. When simplified, it has the following schema: timestamp | key | country | theta-sketch | event-counter. So, there are 2 dimensions, one counter metric, one theta sketch metric.
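
For readers following the thread, here is a rough sketch of how the schema described above might map onto the `dimensionsSpec` and `metricsSpec` parts of a Druid ingestion spec. It is not taken from the original message. The column names `key`, `country`, `theta-sketch` and `event-counter` come from the schema; the raw input column `id` and the helper `build_specs` are assumptions made only for illustration. The `longSum` and `thetaSketch` aggregator types are the ones provided by Druid and its DataSketches extension.

```python
import json


def build_specs(id_column="id"):
    """Build illustrative dimensionsSpec / metricsSpec fragments.

    `id_column` is a hypothetical name for the raw identifier column
    that feeds the theta sketch; the thread does not name it.
    """
    # Only the two dimensions from the schema are listed. The id column is
    # deliberately left out, so it is not indexed as a dimension.
    dimensions_spec = {"dimensions": ["key", "country"]}

    metrics_spec = [
        # Counter metric: summed during rollup.
        {"type": "longSum", "name": "event-counter", "fieldName": "event-counter"},
        # Theta sketch metric built from the (assumed) raw id column.
        {"type": "thetaSketch", "name": "theta-sketch", "fieldName": id_column},
    ]
    return dimensions_spec, metrics_spec


dimensions_spec, metrics_spec = build_specs()
print(json.dumps({"dimensionsSpec": dimensions_spec, "metricsSpec": metrics_spec}, indent=2))
```

Listing the dimensions explicitly, rather than relying on exclusions, is one way to make sure the identifier only reaches the sketch aggregator and does not end up as a high-cardinality dimension in the segments.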