I was thinking about model training at Druid indexing side and evaluation
at Druid querying side.
The advantage Druid has over Spark at query time is faster row filtering
thanks to its bitset (bitmap) indexes. But since model evaluation is a pretty
heavy operation (I suppose; does anyone have ballpark timings?), that
filtering speedup may not buy much once evaluation dominates the per-row cost.
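To make the filtering point concrete, here is a toy sketch of the bitmap-index idea in plain Python. This is not Druid's actual implementation (Druid uses compressed Roaring/CONCISE bitmaps per segment); it just shows why a conjunctive filter reduces to cheap bitwise ANDs instead of a row-by-row scan:

```python
# Toy bitmap index: each (dimension, value) pair maps to a bitmap
# (here a Python int) with bit i set when row i has that value.
rows = [
    {"country": "US", "device": "phone"},
    {"country": "NL", "device": "desktop"},
    {"country": "US", "device": "desktop"},
    {"country": "US", "device": "phone"},
]

index = {}  # (dimension, value) -> bitmap
for i, row in enumerate(rows):
    for dim, val in row.items():
        index[(dim, val)] = index.get((dim, val), 0) | (1 << i)

# country=US AND device=phone becomes a single bitwise AND,
# instead of testing every row's fields.
match = index[("country", "US")] & index[("device", "phone")]
matching_rows = [i for i in range(len(rows)) if match >> i & 1]
print(matching_rows)  # -> [0, 3]
```

The point of the thread stands either way: even if this step is near-free, a heavyweight model evaluation per surviving row would still dominate the query.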
> it makes more sense to have tooling around Druid, to do slice and dice
the data that you need, and do the ml stuff in sklearn, or even in spark
I agree with this sentiment. Druid as an execution engine is very good at
doing distributed aggregation (a distributed reduce). What advantage does
pushing model evaluation into Druid give over exporting the filtered slice
and evaluating it elsewhere?
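As a small illustration of the distributed-reduce pattern (a sketch of the general idea, not Druid's API): each node computes a mergeable partial aggregate over its own segment, and a coordinating node combines the partials, so raw rows never need to be centralized:

```python
# Illustrative only: "segments" stand in for data partitions on different
# nodes; partial_agg runs per node, merge runs on the coordinator.
from functools import reduce

segments = [
    [3, 5, 2],
    [10, 1],
    [4, 4, 4],
]

def partial_agg(segment):
    # (sum, count) is mergeable across nodes, unlike a raw average
    return (sum(segment), len(segment))

def merge(a, b):
    # coordinator-side combine of two partial aggregates
    return (a[0] + b[0], a[1] + b[1])

total, count = reduce(merge, map(partial_agg, segments))
print(total / count)  # -> 4.125, the global mean, computed distributively
```

This is the workload shape Druid is built for; arbitrary per-row model evaluation doesn't decompose as cleanly.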
> Vertica has it. Good idea to introduce it in Druid.
I'm not sure this is a valid argument; by that logic, you could justify
introducing anything into Druid. I think it is good to be opinionated, and to
decide as a community why we do or don't introduce ML capabilities into the software.
For example,