Yes, really interesting discussion. It would be great to compare the performance of alternative architectures. In particular, I've found Elasticsearch to be a great option for analytic workloads: it doesn't support SQL (joins in particular), but its aggregation and arbitrary filtering capabilities make it very powerful. It also plays really nicely with Spark, so Spark SQL could be layered on top (though I've only done this for offline batch jobs, not real-time, user-facing queries).
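To make "aggregation and arbitrary filtering" a bit more concrete, here is a minimal sketch of the kind of request body I mean, built as a plain Python dict. It assumes a hypothetical index whose documents have a `country` keyword field, a numeric `amount` field and a `timestamp` date field; the function name `build_revenue_query` is likewise just for illustration. It only builds the JSON body (a `bool` query in filter context plus a `terms` bucket aggregation with a nested `sum`); it doesn't talk to a cluster.

```python
import json


def build_revenue_query(country=None, since=None, until=None, top_n=10):
    """Build an Elasticsearch query DSL body: filter first, then aggregate.

    Hypothetical schema: documents with `country` (keyword), `amount`
    (numeric) and `timestamp` (date) fields.
    """
    filters = []
    if country is not None:
        # Exact match on a keyword field; filter context skips scoring.
        filters.append({"term": {"country": country}})
    if since is not None or until is not None:
        time_range = {}
        if since is not None:
            time_range["gte"] = since
        if until is not None:
            time_range["lt"] = until
        filters.append({"range": {"timestamp": time_range}})

    return {
        "size": 0,  # only aggregation results are wanted, not individual hits
        "query": {"bool": {"filter": filters}},
        "aggs": {
            # One bucket per country (top_n largest), each with summed revenue.
            "by_country": {
                "terms": {"field": "country", "size": top_n},
                "aggs": {"revenue": {"sum": {"field": "amount"}}},
            }
        },
    }


if __name__ == "__main__":
    body = build_revenue_query(since="2016-03-01", until="2016-03-11")
    # This JSON body would be sent to the index's _search endpoint.
    print(json.dumps(body, indent=2))
```

Any combination of filters can be bolted on without touching the aggregation part, which is a big part of why it works well for ad hoc analytic queries.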
It also potentially lends itself very nicely to a "lambda-style" architecture, i.e. querying the historical aggregated data and the "real-time" component (current day, hour, or whatever) at the same time, given careful data modelling.

On Fri, 11 Mar 2016 at 06:25 Tristan Nixon <[email protected]> wrote:

> Hear, hear. That’s why I’m here :)
>
> On Mar 10, 2016, at 7:32 PM, Chris Fregly <[email protected]> wrote:
>
> Anyway, thanks for the good discussion, everyone! This is why we have
> these lists, right! :)
