Great work. On Fri, Sep 18, 2015 at 6:51 PM, Harish Butani <rhbutani.sp...@gmail.com> wrote:
> Hi,
>
> I have just posted a blog on this:
> https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
>
> regards,
> Harish Butani.
>
> On Tue, Sep 1, 2015 at 11:46 PM, Paolo Platter <paolo.plat...@agilelab.it>
> wrote:
>
>> Fantastic! I will look into that, and I hope to contribute.
>>
>> Paolo
>>
>> Sent from my Windows Phone
>> ------------------------------
>> From: Harish Butani <rhbutani.sp...@gmail.com>
>> Sent: 02/09/2015 06:04
>> To: user <user@spark.apache.org>
>> Subject: Spark + Druid
>>
>> Hi,
>>
>> I am working on the Spark Druid Package:
>> https://github.com/SparklineData/spark-druid-olap.
>> For scenarios where a 'raw event' dataset is indexed in Druid, it lets
>> you write your logical plans (queries/dataflows) against the 'raw event'
>> dataset, and it rewrites parts of the plan to execute as a Druid query.
>> In Spark, configuring a Druid DataSource is somewhat like configuring an
>> OLAP index in a traditional DB. Early results show a significant speedup
>> from pushing slice-and-dice queries down to Druid.
>>
>> It comprises a Druid DataSource that wraps the 'raw event' dataset and
>> has knowledge of the Druid index, and a DruidPlanner, which is a set of
>> plan rewrite strategies that convert aggregation queries into a plan
>> containing a DruidRDD.
>>
>> Here
>> <https://github.com/SparklineData/spark-druid-olap/blob/master/docs/SparkDruid.pdf>
>> is a detailed design document, which also describes a benchmark of
>> representative queries on the TPC-H dataset.
>>
>> Looking for folks who would be willing to try this out and/or contribute.
>>
>> regards,
>> Harish Butani.
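For anyone wondering what "configuring a Druid DataSource" in Spark might look like in practice, here is a rough sketch using Spark SQL's generic `CREATE TEMPORARY TABLE ... USING` data-source syntax. The option keys, table name, and column names below are illustrative assumptions, not the package's confirmed API; the linked design document and GitHub README describe the actual interface.

```sql
-- Hypothetical sketch only: registers a Druid-backed table over a 'raw event'
-- dataset. The provider class is the package's namespace; the OPTIONS keys
-- (sourceDataframe, timeDimensionColumn, druidDatasource, druidHost) are
-- assumed names for illustration, not verified against the project.
CREATE TEMPORARY TABLE lineItemDruid
USING org.sparklinedata.druid
OPTIONS (
  sourceDataframe "raw_lineitem",     -- the flattened 'raw event' dataset in Spark
  timeDimensionColumn "l_shipdate",   -- event-time column the Druid index is keyed on
  druidDatasource "tpch",             -- name of the corresponding Druid datasource
  druidHost "localhost"               -- Druid broker to push rewritten queries to
);
```

With a mapping like this in place, the DruidPlanner's rewrite strategies could match aggregation queries against `lineItemDruid` and replace the slice-and-dice portions of the plan with a DruidRDD that queries the broker directly.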