Great work. On Fri, Sep 18, 2015 at 6:51 PM, Harish Butani <rhbutani.sp...@gmail.com> wrote:
> Hi,
>
> I have just posted a blog on this:
> https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
>
> regards,
> Harish Butani.
>
> On Tue, Sep 1, 2015 at 11:46 PM, Paolo Platter <paolo.plat...@agilelab.it>
> wrote:
>
>> Fantastic! I will look into that, and I hope to contribute.
>>
>> Paolo
>>
>> Sent from my Windows Phone
>> ------------------------------
>> From: Harish Butani <rhbutani.sp...@gmail.com>
>> Sent: 02/09/2015 06:04
>> To: user <user@spark.apache.org>
>> Subject: Spark + Druid
>>
>> Hi,
>>
>> I am working on the Spark Druid Package:
>> https://github.com/SparklineData/spark-druid-olap.
>> For scenarios where a 'raw event' dataset is indexed in Druid, it lets
>> you write your logical plans (queries/dataflows) against the 'raw event'
>> dataset, and it rewrites parts of the plan to execute as a Druid query.
>> In Spark, configuring a Druid DataSource is somewhat like configuring an
>> OLAP index in a traditional DB. Early results show a significant speedup
>> from pushing slice-and-dice queries down to Druid.
>>
>> It comprises a Druid DataSource that wraps the 'raw event' dataset and
>> has knowledge of the Druid index, and a DruidPlanner, which is a set of
>> plan rewrite strategies that convert aggregation queries into a plan
>> containing a DruidRDD.
>>
>> Here
>> <https://github.com/SparklineData/spark-druid-olap/blob/master/docs/SparkDruid.pdf>
>> is a detailed design document, which also describes a benchmark of
>> representative queries on the TPC-H dataset.
>>
>> Looking for folks who would be willing to try this out and/or contribute.
>>
>> regards,
>> Harish Butani.
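For anyone wondering what "configuring a Druid DataSource" in Spark might look like in practice, here is a rough sketch using Spark SQL's generic `CREATE TEMPORARY TABLE ... USING` data-source syntax. The option keys, table name, and column names below are illustrative assumptions, not the package's confirmed API; the linked design document and GitHub README describe the actual interface.

```sql
-- Hypothetical sketch only: registers a Druid-backed table over a 'raw event'
-- dataset. The provider class is the package's namespace; the OPTIONS keys
-- (sourceDataframe, timeDimensionColumn, druidDatasource, druidHost) are
-- assumed names for illustration, not verified against the project.
CREATE TEMPORARY TABLE lineItemDruid
USING org.sparklinedata.druid
OPTIONS (
  sourceDataframe "raw_lineitem",     -- the flattened 'raw event' dataset in Spark
  timeDimensionColumn "l_shipdate",   -- event-time column the Druid index is keyed on
  druidDatasource "tpch",             -- name of the corresponding Druid datasource
  druidHost "localhost"               -- Druid broker to push rewritten queries to
);
```

With a mapping like this in place, the DruidPlanner's rewrite strategies could match aggregation queries against `lineItemDruid` and replace the slice-and-dice portions of the plan with a DruidRDD that queries the broker directly.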