Re: IndexR, a new storage plugin for Drill

WeiWan Wed, 04 Jan 2017 03:21:07 -0800

Hi Nicolas,

> 1) Do both Drill and Hive support predicate pushdown with IndexR? I mean
> using the indexes and not scanning the table.


Yes, we support predicate pushdown.
IndexR implements a special index called the Rough Set Index, which is well
suited to statistical queries. It effectively filters out irrelevant data
chunks and costs very little compared to other forms of index. The idea
originally comes from Infobright (ICE). You can find many useful links by
searching Google for “infobright rough set”. In some respects you can think of
IndexR as another Infobright that is open source, distributed, runs on Hadoop,
and supports realtime ingestion.
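
To make the chunk-filtering idea concrete, here is a minimal sketch in Python.
It is only an illustration of the general rough-set approach as I understand it
from the Infobright material, not IndexR's actual index format or code. It
assumes each chunk keeps just a min and a max for one integer column, and the
names Chunk, classify and count_greater_than are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Chunk:
    """A hypothetical data chunk: the rows plus a coarse min/max summary."""
    values: List[int]  # assumed non-empty
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # The summary is tiny and is computed once, when the chunk is written.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def classify(chunk: Chunk, threshold: int) -> str:
    """Classify a chunk for the predicate `value > threshold` using only its summary."""
    if chunk.max_value <= threshold:
        return "irrelevant"  # no row can match, so the chunk is skipped
    if chunk.min_value > threshold:
        return "relevant"    # every row matches, so no per-row check is needed
    return "suspect"         # some rows may match, so the chunk must be scanned


def count_greater_than(chunks: List[Chunk], threshold: int) -> Tuple[int, int]:
    """Return (matching row count, number of chunks actually scanned)."""
    matched = 0
    scanned = 0
    for chunk in chunks:
        kind = classify(chunk, threshold)
        if kind == "irrelevant":
            continue
        if kind == "relevant":
            matched += len(chunk.values)
        else:
            scanned += 1
            matched += sum(1 for v in chunk.values if v > threshold)
    return matched, scanned
```

In this sketch only the "suspect" chunks are ever read row by row, which is why
such an index costs little to keep and still removes most of the work for
statistical queries with selective filters.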


> 2) Does it support join pushdown, sort, etc.?

It does not. Those jobs should be done by the query layer, i.e. Drill.
But we do hope Drill can support aggregation pushdown, which could really speed
up queries in cases like “select count(*), sum(a), max(b) from table”.
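
As a rough illustration of why aggregation pushdown helps for a query like the
one above, here is a small Python sketch. It is not Drill's or IndexR's API; it
assumes the storage layer could return one partial result per chunk and the
query layer would only merge them. PartialAgg, aggregate_chunk and merge are
hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class PartialAgg:
    """Partial result for count(*), sum(a), max(b) over one chunk."""
    count: int
    total: int               # sum of column a
    maximum: Optional[int]   # max of column b, None for an empty chunk


def aggregate_chunk(rows: List[Tuple[int, int]]) -> PartialAgg:
    """Storage-side step: reduce the (a, b) rows of one chunk to a partial result."""
    return PartialAgg(
        count=len(rows),
        total=sum(a for a, _ in rows),
        maximum=max((b for _, b in rows), default=None),
    )


def merge(partials: Iterable[PartialAgg]) -> PartialAgg:
    """Query-side step: combine the per-chunk partial results into the final answer."""
    count = 0
    total = 0
    maximum: Optional[int] = None
    for p in partials:
        count += p.count
        total += p.total
        if p.maximum is not None and (maximum is None or p.maximum > maximum):
            maximum = p.maximum
    return PartialAgg(count, total, maximum)
```

With this split the query layer receives one small record per chunk instead of
every row, which is where the speedup would come from.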

> 3) Can you elaborate on why your team chose Drill over equivalents (Impala,
> Presto…)?

We are not very familiar with Impala or Presto, but we did try Spark. We didn’t
choose Spark because at that time, early 2016, Spark’s scanner API was not
stable enough, and we needed the processes to run on local machines instead of
on YARN. Most of all, we love Drill for its stability, efficiency,
simplicity, and its nice interface for storage plugins.

Regards
Flow Wei



> On Jan 4, 2017, at 16:32, Nicolas Paris <[email protected]> wrote:
> 
> Hi Weiwan,
> 
> 1) Do both Drill and Hive support predicate pushdown with IndexR? I mean
> using the indexes and not scanning the table.
> 2) Does it support join pushdown, sort, etc.?
> 3) Can you elaborate on why your team chose Drill over equivalents (Impala,
> Presto...)?
> 
> Thanks !
> 
> 
> 
> 2017-01-04 2:59 GMT+01:00 WeiWan <[email protected]>:
> 
>> Hi,
>> 
>> It will take some time for the IndexR plugin to be merged into Drill, but you
>> can try it out already by following these documents.
>> 
>> Compilation: https://github.com/shunfei/indexr/wiki/Compilation
>> Deployment: https://github.com/shunfei/indexr/wiki/Deployment
>> User Guide: https://github.com/shunfei/indexr/wiki/User-Guide
>> Regards
>> Flow Wei
>> 
>> 
>> 
>>> On Jan 4, 2017, at 00:22, Jinfeng Ni <[email protected]> wrote:
>>> 
>>> Looks like IndexR is a very interesting storage plugin. Although I have
>>> not looked into the details, I'm looking forward to seeing the PR and
>>> hopefully getting this into Drill!
>>> 
>>> Thanks,
>>> 
>>> Jinfeng
>>> 
>>> 
>>> On Tue, Jan 3, 2017 at 7:30 AM, WeiWan <[email protected]> wrote:
>>>> Hi Charles,
>>>> 
>>>> It would be great if the IndexR plugin could be merged into the official
>>>> Drill project. I will do some more tests based on the latest Drill version
>>>> and submit a PR.
>>>> 
>>>> Regards
>>>> Flow Wei
>>>> 
>>>> 
>>>> 
>>>>> On Jan 3, 2017, at 23:18, Charles Givre <[email protected]> wrote:
>>>>> 
>>>>> This sounds really interesting.  Will you be submitting a PR to
>>>>> integrate this into the main Drill codebase?
>>>>> — C
>>>>> 
>>>>>> On Jan 3, 2017, at 03:35, WeiWan <[email protected]> wrote:
>>>>>> 
>>>>>> IndexR is a distributed, columnar storage system based on HDFS that
>>>>>> focuses on fast analysis, both of massive static (historical) data and
>>>>>> of rapidly ingested realtime data. IndexR is designed for OLAP.
>>>>>> 
>>>>>> Fast analysis on large datasets
>>>>>> Realtime ingestion with zero delay for queries
>>>>>> Deep integration with the Hadoop ecosystem
>>>>>> Hardware efficiency
>>>>>> Highly available, scalable, manageable and simple
>>>>>> Adapted to popular query engines like Apache Drill, Apache Hive, etc.
>>>>>> 
>>>>>> And now it is open source.
>>>>>> 
>>>>>> Project: https://github.com/shunfei/indexr
>>>>>> Wiki: https://github.com/shunfei/indexr/wiki
>>>>>> 
>>>>>> IndexR was originally developed by Sunteng Tech. The project started a
>>>>>> year ago and has now been deployed to several production environments in
>>>>>> our company. The whole cluster consumes over 30 billion events each day
>>>>>> in realtime from Kafka. The largest table contains over 10 billion rows
>>>>>> (after rollup) and is growing rapidly. Most statistical/analytical
>>>>>> queries have a latency of less than 3 seconds in a real-world production
>>>>>> environment.
>>>>>> 
>>>>>> Currently it is mainly used as a Drill and Hive storage plugin. It
>>>>>> should be quite easy to master.
>>>>>> 
>>>>>> We hope IndexR will be useful to you and that you will help make it better.
>>>>>> 
>>>>>> Regards
>>>>>> Flow Wei
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>>
