One of Drill's goals is to allow direct access, i.e. without the need for
transformation, to data from any source in any format. That is why you see
so many choices of data sources and data formats.

Having said that, another primary goal of Drill is to provide low-latency
queries by taking advantage of a storage format that provides optimizations
like the one you mentioned.

Please take a look at Parquet <http://parquet.incubator.apache.org/>, a
columnar storage format that Drill uses as its de facto format for
optimized access to the data.
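
As a rough illustration of why a columnar layout helps for a query like
yours, here is a minimal Python sketch. It is not how Drill or Parquet is
implemented; the sample rows (apart from a few fields borrowed from your
document) and the helper names to_columns and count_by_device are made up
for illustration only.

```python
from collections import Counter

# Made-up sample rows; only a few of the fields from the example document are kept.
rows = [
    {"idtype": "ca", "id": 3, "device": "nexus", "location": "mumbai", "total": 56263},
    {"idtype": "b", "id": 10, "device": "nexus", "location": "pune", "total": 120},
    {"idtype": "b", "id": 10, "device": "ipad", "location": "delhi", "total": 75},
    {"idtype": "b", "id": 10, "device": "nexus", "location": "mumbai", "total": 310},
]


def to_columns(records):
    """Pivot row-wise records into one list per field (a columnar layout).

    Assumes every record has the same set of fields, so the lists stay aligned.
    """
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def count_by_device(columns, idtype, id_):
    """Count matching rows per device, reading only the three columns the query needs."""
    counts = Counter()
    for t, i, d in zip(columns["idtype"], columns["id"], columns["device"]):
        if t == idtype and i == id_:
            counts[d] += 1
    return dict(counts)


columns = to_columns(rows)
# Roughly: select device, count(*) ... where idtype='b' and id=10 group by device
result = count_by_device(columns, "b", 10)
print(result)
```

The point of the sketch is that once the data is stored column by column,
the query never has to touch unrelated fields such as location or total,
which is the kind of saving a columnar file format is designed to provide.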

On Sat, Feb 14, 2015 at 5:56 AM, Tamil selvan R.S <[email protected]> wrote:

> Hi,
> As the project description says, I understand drill as a open source
> implementation of Dremel. Basically, Dremel optimizes adhoc queries on
> unstructured data by storing it columnar way instead of record wise. I
> assume drill doing the same. I saw drill supporting a wide variety of
> datasources like json, mongo, etc., How does drill achieve the
> transformation of source data into a columnar representation so that it can
> optimize the queries?
>
> For Example:
> Data [Assume it to be in mongo]:
>
> {"idtype":"ca","id":3,"metric":"purchases","time":"Y14/M0/D0","device":"nexus","devicegrp":"tablet","source":"minewhat","sourcegrp":"email","dofw":"weekend","tofd":"morning","browser":"chrome","engage":"return","location":"mumbai","locationgrp":"maharashtra","usertag":"frequent","search":"sony
> tab","total":56263}
>
> And for a query like below:
> select test.device, count(*) from mongo.mydata test where test.idtype='b'
> and test.id=10 group by test.device, test.idtype, test.id;
>
> Will drill load *all documents* from mydata collection every time this
> query is fired and later map the data to columnar style? I'm 100% sure this
> won't be the implementation as it look to worsen the situation more
> [loading data, transform [should go row by row] and then query the
> transformed data].
>
> It would be really helpful if someone can shed some light on this area, as
> there is no material found in the documentation.
>
> Regards,
> Tamil.s
>
