Drill supports columnar *execution* as well as columnar storage. Other
tools flatten the data into rows before executing. Even for data that is
not stored in a columnar format, Drill still uses columnar execution,
which makes processing much more efficient.


Drill optimizes for both columnar storage and execution by using an
in-memory data model that is hierarchical and columnar. When working with
data stored in columnar formats such as Parquet, Drill avoids disk access
for columns that are not involved in an analytic query. Drill also provides
an execution layer that performs SQL processing directly on columnar data
without row materialization. The combination of optimizations for columnar
storage and direct columnar execution significantly lowers the memory
footprint and speeds up BI and analytic workloads.
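
To make the column-pruning idea concrete, here is a rough Python sketch. It
is not Drill's implementation, only an illustration: the sample rows, the
helper names (to_columns, count_by_device), and the in-memory layout are all
assumptions made up for this example. It pivots a few records into one list
per field and then answers a filter-and-count query, similar in shape to the
one quoted below, by reading only the three columns the query needs.

```python
from collections import Counter

# Hypothetical sample rows, loosely modeled on the document quoted below.
rows = [
    {"idtype": "ca", "id": 3, "device": "nexus", "total": 56263},
    {"idtype": "b", "id": 10, "device": "nexus", "total": 120},
    {"idtype": "b", "id": 10, "device": "ipad", "total": 75},
    {"idtype": "b", "id": 7, "device": "nexus", "total": 9},
]


def to_columns(records):
    """Pivot row dicts into one list per field (assumes identical keys)."""
    keys = records[0].keys()
    return {key: [rec[key] for rec in records] for key in keys}


def count_by_device(columns, idtype, id_):
    """Count matching rows per device, reading only three columns."""
    # The "total" column is never touched by this query.
    matches = [
        i
        for i, (t, n) in enumerate(zip(columns["idtype"], columns["id"]))
        if t == idtype and n == id_
    ]
    return Counter(columns["device"][i] for i in matches)


columns = to_columns(rows)
result = count_by_device(columns, "b", 10)
print(dict(result))
```

With a columnar file format, the untouched columns would not even need to be
read from disk, which is the effect described above for Parquet.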

Drill tackles rapidly evolving, application-driven schemas and nested data
structures with a hierarchical columnar representation of data, allowing
for high-performance queries on such evolving data structures.
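
As a rough illustration of what a hierarchical columnar layout can look
like, the Python sketch below turns nested records with differing fields
into one column per field path. The function names (flatten,
to_column_paths), the dotted-path naming, and the sample records are
assumptions for this example only and say nothing about how Drill
represents data internally.

```python
def flatten(record, prefix=""):
    """Map a nested dict to {dotted.path: leaf value}."""
    out = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def to_column_paths(records):
    """Build one column per field path; missing fields become None."""
    flat = [flatten(rec) for rec in records]
    paths = sorted({path for rec in flat for path in rec})
    return {path: [rec.get(path) for rec in flat] for path in paths}


# Hypothetical records whose schema changes from one record to the next.
records = [
    {"id": 1, "device": {"name": "nexus", "group": "tablet"}},
    {"id": 2, "device": {"name": "ipad"}, "location": "mumbai"},
]

nested_columns = to_column_paths(records)
for path, values in nested_columns.items():
    print(path, values)
```

A field that shows up only in later records simply becomes a new column, so
no schema has to be declared up front.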




On Sat, Feb 14, 2015 at 2:52 PM, Aditya <[email protected]> wrote:

> One of Drill's goals is to allow direct access, i.e. without the need
> for transformation, to any data source in any format, and that's why you
> see so many choices of data sources and data formats.
>
> Having said that, another primary goal of Drill is to provide low-latency
> queries by taking advantage of a storage format that provides
> optimizations like the one you mentioned.
>
> Please take a look at Parquet <http://parquet.incubator.apache.org/>,
> which is a columnar storage format and is used by Drill as the de facto
> format for optimized access to the data.
>
> On Sat, Feb 14, 2015 at 5:56 AM, Tamil selvan R.S <[email protected]>
> wrote:
>
> > Hi,
> > As the project description says, I understand Drill to be an open source
> > implementation of Dremel. Basically, Dremel optimizes ad hoc queries on
> > unstructured data by storing it in a columnar way instead of record-wise.
> > I assume Drill does the same. I saw that Drill supports a wide variety of
> > data sources like JSON, Mongo, etc. How does Drill achieve the
> > transformation of source data into a columnar representation so that it
> > can optimize the queries?
> >
> > For Example:
> > Data [Assume it to be in mongo]:
> >
> >
> > {"idtype":"ca","id":3,"metric":"purchases","time":"Y14/M0/D0","device":"nexus","devicegrp":"tablet","source":"minewhat","sourcegrp":"email","dofw":"weekend","tofd":"morning","browser":"chrome","engage":"return","location":"mumbai","locationgrp":"maharashtra","usertag":"frequent","search":"sony tab","total":56263}
> >
> > And for a query like below:
> > select test.device, count(*) from mongo.mydata test where test.idtype='b'
> > and test.id=10 group by test.device, test.idtype, test.id;
> >
> > Will Drill load *all documents* from the mydata collection every time
> > this query is fired and later map the data to a columnar style? I'm 100%
> > sure this won't be the implementation, as it looks like it would make
> > the situation worse [loading data, transforming it [which should go row
> > by row], and then querying the transformed data].
> >
> > It would be really helpful if someone could shed some light on this
> > area, as there is no material on it in the documentation.
> >
> > Regards,
> > Tamil.s
> >
>
