Even for CSV or JSON formats, directory-based partition pruning [1] can be
leveraged to prune data. You have to use the special dir* fields (dir0,
dir1, ...) in your query to filter out unwanted data, or define a view that
uses the dir* fields and then query against the view.

1. https://drill.apache.org/docs/partition-pruning/
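As a minimal sketch: assuming log files live under a hypothetical
/logs/<year>/<month>/ directory tree, dir0 and dir1 map to the first and
second subdirectory levels, and filtering on them lets Drill skip whole
directories instead of scanning every file. The path, workspace names
(dfs, dfs.tmp), and layout below are illustrative assumptions, not part of
the original thread:

```sql
-- Hypothetical layout: /logs/2015/07/*.csv, /logs/2015/08/*.csv, ...
-- dir0 = first subdirectory level (year), dir1 = second level (month).
SELECT *
FROM dfs.`/logs`
WHERE dir0 = '2015' AND dir1 = '07';

-- Alternatively, hide the dir* fields behind a view and query the view:
CREATE VIEW dfs.tmp.logs_2015 AS
SELECT * FROM dfs.`/logs` WHERE dir0 = '2015';

SELECT * FROM dfs.tmp.logs_2015 WHERE dir1 = '07';
```

Adjust the storage plugin, workspace, and path to match your own
configuration.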



On Thu, Jul 23, 2015 at 8:09 AM, Abdel Hakim Deneche <[email protected]>
wrote:

> Hi Hafiz,
>
> I guess it depends on the query. Generally, Drill will try to push any
> filter in your query down to the leaf nodes, so they won't send any row
> that doesn't pass the filter. Also, only the columns that appear in the
> query will be loaded from the file.
>
> The file format you are querying also affects how much data is read from
> disk: with Parquet, Drill can avoid reading unnecessary columns, but for
> other formats (CSV or JSON) Drill still needs to read everything from disk
> and then discard the unneeded columns before sending the remaining data on
> for further processing.
>
> Adding a limit to the query can also help: Drill will stop reading the
> data as soon as enough records have been collected.
>
> On Thu, Jul 23, 2015 at 8:01 AM, Hafiz Mujadid <[email protected]>
> wrote:
>
> > Hi all!
> >
> > I want to know how Drill works. Suppose I query data on S3, and the
> > volume of data is huge (GBs). What happens when I query that data?
> > Does Drill load the whole data set onto the Drill nodes, or does it
> > query the data without loading all of it?
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
