Re: question about drill

Jason Altekruse Wed, 06 Jul 2016 10:48:49 -0700

If you want to combine files from different directories where there are no
patterns in the file or directory names, you could use a UNION ALL to
combine datasets [1].
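
As a rough sketch of what that could look like, the query below combines two
unrelated directories and counts the rows across both. It borrows the
dfs.orders workspace and the dir1 and subdir/dir3 paths from Andries's example
further down, and it assumes headerless csv files, which Drill exposes through
the `columns` array. The aliases c0, combined and total_rows are made up for
illustration.

```sql
-- Sketch only: workspace and paths are taken from the quoted example below.
-- Both branches of a UNION ALL must return the same number of columns
-- with compatible types.
SELECT COUNT(*) AS total_rows
FROM (
  SELECT columns[0] AS c0 FROM dfs.orders.`./dir1/*.csv`
  UNION ALL
  SELECT columns[0] AS c0 FROM dfs.orders.`./subdir/dir3/*.csv`
) AS combined;
```

Each branch can point at any file or directory, so no files need to be moved
and no naming pattern is required.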


[1] - https://drill.apache.org/docs/select-union/

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Jul 6, 2016 at 7:57 AM, Andries Engelbrecht <
[email protected]> wrote:

> You can use wildcards to query directories and subdirectories.
>
> Here is a simple example with a number of csv files in a directory structure:
> The root (/) of the dfs.orders workspace has 10 files
> /dir1 has additional csv data
> /dir2 has a csv file
> /subdir/dir3 has another csv file
>
> Below are a couple of different examples
>
> 0: jdbc:drill:> select count(*) from dfs.orders.`./`;
> +---------+
> | EXPR$0  |
> +---------+
> | 159000  |
> +---------+
> 1 row selected (0.268 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 122000  |
> +---------+
> 1 row selected (0.137 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./dir1/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 9000    |
> +---------+
> 1 row selected (0.099 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./dir2/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 12000   |
> +---------+
> 1 row selected (0.092 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./subdir/dir3/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 16000   |
> +---------+
> 1 row selected (0.1 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 21000   |
> +---------+
> 1 row selected (0.12 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 16000   |
> +---------+
> 1 row selected (0.106 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*`;
> +---------+
> | EXPR$0  |
> +---------+
> | 159000  |
> +---------+
> 1 row selected (0.173 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/*`;
> +---------+
> | EXPR$0  |
> +---------+
> | 37000   |
> +---------+
> 1 row selected (0.123 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/.`;
> +---------+
> | EXPR$0  |
> +---------+
> | 159000  |
> +---------+
> 1 row selected (0.182 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/.`;
> +---------+
> | EXPR$0  |
> +---------+
> | 37000   |
> +---------+
> 1 row selected (0.123 seconds)
>
>
> Also see
> https://drill.apache.org/docs/querying-directories/
>
> https://drill.apache.org/docs/query-directory-functions/
>
>
> --Andries
>
>
>
> > On Jul 5, 2016, at 11:20 PM, 伍晋博 <[email protected]> wrote:
> >
> > Hi, I am a master's student in China and have been learning Drill for a
> > long time. Drill currently provides a way to query the files in a single
> > directory: it first scans all files in that directory, then executes the
> > rest of the query. However, I need to query multiple files in different
> > directories in a single query. I don't want to move all the files into one
> > directory, because that would incur I/O cost. My idea is to add a function
> > to the Drill source code to support this, but I don't understand the source
> > code well enough. Could you give me some advice on this problem? Thank you!
> > Jinbo Wu
>
>
