Re: question about drill

Andries Engelbrecht Wed, 06 Jul 2016 07:58:15 -0700

You can use wildcards to query directories and subdirectories.

Here is a simple example with a number of csv files in a directory structure:
The root (/) of the dfs.orders workspace has 10 files
/dir1 has another csv file
/dir2 has a csv file
/subdir/dir3 has another csv file


Below are several examples:

0: jdbc:drill:> select count(*) from dfs.orders.`./`;
+---------+
| EXPR$0  |
+---------+
| 159000  |
+---------+
1 row selected (0.268 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*.csv`;
+---------+
| EXPR$0  |
+---------+
| 122000  |
+---------+
1 row selected (0.137 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./dir1/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 9000    |
+---------+
1 row selected (0.099 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./dir2/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 12000   |
+---------+
1 row selected (0.092 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./subdir/dir3/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 16000   |
+---------+
1 row selected (0.1 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 21000   |
+---------+
1 row selected (0.12 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 16000   |
+---------+
1 row selected (0.106 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*`;
+---------+
| EXPR$0  |
+---------+
| 159000  |
+---------+
1 row selected (0.173 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/*`;
+---------+
| EXPR$0  |
+---------+
| 37000   |
+---------+
1 row selected (0.123 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/.`;
+---------+
| EXPR$0  |
+---------+
| 159000  |
+---------+
1 row selected (0.182 seconds)
0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/.`;
+---------+
| EXPR$0  |
+---------+
| 37000   |
+---------+
1 row selected (0.123 seconds)
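
To make the relationship between the patterns easier to see, here is a small Python sketch that only re-adds the row counts reported above. The variable names are made up for illustration, and the comments describe what the numbers suggest rather than documented Drill behavior: a pattern that matches a directory appears to also read the files beneath it.

```python
# Row counts copied from the query output above.
top_level_csv = 122_000   # ./*.csv
dir1 = 9_000              # ./dir1/*.csv
dir2 = 12_000             # ./dir2/*.csv
subdir_dir3 = 16_000      # ./subdir/dir3/*.csv

# One directory level down: agrees with the ./*/*.csv result.
one_level_down = dir1 + dir2

# Two directory levels down: agrees with the ./*/*/*.csv result.
two_levels_down = subdir_dir3

# Everything below the top level: agrees with the ./*/* result.
below_top = one_level_down + two_levels_down

# The whole workspace: agrees with the ./ and ./* results.
everything = top_level_csv + below_top

print(one_level_down, two_levels_down, below_top, everything)
```

The sums come out to 21000, 16000, 37000 and 159000, the same values the corresponding queries returned.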


Also see
https://drill.apache.org/docs/querying-directories/
https://drill.apache.org/docs/query-directory-functions/


--Andries



> On Jul 5, 2016, at 11:20 PM, 伍晋博 <[email protected]> wrote:
> 
> Hi, I am a master's student in China and have been learning Drill for a long
> time. Drill already provides a way to query the files in a single directory:
> it first scans all files in that directory, then executes the other
> operations of the query. However, I need to query multiple files in
> different directories at once. I don't want to move all the files into one
> directory, because that would incur I/O cost. My idea is to add a function
> to the source code to support this, but I do not understand the source code
> well enough. Can you give some advice on this problem? Thank you!
> Jinbo Wu
