You can use wildcards to query directories and subdirectories.

Simple example with a number of csv files in a directory structure.

The dfs.orders workspace:
/ has 10 files
/dir1 has other csv files
/dir2 has a csv file
/subdir/dir3 has another csv file

Below are a couple of different examples:

0: jdbc:drill:> select count(*) from dfs.orders.`./`;
+---------+
| EXPR$0  |
+---------+
| 159000  |
+---------+
1 row selected (0.268 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./*.csv`;
+---------+
| EXPR$0  |
+---------+
| 122000  |
+---------+
1 row selected (0.137 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./dir1/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 9000    |
+---------+
1 row selected (0.099 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./dir2/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 12000   |
+---------+
1 row selected (0.092 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./subdir/dir3/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 16000   |
+---------+
1 row selected (0.1 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./*/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 21000   |
+---------+
1 row selected (0.12 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/*.csv`;
+---------+
| EXPR$0  |
+---------+
| 16000   |
+---------+
1 row selected (0.106 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./*`;
+---------+
| EXPR$0  |
+---------+
| 159000  |
+---------+
1 row selected (0.173 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./*/*`;
+---------+
| EXPR$0  |
+---------+
| 37000   |
+---------+
1 row selected (0.123 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./*/.`;
+---------+
| EXPR$0  |
+---------+
| 159000  |
+---------+
1 row selected (0.182 seconds)

0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/.`;
+---------+
| EXPR$0  |
+---------+
| 37000   |
+---------+
1 row selected (0.123 seconds)

Also see
https://drill.apache.org/docs/querying-directories/
https://drill.apache.org/docs/query-directory-functions/

--Andries

> On Jul 5, 2016, at 11:20 PM, Jinbo Wu <[email protected]> wrote:
>
> Hi, I am a master in china and have learned drill for a long time. Now drill
> has provided functions to query file in a same directory. Drill scans all
> files in a same directory firstly, then executes other operations for a
> query. But I have a requirement that require to query multiple files in
> different directories once. I don't want to move all files in a same
> directory that will lead some I/O cost. Now I have a idea is to add function
> in source data to support this function, but I don't have enough ability to
> understand source data. Can you give some advice on this problem, thank you !
>
> Jinbo Wu
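
On the original question of reading files from several different directories in one query without moving them: besides wildcards, two other approaches may be worth trying. The sketch below is untested against this data set and assumes the same dfs.orders layout described in this thread. It relies on Drill's implicit dir0 column (the first directory level below the queried path, covered in the querying-directories page) and on a plain UNION ALL. Note that the column name dir0 is unrelated to the directory names dir1 and dir2 used here, and the aliases src and cnt are only illustrative.

```sql
-- Option 1: query the workspace root and keep only rows whose
-- first-level directory is dir1 or dir2 (dir0 is Drill's implicit column).
select dir0, count(*) as cnt
from dfs.orders.`./`
where dir0 in ('dir1', 'dir2')
group by dir0;

-- Option 2: name each directory explicitly and combine the results.
-- Each branch reads only the files matched by its own path.
select 'dir1' as src, count(*) as cnt from dfs.orders.`./dir1/*.csv`
union all
select 'dir2' as src, count(*) as cnt from dfs.orders.`./dir2/*.csv`;
```

Option 1 is convenient when the directories share a common parent. Option 2 also works when they do not, as long as each path is reachable from a configured workspace.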
