If you want to combine files from different directories whose file or directory names share no pattern that a wildcard could match, you can use UNION ALL to combine the datasets [1].
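Here is a minimal sketch of that approach, in the same Drill SQL used elsewhere in this thread. It assumes the dfs.orders workspace from the example quoted below. The file names `a.csv` and `b.csv` and the column aliases `order_id` and `amount` are hypothetical placeholders. Without header extraction, Drill reads each CSV line into the `columns` array, and both branches of a UNION ALL must return the same number of columns with compatible types.

```sql
-- Sketch: combine two files that live in unrelated directories.
-- a.csv, b.csv, order_id and amount are hypothetical placeholders.
SELECT columns[0] AS order_id, columns[1] AS amount
FROM dfs.orders.`./dir1/a.csv`
UNION ALL
SELECT columns[0] AS order_id, columns[1] AS amount
FROM dfs.orders.`./subdir/dir3/b.csv`;
```

More branches can be chained with additional UNION ALL clauses, and the whole query can be wrapped in a subquery to aggregate over the combined rows.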
[1] - https://drill.apache.org/docs/select-union/

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Wed, Jul 6, 2016 at 7:57 AM, Andries Engelbrecht wrote:

> You can use wildcards to query directories and subdirectories.
>
> Here is a simple example with a number of csv files in a directory structure:
> The root (/) of the dfs.orders workspace has 10 files
> /dir1 has other csv files
> /dir2 has a csv file
> /subdir/dir3 has another csv file
>
> Below are a few different examples:
>
> 0: jdbc:drill:> select count(*) from dfs.orders.`./`;
> +---------+
> | EXPR$0  |
> +---------+
> | 159000  |
> +---------+
> 1 row selected (0.268 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 122000  |
> +---------+
> 1 row selected (0.137 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./dir1/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 9000    |
> +---------+
> 1 row selected (0.099 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./dir2/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 12000   |
> +---------+
> 1 row selected (0.092 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./subdir/dir3/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 16000   |
> +---------+
> 1 row selected (0.1 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 21000   |
> +---------+
> 1 row selected (0.12 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/*.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> | 16000   |
> +---------+
> 1 row selected (0.106 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*`;
> +---------+
> | EXPR$0  |
> +---------+
> | 159000  |
> +---------+
> 1 row selected (0.173 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/*`;
> +---------+
> | EXPR$0  |
> +---------+
> | 37000   |
> +---------+
> 1 row selected (0.123 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/.`;
> +---------+
> | EXPR$0  |
> +---------+
> | 159000  |
> +---------+
> 1 row selected (0.182 seconds)
> 0: jdbc:drill:> select count(*) from dfs.orders.`./*/*/.`;
> +---------+
> | EXPR$0  |
> +---------+
> | 37000   |
> +---------+
> 1 row selected (0.123 seconds)
>
> Also see
> https://drill.apache.org/docs/querying-directories/
> https://drill.apache.org/docs/query-directory-functions/
>
> --Andries
>
> On Jul 5, 2016, at 11:20 PM, Jinbo Wu wrote:
>
> > Hi, I am a master's student in China and have been learning Drill for a
> > long time. Drill currently provides functions to query files in the same
> > directory: it first scans all files in that directory and then executes
> > the other operations of the query. However, I need to query multiple
> > files in different directories at once, and I don't want to move all the
> > files into one directory because that would add I/O cost. My idea is to
> > add this functionality to the Drill source code, but I don't understand
> > the source code well enough yet. Can you give me some advice on this
> > problem? Thank you!
> >
> > Jinbo Wu
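When the directories share a common parent workspace, another option that avoids moving files is to query the parent once and keep only the subdirectories you need by filtering on Drill's implicit `dir0` column, described on the querying-directories page linked above. A sketch, assuming the directory layout from Andries' example (`dir1` and `dir2` directly under the dfs.orders root):

```sql
-- Sketch: read two sibling subdirectories in a single query
-- by filtering on the implicit dir0 column (first-level directory name).
SELECT dir0, COUNT(*) AS row_count
FROM dfs.orders.`./`
WHERE dir0 IN ('dir1', 'dir2')
GROUP BY dir0;
```

Files that sit directly in the workspace root have a null `dir0`, so this filter should leave them out. Deeper levels are exposed as `dir1`, `dir2`, and so on, and Drill can use such directory filters to skip reading directories the query does not need.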
