Yes, that's the expected behavior for now.  Directory pruning where only a
lower-level subdirectory is specified is logically equivalent to wildcard
matching on the path ('*/*/10'), which is not supported yet.  You could open
an enhancement request.
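
To illustrate the equivalence, here is a minimal Python sketch. It is not
Drill's implementation; the sample paths and helper names are hypothetical.
It shows that a filter on dir2 alone selects the same directories as the
wildcard pattern '*/*/10', while a filter on all three levels reduces to one
concrete path.

```python
from fnmatch import fnmatchcase

# Hypothetical directories relative to the selection root (not from the thread).
paths = ["2015/01/10", "2015/01/11", "2015/02/10", "2014/12/10"]


def matches_dir_filter(path, filters):
    """filters maps a directory level (0 for dir0, 1 for dir1, ...) to a value."""
    parts = path.split("/")
    return all(
        level < len(parts) and parts[level] == value
        for level, value in filters.items()
    )


def filters_to_pattern(filters, depth):
    """Build a path pattern, using '*' for every level without a filter."""
    return "/".join(filters.get(level, "*") for level in range(depth))


def matches_pattern(path, pattern):
    """Match component by component so '*' never spans a '/'."""
    p_parts, q_parts = path.split("/"), pattern.split("/")
    return len(p_parts) == len(q_parts) and all(
        fnmatchcase(p, q) for p, q in zip(p_parts, q_parts)
    )


full_filter = {0: "2015", 1: "01", 2: "10"}  # dir0, dir1 and dir2 given
leaf_filter = {2: "10"}                      # only dir2 given

full_pattern = filters_to_pattern(full_filter, 3)  # a concrete path
leaf_pattern = filters_to_pattern(leaf_filter, 3)  # needs wildcards

by_filter = [p for p in paths if matches_dir_filter(p, leaf_filter)]
by_pattern = [p for p in paths if matches_pattern(p, leaf_pattern)]
```

With all three levels given, the pattern is a plain path that can be resolved
directly. With only dir2 given, every dir0/dir1 combination has to be
considered, which is the wildcard case.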

On Tue, Feb 3, 2015 at 5:27 PM, Andries Engelbrecht <
[email protected]> wrote:

> Is it required for the directory pruning to work that a top down filter of
> directories be applied?
>
> My current observation is that for a directory structure as listed below,
> the pruning only works if the full tree is provided. If only a lower-level
> directory is supplied in the filter condition, Drill only uses it as a
> filter.
>
> /2015
>          /01
>                 /10
>                 /11
>                 /12
>                 /13
>                 /14
>
> select count(id) from `/foo` t where dir0='2015' and dir1='01' and
> dir2='10'
> Produces the correct pruning and query plan
> 01-02            Project(id=[$3]): rowcount = 3670316.0, cumulative cost =
> {1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id =
> 28434
> 01-03              Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]):
> rowcount = 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu,
> 0.0 io, 0.0 network, 0.0 memory}, id = 28433
> 01-04                Scan(groupscan=[EasyGroupScan [selectionRoot=/foo,
> numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`]
>
>
> However
> select count(id) from `/foo` t where dir2='10'
> Produces a full scan of all subdirectories and only applies the filter
> condition after the fact. Notice the difference in numFiles between the two
> plans (24 vs. 405), even though dir2 is listed in the base scan's columns.
> 01-04                Filter(condition=[=($0, '10')]): rowcount =
> 9423761.7, cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io,
> 0.0 network, 0.0 memory}, id = 27470
> 01-05                  Project(dir2=[$1], id=[$0]): rowcount =
> 6.2825078E7, cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0
> io, 0.0 network, 0.0 memory}, id = 27469
> 01-06                    Scan(groupscan=[EasyGroupScan
> [selectionRoot=/foo, numFiles=405, columns=[`dir2`, `id`]
>
> Any thoughts?
>
> Thanks
>
> —Andries
