Is it required for the directory pruning to work that a top down filter of 
directories be applied?

My current observation is that for a directory structure as listed below, the 
pruning only works if the full tree is provided. If only a lower level 
directory is supplied in the filter condition Drill only uses it as a filter.

/2015
         /01
                /10
                /11
                /12
                /13
                /14

select count(id) from `/foo` t where dir0='2015' and dir1='01' and dir2='10'
Produces the correct pruning and query plan
01-02            Project(id=[$3]): rowcount = 3670316.0, cumulative cost = 
{1.1010948E7 rows, 1.4681284E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 28434
01-03              Project(dir0=[$0], dir1=[$3], dir2=[$2], id=[$1]): rowcount 
= 3670316.0, cumulative cost = {7340632.0 rows, 1.468128E7 cpu, 0.0 io, 0.0 
network, 0.0 memory}, id = 28433
01-04                Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, 
numFiles=24, columns=[`dir0`, `dir1`, `dir2`, `id`]


However
select count(id) from `/foo` t where dir2='10'
Produces full scan of all sub directories and only applies a filter condition 
after the fact. Notice the numFiles between the 2, even though it lists columns 
in the base scan
01-04                Filter(condition=[=($0, '10')]): rowcount = 9423761.7, 
cumulative cost = {1.88475234E8 rows, 3.76950476E8 cpu, 0.0 io, 0.0 network, 
0.0 memory}, id = 27470
01-05                  Project(dir2=[$1], id=[$0]): rowcount = 6.2825078E7, 
cumulative cost = {1.25650156E8 rows, 1.25650164E8 cpu, 0.0 io, 0.0 network, 
0.0 memory}, id = 27469
01-06                    Scan(groupscan=[EasyGroupScan [selectionRoot=/foo, 
numFiles=405, columns=[`dir2`, `id`]

Any thoughts?

Thanks

—Andries









Reply via email to