Drill version: 1.5.0.
I found that Drill will optimize an IN predicate with many values(over 20) into
HashJoin, like that:
0: jdbc:drill:drillbit=localhost> explain plan for select campaign_id from
campaign where campaign_id
in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20) limit 10;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(campaign_id=[$0])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[10])
00-04 UnionExchange
01-01 SelectionVectorRemover
01-02 Limit(fetch=[10])
01-03 Project(campaign_id=[$0])
01-04 HashJoin(condition=[=($1, $2)], joinType=[inner])
01-06 Project($f0=[$0], $f66=[$0])
01-07 Scan(groupscan=[IndexRGroupScan
[IndexRScanSpec={tableName='campaign', rsFilter=null},
columns=[`campaign_id`]]])
01-05 BroadcastExchange
02-01 HashAgg(group=[{0}])
02-02 Values
And I’m implementing a new storage plugin which is especially designed to
efficiently handled the long IN condition. So I really need to get the IN
condition. Just like when the IN values count less than 20:
0: jdbc:drill:drillbit=localhost> explain plan for select campaign_id from
campaign where campaign_id in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19)
limit 10;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(campaign_id=[$0])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[10])
00-04 UnionExchange
01-01 SelectionVectorRemover
01-02 Limit(fetch=[10])
01-03 Filter(condition=[OR(=($0, 1), =($0, 2), =($0, 3), =($0,
4), =($0, 5), =($0, 6), =($0, 7), =($0, 8), =($0, 9), =($0, 10), =($0, 11),
=($0, 12), =($0, 13), =($0, 14), =($0, 15), =($0, 16), =($0, 17), =($0, 18),
=($0, 19))])
01-04 Scan(groupscan=[IndexRGroupScan
[IndexRScanSpec={tableName='campaign', rsFilter=In($0: (19)[1,2,3,4,5 ...])},
columns=[`campaign_id`]]])
Any suggest that I can fetch the whole filter condition, especially the IN
condition?