Hi Muhammad,

> I have a couple of questions:
>
> 1. If I have multiple *SubScan*s to be executed, will each *SubScan* be
> handled by a single *Scan* operator ? So whenever I have *n* *SubScan*s,
> I'll have *n* Scan operators distributed among Drill's cluster ?

As Rahul explained, subscans are assigned to fragments. Suppose three are assigned to the same fragment. In that case, a single scan operator handles all three. Your “Scan Batch Creator” creates a separate “Record Reader” for each subscan and hands them to the scan operator. The scan operator then opens, reads, and closes each in turn.

> 2. How can I control the amount of any type of physical operators per
> Drill cluster or node ? For instance, what if I want to have less
> *Filter* operators or more *Scan* operators, how can I do that ?

I’ve not seen anything that suggests that this is possible. Drill groups operators into fragments, then parallelizes the fragments. To accomplish what you want, you’d need to figure out how Drill slices the DAG into fragments and adjust the slicing to isolate the operators as you desire. Network exchanges would then join your custom fragments.

Parallelization is generic for all fragments, as Rahul explained; I’ve seen nothing that suggests we have a way to identify different categories of fragments and apply different parallelization rules to each. Maybe there is some Calcite magic available?

- Paul
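
P.S. In case the answer to question 1 is easier to follow as code, here is a rough Java sketch of the pattern: one reader per subscan, and a single scan loop that opens, reads, and closes each reader in turn. None of this is Drill's real API. `ScanSketch`, `SubScan`, `RecordReader`, `createReaders`, and `runScan` are simplified stand-ins invented for illustration, and the "batches" are just counters.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: these are NOT Drill's actual classes or signatures.
public class ScanSketch {

    /** Hypothetical unit of scan work assigned to a fragment. */
    static final class SubScan {
        final String name;
        final int batches; // how many batches this pretend source yields

        SubScan(String name, int batches) {
            this.name = name;
            this.batches = batches;
        }
    }

    /** Hypothetical reader: exactly one is created per subscan. */
    static final class RecordReader {
        private final SubScan subScan;
        private final List<String> log;
        private int remaining;

        RecordReader(SubScan subScan, List<String> log) {
            this.subScan = subScan;
            this.log = log;
        }

        void open() {
            remaining = subScan.batches;
            log.add("open " + subScan.name);
        }

        /** Reads one batch; returns false once the source is exhausted. */
        boolean next() {
            if (remaining == 0) {
                return false;
            }
            remaining--;
            log.add("read " + subScan.name);
            return true;
        }

        void close() {
            log.add("close " + subScan.name);
        }
    }

    /** Plays the role of the "Scan Batch Creator": one reader per subscan. */
    static List<RecordReader> createReaders(List<SubScan> subScans, List<String> log) {
        List<RecordReader> readers = new ArrayList<>();
        for (SubScan s : subScans) {
            readers.add(new RecordReader(s, log));
        }
        return readers;
    }

    /** Plays the role of the single scan operator in one fragment. */
    static void runScan(List<RecordReader> readers) {
        for (RecordReader reader : readers) {
            reader.open();
            try {
                while (reader.next()) {
                    // A real operator would pass each batch downstream here.
                }
            } finally {
                reader.close(); // always release the reader before the next one
            }
        }
    }

    /** Runs all subscans of one fragment and returns the event log. */
    static List<String> run(List<SubScan> subScans) {
        List<String> log = new ArrayList<>();
        runScan(createReaders(subScans, log));
        return log;
    }

    public static void main(String[] args) {
        List<SubScan> fragmentWork = List.of(
                new SubScan("a", 2), new SubScan("b", 1), new SubScan("c", 0));
        run(fragmentWork).forEach(System.out::println);
    }
}
```

The point of the sketch is that the number of scan operators follows the number of fragments, not the number of subscans: three subscans in one fragment still mean one scan loop, just with three readers processed sequentially.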
