Re: Pushing down Joins, Aggregates and filters, and data distribution questions

Paul Rogers Thu, 01 Jun 2017 22:11:12 -0700

Hi Muhammad,

> I have a couple of questions:
> 
>   1. If I have multiple *SubScan*s to be executed, will each *SubScan* be
>   handled by a single *Scan* operator ? So whenever I have *n* *SubScan*s,
>   I'll have *n* Scan operators distributed among Drill's cluster ?


As Rahul explained, subscans are assigned to fragments. Let’s say that three 
were assigned to the same fragment. In this case, a single scan operator 
handles all three. Your “Scan Batch Creator” will create a separate “Record 
Reader” for each subscan and hand them to the scan operator. The scan operator 
then opens, reads, and closes each in turn.
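
To make that flow concrete, here is a rough, self-contained sketch. It is not 
Drill's actual API: the Reader interface, ToyReader, and the run/scan methods 
are hypothetical stand-ins for the record reader, the batch creator, and the 
scan operator, and each subscan is reduced to a plain string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScanSketch {
    /** Hypothetical stand-in for a record reader; NOT Drill's real interface. */
    interface Reader {
        void open();
        boolean next();   // returns true while another batch was read
        void close();
    }

    /** Toy reader that yields a fixed number of batches and logs every call. */
    static class ToyReader implements Reader {
        private final String name;
        private int remaining;
        private final List<String> log;

        ToyReader(String name, int batches, List<String> log) {
            this.name = name;
            this.remaining = batches;
            this.log = log;
        }

        public void open() { log.add("open " + name); }

        public boolean next() {
            if (remaining == 0) {
                return false;
            }
            remaining--;
            log.add("read " + name);
            return true;
        }

        public void close() { log.add("close " + name); }
    }

    /** The single scan operator of a fragment: drains each reader in turn. */
    static void scan(List<Reader> readers) {
        for (Reader reader : readers) {
            reader.open();
            try {
                while (reader.next()) {
                    // a real operator would pass the batch downstream here
                }
            } finally {
                reader.close();
            }
        }
    }

    /** "Batch creator" step: one reader per subscan assigned to the fragment. */
    static List<String> run(List<String> subScans) {
        List<String> log = new ArrayList<>();
        List<Reader> readers = new ArrayList<>();
        for (String subScan : subScans) {
            readers.add(new ToyReader(subScan, 1, log));
        }
        scan(readers);
        return log;
    }

    public static void main(String[] args) {
        // Three subscans in one fragment, handled by one scan operator.
        System.out.println(run(Arrays.asList("a", "b", "c")));
    }
}
```

The point of the sketch is only the shape: n subscans in one fragment become n 
readers, but still one scan operator, which works through them sequentially.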

>   2. How can I control the amount of any type of physical operators per
>   Drill cluster or node ? For instance, what if I want to have less
>   *Filter* operators or more *Scan* operators, how can I do that ?
> 
I’ve not seen anything that suggests that this is possible. Drill groups 
operators into fragments, then parallelizes the fragments. To accomplish what 
you want, you’d need to figure out how Drill slices the DAG into fragments and 
adjust the slicing to isolate the operators as you desire. Network exchanges 
would then connect your custom fragments.

Parallelization is generic for all fragments as Rahul explained; I’ve seen 
nothing that suggests we have a way to identify different categories of 
fragments and apply different parallelization rules to each.
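
As a toy illustration of why the operator count follows the fragment rather 
than the operator, here is a small sketch. It assumes a simplified linear 
operator chain (real plans are trees), uses the string "Exchange" as a made-up 
marker for a fragment boundary, and takes the per-fragment widths as given. 
None of these names come from Drill's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FragmentSketch {
    /** Splits a linear operator chain into fragments at each "Exchange". */
    static List<List<String>> slice(List<String> operators) {
        List<List<String>> fragments = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String op : operators) {
            if (op.equals("Exchange")) {
                fragments.add(current);        // the exchange closes a fragment
                current = new ArrayList<>();
            } else {
                current.add(op);
            }
        }
        fragments.add(current);
        return fragments;
    }

    /**
     * Counts running instances of one operator type. Every operator inherits
     * the width of its fragment; there is no per-operator setting.
     */
    static int instancesOf(String op, List<List<String>> fragments, int[] widths) {
        int total = 0;
        for (int i = 0; i < fragments.size(); i++) {
            for (String candidate : fragments.get(i)) {
                if (candidate.equals(op)) {
                    total += widths[i];
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> fragments =
            slice(Arrays.asList("Scan", "Filter", "Exchange", "Aggregate"));
        int[] widths = {4, 1};  // hypothetical widths, one per fragment
        System.out.println(fragments);
        System.out.println("Scan instances:   " + instancesOf("Scan", fragments, widths));
        System.out.println("Filter instances: " + instancesOf("Filter", fragments, widths));
    }
}
```

In this model Scan and Filter share a fragment, so they always run with the 
same width. Getting fewer Filters than Scans would require an exchange between 
them, which is the re-slicing described above.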

Maybe there is some Calcite magic available?

- Paul
