On Mon, Oct 19, 2020 at 8:19 PM bu...@sohu.com <bu...@sohu.com> wrote:
>
> Hi hackers,
> I wrote a patch to support parallel distinct, union and aggregate using
> batch sort.
> steps:
>  1. generate a hash value for the group clause values, and use the hash
> value modulo the batch count to pick the batch to write to
>  2. at the end of the outer plan, wait for all other workers to finish
> writing to the batches
>  3. each worker gets a unique batch number and calls tuplesort_performsort()
> to finish sorting that batch
>  4. return the rows for this batch
>  5. if not all batches are done, go to step 3
>
> The BatchSort plan makes sure tuples with the same group clause values are
> returned in the same range, so a Unique (or GroupAggregate) plan can work.
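The batching scheme in the quoted steps can be sketched roughly as follows. This is a minimal single-process Python illustration, not the patch itself: the list of lists stands in for the shared tuple store, the `sort` call stands in for tuplesort_performsort(), and integer grouping keys are used so the hash routing is deterministic.

```python
# Sketch of the BatchSort idea: route rows into batches by hash of the
# grouping key, then sort each batch independently.  Equal grouping keys
# always hash to the same batch, so a batch-local sort is enough for a
# Unique / GroupAggregate node above to see each group contiguously.

NUM_BATCHES = 4

def batch_sort(rows, key):
    # Steps 1-2: every worker hashes each row's grouping value and
    # appends the row to the matching batch (shared tuple store stand-in).
    batches = [[] for _ in range(NUM_BATCHES)]
    for row in rows:
        batches[hash(key(row)) % NUM_BATCHES].append(row)
    # Steps 3-5: each worker claims a unique batch number, finishes the
    # sort for that batch, and returns its rows, until no batches remain.
    for b in batches:
        b.sort(key=key)
        yield from b

rows = [(2, "b"), (1, "a"), (2, "c"), (1, "d")]
for row in batch_sort(rows, key=lambda r: r[0]):
    print(row)
```

Note that the output is not globally sorted; it only guarantees that rows with equal grouping keys come out adjacent, which is all the upper node needs.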

Interesting idea.  So IIUC, whenever a worker scans a tuple it will put
it directly into the respective batch (shared tuple store) based on the
hash of the grouping column, and once all the workers are done preparing
the batches, each worker will pick those batches one by one, perform the
sort, and finish the aggregation.  I think there is scope for
improvement: instead of putting the tuple directly into the batch, the
worker could perform partial aggregation first and then place the
partially aggregated rows into the shared tuple store based on the hash
value, after which the workers can pick up the batches one by one.  By
doing it this way, we can avoid doing large sorts.  The same approach
could also be used with hash aggregate, i.e. the partially aggregated
data produced by a hash aggregate could be put into the respective
batch.
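The suggested refinement could look roughly like the following hypothetical sketch (again single-process Python, not PostgreSQL code): each worker partially aggregates its own rows first (here a per-worker dict acting as a hash aggregate computing counts), writes only the partial results into hash-based batches, and the final per-batch pass combines the much smaller partial rows instead of sorting all input tuples.

```python
# Hypothetical sketch of the refinement: partial aggregation per worker
# BEFORE routing to batches, so each batch holds a few partial rows per
# group instead of every raw tuple.

NUM_BATCHES = 4

def partial_then_batch(worker_inputs):
    batches = [[] for _ in range(NUM_BATCHES)]
    # Phase 1: each worker partially aggregates its own rows
    # (count per group key), then routes each partial row to a batch
    # by hash of the key.
    for rows in worker_inputs:
        partial = {}
        for key in rows:
            partial[key] = partial.get(key, 0) + 1
        for key, cnt in partial.items():
            batches[hash(key) % NUM_BATCHES].append((key, cnt))
    # Phase 2: each batch is combined independently (in the real plan,
    # one worker per batch); only partial counts are merged here.
    result = {}
    for b in batches:
        for key, cnt in b:
            result[key] = result.get(key, 0) + cnt
    return result

# Two "workers", each with its own slice of the input.
counts = partial_then_batch([[1, 2, 1], [2, 2, 3]])
print(counts)  # {1: 2, 2: 3, 3: 1}
```

With many duplicate keys per worker, phase 2 touches far fewer rows than the raw input, which is where the large sorts are avoided.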

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

