Re: [HACKERS] Parallel Append implementation

```>> 2. Next, estimate the cost of the non-partial paths.  To do this, make
>> an array of Cost of that length and initialize all the elements to
>> zero, then add the total cost of each non-partial plan in turn to the
>> element of the array with the smallest cost, and then take the maximum
>> of the array elements as the total cost of the non-partial plans.  Add
>> this to the result from step 1 to get the total cost.
>
> So with costs (8, 5, 2), add 8 and 5 to 2 so that it becomes (8, 5,
> 15) , and so the max is 15 ? I surely am misinterpreting this.
>
> Actually, I couldn't come up with a general formula to find the
> non-partial paths total cost, given the per-subplan cost and number of
> workers. I mean, we can manually find out the total cost, but turning
> it into a formula seems quite involved. We can even do a dry-run of
> workers consuming each of the subplan slots and find the total time
> time units taken, but finding some approximation seemed ok.
>
> For e.g. we can manually find total time units taken for following :
> costs (8, 2, 2, 2) with 2 workers : 8
> costs (6, 6, 4, 1) with 2 workers : 10.
> costs (6, 6, 4, 1) with 3 workers : 6.
>
> But coming up with an alogrithm or a formula didn't look worth. So I
> just did the total cost and divided it by workers. And besides that,
> took the maximum of the 1st plan cost (since it is the highest) and
> the average of total. I understand it would be too much approximation
> for some cases, but another thing is, we don't know how to take into
> account some of the workers shifting to partial workers. So the shift
> may be quite fuzzy since all workers may not shift to partial plans
> together.```
```

For non-partial paths, I did some comparison between the actual cost
and the cost taken by adding the per-subpath figures and dividing by
number of workers. And in the below cases, they do not differ
significantly. Here are the figures :

Case 1 :
Cost units of subpaths : 20 16 10 8 3 1.
Workers : 3
Actual total time to finish all workers : 20.
total/workers: 16.

Case 2 :
Cost units of subpaths : 20 16 10 8 3 1.
Workers : 2
Actual total time to finish all workers : 34.
total/workers: 32.

Case 3 :
Cost units of subpaths : 5 3 3 3 3
Workers : 3
Actual total time to finish all workers : 6
total/workers: 5.6

One more thing observed, is , in all of the above cases, all the
workers more or less finish at about the same time.

So this method seem to compare good which actual cost. The average
comes out a little less than the actual. But I think in the patch,
what I need to correct is, calculate separate per-worker costs of
non-partial and partial costs, and add them. This will give us
per-worker total cost, which is what a partial Append cost will be. I

There can be some extreme cases such as (5, 1, 1, 1, 1, 1) with 6
workers, where it will take at least 5 units, but average is 2. For
that we can clamp up the cost to the first path cost, so that for e.g.
it does not go lesser than 5 in this case.

Actually I have deviced one algorithm to calculate the exact time when
all workers finish non-partial costs. But I think it does not make
sense to apply it because it may be too much of calculation cost for
hundreds of paths.

But anyways, for archival purpose, here is the algorithm :

Per-subpath cost : 20 16 10 8 3 1, with 3 workers.
After 10 units (this is minimum of 20, 16, 10), the times remaining are :
10  6  0 8 3 1
After 6 units (minimum of 10, 06, 08), the times remaining are :
4  0  0 2 3 1
After 2 units (minimum of 4, 2, 3), the times remaining are :
2  0  0 0 1 1
After 1 units (minimum of 2, 1, 1), the times remaining are :
1  0  0 0 0 0
After 1 units (minimum of 1, 0 , 0), the times remaining are :
0  0  0 0 0 0
Now add up above time chunks : 10 + 6 + 2 + 1 + 1 = 20

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
```