On 23 March 2017 at 16:26, Amit Khandekar <amitdkhan...@gmail.com> wrote: > On 23 March 2017 at 05:55, Robert Haas <robertmh...@gmail.com> wrote: >> On Wed, Mar 22, 2017 at 4:49 AM, Amit Khandekar <amitdkhan...@gmail.com> wrote: >>> Attached is the updated patch that handles the changes for all the >>> comments except the cost changes part. Details about the specific >>> changes are after the cost-related points discussed below. >>> >>> For non-partial paths, I was checking following 3 options : >>> >>> Option 1. Just take the sum of total non-partial child costs and >>> divide it by number of workers. It seems to be getting close to the >>> actual cost. >> >> If the costs for all children are about equal, then that works fine. >> But when they are very unequal, then it's highly misleading. >> >>> Option 2. Calculate exact cost by an algorithm which I mentioned >>> before, which is pasted below for reference : >>> PerĀsubpath cost : 20 16 10 8 3 1, with 3 workers. >>> After 10 time units (this is minimum of first 3 i.e. 20, 16, 10), the >>> times remaining are : >>> 10 6 0 8 3 1 >>> After 6 units (minimum of 10, 06, 08), the times remaining are : >>> 4 0 0 2 3 1 >>> After 2 units (minimum of 4, 2, 3), the times remaining are : >>> 2 0 0 0 1 1 >>> After 1 units (minimum of 2, 1, 1), the times remaining are : >>> 1 0 0 0 0 0 >>> After 1 units (minimum of 1, 0 , 0), the times remaining are : >>> 0 0 0 0 0 0 >>> Now add up above time chunks : 10 + 6 + 2 + 1 + 1 = 20 >> > >> This gives the same answer as what I was proposing > > Ah I see. > >> but I believe it's more complicated to compute. > Yes a bit, particularly because in my algorithm, I would have to do > 'n' subtractions each time, in case of 'n' workers. But it looked more > natural because it follows exactly the way we manually calculate. > >> The way my proposal would work in this >> case is that we would start with an array C[3] (since there are three >> workers], with all entries 0. Logically C[i] represents the amount of >> work to be performed by worker i. We add each path in turn to the >> worker whose array entry is currently smallest; in the case of a tie, >> just pick the first such entry. >> >> So in your example we do this: >> >> C[0] += 20; >> C[1] += 16; >> C[2] += 10; >> /* C[2] is smaller than C[0] or C[1] at this point, so we add the next >> path to C[2] */ >> C[2] += 8; >> /* after the previous line, C[1] is now the smallest, so add to that >> entry next */ >> C[1] += 3; >> /* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */ >> C[2] += 1; >> /* final result: C[0] = 20, C[1] = 19, C[2] = 19 */ >> >> Now we just take the highest entry that appears in any array, which in >> this case is C[0], as the total cost. > > Wow. The way your final result exactly tallies with my algorithm > result is very interesting. This looks like some maths or computer > science theory that I am not aware. > > I am currently coding the algorithm using your method.

## Advertising

While I was coding this, I was considering if Path->rows also should be calculated similar to total cost for non-partial subpath and total cost for partial subpaths. I think for rows, we can just take total_rows divided by workers for non-partial paths, and this approximation should suffice. It looks odd that it be treated with the same algorithm we chose for total cost for non-partial paths. Meanwhile, attached is a WIP patch v10. The only change in this patch w.r.t. the last patch (v9) is that this one has a new function defined append_nonpartial_cost(). Just sending this to show how the algorithm looks like; haven't yet called it.

**
ParallelAppend_v10_WIP.patch**

*Description:* Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers