Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
Sam Mason s...@samason.me.uk wrote: What do people do when testing this? I think I'd look to something like Student's t-test to check for statistical significance. My working would go something like: I assume the variance is the same because it's being tested on the same machine.

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 10:19:20AM -0500, Kevin Grittner wrote: Sam Mason s...@samason.me.uk wrote: What do people do when testing this? I think I'd look to something like Student's t-test to check for statistical significance. My working would go something like: I assume the

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
Sam Mason s...@samason.me.uk wrote: Yes, all that sounds as though you've got it. Thanks. I read through it carefully a few times, but I was still only 80% confident that I had it more-or-less right. ;-) That does seem like a good test, with the advantage of being relatively easy to

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 10:39:19AM -0500, Kevin Grittner wrote: Sam Mason s...@samason.me.uk wrote: Yes, all that sounds as though you've got it. Thanks. I read through it carefully a few times, but I was still only 80% confident that I had it more-or-less right. ;-) And which method

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
Sam Mason s...@samason.me.uk wrote: All we're saying is that we're less than 90% confident that there's something significant going on. All the fiddling with standard deviations and sample sizes is just the easiest way (that I know of) that statistics currently gives us of determining this

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Robert Haas
On Fri, Aug 7, 2009 at 3:08 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Sam Mason s...@samason.me.uk wrote: All we're saying is that we're less than 90% confident that there's something significant going on. All the fiddling with standard deviations and sample sizes is just the easiest

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 02:08:21PM -0500, Kevin Grittner wrote: With the 20 samples from that last round of tests, the answer (rounded to the nearest percent) is 60%, so probably noise is a good summary. Combined with the 12 samples from earlier comparable runs with the prior version of the

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Sam Mason
On Fri, Aug 07, 2009 at 03:18:54PM -0400, Robert Haas wrote: On Fri, Aug 7, 2009 at 3:08 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: With the 20 samples from that last round of tests, the answer (rounded to the nearest percent) is 60%, so probably noise is a good summary. So

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: So should we give up on this patch? No, this is not news; just confirmation of the earlier gut feelings and less convincing statistics that there is no problem. Tom's argument that if there's no slowdown for common cases, preventing O(N^2) behavior

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Robert Haas
On Fri, Aug 7, 2009 at 3:36 PM, Sam Mason s...@samason.me.uk wrote: On Fri, Aug 07, 2009 at 03:18:54PM -0400, Robert Haas wrote: On Fri, Aug 7, 2009 at 3:08 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: With the 20 samples from that last round of tests, the answer (rounded to the

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Kevin Grittner
I wrote: I remember someone else on the thread saying [...] it provided better structure for future enhancements. Found the reference: http://archives.postgresql.org/pgsql-hackers/2009-08/msg00078.php This was the email which I thought confirmed that the changes were worth it, even in

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-07 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Robert Haas robertmh...@gmail.com wrote: So should we give up on this patch? No, this is not news; just confirmation of the earlier gut feelings and less convincing statistics that there is no problem. Tom's argument that if there's no

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-04 Thread Sam Mason
On Mon, Aug 03, 2009 at 10:03:47AM -0500, Kevin Grittner wrote: That's about 0.52% slower with the patch. Because there was over 10% variation in the numbers with the patch, I tried leaving out the four highest outliers on both, in case it was the result of some other activity on the system

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-04 Thread Tom Lane
Sam Mason s...@samason.me.uk writes: t = 0.54 ((avg1 - avg2) / (stddev * sqrt(2/samples))) We then have to choose how certain we want to be that they're actually different, 90% is a reasonably easy level to hit (i.e. one part in ten, with 95% being more commonly quoted). For 20
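For anyone wanting to redo that arithmetic on their own timings, here is a minimal sketch of the quoted formula, t = (avg1 - avg2) / (stddev * sqrt(2/samples)). It assumes two equal-sized sets of run times and reads "stddev" as the pooled sample standard deviation; that reading, the function name, and the numbers in the usage lines are illustrative assumptions, not anything posted in the thread.

```python
import math
import statistics


def pooled_t_statistic(times_a, times_b):
    """Two-sample t statistic for equal-sized samples with a shared stddev.

    Implements t = (avg1 - avg2) / (stddev * sqrt(2 / samples)), where
    'samples' is the number of runs in EACH group (an assumption about
    how the quoted formula is meant to be read).
    """
    if len(times_a) != len(times_b):
        raise ValueError("this form of the formula needs equal sample sizes")
    samples = len(times_a)
    if samples < 2:
        raise ValueError("need at least two runs per group")
    # With equal group sizes the pooled variance is just the mean of the
    # two sample variances.
    pooled_sd = math.sqrt(
        (statistics.variance(times_a) + statistics.variance(times_b)) / 2
    )
    if pooled_sd == 0:
        raise ValueError("no variation in the timings; t is undefined")
    diff = statistics.mean(times_a) - statistics.mean(times_b)
    return diff / (pooled_sd * math.sqrt(2 / samples))


# Made-up timings in seconds, only to show the call.
unpatched = [10.0, 12.0, 14.0]
patched = [9.0, 11.0, 13.0]
t = pooled_t_statistic(unpatched, patched)
```

The resulting t still has to be compared against the critical value for the chosen confidence level and 2 * samples - 2 degrees of freedom, taken from a t table or a statistics package; that lookup is deliberately left out of the sketch.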

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-04 Thread Sam Mason
On Tue, Aug 04, 2009 at 10:45:52AM -0400, Tom Lane wrote: Sam Mason s...@samason.me.uk writes: t = 0.54 ((avg1 - avg2) / (stddev * sqrt(2/samples))) We then have to choose how certain we want to be that they're actually different, 90% is a reasonably easy level to hit (i.e. one

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread daveg
On Mon, Aug 03, 2009 at 11:21:43AM -0400, Tom Lane wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Over the weekend I ran 40 restores of Milwaukee County's production data using Friday's snapshot with and without the patch. I alternated between patched and unpatched. It appears

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread Kevin Grittner
I wrote: Tom Lane t...@sss.pgh.pa.us wrote: Attached is a further small improvement that gets rid of the find_ready_items() scans. After re-reading the patch I realized that it wasn't *really* avoiding O(N^2) behavior ... but this version does. I'll run a fresh set of benchmarks.

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread Andrew Dunstan
That's about 0.52% slower with the patch. Because there was over 10% variation in the numbers with the patch, I tried leaving out the four highest outliers on both, in case it was the result of some other activity on the system (even though this machine should have been pretty quiet over the
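The "percent slower, with and without the highest outliers" comparison described above can be sketched as follows. This is only an illustration under assumptions: the function name and parameter are hypothetical, the timings in the usage lines are invented, and dropping the same number of highest values from each set is one plausible reading of "leaving out the four highest outliers on both".

```python
import statistics


def percent_slower(unpatched, patched, drop_highest=0):
    """Percentage by which the patched average exceeds the unpatched one.

    drop_highest removes that many of the largest timings from EACH set
    before averaging, to discount runs disturbed by other activity.
    A negative result means the patched runs were faster on average.
    """
    def trimmed(times):
        if not 0 <= drop_highest < len(times):
            raise ValueError("drop_highest must leave at least one timing")
        kept = sorted(times)
        # Slice explicitly: kept[:-0] would wrongly return an empty list.
        return kept[: len(kept) - drop_highest]

    base_avg = statistics.mean(trimmed(unpatched))
    patched_avg = statistics.mean(trimmed(patched))
    return (patched_avg - base_avg) / base_avg * 100.0


# Invented timings in seconds; the last run of each set is an outlier.
unpatched_runs = [100.0, 100.0, 100.0, 200.0]
patched_runs = [101.0, 101.0, 101.0, 300.0]
raw = percent_slower(unpatched_runs, patched_runs)
without_outliers = percent_slower(unpatched_runs, patched_runs, drop_highest=1)
```

Reporting both figures side by side, as was done in the thread, shows how much of the apparent difference comes from a handful of slow runs.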

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Over the weekend I ran 40 restores of Milwaukee County's production data using Friday's snapshot with and without the patch. I alternated between patched and unpatched. It appears that this latest version is slightly slower for our

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-08-03 Thread Josh Berkus
IIRC daveg was volunteering to do some tests with his own data; maybe we should wait for those results. Unfortunately, I've lost access to the client's data which was showing bad behaviour under the first heuristic. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-31 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Rebased to correct for pg_indent changes. Thanks for doing that. Attached is a further small improvement that gets rid of the find_ready_items() scans. After re-reading the patch I realized that it wasn't *really* avoiding O(N^2) behavior ...

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-31 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Rebased to correct for pg_indent changes. Thanks for doing that. No problem. I think I still owe you a few. :-) Attached is a further small improvement that gets rid of the find_ready_items() scans.

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-31 Thread daveg
On Thu, Jul 30, 2009 at 12:29:34PM -0500, Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: I think we've pretty much established that it doesn't make things *worse*, so I'm sort of inclined to go ahead and apply it. The theoretical advantage of eliminating O(N^2) search

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-31 Thread Tom Lane
daveg da...@sonic.net writes: Will the patch apply to a vanilla 8.4.0? Yeah, it should. The line numbers in the version I just posted might be off a little bit for 8.4.0, but patch should cope. Be sure to make clean and recompile all of src/bin/pg_dump, else you might have some issues.

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-30 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote: with the default settings, the patched version ran an additional 1% faster than the unpatched; although I don't have enough samples to have a high degree of confidence it wasn't noise. I'll run another slew of tests tonight with the

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-30 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: The timings vary by up to 2.5% between runs, so that's the noise level. Five runs of each (alternating between the two) last night give an average performance of 1.89% faster for the patched version. Combining that with yesterday's results

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-30 Thread Robert Haas
On Thu, Jul 30, 2009 at 1:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: The timings vary by up to 2.5% between runs, so that's the noise level. Five runs of each (alternating between the two) last night give an average performance of 1.89% faster

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-30 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: I think we've pretty much established that it doesn't make things *worse*, so I'm sort of inclined to go ahead and apply it. The theoretical advantage of eliminating O(N^2) search behavior seems like reason enough, even if it takes a ridiculous number of

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Robert Haas
On Tue, Jul 28, 2009 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: The other possibility here is that this just doesn't work. :-) That's why we wanted to test it ;-). I don't have time to look right now, but ISTM the original discussion that led to

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Tue, Jul 28, 2009 at 9:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't have time to look right now, but ISTM the original discussion that led to making that patch had ideas about scenarios where it would be faster. This is what I've been able to

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: This is what I've been able to find on a quick look: http://archives.postgresql.org/pgsql-hackers/2009-05/msg00678.php Sounds like Kevin may want to try renaming some of his indices to produce intermingling... Thanks, I'll give that a try.

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: Also, the followup to that message points out that the 8.4.0 code has a potential O(N^2) dependency on the total number of TOC items in the dump. So it might be interesting to check the behavior with very large numbers of tables/indexes. I've got 431

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-29 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: Also, the followup to that message points out that the 8.4.0 code has a potential O(N^2) dependency on the total number of TOC items in the dump. So it might be interesting to check the behavior with very

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-28 Thread Kevin Grittner
I wrote: So far, all tests have shown no difference in performance based on the patch; My testing to that point had been on a big machine with 16 CPUs and 128 GB RAM and dozens of spindles. Last night I tried with a dual core machine with 4 GB RAM and 5 spindles in RAID 5. Still no

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-28 Thread Robert Haas
On Tue, Jul 28, 2009 at 10:28 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: I wrote: So far, all tests have shown no difference in performance based on the patch; My testing to that point had been on a big machine with 16 CPUs and 128 GB RAM and dozens of spindles. Last night I

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: The other possibility here is that this just doesn't work. :-) That's why we wanted to test it ;-). I don't have time to look right now, but ISTM the original discussion that led to making that patch had ideas about scenarios where it would be faster.

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-27 Thread Kevin Grittner
Andrew Dunstan and...@dunslane.net wrote: To performance test this properly you might need to devise a test that will use a sufficiently different order of queueing items to show the difference. It would appear that I need help with devising a proper test. So far, all tests have shown no

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-27 Thread Andrew Dunstan
Kevin Grittner wrote: Andrew Dunstan and...@dunslane.net wrote: To performance test this properly you might need to devise a test that will use a sufficiently different order of queueing items to show the difference. It would appear that I need help with devising a proper test.

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-27 Thread Kevin Grittner
Andrew Dunstan and...@dunslane.net wrote: Does your test case have lots of foreign keys? 488 of them. There is some variation on individual tests, but the results look to be in the noise. When I add them all up, the patch comes out 0.0036% slower -- but that is so far into the noise as to

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-20 Thread Josh Berkus
Kevin, It would be hard to schedule the requisite time on our biggest web machines, but I assume an 8 core 64GB machine would give meaningful results. Any sense what numbers of parallel jobs I should use for tests? I would be tempted to try 1 (with the -1 switch), 8, 12, and 16 -- maybe keep

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-20 Thread Kevin Grittner
Andrew Dunstan and...@dunslane.net wrote: To performance test this properly you might need to devise a test that will use a sufficiently different order of queueing items to show the difference. One thing I am particularly interested in is to see if queuing FK items for a table as soon as

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-20 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: it might be worth testing with default settings too. OK. I'll do that too, if time allows. -Kevin

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-20 Thread Kevin Grittner
Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: My plan here would be to have the dump on one machine, and run pg_restore there, and push it to a database on another machine through the LAN on a 1Gb connection. (This seems most likely to be what we'd be doing in real life.) you need

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-19 Thread Robert Haas
On Sat, Jul 18, 2009 at 4:41 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: Performance tests to follow in a day or two. I'm looking to beg another week or so on this to run more tests. What I can have by the end of today is pretty

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-19 Thread Stefan Kaltenbrunner
Kevin Grittner wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: Performance tests to follow in a day or two. I'm looking to beg another week or so on this to run more tests. What I can have by the end of today is pretty limited, mostly because I decided it made the most sense to

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-19 Thread Andrew Dunstan
Robert Haas wrote: On Sat, Jul 18, 2009 at 4:41 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: Performance tests to follow in a day or two. I'm looking to beg another week or so on this to run more tests. What I can

Re: [HACKERS] Review: Revise parallel pg_restore's scheduling heuristic

2009-07-18 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote: Performance tests to follow in a day or two. I'm looking to beg another week or so on this to run more tests. What I can have by the end of today is pretty limited, mostly because I decided it made the most sense to test this with big