Re: [PERFORM] 8.x index insert performance

2005-11-11 Thread Tom Lane
Kelly Burkhart <[EMAIL PROTECTED]> writes: > On Thu, 2005-11-10 at 19:13 -0500, Tom Lane wrote: >> Kelly, could there be any patterns in the data that might be >> related? > I modified my original program to insert generated, sequential data. > The following graph shows the results to be flat: > <

Re: [PERFORM] 8.x index insert performance

2005-11-11 Thread Kevin Grittner
That sure seems to bolster the theory that performance is degrading because you exhaust the cache space and need to start reading index pages. When inserting sequential data, you don't need to randomly access pages all over the index tree. -Kevin >>> Kelly Burkhart <[EMAIL PROTECTED]> >>> I m

Re: [PERFORM] 8.x index insert performance

2005-11-11 Thread Kelly Burkhart
On Thu, 2005-11-10 at 19:13 -0500, Tom Lane wrote: > Kelly Burkhart <[EMAIL PROTECTED]> writes: > > ... A graph showing the performance > > characteristics is here: > > > > > I hadn't looked at this chart till just now, but it sure seems to put a > crimp in my theory

Re: [PERFORM] slow queries after ANALYZE

2005-11-11 Thread Tom Lane
DW <[EMAIL PROTECTED]> writes: > In the meantime, again I'm new to this -- I got pg_stats; which rows are > the relevent ones? The ones for columns that are mentioned in the problem query. I don't think you need to worry about columns used only in the SELECT output list, but anything used in WH

Re: [PERFORM] slow queries after ANALYZE

2005-11-11 Thread DW
Tom Lane wrote: It would be interesting to see EXPLAIN ANALYZE results in both cases, plus the contents of the relevant pg_stats rows. (BTW, you need not dump and reload to get back to the virgin state --- just delete the relevant rows from pg_statistic.) Also we'd want to know exactly what PG

Re: [PERFORM] slow queries after ANALYZE

2005-11-11 Thread Tom Lane
DW <[EMAIL PROTECTED]> writes: > I'm perplexed. I'm trying to find out why some queries are taking a long > time, and have found that after running analyze, one particular query > becomes slow. This implies that the planner's default choice of plan (without any statistics) is better than its ch

[PERFORM] slow queries after ANALYZE

2005-11-11 Thread DW
Hello, I'm perplexed. I'm trying to find out why some queries are taking a long time, and have found that after running analyze, one particular query becomes slow. This query is based on a view that is based on multiple left outer joins to merge data from lots of tables. If I drop the dat

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Tom Lane
Mitchell Skinner <[EMAIL PROTECTED]> writes: > On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote: >> what you need to do is incorporate the "source" value into the >> external_id_map index key somehow. Then the index scan would be able to >> realize that there is no possibility of finding another

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Mitchell Skinner
On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote: > After re-reading your explanation of what you're doing with the data, > I thought of a possible explanation. Is the "source" value exactly > correlated with the external_id_map primary key? Sort of. In this case, at the beginning of external_i

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Tom Lane
I wrote: > No, the visibility check happens first. The timing does seem consistent > with the idea that the comparison is being done at all 15M rows, but > your other EXPLAIN shows that only 2K rows are actually retrieved, which > presumably is because the merge doesn't need the rest. (Merge will

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Tom Lane
Mitch Skinner <[EMAIL PROTECTED]> writes: > On Fri, 2005-11-11 at 09:17 -0500, Tom Lane wrote: >> Please don't, actually, until we understand what's going on. > Ack, I was the middle of the vacuum full already when I got this. Given what you said about no deletions or updates, the vacuum should h

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Mitch Skinner
On Fri, 2005-11-11 at 09:17 -0500, Tom Lane wrote: > Richard Huxton writes: > > Does external_id_map really have 15 million rows? If not, try a VACUUM > > FULL on it. Be prepared to give it some time to complete. > > Please don't, actually, until we understand what's going on. Ack, I was the mi

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Mitch Skinner
On Fri, 2005-11-11 at 11:51 +, Richard Huxton wrote: > Does external_id_map really have 15 million rows? If not, try a VACUUM > FULL on it. Be prepared to give it some time to complete. Thanks for the reply. It does indeed have that many rows: statgen=> select count(*) from util.external_id_

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Tom Lane
Richard Huxton writes: > Mitch Skinner wrote: >> The version with the condition is definitely doing more I/O. The >> version without the condition doesn't read at all. > Does external_id_map really have 15 million rows? If not, try a VACUUM > FULL on it. Be prepared to give it some time to co

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Tom Lane
Mitch Skinner <[EMAIL PROTECTED]> writes: > On Thu, 2005-11-10 at 12:23 -0500, Tom Lane wrote: >> Apparently, you are using a platform and/or locale in which strcoll() is >> spectacularly, god-awfully slow --- on the order of 10 msec per comparison. > The version with the condition is definitely d

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Richard Huxton
Mitch Skinner wrote: The version with the condition is definitely doing more I/O. The version without the condition doesn't read at all. [snip] relname | relpages | reltuples -+--+- external_id_map | 126883

Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Mitch Skinner
On Thu, 2005-11-10 at 12:23 -0500, Tom Lane wrote: > Apparently, you are using a platform and/or locale in which strcoll() is > spectacularly, god-awfully slow --- on the order of 10 msec per comparison. The version with the condition is definitely doing more I/O. The version without the conditio