Re: [PERFORM] same plan, add 1 condition, 1900x slower
On Thu, 2005-11-10 at 12:23 -0500, Tom Lane wrote:
> Apparently, you are using a platform and/or locale in which strcoll()
> is spectacularly, god-awfully slow --- on the order of 10 msec per
> comparison.

The version with the condition is definitely doing more I/O.  The
version without the condition doesn't read at all.  I strace'd an
explain analyze for each separately, and this is what I ended up with
(the first is with the condition, the second is without):

bash-2.05b$ cut '-d(' -f1 subsourcestrace | sort | uniq -c
   7127 gettimeofday
  75213 _llseek
      1 Process 30227 attached - interrupt to quit
      1 Process 30227 detached
 148671 read
      2 recv
      4 semop
      4 send

bash-2.05b$ cut '-d(' -f1 subsourcestrace-nocond | sort | uniq -c
   9103 gettimeofday
      7 _llseek
      1 Process 30227 attached - interrupt to quit
      1 Process 30227 detached
      2 recv
      4 send

For the moment, all of the rows in the view I'm selecting from satisfy
the condition, so the output of both queries is the same.  The relevant
rows of the underlying tables are probably pretty contiguous (all of
the rows satisfying the condition and the join were inserted at the
same time).  Could it just be the result of a weird physical
distribution of data in the table/index files?  For the fast query, the
actual number of rows is a lot less than the planner expects.

> This is a bit hard to believe but I can't make sense of those numbers
> any other way.  What is the platform exactly, and what database locale
> and encoding are you using?

It's RHEL 3 on x86:

[EMAIL PROTECTED] root]# uname -a
Linux rehoboam 2.4.21-32.0.1.ELsmp #1 SMP Tue May 17 17:52:23 EDT 2005
i686 i686 i386 GNU/Linux

The glibc version is 2.3.2.

statgen=# select current_setting('lc_collate');
 current_setting
-----------------
 en_US.UTF-8

Not sure what's relevant, but here's some more info: the machine has
4.5GiB of RAM and a 5-disk RAID 5.  It's a dual Xeon 3.2GHz.
           relname           | relpages |  reltuples
-----------------------------+----------+-------------
 external_id_map             |   126883 | 1.55625e+07
 external_id_map_primary_key |    64607 | 1.55625e+07
 subject                     |       31 |        1186
 subject_pkey                |       19 |        1186

I've attached the output of "select name, setting from pg_settings".
And, in case my original message isn't handy, the explain analyze
output and table/view info is below.

Thanks for taking a look,
Mitch

statgen=# explain analyze select * from subject_source;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..330.72 rows=1186 width=46) (actual time=0.051..8.890 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36 rows=1186 width=28) (actual time=0.022..1.441 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map eim  (cost=0.00..2485226.70 rows=15562513 width=26) (actual time=0.016..2.532 rows=2175 loops=1)
 Total runtime: 9.592 ms
(5 rows)

statgen=# explain analyze select * from subject_source where source='SCH';
                                   QUERY PLAN
-------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..1147.33 rows=1 width=46) (actual time=0.054..20258.161 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36 rows=1186 width=28) (actual time=0.022..1.478 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map eim  (cost=0.00..2524132.99 rows=2335 width=26) (actual time=0.022..20252.326 rows=1186 loops=1)
         Filter: (source = 'SCH'::bpchar)
 Total runtime: 20258.922 ms
(6 rows)

statgen=# \d subject_source
        View "public.subject_source"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 id        | bigint                |
 sex       | integer               |
 parent1   | bigint                |
 parent2   | bigint                |
 source    | character(3)          |
 source_id | character varying(32) |
View definition:
 SELECT norm.id, norm.sex, norm.parent1, norm.parent2, eim.source,
        eim.source_id
   FROM subject norm
   JOIN util.external_id_map eim ON norm.id = eim.target_id;

statgen=# \d subject
       Table "public.subject"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id      | bigint  | not null
 sex     | integer |
 parent1 | bigint  |
 parent2 | bigint  |
Indexes:
    "subject_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "subject_parent1" FOREIGN KEY (parent1) REFERENCES subject(id) DEFERRABLE INITIALLY DEFERRED
    "subject_parent2" FOREIGN KEY (parent2) REFERENCES subject(id)
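The per-syscall tallies above came from `cut '-d(' -f1 | sort | uniq -c`
over the strace capture.  For reference, the same tally is a one-liner in
Python (a small sketch; the sample lines here are made up, a real run
would read the capture file instead):

```python
from collections import Counter

# Hypothetical sample of strace output lines (stand-ins for the
# capture files subsourcestrace / subsourcestrace-nocond).
lines = [
    'read(42, "...", 8192) = 8192',
    'read(42, "...", 8192) = 8192',
    "_llseek(42, 1040384, [1040384], SEEK_SET) = 0",
]

# Equivalent of cut '-d(' -f1 | sort | uniq -c: keep everything before
# the first '(' and count occurrences of each syscall name.
counts = Counter(line.split("(", 1)[0] for line in lines)
print(counts["read"], counts["_llseek"])
```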
Re: [PERFORM] same plan, add 1 condition, 1900x slower
Mitch Skinner wrote:
> The version with the condition is definitely doing more I/O.  The
> version without the condition doesn't read at all.
[snip]
>            relname           | relpages |  reltuples
> -----------------------------+----------+-------------
>  external_id_map             |   126883 | 1.55625e+07
>  external_id_map_primary_key |    64607 | 1.55625e+07
>  subject                     |       31 |        1186
>  subject_pkey                |       19 |        1186

Does external_id_map really have 15 million rows?  If not, try a
VACUUM FULL on it.  Be prepared to give it some time to complete.

-- 
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] same plan, add 1 condition, 1900x slower
Mitch Skinner [EMAIL PROTECTED] writes:
> On Thu, 2005-11-10 at 12:23 -0500, Tom Lane wrote:
>> Apparently, you are using a platform and/or locale in which strcoll()
>> is spectacularly, god-awfully slow --- on the order of 10 msec per
>> comparison.

> The version with the condition is definitely doing more I/O.  The
> version without the condition doesn't read at all.

That's pretty interesting, but what file(s) is it reading exactly?  It
could still be strcoll's fault.  The only plausible explanation I can
think of for strcoll being so slow is if for some reason it were
re-reading the locale definition file every time, instead of setting
up just once.  If it is hitting Postgres files, it'd be interesting to
look at exactly which files and what the distribution of seek offsets
is.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] same plan, add 1 condition, 1900x slower
Mitch Skinner [EMAIL PROTECTED] writes:
> On Fri, 2005-11-11 at 09:17 -0500, Tom Lane wrote:
>> Please don't, actually, until we understand what's going on.

> Ack, I was in the middle of the vacuum full already when I got this.

Given what you said about no deletions or updates, the vacuum should
have no effect anyway, so don't panic.

> I put up some files at: http://arctur.us/pgsql/

Great, I'll take a look ...

> Pardon my ignorance, but do the visibility check and the check of the
> condition happen at different stages of execution?  Would it end up
> checking the condition for all 15M rows, but only checking visibility
> for the 1200 rows that come back from the join?

No, the visibility check happens first.  The timing does seem
consistent with the idea that the comparison is being done at all 15M
rows, but your other EXPLAIN shows that only 2K rows are actually
retrieved, which presumably is because the merge doesn't need the
rest.  (Merge join will stop scanning either input when it runs out of
rows on the other side; so this sort of plan is very fast if the range
of keys on one side is smaller than the range on the other.  The
numbers from the no-comparison EXPLAIN ANALYZE indicate that that is
happening for your case.)  So the comparison should happen for at most
2K rows too.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
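Tom's description of merge-join early termination can be sketched in a
few lines of Python.  This is a toy model assuming unique, sorted join
keys on both sides --- not PostgreSQL's actual executor, which also
handles duplicate keys, mark/restore, and visibility:

```python
def merge_join(left, right):
    """Toy merge join over two sorted sequences of unique join keys.

    Returns (matches, fetched): `fetched` counts rows pulled from the
    right input.  The scan stops as soon as a right key exceeds the
    largest left key, so the tail of the right input is never read.
    """
    right_iter = iter(right)
    matches = 0
    fetched = 0

    def next_right():
        nonlocal fetched
        fetched += 1
        return next(right_iter)

    try:
        rkey = next_right()
        for lkey in left:
            while rkey < lkey:      # advance right side to catch up
                rkey = next_right()
            if rkey == lkey:        # keys meet: emit a join row
                matches += 1
                rkey = next_right()
    except StopIteration:           # one input ran out: join is done
        pass
    return matches, fetched

# 1186 left keys against a 15M-key right side: only 1187 right rows are
# ever fetched, mirroring the EXPLAIN ANALYZE that shows 2175 rows
# fetched out of 15.5M.
print(merge_join(range(1186), range(15_000_000)))
```

Because `range` is lazy, the second call above really does touch only
the first ~1200 right-side keys, which is exactly why the
no-condition query finishes in milliseconds.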
Re: [PERFORM] same plan, add 1 condition, 1900x slower
I wrote:
> No, the visibility check happens first.  The timing does seem
> consistent with the idea that the comparison is being done at all 15M
> rows, but your other EXPLAIN shows that only 2K rows are actually
> retrieved, which presumably is because the merge doesn't need the
> rest.  (Merge join will stop scanning either input when it runs out
> of rows on the other side; so this sort of plan is very fast if the
> range of keys on one side is smaller than the range on the other.
> The numbers from the no-comparison EXPLAIN ANALYZE indicate that that
> is happening for your case.)  So the comparison should happen for at
> most 2K rows too.

After re-reading your explanation of what you're doing with the data,
I thought of a possible explanation.  Is the source value exactly
correlated with the external_id_map primary key?

What could be happening is this:

1. We can see from the EXPLAIN ANALYZE for the no-comparison case that
the merge join stops after fetching only 2175 rows from
external_id_map.  This implies that the subject table joins to the
first couple thousand entries in external_id_map and nothing beyond
that.  In particular, the merge join must have observed that the join
key in the 2175'th row (in index order) of external_id_map was larger
than the last (largest) join key in subject.

2. Let's suppose that source = 'SCH' is false for the 2175'th row of
external_id_map and every one after that.  Then what will happen is
that the index scan will vainly seek through the entire
external_id_map, looking for a row that its filter allows it to
return, not knowing that the merge join has no use for any of those
rows.

If this is the story, and you need to make this sort of query fast,
then what you need to do is incorporate the source value into the
external_id_map index key somehow.  Then the index scan would be able
to realize that there is no possibility of finding another row with
source = 'SCH'.  The simplest way is just to make a 2-column index,
but I wonder whether the source isn't actually redundant with the
external_id_map primary key already ...

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
               http://archives.postgresql.org
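Tom's scenario --- the filter qual forcing the index scan to grind
through every remaining row while the merge join only wanted one more
key --- can be sketched as follows.  This is a toy model with made-up
row counts and unique keys, not the real executor:

```python
def make_rows(total, n_match):
    """Hypothetical index order: the first n_match target_ids have
    source 'SCH', everything after has some other source."""
    return [(tid, "SCH" if tid < n_match else "OTH") for tid in range(total)]

def scan_with_filter(rows, stats):
    """Index scan with "Filter: (source = 'SCH')": it visits every row
    it steps over and applies the qual, returning only passing rows."""
    for tid, source in rows:
        stats["visited"] += 1
        if source == "SCH":
            yield tid

def demand_driven_join(left_keys, right_scan):
    """Minimal merge loop (unique keys): pulls from right_scan only on
    demand, plus one look-ahead past the largest left key."""
    matches = 0
    it = iter(right_scan)
    try:
        rkey = next(it)
        for lkey in left_keys:
            while rkey < lkey:
                rkey = next(it)
            if rkey == lkey:
                matches += 1
                rkey = next(it)
    except StopIteration:
        pass
    return matches

stats = {"visited": 0}
rows = make_rows(total=100_000, n_match=1_186)
matches = demand_driven_join(range(1_186), scan_with_filter(rows, stats))
# 1186 join rows, yet the filtered scan visited all 100,000 rows:
# after the last 'SCH' row it kept seeking for one more row passing
# the filter, not knowing the merge join had no use for it.
print(matches, stats["visited"])
```

Drop the filter from the scan and the same join visits only the first
~1200 rows, which is the asymmetry between the 9 ms and 20 s plans.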
Re: [PERFORM] same plan, add 1 condition, 1900x slower
On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote:
> After re-reading your explanation of what you're doing with the data,
> I thought of a possible explanation.  Is the source value exactly
> correlated with the external_id_map primary key?

Sort of.  In this case, at the beginning of external_id_map, yes,
though further down the table they're not.  For example, if we got new
subjects from 'SCH' at this point, they'd get assigned
external_id_map.target_id (the primary key) values that are totally
unrelated to what the current set are (the values in the
external_id_map primary key just come off of a sequence that we use
for everything).

Right now, though, since the 'SCH' data came in a contiguous chunk
right at the beginning and hasn't changed or grown since then, the
correlation is pretty exact, I think.  It's true that there are no
'SCH' rows in the table after the first contiguous set (when I get
back to work I'll check exactly which row that is).  It's interesting
that there are these correlations in the data that didn't exist at all
in my mental model.

> what you need to do is incorporate the source value into the
> external_id_map index key somehow.  Then the index scan would be able
> to realize that there is no possibility of finding another row with
> source = 'SCH'.  The simplest way is just to make a 2-column index

I thought that's what I had done with the
external_id_map_source_target_id index:

statgen=# \d util.external_id_map
    Table "util.external_id_map"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 source_id | character varying(32) | not null
 source    | character(3)          | not null
 target_id | bigint                | not null
Indexes:
    "external_id_map_primary_key" PRIMARY KEY, btree (target_id)
    "external_id_map_source_source_id_unique" UNIQUE, btree (source, source_id)
    "external_id_map_source" btree (source)
    "external_id_map_source_target_id" btree (source, target_id)
Foreign-key constraints:
    "external_id_map_source" FOREIGN KEY (source) REFERENCES util.source(id)

So if I understand your suggestion correctly, we're back to the "why
isn't this query using index foo" FAQ.  For the external_id_map table,
the statistics target for source is 200; the other two columns are at
the default level because I didn't think of them as being very
interesting statistics-wise.  I suppose I should probably go ahead and
raise the targets for every column of that table; I expect the
planning time is negligible, and our queries tend to be large
data-wise.  Beyond that, I'm not sure how else to encourage the use of
that index.  If I changed that index to be (target_id, source) would
it make a difference?

Thanks for your help,
Mitch

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Re: [PERFORM] same plan, add 1 condition, 1900x slower
Mitchell Skinner [EMAIL PROTECTED] writes:
> On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote:
>> what you need to do is incorporate the source value into the
>> external_id_map index key somehow.  Then the index scan would be
>> able to realize that there is no possibility of finding another row
>> with source = 'SCH'.  The simplest way is just to make a 2-column
>> index

> I thought that's what I had done with the
> external_id_map_source_target_id index:
>     external_id_map_source_target_id btree (source, target_id)

> If I changed that index to be (target_id, source) would it make a
> difference?

[ fools around with a test case ... ]  Seems like not :-(.  PG is not
bright enough to realize that an index on (source, target_id) can be
used with a mergejoin on target_id, because the index sort order isn't
compatible.  (Given the equality constraint on source, there is an
effective compatibility.  I had thought that 8.1 might be able to
detect this, but it seems not to in a simple test case --- there may
be a bug involved there.  In any case 8.0 definitely won't see it.)

An index on (target_id, source) would be recognized as mergejoinable,
but that doesn't solve the problem, because an index condition on the
second column doesn't provide enough information to know that the scan
can stop early.

Given your comment that the correlation is accidental, it may be that
there's not too much point in worrying.  The planner is picking this
plan only because it notices the asymmetry in key ranges, and as soon
as some more rows get added with higher-numbered target_ids it will
shift to something else (probably a hash join).

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
               http://www.postgresql.org/docs/faq
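The sort-order incompatibility Tom describes is easy to see with a toy
example (made-up source codes and ids): entries in (source, target_id)
order are not globally ordered by target_id, so that index cannot feed
a merge join on target_id --- yet the slice for a single source value
is ordered, which is the "effective compatibility" under the equality
constraint:

```python
# Hypothetical composite-index entries as (source, target_id) pairs;
# sorting the tuples gives btree (source, target_id) order.
entries = sorted([("SCH", 1), ("XYZ", 3), ("ABC", 5), ("SCH", 2), ("ABC", 9)])

# In composite order the target_ids are NOT sorted, so the index
# cannot directly drive a merge join on target_id:
ids = [tid for _, tid in entries]
print(ids)                 # [5, 9, 1, 2, 3]
print(ids == sorted(ids))  # False

# But with the equality constraint source = 'SCH', the slice of the
# index for that one source value IS sorted by target_id:
sch_ids = [tid for src, tid in entries if src == "SCH"]
print(sch_ids == sorted(sch_ids))  # True
```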
Re: [PERFORM] same plan, add 1 condition, 1900x slower
Mitch Skinner [EMAIL PROTECTED] writes:
> This is with Postgres 8.0.3.  Any advice is appreciated.

These are exactly the same plan, except for the addition of the extra
filter condition ...

>  ->  Index Scan using external_id_map_primary_key on external_id_map eim  (cost=0.00..2345747.01 rows=15560708 width=26) (actual time=0.061..2.944 rows=2175 loops=1)

>  ->  Index Scan using external_id_map_primary_key on external_id_map eim  (cost=0.00..2384648.78 rows=4150 width=26) (actual time=0.020..21068.508 rows=1186 loops=1)
>        Filter: (source = 'SCH'::bpchar)

Apparently, you are using a platform and/or locale in which strcoll()
is spectacularly, god-awfully slow --- on the order of 10 msec per
comparison.  This is a bit hard to believe but I can't make sense of
those numbers any other way.  What is the platform exactly, and what
database locale and encoding are you using?

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
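The strcoll() Tom is pointing at is the libc routine behind
locale-aware string comparison: the Filter line means every row the
scan steps over gets its source value compared against 'SCH' under the
database collation.  Python exposes the same libc routine, so its
behavior (and, via timeit, its per-call cost on a given platform) can
be poked at directly --- a sketch assuming the "C" locale is
available, which it is on essentially every system; the thread's
server was running under en_US.UTF-8, where the suspected slowdown
lived:

```python
import locale

# Pin LC_COLLATE to "C" so the comparisons below are deterministic;
# swap in "en_US.UTF-8" to measure the collation the server used.
locale.setlocale(locale.LC_COLLATE, "C")

print(locale.strcoll("SCH", "SCH"))      # 0: equal under the collation
print(locale.strcoll("ABC", "SCH") < 0)  # True: sorts earlier
```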