Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move
On Fri, May 13, 2011 at 9:04 PM, Robert Haas robertmh...@gmail.com wrote:
> On Wed, May 4, 2011 at 6:31 AM, Willy-Bas Loos willy...@gmail.com wrote:
>> I'm asking them for (real) benchmarks, thanks for the advice. (fio is
>> not available for us now to do it myself, grmbl) It just occurred to
>> me that it is not necessarily the case that reading the indexes
>> causes a lot of random I/O (on the index itself). I mean, maybe the
>> index is generally read sequentially and then, when retrieving the
>> data, there is a lot of random I/O. If it's a long story, any tips
>> for info about this (book or web site)?
>
> If you don't do anything special, and if the query plan says Index
> Scan rather than Bitmap Index Scan, then both the index I/O and the
> table I/O are likely to be fairly random. However, there are a number
> of cases in which you can expect the table I/O to be sequential:
>
> - In some cases, you may happen to insert rows with an ordering that
> matches the index. For example, if you have a table with not too many
> updates and deletes, and an index on a serial column, then new rows
> will have a higher value in that column than old rows, and will also
> typically be physically after older rows in the file. Or you might be
> inserting timestamped data from oldest to newest.
>
> - If the planner chooses a Bitmap Index Scan, it effectively scans
> the index to figure out which table blocks to read, and then reads
> those table blocks in block number order, so that the I/O is
> sequential, with skips.

Are these two separate phases (i.e. first scan index completely, then
access table)?

> - If you CLUSTER the table on a particular index, it will be
> physically ordered to match the index's key ordering. As the table is
> further modified, the degree of clustering will gradually decline;
> eventually you may wish to re-CLUSTER.
>
> It's also worth keeping in mind that the index itself won't
> necessarily be accessed in physically sequential order. The point of
> the index is to emit the rows in key order, but if the table is
> heavily updated, it won't necessarily be the case that a page
> containing lower-valued keys physically precedes a page containing
> higher-valued keys. I'm actually somewhat fuzzy on how this works,
> and to what extent it's a problem in practice, but I am fairly sure
> it can happen.

Separating indexes and tables might not be a good idea in general.
Richard Foote has an excellent article about Oracle, but I assume at
least a few things apply to PostgreSQL as well - it's at least worth
checking PostgreSQL's access patterns against it:

http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/

I would probably rather try to separate data by the nature and
frequency of accesses. One reasonable separation would be to leave all
frequently accessed tables *and* their indexes on local RAID and move
less frequently accessed data to the SAN. This separation is easy to
identify if you have separate tables for current and historic data.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
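The CLUSTER case mentioned above can be sketched as follows (table and column names here are hypothetical, not from the thread):

```sql
-- Physically reorder a table to match an index, so that index-ordered
-- scans produce mostly sequential table I/O.
CREATE INDEX events_created_at_idx ON events (created_at);
CLUSTER events USING events_created_at_idx;
ANALYZE events;  -- refresh statistics, including the correlation estimate

-- pg_stats.correlation shows how well the physical row order matches
-- the column order (1.0 = perfectly clustered, near 0 = random):
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';
```

Note that CLUSTER is a one-time operation: as the message says, subsequent updates and inserts gradually degrade the clustering until you re-CLUSTER.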
Re: [PERFORM] Why query takes soo much time
On 05/16/2011 01:39 PM, Adarsh Sharma wrote:
> Dear all, I have a query on 3 tables in a database as :-
>
> Explain Analyze Output :-
>
> explain analyze select c.clause, s.subject ,s.object , s.verb,
> s.subject_type , s.object_type ,s.doc_id ,s.svo_id from clause2 c,
> svo2 s ,page_content p where c.clause_id=s.clause_id and
> s.doc_id=c.source_id and c.sentence_id=s.sentence_id and
> s.doc_id=p.crawled_page_id order by s.svo_id
> limit 1000 offset 17929000

Using limit and offset can be horrifyingly slow for non-trivial
queries. Are you trying to paginate results? If not, what are you
trying to achieve?

In most (all?) cases, Pg will have to execute the query up to the
point where it's found limit+offset rows, producing and discarding
offset rows as it goes. Needless to say, that's horrifyingly
inefficient.

Reformatting your query for readability (to me) as:

EXPLAIN ANALYZE
SELECT c.clause, s.subject, s.object, s.verb,
       s.subject_type, s.object_type, s.doc_id, s.svo_id
FROM clause2 c
INNER JOIN svo2 s ON (c.clause_id=s.clause_id
                  AND c.source_id=s.doc_id
                  AND c.sentence_id=s.sentence_id)
INNER JOIN page_content p ON (s.doc_id=p.crawled_page_id)
ORDER BY s.svo_id
LIMIT 1000 OFFSET 17929000

... I can see that you're joining on
(c.clause_id, c.source_id, c.sentence_id) =
(s.clause_id, s.doc_id, s.sentence_id). You have matching indexes
idx_clause2_id and idx_svo2_id_dummy with matching column ordering. Pg
is using idx_clause2_id in the join of svo2 and clause2, but instead
of doing a bitmap index scan using it and idx_svo2_id_dummy, it's
doing a nested loop using idx_clause2_id and pk_svo_id.

First: make sure your stats are up to date by ANALYZE-ing your tables,
and probably increasing the stats collected on the join columns and/or
increasing default_statistics_target. If that doesn't help, personally
I'd play with random_page_cost and seq_page_cost to see if they
reflect your machine's actual performance, and to see if you get a
more favourable plan. If I were experimenting with this I'd also see
if giving the query lots of work_mem allowed it to try a different
approach to the join.

> Limit  (cost=21685592.91..21686802.44 rows=1000 width=2624) (actual time=414601.802..414622.920 rows=1000 loops=1)
>   ->  Nested Loop  (cost=59.77..320659013645.28 rows=265112018116 width=2624) (actual time=0.422..404902.314 rows=1793 loops=1)
>         ->  Nested Loop  (cost=0.00..313889654.42 rows=109882338 width=2628) (actual time=0.242..174223.789 rows=17736897 loops=1)
>               ->  Index Scan using pk_svo_id on svo2 s  (cost=0.00..33914955.13 rows=26840752 width=2600) (actual time=0.157..14691.039 rows=14238271 loops=1)
>               ->  Index Scan using idx_clause2_id on clause2 c  (cost=0.00..10.36 rows=4 width=44) (actual time=0.007..0.008 rows=1 loops=14238271)
>                     Index Cond: ((c.source_id = s.doc_id) AND (c.clause_id = s.clause_id) AND (c.sentence_id = s.sentence_id))
>         ->  Bitmap Heap Scan on page_content p  (cost=59.77..2885.18 rows=2413 width=8) (actual time=0.007..0.008 rows=1 loops=17736897)
>               Recheck Cond: (p.crawled_page_id = s.doc_id)
>               ->  Bitmap Index Scan on idx_crawled_id  (cost=0.00..59.17 rows=2413 width=0) (actual time=0.005..0.005 rows=1 loops=17736897)
>                     Index Cond: (p.crawled_page_id = s.doc_id)
> Total runtime: 414623.634 ms
>
> My Table index definitions are as under :-
>
> Estimated rows in 3 tables are :-
> clause2        10341700
> svo2           26008000
> page_content     479785
>
> CREATE TABLE clause2 (
>   id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
>   source_id integer,
>   sentence_id integer,
>   clause_id integer,
>   tense character varying(30),
>   clause text,
>   CONSTRAINT pk_clause_id PRIMARY KEY (id)
> ) WITH (OIDS=FALSE);
>
> CREATE INDEX idx_clause2_id ON clause2 USING btree (source_id, clause_id, sentence_id);
>
> CREATE TABLE svo2 (
>   svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
>   doc_id integer,
>   sentence_id integer,
>   clause_id integer,
>   negation integer,
>   subject character varying(3000),
>   verb character varying(3000),
>   object character varying(3000),
>   preposition character varying(3000),
>   subject_type character varying(3000),
>   object_type character varying(3000),
>   subject_attribute character varying(3000),
>   object_attribute character varying(3000),
>   verb_attribute character varying(3000),
>   subject_concept character varying(100),
>   object_concept character varying(100),
>   subject_sense character varying(100),
>   object_sense character varying(100),
>   subject_chain character varying(5000),
>   object_chain character varying(5000),
>   sub_type_id integer,
>   obj_type_id integer,
>   CONSTRAINT pk_svo_id PRIMARY KEY (svo_id)
> ) WITH (OIDS=FALSE);
>
> CREATE INDEX idx_svo2_id_dummy ON svo2 USING btree (doc_id, clause_id, sentence_id);
>
> CREATE TABLE page_content (
>   content_id integer NOT NULL DEFAULT nextval('page_content_ogc_fid_seq'::regclass),
>   wkb_geometry
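The statistics advice above could look like the following sketch (the target values are illustrative, not prescribed by the thread):

```sql
-- Increase the per-column statistics target on the join columns,
-- then re-ANALYZE so the planner gets better row estimates.
ALTER TABLE clause2 ALTER COLUMN clause_id SET STATISTICS 1000;
ALTER TABLE svo2    ALTER COLUMN clause_id SET STATISTICS 1000;
ANALYZE clause2;
ANALYZE svo2;

-- Or raise the default for the session (or in postgresql.conf):
SET default_statistics_target = 500;

-- And experiment with cost settings to see if a better plan appears:
SET random_page_cost = 2.0;
SET work_mem = '256MB';
```

Session-level SET is handy for this kind of experimentation because nothing persists past the connection; once a better plan is found, the setting can be made permanent.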
Re: [PERFORM] Why query takes soo much time
> [big nestloop with a huge number of rows]

You're in an edge case, and I doubt you'll get things to run much
faster: you want the last 1k rows out of an 18M row result set... It
will be slow no matter what you do.

What the plan is currently doing is going through those 18M rows in a
nested loop until it returns the 1k requested rows. Without the
offset, the plan is absolutely correct (and quite fast, I take it).
With the enormous offset, it's a different story, as you've noted.

An alternative plan would be to hash join the tables together, sort
the result set, and apply the limit/offset on the resulting set. You
can probably force the planner to do so by rewriting your statement
using a WITH statement:

EXPLAIN ANALYZE
WITH rows AS (
  SELECT c.clause, s.subject, s.object, s.verb,
         s.subject_type, s.object_type, s.doc_id, s.svo_id
  FROM clause2 c
  INNER JOIN svo2 s ON (c.clause_id=s.clause_id
                    AND c.source_id=s.doc_id
                    AND c.sentence_id=s.sentence_id)
  INNER JOIN page_content p ON (s.doc_id=p.crawled_page_id)
)
SELECT * FROM rows
ORDER BY svo_id
LIMIT 1000 OFFSET 17929000

I have my doubts that it'll make much of a difference, though: you'll
still be extracting the last 1k rows out of 18M.

D
Re: [PERFORM] Using pgiosim realistically
On Sat, May 14, 2011 at 12:07:02PM -0500, k...@rice.edu wrote:
> On Fri, May 13, 2011 at 09:09:41PM +0000, John Rouillard wrote:
>> I am adding pgiosim to our testing for new database hardware, and I
>> am seeing something I don't quite get. I think it's because I am
>> using pgiosim incorrectly.
>>
>> Specs:
>>   OS: CentOS 5.5, kernel 2.6.18-194.32.1.el5
>>   memory: 96GB
>>   cpu: 2x Intel(R) Xeon(R) X5690 @ 3.47GHz (6 core, HT enabled)
>>   disks: WD2003FYYS RE4
>>   raid: LSI 9260-4i with 8 disks in RAID 10, 1MB stripe size,
>>         raid cache enabled w/ BBU, disk caches disabled
>>   filesystem: ext3 created with -E stride=256
>>
>> I am seeing really poor (70) iops with pgiosim. According to:
>> http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html
>> in the database benchmark they are seeing ~170 iops on a single disk
>> for these drives. I would expect an 8 disk RAID 10 to get better
>> than 3x the single disk rate (assuming the data is randomly
>> distributed).
>
> Those drives are 7200 rpm drives, which would give you a maximum
> write rate of 120/sec at best with the cache disabled. I actually
> think your 70/sec is closer to reality and what you should anticipate
> in real use. I do not see how they could make 170/sec. Did they strap
> a jet engine to the drive? :)

Hmm, I stated the disk cache was disabled. I should have said the disk
write cache; it's possible the readahead cache is disabled as well
(not quite sure how to tell on the LSI cards).

Also, there isn't a lot of detail in what the database test mix is,
and I haven't tried researching the site to see if they spec the exact
test. If it included a lot of writes and they were being handled by a
cache, then that could explain it.

However, in my case I have an 8 disk RAID 10 with a read-only load (in
this testing configuration). Shouldn't I expect more iops than a
single disk can provide? Maybe pgiosim is hitting some other boundary
than just I/O?

Also, it turns out that pgiosim can only handle 64 files. I haven't
checked to see if this is a compile-time changeable item or not.

--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell) 603-643-9300 x 111
Re: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)
On Sat, May 14, 2011 at 5:10 AM, Stefan Keller sfkel...@gmail.com wrote:
> Hi,
>
> I am conducting a benchmark to compare a KVP table vs. hstore, and I
> got bad hstore performance results when the no. of records is greater
> than about 500'000.
>
> CREATE TABLE kvp ( id SERIAL PRIMARY KEY, key text NOT NULL, value text );
> -- with index on key
>
> CREATE TABLE myhstore ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
> -- with GiST index on obj
>
> Does anyone have experience with that?

hstore is not really designed for large-ish sets like that.

merlin
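For reference, the hstore setup being benchmarked might look like the following sketch (the index name and query shape are assumed from the description, not given in the thread):

```sql
-- hstore ships as an extension on 9.1+; on the 8.4/9.0 of this thread
-- it was installed from the contrib SQL scripts instead.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE myhstore (id SERIAL PRIMARY KEY, obj hstore NOT NULL);
CREATE INDEX myhstore_obj_idx ON myhstore USING gist (obj);

-- A containment lookup that can use the GiST index:
SELECT id FROM myhstore WHERE obj @> 'color=>red'::hstore;

-- The equivalent KVP-table lookup uses a plain btree on key:
SELECT id FROM kvp WHERE key = 'color' AND value = 'red';
```

The GiST index is lossy for hstore, so every index hit is rechecked against the heap row, which is one plausible reason the KVP table's plain btree pulls ahead as the row count grows.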
Re: [PERFORM] Why query takes soo much time
Denis de Bernardy ddeberna...@yahoo.com writes:
> An alternative plan could have been to hash join the tables together,
> to sort the result set, and to apply the limit/offset on the
> resulting set.

Indeed. I rather wonder why the planner didn't do that to start with.
This plan looks to me like it might be suffering from insufficient
work_mem to allow use of a hash join. Or possibly the OP changed some
of the cost_xxx or enable_xxx settings in a misguided attempt to force
it to use indexes instead. As a rule of thumb, whole-table joins
probably ought not be using nestloop plans, and that frequently means
that indexes are worthless for them.

But in any case, as Craig noted, the real elephant in the room is the
huge OFFSET value. It seems likely that this query is not standing
alone but is meant as one of a series that's supposed to provide
paginated output, and if so the total cost of the series is just going
to be impossible no matter what. The OP needs to think about using a
cursor or some such to avoid repeating most of the work each time.

regards, tom lane
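The cursor approach suggested above might be sketched like this (column and cursor names borrowed from the earlier query; the fetch size is illustrative):

```sql
BEGIN;

DECLARE svo_cur CURSOR FOR
  SELECT c.clause, s.subject, s.object, s.verb,
         s.subject_type, s.object_type, s.doc_id, s.svo_id
  FROM clause2 c
  JOIN svo2 s ON (c.clause_id = s.clause_id
              AND c.source_id = s.doc_id
              AND c.sentence_id = s.sentence_id)
  JOIN page_content p ON (s.doc_id = p.crawled_page_id)
  ORDER BY s.svo_id;

-- Each page of output is one FETCH; the query executes incrementally
-- instead of being restarted (and mostly discarded) for every page.
FETCH 1000 FROM svo_cur;
FETCH 1000 FROM svo_cur;
-- ... repeat FETCH for subsequent pages ...

COMMIT;
```

The trade-off is that a plain cursor lives inside a transaction, so this suits a batch job or a session that walks the whole result; web-style pagination across requests would need WITH HOLD cursors or a keyset condition instead.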
Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move
On Mon, May 16, 2011 at 4:19 AM, Robert Klemme shortcut...@googlemail.com wrote:
>> - If the planner chooses a Bitmap Index Scan, it effectively scans
>> the index to figure out which table blocks to read, and then reads
>> those table blocks in block number order, so that the I/O is
>> sequential, with skips.
>
> Are these two separate phases (i.e. first scan index completely, then
> access table)?

Yes.

> Separating indexes and tables might not be a good idea in general.
> Richard Foote has an excellent article about Oracle, but I assume at
> least a few things apply to PostgreSQL as well - it's at least worth
> checking PostgreSQL's access patterns against it:
>
> http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/
>
> I would probably rather try to separate data by the nature and
> frequency of accesses. One reasonable separation would be to leave
> all frequently accessed tables *and* their indexes on local RAID and
> move less frequently accessed data to the SAN. This separation is
> easy to identify if you have separate tables for current and historic
> data.

Yeah, I think the idea of putting tables and indexes in separate
tablespaces is mostly to bring more I/O bandwidth to bear on the same
data. But there are other reasonable things you might do also - e.g.
put the indexes on an SSD and the tables on a spinning disk, figuring
that the SSD is less reliable but you can always rebuild the index if
you need to...

Also, a lot of people have reported big speedups from putting pg_xlog
on a dedicated RAID 1 pair, or moving the PostgreSQL logs off the data
partition. So those sorts of divisions should be considered also.

Your idea of dividing things by access frequency is another good
thought.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
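The tablespace separation discussed in this thread could be sketched as follows (all paths, table, and index names here are hypothetical):

```sql
-- Create tablespaces on the different storage tiers.
CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';
CREATE TABLESPACE san_bulk LOCATION '/mnt/san/pgdata';

-- Put a hot index on the SSD; it can always be rebuilt with REINDEX
-- if the less reliable device fails.
ALTER INDEX orders_created_at_idx SET TABLESPACE fast_ssd;

-- Move cold, historic data off to the SAN.
ALTER TABLE orders_archive SET TABLESPACE san_bulk;

-- New objects can target a tablespace directly:
CREATE TABLE orders_2010 (LIKE orders_archive) TABLESPACE san_bulk;
```

Note that ALTER ... SET TABLESPACE physically copies the object and takes a heavy lock while doing so, so these moves are best scheduled in a maintenance window. (Relocating pg_xlog, by contrast, is done at the filesystem level with a symlink, not with SQL.)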
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
> To me it seems like a robust and fairly trivial way to get better
> numbers. The fear is that the OS cache is too much in flux to get any
> stable numbers out of it. Ok, it may not work as well with indexes,
> since having 1% in cache may very well mean that 90% of all requested
> blocks are there.. for tables it should be more trivial.

Tables can have hot spots, too. Consider a table that holds calendar
reservations. Reservations can be inserted, updated, deleted. But
typically, the most recent data will be what is most actively
modified, and the older data will be relatively more (though not
completely) static, and less frequently accessed. Such examples are
common in many real-world applications.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Robert Haas robertmh...@gmail.com writes:
> On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
>> Ok, it may not work as well with indexes, since having 1% in cache
>> may very well mean that 90% of all requested blocks are there.. for
>> tables it should be more trivial.
>
> Tables can have hot spots, too. Consider a table that holds calendar
> reservations. Reservations can be inserted, updated, deleted. But
> typically, the most recent data will be what is most actively
> modified, and the older data will be relatively more (though not
> completely) static, and less frequently accessed. Such examples are
> common in many real-world applications.

Yes. I'm not convinced that measuring the fraction of a table or index
that's in cache is really going to help us much. Historical cache hit
rates might be useful, but only to the extent that the incoming query
has a similar access pattern to those in the (recent?) past. It's not
an easy problem.

I almost wonder if we should not try to measure this at all, but
instead let the DBA set a per-table or per-index number to use,
analogous to the override we added recently for column n-distinct
statistics ...

regards, tom lane
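The existing per-column override Tom refers to (added in PostgreSQL 9.0) looks like this; table and column names here are hypothetical:

```sql
-- Tell the planner this column has a known number of distinct values,
-- overriding the estimate ANALYZE would otherwise compute:
ALTER TABLE reservations ALTER COLUMN room_id SET (n_distinct = 500);

-- A negative value means "a multiple of the row count": here, one
-- distinct value per 5 rows, which scales as the table grows.
ALTER TABLE reservations ALTER COLUMN guest_id SET (n_distinct = -0.2);

-- The override takes effect at the next ANALYZE.
ANALYZE reservations;
```

The hypothetical cache-fraction knob being discussed would presumably work the same way: a DBA-supplied per-relation number consulted by the cost model instead of a live measurement.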
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh jes...@krogh.cc wrote:
> Ok, it may not work as well with indexes, since having 1% in cache
> may very well mean that 90% of all requested blocks are there.. for
> tables it should be more trivial.

Why would the index have a meaningful hot spot unless the underlying
table had one as well? (Of course the root block will be a hot spot,
but certainly not 90% of all requests.)

Cheers,

Jeff
Re: [PERFORM] Using pgiosim realistically
On Mon, May 16, 2011 at 12:23:13PM -0400, Jeff wrote:
> On May 16, 2011, at 9:17 AM, John Rouillard wrote:
>> However, in my case I have an 8 disk RAID 10 with a read-only load
>> (in this testing configuration). Shouldn't I expect more iops than a
>> single disk can provide? Maybe pgiosim is hitting some other
>> boundary than just I/O?
>
> Given your command line, you are only running a single thread - use
> the -t argument to add more threads, and that'll increase
> concurrency. A single process can only process so much at once, and
> with multiple threads requesting different things the drive will
> actually be able to respond faster since it will have more work to
> do.
>
> I tend to test various levels - usually a single thread (-t 1, the
> default) to get a baseline, then -t (#drives / 2), -t (#drives), up
> to probably 4x #drives (you'll see iops level off).

Ok, cool. I'll try that.

>> Also, it turns out that pgiosim can only handle 64 files. I haven't
>> checked to see if this is a compile-time changeable item or not.
>
> That is a #define in pgiosim.c

So which is a better test: modifying the #define to allow specifying
200-300 1GB files, or using 64 files but increasing the size of my
files to 2-3GB, for a total file size of two or three times the memory
in my server (96GB)?

> Also, are you running the latest pgiosim from pgfoundry?

Yup, version 0.5 from the foundry.

> The -w param to pgiosim has it rewrite blocks out as it runs (it is a
> percentage).

Yup, I was running with that and getting low enough numbers that I
switched to pure read tests. It looks like I just need multiple
threads so I can have multiple reads/writes in flight at the same
time.

--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell) 603-643-9300 x 111
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Jeff Janes jeff.ja...@gmail.com writes:
> On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh jes...@krogh.cc wrote:
>> Ok, it may not work as well with indexes, since having 1% in cache
>> may very well mean that 90% of all requested blocks are there.. for
>> tables it should be more trivial.
>
> Why would the index have a meaningful hot spot unless the underlying
> table had one as well? (Of course the root block will be a hot spot,
> but certainly not 90% of all requests.)

The accesses to an index are far more likely to be clustered than the
accesses to the underlying table, because the index is organized in a
way that's application-meaningful and the table not so much.
Continuing the earlier example of a timestamp column, accesses might
preferentially hit near the right end of the index while the
underlying rows are all over the table.

IOW, hot spots measured at the row level and hot spots measured at the
page level could very easily be different between table and index.

regards, tom lane
Re: [PERFORM] Using pgiosim realistically
On May 16, 2011, at 1:06 PM, John Rouillard wrote:
>> That is a #define in pgiosim.c
>
> So which is a better test: modifying the #define to allow specifying
> 200-300 1GB files, or using 64 files but increasing the size of my
> files to 2-3GB, for a total file size of two or three times the
> memory in my server (96GB)?

I tend to make 10G chunks with dd and run pgiosim over that:

dd if=/dev/zero of=bigfile bs=1M count=10240

>> The -w param to pgiosim has it rewrite blocks out as it runs (it is
>> a percentage).
>
> Yup, I was running with that and getting low enough numbers that I
> switched to pure read tests. It looks like I just need multiple
> threads so I can have multiple reads/writes in flight at the same
> time.

Yep - you need multiple threads to get max throughput of your I/O.

--
Jeff Trout j...@jefftrout.com
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
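Putting the pieces of this thread together, a test run might look like the sketch below. The file sizes are shrunk drastically here so the sketch is cheap to execute; for a real test use bs=1M count=10240 (10G per file, as above) and enough files to exceed RAM, and the pgiosim invocations (flag shapes from the thread) are shown commented rather than run:

```shell
# Create several test files with dd (tiny count=2 for illustration only;
# use count=10240 and more files for a real cache-busting test).
for i in 1 2 3 4; do
    dd if=/dev/zero of=bigfile.$i bs=1M count=2 2>/dev/null
done
ls -l bigfile.*

# Then sweep thread counts to find where iops level off, e.g.:
#   pgiosim -t 1  bigfile.*    # baseline, single thread
#   pgiosim -t 4  bigfile.*    # drives / 2
#   pgiosim -t 8  bigfile.*    # one thread per spindle
#   pgiosim -t 32 bigfile.*    # 4x spindles
```

With only one thread, a RAID 10 array can never have more than one request in flight, so it degenerates to roughly single-disk iops; the sweep makes the scaling (and its ceiling) visible.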
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
> The accesses to an index are far more likely to be clustered than the
> accesses to the underlying table, because the index is organized in a
> way that's application-meaningful and the table not so much.

So, to clarify: are you saying that if queries were actually
requesting rows uniformly at random, then there would be no reason to
suspect that index accesses would have hot spots? It seems like the
index structure (i.e., the top node in b-trees) could also get in the
way.

Best,
Nathan
[PERFORM] hash semi join caused by IN (select ...)
Hi,

I have a quite complex, performance sensitive query in a system with a
few (7) joins:

select ... from t1 left join t2 ... WHERE id IN (select ...)

For this query the planner evaluates the IN with a hash semi join
last, and all the joining is done by hash joins for all rows contained
in t1.

However, when I specify the ids manually (IN (1, 2, 3, 4, 5)), the
planner first does an index lookup on the primary key column id, and
subsequently does nested loop joins using an index on t2 - which gives
way better results.

Is there any way to guide the planner to evaluate the IN condition
first, instead of last? Why is the planner behaving this way?
(postgresql 8.4.??)

Thank you in advance, Clemens

Query plan with IN (select):

Sort  (cost=165.77..165.77 rows=2 width=16974) (actual time=13.459..13.460 rows=2 loops=1)
  Sort Key: this_.id
  Sort Method: quicksort  Memory: 26kB
  ->  Hash Semi Join  (cost=123.09..165.76 rows=2 width=16974) (actual time=12.741..13.432 rows=2 loops=1)
        Hash Cond: (this_.id = kladdenent0_.id)
        ->  Hash Left Join  (cost=119.17..160.90 rows=348 width=16974) (actual time=8.765..13.104 rows=342 loops=1)
              Hash Cond: (flugzeug2_.flugzeugtyp_id = flugzeugty3_.id)
              ->  Hash Left Join  (cost=118.10..155.08 rows=348 width=16454) (actual time=8.724..12.412 rows=342 loops=1)
                    Hash Cond: (flugzeug2_.zaehlertyp_id = bmintype4_.id)
                    ->  Hash Left Join  (cost=117.06..152.71 rows=348 width=15934) (actual time=8.660..11.786 rows=342 loops=1)
                          Hash Cond: (this_.lehrerid = pilot5_.id)
                          ->  Hash Left Join  (cost=96.66..130.46 rows=348 width=8912) (actual time=6.395..8.899 rows=342 loops=1)
                                Hash Cond: (this_.nachid = flugplatz6_.id)
                                ->  Hash Left Join  (cost=93.89..122.90 rows=348 width=8370) (actual time=6.354..8.429 rows=342 loops=1)
                                      Hash Cond: (this_.flugzeugid = flugzeug2_.id)
                                      ->  Hash Left Join  (cost=23.17..47.04 rows=348 width=7681) (actual time=1.992..3.374 rows=342 loops=1)
                                            Hash Cond: (this_.pilotid = pilot7_.id)
                                            ->  Hash Left Join  (cost=2.78..22.04 rows=348 width=659) (actual time=0.044..0.548 rows=342 loops=1)
                                                  Hash Cond: (this_.vonid = flugplatz8_.id)
                                                  ->  Seq Scan on startkladde this_  (cost=0.00..14.48 rows=348 width=117) (actual time=0.004..0.074 rows=342 loops=1)
                                                  ->  Hash  (cost=1.79..1.79 rows=79 width=542) (actual time=0.032..0.032 rows=79 loops=1)
                                                        ->  Seq Scan on flugplatz flugplatz8_  (cost=0.00..1.79 rows=79 width=542) (actual time=0.003..0.010 rows=79 loops=1)
                                            ->  Hash  (cost=15.73..15.73 rows=373 width=7022) (actual time=1.938..1.938 rows=375 loops=1)
                                                  ->  Seq Scan on pilot pilot7_  (cost=0.00..15.73 rows=373 width=7022) (actual time=0.006..0.769 rows=375 loops=1)
                                      ->  Hash  (cost=51.43..51.43 rows=1543 width=689) (actual time=4.351..4.351 rows=1543 loops=1)
                                            ->  Seq Scan on flugzeug flugzeug2_  (cost=0.00..51.43 rows=1543 width=689) (actual time=0.006..1.615 rows=1543 loops=1)
                                ->  Hash  (cost=1.79..1.79 rows=79 width=542) (actual time=0.031..0.031 rows=79 loops=1)
                                      ->  Seq Scan on flugplatz flugplatz6_  (cost=0.00..1.79 rows=79 width=542) (actual time=0.003..0.011 rows=79 loops=1)
                          ->  Hash  (cost=15.73..15.73 rows=373 width=7022) (actual time=2.236..2.236 rows=375 loops=1)
                                ->  Seq Scan on pilot pilot5_  (cost=0.00..15.73 rows=373 width=7022) (actual time=0.005..0.781 rows=375 loops=1)
                    ->  Hash  (cost=1.02..1.02 rows=2 width=520) (actual time=0.005..0.005 rows=2 loops=1)
                          ->  Seq Scan on bmintype bmintype4_  (cost=0.00..1.02 rows=2 width=520) (actual time=0.003..0.004 rows=2 loops=1)
              ->  Hash  (cost=1.03..1.03 rows=3 width=520) (actual time=0.004..0.004 rows=3 loops=1)
                    ->  Seq Scan on flugzeugtype flugzeugty3_  (cost=0.00..1.03 rows=3 width=520) (actual time=0.002..0.002 rows=3 loops=1)
        ->  Hash  (cost=3.90..3.90 rows=2 width=4) (actual time=0.239..0.239 rows=2 loops=1)
              ->  Limit  (cost=0.00..3.88 rows=2 width=4) (actual time=0.202..0.236 rows=2 loops=1)
                    ->  Index Scan using startkladde_pkey on startkladde kladdenent0_  (cost=0.00..56.24 rows=29 width=4) (actual time=0.200..0.233 rows=2 loops=1)
Re: [PERFORM] hash semi join caused by IN (select ...)
Clemens Eisserer linuxhi...@gmail.com writes:
> I have a quite complex, performance sensitive query in a system with
> a few (7) joins:
>
> select ... from t1 left join t2 ... WHERE id IN (select ...)

Does it work as expected with one less join? If so, try increasing
join_collapse_limit ...

regards, tom lane
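Tom's suggestion can be tried per session; the default limit is 8, and a seven-join query plus the flattened IN subquery can exceed it, which stops the planner from considering all join orders (the chosen value below is illustrative):

```sql
-- Let the planner consider reordering all joins (and the flattened
-- IN subquery) together instead of planning them piecemeal.
SET join_collapse_limit = 12;
SET from_collapse_limit = 12;

-- Then re-run EXPLAIN ANALYZE on the original query to compare plans.
```

Raising these limits enlarges the planner's search space, so planning time goes up; for a performance-sensitive query that is usually a fine trade against a bad plan.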
Re: [PERFORM] hash semi join caused by IN (select ...)
On Mon, May 16, 2011 at 3:30 PM, Clemens Eisserer linuxhi...@gmail.com wrote:
> Hi, I have a quite complex, performance sensitive query in a system
> with a few (7) joins ...
> [rest of original message and query plan snipped]