Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move

2011-05-16 Thread Robert Klemme
On Fri, May 13, 2011 at 9:04 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 4, 2011 at 6:31 AM, Willy-Bas Loos willy...@gmail.com wrote:
 I'm asking them for (real) benchmarks, thanks for the advice. (fio is not
 available for us now to do it myself, grmbl)
 It just occurred to me that it is not necessarily the case that reading the
 indexes causes a lot of random I/O (on the index itself).
 I mean, maybe the index is generally read sequentially and then, when
 retrieving the data, there is a lot of random I/O.
 If it's a long story, any tips for info about this (book or website)?

 If you don't do anything special, and if the query plan says Index
 Scan rather than Bitmap Index Scan, then both the index I/O and the
 table I/O are likely to be fairly random.  However, there are a number
 of cases in which you can expect the table I/O to be sequential:

 - In some cases, you may happen to insert rows with an ordering that
 matches the index.  For example, if you have a table with not too many
 updates and deletes, and an index on a serial column, then new rows
 will have a higher value in that column than old rows, and will also
 typically be physically after older rows in the file.  Or you might be
 inserting timestamped data from oldest to newest.
 - If the planner chooses a Bitmap Index Scan, it effectively scans the
 index to figure out which table blocks to read, and then reads those
 table blocks in block number order, so that the I/O is sequential,
 with skips.

Are these two separate phases (i.e., first scan the index completely,
then access the table)?

 - If you CLUSTER the table on a particular index, it will be
 physically ordered to match the index's key ordering.  As the table is
 further modified the degree of clustering will gradually decline;
 eventually you may wish to re-CLUSTER.
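A minimal sketch of that, for a hypothetical orders table clustered on a
timestamp index (note that CLUSTER takes an exclusive lock and rewrites
the table):

    CLUSTER orders USING orders_created_at_idx;

    -- check how well the physical row order still matches the index key;
    -- correlation near 1.0 means well clustered, near 0 means scattered:
    SELECT attname, correlation
    FROM pg_stats
    WHERE tablename = 'orders' AND attname = 'created_at';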

 It's also worth keeping in mind that the index itself won't
 necessarily be accessed in physically sequential order.  The point of
 the index is to emit the rows in key order, but if the table is
 heavily updated, it won't necessarily be the case that a page
 containing lower-valued keys physically precedes a page containing
 higher-valued keys.  I'm actually somewhat fuzzy on how this works,
 and to what extent it's a problem in practice, but I am fairly sure it
 can happen.

Separating indexes from tables might not be a good idea in general.
Richard Foote has an excellent article about Oracle, but I assume at
least a few things apply to PostgreSQL as well - it's at least worth
checking PostgreSQL's access patterns against:

http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/

I would probably rather try to separate data by the nature and
frequency of accesses.  One reasonable separation would be to leave
all frequently accessed tables *and* their indexes on local RAID and
move less frequently accessed data to the SAN.  This separation is
easy to identify if you have separate tables for current and
historic data.
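A sketch of that kind of split, assuming separate current/historic
tables (tablespace names and paths are made up for illustration):

    CREATE TABLESPACE local_raid LOCATION '/data/local_raid';
    CREATE TABLESPACE san_space  LOCATION '/san/pgdata';

    -- hot data (and its indexes) stays on local storage:
    ALTER TABLE orders_current SET TABLESPACE local_raid;
    ALTER INDEX orders_current_pkey SET TABLESPACE local_raid;

    -- cold data moves to the SAN:
    ALTER TABLE orders_2009 SET TABLESPACE san_space;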

Kind regards

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/



Re: [PERFORM] Why query takes soo much time

2011-05-16 Thread Craig Ringer

On 05/16/2011 01:39 PM, Adarsh Sharma wrote:

Dear all,
I have a query on 3 tables in a database:

Explain Analyze output:

explain analyze select c.clause, s.subject, s.object, s.verb,
s.subject_type, s.object_type, s.doc_id, s.svo_id from clause2 c,
svo2 s, page_content p where c.clause_id=s.clause_id and
s.doc_id=c.source_id and c.sentence_id=s.sentence_id and
s.doc_id=p.crawled_page_id order by s.svo_id limit 1000 offset 17929000




Using limit and offset can be horrifyingly slow for non-trivial queries. 
Are you trying to paginate results? If not, what are you trying to achieve?


In most (all?) cases, Pg will have to execute the query up to the point 
where it's found limit+offset rows, producing and discarding offset rows 
as it goes. Needless to say, that's horrifyingly inefficient.


Reformatting your query for readability (to me) as:

EXPLAIN ANALYZE
SELECT c.clause, s.subject, s.object, s.verb, s.subject_type,
       s.object_type, s.doc_id, s.svo_id
FROM clause2 c
     INNER JOIN svo2 s ON (c.clause_id=s.clause_id AND
                           c.source_id=s.doc_id AND
                           c.sentence_id=s.sentence_id)
     INNER JOIN page_content p ON (s.doc_id=p.crawled_page_id)
ORDER BY s.svo_id LIMIT 1000 OFFSET 17929000

... I can see that you're joining on 
(c.clause_id,c.source_id,c.sentence_id)=(s.clause_id,s.doc_id,s.sentence_id). 
You have matching indexes idx_clause2_id and idx_svo2_id_dummy with 
matching column ordering. Pg is using idx_clause2_id in the join of svo2 
and clause2, but instead of doing a bitmap index scan using it and 
idx_svo2_id_dummy it's doing a nested loop using idx_clause2_id and 
pk_svo_id.


First: make sure your stats are up to date by ANALYZE-ing your tables,
and probably increase the stats collected on the join columns and/or
increase default_statistics_target. If that doesn't help, I'd personally
play with random_page_cost and seq_page_cost to see if they reflect
your machine's actual performance, and to see if you get a more
favourable plan. If I were experimenting with this I'd also see if
giving the query lots of work_mem allowed it to try a different approach
to the join.
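A sketch of those experiments, with table names from this thread (the
specific values are illustrative starting points, not recommendations):

    ANALYZE clause2; ANALYZE svo2; ANALYZE page_content;

    -- gather more detailed stats on a join column, then re-ANALYZE:
    ALTER TABLE svo2 ALTER COLUMN doc_id SET STATISTICS 500;
    ANALYZE svo2;

    -- session-local experiments:
    SET random_page_cost = 2.0;   -- try values that match the hardware
    SET work_mem = '256MB';       -- may let the planner pick a hash join
    -- then re-run the EXPLAIN ANALYZE above and compare plans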



Limit  (cost=21685592.91..21686802.44 rows=1000 width=2624) (actual time=414601.802..414622.920 rows=1000 loops=1)
  ->  Nested Loop  (cost=59.77..320659013645.28 rows=265112018116 width=2624) (actual time=0.422..404902.314 rows=1793 loops=1)
        ->  Nested Loop  (cost=0.00..313889654.42 rows=109882338 width=2628) (actual time=0.242..174223.789 rows=17736897 loops=1)
              ->  Index Scan using pk_svo_id on svo2 s  (cost=0.00..33914955.13 rows=26840752 width=2600) (actual time=0.157..14691.039 rows=14238271 loops=1)
              ->  Index Scan using idx_clause2_id on clause2 c  (cost=0.00..10.36 rows=4 width=44) (actual time=0.007..0.008 rows=1 loops=14238271)
                    Index Cond: ((c.source_id = s.doc_id) AND (c.clause_id = s.clause_id) AND (c.sentence_id = s.sentence_id))
        ->  Bitmap Heap Scan on page_content p  (cost=59.77..2885.18 rows=2413 width=8) (actual time=0.007..0.008 rows=1 loops=17736897)
              Recheck Cond: (p.crawled_page_id = s.doc_id)
              ->  Bitmap Index Scan on idx_crawled_id  (cost=0.00..59.17 rows=2413 width=0) (actual time=0.005..0.005 rows=1 loops=17736897)
                    Index Cond: (p.crawled_page_id = s.doc_id)
Total runtime: 414623.634 ms

My table & index definitions are as under:

Estimated rows in the 3 tables are:

clause2 10341700
svo2 26008000
page_content 479785

CREATE TABLE clause2
(
  id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
  source_id integer,
  sentence_id integer,
  clause_id integer,
  tense character varying(30),
  clause text,
  CONSTRAINT pk_clause_id PRIMARY KEY (id)
) WITH (OIDS=FALSE);
CREATE INDEX idx_clause2_id ON clause2 USING btree (source_id, clause_id, sentence_id);


CREATE TABLE svo2
(
  svo_id bigint NOT NULL DEFAULT nextval('svo_svo_id_seq'::regclass),
  doc_id integer,
  sentence_id integer,
  clause_id integer,
  negation integer,
  subject character varying(3000),
  verb character varying(3000),
  object character varying(3000),
  preposition character varying(3000),
  subject_type character varying(3000),
  object_type character varying(3000),
  subject_attribute character varying(3000),
  object_attribute character varying(3000),
  verb_attribute character varying(3000),
  subject_concept character varying(100),
  object_concept character varying(100),
  subject_sense character varying(100),
  object_sense character varying(100),
  subject_chain character varying(5000),
  object_chain character varying(5000),
  sub_type_id integer,
  obj_type_id integer,
  CONSTRAINT pk_svo_id PRIMARY KEY (svo_id)
) WITH (OIDS=FALSE);
CREATE INDEX idx_svo2_id_dummy ON svo2 USING btree (doc_id, clause_id, sentence_id);


CREATE TABLE page_content
(
  content_id integer NOT NULL DEFAULT nextval('page_content_ogc_fid_seq'::regclass),
  wkb_geometry 

Re: [PERFORM] Why query takes soo much time

2011-05-16 Thread Denis de Bernardy
[big nestloop with a huge number of rows]

You're in an edge case, and I doubt you'll get things to run much faster: you 
want the last 1k rows out of an 18M row result set... It will be slow no matter 
what you do.

What the plan is currently doing is walking through these 18M rows one by 
one until it has returned the 1k requested rows. Without the offset, the 
plan is absolutely correct (and quite fast, I take it). With the enormous 
offset, it's a different story, as you've noted.

An alternative plan could have been to hash join the tables together, to sort 
the result set, and to apply the limit/offset to the resulting set. You can 
probably force the planner to do so by rewriting your statement using a WITH 
clause, too:

EXPLAIN ANALYZE
WITH rows AS (
    SELECT c.clause, s.subject, s.object, s.verb, s.subject_type,
           s.object_type, s.doc_id, s.svo_id
    FROM clause2 c
         INNER JOIN svo2 s ON (c.clause_id=s.clause_id AND
                               c.source_id=s.doc_id AND
                               c.sentence_id=s.sentence_id)
         INNER JOIN page_content p ON (s.doc_id=p.crawled_page_id)
)
SELECT *
FROM rows
ORDER BY svo_id LIMIT 1000 OFFSET 17929000


I have my doubts that it'll make much of a difference, though: you'll still be 
extracting the last 1k rows out of 18M.

D


Re: [PERFORM] Using pgiosim realistically

2011-05-16 Thread John Rouillard
On Sat, May 14, 2011 at 12:07:02PM -0500, k...@rice.edu wrote:
 On Fri, May 13, 2011 at 09:09:41PM +, John Rouillard wrote:
  I am adding pgiosim to our testing for new database hardware and I am
  seeing something I don't quite get and I think it's because I am using
  pgiosim incorrectly.
  
  Specs:
  
OS: centos 5.5 kernel: 2.6.18-194.32.1.el5
memory: 96GB
cpu: 2x Intel(R) Xeon(R) X5690  @ 3.47GHz (6 core, ht enabled)
disks: WD2003FYYS RE4
raid: lsi - 9260-4i with 8 disks in raid 10 configuration
1MB stripe size
raid cache enabled w/ bbu
disk caches disabled
filesystem: ext3 created with -E stride=256
  
  I am seeing really poor (~70) iops with pgiosim.  According to:
  http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html in the
  database benchmark they are seeing ~170 iops on a single disk for
  these drives. I would expect an 8 disk raid 10 to get better than
  3x the single disk rate (assuming the data is randomly distributed).
 Those drives are 7200 rpm drives, which would give you a maximum write
 rate of 120/sec at best with the cache disabled. I actually think your
 70/sec is closer to reality and what you should anticipate in real use.
 I do not see how they could make 170/sec. Did they strap a jet engine to
 the drive? :)

Hmm, I stated the disk cache was disabled. I should have said the disk
write cache; it's possible the readahead cache is disabled as well
(not quite sure how to tell on the lsi cards). Also, there isn't a lot
of detail on what the database test mix is, and I haven't tried
researching the site to see if they specify the exact test. If it included
a lot of writes that were being handled by a cache, then that could
explain it.

However, in my case I have an 8 disk raid 10 with a read only load (in
this testing configuration). Shouldn't I expect more iops than a
single disk can provide? Maybe pgiosim is hitting some other boundary
than just i/o?

Also, it turns out that pgiosim can only handle 64 files. I haven't
checked to see if this is a compile-time changeable item or not.

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)

2011-05-16 Thread Merlin Moncure
On Sat, May 14, 2011 at 5:10 AM, Stefan Keller sfkel...@gmail.com wrote:
 Hi,

 I am conducting a benchmark to compare a KVP table vs. hstore, and got
 bad hstore performance results when the number of records is greater than
 about 500,000.

 CREATE TABLE kvp ( id SERIAL PRIMARY KEY, key text NOT NULL, value text );
 -- with index on key
 CREATE TABLE myhstore ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
 -- with GIST index on obj
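For reference, the lookups being compared would presumably look like the
sketch below (the @> containment operator is what a GiST index on obj
accelerates; names follow the schema above):

    -- hstore side: rows whose obj contains the pair
    SELECT id FROM myhstore WHERE obj @> 'key=>value'::hstore;

    -- KVP side: the equivalent lookup against the indexed key column
    SELECT id FROM kvp WHERE key = 'key' AND value = 'value';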

 Does anyone have experience with that?

hstore is not really designed for large-ish sets like that.

merlin



Re: [PERFORM] Why query takes soo much time

2011-05-16 Thread Tom Lane
Denis de Bernardy ddeberna...@yahoo.com writes:
 An alternative plan could have been to hash join the tables together,
 to sort the result set, and to apply the limit/offset on the resulting
 set.

Indeed.  I rather wonder why the planner didn't do that to start with.
This plan looks to me like it might be suffering from insufficient
work_mem to allow use of a hash join.  Or possibly the OP changed some
of the cost_xxx or enable_xxx settings in a misguided attempt to force
it to use indexes instead.  As a rule of thumb, whole-table joins
probably ought not be using nestloop plans, and that frequently means
that indexes are worthless for them.

But in any case, as Craig noted, the real elephant in the room is the
huge OFFSET value.  It seems likely that this query is not standing
alone but is meant as one of a series that's supposed to provide
paginated output, and if so the total cost of the series is just going
to be impossible no matter what.  The OP needs to think about using a
cursor or some such to avoid repeating most of the work each time.
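A minimal sketch of the cursor approach, reusing the query from upthread;
each FETCH resumes where the previous one stopped, so the join work is
done once rather than once per page:

    BEGIN;
    DECLARE svo_cur CURSOR FOR
        SELECT c.clause, s.subject, s.object, s.verb, s.subject_type,
               s.object_type, s.doc_id, s.svo_id
        FROM clause2 c
             INNER JOIN svo2 s ON (c.clause_id = s.clause_id AND
                                   c.source_id = s.doc_id AND
                                   c.sentence_id = s.sentence_id)
             INNER JOIN page_content p ON (s.doc_id = p.crawled_page_id)
        ORDER BY s.svo_id;
    FETCH 1000 FROM svo_cur;   -- first page
    FETCH 1000 FROM svo_cur;   -- next page, continuing from row 1001
    CLOSE svo_cur;
    COMMIT;

Alternatively, remembering the last svo_id already delivered and filtering
with WHERE s.svo_id > ... (keyset pagination) avoids redoing the skipped
work without holding a transaction open.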

regards, tom lane



Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move

2011-05-16 Thread Robert Haas
On Mon, May 16, 2011 at 4:19 AM, Robert Klemme
shortcut...@googlemail.com wrote:
 - If the planner chooses a Bitmap Index Scan, it effectively scans the
 index to figure out which table blocks to read, and then reads those
 table blocks in block number order, so that the I/O is sequential,
 with skips.

 Are these two separate phases (i.e., first scan the index completely,
 then access the table)?

Yes.
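In an EXPLAIN tree the two phases show up as nested nodes, schematically
(a made-up single-table example, costs omitted):

    Bitmap Heap Scan on t                    -- phase 2: heap pages, in block order
      Recheck Cond: (col = 42)
      ->  Bitmap Index Scan on t_col_idx     -- phase 1: scan index, build bitmap
            Index Cond: (col = 42)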

 Separating indexes from tables might not be a good idea in general.
 Richard Foote has an excellent article about Oracle, but I assume at
 least a few things apply to PostgreSQL as well - it's at least worth
 checking PostgreSQL's access patterns against:

 http://richardfoote.wordpress.com/2008/04/16/separate-indexes-from-tables-some-thoughts-part-i-everything-in-its-right-place/

 I would probably rather try to separate data by the nature and
 frequency of accesses.  One reasonable separation would be to leave
 all frequently accessed tables *and* their indexes on local RAID and
 move less frequently accessed data to the SAN.  This separation is
 easy to identify if you have separate tables for current and
 historic data.

Yeah, I think the idea of putting tables and indexes in separate
tablespaces is mostly to bring more I/O bandwidth to bear on the same
data.  But there are other reasonable things you might do also - e.g.
put the indexes on an SSD, and the tables on a spinning disk, figuring
that the SSD is less reliable but you can always rebuild the index if
you need to...

Also, a lot of people have reported big speedups from putting pg_xlog
on a dedicated RAID 1 pair, or moving the PostgreSQL logs off the data
partition.  So those sorts of divisions should be considered also.
Your idea of dividing things by access frequency is another good
thought.
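A sketch of the SSD variant mentioned above, with made-up paths and
names; the point is that losing the index tablespace is recoverable:

    CREATE TABLESPACE ssd_idx LOCATION '/mnt/ssd/pg_idx';

    -- the table stays on the default (spinning-disk) tablespace:
    CREATE INDEX orders_id_idx ON orders (id) TABLESPACE ssd_idx;

    -- after replacing a failed SSD, rebuild from the table:
    REINDEX INDEX orders_id_idx;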

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Robert Haas
On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
 To me it seems like a robust and fairly trivial way to get better
 numbers.  The fear is that the OS cache is too much in flux to get any
 stable numbers out of it.

 Ok, it may not work as well with indexes, since having 1% in cache may very
 well mean that 90% of all requested blocks are there... for tables it should
 be more trivial.

Tables can have hot spots, too.  Consider a table that holds calendar
reservations.  Reservations can be inserted, updated, deleted.  But
typically, the most recent data will be what is most actively
modified, and the older data will be relatively more (though not
completely) static, and less frequently accessed.  Such examples are
common in many real-world applications.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
 Ok, it may not work as well with indexes, since having 1% in cache may very
 well mean that 90% of all requested blocks are there... for tables it should
 be more trivial.

 Tables can have hot spots, too.  Consider a table that holds calendar
 reservations.  Reservations can be inserted, updated, deleted.  But
 typically, the most recent data will be what is most actively
 modified, and the older data will be relatively more (though not
 completely) static, and less frequently accessed.  Such examples are
 common in many real-world applications.

Yes.  I'm not convinced that measuring the fraction of a table or index
that's in cache is really going to help us much.  Historical cache hit
rates might be useful, but only to the extent that the incoming query
has a similar access pattern to those in the (recent?) past.  It's not
an easy problem.

I almost wonder if we should not try to measure this at all, but instead
let the DBA set a per-table or per-index number to use, analogous to the
override we added recently for column n-distinct statistics ...
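For reference, the existing per-column n-distinct override looks like the
sketch below (PostgreSQL 9.0 syntax; the table and column names are
hypothetical), so a per-table cache knob could plausibly take a similar form:

    ALTER TABLE reservations ALTER COLUMN room_id SET (n_distinct = 500);
    -- a negative value is interpreted as a fraction of the row count:
    ALTER TABLE reservations ALTER COLUMN guest_id SET (n_distinct = -0.05);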

regards, tom lane



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Jeff Janes
On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh jes...@krogh.cc wrote:

 Ok, it may not work as well with indexes, since having 1% in cache may very
 well mean that 90% of all requested blocks are there... for tables it should
 be more trivial.

Why would the index have a meaningful hot-spot unless the underlying
table had one as well?  (Of course the root block will be a hot-spot,
but certainly not 90% of all requests)

Cheers,

Jeff



Re: [PERFORM] Using pgiosim realistically

2011-05-16 Thread John Rouillard
On Mon, May 16, 2011 at 12:23:13PM -0400, Jeff wrote:
 On May 16, 2011, at 9:17 AM, John Rouillard wrote:
 However, in my case I have an 8 disk raid 10 with a read only load (in
 this testing configuration). Shouldn't I expect more iops than a
 single disk can provide? Maybe pgiosim is hitting some other boundary
 than just i/o?
 
 
 given your command line you are only running a single thread - use
 the -t argument to add more threads and that'll increase
 concurrency.  A single process can only process so much at once, and
 with multiple threads requesting different things the drive will
 actually be able to respond faster since it will have more work to
 do.
 I tend to test various levels - usually a single thread (-t 1, the
 default) to get a baseline, then -t (#drives / 2), -t (#drives), up
 to probably 4x #drives (you'll see iops level off).

Ok cool. I'll try that.
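Something like this shell sketch, using only the options discussed in
this thread (-t for threads, -w for the rewrite percentage; flags may
differ between pgiosim versions):

    # baseline, then half the spindles, all spindles, then oversubscribed
    for t in 1 4 8 16 32; do
        ./pgiosim -t $t file*
    done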
 
 Also it turns out that pgiosim can only handle 64 files. I haven't
 checked to see if this is a compile-time changeable item or not.
 
 That is a #define in pgiosim.c

So which is a better test: modifying the #define to allow specifying
200-300 1GB files, or using 64 files but increasing the size of my
files to 2-3GB so that the total bytes in the files is two or three
times the memory in my server (96GB)?

 also, are you running the latest pgiosim from pgfoundry?

Yup, version 0.5 from the foundry.

 the -w param to pgiosim has it rewrite blocks out as it runs. (it is
 a percentage).

Yup, I was running with that and getting low enough numbers that I
switched to pure read tests. It looks like I just need multiple
threads so I can have multiple reads/writes in flight at the same
time.

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh jes...@krogh.cc wrote:
 Ok, it may not work as well with indexes, since having 1% in cache may very
 well mean that 90% of all requested blocks are there... for tables it should
 be more trivial.

 Why would the index have a meaningful hot-spot unless the underlying
 table had one as well?  (Of course the root block will be a hot-spot,
 but certainly not 90% of all requests)

The accesses to an index are far more likely to be clustered than the
accesses to the underlying table, because the index is organized in a
way that's application-meaningful and the table not so much.  Continuing
the earlier example of a timestamp column, accesses might preferentially
hit near the right end of the index while the underlying rows are all
over the table.

IOW, hot spots measured at the row level and hot spots measured at the
page level could very easily be different between table and index.

regards, tom lane



Re: [PERFORM] Using pgiosim realistically

2011-05-16 Thread Jeff


On May 16, 2011, at 1:06 PM, John Rouillard wrote:


that is a #define in pgiosim.c


So which is a better test: modifying the #define to allow specifying
200-300 1GB files, or using 64 files but increasing the size of my
files to 2-3GB so that the total bytes in the files is two or three
times the memory in my server (96GB)?



I tend to make 10G chunks with dd and run pgiosim over that.
dd if=/dev/zero of=bigfile bs=1M count=10240


the -w param to pgiosim has it rewrite blocks out as it runs. (it is
a percentage).


Yup, I was running with that and getting low enough numbers that I
switched to pure read tests. It looks like I just need multiple
threads so I can have multiple reads/writes in flight at the same
time.



Yep - you need multiple threads to get max throughput of your io.

--
Jeff Trout j...@jefftrout.com
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/






Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Nathan Boley
 The accesses to an index are far more likely to be clustered than the
 accesses to the underlying table, because the index is organized in a
 way that's application-meaningful and the table not so much.

So, to clarify, are you saying that if a query were actually requesting
rows uniformly at random, then there would be no reason to suspect that
index accesses would have hotspots? It seems like the index structure
(i.e., the top node in b-trees) could also get in the way.

Best,
Nathan



[PERFORM] hash semi join caused by IN (select ...)

2011-05-16 Thread Clemens Eisserer
Hi,

I have a quite complex, performance-sensitive query in a system with a
few (7) joins:
select  from t1 left join t2  WHERE id IN (select )

For this query the planner evaluates the IN with a hash semi join last,
and all the joining is done by hash joins for all rows contained in t1.

However, when I specify the ids manually (IN (1, 2, 3, 4, 5)) the
planner first does an index lookup on the primary key column id,
and subsequently does nested loop joins using an index on t2 - which
gives way better results.

Is there any way to guide the planner to evaluate the IN condition
first, instead of last?
Why is the planner behaving this way? (postgresql 8.4.??)

Thank you in advance, Clemens


Query plan with IN(select):

Sort  (cost=165.77..165.77 rows=2 width=16974) (actual time=13.459..13.460 rows=2 loops=1)
  Sort Key: this_.id
  Sort Method:  quicksort  Memory: 26kB
  ->  Hash Semi Join  (cost=123.09..165.76 rows=2 width=16974) (actual time=12.741..13.432 rows=2 loops=1)
        Hash Cond: (this_.id = kladdenent0_.id)
        ->  Hash Left Join  (cost=119.17..160.90 rows=348 width=16974) (actual time=8.765..13.104 rows=342 loops=1)
              Hash Cond: (flugzeug2_.flugzeugtyp_id = flugzeugty3_.id)
              ->  Hash Left Join  (cost=118.10..155.08 rows=348 width=16454) (actual time=8.724..12.412 rows=342 loops=1)
                    Hash Cond: (flugzeug2_.zaehlertyp_id = bmintype4_.id)
                    ->  Hash Left Join  (cost=117.06..152.71 rows=348 width=15934) (actual time=8.660..11.786 rows=342 loops=1)
                          Hash Cond: (this_.lehrerid = pilot5_.id)
                          ->  Hash Left Join  (cost=96.66..130.46 rows=348 width=8912) (actual time=6.395..8.899 rows=342 loops=1)
                                Hash Cond: (this_.nachid = flugplatz6_.id)
                                ->  Hash Left Join  (cost=93.89..122.90 rows=348 width=8370) (actual time=6.354..8.429 rows=342 loops=1)
                                      Hash Cond: (this_.flugzeugid = flugzeug2_.id)
                                      ->  Hash Left Join  (cost=23.17..47.04 rows=348 width=7681) (actual time=1.992..3.374 rows=342 loops=1)
                                            Hash Cond: (this_.pilotid = pilot7_.id)
                                            ->  Hash Left Join  (cost=2.78..22.04 rows=348 width=659) (actual time=0.044..0.548 rows=342 loops=1)
                                                  Hash Cond: (this_.vonid = flugplatz8_.id)
                                                  ->  Seq Scan on startkladde this_  (cost=0.00..14.48 rows=348 width=117) (actual time=0.004..0.074 rows=342 loops=1)
                                                  ->  Hash  (cost=1.79..1.79 rows=79 width=542) (actual time=0.032..0.032 rows=79 loops=1)
                                                        ->  Seq Scan on flugplatz flugplatz8_  (cost=0.00..1.79 rows=79 width=542) (actual time=0.003..0.010 rows=79 loops=1)
                                            ->  Hash  (cost=15.73..15.73 rows=373 width=7022) (actual time=1.938..1.938 rows=375 loops=1)
                                                  ->  Seq Scan on pilot pilot7_  (cost=0.00..15.73 rows=373 width=7022) (actual time=0.006..0.769 rows=375 loops=1)
                                      ->  Hash  (cost=51.43..51.43 rows=1543 width=689) (actual time=4.351..4.351 rows=1543 loops=1)
                                            ->  Seq Scan on flugzeug flugzeug2_  (cost=0.00..51.43 rows=1543 width=689) (actual time=0.006..1.615 rows=1543 loops=1)
                                ->  Hash  (cost=1.79..1.79 rows=79 width=542) (actual time=0.031..0.031 rows=79 loops=1)
                                      ->  Seq Scan on flugplatz flugplatz6_  (cost=0.00..1.79 rows=79 width=542) (actual time=0.003..0.011 rows=79 loops=1)
                          ->  Hash  (cost=15.73..15.73 rows=373 width=7022) (actual time=2.236..2.236 rows=375 loops=1)
                                ->  Seq Scan on pilot pilot5_  (cost=0.00..15.73 rows=373 width=7022) (actual time=0.005..0.781 rows=375 loops=1)
                    ->  Hash  (cost=1.02..1.02 rows=2 width=520) (actual time=0.005..0.005 rows=2 loops=1)
                          ->  Seq Scan on bmintype bmintype4_  (cost=0.00..1.02 rows=2 width=520) (actual time=0.003..0.004 rows=2 loops=1)
              ->  Hash  (cost=1.03..1.03 rows=3 width=520) (actual time=0.004..0.004 rows=3 loops=1)
                    ->  Seq Scan on flugzeugtype flugzeugty3_  (cost=0.00..1.03 rows=3 width=520) (actual time=0.002..0.002 rows=3 loops=1)
        ->  Hash  (cost=3.90..3.90 rows=2 width=4) (actual time=0.239..0.239 rows=2 loops=1)
              ->  Limit  (cost=0.00..3.88 rows=2 width=4) (actual time=0.202..0.236 rows=2 loops=1)
                    ->  Index Scan using startkladde_pkey on startkladde kladdenent0_  (cost=0.00..56.24 rows=29 width=4) (actual time=0.200..0.233 rows=2 loops=1)

Re: [PERFORM] hash semi join caused by IN (select ...)

2011-05-16 Thread Tom Lane
Clemens Eisserer linuxhi...@gmail.com writes:
 I have a quite complex, performance-sensitive query in a system with a
 few (7) joins:
 select  from t1 left join t2  WHERE id IN (select )

Does it work as expected with one less join?  If so, try increasing
join_collapse_limit ...
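A sketch of that experiment (the values are illustrative; the default
for both settings in 8.4 is 8):

    SET join_collapse_limit = 12;  -- let the planner reorder across more joins
    SET from_collapse_limit = 12;  -- the related knob for subqueries in FROM
    EXPLAIN ANALYZE <the problem query>;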

regards, tom lane



Re: [PERFORM] hash semi join caused by IN (select ...)

2011-05-16 Thread Dave Johansen
On Mon, May 16, 2011 at 3:30 PM, Clemens Eisserer linuxhi...@gmail.com wrote:


 [quoted message and EXPLAIN ANALYZE plan snipped - identical to Clemens' post above]