how about select count(distinct s.specimen_id) from specimens sp INNER JOIN sequences s on s.specimen_id = sp.id;
>>> Julien <[EMAIL PROTECTED]> 2008-03-13 17:27 >>> mmh no because it's a one to many relation (a specimen can have more than one sequence) : muridae=> select count(sp.id) from specimens sp INNER JOIN sequences s on s.specimen_id = sp.id; count ------- 1536 (1 row) Time: 81.242 ms muridae=> select count(sp.id) from specimens sp where sp.id in (select specimen_id from sequences group by specimen_id); count ------- 1431 (1 row) Time: 81.736 ms muridae=> (of course this is a bad example, because I could just do: select count(specimen_id) from sequences group by specimen_id;, but in my application I have more fields coming from specimens of course) Julien On Thu, 2008-03-13 at 15:12 +0100, Bart Degryse wrote: > I think that just > select count(sp.id) from specimens sp INNER JOIN sequences s on > s.specimen_id = sp.id; > should be enough > > >>> Julien <[EMAIL PROTECTED]> 2008-03-13 17:10 >>> > If I understood well the query plan, the planner optimize the > IN(SELECT ...) version with a JOIN (line 19-20 of the first paste) : > > -> Hash IN Join (cost=240.95..4011.20 rows=1436 width=4) (actual > time=93.971..201.908 rows=1431 loops=1) > Hash Cond: ("outer".id = "inner".specimen_id) > > so I guess that : > > select count(sp.id) from specimens sp where sp.id in (select > specimen_id > from sequences); > > is almost the same as : > > select count(sp.id) from specimens sp INNER JOIN (select specimen_id > from sequences GROUP BY specimen_id) as foo on foo.specimen_id = > sp.id; > > ? > > Thanks, > Julien > > On Thu, 2008-03-13 at 14:46 +0100, Bart Degryse wrote: > > The chapter on indexes in the manual should give you a pretty good > > idea on the why. > > IN and EXISTS are not the only possibilities, you can also use inner > > or outer joins. > > Which solution performs best depends on the data, the database > > version, the available indexes, ... > > > > >>> Julien <[EMAIL PROTECTED]> 2008-03-13 15:47 >>> > > Hello, > > > > Does anyone has an idea why sometimes: > > - select ... where ... in (select ...) > > is faster than : > > - select ... where ... exists(select ...) > > and sometimes it's the opposite ? > > > > I had such a situation, I've pasted the queries on: > > http://rafb.net/p/KXNZ6892.html and http://rafb.net/p/jvo5DO38.html > > > > It's running PostgreSQL 8.1 with an effective_cache_size of 30000. > > > > specimens.id is the primary key and there are indexes on > > sequences(specimen_id) and specimen_measurements(specimen_id) > > > > Is there a general "rule" to know when to use the in() version and > > when > > to use the exists() version ? Is it true to say that the exists() > > version is more scalable (with many rows) than the in() version > (from > > the little tests I made it seems the case) ? > > > > Thanks, > > Julien > > > > -- > > Julien Cigar > > Belgian Biodiversity Platform > > http://www.biodiversity.be ( http://www.biodiversity.be/ ) > > Université Libre de Bruxelles (ULB) > > Campus de la Plaine CP 257 > > Bâtiment NO, Bureau 4 N4 115C (Niveau 4) > > Boulevard du Triomphe, entrée ULB 2 > > B-1050 Bruxelles > > Mail: [EMAIL PROTECTED] > > @biobel: http://biobel.biodiversity.be/person/show/471 > > Tel : 02 650 57 52 > > > > > > -- > > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > > To make changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-sql > > > -- > Julien Cigar > Belgian Biodiversity Platform > http://www.biodiversity.be ( http://www.biodiversity.be/ ) > Université Libre de Bruxelles (ULB) > Campus de la Plaine CP 257 > Bâtiment NO, Bureau 4 N4 115C (Niveau 4) > Boulevard du Triomphe, entrée ULB 2 > B-1050 Bruxelles > Mail: [EMAIL PROTECTED] > @biobel: http://biobel.biodiversity.be/person/show/471 > Tel : 02 650 57 52 > > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql > -- Julien Cigar Belgian Biodiversity Platform http://www.biodiversity.be ( http://www.biodiversity.be/ ) Université Libre de Bruxelles (ULB) Campus de la Plaine CP 257 Bâtiment NO, Bureau 4 N4 115C (Niveau 4) Boulevard du Triomphe, entrée ULB 2 B-1050 Bruxelles Mail: [EMAIL PROTECTED] @biobel: http://biobel.biodiversity.be/person/show/471 Tel : 02 650 57 52 -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql