[PERFORM] SELECT 'DBD::Pg ping test'

2008-04-23 Thread sathiya psql
Hi All,

This query is being executed nearly a million times
   SELECT 'DBD::Pg ping test'

Why this is being executed ? What is the use ?

Am sure that this query is not executed explicitly.

am using postgres 8.1

Any idea ?


Re: [PERFORM] SELECT 'DBD::Pg ping test'

2008-04-23 Thread Craig Ringer
sathiya psql wrote:
 Hi All,
 
 This query is being executed nearly a million times
SELECT 'DBD::Pg ping test'
 
 Why this is being executed ? What is the use ?

A client is sending a query to the server solely to see if the server
responds.

DBD::Pg is the Perl database driver for PostgreSQL. Presumably the
application using that driver has some sort of keepalive or database
connectivity check enabled, so it's periodically issuing these queries.

--
Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] mysterious difference in speed when combining two queries with OR

2008-04-23 Thread Hans Ekbrand
I cannot understand why the following two queries differ so much in execution 
time (almost ten times)

Query A (two queries)

select distinct moment.mid from moment,timecard where parent = 45 and (pid=17 
and timecard.mid = moment.mid) order by moment.mid;
select distinct moment.mid from moment,timecard where parent = 45 and (pbar = 
0) order by moment.mid;

Query B (combining the two with OR)

select distinct moment.mid from moment,timecard where parent = 45 and ((pid=17 
and timecard.mid = moment.mid) or (pbar = 0)) order by moment.mid;

$ time psql -o /dev/null -f query-a.sql fektest

real0m2.016s
user0m1.532s
sys 0m0.140s

$ time psql -o /dev/null -f query-b.sql fektest

real0m28.534s
user0m1.516s
sys 0m0.156s

I have tested this in two different computers with different amount of
RAM, fast or slow CPU, and the difference is persistent, almost ten
times.

I should say that this is on postgresql 7.4.16 (debian stable).

Can query B be rewritten so that it would execute faster?

TIA

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
GPG Fingerprint: 1408 C8D5 1E7D 4C9C C27E 014F 7C2C 872A 7050 614E


signature.asc
Description: Digital signature


Re: [PERFORM] mysterious difference in speed when combining two queries with OR

2008-04-23 Thread A. Kretschmer
am  Wed, dem 23.04.2008, um  9:23:07 +0200 mailte Hans Ekbrand folgendes:
 I cannot understand why the following two queries differ so much in execution 
 time (almost ten times)

wild guess: different execution plans.


Can you show us the plans? (EXPLAIN ANALYSE SELECT ...)

 
 Query A (two queries)
 
 select distinct moment.mid from moment,timecard where parent = 45 and (pid=17 
 and timecard.mid = moment.mid) order by moment.mid;
 select distinct moment.mid from moment,timecard where parent = 45 and (pbar = 
 0) order by moment.mid;
 
 Query B (combining the two with OR)
 
 select distinct moment.mid from moment,timecard where parent = 45 and 
 ((pid=17 and timecard.mid = moment.mid) or (pbar = 0)) order by moment.mid;

 [ snip ]
 
 I should say that this is on postgresql 7.4.16 (debian stable).

Uhh. Why not a recent version? We have 8.3.0...


 
 Can query B be rewritten so that it would execute faster?

Quick and dirty: use both selects (query A) combined with UNION.

I guess, with a recent version the planner can use a bitmap index scan
to perform Query B faster.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysterious difference in speed when combining two queries with OR

2008-04-23 Thread A. Kretschmer
am  Wed, dem 23.04.2008, um  9:58:10 +0200 mailte A. Kretschmer folgendes:
  Query A (two queries)
  
  select distinct moment.mid from moment,timecard where parent = 45 and 
  (pid=17 and timecard.mid = moment.mid) order by moment.mid;
  select distinct moment.mid from moment,timecard where parent = 45 and (pbar 
  = 0) order by moment.mid;
  
  Query B (combining the two with OR)
  
  select distinct moment.mid from moment,timecard where parent = 45 and 
  ((pid=17 and timecard.mid = moment.mid) or (pbar = 0)) order by moment.mid;

Thanks to depesz on #postgresql (irc-channel):

Query A, the second query: there are no join between the 2 tables.
Mistake?


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysterious difference in speed when combining two queries with OR

2008-04-23 Thread hans
On Wed, Apr 23, 2008 at 09:58:10AM +0200, A. Kretschmer wrote:
 am  Wed, dem 23.04.2008, um  9:23:07 +0200 mailte Hans Ekbrand folgendes:
  I cannot understand why the following two queries differ so much in 
  execution time (almost ten times)
 
 wild guess: different execution plans.
 
 
 Can you show us the plans? (EXPLAIN ANALYSE SELECT ...)

Query A (first part)

fektest= explain analyse select distinct moment.mid from moment,timecard where 
parent = 45 and (pid=17 and timecard.mid = moment.mid) order by moment.mid;
 QUERY PLAN 
 
-
 Unique  (cost=287.66..288.67 rows=203 width=4) (actual time=11.412..11.622 
rows=41 loops=1)
   -  Sort  (cost=287.66..288.16 rows=203 width=4) (actual time=11.409..11.484 
rows=57 loops=1)
 Sort Key: moment.mid
 -  Hash Join  (cost=60.98..279.88 rows=203 width=4) (actual 
time=2.346..11.182 rows=57 loops=1)
   Hash Cond: (outer.mid = inner.mid)
   -  Seq Scan on timecard  (cost=0.00..211.78 rows=1017 width=4) 
(actual time=0.031..7.427 rows=995 loops=1)
 Filter: (pid = 17)
   -  Hash  (cost=59.88..59.88 rows=444 width=4) (actual 
time=2.127..2.127 rows=0 loops=1)
 -  Seq Scan on moment  (cost=0.00..59.88 rows=444 
width=4) (actual time=0.027..1.825 rows=199 loops=1)
   Filter: (parent = 45)
 Total runtime: 11.852 ms
(11 rows)

Query A (second part)

fektest= explain analyse select distinct moment.mid from moment,timecard where 
parent = 45 and (pbar = 0) order by moment.mid;
  QUERY PLAN
  
--
 Unique  (cost=192.62..3800.67 rows=20 width=4) (actual time=0.578..109.274 
rows=2 loops=1)
   -  Nested Loop  (cost=192.62..3417.57 rows=153240 width=4) (actual 
time=0.575..89.546 rows=15324 loops=1)
 -  Index Scan using moment_mid_idx on moment  (cost=0.00..160.15 
rows=20 width=4) (actual time=0.544..3.490 rows=2 loops=1)
   Filter: ((parent = 45) AND (pbar = 0))
 -  Materialize  (cost=192.62..269.24 rows=7662 width=0) (actual 
time=0.009..21.998 rows=7662 loops=2)
   -  Seq Scan on timecard  (cost=0.00..192.62 rows=7662 width=0) 
(actual time=0.007..14.554 rows=7662 loops=1)
 Total runtime: 109.870 ms
(7 rows)

Query B

fektest= EXPLAIN ANALYSE SELECT distinct moment.mid from moment,timecard where 
parent = 45 and ((pid=17 and timecard.mid = moment.mid) or (pbar = 0)) order by 
moment.mid;
   QUERY PLAN   
 
-
 Unique  (cost=192.62..102469.31 rows=444 width=4) (actual 
time=143.444..4838.067 rows=42 loops=1)
   -  Nested Loop  (cost=192.62..102405.04 rows=25710 width=4) (actual 
time=143.439..4818.215 rows=15379 loops=1)
 Join Filter: (((inner.pid = 17) OR (outer.pbar = 0)) AND 
((inner.mid = outer.mid) OR (outer.pbar = 0)))
 -  Index Scan using moment_mid_idx on moment  (cost=0.00..154.58 
rows=444 width=8) (actual time=0.390..5.954 rows=199 loops=1)
   Filter: (parent = 45)
 -  Materialize  (cost=192.62..269.24 rows=7662 width=8) (actual 
time=0.001..9.728 rows=7662 loops=199)
   -  Seq Scan on timecard  (cost=0.00..192.62 rows=7662 width=8) 
(actual time=0.007..17.007 rows=7662 loops=1)
 Total runtime: 4838.786 ms
(8 rows)

  I should say that this is on postgresql 7.4.16 (debian stable).
 
 Uhh. Why not a recent version? We have 8.3.0...

No particularly good reason, just that I have taken over a production
system and I didn't want to mess up with before I am confident with
it. But I on a test-site I have migrated to 8.1 without problems, so
migration will happen, we just haven't a reason for doing it yet,
since 7.4 has served us well.

  Can query B be rewritten so that it would execute faster?
 
 Quick and dirty: use both selects (query A) combined with UNION.

I will look into that.

 I guess, with a recent version the planner can use a bitmap index scan
 to perform Query B faster.

That might be a good reason to upgrade :-)

Thanks for your answer.

-- 
Every non-free program has a lord, a master --
and if you use the program, he is your master.
Learn to master free software: www.ubuntulinux.com


signature.asc
Description: Digital signature


Re: [PERFORM] mysterious difference in speed when combining two queries with OR

2008-04-23 Thread Hans Ekbrand
On Wed, Apr 23, 2008 at 10:57:04AM +0200, A. Kretschmer wrote:
 am  Wed, dem 23.04.2008, um  9:58:10 +0200 mailte A. Kretschmer folgendes:
   Query A (two queries)
   
   select distinct moment.mid from moment,timecard where parent = 45 and 
   (pid=17 and timecard.mid = moment.mid) order by moment.mid;
   select distinct moment.mid from moment,timecard where parent = 45 and 
   (pbar = 0) order by moment.mid;
   
   Query B (combining the two with OR)
   
   select distinct moment.mid from moment,timecard where parent = 45 and 
   ((pid=17 and timecard.mid = moment.mid) or (pbar = 0)) order by 
   moment.mid;
 
 Thanks to depesz on #postgresql (irc-channel):
 
 Query A, the second query: there are no join between the 2 tables.
 Mistake?

No, I just wanted to show the time differences, I haven't used join
before. Now that you have adviced me to, I have tried your suggestion
to rewrite B as a union and it works good! Just as fast as the A Query!

  QUERY 
PLAN  
--
 Unique  (cost=4100.27..4101.39 rows=223 width=4) (actual time=120.963..121.124 
rows=42 loops=1)
   -  Sort  (cost=4100.27..4100.83 rows=223 width=4) (actual 
time=120.959..121.008 rows=43 loops=1)
 Sort Key: mid
 -  Append  (cost=287.66..4091.57 rows=223 width=4) (actual 
time=11.274..120.795 rows=43 loops=1)
   -  Subquery Scan *SELECT* 1  (cost=287.66..290.70 rows=203 
width=4) (actual time=11.270..11.604 rows=41 loops=1)
 -  Unique  (cost=287.66..288.67 rows=203 width=4) (actual 
time=11.264..11.469 rows=41 loops=1)
   -  Sort  (cost=287.66..288.16 rows=203 width=4) 
(actual time=11.260..11.331 rows=57 loops=1)
 Sort Key: moment.mid
 -  Hash Join  (cost=60.98..279.88 rows=203 
width=4) (actual time=2.563..11.136 rows=57 loops=1)
   Hash Cond: (outer.mid = inner.mid)
   -  Seq Scan on timecard  
(cost=0.00..211.78 rows=1017 width=4) (actual time=0.032..7.156 rows=995 
loops=1)
 Filter: (pid = 17)
   -  Hash  (cost=59.88..59.88 rows=444 
width=4) (actual time=2.329..2.329 rows=0 loops=1)
 -  Seq Scan on moment  
(cost=0.00..59.88 rows=444 width=4) (actual time=0.035..1.980 rows=199 loops=1)
   Filter: (parent = 45)
   -  Subquery Scan *SELECT* 2  (cost=192.62..3800.87 rows=20 
width=4) (actual time=0.583..109.073 rows=2 loops=1)
 -  Unique  (cost=192.62..3800.67 rows=20 width=4) (actual 
time=0.578..109.061 rows=2 loops=1)
   -  Nested Loop  (cost=192.62..3417.57 rows=153240 
width=4) (actual time=0.576..89.437 rows=15324 loops=1)
 -  Index Scan using moment_mid_idx on moment  
(cost=0.00..160.15 rows=20 width=4) (actual time=0.544..3.527 rows=2 loops=1)
   Filter: ((parent = 45) AND (pbar = 0))
 -  Materialize  (cost=192.62..269.24 
rows=7662 width=0) (actual time=0.014..21.930 rows=7662 loops=2)
   -  Seq Scan on timecard  
(cost=0.00..192.62 rows=7662 width=0) (actual time=0.005..14.560 rows=7662 
loops=1)
 Total runtime: 122.076 ms
(23 rows)

-- 
Hans Ekbrand (http://sociologi.cjb.net) [EMAIL PROTECTED]
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?


signature.asc
Description: Digital signature


Re: [PERFORM] mysterious difference in speed when combining two queries with OR

2008-04-23 Thread Theo Kramer


On 23 Apr 2008, at 9:23AM, Hans Ekbrand wrote:

I cannot understand why the following two queries differ so much in  
execution time (almost ten times)


Query A (two queries)

select distinct moment.mid from moment,timecard where parent = 45  
and (pid=17 and timecard.mid = moment.mid) order by moment.mid;
select distinct moment.mid from moment,timecard where parent = 45  
and (pbar = 0) order by moment.mid;


Query B (combining the two with OR)

select distinct moment.mid from moment,timecard where parent = 45  
and ((pid=17 and timecard.mid = moment.mid) or (pbar = 0)) order by  
moment.mid;


$ time psql -o /dev/null -f query-a.sql fektest

real0m2.016s
user0m1.532s
sys 0m0.140s

$ time psql -o /dev/null -f query-b.sql fektest

real0m28.534s
user0m1.516s
sys 0m0.156s

I have tested this in two different computers with different amount of
RAM, fast or slow CPU, and the difference is persistent, almost ten
times.

I should say that this is on postgresql 7.4.16 (debian stable).

Can query B be rewritten so that it would execute faster?


Try
select distinct moment.mid from moment,timecard where parent = 45 and  
(pid=17 and timecard.mid = moment.mid) order by moment.mid

union all
select distinct moment.mid from moment,timecard where parent = 45 and  
(pbar = 0) order by moment.mid;

--
Regards
Theo


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysterious difference in speed when combining two queries with OR

2008-04-23 Thread PFC

I should say that this is on postgresql 7.4.16 (debian stable).


Whoa.

I cannot understand why the following two queries differ so much in  
execution time (almost ten times)


	Post EXPLAIN ANALYZE for both, and also post table definitions (with  
indexes), use \d table. This will allow people to help you.



$ time psql -o /dev/null -f query-a.sql fektest

real0m2.016s
user0m1.532s
sys 0m0.140s


	You are measuring the time it takes the server to perform the query, plus  
this :

- time for the client (psql) to launch itself,
- to read the configuration file,
- to connect to the server, send the query
	- to transfer the results back to the client (is this on network or local  
? what is the amount of data transferred ?)

- to process the results, format them as text, display them,
- to close the connection,
- to exit cleanly

As you can see from the above numbers,
- 2.016 seconds elapsed on your wall clock, of which :
	- 76% was used as CPU time in the client (therefore of absolutely no  
relevance to postgresql server performance)
	- and the rest (24%) distributed in unknown proportion between server CPU  
spent to process your query, network roundtrips, data transfer, server  
iowait, etcetera.


In order to properly benchmark your query, you should :

	1- Ensure the server is not loaded and processing any other query (unless  
you explicitly intend to test behaviour under load)
	If you don't do that, your timings will be random, depending on how much  
load you have, if someone holds a lock you have to wait on, etc.


	2- ssh to your server and use a psql session local to the server, to  
avoid network roundtrips.


3- enable statement timing with \t

2- EXPLAIN your query.

Check the plan.
	Check the time it took to EXPLAIN, this will tell you how much time it  
takes to parse and plan your query.


2- EXPLAIN ANALYZE your query.

	Do it several times, note the different timings and understand the query  
plans.
	If the data was not cached, the first timing will be much longer than the  
subsequent other timings. This will give you useful information about the  
behaviour of this query : if lasts for 1 second (cached) and 5 minutes  
(not cached), you might not want to execute it at the same time as that  
huge scheduled backup job. Those timings will also provide hints on wether  
you should CLUSTER the table, etc.


3- EXPLAIN SELECT count(*) FROM (your query) AS foo
Check that the plan is the same.

4- SELECT count(*) FROM (your query) AS foo
	The count(*) means very little data is exchanged between client and  
server, so this doesn't mess with the timing.


Now, compare :

	The timings displayed by psql (\t) include query planning, roundtrip to  
server, and result processing (hence the count(*) to reduce this overhead).
	The timings displayed by EXPLAIN ANALYZE include only query execution  
time, but EXPLAIN ANALYZE is slower than just executing the query, because  
it takes time to instrument the query and measure its performance. For  
instance, on a very simple query that computes an aggregate on lots of  
rows, more time will be spent measuring than actually executing the query.  
Hence steps 3 and 4 above.


	Knowing this, you deduce the time it takes to parse  plan your query  
(should you then use PREPAREd statements ? up to you) and the time it  
takes to execute it.


	5- EXPLAIN ANALYZE, while changing the parameters (trying some very  
selective or less selective ones) to check for plan change, mess with  
enable_ parameters to check for different plans, rewrite the query  
differently (DISTINCT/GROUP BY, OR/UNION, JOIN or IN(subquery), etc).




























--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysterious difference in speed when combining two queries with OR

2008-04-23 Thread Gregory Stark
Hans Ekbrand [EMAIL PROTECTED] writes:

 No, I just wanted to show the time differences, I haven't used join
 before. Now that you have adviced me to, I have tried your suggestion
 to rewrite B as a union and it works good! Just as fast as the A Query!

You can even do better. If you know the two sets of mid are disjoint you can
use UNION ALL. If not you could remove the two DISTINCTs as the UNION will
take care of removing duplicates.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT 'DBD::Pg ping test'

2008-04-23 Thread Jeffrey Baker
On Wed, Apr 23, 2008 at 12:19 AM, sathiya psql [EMAIL PROTECTED] wrote:
 Hi All,

 This query is being executed nearly a million times
SELECT 'DBD::Pg ping test'

Something in your Perl application is use $dbh-ping().  See perldoc
DBI.  It's possible that this is happening under the hood, because
your application is using connect_cached() instead of connect().

-jwb

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Optimizer's issue

2008-04-23 Thread Vlad Arkhipov

I found strange issue in very simple query. Statistics for all columns
is on the level 1000 but I also tried other levels.

create table g (
 id bigint primary key,
 isgroup boolean not null);

create table a (
 groupid bigint references g(id),
 id bigint,
 unique(id, groupid));

analyze g;
analyze a;

select count(*) from a
294

select count(*) from g
320

explain analyze
select *
from g
 join a on a.groupid = g.id
where g.isgroup

Hash Join  (cost=5.35..11.50 rows=11 width=25) (actual time=0.261..1.755
rows=294 loops=1)
 Hash Cond: (a.groupid = g.id)
 -  Seq Scan on a  (cost=0.00..4.94 rows=294 width=16) (actual
time=0.047..0.482 rows=294 loops=1)
 -  Hash  (cost=5.20..5.20 rows=12 width=9) (actual time=0.164..0.164
rows=12 loops=1)
   -  Seq Scan on g  (cost=0.00..5.20 rows=12 width=9) (actual
time=0.042..0.136 rows=12 loops=1)
 Filter: isgroup
Total runtime: 2.225 ms

And this is more interesting:
explain analyze
select *
from g
 join a on a.groupid = g.id
where not g.isgroup

Hash Join  (cost=9.05..17.92 rows=283 width=25) (actual
time=2.038..2.038 rows=0 loops=1)
 Hash Cond: (a.groupid = g.id)
 -  Seq Scan on a  (cost=0.00..4.94 rows=294 width=16) (actual
time=0.046..0.478 rows=294 loops=1)
 -  Hash  (cost=5.20..5.20 rows=308 width=9) (actual time=1.090..1.090
rows=308 loops=1)
   -  Seq Scan on g  (cost=0.00..5.20 rows=308 width=9) (actual
time=0.038..0.557 rows=308 loops=1)
 Filter: (NOT isgroup)
Total runtime: 2.126 ms

PostgreSQL 8.3
These queries are part of big query and optimizer put them on the leaf
of query tree, so rows miscount causes a real problem.

Statistics for table a:
id
--
histogram_bounds: {1,40,73,111,143,174,204,484,683,715,753}
correlation: 0.796828

groupid
---
n_distinct: 12
most_common_vals: {96,98,21,82,114,131,48,44,173,682,752}
most_common_freqs:
{0.265306,0.17,0.163265,0.136054,0.0884354,0.0782313,0.0714286,0.00680272,0.00680272,0.00680272,0.00680272}
correlation: 0.366704

for table g:
id
--
histogram_bounds: {1,32,64,101,134,166,199,451,677,714,753}
correlation: 1

isgroup
---
n_distinct: 2
most_common_freqs: {0.9625,0.0375}
correlation: 0.904198



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance