[PERFORM] Turn correlated IN subquery into join

2008-03-24 Thread Dennis Bjorklund
Looks like the MySQL people found a subquery that PostgreSQL doesn't 
handle as well as it could:


  http://s.petrunia.net/blog/

Is there some deeper issue here that I fail to see, or is it simply that 
it hasn't been implemented but would be fairly straightforward? In the link 
above they do state that it isn't a very common case anyway.


/Dennis



Re: [PERFORM] Slow Postgresql server

2007-04-11 Thread Dennis Bjorklund

Jason Lustig wrote:
and work_mem to 8096. What would cause the computer to only use such a 
small percentage of the CPU, with more than half of it waiting on I/O 
requests?


Do your webpages write things to the database on each connect?

Maybe it does a bunch of writes, each individually committed? For every 
commit pg will wait for the data to be written down to the disk platter 
before it moves on. So if you do several writes you want to do them in 
one transaction so you only need one commit.
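
For example (the table and values are made up), instead of three
separately committed inserts you would send:

  BEGIN;
  INSERT INTO hits (page, visited_at) VALUES ('/index', now());
  INSERT INTO hits (page, visited_at) VALUES ('/about', now());
  INSERT INTO hits (page, visited_at) VALUES ('/contact', now());
  COMMIT;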


/Dennis



Re: [PERFORM] Upgraded to 8.2.3 --- still having performance issues

2007-03-02 Thread Dennis Bjorklund

Carlos Moreno wrote:

The system does very frequent insertions and updates --- the longest
table has, perhaps, some 20 million rows, and it's indexed (the primary
key is the combination of two integer fields).  This longest table only
has inserts (and much less frequent selects), at a peak rate of maybe
one or a few insertions per second.


One or a few inserts per second doesn't sound like that much. I would 
have expected it to work. If you can, you might want to group several 
inserts into a single transaction.


A standard hint is also to move the WAL onto its own disk, or to get a disk 
controller with battery-backed RAM.


But it's hard to say from your description what the bottleneck is and 
thus hard to give any advice.



Any help/tips/guidance in troubleshooting this issue?  It will be
much appreciated!


You could try to find specific queries that are slow. Pg can for example 
log queries for you that run for longer than X seconds.
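
For example, in postgresql.conf (the threshold here is just an
illustration):

  log_min_duration_statement = 1000   # log statements taking longer than 1000 ms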


/Dennis



Re: [PERFORM] Seqscan/Indexscan still a known issue?

2007-01-26 Thread Dennis Bjorklund

Carlos Moreno wrote:


When I force it via  set enable_seqscan to off, the index scan
takes about 0.1 msec  (as reported by explain analyze), whereas



For the time being, I'm using an explicit enable_seqscan off
in the client code, before executing the select.  But I wonder:
Is this still an issue, or has it been solved in the latest
version?


For most queries it has never been an issue. Every once in a while there 
is a query that the planner makes a non-optimal plan for, but it's not 
that common.


In general the optimizer has improved with every new version of pg.

Almost everyone I've talked to that has upgraded has got a faster 
database than before. It was like that for 7.4 -> 8.0, for 8.0 -> 8.1 and 
for 8.1 -> 8.2. So in your case going from 7.4 -> 8.2 is most likely going 
to give a speedup (especially if you have some queries that aren't just 
simple primary key lookups).


In your case it's hard to give any advice since you didn't share the 
EXPLAIN ANALYZE output with us. I'm pretty sure it's possible to tune pg 
so it makes the right choice even for this query of yours but without 
the EXPLAIN ANALYZE output we would just be guessing anyway. If you want 
to share it then it might be helpful to show the plan both with and 
without seqscan enabled.
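
For example (with a made-up query in place of yours):

  SET enable_seqscan = off;
  EXPLAIN ANALYZE SELECT * FROM items WHERE category_id = 7;
  SET enable_seqscan = on;
  EXPLAIN ANALYZE SELECT * FROM items WHERE category_id = 7;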


How often do you run VACUUM ANALYZE; on the database?

/Dennis



Re: [PERFORM] Worse perfomance on 8.2.0 than on 7.4.14

2006-12-31 Thread Dennis Bjorklund

Rolf Østvik wrote:


I have a simple query which uses 32ms on 7.4.14 and 1015ms on 8.2.0.
I guess 7.4.14 creates a better execution plan than 8.2.0 for this query but



Try to turn off planner options in 8.2 to make it generate the same plan 
as 7.4. Then run EXPLAIN ANALYZE on the query that generates the same 
plan as in 7.4 and we can compare the costs and maybe understand what goes 
wrong.


For example, try

set enable_hashjoin to false;
set enable_bitmapscan to false;

but you might need to turn off more things to get it to generate the 7.4 
plan.


/Dennis



Re: [PERFORM] How import big amounts of data?

2005-12-29 Thread Dennis Bjorklund
On Thu, 29 Dec 2005, Arnau wrote:

Which is the best way to import data to tables? I have to import 
 9 rows into a column and doing it as inserts takes ages. Would be 
 faster with copy? is there any other alternative to insert/copy?

Wrap the inserts inside a BEGIN/COMMIT block and it will be a lot faster.
Copy is even faster, but for just 9 rows I wouldn't bother.
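
If you do want to try copy, a minimal sketch (the table, columns and file
path are made up; the file is read on the server and is tab-separated by
default):

  COPY mytable (id, name) FROM '/tmp/data.txt';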

-- 
/Dennis Björklund




Re: [PERFORM] 8.1 iss

2005-11-06 Thread Dennis Bjorklund
On Sun, 6 Nov 2005, PostgreSQL wrote:

 SELECT v_barcode, count(v_barcode) FROM lead GROUP BY v_barcode HAVING 
 count(*) > 1;
 
 This is a dual Opteron box with 16 Gb memory and a 3ware SATA raid
 runing 64bit SUSE.  Something seems badly wrong.
 
 GroupAggregate  (cost=9899282.83..10285434.26 rows=223858 width=15)
   Filter: (count(*) > 1)
   ->  Sort  (cost=9899282.83..9994841.31 rows=38223392 width=15)
         Sort Key: v_barcode
         ->  Seq Scan on lead  (cost=0.00..1950947.92 rows=38223392 width=15)

What do the plan look like in 8.0? Since it's so much faster I assume you 
get a different plan.

 shared_buffers = 5
 work_mem = 16384
 maintenance_work_mem = 16384
 max_fsm_pages = 10
 max_fsm_relations = 5000
 wal_buffers = 32
 checkpoint_segments = 32
 effective_cache_size = 50000
 default_statistics_target = 50

The effective_cache_size is way too low, only 50000 (about 390M) when you have
a machine with 16G. Try bumping it to 1000000 (which means almost 8G; how nice
it would be to be able to write 8G instead...). It could be set even higher 
but it's hard for me to know what else your memory is used for.

I don't know if this setting will affect this very query, but it should 
have a positive effect on a lot of queries.

work_mem also seems low, but it's hard to suggest a good value for it
without knowing more about how your database is used.
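
As a sketch, the relevant postgresql.conf lines could look something like
this (the exact numbers are assumptions and depend on what else the
machine runs):

  effective_cache_size = 1000000   # 8kB pages, i.e. roughly 8GB
  work_mem = 65536                 # in kB, 64MB per sort operation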
 
-- 
/Dennis Björklund




Re: [PERFORM] Comparative performance

2005-09-28 Thread Dennis Bjorklund
On Wed, 28 Sep 2005, Joe wrote:

 Before I post the EXPLAIN and the table schema I'd appreciate
 confirmation that this list is the appropriate forum. 

It is, and useful things to show are

 * the slow query
 * EXPLAIN ANALYZE of the query
 * the output of \d for each table involved in the query
 * the output of SHOW ALL;
 * the amount of memory the machine has
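
For example, collecting those from a psql session could look like this
(the query and table name are just placeholders for your own):

  EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
  \d orders
  SHOW ALL;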

The settings that are the most important to tune in postgresql.conf for
performance are, in my opinion: shared_buffers, effective_cache_size and
(to a lesser extent) work_mem.

-- 
/Dennis Björklund




Re: [PERFORM] How to improve Postgres performance

2005-08-30 Thread Dennis Bjorklund
On Tue, 30 Aug 2005, Hemant Pandey wrote:

 So please tell me how can i improve database performance through
 configuration parameters. I had tried to change parameters in
 postgresql.conf file but of no avail.
 Now i am trying to Auto Vacuum, but don't know how to run Auto Vacuum.

The most important part is that you need to run VACUUM ANALYZE regularly.  
Vacuum can be started each night in a cron job, started from pg_autovacuum
when it thinks it's needed, or started in some other way. In any case, it
has to be run whenever the data in the database has changed enough.
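
For example, a nightly crontab entry for the postgres user could look
something like this (the time and options are just an illustration):

  # vacuum and analyze all databases at 03:00
  0 3 * * * vacuumdb --all --analyze --quiet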

The parameters in the config that are most important in my experience are 
effective_cache_size and shared_buffers.

This is a text I like (it's for pg 7.4 but still useful):

  http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

-- 
/Dennis Björklund




Re: [PERFORM] Need for speed

2005-08-16 Thread Dennis Bjorklund
On Tue, 16 Aug 2005, Ulrich Wisser wrote:

 Still when several users are on line the reporting gets very slow. 
 Queries can take more then 2 min.

Could you show an example of such a query and the output of EXPLAIN ANALYZE
on that query (preferably done when the database is slow)?

It's hard to say what is wrong without more information.

-- 
/Dennis Björklund




Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Dennis Bjorklund
On Wed, 27 Jul 2005, Matthew Schumacher wrote:

 Then they do this to insert the token:
 
 INSERT INTO bayes_token (
   id,
   token,
   spam_count,
   ham_count,
   atime
 ) VALUES (
   ?,
   ?,
   ?,
   ?,
   ?
 ) ON DUPLICATE KEY
   UPDATE
 spam_count = GREATEST(spam_count + ?, 0),
 ham_count = GREATEST(ham_count + ?, 0),
 atime = GREATEST(atime, ?)
 
 Or update the token:
 
 UPDATE bayes_vars SET
   $token_count_update
   newest_token_age = GREATEST(newest_token_age, ?),
   oldest_token_age = LEAST(oldest_token_age, ?)
   WHERE id = ?
 
 
 I think the reason why the procedure was written for postgres was
 because of the greatest and least statements performing poorly.

How can they perform poorly when they are dead simple? Here are two
functions that work for the above cases of GREATEST and LEAST:

CREATE FUNCTION greatest_int (integer, integer)
 RETURNS INTEGER
 IMMUTABLE STRICT
 AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;'
 LANGUAGE SQL;

CREATE FUNCTION least_int (integer, integer)
 RETURNS INTEGER
 IMMUTABLE STRICT
 AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;'
 LANGUAGE SQL;

and these should be inlined by pg and very fast to execute.

I wrote a function that should do what the insert above does. The update 
I've not looked at (I don't know what $token_count_update is) but the 
update looks simple enough to just implement the same way in pg as in 
mysql.

For the insert or replace case you can probably use this function:

CREATE FUNCTION insert_or_update_token (xid INTEGER,
                                        xtoken BYTEA,
                                        xspam_count INTEGER,
                                        xham_count INTEGER,
                                        xatime INTEGER)
RETURNS VOID AS
$$
BEGIN
   LOOP
      UPDATE bayes_token
         SET spam_count = greatest_int (spam_count + xspam_count, 0),
             ham_count  = greatest_int (ham_count + xham_count, 0),
             atime      = greatest_int (atime, xatime)
       WHERE id = xid
         AND token = xtoken;

      IF found THEN
         RETURN;
      END IF;

      BEGIN
         INSERT INTO bayes_token VALUES (xid,
                                         xtoken,
                                         xspam_count,
                                         xham_count,
                                         xatime);
         RETURN;
      EXCEPTION WHEN unique_violation THEN
         -- do nothing
      END;
   END LOOP;
END;
$$
LANGUAGE plpgsql;

It's not really tested so I can't tell if it's faster than what you have.  
What it does do is mimic the way you insert values in mysql. It only works
on pg 8.0 and later, however, since the exception handling was added in 8.0.
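
A call would then look something like this (the values are made up):

  SELECT insert_or_update_token(1, 'sometoken'::bytea, 1, 0, 1118172883);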

-- 
/Dennis Björklund




Re: [PERFORM] ported application having performance issues

2005-07-01 Thread Dennis Bjorklund
On Thu, 30 Jun 2005, John Mendenhall wrote:

 Our setting for effective_cache_size is 2048.
 
 random_page_cost = 4, effective_cache_size = 2048   time approximately 4500ms
 random_page_cost = 3, effective_cache_size = 2048   time approximately 1050ms
 random_page_cost = 3, effective_cache_size = 4096   time approximately 1025ms
 
 The decrease of random_page_cost to 3 caused the plan
 to work properly, using the lead_requests table as a
 join starting point and using the contacts index.

The effective_cache_size still looks small. As a rule of thumb you might
want effective_cache_size to be something like 1/2 or 2/3 of your total
memory. I don't know how much you had, but effective_cache_size = 4096 is
only 32M.
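
As a rough sketch of the arithmetic (assuming, say, a machine with 1GB of
RAM): 2/3 of that is about 680MB, and expressed in 8kB pages that is
roughly

  effective_cache_size = 87000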

shared_buffers and effective_cache_size are normally the two most important 
settings in my experience.

-- 
/Dennis Björklund




Re: [PERFORM] planner picking more expensive plan

2005-07-01 Thread Dennis Bjorklund
On Fri, 1 Jul 2005, Sam Mason wrote:

The key thing with the query that Sam has is that if you turn off seqscan
you get the first plan, which runs in 0.4ms, while if seqscan is on the
runtime is 27851ms.

There are 100 ways to make it avoid the seq scan, including rewriting the 
query to something more useful, tweaking different parameters and so on. 

The interesting part is that pg gives the fast plan a cost of 202 and the
slow one a cost of 566141, but still it chooses the slow plan unless seqscan
is turned off (or some other tweak with the same effect). It knows very
well that the plan with the index scan will be much faster, it just doesn't
manage to generate it unless you force it to.

It makes you wonder if pg throws away some plans too early in the planning
phase.

 Limit  (cost=0.00..202.52 rows=10 width=8) (actual time=0.221..0.600 rows=10 loops=1)
   ->  Merge Left Join  (cost=0.00..6628.30 rows=3302780 width=8) (actual time=0.211..0.576 rows=10 loops=1)
         Merge Cond: (outer.animalid = inner.animalid)
         ->  Index Scan using animals_pkey on animals a  (cost=0.00..10198983.91 rows=3302780 width=8) (actual time=0.112..0.276 rows=10 loops=1)
         ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual time=0.088..0.235 rows=10 loops=1)
               Filter: (mtypeid = 0)
 Total runtime: 0.413 ms

 Limit  (cost=565969.42..566141.09 rows=10 width=8) (actual time=27769.047..27769.246 rows=10 loops=1)
   ->  Merge Right Join  (cost=565969.42..57264070.77 rows=3302780 width=8) (actual time=27769.043..27769.228 rows=10 loops=1)
         Merge Cond: (outer.animalid = inner.animalid)
         ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual time=0.022..0.154 rows=10 loops=1)
               Filter: (mtypeid = 0)
         ->  Sort  (cost=565969.42..574226.37 rows=3302780 width=8) (actual time=27768.991..27769.001 rows=10 loops=1)
               Sort Key: a.animalid
               ->  Seq Scan on animals a  (cost=0.00..77086.80 rows=3302780 width=8) (actual time=0.039..5620.651 rows=3303418 loops=1)
 Total runtime: 27851.097 ms


Another thing to notice is that if one removes the Limit node then the
situation is reversed and the plan that pg chose (with the Limit node) is
the one with the lowest cost. The startup cost is however very high, so 
combining that Merge Join with a Limit will of course produce something 
slow compared to the upper plan where the startup cost is 0.0.

A stand-alone test case would be nice, but even without one the above plans 
are interesting.

-- 
/Dennis Björklund




Re: [PERFORM] ETL optimization

2005-06-23 Thread Dennis Bjorklund
On Thu, 23 Jun 2005, Bricklen Anderson wrote:

 iii. UNIQUE constraint on table t1. This didn't seem to perform too
 badly with fewer rows (preliminary tests), but as you'd expect, on error
 the whole transaction would roll back. Is it possible to skip a row if
 it causes an error, as opposed to aborting the transaction altogether?

You don't need to roll back the whole transaction if you use savepoints or 
the exception features in pl/pgsql.

Take a look at this example:

http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
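
A minimal sketch of the savepoint variant (the values are made up):

  BEGIN;
  INSERT INTO t1 VALUES (1, 'first row');
  SAVEPOINT before_row;
  INSERT INTO t1 VALUES (1, 'duplicate key');  -- fails on the unique constraint
  ROLLBACK TO SAVEPOINT before_row;            -- discard only the failed row
  INSERT INTO t1 VALUES (2, 'next row');
  COMMIT;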

-- 
/Dennis Björklund




Re: [PERFORM] Needed: Simplified guide to optimal memory configuration

2005-06-15 Thread Dennis Bjorklund
On Wed, 15 Jun 2005, Todd Landfried wrote:

 So, what I need is to be pointed to (or told) what are the best  
 settings for our database given these memory configurations.  What  
 should we do?

Maybe this will help:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

 NOTICE:  shared_buffers is 256

This looks like it's way too low. Try something like 2048.

-- 
/Dennis Björklund




Re: [PERFORM] Query plan changes after pg_dump / pg_restore

2005-06-09 Thread Dennis Bjorklund
On Thu, 9 Jun 2005 [EMAIL PROTECTED] wrote:

 I am continously encountering an issue with query plans that changes after 
 a pg_dump / pg_restore operation has been performed.
 
 Have placed an export of the query, query plan etc. online at: 
 http://213.173.234.215:8080/plan.htm in order to ensure it's still 
 readable.

There is not a major difference in time, so pg is at least not way off 
(225ms vs. 280ms). The estimated cost is however not very related to the 
runtime (117 vs 1389).

What you have not shown is whether the database is properly tuned. The output
of SHOW ALL; could help explain a lot, together with info on how much
memory your computer has.

The first thing that comes to mind is that you probably have not 
tuned shared_buffers and effective_cache_size properly (SHOW ALL would 
tell).

-- 
/Dennis Björklund




Re: [PERFORM] Query plan changes after pg_dump / pg_restore

2005-06-09 Thread Dennis Bjorklund
On Thu, 9 Jun 2005, Jona wrote:

 It's the same (physical) server as well as the same PostGreSQL daemon, 
 so yes.

The only thing that can differ then is the statistics collected and the
amount of dead space in tables and indexes (but since you both reindex and
run vacuum full that should not be it).

So comparing the statistics in the system tables is the only thing I can 
think of that might bring some light on the issue. Maybe someone else has 
some ideas.
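
For example, something like this (the table name is hypothetical) run
against both servers could be compared:

  SELECT attname, n_distinct, correlation
    FROM pg_stats
   WHERE tablename = 'mytable';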

And as KL said, the effective_cache_size looked like it was way too small. 
With that setting bigger pg should select index scans more often. It 
doesn't explain why the databases behave like they do now, but it might 
make pg select the same plan nevertheless.

-- 
/Dennis Björklund




Re: [PERFORM] COPY vs INSERT

2005-05-06 Thread Dennis Bjorklund
On Fri, 6 May 2005, Jim C. Nasby wrote:

 Has thought been given to supporting inserting multiple rows in a single
 insert? DB2 supported:
 
 INSERT INTO table VALUES(
 (1,2,3),
 (4,5,6),
 (7,8,9)
 );
 
 I'm not sure how standard that is or if other databases support it.

The SQL standard includes this, except that you cannot have the outer ().
So it should be

INSERT INTO table VALUES
(1,2,3),
(4,5,6),
(7,8,9);

Does DB2 demand these extra ()?

-- 
/Dennis Björklund




Re: [PERFORM] First set of OSDL Shared Mem scalability results, some

2004-10-10 Thread Dennis Bjorklund
On Fri, 8 Oct 2004, Josh Berkus wrote:

 As you can see, the sweet spot appears to be between 5% and 10% of RAM, 
 which is if anything *lower* than recommendations for 7.4!   

What recommendation is that? To have shared buffers be about 10% of the
RAM sounds familiar to me. What was recommended for 7.4? In the past we
used to say that the worst value is 50%, since then the same things might
be cached both by pg and the OS disk cache.

Why do we expect the shared buffer size sweet spot to change because of
the new ARC stuff? And why would it make it better to have bigger shared 
mem?

Wouldn't it be the opposite, that now we don't invalidate as much of the
cache for vacuums and seq. scans, so now we can do as good caching as 
before but with fewer shared buffers?

That said, testing and getting some numbers for good sizes of shared mem
is good.

-- 
/Dennis Björklund




Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Dennis Bjorklund
On Wed, 22 Sep 2004, Gaetano Mendola wrote:

   Limit  (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 
 loops=1)
   Limit  (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 
 loops=1)

These estimated costs are almost the same, but the runtime differs a bit. 
This means that maybe you need to alter settings like random_page_cost, 
effective_cache_size and maybe some others to make the cost reflect the 
runtime better.

Since the costs are so close to each other very small changes can make it 
choose the other plan. It's also very hard to make an estimate that is 
correct in all situations. That's why it's called an estimate after all.

-- 
/Dennis Björklund




Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )

2004-09-22 Thread Dennis Bjorklund
On 22 Sep 2004, Greg Stark wrote:

 Actually this looks like it's arguably a bug to me. Why does the hash
 join execute the sequential scan at all? Shouldn't it also like the
 merge join recognize that the other hashed relation is empty and skip
 the sequential scan entirely?

I'm not sure you can classify that as a bug. It's just that in one of 
the plans it started with the empty scan and because of that didn't need 
the other, but with the hash join it started with the table that had 16 
rows and then got to the empty one.

While I haven't checked, I assume that if it had started with the empty 
table then it would have skipped the other.

I don't know what criterion is used to select which part to start with when
doing a hash join. It looks like it started with the one that had the highest
estimate of rows here; doing it the other way around might be a good idea
because in some cases you are lucky enough to find an empty scan and can omit
the other.

The above are just observations of the behaviour, I've not seen the source 
at all.

-- 
/Dennis Björklund




Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres

2004-09-11 Thread Dennis Bjorklund
On Sat, 11 Sep 2004, Mark Cotner wrote:

 There are no performance concerns with MySQL, but it would benefit
 greatly from stored procedures, views, etc.  It is a very large rolling
 data warehouse that inserts about 4.5 million rows every 2 hours and
 subsequently rolls this data off the back end of a 90 day window.

While it is impossible to know without testing, postgresql has the benefit
of readers and writers that do not block each other. So in situations
where you do lots of concurrent inserts and selects postgresql should
behave well.

 Merge table definition equivalent.  We use these extensively.

As far as I can tell a merge table in mysql is the same as a view over a 
number of unions of other tables, possibly plus a rule that defines how 
inserts will be done if you insert into the merged table.
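
A rough sketch of that in postgresql (the table names are made up) would
be something like:

  CREATE VIEW merged_log AS
      SELECT * FROM log_2004_07
      UNION ALL
      SELECT * FROM log_2004_08
      UNION ALL
      SELECT * FROM log_2004_09;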

 Merge table equivalent with all tables containing over 100M rows(and
 about 40 columns, some quite wide) will need to do index scans in at
 least 5 seconds(MySQL currently does 2, but we can live with 5) and
 return ~200 rows.

Since each table that is merged will have its own index, the speed should 
be proportional to the number of tables. Index scans in themselves are very 
fast, and if you have 30 tables you need 30 index scans.

Also, are you sure you really need merge tables? With pg having row locks
and mvcc, maybe you could go for a simpler model with just one big table.  
Often you can also combine that with partial indexes to get a smaller
index to use for lots of your queries.
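
A partial index sketch (the names and the cutoff are made up):

  -- index only the rows most of the queries actually touch
  CREATE INDEX recent_usage_idx ON usage_log (sample_time)
      WHERE sample_time > '2004-06-13';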

 Thoughts, suggestions?

I see nothing in what you have written that indicates that pg cannot do 
the job, and do it well. It's however very hard to know exactly what the 
bottleneck is before one tries. There are lots of cases where people have 
converted mysql applications to postgresql and have gotten a massive 
speedup. You could be lucky and have such a case, who knows...

I spend some time each day supporting people using postgresql in the
#postgresql irc channel (on the freenode.net network). There I talk to
people doing both small and big conversions and the majority are very happy
with the postgresql performance. Postgresql has gotten faster and faster 
with each release, and while speed was a fair argument a number of years 
ago it's not like that today.

That said, in the end it depends on the application.

We are all interested in how it goes (well, at least me :-), so feel free
to send more mails keeping us posted. Good luck.

-- 
/Dennis Björklund




Re: [PERFORM] Why those queries do not utilize indexes?

2004-08-27 Thread Dennis Bjorklund
On Fri, 27 Aug 2004, Artimenko Igor wrote:

 1. Sequential search and very high cost if set enable_seqscan to on;
 Seq scan on messageinfo ( cost=0.00..24371.30, rows =36802 )
 
 2. Index scan but even bigger cost if set enable_seqscan to off;
 Index “messagesStatus” on messageinfo ( Cost=0.00..27220.72, rows=36802 )

So pg thinks that a sequential scan will be a little bit faster (the cost 
is a little bit smaller). If you compare the actual runtimes maybe you 
will see that pg was right. In this case the cost is almost the same, so 
the runtime is probably almost the same.

When you have more data pg will start to use the index since then it will 
be faster to use an index compared to a seq. scan.

-- 
/Dennis Björklund




Re: [PERFORM] Optimizer Selecting Incorrect Index

2004-08-26 Thread Dennis Bjorklund
On Wed, 25 Aug 2004, Richard Huxton wrote:

   Index Scan using trptserc on trans  (cost=0.00..465.10 rows=44 width=118)
     Index Cond: (trn_patno = 19)
     Filter: ((trn_old_date >= '1994-08-23'::date) AND (trn_old_date <= '2004-08-23'::date) AND (trn_bill_inc = 'B'::bpchar))
   (687 rows)
 
   Index Scan using todate on trans  (cost=0.00..105165.74 rows=1 width=143)
     Index Cond: ((trn_old_date >= '1994-08-23'::date) AND (trn_old_date <= '2004-08-23'::date))
     Filter: ((trn_patno = 19) AND (trn_bill_inc = 'B'::bpchar))
   (713 rows)
 
 These queries are different. The first returns 687 rows and the second 
 713 rows.

The 687 and 713 are the number of rows in the plan, not the number of rows 
the queries return.

-- 
/Dennis Björklund




Re: [PERFORM] Performance Bottleneck

2004-08-03 Thread Dennis Bjorklund
On Tue, 3 Aug 2004, Martin Foster wrote:

 to roughly 175 or more. Essentially, the machine seems to struggle 
 to keep up with continual requests and slows down respectively as 
 resources are tied down.

I suggest you try to find queries that are slow and check to see if the 
plans are optimal for those queries.

There are some logging options for logging queries that run longer than a 
user-set limit. That can help in finding the slow queries. Just doing some 
logging for some typical page fetches often shows things that can be done 
better. For example, it's not uncommon to see the same information being 
pulled several times by mistake.

Maybe you can also try something like connection pooling. I'm not sure how
much that can give, but for small queries the connection time is usually
the big part.

 Would disabling 'fsync' provide more performance if I choose that
 information may be lost in case of a crash?

I would not do that. In most cases the performance increase is modest and
the data corruption risk after a crash is much bigger, so it's not worth
it.

If you have a lot of small inserts then it might be faster with this, but
if possible it's much better to try to do more work in one transaction than 
before.

-- 
/Dennis Björklund




Re: [PERFORM] Query gets slow when where clause increases

2004-07-01 Thread Dennis Bjorklund
On Tue, 29 Jun 2004, James Antill wrote:

 ->  Index Scan using idx_ticket_groups_assigned on ticket_groups g  (cost=0.00..241.76 rows=5 width=20) (actual time=0.13..12.67 rows=604 loops=1)
       Index Cond: (assigned_to = 1540)

Here the planner estimated that it would find 5 rows, but it did find 604.  
I take that as a sign that you have not run VACUUM ANALYZE recently?

If you have done that, then maybe you need to change the statistics target
for that column. Before you set it on that column you could try to just
alter the default statistics target for one session like this:

SET default_statistics_target TO 100;
ANALYZE;

and then see if you get a better plan when you run the query afterwards.

If it helps you can either set the default_statistics_target in
postgresql.conf or set it just for some column using ALTER TABLE.
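
For the column in the plan above that would be something like:

  ALTER TABLE ticket_groups ALTER COLUMN assigned_to SET STATISTICS 100;
  ANALYZE ticket_groups;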

-- 
/Dennis Björklund




Re: [PERFORM] no index-usage on aggregate-functions?

2004-06-29 Thread Dennis Bjorklund
On Tue, 29 Jun 2004, Harald Lau (Sector-X) wrote:

  Average and sum can never use an index AFAIK, in any db server.  You 
  need information from every row.
 
 Take a look at the SQLSrv-pendant:
 create index x_1 on the_table (num_found)
 select avg(num_found) from the_table
 - Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1])

But the question is: is it really faster?

This sum needs all the values in that column. As far as I know it uses the
index because it uses less space on disk and thus is a little faster due
to less IO. In pg the index doesn't work like that, so in pg it's faster
to sum all values using the table itself.

If you have a WHERE clause to only sum some values, then pg will use an
index (if applicable) and you will see a speedup.

For min and max the situation is different; there an index can give you
the answer without scanning all rows. For that a workaround exists in pg.  
The pg aggregate functions are very general and no one has special-cased
min/max yet. Until that happens the workaround works and is fast.
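
Using the table and column from your example, the workaround looks like
this:

  -- instead of SELECT max(num_found) FROM the_table;
  SELECT num_found FROM the_table ORDER BY num_found DESC LIMIT 1;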

 So, it seems that PG is not s well suited for a datawarehouse and/or
 performing extensive statistics/calculations/reportings on large tables,
 is it?

I don't see how you can say that from your example. Just because it uses
an index for the sum above does not mean that it is a lot faster. It still 
has to do as many additions as pg has to do.

Sure, mvcc is best when you have both reads and writes. But it should still
be comparable in speed even if you only do reads.

-- 
/Dennis Björklund




Re: [PERFORM] Quad processor options

2004-05-11 Thread Dennis Bjorklund
On Tue, 11 May 2004, Bjoern Metzdorf wrote:

 I am curious if there are any real life production quad processor setups 
 running postgresql out there. Since postgresql lacks a proper 
 replication/cluster solution, we have to buy a bigger machine.

Do you run the latest version of PG? I've read the thread but have not 
seen any information about which pg version you use. All I've seen was a 
reference to debian, which might just as well mean that you run pg 7.2 
(probably not, but I have to ask).

Some classes of queries run much faster in pg 7.4 than in older versions,
so if you are lucky that can help.

-- 
/Dennis Björklund




Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Dennis Bjorklund
On Fri, 30 Apr 2004, Gary Doades wrote:

 I should have also pointed out that MSSQL reported that same index scan
 as taking 65% of the overall query time. It was just faster. The
 overall query took 103ms in MSSQL.

Are your results based on a single client accessing the database and no 
concurrent updates?

Would adding more clients, and maybe having some client that
updates/inserts into the tables, still make mssql faster than pg? Maybe
it's as simple as pg being optimized for more concurrent users than mssql?

I'm just asking, I don't know much about the inner workings of 
mssql.

-- 
/Dennis Björklund




Re: [PERFORM] Horribly slow hash join

2004-04-18 Thread Dennis Bjorklund
On Sat, 17 Apr 2004, Tom Lane wrote:

 *some* set of inputs.  (Also, I have been harboring some notions of
 supporting cross-type hash joins for integer types, which will not work
 unless small int8 values hash the same as int4 etc.)

The simple solution would be to always extend integers to 64 bits (or
whatever the biggest integer is) before calculating the hash. It makes the
hash function a little slower for smaller types, but it's mostly an
operation in the cpu with no memory involved, so it's probably not
noticeable.

-- 
/Dennis Björklund




Re: [PERFORM] Horribly slow hash join

2004-04-18 Thread Dennis Bjorklund
On Sun, 18 Apr 2004, Tom Lane wrote:

 That creates portability issues though.  We do not depend on there being
 a 64-bit-int type for anything except int8 itself, and I don't want to
 start doing so.

What do you mean? int8 is supported on all platforms, and if the 
hash function converted all numbers to int8 before making the hash it 
would work.

I don't see any portability problems.

-- 
/Dennis Björklund




Re: [PERFORM] Horribly slow hash join

2004-04-18 Thread Dennis Bjorklund
On Sun, 18 Apr 2004, Tom Lane wrote:

  What do you mean? int8 is supported on all platformas
 
 No it isn't.

So on platforms where it isn't you would use int4 as the biggest int then. 
I don't really see that as a problem. As long as you calculate the hash on 
the biggest int on that platform it should work.

-- 
/Dennis Björklund




Re: [PERFORM] Horribly slow hash join

2004-04-18 Thread Dennis Bjorklund
On Sun, 18 Apr 2004, Bruno Wolff III wrote:

 Another option would be to put the numbers into two int4s. For int4 or
 smaller types one of these would be zero. int8s would be split between
 the two. The hash function would then be defined on the two int4s.

Sure, this is an internal calculation in the hash function. The only 
important thing is that the number 7 (for example) gives the same hash 
value no matter if it is an int2 or an int8, and that the hash function 
works well also for int8 numbers (which it does not today).

At least that was the properties I understood that we wanted.

We got sidetracked into talking about what datatypes exist on all 
platforms; that's not an issue at all.

-- 
/Dennis Björklund




Re: [PERFORM] select count(*) on large tables

2004-04-08 Thread Dennis Bjorklund
On Thu, 8 Apr 2004, Cosimo Streppone wrote:

 The alternative solution I tried, that has an optimal
 speed up, unfortunately is not a way out, and it is based
 on EXPLAIN SELECT count(*) output parsing, which
 is obviously *not* reliable.

Try this to get the estimate:

   SELECT relname, reltuples from pg_class order by relname;

 The times always get better doing a vacuum (and eventually
 reindex) of the table, and they slowly lower again.

Yes, the estimate is updated by the analyze.

 Is there an estimate time for this issue to be resolved?

It's not so easy to fix. The naive fixes make other operations slower,
most notably they make things less concurrent, which is bad since it won't
scale as well for many users.

You can always keep the count yourself and have some triggers that update 
the count on each insert and delete on the table. It will of course make 
all inserts and deletes slower, but if you count all rows often maybe it's 
worth it. Most people do not need to count all rows in a table anyway; you 
usually count all rows that match some condition.
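
A minimal sketch of that approach (the table name mytable and the counter
table are made-up names):

  CREATE TABLE mytable_count (n bigint NOT NULL);
  INSERT INTO mytable_count VALUES (0);

  CREATE FUNCTION mytable_count_trig() RETURNS trigger AS '
  BEGIN
    IF TG_OP = ''INSERT'' THEN
      UPDATE mytable_count SET n = n + 1;
    ELSE
      UPDATE mytable_count SET n = n - 1;
    END IF;
    RETURN NULL;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER mytable_count_trigger AFTER INSERT OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

  -- counting all rows is then a single-row read:
  SELECT n FROM mytable_count;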

-- 
/Dennis Björklund




Re: [PERFORM] good pc but bad performance,why?

2004-04-07 Thread Dennis Bjorklund
On Wed, 7 Apr 2004, huang yaqin wrote:

 You said turning fsync off may cause losing data, that's terrible. I use
 SCSI disk, and file system is ext3. I tune postgresql.conf and can't get
 any improvement. So what can I do?

Make sure you do as much as possible inside one transaction. If you want 
to do 1000 inserts, then do BEGIN; insert ; insert; ... ; COMMIT;

-- 
/Dennis Björklund




Re: [PERFORM] simple query join

2004-03-07 Thread Dennis Bjorklund
On Mon, 8 Mar 2004, Chris Smith wrote:

  assetid| integer | not null default 0

  assetid| character varying(255) | not null default '0'

The types above do not match, and these are the attributes you use to 
join.

-- 
/Dennis Björklund




Re: [PERFORM] Fixed width rows faster?

2004-03-05 Thread Dennis Bjorklund
On Fri, 5 Mar 2004, Jeff wrote:

 Is there any performance advantage to using a fixed width row (with PG)?

As far as I know there is only a small win when you want to extract some
field from a tuple, since with variable width fields you have to walk to the
correct field. But this is a small performance problem unless you have
very many variable size columns in the table.

 different databases and we have also some custom software here that 
 uses fixed width rows to be able to hit row N in O(1)

This can not happen in pg since there is no row N. Every transaction can 
have a different view of the table, some rows are visible and some others 
are not. To find row N you have to walk from the start and inspect every 
tuple to see if it's visible to this transaction or not.

-- 
/Dennis Björklund




Re: [PERFORM] query optimization question

2004-01-30 Thread Dennis Bjorklund
On Thu, 29 Jan 2004, Tom Lane wrote:

  jackdb-# GROUP BY memberid_ HAVING ( 
 
 Um, that's not what I had in mind at all.  Does GROUP BY actually do
 anything at all here?  (You didn't answer me as to whether memberid_
 is a unique identifier or not, but if it is, this GROUP BY is just an
 expensive no-op.)

From your comment I assume that there is no transformation in pg that 
detects that the group by columns are unique?

 this is all blue-sky speculation anyway.  What I'm actually more
 interested in is your statement that MSSQL can do the original query
 quickly.  I find that a bit hard to believe because I don't see any
 relevant optimization techniques.

Getting rid of the group by would not give that kind of speedup? Maybe
mssql manages to rewrite the query like that before executing it.

-- 
/Dennis Björklund




Re: [PERFORM] query optimization question

2004-01-30 Thread Dennis Bjorklund
On Thu, 29 Jan 2004, Jack Coates wrote:

  Probably better to repost it as a gzip'd attachment.  That should
 
 complete with a picture of the GUI version. 26k zipped, let's see if
 this makes it through.

Are you sure you attached it?

At least when it got here there was no attachment.

-- 
/Dennis Björklund




Re: [PERFORM] Really slow even after vacuum

2004-01-21 Thread Dennis Bjorklund
On Wed, 21 Jan 2004, Jeroen Baekelandt wrote:

 jms_messages again. It takes 80 seconds!?! While before, with 1000
 records, it took only a fraction of a second.

run: VACUUM FULL ANALYZE;

-- 
/Dennis Björklund




Re: [PERFORM] Explain not accurate

2004-01-11 Thread Dennis Bjorklund
On Fri, 9 Jan 2004, Richard van den Berg wrote:

 problems. However, I've run into an issue where explain tells us a the 
 costs of a quiry are tremendous (105849017586), but the query actually 
 runs quite fast. Even explain analyze shows these costs.

It would be helpful if you can show the query and the EXPLAIN ANALYZE of
the query (and not just EXPLAIN).

 This makes me wonder: can the estimates explain shows be dead wrong?

Of course they can. An estimation is just an estimation. If you have not
analyzed the database then it's most likely wrong. Dead wrong is not
common, but not impossible.

Run VACUUM ANALYZE and see if the estimate is better after that.

-- 
/Dennis Björklund




Re: [PERFORM] annoying query/planner choice

2004-01-11 Thread Dennis Bjorklund
On Sun, 11 Jan 2004, Andrew Rawnsley wrote:

 20-25% of the time. Fiddling with CPU_TUPLE_COST doesn't do anything
 until I exceed 0.5, which strikes me as a bit high (though please
 correct me if I am assuming too much...). RANDOM_PAGE_COST seems to have
 no effect.

What about the effective cache size, is that set properly?

-- 
/Dennis Björklund




Re: [PERFORM] Question about difference in performance of 2 queries

2003-12-29 Thread Dennis Bjorklund
On Mon, 29 Dec 2003, Sean Shanny wrote:

 The first plan below has horrendous performance.  we only get about 2% 
 CPU usage and iostat shows 3-5 MB/sec IO.  The second plan runs at 30% 
 cpu and 15-30MB.sec IO. 
 
 Could someone shed some light on why the huge difference in 
 performance?  Both are doing index scans plus a filter.  We have no 
 content_keys below -1 at this time so the queries return the same results.

EXPLAIN ANALYZE gives more information than EXPLAIN, and is preferred.

It uses different indexes in the two queries, and one seems to be 
faster than the other. Why, I can't tell yet.

I would assume that you would get the fastest result if you had an index on

   (content_key, date_key)

I don't know if pg will even use an index to speed up a <> operation. When 
you had > then it could use the idx_pageviews_content index. Why it chose 
that when the other would be faster I don't know. Maybe explain analyze 
will give some hint.

-- 
/Dennis




Re: [PERFORM] Why is restored database faster?

2003-12-18 Thread Dennis Bjorklund
On Thu, 18 Dec 2003, Shridhar Daithankar wrote:

 Well, then the only issue left is file sytem defragmentation.

And the internal fragmentation that can be fixed with the CLUSTER 
command.

-- 
/Dennis




Re: [PERFORM] Why is restored database faster?

2003-12-16 Thread Dennis Bjorklund
On Tue, 16 Dec 2003, David Shadovitz wrote:

 I backed up my database using pg_dump, and then restored it onto a different 
 server using psql.  I see that the query SELECT COUNT(*) FROM myTable 
 executes immediately on the new server but takes several seconds on the old 
 one.  (The servers are identical.)
 
 What could account for this difference?  Clustering?  How can I get the 
 original server to perform as well as the new one?

You probably need to run VACUUM FULL. It locks the tables during its 
execution so only do it when the database is not in full use.

If this helps you probably need to do normal vacuums more often and maybe
tune the max_fsm_pages to be bigger. 

-- 
/Dennis




Re: [PERFORM] a lot of problems with pg 7.4

2003-12-14 Thread Dennis Bjorklund
On Sat, 13 Dec 2003, Kari Lavikka wrote:

 I evaluated pg 7.4 on our development server and it looked just fine
 but performance with production loads seems to be quite poor. Most of
 performance problems are caused by nonsensical query plans

Some of the estimates that pg made in the plans you showed were way off. I 
assume you have run VACUUM ANALYZE recently? If that does not help maybe 
you need to increase the statistics gathering on some columns so that pg 
makes better estimates. With the wrong statistics it's not strange that pg 
chooses bad plans.

-- 
/Dennis




Re: [PERFORM] Optimize

2003-11-24 Thread Dennis Bjorklund
On Mon, 24 Nov 2003, Torsten Schulz wrote:

 sort_mem = 32768 # min 32

32 meg per sort can be a lot in total if you have many clients sorting 
things. I assume you have checked so that the computer is not pushed into 
swapping when you have a peak with lots of users. A swapping computer is 
never fast.

Using some swap space is not bad, but a lot of paging in and out to the
swap is not good.

-- 
/Dennis




Re: [PERFORM] Help with count(*)

2003-11-15 Thread Dennis Bjorklund
On Fri, 14 Nov 2003, Tom Lane wrote:

 I believe the previous discussion also agreed that we wanted to postpone
 the freezing of now(), which currently also happens at BEGIN rather than
 the first command after BEGIN.

Or should that happen at the first call to now()?

/me should go back and try to find this previous discussion.

-- 
/Dennis




Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Dennis Bjorklund
On Mon, 10 Nov 2003, Marc G. Fournier wrote:

 
 explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
 FROM company c, traffic_logs ts
WHERE c.company_id = ts.company_id
  AND month_trunc(ts.runtime) = '2003-10-01'
 GROUP BY company_name,ts.company_id;

What if you do

  ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'

and add an index like (runtime, company_name, company_id)?


-- 
/Dennis




Re: [PERFORM] One or more processor ?

2003-10-10 Thread Dennis Bjorklund
On Fri, 10 Oct 2003, Hervé Piedvache wrote:

 A simple question about PostgreSQL ... I have a Pentium Xeon Quadri
 processors ... If I do a SQL request ... does PostgreSQL use one or more
 processor ?

Each connection becomes a process, and each process runs on one processor. 
So, with only one connection you use only one processor (and the OS might 
use another processor). Most databases have many concurrent users, and then 
more processors will be used.

-- 
/Dennis




Re: [PERFORM] OFFTOPIC: PostgreSQL vs MySQL

2003-10-09 Thread Dennis Bjorklund
On Thu, 9 Oct 2003, David Griffiths wrote:

   PostgreSQL supports constraints. MySQL doesn't; programmers need to
   take care of that from the client side
   Again, InnoDB supports constraints.
 
  Really?  This is news.   We did some tests on constraints on InnoDB, and
  found that while they parsed, they were not actually enforced.Was 
  our test in error?
 
 You may have turned them off to load data? I've run into constraints
 when my data-load script missed some rows in address_type. When it went
 to do the address_list table, all rows that had the missing address_type
 failed, as they should. I saw no weakness in the constraints.

It sounds like you are talking about foreign keys only, while the previous 
writer talks about other constraints also. For example, in postgresql you 
can do:

CREATE TABLE foo (
  x int,

  CONSTRAINT bar CHECK (x > 5)
);

and then

# INSERT INTO foo VALUES (4);
ERROR:  ExecInsert: rejected due to CHECK constraint bar on foo


I don't know MySQL, but I've got the impression from other posts on the
lists that innodb supports foreign keys only. I might be wrong though.

-- 
/Dennis




Re: [PERFORM] restore time: sort_mem vs. checkpoing_segments

2003-09-24 Thread Dennis Bjorklund
On Tue, 23 Sep 2003, Bruce Momjian wrote:

 With the new warning about too-frequent checkpoints, people have actual
 feedback to encourage them to increase checkpoint_segments.  One issue
 is that it is likely to recommend increasing checkpoint_segments during
 restore, even if there is no value to it being large during normal
 server operation.  Should that be decumented?

One could have a variable that turns off that warning and have pg_dump
insert a statement to turn it off, that is, if one never wants these
warnings during a restore (from a new dump).

In any case, documentation is good and still needed.

-- 
/Dennis




Re: [PERFORM] Reading data in bulk - help?

2003-09-10 Thread Dennis Bjorklund
On Wed, 10 Sep 2003, Chris Huston wrote:

 Sometime before then I need to figure out how to cut the fetch times 
 from the now 200ms to something like 10ms.

You didn't say anything about Josh's first point of adjusting
postgresql.conf to match your machine. Settings like effective_cache_size
you almost always want to increase from the default setting, and also shared 
memory.

-- 
/Dennis

