Re: [PERFORM] App very unresponsive while performing simple update

2006-05-30 Thread Brendan Duddridge
You should realize this will produce a lot of garbage records and mean you'll
have to be running vacuum very frequently. You might consider instead of
updating the main table inserting into a separate clickstream table. That
trades off not getting instantaneous live totals with isolating the
maintenance headache in a single place. That table will grow large but you can
prune it at your leisure without impacting query performance on your main
tables.


We actually already have a table for this purpose. product_click_history



Actually, you can still get instant results, you just have to hit two
tables to do it.


Well, not really for our situation. We use the click_count on product to sort
our product listings by popularity. Joining with our product_click_history to
get live counts would be very slow. Some categories have many tens of
thousands of products. Any joins outside our category_product table tend to be
very slow.

We'll probably have to write a process to update the click_count from querying
our product_click_history table.
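
As an illustration only, such a batch process could be as simple as the
following sketch (the product_id and click_count column names are assumptions,
not taken from the actual schema, and it assumes product_click_history keeps
one row per click):

-- Roll product_click_history up into product.click_count in one pass.
BEGIN;

UPDATE product
SET click_count = h.clicks
FROM (SELECT product_id, count(*) AS clicks
      FROM product_click_history
      GROUP BY product_id) h
WHERE product.product_id = h.product_id;

COMMIT;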




Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On May 31, 2006, at 12:23 AM, Jim C. Nasby wrote:


On Sun, May 28, 2006 at 07:20:59PM -0400, Greg Stark wrote:

Brendan Duddridge <[EMAIL PROTECTED]> writes:

We do have foreign keys on other tables that reference the product table.
Also, there will be updates going on at the same time as this update. When
anyone clicks on a product details link, we issue an update statement to
increment the click_count on the product. e.g. update product set click_count
= click_count + 1;

You should realize this will produce a lot of garbage records and mean you'll
have to be running vacuum very frequently. You might consider instead of
updating the main table inserting into a separate clickstream table. That
trades off not getting instantaneous live totals with isolating the
maintenance headache in a single place. That table will grow large but you can
prune it at your leisure without impacting query performance on your main
tables.

Actually, you can still get instant results, you just have to hit two
tables to do it.

More likely you were blocking on some lock. Until that other query holding
that lock tries to commit Postgres won't actually detect a deadlock, it'll
just sit waiting until the lock becomes available.

Wow, are you sure that's how it works? I would think it would be able to
detect deadlocks as soon as both processes are waiting on each other's
locks.
--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---

TIP 5: don't forget to increase your free space map settings





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?

2006-05-30 Thread Jim C. Nasby
On Tue, May 30, 2006 at 07:05:08PM -0400, D'Arcy J.M. Cain wrote:
> On Tue, 30 May 2006 17:54:00 -0500
> "Dave Dutcher" <[EMAIL PROTECTED]> wrote:
> > What I do when I'm feeling lazy is execute a delete statement and then
> > an insert.  I only do it when I'm inserting/updating a very small number
> > of rows, so I've never worried if it's optimal for performance.  Besides
> > I've heard that an update in postgres is similar in performance to a
> > delete/insert.
> 
> Well, they are basically the same operation in PostgreSQL.  An update
> adds a row to the end and marks the old one dead.  A delete/insert
> marks the row dead and adds one at the end.  There may be some
> optimization if the engine does both in one operation.

The new tuple will actually go on the same page during an update, if
possible. If not, the FSM is consulted. Appending to the end of the
table is a last resort.

Update is more efficient than delete/insert. First, it's one less
statement to parse and plan. Second, AFAIK insert always goes to the
FSM; it has no way to know you're replacing the row(s) you just deleted.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] App very unresponsive while performing simple update

2006-05-30 Thread Jim C. Nasby
On Sun, May 28, 2006 at 07:20:59PM -0400, Greg Stark wrote:
> Brendan Duddridge <[EMAIL PROTECTED]> writes:
> 
> > We do have foreign keys on other tables that reference the product  table.
> > Also, there will be updates going on at the same time as this  update. When
> > anyone clicks on a product details link, we issue an  update statement to
> > increment the click_count on the product. e.g.  update product set 
> > click_count
> > = click_count + 1;
> 
> You should realize this will produce a lot of garbage records and mean you'll
> have to be running vacuum very frequently. You might consider instead of
> updating the main table inserting into a separate clickstream table. That
> trades off not getting instantaneous live totals with isolating the
> maintenance headache in a single place. That table will grow large but you can
> prune it at your leisure without impacting query performance on your main
> tables.
 
Actually, you can still get instant results, you just have to hit two
tables to do it.

> More likely you were blocking on some lock. Until that other query holding
> that lock tries to commit Postgres won't actually detect a deadlock, it'll
> just sit waiting until the lock becomes available.

Wow, are you sure that's how it works? I would think it would be able to
detect deadlocks as soon as both processes are waiting on each other's
locks.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Why the 8.1 plan is worst than 7.4?

2006-05-30 Thread Tom Lane
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes:
> Why is the plan worse in postgres 8.1?

(1) you have not actually shown us that the plan is worse.  If you are
complaining that the planner is wrong, EXPLAIN output (which contains
only the planner's estimates) is useless for proving your point.  Show
EXPLAIN ANALYZE.

(2) Have you ANALYZEd these tables recently in either database?  The
discrepancies in estimated rowcounts suggest that the two planners
are working with different statistics.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Query performance

2006-05-30 Thread Christopher Kings-Lynne
I'm executing the queries from phpPgAdmin. 
The above are for explain analyse. I was referring to the pure query

execution time.
Does anyone have an idea why the OR-query takes so long?
Any server-side tuning possibilities? I wouldn't like to change the code of
ldap's back-sql...


If you're using phpPgAdmin's timings, they could be more off than the 
real explain analyze timings.  Make sure you're using the figure given 
by explain analyze itself.


Chris


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Bulk loading/merging

2006-05-30 Thread Jim C. Nasby
Your best bet is to do this as a single, bulk operation if possible.
That way you can simply do an UPDATE ... WHERE EXISTS followed by an
INSERT ... SELECT ... WHERE NOT EXISTS.
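
For illustration, a rough sketch of that two-statement approach, assuming the
incoming batch has already been bulk-loaded into a staging table (the target
table db, staging table db_load, and their columns are all made up, with one
staging row per key):

BEGIN;

UPDATE db
SET b = (SELECT l.b FROM db_load l WHERE l.a = db.a)
WHERE EXISTS (SELECT 1 FROM db_load l WHERE l.a = db.a);

INSERT INTO db (a, b, c, d)
SELECT l.a, l.b, l.c, l.d
FROM db_load l
WHERE NOT EXISTS (SELECT 1 FROM db WHERE db.a = l.a);

COMMIT;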

On Fri, May 26, 2006 at 02:48:20PM -0400, Worky Workerson wrote:
> I've set up something similar the 'recommended' way to merge data into
> the DB, i.e.
> 
> http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
> 
> however I did it with a trigger on insert, i.e. (not my schema :) ):
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Why the 8.1 plan is worst than 7.4?

2006-05-30 Thread Jim C. Nasby
What's explain analyze show?

On Fri, May 26, 2006 at 09:04:56AM -0300, [EMAIL PROTECTED] wrote:
> Hi, 
> 
> I have 2 servers, one of them has a 7.4 postgres and the other has a 8.1 
> 
> I have this query: 
> 
> select fagrempr,fagrdocr,fagrserr,fagrparr 
> from arqcfat 
> left join arqfagr on fagrorig = 'CFAT' and fagrdocu = cfatdocu and fagrempe 
> = cfatempe and fagrseri = cfatseri 
> where cfatdata between '2006-01-01' and '2006-01-31' 
> and cfattipo = 'VD' 
> and cfatstat <> 'C' 
> and fagrform = 'CT' 
> and fagrtipr = 'REC' 
> group by fagrempr,fagrdocr,fagrserr,fagrparr 
> 
> The 8.1 give me this plan: 
> 
>  HashAggregate  (cost=59.07..59.08 rows=1 width=20) 
>->  Nested Loop  (cost=0.00..59.06 rows=1 width=20) 
>  ->  Index Scan using arqfagr_arqfa3_key on arqfagr  
> (cost=0.00..53.01 rows=1 width=36) 
>Index Cond: ((fagrorig = 'CFAT'::bpchar) AND (fagrform = 
> 'CT'::bpchar)) 
>Filter: (fagrtipr = 'REC'::bpchar) 
>  ->  Index Scan using arqcfat_arqcfat1_key on arqcfat  
> (cost=0.00..6.03 rows=1 width=16) 
>Index Cond: (("outer".fagrempe = arqcfat.cfatempe) AND 
> ("outer".fagrdocu = arqcfat.cfatdocu) AND ("outer".fagrseri = 
> arqcfat.cfatseri)) 
>Filter: ((cfatdata >= '01-01-2006'::date) AND (cfatdata <= 
> '31-01-2006'::date) AND (cfattipo = 'VD'::bpchar) AND (cfatstat <> 
> 'C'::bpchar)) 
> 
> The 7.4 give me this plan: 
> 
> HashAggregate  (cost=2163.93..2163.93 rows=1 width=19) 
>->  Nested Loop  (cost=0.00..2163.92 rows=1 width=19) 
>  ->  Index Scan using arqcfat_arqcfat2_key on arqcfat  
> (cost=0.00..2145.78 rows=3 width=15) 
>Index Cond: ((cfatdata >= '01-01-2006'::date) AND (cfatdata 
> <= '31-01-2006'::date)) 
>Filter: ((cfattipo = 'VD'::bpchar) AND (cfatstat <> 
> 'C'::bpchar)) 
>  ->  Index Scan using arqfagr_arqfa1_key on arqfagr  
> (cost=0.00..6.03 rows=1 width=34) 
>Index Cond: ((arqfagr.fagrorig = 'CFAT'::bpchar) AND 
> (arqfagr.fagrempe = "outer".cfatempe) AND (arqfagr.fagrdocu = 
> "outer".cfatdocu) AND (arqfagr.fagrseri = "outer".cfatseri)) 
>Filter: ((fagrform = 'CT'::bpchar) AND (fagrtipr = 
> 'REC'::bpchar)) 
> 
> Why is the plan worse in postgres 8.1?
> 
> I know the best plan is to read first the table which has a date index, as
> 7.4 did, because in a few days I will have only a few rows, so the query will
> be faster.
> 
> Is there something I have to change in 8.1 to make the plans like the 7.4 ones?
> 
> Thanks , 
> 
> Waldomiro C. Neto. 
> 
> 
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?

2006-05-30 Thread D'Arcy J.M. Cain
On Tue, 30 May 2006 17:54:00 -0500
"Dave Dutcher" <[EMAIL PROTECTED]> wrote:
> What I do when I'm feeling lazy is execute a delete statement and then
> an insert.  I only do it when I'm inserting/updating a very small number
> of rows, so I've never worried if it's optimal for performance.  Besides
> I've heard that an update in postgres is similar in performance to a
> delete/insert.

Well, they are basically the same operation in PostgreSQL.  An update
adds a row to the end and marks the old one dead.  A delete/insert
marks the row dead and adds one at the end.  There may be some
optimization if the engine does both in one operation.

-- 
D'Arcy J.M. Cain  |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?

2006-05-30 Thread Dave Dutcher
What I do when I'm feeling lazy is execute a delete statement and then
an insert.  I only do it when I'm inserting/updating a very small number
of rows, so I've never worried if it's optimal for performance.  Besides
I've heard that an update in postgres is similar in performance to a
delete/insert.


> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of PFC
> Sent: Tuesday, May 30, 2006 5:35 PM
> To: Jonah H. Harris; Waldomiro
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
> 
> 
> > PostgreSQL does not support MERGE at the moment, sorry.
> 
>   Issue an UPDATE, and watch the rowcount ; if the 
> rowcount is 0, issue an  
> INSERT.
>   Be prepared to retry if another transaction has 
> inserted the row  
> meanwhile, though.
> 
>   MERGE would be really useful.
> 
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings
> 


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?

2006-05-30 Thread Alvaro Herrera
PFC wrote:
> >PostgreSQL does not support MERGE at the moment, sorry.
> 
>   Issue an UPDATE, and watch the rowcount ; if the rowcount is 0, 
>   issue an  INSERT.
>   Be prepared to retry if another transaction has inserted the row  
> meanwhile, though.

Oh, you mean, like the example that's in the documentation?

http://www.postgresql.org/docs/8.1/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

Example 36-1

>   MERGE would be really useful.

It has been discussed before -- MERGE is something different.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Optimizing a huge_table/tiny_table join

2006-05-30 Thread PFC


SELECT * FROM huge_table h, tiny_table t WHERE UPPER( h.id ) =  
UPPER( t.id )


What about :

SELECT * FROM huge_table h WHERE UPPER(id) IN (SELECT upper(id) FROM  
tiny_table t)


	Or, try opening a cursor on your original query and using FETCH. It might  
result in a different plan.

Or lower random_page_cost.
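
	For reference, the cursor idea mentioned above would look something like
this (the cursor name and batch size are made up):

BEGIN;

DECLARE huge_tiny_cur CURSOR FOR
    SELECT * FROM huge_table h, tiny_table t
    WHERE UPPER( h.id ) = UPPER( t.id );

FETCH 1000 FROM huge_tiny_cur;   -- repeat until no more rows are returned

CLOSE huge_tiny_cur;
COMMIT;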

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?

2006-05-30 Thread PFC

PostgreSQL does not support MERGE at the moment, sorry.


	Issue an UPDATE, and watch the rowcount ; if the rowcount is 0, issue an  
INSERT.
	Be prepared to retry if another transaction has inserted the row  
meanwhile, though.


MERGE would be really useful.
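
	A minimal sketch of that update-then-insert loop, along the lines of the
error-trapping example in the plpgsql docs (the counters(k, v) table is made
up; k needs a primary key or unique constraint for the unique_violation retry
to fire):

CREATE OR REPLACE FUNCTION upsert_counter(p_k integer, p_v integer) RETURNS void AS
$$
BEGIN
    LOOP
        -- First try the UPDATE and watch the rowcount via FOUND.
        UPDATE counters SET v = p_v WHERE k = p_k;
        IF FOUND THEN
            RETURN;
        END IF;
        -- Rowcount was 0: try to INSERT, and retry if another
        -- transaction inserted the row meanwhile.
        BEGIN
            INSERT INTO counters (k, v) VALUES (p_k, p_v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- loop around and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;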


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Getting even more insert performance (250m+rows/day)

2006-05-30 Thread Daniel J. Luke

On May 30, 2006, at 3:59 PM, Daniel J. Luke wrote:
I should have gprof numbers on a similarly set up test machine soon ...

gprof output is available at http://geeklair.net/~dluke/postgres_profiles/


(generated from CVS HEAD as of today).

Any ideas are welcome.

Thanks!
--
Daniel J. Luke
++
| * [EMAIL PROTECTED] * |
| *-- http://www.geeklair.net -* |
++
|   Opinions expressed are mine and do not necessarily   |
|  reflect the opinions of my employer.  |
++






Re: [PERFORM] Split select completes, single select doesn't and becomes IO bound!

2006-05-30 Thread Ragnar
On þri, 2006-05-30 at 10:26 +1000, Anthony Ransley wrote:
> Can any one explain why the following query
> 
> select f(q) from
> (
> select * from times
> where '2006-03-01 00:00:00'<=q and q<'2006-03-08 00:00:00'
> order by q
> ) v;
> 
> never completes, but splitting up the time span into single days does work.
> 
> select f(q) from
> (
> select * from times
> where '2006-03-01 00:00:00'<=q and q<'2006-03-02 00:00:00'
> order by q
> ) v;

first question: is f() relevant to your problem?

I mean do you see the same effect with:
  select q from
  (
select * from times
where '2006-03-01 00:00:00'<=q and q<'2006-03-08 00:00:00'
order by q
  ) v;

or even:
  select q from times
 where '2006-03-01 00:00:00'<=q and q<'2006-03-08 00:00:00'
 order by q


if f() is needed to make this happen show us f()

if f() is not relevant, show us the simplest cases where
you see this. show us EXPLAIN on the query that does not
finish, show us EXPLAIN ANALYZE on the queries that do.

second question: what indexes exist on the table "times" ?

another question: how many rows in the table ?

next question: is the table newly ANALYZED?

finally: what version of postgresql are you using?


Without more info, it is difficult to guess what
your problem is, but possibly you need to increase
the statistics target of column "q".

gnari




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?

2006-05-30 Thread Jonah H. Harris

On 5/30/06, Waldomiro <[EMAIL PROTECTED]> wrote:

 Is there a command to insert a record if it does not exist and update it if
it exists?


Sure, it's called MERGE.  See http://en.wikipedia.org/wiki/Merge_%28SQL%29


 I mean Postgres should test whether a record exists before inserting, and if
it exists then Postgres must do an update instead of an insert.


PostgreSQL does not support MERGE at the moment, sorry.
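
Purely as an illustration of the statement being discussed (PostgreSQL will
not accept it at this point), a standard SQL MERGE looks roughly like this,
with made-up table and column names:

MERGE INTO target t
USING incoming i ON (t.id = i.id)
WHEN MATCHED THEN
    UPDATE SET val = i.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (i.id, i.val);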

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] INSERT OU UPDATE WITHOUT SELECT?

2006-05-30 Thread Waldomiro




Hi,

Is there a command to insert a record if it does not exist and update it if
it exists?

I do not want to do a select before an insert or update.

I mean Postgres should test whether a record exists before inserting, and if
it exists then Postgres must do an update instead of an insert.

Thanks,

WMiro.






[PERFORM] Split select completes, single select doesn't and becomes IO bound!

2006-05-30 Thread Anthony Ransley

Can any one explain why the following query

select f(q) from
(
   select * from times
   where '2006-03-01 00:00:00'<=q and q<'2006-03-08 00:00:00'
   order by q
) v;

never completes, but splitting up the time span into single days does work.

select f(q) from
(
   select * from times
   where '2006-03-01 00:00:00'<=q and q<'2006-03-02 00:00:00'
   order by q
) v;
select f(q) from
(
   select * from times
   where '2006-03-02 00:00:00'<=q and q<'2006-03-03 00:00:00'
   order by q
) v;
...
select f(q) from
(
   select * from times
   where '2006-03-07 00:00:00'<=q and q<'2006-03-08 00:00:00'
   order by q
) v;

The stored procedure f(q) takes a timestamp and does a select and a 
calculation and then an update of a results table. The times table 
contains only about 100 rows per day. It is also observed that the CPU 
starts the query with 100% usage and then slowly swings up and down 
from 100% to 20% over the first half hour, and then by the following 
morning the query is still running and the CPU usage is 3-5%. I/O bound, 
I'm guessing, as the HDD is in constant use at 5 to 15 MB per second.
In contrast, the query that is split up into days has 100% CPU usage 
all the way through to its completion, which only takes twenty minutes 
each. The computer is not being used for anything else, and is a dual 
core Athlon 4400+ with 4GB of RAM.


Thanks for any information you can give on this.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Query performance

2006-05-30 Thread Erwin Brandstetter

Antonio Batovanja wrote:
(...)


1) the slow query:
EXPLAIN ANALYZE SELECT DISTINCT ldap_entries.id, organization.id,
text('organization') AS objectClass, ldap_entries.dn AS dn FROM
ldap_entries, organization, ldap_entry_objclasses WHERE
organization.id=ldap_entries.keyval AND ldap_entries.oc_map_id=1 AND
upper(ldap_entries.dn) LIKE '%DC=HUMANOMED,DC=AT' AND 1=1 OR
(ldap_entries.id=ldap_entry_objclasses.entry_id AND
ldap_entry_objclasses.oc_name='organization');



First, presenting your query in any readable form might be helpful if 
you want the community to help you. (Hint! Hint!)


SELECT DISTINCT ldap_entries.id, organization.id,
text('organization') AS objectClass, ldap_entries.dn AS dn
  FROM ldap_entries, organization, ldap_entry_objclasses
 WHERE organization.id=ldap_entries.keyval
   AND ldap_entries.oc_map_id=1
   AND upper(ldap_entries.dn) LIKE '%DC=HUMANOMED,DC=AT'
   AND 1=1
   OR (ldap_entries.id=ldap_entry_objclasses.entry_id
   AND ldap_entry_objclasses.oc_name='organization');

Next, you might want to use aliases to make it more readable.

SELECT DISTINCT e.id, o.id, text('organization') AS objectClass, e.dn AS dn
  FROM ldap_entries AS e, organization AS o, ldap_entry_objclasses AS eo
 WHERE o.id=e.keyval
   AND e.oc_map_id=1
   AND upper(e.dn) LIKE '%DC=HUMANOMED,DC=AT'
   AND 1=1
   OR (e.id=eo.entry_id
   AND eo.oc_name='organization');

There are a couple redundant (nonsensical) items, syntax-wise. Let's 
strip these:


SELECT DISTINCT e.id, o.id, text('organization') AS objectClass, e.dn
  FROM ldap_entries AS e, organization AS o, ldap_entry_objclasses AS eo
 WHERE o.id=e.keyval
   AND e.oc_map_id=1
   AND e.dn ILIKE '%DC=HUMANOMED,DC=AT'
OR e.id=eo.entry_id
   AND eo.oc_name='organization';

And finally, I suspect the lexical precedence of AND and OR might be the 
issue here. 
http://www.postgresql.org/docs/8.1/static/sql-syntax.html#SQL-PRECEDENCE

Maybe that is what you really want (just guessing):

SELECT DISTINCT e.id, o.id, text('organization') AS objectClass, e.dn
  FROM ldap_entries e
  JOIN organization o ON o.id=e.keyval
  LEFT JOIN ldap_entry_objclasses eo ON eo.entry_id=e.id
 WHERE e.oc_map_id=1
   AND e.dn ILIKE '%DC=HUMANOMED,DC=AT'
OR eo.oc_name='organization';

I didn't take the time to read the rest. My apologies if I guessed wrong.


Regards, Erwin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Query performance

2006-05-30 Thread Erwin Brandstetter

Antonio Batovanja wrote:

Laurenz Albe wrote:


Antonio Batovanja wrote:

I'm having trouble understanding, why a specific query on a small
database is taking so long...


Before I try to understand the execution plans:

Have you run ANALYZE on the tables involved before you ran the query?


Hi,

Just to be on the safe side, I've run ANALYZE now.
Here are the query plans for the two queries:


I suspect a misunderstanding here. What Laurenz probably meant is to run 
 analyze on the involved _tables_ so the statistics data is refreshed. 
If the query planner runs with outdated statistics, queries may perform 
very poorly. Try


vacuum full analyze yourdatabase

To fully vacuum your database and analyze all tables.
(vacuum full is extra, but can't hurt.)

http://www.postgresql.org/docs/8.1/static/sql-vacuum.html
http://www.postgresql.org/docs/8.1/static/sql-analyze.html

Regards, Erwin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] Bulk loading/merging

2006-05-30 Thread Worky Workerson

I've set up something similar the 'recommended' way to merge data into
the DB, i.e.

http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

however I did it with a trigger on insert, i.e. (not my schema :) ):

CREATE TABLE db (a INT PRIMARY KEY, b TEXT, c INTEGER, d INET);

CREATE FUNCTION merge_db() RETURNS TRIGGER AS
$$
BEGIN
   UPDATE db SET b = NEW.b
   WHERE a = NEW.a
 AND NOT (c IS DISTINCT FROM NEW.c)
 AND NOT (d IS DISTINCT FROM NEW.d);
   IF found THEN
   RETURN NULL;
   END IF;
   RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER merge_db_tr BEFORE INSERT ON db
FOR EACH ROW EXECUTE PROCEDURE merge_db();

Is this the best/fastest way to do this sort of thing?  I only get
about 50 records/second inserts, while without the trigger (inserting
unmerged data) I can get more like 1000/second. I'm doing the whole
NOT ... IS DISTINCT stuff to handle NULL values that might be in the
columns ... I'm only considering two column keys equal if (a,c,d) are
all the same (i.e. either the same value or both NULL).

I read that there is a race condition with the above method as applied
to a normal function ... does this apply to a trigger as well?

Optimization Questions:
-Can I do better with the trigger function itself?

-I realize that I can create indexes on some of the lookup columns
('key' in the above example).  This would speed up the location of the
update record but slow down the actual update insert, right?  Would
this be a win?  I tested an index on 1 rows, and it beat out the
non-indexed by about 7% (3:31 with index, 3:45 without) ... is this
all the benefit that I can expect?

-Will moving pg_xlog to a different disk help all that much, if the
whole DB is currently on a 4 disk RAID10?  What about moving the
indexes?  I've set up my postgresql.conf according to the docs and
Josh Berkus' presentation, i.e. (16GB ram, quad Opteron machine, not
all settings are relevant):
shared_buffers = 6
temp_buffers = 1
work_mem = 131072
maintenance_work_mem = 524288
effective_cache_size = 12
random_page_cost = 2
wal_buffers = 128
checkpoint_segments = 128
checkpoint_timeout = 3000
max_fsm_pages = 200
max_fsm_relations = 100

-If I break up my dataset into smaller chunks and parallelize it,
could I get better total performance, or would I most likely be
thrashing the disk?

-If I sort the data in the COPY file by key (i.e. a,c,d) before
inserting it into the database, will this help out the DB at all?

-Its cleaner to just be able to insert everything into the database
and let the DB aggregate the records, however I could use some of our
extra hardware to do aggregation in perl and then output the already
aggregated records to the DB ... this has the advantage of being
easily parallelizable but requires a bit of extra work to get right.
Do you think that this is the best way to go?

Also, as a slight aside, without a trigger, COPY seems to process each
record very quickly (using Perl DBI, about 7000 records/second)
however there is a long pause once the last record has been delivered.
Is this just the backend queuing up the insert commands given by
perl, or is there extra processing that needs to be done at the end of
the COPY that could be taking a while (10s on 500K record COPY).

Thanks!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Bulk loading/merging

2006-05-30 Thread Worky Workerson
Another little question ... would using any sort of TEMP table help out, i.e.
loading the unaggregated data into a TEMP table, aggregating the data via a
SELECT INTO another TEMP table, and then finally INSERT ... SELECT into the
master, aggregated, triggered table?  It seems like this might be a win if
A) the TEMP tables fit into memory, and B) the load data aggregates well.
Worst case (i.e. all unique data in the load) seems like it might take much
longer, however, since I'm creating 2 new TEMP tables.
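
Just to make the idea concrete, a sketch against the made-up db(a, b, c, d)
schema from earlier in the thread (the max(b) aggregation rule is arbitrary):

BEGIN;

CREATE TEMP TABLE load_raw (a INT, b TEXT, c INTEGER, d INET);
-- bulk load the unaggregated rows here, e.g. COPY load_raw FROM STDIN

-- pre-aggregate on the (a, c, d) key before touching the big table
SELECT a, max(b) AS b, c, d
INTO TEMP load_agg
FROM load_raw
GROUP BY a, c, d;

-- the BEFORE INSERT trigger on db still merges rows that already exist
INSERT INTO db (a, b, c, d)
SELECT a, b, c, d FROM load_agg;

COMMIT;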


[PERFORM] Why the 8.1 plan is worst than 7.4?

2006-05-30 Thread wmiro
Hi, 

I have 2 servers, one of them has a 7.4 postgres and the other has a 8.1 

I have this query: 

select fagrempr,fagrdocr,fagrserr,fagrparr 
from arqcfat 
left join arqfagr on fagrorig = 'CFAT' and fagrdocu = cfatdocu and fagrempe 
= cfatempe and fagrseri = cfatseri 
where cfatdata between '2006-01-01' and '2006-01-31' 
and cfattipo = 'VD' 
and cfatstat <> 'C' 
and fagrform = 'CT' 
and fagrtipr = 'REC' 
group by fagrempr,fagrdocr,fagrserr,fagrparr 

The 8.1 give me this plan: 

 HashAggregate  (cost=59.07..59.08 rows=1 width=20) 
   ->  Nested Loop  (cost=0.00..59.06 rows=1 width=20) 
 ->  Index Scan using arqfagr_arqfa3_key on arqfagr  
(cost=0.00..53.01 rows=1 width=36) 
   Index Cond: ((fagrorig = 'CFAT'::bpchar) AND (fagrform = 
'CT'::bpchar)) 
   Filter: (fagrtipr = 'REC'::bpchar) 
 ->  Index Scan using arqcfat_arqcfat1_key on arqcfat  
(cost=0.00..6.03 rows=1 width=16) 
   Index Cond: (("outer".fagrempe = arqcfat.cfatempe) AND 
("outer".fagrdocu = arqcfat.cfatdocu) AND ("outer".fagrseri = 
arqcfat.cfatseri)) 
   Filter: ((cfatdata >= '01-01-2006'::date) AND (cfatdata <= 
'31-01-2006'::date) AND (cfattipo = 'VD'::bpchar) AND (cfatstat <> 
'C'::bpchar)) 

The 7.4 give me this plan: 

HashAggregate  (cost=2163.93..2163.93 rows=1 width=19) 
   ->  Nested Loop  (cost=0.00..2163.92 rows=1 width=19) 
 ->  Index Scan using arqcfat_arqcfat2_key on arqcfat  
(cost=0.00..2145.78 rows=3 width=15) 
   Index Cond: ((cfatdata >= '01-01-2006'::date) AND (cfatdata 
<= '31-01-2006'::date)) 
   Filter: ((cfattipo = 'VD'::bpchar) AND (cfatstat <> 
'C'::bpchar)) 
 ->  Index Scan using arqfagr_arqfa1_key on arqfagr  
(cost=0.00..6.03 rows=1 width=34) 
   Index Cond: ((arqfagr.fagrorig = 'CFAT'::bpchar) AND 
(arqfagr.fagrempe = "outer".cfatempe) AND (arqfagr.fagrdocu = 
"outer".cfatdocu) AND (arqfagr.fagrseri = "outer".cfatseri)) 
   Filter: ((fagrform = 'CT'::bpchar) AND (fagrtipr = 
'REC'::bpchar)) 

Why is the plan worse in postgres 8.1?

I know the best plan is to read first the table which has a date index, as
7.4 did, because in a few days I will have only a few rows, so the query will
be faster.

Is there something I have to change in 8.1 to make the plans like the 7.4 ones?

Thanks , 

Waldomiro C. Neto. 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Optimizing a huge_table/tiny_table join

2006-05-30 Thread Kynn Jones

On 5/24/06, Tom Lane <[EMAIL PROTECTED]> wrote:
<[EMAIL PROTECTED]> writes:
>  Limit  (cost=19676.75..21327.99 rows=6000 width=84)
>    ->  Hash Join  (cost=19676.75..1062244.81 rows=3788315 width=84)
>          Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
>          ->  Seq Scan on huge_table h  (cost=0.00..51292.43 rows=2525543 width=46)
>          ->  Hash  (cost=19676.00..19676.00 rows=300 width=38)
>                ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)

Um, if huge_table is so much bigger than tiny_table, why are the cost
estimates for seqscanning them only about 2.5x different?  There's
something wacko about your statistics, methinks.

You mean there's a bug in explain?  I agree that it makes no sense that the costs don't differ as much as one would expect, but you can see right there the numbers of rows for the two tables, which are exactly as I described.  At any rate, how would one go about finding an explanation for these strange stats?

 
More bewildering still (and infuriating as hell--because it means that all of my work for yesterday has been wasted) is that I can no longer reproduce the best query plan, even though the tables have not changed at all.  (Hence I can't post the explain analyze for the best query plan.)  No matter what value I use for LIMIT, the query planner now insists on sequentially scanning huge_table and ignoring the available index.  (If I turn off enable_seqscan, I get the second worst query plan I posted yesterday.)

 
Anyway, I take it that there is no way to bypass the optimizer and instruct PostgreSQL exactly how one wants the search performed?
 
Thanks!
 
kj
 


[PERFORM] Optimizing a huge_table/tiny_table join

2006-05-30 Thread kynn




[ I had a problem with my mailer when I first sent this.  My apologies
  for any repeats. ]


I want to optimize this simple join:

SELECT * FROM huge_table h, tiny_table t WHERE UPPER( h.id ) = UPPER( t.id )

huge_table has about 2.5 million records, can be assumed as fixed, and
has the following index:

CREATE INDEX huge_table_index ON huge_table( UPPER( id ) );

...while tiny_table changes with each user request, and typically will
contain on the order of 100-1000 records.  For this analysis, I put
300 records in tiny_table, resulting in 505 records in the join.

I tried several approaches.  In order of increasing speed of
execution:

1. executed as shown above, with enable_seqscan on: about 100 s.

2. executed as shown above, with enable_seqscan off: about 10 s.

3. executed with a LIMIT 6000 clause added to the SELECT statement, and
   enable_seqscan on: about 5 s.

4. executed with a LIMIT 600 clause added to the SELECT statement, and
   enable_seqscan on: less than 1 s.



Clearly, using LIMIT is the way to go.  Unfortunately I *do* want all
the records that would have been produced without the LIMIT clause,
and I don't have a formula for the limit that will guarantee this.  I
could use a very large value (e.g. 20x the size of tiny_table, as in
approach 3 above) which would make the probability of hitting the
limit very small, but unfortunately, the query plan in this case is
different from the query plan when the limit is just above the
expected number of results (approach 4 above).

The query plan for the fastest approach is this:

   QUERY PLAN
-
 Limit  (cost=0.01..2338.75 rows=600 width=84)
   ->  Nested Loop  (cost=0.01..14766453.89 rows=3788315 width=84)
 ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 width=38)
 ->  Index Scan using huge_table_index on huge_table h  
(cost=0.01..48871.80 rows=12628 width=46)
   Index Cond: (upper(("outer".id)::text) = upper((h.id)::text))



How can I *force* this query plan even with a higher limit value?

I found, by dumb trial and error, that in this case the switch happens
at LIMIT 5432, which, FWIW, is about 0.2% of the size of huge_table.
Is there a simpler way to determine this limit (hopefully
programmatically)?


Alternatively, I could compute the value for LIMIT as 2x the number of
records in tiny_table, and if the number of records found is *exactly*
this number, I would know that (most likely) some records were left
out.  In this case, I could use the fact that, according to the query
plan above, the scan of tiny_table is sequential to infer which
records in tiny_table were disregarded when the limit was reached, and
then repeat the query with only these left over records in tiny_table.

What's your opinion of this strategy?  Is there a good way to improve
it?

Many thanks in advance!

kj

PS:  FWIW, the query plan for the query with LIMIT 6000 is this:

 QUERY PLAN
-
 Limit  (cost=19676.75..21327.99 rows=6000 width=84)
   ->  Hash Join  (cost=19676.75..1062244.81 rows=3788315 width=84)
 Hash Cond: (upper(("outer".id)::text) = upper(("inner".id)::text))
 ->  Seq Scan on huge_table h  (cost=0.00..51292.43 rows=2525543 
width=46)
 ->  Hash  (cost=19676.00..19676.00 rows=300 width=38)
   ->  Seq Scan on tiny_table t  (cost=0.00..19676.00 rows=300 
width=38)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq



[PERFORM] compiling source code!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

2006-05-30 Thread sibel karaasma
Hi, I'm a new postgresql user. I wrote ACO (ant colony optimization) and want
to replace GEQO with it in postgres/src/backend/optimizer, but I don't know
how to compile and run the source code :(

I installed postgresql-8.1.3 and cygwin but I can not use them to compile the
source code. I want to compare the performance of the GEQO and ACO optimizers
using a small database. Can you help me???

Re: [PERFORM] Query performance

2006-05-30 Thread Antonio Batovanja
Steinar H. Gunderson wrote:

> [EMAIL PROTECTED] wrote:
>> The above query takes 5 seconds to execute!
>> 
>> [...]
>>
>>  Total runtime: 96109.571 ms
> 
> It sure doesn't look like it...
> 
>>  Total runtime: 461.907 ms
>>
>> [...]
>>
>> Suddenly the query takes only 0.29 seconds!
> 
> How are you timing this, really?
> 
> /* Steinar */

I'm executing the queries from phpPgAdmin. 
The above are for explain analyse. I was referring to the pure query
execution time.
Does anyone have an idea why the OR-query takes so long?
Any server-side tuning possibilities? I wouldn't like to change the code of
ldap's back-sql...

Toni

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Francisco Reyes

Jonathan Blitz writes:


I suppose I could do but I need to install PostgreSQL there and then copy
over the database.
Maybe I will give it a try.


I really think that is your best bet.
If for whatever reason that will not be an option perhaps you can just let 
the process run over the weekend.. possibly monitor the process from the OS 
to make sure it is not frozen.


Don't recall if you mentioned the OS.. is it any unix like os?
If so there are several ways you could check to make sure the process is not 
frozen such as iostat, top, vmstat (these from FreeBSD, but most unix-like 
os should have tools like those if not some with the same name). 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread PFC


Most laptop drives are only 5,400 RPM which would make a transaction like
you are describing likely take a while.


Not sure what my one is but it is new(ish).


	If you're doing data intensive operations (like a big update which looks  
like what you're doing) it will write many megabytes to the harddrive...  
my laptop HDD (5400 rpm) does about 15 MB/s throughput while a standard  
desktop 7200rpm drive does 55-60 MB/s throughput. Plus, seek times on a  
laptop drive are horrendous.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Scott Marlowe
On Tue, 2006-05-30 at 16:04, Jonathan Blitz wrote:
> > 
> > Most laptop drives are only 5,400 RPM which would make a transaction like
> > you are describing likely take a while.
> 
> Not sure what my one is but it is new(ish).
> 
> > 
> > No desktop at home you could try it on?
> I think the problem with the laptop is likely its drive.
> 
> I suppose I could do but I need to install PostgreSQL there and then copy
> over the database.

Keep in mind, most, if not all IDE drives lie about fsync, so the speed
of the drive is a limit only if you're actually writing a lot of data. 
If you're doing a lot of little transactions, the drive should be lying
and holding the data in cache on board, so the speed should be OK.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Jonathan Blitz
> 
> Most laptop drives are only 5,400 RPM which would make a transaction like
> you are describing likely take a while.

Not sure what my one is but it is new(ish).

> 
> No desktop at home you could try it on?
> I think the problem with the laptop is likely its drive.

I suppose I could do but I need to install PostgreSQL there and then copy
over the database.
Maybe I will give it a try.

Jonathan

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006
 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Getting even more insert performance (250m+rows/day)

2006-05-30 Thread Daniel J. Luke

On May 29, 2006, at 7:11 AM, Markus Schaber wrote:

One big transaction every 5 minutes using 'COPY FROM' (instead of
inserts).


Are you using "COPY table FROM '/path/to/file'", having the file  
sitting

on the server, or "COPY table FROM STDIN" or psql "/copy", having the
file sitting on the client?


COPY table FROM STDIN using psql on the server
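
For anyone curious, a minimal sketch of that kind of batch (table, columns and
data rows are made up; psql reads the rows following the COPY line up to the
\. terminator):

BEGIN;
COPY measurements (ts, sensor_id, value) FROM STDIN WITH CSV;
2006-05-30 12:00:00,42,0.97
2006-05-30 12:00:01,42,1.03
\.
COMMIT;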

I should have gprof numbers on a similarly set up test machine soon ...
--
Daniel J. Luke
++
| * [EMAIL PROTECTED] * |
| *-- http://www.geeklair.net -* |
++
|   Opinions expressed are mine and do not necessarily   |
|  reflect the opinions of my employer.  |
++






Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Francisco Reyes

Jonathan Blitz writes:


Nope. Didn't think it would make any difference.


May be worth a try.


I am using a laptop :).
Pentium 4 (not 4M) with 1GB of memory - 2 MHZ


Most laptop drives are only 5,400 RPM which would make a transaction like 
you are describing likely take a while.
 

Must do it on  that since the program is aimed for use at home.


No desktop at home you could try it on?
I think the problem with the laptop is likely its drive.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg_dump issue

2006-05-30 Thread mcelroy, tim





I did carry it down to the subdirectory level but only included the total for brevity.  I'll paste the complete readout at the end of the email.  I'll try the "vmstat 1" as you suggest next time the backups run.  If the Eng staff finds anything I'll post the results and maybe save someone else some grief if they have the same issue.  Thanks again for your input Tom.

Tim


PROD001:
220K    ./global
4.0K    ./pg_xlog/archive_status
529M    ./pg_xlog
36K     ./pg_clog
256K    ./pg_subtrans
4.0K    ./base/1/pgsql_tmp
4.8M    ./base/1
4.8M    ./base/17229
4.0K    ./base/62878500/pgsql_tmp
4.8M    ./base/62878500
5.5M    ./base/1152695
4.0K    ./base/67708567/pgsql_tmp
1.6G    ./base/67708567
12K     ./base/1157024/pgsql_tmp
6.3G    ./base/1157024
4.0K    ./base/1159370/pgsql_tmp
543M    ./base/1159370
4.0K    ./base/1157190/pgsql_tmp
164M    ./base/1157190
4.0K    ./base/1621391/pgsql_tmp
81M     ./base/1621391
8.6G    ./base
4.0K    ./pg_tblspc
604K    ./pg_log
9.1G    .

PROD002:
220K    ./global
4.0K    ./pg_xlog/archive_status
529M    ./pg_xlog
136K    ./pg_clog
208K    ./pg_subtrans
4.0K    ./base/1/pgsql_tmp
4.9M    ./base/1
4.8M    ./base/17229
5.3M    ./base/1274937
4.0K    ./base/64257611/pgsql_tmp
1.6G    ./base/64257611
4.0K    ./base/71683200/pgsql_tmp
6.1G    ./base/71683200
4.0K    ./base/1281929/pgsql_tmp
478M    ./base/1281929
4.0K    ./base/58579022/pgsql_tmp
154M    ./base/58579022
81M     ./base/1773916
4.0K    ./base/55667447/pgsql_tmp
4.8M    ./base/55667447
8.3G    ./base
4.0K    ./pg_tblspc
588K    ./pg_log
8.8G    .


 -Original Message-
From:   Tom Lane [mailto:[EMAIL PROTECTED]] 
Sent:   Tuesday, May 30, 2006 12:20 PM
To: mcelroy, tim
Cc: pgsql-performance@postgresql.org
Subject:    Re: [PERFORM] pg_dump issue 


"mcelroy, tim" <[EMAIL PROTECTED]> writes:
> The du . -h  in $PGDATA showed PROD001 at 9.1G and Prod0002 at 8.8G so
> they're pretty much the same, one would think the smaller one should be
> faster.  Yes, the backup files are identical in size.


Hmph.  You should carry the "du" analysis down to the subdirectory
level, eg make sure that it's not a case of lots of pg_xlog bloat
balancing out bloat in a different area on the other system.  But I
suspect you won't find anything.


> I'm hoping the Engineering staff can find something system related as I
> doubted and still doubt that it's a postgres issue.


I tend to agree.  You might try watching "vmstat 1" output while taking
the dumps, so you could at least get a clue whether the problem is CPU
or I/O related ...


            regards, tom lane





Re: [PERFORM] pg_dump issue

2006-05-30 Thread Tom Lane
"mcelroy, tim" <[EMAIL PROTECTED]> writes:
> The du . -h  in $PGDATA showed PROD001 at 9.1G and Prod0002 at 8.8G so
> they're pretty much the same, one would think the smaller one should be
> faster.  Yes, the backup files are identical in size.

Hmph.  You should carry the "du" analysis down to the subdirectory
level, eg make sure that it's not a case of lots of pg_xlog bloat
balancing out bloat in a different area on the other system.  But I
suspect you won't find anything.

> I'm hoping the Engineering staff can find something system related as I
> doubted and still doubt that it's a postgres issue.

I tend to agree.  You might try watching "vmstat 1" output while taking
the dumps, so you could at least get a clue whether the problem is CPU
or I/O related ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Jonathan Blitz
> 
> 
> Could you do the updates in batches instead of trying to do them all at
> once?

Nope. Didn't think it would make any difference.
> 
> Have you done a vacuum full on this table ever?

Many times

> 
> What hardware?
> I have a dual CPU opteron with 4GB of RAM and 8 disks in RAID 10 (SATA).
> Doing an update on a 5 million record table took quite a while, but it did
> finish. :-)

I am using a laptop :).
Pentium 4 (not 4M) with 1GB of memory - 2 MHZ

Must do it on  that since the program is aimed for use at home.

Jonathan

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.394 / Virus Database: 268.7.4/351 - Release Date: 05/29/2006
 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Francisco Reyes

Jonathan Blitz writes:


I just gave up in the end and left it with NULL as the default value.



Could you do the updates in batches instead of trying to do them all at 
once?
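
Something along these lines, purely as a sketch (it assumes an integer primary
key id and a new column new_col; all names are made up):

-- Fill the new column a slice of the key range at a time, so each statement
-- is its own transaction and the table can be vacuumed between slices.
UPDATE mytable SET new_col = 0 WHERE id BETWEEN 1 AND 100000;
VACUUM mytable;
UPDATE mytable SET new_col = 0 WHERE id BETWEEN 100001 AND 200000;
VACUUM mytable;
-- ... and so on until the whole key range is covered.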


Have you done a vacuum full on this table ever?


There were, in fact, over 2 million rows in the table rather than 1/4 of a
million so that was part of the problem.


What hardware?
I have a dual CPU opteron with 4GB of RAM and 8 disks in RAID 10 (SATA). 
Doing an update on a 5 million record table took quite a while, but it did 
finish. :-)


I just did vacuum full before and after though.. That many updates tend to 
slow down operations on the table afterwards unless you vacuum the table. 
Based on what you wrote it sounded as if you tried a few times and may have 
killed the process.. this would certainly slow down the operations on that 
table unless you did a vacuum full. 

I wonder if running vacuum analyze against the table as the updates are 
running would be of any help.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] pg_dump issue

2006-05-30 Thread mcelroy, tim





Thanks Tom.  I have scheduled vacuums as follows and all have run without error.


Mon - Thu after-hours:  vacuumdb -z -e -a -v   On Fridays I add the -f option  vacuumdb -z -e -a -v -f


The du . -h  in $PGDATA showed PROD001 at 9.1G and Prod0002 at 8.8G so they're pretty much the same, one would think the smaller one should be faster.  Yes, the backup files are identical in size.  BTW - this is postgres 8.0.1.  Stuck at this due to "that is the latest postgresql version certified by our vendor's application".

I'm hoping the Engineering staff can find something system related as I doubted and still doubt that it's a postgres issue.

Tim



 -Original Message-
From:   Tom Lane [mailto:[EMAIL PROTECTED]] 
Sent:   Tuesday, May 30, 2006 11:16 AM
To: mcelroy, tim
Cc: pgsql-performance@postgresql.org
Subject:    Re: [PERFORM] pg_dump issue 


"mcelroy, tim" <[EMAIL PROTECTED]> writes:
> I have identical postgres installations running on identical machines.  Dual
> Core AMD Opteron(tm) Processor 870 , 16GB RAM, Red Hat Linux 3.2.3-20 and
> 120GB worth of disk space on two drives.


> Recently, I have noticed that my nightly backups take longer on one machine
> than on the other.  I back up five (5) databases totaling 8.6GB in size.  On
> Prod001 the backups take app. 7 minutes, on Prod002 the backups take app. 26
> minutes!  Quite a discrepancy.


Are the resulting backup files identical?  Chasing down the reasons for
any diffs might yield some enlightenment.


One idea that comes to mind is that Prod002 is having performance
problems due to table bloat (maybe a missing vacuum cron job or
some such).  A quick "du" on the two $PGDATA directories to check
for significant size differences would reveal this if so.


            regards, tom lane





Re: [PERFORM] Speedup hint needed, if available? :)

2006-05-30 Thread Mario Splivalo
On Tue, 2006-05-30 at 11:05 -0400, Tom Lane wrote:

> Do you have realistic test data?  The EXPLAIN shows that this is pulling
> 275366 of the 826032 rows in the two tables, which seems like rather a
> lot for a single user.  If it's reasonable that the query needs to fetch
> one-third of the data, then you should resign yourself to it taking
> awhile :-(

I'd say so, yes. The user_subscription table now has only six rows, but
the number of actions (giving/taking credits) for a user could easily be
as high as 50.000. 

> If the expected number of matching rows were much smaller, it would
> make sense to use indexscans over the two big tables, but unfortunately
> existing PG releases don't know how to generate an indexscan join
> with a UNION ALL in between :-(.  FWIW, 8.2 will be able to do it.
> In current releases the only thing I can suggest is to merge
> user_subscription_credits_given and user_subscription_credits_taken
> into one table so you don't need the UNION ALL.

See, that's an idea! :) Thnx, I'll try that.
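
For the record, a rough sketch of that single-table variant (the signed-credits
convention and the index details are my own assumptions):

-- One table instead of _credits_given/_credits_taken: store each movement
-- as a signed amount, so the reporting query needs no UNION ALL.
CREATE TABLE user_subscription_credits
(
  subscription_id int4 NOT NULL
    REFERENCES user_subscriptions (subscription_id),
  credits int4 NOT NULL  -- positive = given, negative = taken
);

CREATE INDEX user_subscription_credits_idx__subscription_id
  ON user_subscription_credits (subscription_id);

SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
       u.valid_to, COALESCE(sum(c.credits), 0)::integer AS credits
FROM user_subscriptions u
LEFT JOIN user_subscription_credits c ON c.subscription_id = u.subscription_id
WHERE u.user_id = 1
GROUP BY u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
         u.valid_to;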

Is it inappropriate to ask for a rough estimate on the availability of
8.2? :)

Mario
-- 
Mario Splivalo
Mob-Art
[EMAIL PROTECTED]

"I can do it quick, I can do it cheap, I can do it well. Pick any two."



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] pg_dump issue

2006-05-30 Thread Tom Lane
"mcelroy, tim" <[EMAIL PROTECTED]> writes:
> I have identical postgres installations running on identical machines.  Dual
> Core AMD Opteron(tm) Processor 870 , 16GB RAM, Red Hat Linux 3.2.3-20 and
> 120GB worth of disk space on two drives.

> Recently, I have noticed that my nightly backups take longer on one machine
> than on the other.  I back up five (5) databases totaling 8.6GB in size.  On
> Prod001 the backups take app. 7 minutes, on Prod002 the backups take app. 26
> minutes!  Quite a discrepancy.

Are the resulting backup files identical?  Chasing down the reasons for
any diffs might yield some enlightenment.

One idea that comes to mind is that Prod002 is having performance
problems due to table bloat (maybe a missing vacuum cron job or
some such).  A quick "du" on the two $PGDATA directories to check
for significant size differences would reveal this if so.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Speedup hint needed, if available? :)

2006-05-30 Thread Tom Lane
Mario Splivalo <[EMAIL PROTECTED]> writes:
> Here is the query which gets information on particular user, shows
> subscriptions to mailinglists and available credits on those
> mailinglists:

> SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
> u.valid_to, sum(credits.credits_given - credits.credits_taken)::integer
> AS credits
> FROM user_subscriptions u
> LEFT JOIN 
>   (SELECT user_subscription_credits_given.subscription_id,
> user_subscription_credits_given.credits AS credits_given, 0 AS
> credits_taken
>FROM user_subscription_credits_given
>   UNION ALL 
>  SELECT user_subscription_credits_taken.subscription_id, 0 AS
> credits_given, user_subscription_credits_taken.credits AS credits_taken
>FROM user_subscription_credits_taken) credits 
>   ON u.subscription_id = credits.subscription_id
> where
>   u.user_id = 1
> GROUP BY u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
> u.valid_to

Do you have realistic test data?  The EXPLAIN shows that this is pulling
275366 of the 826032 rows in the two tables, which seems like rather a
lot for a single user.  If it's reasonable that the query needs to fetch
one-third of the data, then you should resign yourself to it taking
awhile :-(

If the expected number of matching rows were much smaller, it would
make sense to use indexscans over the two big tables, but unfortunately
existing PG releases don't know how to generate an indexscan join
with a UNION ALL in between :-(.  FWIW, 8.2 will be able to do it.
In current releases the only thing I can suggest is to merge
user_subscription_credits_given and user_subscription_credits_taken
into one table so you don't need the UNION ALL.
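
A minimal sketch of what that merged table and the UNION-free query could look
like (table name, index name, and the signed-credits convention are
illustrative, not from the original thread):

-- One row per credit movement, signed: positive = given, negative = taken.
CREATE TABLE user_subscription_credits
(
  subscription_id int4 NOT NULL
    REFERENCES user_subscriptions (subscription_id),
  credits int4 NOT NULL
);

CREATE INDEX user_subscription_credits__subscription_id
  ON user_subscription_credits (subscription_id);

-- The per-user query then needs no UNION ALL and can join through the
-- index on subscription_id:
SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
       u.valid_to, sum(c.credits)::integer AS credits
FROM user_subscriptions u
LEFT JOIN user_subscription_credits c
       ON u.subscription_id = c.subscription_id
WHERE u.user_id = 1
GROUP BY u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
         u.valid_to;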

regards, tom lane



[PERFORM] pg_dump issue

2006-05-30 Thread mcelroy, tim
Good morning,


I have identical postgres installations running on identical machines.  Dual Core AMD Opteron(tm) Processor 870 , 16GB RAM, Red Hat Linux 3.2.3-20 and 120GB worth of disk space on two drives.

Recently, I have noticed that my nightly backups take longer on one machine than on the other.  I back up five (5) databases totaling 8.6GB in size.  On Prod001 the backups take app. 7 minutes, on Prod002 the backups take app. 26 minutes!  Quite a discrepancy.  I checked myself, then checked with our Engineering staff, and have been assured that the machines are identical hardware-wise: CPU, disk, etc.

Question: has anyone run into a similar issue?  Here is the command I use for the nightly backup on both machines:


pg_dump -F c -f $DB.backup.$DATE $DB


Kind of scratching my head on this one...


Thank you,
Tim McElroy





[PERFORM] Speedup hint needed, if available? :)

2006-05-30 Thread Mario Splivalo
Hello again.

I have to track user subscriptions to certain mailing lists, and I also
need to track the credits users have on those mailing lists. On one side I
have procedures that add credits; on the other side I have procedures that
subtract available credits. The add/subtract activity is pretty intensive:
around 30-50 adds per minute (usually 10 or 100 credits at a time), and
around 200-500 subtracts per minute (usually one or two credits at a time).

I have created a table, user_subscriptions, to track user subscriptions to
a certain mailing list, with subscription_id as the primary key. I
have two other tables, user_subscription_credits_given and
_credits_taken, which track the credits added to or subtracted from a
particular subscription. I created those two tables so I could eliminate
a lot of UPDATEs on the user_subscriptions table (compared to having a
'credits' column there). The user_subscriptions table is projected to
have around 100,000 rows, and the _credits_given/_credits_taken tables
are projected to have around 10,000,000 rows.

Now I have filled the tables with test data, and the query performance is
rather poor: it takes almost 50 seconds to get the data for a particular
subscription. Is there a way to speed this up, or do I need a different
approach?

Here is the DDL/DML:

CREATE TABLE user_subscriptions
(
  subscription_id int4 NOT NULL
    DEFAULT nextval('user_subscriptions_id_seq'::regclass),
  user_id int4 NOT NULL,
  mailinglist_id int4 NOT NULL,
  valid_from timestamptz NOT NULL,
  valid_to timestamptz,
  CONSTRAINT user_subscriptions_pkey PRIMARY KEY (subscription_id)
);

CREATE TABLE user_subscription_credits_given
(
  subscription_id int4 NOT NULL,
  credits int4 NOT NULL,
  CONSTRAINT user_subscription_credits_given_fk__subscription_id
    FOREIGN KEY (subscription_id)
    REFERENCES user_subscriptions (subscription_id) MATCH SIMPLE
    ON UPDATE NO ACTION ON DELETE NO ACTION
);
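
(The DDL for user_subscription_credits_taken was not included above;
presumably it mirrors the _given table. A guessed reconstruction follows,
with the constraint name chosen for illustration only:)

CREATE TABLE user_subscription_credits_taken
(
  subscription_id int4 NOT NULL,
  credits int4 NOT NULL,
  CONSTRAINT user_subscription_credits_taken_fk__subscription_id
    FOREIGN KEY (subscription_id)
    REFERENCES user_subscriptions (subscription_id) MATCH SIMPLE
    ON UPDATE NO ACTION ON DELETE NO ACTION
);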


CREATE INDEX fki_user_subscriptions_fk__mailinglist_id
  ON user_subscriptions
  USING btree
  (mailinglist_id);

CREATE INDEX fki_user_subscriptions_fk__users_id
  ON user_subscriptions
  USING btree
  (user_id);

CREATE INDEX fki_user_subscription_credits_given_fk__subscription_id
  ON user_subscription_credits_given
  USING btree
  (subscription_id);

CREATE INDEX fki_user_subscription_credits_taken_fk__subscription_id
  ON user_subscription_credits_taken
  USING btree
  (subscription_id);


Here is the query which gets the information on a particular user, showing
subscriptions to mailing lists and the available credits on those
mailing lists:

SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
       u.valid_to,
       sum(credits.credits_given - credits.credits_taken)::integer AS credits
FROM user_subscriptions u
LEFT JOIN
  (SELECT user_subscription_credits_given.subscription_id,
          user_subscription_credits_given.credits AS credits_given,
          0 AS credits_taken
     FROM user_subscription_credits_given
   UNION ALL
   SELECT user_subscription_credits_taken.subscription_id,
          0 AS credits_given,
          user_subscription_credits_taken.credits AS credits_taken
     FROM user_subscription_credits_taken) credits
  ON u.subscription_id = credits.subscription_id
WHERE u.user_id = 1
GROUP BY u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
         u.valid_to

And here is the 'explain analyze' of the above query:


QUERY PLAN
--
 HashAggregate  (cost=200079055.24..200079055.28 rows=2 width=36)
(actual time=56527.153..56527.163 rows=2 loops=1)
   ->  Nested Loop Left Join  (cost=200033690.72..200078931.34 rows=8260
width=36) (actual time=0.432..54705.844 rows=275366 loops=1)
 Join Filter: ("outer".subscription_id =
"inner".subscription_id)
 ->  Index Scan using fki_user_subscriptions_fk__users_id on
user_subscriptions u  (cost=0.00..3.03 rows=2 width=28) (actual
time=0.030..0.055 rows=2 loops=1)
   Index Cond: (user_id = 1)
 ->  Materialize  (cost=200033690.72..200045984.63 rows=825991
width=12) (actual time=0.043..22404.107 rows=826032 loops=2)
   ->  Subquery Scan credits
(cost=1.00..200028830.73 rows=825991 width=12) (actual
time=0.050..31500.589 rows=826032 loops=1)
 ->  Append  (cost=1.00..200020570.82
rows=825991 width=8) (actual time=0.041..22571.540 rows=826032 loops=1)
   ->  Subquery Scan "*SELECT* 1"
(cost=1.00..11946.96 rows=78148 width=8) (actual
time=0.031..1226.640 rows=78148 loops=1)
 ->  Seq Scan on
user_subscription_credits_given  (cost=1.00..11165.48
rows=78148 width=8) (actual time=0.022..404.253 rows=78148 loops=1)
   ->  Subquery Scan "*SELECT* 2"
(cost=1.00..100018623.86 rows=747843 width=8) (actual
time=0.032..12641.705

Re: [PERFORM] Selects query stats?

2006-05-30 Thread Alvis Tunkelis

Try pgtop. It is a mytop clone for PostgreSQL.

Regards,
alvis

Francisco Reyes wrote:
I am not sure if this is what the original poster was referring to, but I 
have used an application called mtop that shows how many queries per 
second MySQL is doing.


In my case this is helpful because we have a number of machines running 
Postfix and each incoming mail generates about 7 queries. The queries are 
all very similar to each other in that scenario.


Having access to that query/second stat allowed me to improve the 
settings in MySQL. Ultimately, once we migrated to a new server I could 
see how moving to the new machine increased the speed at which we could 
accept emails.


I am, little by little, getting PostgreSQL to be used, but for now the 
postfix queries are tied to MySQL. By the time we hopefully do move to 
PostgreSQL for the Postfix backend it will be very helpful to have some 
sort of way to measure queries/time period.
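
For a rough number along those lines without an external tool, the stats
collector's counters can be sampled directly. A sketch (this counts
transactions rather than individual statements, and assumes the stats
collector is enabled):

-- Sample the transaction counters, wait a known interval, sample again;
-- the delta divided by the elapsed seconds gives committed transactions
-- per second for this database over that window.
SELECT datname, xact_commit, xact_rollback, now() AS sampled_at
FROM pg_stat_database
WHERE datname = current_database();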




Re: [PERFORM] Adding and filling new column on big table

2006-05-30 Thread Jonathan Blitz
> > So, I have tried to run the following command. The command never finishes
> > (I gave up after about an hour and a half!).
> 
> Did you ever find what was the problem?
> Perhaps you needed to run a vacuum full on the table?

Nope.
I just gave up in the end and left it with NULL as the default value.
There were, in fact, over 2 million rows in the table rather than 1/4 of a
million so that was part of the problem.
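
(For the archives: one sketch that avoids a single huge UPDATE is to backfill
the column in key-range batches, vacuuming between them; the table, column,
and key names below are purely illustrative.)

-- Fill the new column a slice at a time so each transaction stays short
-- and the dead tuples from each UPDATE can be reclaimed between batches.
UPDATE mytable SET newcol = 0 WHERE newcol IS NULL AND id BETWEEN 1 AND 100000;
VACUUM mytable;
UPDATE mytable SET newcol = 0 WHERE newcol IS NULL AND id BETWEEN 100001 AND 200000;
VACUUM mytable;
-- ...repeat across the key range, then optionally:
-- ALTER TABLE mytable ALTER COLUMN newcol SET DEFAULT 0;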

Jonathan
 

 

