Re: [PERFORM] security for row level but not based on Database user's

2006-04-25 Thread Richard Huxton

Friends wrote:

Hi

I need to set security at row level, but not based on the database user's
login. It should be based on the login in the user table. For a particular
user I need to allow access (insert, update, delete and select) only to
particular records.


Well, the data access stuff is all manageable via views, which is the 
standard way to do this.


You don't say which version of PostgreSQL you are using, but I'd be 
tempted just to switch to a different user after connecting and use the 
session_user system function to control what is visible in the view.


For example:
CREATE VIEW my_contacts AS SELECT * FROM contacts WHERE owner = 
session_user;


If that's not practical then you'll need to write some functions to 
simulate your own session_user (say application_user()). This is easiest 
to write in plperl/pltcl or some other interpreted language - check the 
list archives for plenty of discussion.
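
Not in the original mail, but the application_user() idea might be sketched with a per-connection temporary table instead of plperl (all names here are invented for illustration; dollar-quoting needs 8.0+, and the EXECUTEs dodge plan caching on the temp table in older plpgsql):

```sql
-- Sketch only: keep the "application user" in a temp table, which is
-- private to the current connection.
CREATE FUNCTION set_application_user(p_user text) RETURNS void AS $$
BEGIN
    EXECUTE 'CREATE TEMP TABLE app_session (username text)';
    EXECUTE 'INSERT INTO app_session VALUES ('
            || quote_literal(p_user) || ')';
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION application_user() RETURNS text AS
    'SELECT username FROM app_session'
LANGUAGE sql;

CREATE VIEW my_contacts AS
    SELECT * FROM contacts WHERE owner = application_user();
```

The application would call set_application_user('alice') once after connecting, and the view then filters on that value.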


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] security for row level but not based on Database user's

2006-04-25 Thread Richard Huxton

Ramasamy wrote:

Hi Richard ,
 Very good day. Great information that you have given me.


Great, glad you think it's useful. Oh, don't forget to cc: the 
mailing list - that's the convention around here.



I will try your idea. Here I am using SQL Server 2000 (even I can use SQL
Server 2005 too if needed). We are dealing with a lot of databases with a lot
of business logic. I think your information will be great for me.


Ah - you're not using PostgreSQL? Well, the principle is the same but 
you should be aware that this is a PostgreSQL list so the details will 
be different.



Let me try it and I will get back to you.

Thanks, and hats off to you.

But I don't have any idea of getting a session from the database (I think
you mean that the user can be handled with a session). If you have some
more ideas they will be very helpful to me.


With MS-SQL you'll probably want to look at T-SQL variables, although I 
don't know if they last longer than a single transaction. Failing that, 
functions or temporary tables might be a help.
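
A rough T-SQL sketch of the temporary-table variant (illustrative only, with invented names; this is a PostgreSQL list, so verify the MS-SQL details elsewhere):

```sql
-- T-SQL: a #temp table lives for the connection, so it can carry the
-- application user across statements.
CREATE TABLE #app_session (username varchar(50));
INSERT INTO #app_session VALUES ('ramasamy');

-- Later statements can filter per application user:
SELECT c.*
FROM   contacts c
JOIN   #app_session s ON c.owner = s.username;
```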


--
  Richard Huxton
  Archonet Ltd



[PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Arnau

Hi all,

  I have the following running on postgresql version 7.4.2:

CREATE SEQUENCE agenda_user_group_id_seq
MINVALUE 1
MAXVALUE 9223372036854775807
CYCLE
INCREMENT 1
START 1;

CREATE TABLE AGENDA_USERS_GROUPS
(
 AGENDA_USER_GROUP_ID  INT8
   CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
   DEFAULT NEXTVAL('agenda_user_group_id_seq'),
 USER_ID   NUMERIC(10)
   CONSTRAINT fk_agenda_uid  REFERENCES 
AGENDA_USERS (USER_ID)

   ON DELETE CASCADE
   NOT NULL,
 GROUP_ID  NUMERIC(10)
   CONSTRAINT fk_agenda_gid  REFERENCES 
AGENDA_GROUPS (GROUP_ID)

   ON DELETE CASCADE
   NOT NULL,
 CREATION_DATE DATE
   DEFAULT CURRENT_DATE,
   CONSTRAINT un_agndusrgrp_usergroup 
UNIQUE(USER_ID, GROUP_ID)

);

CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID );
CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID );


When I execute:

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups 
WHERE group_id = 9;


it does a sequential scan instead of using the index, and I don't 
understand why - any idea? I have the same schema on PostgreSQL 8.1 and 
there it uses the index :-|


Thanks
--
Arnau



Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread chris smith
On 4/25/06, Arnau [EMAIL PROTECTED] wrote:
 Hi all,

I have the following running on postgresql version 7.4.2:

 CREATE SEQUENCE agenda_user_group_id_seq
 MINVALUE 1
 MAXVALUE 9223372036854775807
 CYCLE
 INCREMENT 1
 START 1;

 CREATE TABLE AGENDA_USERS_GROUPS
 (
   AGENDA_USER_GROUP_ID  INT8
 CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
 DEFAULT NEXTVAL('agenda_user_group_id_seq'),
   USER_ID   NUMERIC(10)
 CONSTRAINT fk_agenda_uid  REFERENCES
 AGENDA_USERS (USER_ID)
 ON DELETE CASCADE
 NOT NULL,
   GROUP_ID  NUMERIC(10)
 CONSTRAINT fk_agenda_gid  REFERENCES
 AGENDA_GROUPS (GROUP_ID)
 ON DELETE CASCADE
 NOT NULL,
   CREATION_DATE DATE
 DEFAULT CURRENT_DATE,
 CONSTRAINT un_agndusrgrp_usergroup
 UNIQUE(USER_ID, GROUP_ID)
 );

 CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID );
 CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID );


 When I execute:

 EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
 WHERE group_id = 9;

Try

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id::int8 = 9;

or

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = '9';

and let us know what happens.

--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Arnau

chris smith wrote:

On 4/25/06, Arnau [EMAIL PROTECTED] wrote:


Hi all,

  I have the following running on postgresql version 7.4.2:

CREATE SEQUENCE agenda_user_group_id_seq
MINVALUE 1
MAXVALUE 9223372036854775807
CYCLE
INCREMENT 1
START 1;

CREATE TABLE AGENDA_USERS_GROUPS
(
 AGENDA_USER_GROUP_ID  INT8
   CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
   DEFAULT NEXTVAL('agenda_user_group_id_seq'),
 USER_ID   NUMERIC(10)
   CONSTRAINT fk_agenda_uid  REFERENCES
AGENDA_USERS (USER_ID)
   ON DELETE CASCADE
   NOT NULL,
 GROUP_ID  NUMERIC(10)
   CONSTRAINT fk_agenda_gid  REFERENCES
AGENDA_GROUPS (GROUP_ID)
   ON DELETE CASCADE
   NOT NULL,
 CREATION_DATE DATE
   DEFAULT CURRENT_DATE,
   CONSTRAINT un_agndusrgrp_usergroup
UNIQUE(USER_ID, GROUP_ID)
);

CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID );
CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID );


When I execute:

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = 9;



Try

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id::int8 = 9;

or

EXPLAIN ANALYZE SELECT agenda_user_group_id FROM agenda_users_groups
WHERE group_id = '9';

and let us know what happens.




 The same, the table has 2547556 entries:

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
agenda_users_groups

espsm_moviltelevision-# WHERE group_id::int8 = 9;
   QUERY PLAN
-
 Seq Scan on agenda_users_groups  (cost=0.00..59477.34 rows=12738 
width=8) (actual time=3409.541..11818.794 rows=367026 loops=1)

   Filter: ((group_id)::bigint = 9)
 Total runtime: 13452.114 ms
(3 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
agenda_users_groups

espsm_moviltelevision-# WHERE group_id = '9';
   QUERY PLAN

 Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675 
width=8) (actual time=916.903..5763.830 rows=367026 loops=1)

   Filter: (group_id = 9::numeric)
 Total runtime: 7259.861 ms
(3 rows)

espsm_moviltelevision=# select count(*) from agenda_users_groups ;
  count
-
 2547556


Thanks
--
Arnau



Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Guillaume Smet
On 4/25/06, Arnau [EMAIL PROTECTED] wrote:
 espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
 agenda_users_groups
 espsm_moviltelevision-# WHERE group_id = '9';
 QUERY PLAN
 
   Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675
 width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
 Filter: (group_id = 9::numeric)
   Total runtime: 7259.861 ms
 (3 filas)

Arnau,

Why do you use a numeric instead of an integer/bigint??

IIRC, there were a few problems with indexes on numeric columns on older
versions of PostgreSQL.

You can't change the type of a column in 7.4, so create a new
integer column, copy the values into this new column, drop the old
one, and rename the new one. Run vacuum analyze and recreate your index.

It should work far better with an int.

Note that you will have to update all the tables referencing this key...
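
The migration Guillaume describes might look roughly like this (a sketch, not from his mail; names follow the schema above, and note that dropping the column also drops its constraints and index, which must be recreated):

```sql
BEGIN;
ALTER TABLE agenda_users_groups ADD COLUMN group_id_new int8;
UPDATE agenda_users_groups SET group_id_new = group_id::int8;
ALTER TABLE agenda_users_groups DROP COLUMN group_id;
ALTER TABLE agenda_users_groups RENAME COLUMN group_id_new TO group_id;
ALTER TABLE agenda_users_groups ALTER COLUMN group_id SET NOT NULL;
COMMIT;

VACUUM ANALYZE agenda_users_groups;
CREATE INDEX i_agnusrsgrs_groupid ON agenda_users_groups (group_id);
```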

--
Guillaume



Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Scott Marlowe
On Tue, 2006-04-25 at 08:49, Arnau wrote:
 chris smith wrote:
  On 4/25/06, Arnau [EMAIL PROTECTED] wrote:
  
 Hi all,
 
I have the following running on postgresql version 7.4.2:
 
 CREATE SEQUENCE agenda_user_group_id_seq
 MINVALUE 1
 MAXVALUE 9223372036854775807
 CYCLE
 INCREMENT 1
 START 1;
 
 CREATE TABLE AGENDA_USERS_GROUPS
 (
   AGENDA_USER_GROUP_ID  INT8
 CONSTRAINT pk_agndusrgrp_usergroup PRIMARY KEY
 DEFAULT NEXTVAL('agenda_user_group_id_seq'),
   USER_ID   NUMERIC(10)
 CONSTRAINT fk_agenda_uid  REFERENCES
 AGENDA_USERS (USER_ID)
 ON DELETE CASCADE
 NOT NULL,
   GROUP_ID  NUMERIC(10)
 CONSTRAINT fk_agenda_gid  REFERENCES
 AGENDA_GROUPS (GROUP_ID)
 ON DELETE CASCADE
 NOT NULL,
   CREATION_DATE DATE
 DEFAULT CURRENT_DATE,
 CONSTRAINT un_agndusrgrp_usergroup
 UNIQUE(USER_ID, GROUP_ID)
 );
 
 CREATE INDEX i_agnusrsgrs_userid ON AGENDA_USERS_GROUPS ( USER_ID );
 CREATE INDEX i_agnusrsgrs_groupid ON AGENDA_USERS_GROUPS ( GROUP_ID );

SNIP

   The same, the table has 2547556 entries:
 
 espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
 agenda_users_groups
 espsm_moviltelevision-# WHERE group_id::int8 = 9;
 QUERY PLAN
 -
   Seq Scan on agenda_users_groups  (cost=0.00..59477.34 rows=12738 
 width=8) (actual time=3409.541..11818.794 rows=367026 loops=1)
 Filter: ((group_id)::bigint = 9)
   Total runtime: 13452.114 ms
 (3 rows)
 
 espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
 agenda_users_groups
 espsm_moviltelevision-# WHERE group_id = '9';
 QUERY PLAN
 
   Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675 
 width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
 Filter: (group_id = 9::numeric)
   Total runtime: 7259.861 ms
 (3 rows)
 
 espsm_moviltelevision=# select count(*) from agenda_users_groups ;
count
 -
   2547556

OK, a few points.

1:  7.4.2 is WAY out of date for the 7.4 series.  The 7.4 series is also
a bit out of date, and many issues in terms of performance have been
improved in the 8.x series.  You absolutely should update to the latest
7.4 release, as there are known data loss bugs and other issues in the
7.4.2 version.

2:  An index scan isn't always faster.  In this instance, it looks like
the number of rows that match in the last version of your query is well
over 10% of the rows.  Assuming your average row takes up 10% or so of
a block, which is pretty common, then you're going to have to hit almost
every block anyway to get your data.  So, an index scan is no win.

3:  To test whether or not an index scan IS a win, you can use the
enable_xxx settings to prove it to yourself:

set enable_seqscan = off;
explain analyze your query here;

and compare.  Note that the enable_seqscan = off thing is a sledgehammer, 
not a nudge, and generally should NOT be used in production.  If
an index scan is generally a win for you, but the database isn't using
it, you might need to tune the database for your machine.  note that you
should NOT tune your database based on a single query.  You'll need to
reach a compromise on your settings that makes all your queries run
reasonably fast without the planner making insane decisions.  One of the
better postgresql tuning docs out there is the one at: 
http://www.varlena.com/GeneralBits/Tidbits/perf.html .

Good luck.



Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Tom Lane
Arnau [EMAIL PROTECTED] writes:

   Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675 
 width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
 Filter: (group_id = 9::numeric)
   Total runtime: 7259.861 ms
 (3 rows)

 espsm_moviltelevision=# select count(*) from agenda_users_groups ;
count
 -
   2547556

So the SELECT is fetching nearly 15% of the rows in the table.  The
planner is doing *the right thing* to use a seqscan, at least for
this particular group_id value.
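
As an aside not in Tom's mail, the 15% figure falls straight out of the two numbers quoted above:

```sql
-- 367026 rows returned out of 2547556 total, per the EXPLAIN output:
SELECT round(367026.0 / 2547556 * 100, 1) AS pct_of_table;  -- about 14.4
```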

regards, tom lane



Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Arnau

Tom Lane wrote:

Arnau [EMAIL PROTECTED] writes:


 Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675 
width=8) (actual time=916.903..5763.830 rows=367026 loops=1)

   Filter: (group_id = 9::numeric)
 Total runtime: 7259.861 ms
(3 rows)




espsm_moviltelevision=# select count(*) from agenda_users_groups ;
  count
-
 2547556



So the SELECT is fetching nearly 15% of the rows in the table.  The
planner is doing *the right thing* to use a seqscan, at least for
this particular group_id value.



I have done the same tests on 8.1.0.


espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
agenda_users_groups WHERE group_id = 9;

   QUERY PLAN
--
 Bitmap Heap Scan on agenda_users_groups  (cost=2722.26..30341.78 
rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)

   Recheck Cond: (group_id = 9::numeric)
   -  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.26 
rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)

 Index Cond: (group_id = 9::numeric)
 Total runtime: 1004.966 ms
(5 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
agenda_users_groups WHERE group_id::int8 = 9;

  QUERY PLAN
---
 Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777 
width=8) (actual time=457.963..2244.928 rows=367026 loops=1)

   Filter: ((group_id)::bigint = 9)
 Total runtime: 2571.496 ms
(3 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
agenda_users_groups WHERE group_id::int8 = '9';

  QUERY PLAN
---
 Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777 
width=8) (actual time=407.193..2182.880 rows=367026 loops=1)

   Filter: ((group_id)::bigint = 9::bigint)
 Total runtime: 2506.998 ms
(3 rows)

espsm_moviltelevision=# select count(*) from agenda_users_groups ;
  count
-
 2555437
(1 row)


  PostgreSQL then uses the index, and I don't understand why. On this 
server I tried to tune the configuration - is it because of the tuning, 
or because it's a newer version of PostgreSQL?



Thanks for all the replies
--
Arnau



Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Scott Marlowe
On Tue, 2006-04-25 at 10:47, Arnau wrote:
 Tom Lane wrote:
  Arnau [EMAIL PROTECTED] writes:
  
  
 espsm_moviltelevision=# select count(*) from agenda_users_groups ;
count
 -
   2547556
  
  
  So the SELECT is fetching nearly 15% of the rows in the table.  The
  planner is doing *the right thing* to use a seqscan, at least for
  this particular group_id value.
 
 
 I have done the same tests on 8.1.0.
 
 
 espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
 agenda_users_groups WHERE group_id = 9;
 QUERY PLAN
 --
   Bitmap Heap Scan on agenda_users_groups  (cost=2722.26..30341.78 
 rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)
 Recheck Cond: (group_id = 9::numeric)
 -  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.26 
 rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)
   Index Cond: (group_id = 9::numeric)
   Total runtime: 1004.966 ms
 (5 rows)

How big are these individual records?  I'm guessing a fairly good size,
since an index scan is winning.

 espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
 agenda_users_groups WHERE group_id::int8 = 9;
QUERY PLAN
 ---
   Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777 
 width=8) (actual time=457.963..2244.928 rows=367026 loops=1)
 Filter: ((group_id)::bigint = 9)
   Total runtime: 2571.496 ms
 (3 rows)

OK.  Stop and think about what you're telling postgresql to do here.

You're telling it to cast the field group_id to int8, then compare it to
9.  How can it cast the group_id to int8 without fetching it?  That's
right, you're ensuring a seq scan.  You need to put the int8 cast on the
other side of that equality comparison, like:

where group_id = 9::int8





Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Tom Lane
Arnau [EMAIL PROTECTED] writes:
 I have done the same tests on 8.1.0.

Bitmap scans are a totally different animal that doesn't exist in 7.4.
A plain indexscan, such as 7.4 knows about, is generally not effective
for fetching more than a percent or two of the table.  The crossover
point for a bitmap scan is much higher (don't know exactly, but probably
something like 30-50%).
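
Not from Tom's mail, but the intuition can be put into toy numbers (page count and cost constants invented here; only the row counts come from the thread). A seqscan pays one sequential read per page, while a plain indexscan can pay one random read per matching row:

```sql
-- Toy cost model: ~25,000 pages of ~100 rows each, random_page_cost = 4.
SELECT 25000 * 1.0          AS seqscan_cost,      -- read every page once
       2547556 * 0.15 * 4.0 AS indexscan_cost;    -- 15% of rows, random I/O
```

With figures like these the seqscan wins easily; a bitmap scan lands in between because it sorts the page fetches before reading them.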

regards, tom lane



Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Arnau



I have done the same tests on 8.1.0.


espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
agenda_users_groups WHERE group_id = 9;

   QUERY PLAN
--
 Bitmap Heap Scan on agenda_users_groups  (cost=2722.26..30341.78 
rows=400361 width=8) (actual time=145.533..680.839 rows=367026 loops=1)

   Recheck Cond: (group_id = 9::numeric)
   -  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.26 
rows=400361 width=0) (actual time=142.958..142.958 rows=367026 loops=1)

 Index Cond: (group_id = 9::numeric)
 Total runtime: 1004.966 ms
(5 rows)



How big are these individual records?  I'm guessing a fairly good size,
since an index scan is winning.


  How can I find out the size of an individual record?
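
Not part of the original exchange, but on 8.1 the size functions give a rough answer (table name as in the thread; pg_column_size() and pg_relation_size() are 8.1 additions):

```sql
-- Average data size of a row, and bytes per row including page overhead:
SELECT avg(pg_column_size(g.*)) AS avg_row_bytes
FROM   agenda_users_groups g;

SELECT pg_relation_size('agenda_users_groups') / 2555437 AS bytes_per_row;
```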




espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
agenda_users_groups WHERE group_id::int8 = 9;

  QUERY PLAN
---
 Seq Scan on agenda_users_groups  (cost=0.00..60947.43 rows=12777 
width=8) (actual time=457.963..2244.928 rows=367026 loops=1)

   Filter: ((group_id)::bigint = 9)
 Total runtime: 2571.496 ms
(3 rows)



OK.  Stop and think about what you're telling postgresql to do here.

You're telling it to cast the field group_id to int8, then compare it to
9.  How can it cast the group_id to int8 without fetching it?  That's
right, you're ensuring a seq scan.  You need to put the int8 cast on the
other side of that equality comparison, like:

where group_id = 9::int8


  I just did what Chris Smith asked me to do :) - here are the 
results I get when I change the cast.


espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
agenda_users_groups WHERE group_id = 9::int8;

  QUERY PLAN
--
 Bitmap Heap Scan on agenda_users_groups  (cost=2722.33..30343.06 
rows=400379 width=8) (actual time=147.723..714.473 rows=367026 loops=1)

   Recheck Cond: (group_id = 9::numeric)
   -  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.33 
rows=400379 width=0) (actual time=145.015..145.015 rows=367026 loops=1)

 Index Cond: (group_id = 9::numeric)
 Total runtime: 1038.537 ms
(5 rows)

espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM 
agenda_users_groups WHERE group_id = '9'::int8;

  QUERY PLAN
--
 Bitmap Heap Scan on agenda_users_groups  (cost=2722.33..30343.06 
rows=400379 width=8) (actual time=153.858..1192.838 rows=367026 loops=1)

   Recheck Cond: (group_id = 9::numeric)
   -  Bitmap Index Scan on i_agnusrsgrs_groupid  (cost=0.00..2722.33 
rows=400379 width=0) (actual time=151.298..151.298 rows=367026 loops=1)

 Index Cond: (group_id = 9::numeric)
 Total runtime: 1527.039 ms
(5 rows)


Thanks
--
Arnau



[PERFORM] PL/pgSQL Loop Vs. Batch Update

2006-04-25 Thread David Wheeler

Fellow PostgreSQLers,

This post is longish and has a bit of code, but here's my question 
up-front: Why are batch queries in my PL/pgSQL functions no faster than 
iterating over a loop and executing a series of queries for each 
iteration of the loop? Are batch queries or array or series generation 
slow in PL/pgSQL?


Now, for the details of my problem. For managing an ordered many-to-many 
relation between blog entries and tags, I created these tables:


  CREATE TABLE entry (
id SERIAL PRIMARY KEY,
title text,
content text
  );

  CREATE TABLE tag (
id SERIAL PRIMARY KEY,
name text
  );

  CREATE TABLE entry_coll_tag (
entry_id integer REFERENCES entry(id)
 ON UPDATE CASCADE
 ON DELETE CASCADE,
tag_id   integer REFERENCES tag(id)
 ON UPDATE CASCADE
 ON DELETE CASCADE,
ord  smallint,
PRIMARY KEY (entry_id, tag_id)
  );

To make it easy to associate an entry with a bunch of tags in a  
single query, I wrote this PL/pgSQL function:


  CREATE OR REPLACE FUNCTION entry_coll_tag_set (
  obj_id   integer,
  coll_ids integer[]
  ) RETURNS VOID AS $$
  DECLARE
-- For checking to see if a tuple was updated.
update_count smallint;
-- For looping through the array.
iloop integer   := 1;
  BEGIN
  -- Lock the containing object tuple to prevent inserts into the
  -- collection table.
  PERFORM true FROM entry WHERE id = obj_id FOR UPDATE;

  -- Update possible existing record with the current sequence so
  -- as to avoid unique constraint violations. We just set it to a
  -- negative number, since negative numbers are never used for
  -- ord.
  UPDATE entry_coll_tag
  SETord = -ord
  WHERE  entry_id = obj_id;

  -- Loop through the tag IDs to associate with the entry ID.
  while coll_ids[iloop] is not null loop
  -- Update possible existing collection record.
  UPDATE entry_coll_tag
  SETord = iloop
  WHERE  entry_id = obj_id
 AND tag_id = coll_ids[iloop];

  -- Only insert the item if nothing was updated.
  IF FOUND IS false THEN
  -- Insert a new record.
  INSERT INTO entry_coll_tag (entry_id, tag_id, ord)
  VALUES (obj_id, coll_ids[iloop], iloop);
  END IF;
  iloop := iloop + 1;
  END loop;

  -- Delete any remaining tuples.
  DELETE FROM entry_coll_tag
  WHERE  entry_id = obj_id AND ord < 0;
  END;
  $$ LANGUAGE plpgsql SECURITY DEFINER;

Josh Berkus kindly reviewed it, and suggested that I take advantage  
of generate_series() and batch updates instead of looping through the  
results. Here's the revised version:


  CREATE OR REPLACE FUNCTION entry_coll_tag_set (
  obj_id   integer,
  coll_ids integer[]
  ) RETURNS VOID AS $$
  BEGIN
  -- Lock the containing object tuple to prevent inserts into the
  -- collection table.
  PERFORM true FROM entry WHERE id = obj_id FOR UPDATE;

  -- First set all ords to negative value to prevent unique index
  -- violations.
  UPDATE entry_coll_tag
  SETord = -ord
  WHERE  entry_id = obj_id;

  IF FOUND IS false THEN
  -- There are no existing tuples, so just insert the new ones.
  INSERT INTO entry_coll_tag (entry_id, tag_id, ord)
  SELECT obj_id, coll_ids[gs.ser], gs.ser
  FROM   generate_series(1, array_upper(coll_ids, 1))
 AS gs(ser);
  ELSE
  -- First, update the existing tuples to new ord values.
  UPDATE entry_coll_tag SET ord = ser
  FROM   (
  SELECT gs.ser, coll_ids[gs.ser] as move_tag
  FROM   generate_series(1, array_upper(coll_ids, 1))
 AS gs(ser)
  ) AS expansion
  WHERE move_tag = entry_coll_tag.tag_id
AND entry_id = obj_id;

  -- Now insert the new tuples.
  INSERT INTO entry_coll_tag (entry_id, tag_id, ord )
  SELECT obj_id, coll_ids[gs.ser], gs.ser
  FROM   generate_series(1, array_upper(coll_ids, 1)) AS gs(ser)

  WHERE  coll_ids[gs.ser] NOT IN (
  SELECT tag_id FROM entry_coll_tag ect2
  WHERE  entry_id = obj_id
  );

  -- Delete any remaining tuples.
  DELETE FROM entry_coll_tag
  WHERE  entry_id = obj_id AND ord < 0;
  END IF;
  END;
  $$ LANGUAGE plpgsql SECURITY DEFINER;

Josh thought that the replacement of the loop with a couple of batch  
queries would be an order of magnitude faster. So I happily ran my  
benchmark comparing the two approaches. The benchmark actually tests  
two different functions that had a similar refactoring, as well as  
two other functions that are quite simple. Before the tests, I  
inserted 100,000 entry records, and a random number of tag records  
(1-10 for each entry) and related 

[PERFORM] Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)

2006-04-25 Thread andremachado
Hello,
The performance comparison saga of the last month continues (see list archive).
After some time experimenting on windows, the conclusion is clear:

windows is likely crap for databases other than MS-SQL.

I guess that MS-SQL uses lot of undocumented api calls, may run in kernel
mode, ring 0 and a lot of dirty tricks to get some reasonable performance.

Then, I asked my colleague to send a new FB dump and a Pg dump to try on my
desktop machine.
This time, the database is somewhat bigger. Around 20 million records.

The timings are attached. I tried to follow the same query sequence in both files.
Both databases are much faster on linux than on windows, and the desktop
machine is not dedicated and tuned (no scsi, no raid, many services enabled,
ext3 fs, etc).
On many of the queries, postgresql is faster, sometimes MUCH faster.
But Firebird has very good defaults out of the box, and a few of the queries
are really a pain in Postgresql.
Please see the abysmal timing differences in the last 2 queries, for example.
They used 100% cpu, with almost no disk activity and no iowait, for a long
time to complete.
Maybe these queries bring to light some instruction weaknesses, or bad
tuning.
Do you have some suggestions?
Regards.

Andre Felipe Machado
http://www.techforce.com.br
linux blog


queries_timing_firebird.txt.tar.gz
Description: GNU Zip compressed data


queries_timing_postgresql.txt.tar.gz
Description: GNU Zip compressed data



[PERFORM] planner not using index for like operator

2006-04-25 Thread Sriram Dandapani

For the query


Select col1 from table1

Where col1 like '172.%'



The table has 133 million unique ip addresses. Col1 is
indexed.



The optimizer is using a sequential scan



This is the explain analyze output



Seq Scan on table1 (cost=0.00..2529284.80 rows=1
width=15) (actual time=307591.339..565251.775 rows=524288 loops=1)

 Filter: ((col1)::text ~~ '172.%'::text)

Total runtime: 565501.873 ms





The number of affected rows (500K) is a small fraction of
the total row count.


[PERFORM] Slow queries salad ;)

2006-04-25 Thread PFC


Here is a simple test case for this strange behaviour:

annonces= CREATE TABLE test.current (id INTEGER PRIMARY KEY, description  
TEXT);
INFO:  CREATE TABLE / PRIMARY KEY will create an implicit index  
current_pkey for the table current

CREATE TABLE

annonces= CREATE TABLE test.archive (id INTEGER PRIMARY KEY, description  
TEXT);
INFO:  CREATE TABLE / PRIMARY KEY will create an implicit index  
archive_pkey for the table archive

CREATE TABLE

annonces= CREATE VIEW test.all AS SELECT * FROM test.archive UNION ALL  
SELECT * FROM test.current ;

CREATE VIEW

let's populate...

annonces= INSERT INTO test.current SELECT id, description FROM annonces;
INSERT 0 11524
annonces= INSERT INTO test.archive SELECT id, description FROM  
archive_ext;

INSERT 0 437351

annonces= ANALYZE test.archive;
ANALYZE
annonces= ANALYZE test.current;
ANALYZE

This is the bookmarks table...

SELECT count(*), list_id FROM bookmarks GROUP BY list_id;
 count | list_id
---+-
15 |   7
 5 |   5
   150 |   4
 3 |   3
15 |   2
 2 |   1
 6 |   8

I want to list the stuff I bookmarked:


annonces= EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT  
annonce_id FROM bookmarks WHERE list_id IN ('4'));

   QUERY PLAN
-
 Hash Join  (cost=6.58..532.84 rows=140 width=203) (actual  
time=0.747..5.283 rows=150 loops=1)

   Hash Cond: (outer.id = inner.annonce_id)
   -  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)  
(actual time=0.006..3.191 rows=11524 loops=1)
   -  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.244..0.244  
rows=150 loops=1)
 -  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual  
time=0.155..0.184 rows=150 loops=1)
   -  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150  
width=4) (actual time=0.008..0.097 rows=150 loops=1)

 Filter: (list_id = 4)
 Total runtime: 5.343 ms
(8 rows)

annonces= set enable_hashjoin TO 0;
SET
annonces= EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT  
annonce_id FROM bookmarks WHERE list_id IN ('4'));

   QUERY PLAN

 Nested Loop  (cost=4.83..824.22 rows=140 width=203) (actual  
time=0.219..1.034 rows=150 loops=1)
   -  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual  
time=0.158..0.199 rows=150 loops=1)
 -  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150 width=4)  
(actual time=0.011..0.096 rows=150 loops=1)

   Filter: (list_id = 4)
   -  Index Scan using current_pkey on current  (cost=0.00..5.83 rows=1  
width=203) (actual time=0.005..0.005 rows=1 loops=150)

 Index Cond: (current.id = outer.annonce_id)
 Total runtime: 1.108 ms
(7 rows)

	Hm, the row estimates on the bookmarks table are spot on; why did it  
choose the hash join?


	Now, if I want to access the all view, which contains the union of the  
current and archive tables:


annonces= set enable_hashjoin TO 1;
SET
annonces= EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT  
annonce_id FROM bookmarks WHERE list_id IN ('4'));

   QUERY PLAN

 Hash Join  (cost=6.58..33484.41 rows=314397 width=36) (actual  
time=8300.484..8311.784 rows=150 loops=1)

   Hash Cond: (outer.?column1? = inner.annonce_id)
   -  Append  (cost=0.00..23596.78 rows=449139 width=219) (actual  
time=6.390..8230.821 rows=448875 loops=1)
 -  Seq Scan on archive  (cost=0.00..18638.15 rows=437615  
width=219) (actual time=6.389..8175.491 rows=437351 loops=1)
 -  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)  
(actual time=0.022..8.985 rows=11524 loops=1)
   -  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.255..0.255  
rows=150 loops=1)
 -  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual  
time=0.168..0.197 rows=150 loops=1)
   -  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150  
width=4) (actual time=0.015..0.102 rows=150 loops=1)

 Filter: (list_id = 4)
 Total runtime: 8311.870 ms
(10 rows)

annonces= set enable_hashjoin TO 0;
SET
annonces= EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT  
annonce_id FROM bookmarks WHERE list_id IN ('4'));

 QUERY PLAN
-
 Merge Join  (cost=79604.61..84994.98 rows=314397 

Re: [PERFORM] planner not using index for like operator

2006-04-25 Thread Dave Dutcher



If you are using a locale other than the C locale, you need to create the 
index with an operator class to get index scans with like.

See here for details:

http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html
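
For instance (a sketch using the table and column names from the query below; the pattern-ops opclass must match the column's type, e.g. varchar_pattern_ops for varchar):

```sql
-- In a non-C locale a plain index can't serve LIKE 'prefix%';
-- an index built with the pattern ops opclass can:
CREATE INDEX i_table1_col1_pattern ON table1 (col1 varchar_pattern_ops);

SELECT col1 FROM table1 WHERE col1 LIKE '172.%';
```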


  
  -----Original Message-----
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Sriram Dandapani
  Sent: Tuesday, April 25, 2006 12:08 PM
  To: Pgsql-Performance (E-mail)
  Subject: [PERFORM] planner not using index for like operator

  For the query

  Select col1 from table1
  Where col1 like '172.%'

  The table has 133 million unique ip addresses. Col1 is indexed.

  The optimizer is using a sequential scan

  This is the explain analyze output

  "Seq Scan on table1 (cost=0.00..2529284.80 rows=1 width=15) (actual time=307591.339..565251.775 rows=524288 loops=1)"
  " Filter: ((col1)::text ~~ '172.%'::text)"
  "Total runtime: 565501.873 ms"

  The number of affected rows (500K) is a small fraction of the total row count.


[PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Bill Moran

I've been given the task of making some hardware recommendations for
the next round of server purchases.  The machines to be purchased
will be running FreeBSD & PostgreSQL.

Where I'm stuck is in deciding whether we want to go with dual-core
pentiums with 2M cache, or with HT pentiums with 8M cache.

Both of these are expensive bits of hardware, and I'm trying to
gather as much evidence as possible before making a recommendation.
The FreeBSD community seems pretty divided over which is likely to
be better, and I have been unable to discover a method for estimating
how much of the 2M cache on our existing systems is being used.

Does anyone in the PostgreSQL community have any experience with
large caches or dual-core pentiums that could make any recommendations?
Our current Dell 2850 systems are CPU bound - i.e. they have enough
RAM, and fast enough disks that the CPUs seem to be the limiting
factor.  As a result, this decision on what kind of CPUs to get in
the next round of servers is pretty important.

Any advice is much appreciated.

-- 
Bill Moran
Collaborative Fusion Inc.


IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Slow queries salad ;)

2006-04-25 Thread Tom Lane
PFC [EMAIL PROTECTED] writes:
   The IN() is quite small (150 values), but the two large tables are  
 seq-scanned... is there a way to avoid this?

Not in 8.1.  HEAD is a bit smarter about joins to Append relations.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] planner not using index for like operator

2006-04-25 Thread Jim C. Nasby
On Tue, Apr 25, 2006 at 10:08:02AM -0700, Sriram Dandapani wrote:
Here's the key:

   Filter: ((col1)::text ~~ '172.%'::text)

In order to do a like comparison, it has to convert col1 (which I'm
guessing is of type 'inet') to text, so there's no way it can use an
index on col1 (maybe a function index, but that's a different story).

Is there some reason you're not doing

WHERE col1 <<= '172/8'::inet

?
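If the column must stay text-searchable, the function-index idea mentioned above might look something like this (a sketch, not from the thread; it assumes col1 is inet, and uses text_pattern_ops since host() returns text):

```sql
-- Expression index over the text form of the address:
CREATE INDEX table1_col1_text_idx
    ON table1 ((host(col1)) text_pattern_ops);

-- The query must use the same expression for the index to apply:
SELECT * FROM table1 WHERE host(col1) LIKE '172.%';
```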
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] planner not using index for like operator

2006-04-25 Thread Sriram Dandapani
The col is a varchar. I am currently testing with the inet data type(and
also the ipv4 pgfoundry data type).

Due to time constraints, I am trying to minimize code changes.

What kind of index do I need to create to enable efficient range scans
(e.g. anything between 172.16.x.x thru 172.31.x.x) on the inet data type?

Thanks

Sriram

-Original Message-
From: Jim C. Nasby [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 25, 2006 11:25 AM
To: Sriram Dandapani
Cc: Pgsql-Performance (E-mail)
Subject: Re: [PERFORM] planner not using index for like operator

On Tue, Apr 25, 2006 at 10:08:02AM -0700, Sriram Dandapani wrote:
Here's the key:

   Filter: ((col1)::text ~~ '172.%'::text)

In order to do a like comparison, it has to convert col1 (which I'm
guessing is of type 'inet') to text, so there's no way it can use an
index on col1 (maybe a function index, but that's a different story).

Is there some reason you're not doing

WHERE col1 <<= '172/8'::inet

?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Scott Marlowe
On Tue, 2006-04-25 at 13:14, Bill Moran wrote:
 I've been given the task of making some hardware recommendations for
 the next round of server purchases.  The machines to be purchased
 will be running FreeBSD & PostgreSQL.
 
 Where I'm stuck is in deciding whether we want to go with dual-core
 pentiums with 2M cache, or with HT pentiums with 8M cache.

Given a choice between those two processors, I'd choose the AMD 64 x 2
CPU.  It's a significantly better processor than either of the Intel
choices.  And if you get the HT processor, you might as well turn off HT
on a PostgreSQL machine.  I've yet to see it make postgresql run faster,
but I've certainly seen HT make it run slower.

If you can't run AMD in your shop due to bigotry (let's call a spade a
spade) then I'd recommend the real dual core CPU with 2M cache.  Most of
what makes a database slow is memory and disk bandwidth.  Few datasets
are gonna fit in that 8M cache, and when they do, they'll get flushed
right out by the next request anyway.

 Does anyone in the PostgreSQL community have any experience with
 large caches or dual-core pentiums that could make any recommendations?
 Our current Dell 2850 systems are CPU bound - i.e. they have enough
 RAM, and fast enough disks that the CPUs seem to be the limiting
 factor.  As a result, this decision on what kind of CPUs to get in
 the next round of servers is pretty important.

If the CPUs are running at 100% then you're likely not memory I/O bound,
but processing speed bound.  The dual core will definitely be the better
option in that case.  I take it you work at a Dell-only place, hence
no AMD for you...

Sad, cause the AMD is, on a price / performance scale, twice the
processor for the same money as the Intel.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Gavin Hamill
On Tue, 25 Apr 2006 14:14:35 -0400
Bill Moran [EMAIL PROTECTED] wrote:

 Does anyone in the PostgreSQL community have any experience with
 large caches or dual-core pentiums that could make any
 recommendations? 

Heh :) You're in the position I was in about a year ago - we naturally
replaced our old Dell 2650 with £14k of Dell 6850 Quad Xeon with 8M
cache, and TBH the performance is woeful :/

Having gone through Postgres consultancy, been through IBM 8-way POWER4
hardware, discovered a bit of a shortcoming in PG on N-way hardware
(where N is large) [1] , I have been able to try out a dual-dual-core
Opteron machine, and it flies.

In fact, it flies so well that we ordered one that day. So, in short
£3k's worth of dual-opteron beat the living daylights out of our Xeon
monster. I can't praise the Opteron enough, and I've always been a firm
Intel pedant - the HyperTransport stuff must really be doing wonders. (I
typically see 500ms searches on it instead of 1000-2000ms on the Xeon.)

As it stands, I've had to borrow this Opteron so much (and send live
searches across the net to the remote box) because otherwise we simply
don't have enough CPU power to run the website (!)

Cheers,
Gavin.

[1] Simon Riggs + Tom Lane are currently involved in optimisation work
for this - it turns out our extremely read-heavy load pattern reveals
some buffer locking issues in PG.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Slow queries salad ;)

2006-04-25 Thread Jim C. Nasby
On Tue, Apr 25, 2006 at 07:53:15PM +0200, PFC wrote:
What version is this??

 annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT  
 annonce_id FROM bookmarks WHERE list_id IN ('4'));
                                QUERY PLAN
 -----------------------------------------------------------------------------
  Hash Join  (cost=6.58..532.84 rows=140 width=203) (actual  
 time=0.747..5.283 rows=150 loops=1)
    Hash Cond: ("outer".id = "inner".annonce_id)
    ->  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)  
 (actual time=0.006..3.191 rows=11524 loops=1)
    ->  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.244..0.244  
 rows=150 loops=1)
          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual  
 time=0.155..0.184 rows=150 loops=1)
                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150  
 width=4) (actual time=0.008..0.097 rows=150 loops=1)
                      Filter: (list_id = 4)
  Total runtime: 5.343 ms
 (8 lignes)
 
 annonces=> set enable_hashjoin TO 0;
 SET
 annonces=> EXPLAIN ANALYZE SELECT * FROM test.current WHERE id IN (SELECT  
 annonce_id FROM bookmarks WHERE list_id IN ('4'));
                                QUERY PLAN
 -----------------------------------------------------------------------------
  Nested Loop  (cost=4.83..824.22 rows=140 width=203) (actual  
 time=0.219..1.034 rows=150 loops=1)
    ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual  
 time=0.158..0.199 rows=150 loops=1)
          ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150 width=4)  
 (actual time=0.011..0.096 rows=150 loops=1)
                Filter: (list_id = 4)
    ->  Index Scan using current_pkey on current  (cost=0.00..5.83 rows=1  
 width=203) (actual time=0.005..0.005 rows=1 loops=150)
          Index Cond: (current.id = "outer".annonce_id)
  Total runtime: 1.108 ms
 (7 lignes)
 
   Hm, the row estimates on the bookmarks table are spot on; why did 
   it choose the hash join?

Because it thought it would be cheaper; see the estimates. Increasing
effective_cache_size or decreasing random_page_cost would favor the
index scan.
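For instance, these knobs can be tried per session before touching postgresql.conf (the values below are purely illustrative, not a recommendation):

```sql
-- effective_cache_size is measured in 8KB pages on 8.1;
-- a lower random_page_cost makes index scans look cheaper to the planner.
SET effective_cache_size = 100000;
SET random_page_cost = 2.0;

EXPLAIN ANALYZE SELECT * FROM test.current
WHERE id IN (SELECT annonce_id FROM bookmarks WHERE list_id IN ('4'));
```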

   Now, if I want to access the "all" view, which contains the union of 
   the current and archive tables:
 
 annonces=> set enable_hashjoin TO 1;
 SET
 annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT  
 annonce_id FROM bookmarks WHERE list_id IN ('4'));
                                QUERY PLAN
 -----------------------------------------------------------------------------
  Hash Join  (cost=6.58..33484.41 rows=314397 width=36) (actual  
 time=8300.484..8311.784 rows=150 loops=1)
    Hash Cond: ("outer".?column1? = "inner".annonce_id)
    ->  Append  (cost=0.00..23596.78 rows=449139 width=219) (actual  
 time=6.390..8230.821 rows=448875 loops=1)
          ->  Seq Scan on archive  (cost=0.00..18638.15 rows=437615  
 width=219) (actual time=6.389..8175.491 rows=437351 loops=1)
          ->  Seq Scan on current  (cost=0.00..467.24 rows=11524 width=203)  
 (actual time=0.022..8.985 rows=11524 loops=1)
    ->  Hash  (cost=6.23..6.23 rows=140 width=4) (actual time=0.255..0.255  
 rows=150 loops=1)
          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual  
 time=0.168..0.197 rows=150 loops=1)
                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150  
 width=4) (actual time=0.015..0.102 rows=150 loops=1)
                      Filter: (list_id = 4)
  Total runtime: 8311.870 ms
 (10 lignes)
 
 annonces=> set enable_hashjoin TO 0;
 SET
 annonces=> EXPLAIN ANALYZE SELECT * FROM test.all WHERE id IN (SELECT  
 annonce_id FROM bookmarks WHERE list_id IN ('4'));
                                QUERY PLAN
 -----------------------------------------------------------------------------
  Merge Join  (cost=79604.61..84994.98 rows=314397 width=36) (actual  
 time=6944.229..7109.371 rows=150 loops=1)
    Merge Cond: ("outer".annonce_id = "inner".id)
    ->  Sort  (cost=11.22..11.57 rows=140 width=4) (actual  
 time=0.326..0.355 rows=150 loops=1)
          Sort Key: bookmarks.annonce_id
          ->  HashAggregate  (cost=4.83..6.23 rows=140 width=4) (actual  
 time=0.187..0.218 rows=150 loops=1)
                ->  Seq Scan on bookmarks  (cost=0.00..4.45 rows=150  
 width=4) (actual time=0.028..0.126 rows=150 loops=1)
                      Filter: (list_id = 4)
    ->  Sort  (cost=79593.40..80716.25 rows=449139 width=36) (actual  
 time=6789.786..7014.815 rows=448625 loops=1)
          Sort Key: all.id
          ->  Append  (cost=0.00..23596.78 rows=449139 width=219) (actual  
 time=0.013..391.447 rows=448875 loops=1)
                ->  Seq Scan on archive  (cost=0.00..18638.15 rows=437615 

Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Scott Marlowe
On Tue, 2006-04-25 at 13:14, Bill Moran wrote:
 I've been given the task of making some hardware recommendations for
 the next round of server purchases.  The machines to be purchased
 will be running FreeBSD & PostgreSQL.
 
 Where I'm stuck is in deciding whether we want to go with dual-core
 pentiums with 2M cache, or with HT pentiums with 8M cache.

BTW: For an interesting article on why the dual core Opterons are so
much better than their Intel cousins, read this article:

http://techreport.com/reviews/2005q2/opteron-x75/index.x?pg=1

Enlightening read.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread mark
On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
 Sad, cause the AMD is, on a price / performance scale, twice the
 processor for the same money as the Intel.

Maybe a year or two ago. Prices are all coming down. Intel more
than AMD.

AMD still seems better - but not X2, and it depends on the workload.

X2 sounds like bigotry against Intel... :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Scott Marlowe
On Tue, 2006-04-25 at 13:38, [EMAIL PROTECTED] wrote:
 On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
  Sad, cause the AMD is, on a price / performance scale, twice the
  processor for the same money as the Intel.
 
 Maybe a year or two ago. Prices are all coming down. Intel more
 than AMD.
 
 AMD still seems better - but not X2, and it depends on the workload.
 
 X2 sounds like bigotry against Intel... :-)

Actually, that was from an article from this last month that compared
the dual core intel to the amd.  for every dollar spent on the intel,
you got about half the performance of the amd.  Not bigotry.  fact.

But don't believe me or the other people who've seen the difference.  Go
buy the Intel box.  No skin off my back.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Joshua D. Drake

Bill Moran wrote:

I've been given the task of making some hardware recommendations for
the next round of server purchases.  The machines to be purchased
will be running FreeBSD & PostgreSQL.

Where I'm stuck is in deciding whether we want to go with dual-core
pentiums with 2M cache, or with HT pentiums with 8M cache.


Dual Core Opterons :)

Joshua D. Drake



Both of these are expensive bits of hardware, and I'm trying to
gather as much evidence as possible before making a recommendation.
The FreeBSD community seems pretty divided over which is likely to
be better, and I have been unable to discover a method for estimating
how much of the 2M cache on our existing systems is being used.

Does anyone in the PostgreSQL community have any experience with
large caches or dual-core pentiums that could make any recommendations?
Our current Dell 2850 systems are CPU bound - i.e. they have enough
RAM, and fast enough disks that the CPUs seem to be the limiting
factor.  As a result, this decision on what kind of CPUs to get in
the next round of servers is pretty important.

Any advice is much appreciated.




--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] planner not using index for like operator

2006-04-25 Thread Sriram Dandapani
Using an index on col1 with the operator class varchar_pattern_ops, I
was able to get a 3-second response time. That will work for me.
I used LIKE '172.%' and an extra pattern-matching condition to restrict
between 172.16.x.x and 172.31.x.x.

Thanks for the input. I will also test the inet data type to see if
there are differences.
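That workaround might look roughly like the following (a sketch; the exact second condition is not shown in the post, so the split_part() check here is only one way to express the 172.16-172.31 restriction):

```sql
-- Pattern-ops index so the LIKE prefix match can use an index scan:
CREATE INDEX table1_col1_pattern_idx ON table1 (col1 varchar_pattern_ops);

-- LIKE narrows via the index; the check on the second octet keeps only
-- addresses in the 172.16.0.0 - 172.31.255.255 private range.
SELECT col1
FROM table1
WHERE col1 LIKE '172.%'
  AND split_part(col1, '.', 2)::int BETWEEN 16 AND 31;
```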

Sriram

-Original Message-
From: Jim C. Nasby [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 25, 2006 11:25 AM
To: Sriram Dandapani
Cc: Pgsql-Performance (E-mail)
Subject: Re: [PERFORM] planner not using index for like operator

On Tue, Apr 25, 2006 at 10:08:02AM -0700, Sriram Dandapani wrote:
Here's the key:

   Filter: ((col1)::text ~~ '172.%'::text)

In order to do a like comparison, it has to convert col1 (which I'm
guessing is of type 'inet') to text, so there's no way it can use an
index on col1 (maybe a function index, but that's a different story).

Is there some reason you're not doing

WHERE col1 <<= '172/8'::inet

?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Joshua D. Drake



But don't believe me or the other people who've seen the difference.  Go
buy the Intel box.  No skin off my back.


To be more detailed... AMD Opteron has some specific technical 
advantages in its design over Intel when it comes to performance for a 
database - specifically, no front side bus :)


Also it is widely known and documented (just review the archives) that 
AMD performs better than the equivalent Intel CPU, dollar for dollar.


Lastly it is also known that Dell, frankly, sucks for PostgreSQL. Again, 
check the archives.


Joshua D. Drake


--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] ip address data type

2006-04-25 Thread Florian Weimer
* Sriram Dandapani:

 Does the inet data type offer comparison/search performance benefits
 over plain text for ip addresses..

Queries like "host << '192.168.17.192/28'" use an available index on
the host column.  In theory, you could do this with LIKE and strings,
but this gets pretty messy and needs a lot of client-side logic.
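A sketch of that kind of query (the table and column names are hypothetical); if the planner cannot use a plain btree index for <<, the explicit range rewrite below is the fallback:

```sql
CREATE TABLE hosts (host inet);
CREATE INDEX hosts_host_idx ON hosts (host);

-- "is contained within" the /28 network:
SELECT * FROM hosts WHERE host << '192.168.17.192/28'::inet;

-- Equivalent explicit range, which a btree index can always serve:
SELECT * FROM hosts
WHERE host >= network('192.168.17.192/28'::inet)
  AND host <= broadcast('192.168.17.192/28'::inet);
```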

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread David Boreham






  Actually, that was from an article from this last month that compared
the dual core intel to the amd.  for every dollar spent on the intel,
you got about half the performance of the amd.  Not bigotry.  fact.

But don't believe me or the other people who've seen the difference.  Go
buy the Intel box.  No skin off my back.
  

I've been doing plenty of performance evaluation on a parallel application
we're developing here: on Dual Core Opterons, P4, P4D. I can say that
the Opterons open up a can of wupass on the Intel processors. Almost 2x
the performance on our application vs. what the SpecCPU numbers would 
suggest.






Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Joshua D. Drake

David Boreham wrote:



Actually, that was from an article from this last month that compared
the dual core intel to the amd.  for every dollar spent on the intel,
you got about half the performance of the amd.  Not bigotry.  fact.

But don't believe me or the other people who've seen the difference.  Go
buy the Intel box.  No skin off my back.
  

I've been doing plenty of performance evaluation on a parallel application
we're developing here : on Dual Core Opterons, P4, P4D. I can say that
the Opterons open up a can of wupass on the Intel processors. Almost 2x
the performance on our application vs. what the SpecCPU numbers would
suggest.


Because Stone Cold Said So!







--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Bruce Momjian
Joshua D. Drake wrote:
 David Boreham wrote:
  
  Actually, that was from an article from this last month that compared
  the dual core intel to the amd.  for every dollar spent on the intel,
  you got about half the performance of the amd.  Not bigotry.  fact.
 
  But don't believe me or the other people who've seen the difference.  Go
  buy the Intel box.  No skin off my back.

  I've been doing plenty of performance evaluation on a parallel application
  we're developing here : on Dual Core Opterons, P4, P4D. I can say that
  the Opterons open up a can of wupass on the Intel processors. Almost 2x
  the performance on our application vs. what the SpecCPU numbers would
  suggest.
 
 Because Stone Cold Said So!

I'll believe someone who uses 'wupass' in a sentence any day!

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Ron Peacetree
As others have noted, the current price/performance sweet spot for DB servers 
is 2S 2C AMD CPUs.  These CPUs are also the highest performing x86 compatible 
solution for pg.

If you must go Intel for some reason, then wait until the new NGMA CPU's 
(Conroe, Merom, Woodcrest) come out and see how they bench on DB workloads.  
Preliminary benches on these chips look good, but I would not recommend making 
a purchase decision based on just preliminary benches of unreleased products.

If you must buy soon, then the decision is clear cut from anything except 
possibly a political/religious standpoint.
The NetBurst based Pentium and Xeon solutions are simply not worth the money 
spent or the PITA they will put you through compared to the AMD dual cores.  
The new Intel NGMA CPUs may be different, but all the pertinent evidence is not 
yet available.

My personal favorite pg platform at this time is one based on a 2 socket, dual 
core ready mainboard with 16 DIMM slots combined with dual core AMD Kx's.

Less money than the comparable Intel solution and _far_ more performance.

...and even if you do buy Intel, =DONT= buy Dell unless you like causing 
trouble for yourself.
Bad experiences with Dell in general and their poor PERC RAID controllers in 
specific are all over this and other DB forums.

Ron


-Original Message-
From: Bill Moran [EMAIL PROTECTED]
Sent: Apr 25, 2006 2:14 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Large (8M) cache vs. dual-core CPUs


I've been given the task of making some hardware recommendations for
the next round of server purchases.  The machines to be purchased
will be running FreeBSD & PostgreSQL.

Where I'm stuck is in deciding whether we want to go with dual-core
pentiums with 2M cache, or with HT pentiums with 8M cache.

Both of these are expensive bits of hardware, and I'm trying to
gather as much evidence as possible before making a recommendation.
The FreeBSD community seems pretty divided over which is likely to
be better, and I have been unable to discover a method for estimating
how much of the 2M cache on our existing systems is being used.

Does anyone in the PostgreSQL community have any experience with
large caches or dual-core pentiums that could make any recommendations?
Our current Dell 2850 systems are CPU bound - i.e. they have enough
RAM, and fast enough disks that the CPUs seem to be the limiting
factor.  As a result, this decision on what kind of CPUs to get in
the next round of servers is pretty important.

Any advice is much appreciated.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread David Boreham



My personal favorite pg platform at this time is one based on a 2 socket, dual 
core ready mainboard with 16 DIMM slots combined with dual core AMD Kx's.
 


Right. We've been buying Tyan bare-bones boxes like this.
It's better to go with bare-bones than building boxes from bare metal
because the cooling issues are addressed correctly.

Note that if you need a large number of machines, then Intel
Core Duo may give the best overall price/performance because
they're cheaper to run and cool.




---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] slow deletes on pgsql 7.4

2006-04-25 Thread Junaili Lie
Hi all,
we encounter issues when deleting from a table based on id (primary
key). For certain ids, the delete takes forever and the i/o is 100%
busy.
Table scenario has around 1400 entries. It is the parent of 3 other tables.

         Table "public.scenario"
     Column      |         Type          |                   Modifiers
-----------------+-----------------------+------------------------------------------------
 id              | bigint                | not null default nextval('scenario_seq'::text)
 name            | character varying(50) |
 description     | text                  |
 subscriber_id   | bigint                |
 organization_id | bigint                |
 schedule_id     | bigint                |
Indexes:
    "scenario_pkey" primary key, btree (id)
    "org_ind_scenario_index" btree (organization_id)
    "sch_ind_scenario_index" btree (schedule_id)
    "sub_ind_scenario_index" btree (subscriber_id)
Check constraints:
    "$3" CHECK (schedule_id >= 0)
    "$2" CHECK (organization_id >= 0)
    "$1" CHECK (subscriber_id >= 0)
Foreign-key constraints:
    "0_4774" FOREIGN KEY (schedule_id) REFERENCES schedule(id) ON DELETE CASCADE
    "0_4773" FOREIGN KEY (organization_id) REFERENCES organization(id) ON DELETE CASCADE
    "0_4772" FOREIGN KEY (subscriber_id) REFERENCES subscriber(id) ON DELETE CASCADE

In all the child tables, the foreign keys have the same data type and are indexed.
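One way to see where the time goes is to time the per-child lookup that each ON DELETE CASCADE trigger performs (a sketch; "child_tab" and "scenario_id" are placeholders, since the referencing tables are not named above):

```sql
-- For each table that references scenario(id), the cascade does the
-- equivalent of this lookup; if it cannot use an index (e.g. because of
-- a datatype mismatch on 7.4), the DELETE stalls on a sequential scan.
EXPLAIN ANALYZE SELECT 1 FROM child_tab WHERE scenario_id = 1099::bigint;
```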
When I do delete from scenario where id='1023', it takes less than 200 ms.
But when I do delete from scenario where id='1099', it took forever (more
than 10 minutes), so I decided to cancel it.
I can't do explain analyze, but here is the explain:
MONSOON=# begin;
BEGIN
MONSOON=# explain delete from scenario where id='1099';

                                 QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using scenario_pkey on scenario  (cost=0.00..3.14 rows=1 width=6)
   Index Cond: (id = 1099::bigint)
(2 rows)

MONSOON=# explain delete from scenario where id='1023';

                                 QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using scenario_pkey on scenario  (cost=0.00..3.14 rows=1 width=6)
   Index Cond: (id = 1023::bigint)
(2 rows)

MONSOON=# explain analyze delete from scenario where id='1023';

                                 QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using scenario_pkey on scenario  (cost=0.00..3.14 rows=1 width=6) (actual time=0.028..0.030 rows=1 loops=1)
   Index Cond: (id = 1023::bigint)
 Total runtime: 0.174 ms
(3 rows)

I have also tried increasing statistics on both parent and child tables
to 100, vacuum analyze parent and all child tables. But still the same
slowness.
The o/s is Solaris 10, with fsync = true.
Any ideas what's going on?
Thanks in advance,

J


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Ron Peacetree
I've had intermittent freeze and reboot and, worse, just plain freeze 
problems with the Core Duo's I've been testing.  I have not been able to narrow 
it down so I do not know if it is a platform issue or a CPU issue.  It appears 
to be HW, not SW, related since I have experienced the problem both under M$ 
and Linux 2.6 based OS's.  I have not tested the Core Duo's under *BSD.

Also, being that they are only 32b, Core Duo's have limited utility for a 
present day DB server.

Power- and space-critical applications where 64b is not required may be a 
reasonable place for them... if the present reliability problems I'm 
seeing go away.

Ron


-Original Message-
From: David Boreham [EMAIL PROTECTED]
Sent: Apr 25, 2006 5:15 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Large (8M) cache vs. dual-core CPUs


My personal favorite pg platform at this time is one based on a 2 socket, 
dual core ready mainboard with 16 DIMM slots combined with dual core AMD Kx's.
  

Right. We've been buying Tyan bare-bones boxes like this.
It's better to go with bare-bones than building boxes from bare metal
because the cooling issues are addressed correctly.

Note that if you need a large number of machines, then Intel
Core Duo may give the best overall price/performance because
they're cheaper to run and cool.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] slow deletes on pgsql 7.4

2006-04-25 Thread Junaili Lie
I should also mention that select ... for update is fast:
MONSOON=# begin; explain analyze select * from SCENARIO WHERE id = '1099' FOR UPDATE;
BEGIN

                                 QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using scenario_pkey on scenario  (cost=0.00..3.17 rows=1 width=64) (actual time=0.016..0.017 rows=1 loops=1)
   Index Cond: (id = 1099::bigint)
 Total runtime: 0.072 ms
(3 rows)






Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Joshua D. Drake

Ron Peacetree wrote:

As others have noted, the current price/performance sweet spot for DB servers 
is 2S 2C AMD CPUs.  These CPUs are also the highest performing x86 compatible solution 
for pg.

If you must go Intel for some reason, then wait until the new NGMA CPU's 
(Conroe, Merom, Woodcrest) come out and see how they bench on DB workloads.  
Preliminary benches on these chips look good, but I would not recommend making 
a purchase decision based on just preliminary benches of unreleased products.

If you must buy soon, then the decision is clear cut from anything except 
possibly a political/religious standpoint.
The NetBurst based Pentium and Xeon solutions are simply not worth the money 
spent or the PITA they will put you through compared to the AMD dual cores.  
The new Intel NGMA CPUs may be different, but all the pertinent evidence is not 
yet available.

My personal favorite pg platform at this time is one based on a 2 socket, dual 
core ready mainboard with 16 DIMM slots combined with dual core AMD Kx's.

Less money than the comparable Intel solution and _far_ more performance.

...and even if you do buy Intel, =DONT= buy Dell unless you like causing 
trouble for yourself.
Bad experiences with Dell in general and their poor PERC RAID controllers in 
specific are all over this and other DB forums.

Ron



To add to this... the HP DL 385 is a pretty nice dual core capable 
opteron box. Just don't buy the extra ram from HP (they like to charge 
entirely too much).


Joshua D. Drake

--

   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
http://www.commandprompt.com/



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] slow deletes on pgsql 7.4

2006-04-25 Thread Tom Lane
Junaili Lie [EMAIL PROTECTED] writes:
 we encounter issues when deleting from a table based on id (primary key).
 On certain 'id', it took forever to delete and the i/o is 100% busy.

Almost always, if delete is slow when selecting the same rows is fast,
it's because you've got a trigger performance problem --- most commonly,
there are foreign keys referencing this table from other tables and you
don't have the referencing columns indexed.
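
[Editor's note: Tom's point can be sketched in SQL. The child table and
index names below are invented for illustration; the principle is that
PostgreSQL does not automatically index the referencing side of a
foreign key.]

```sql
-- Hypothetical child table referencing scenario(id). Without an index on
-- scenario_id, every DELETE on scenario fires an RI trigger that must
-- scan this table in full to find (or cascade-delete) referencing rows.
CREATE TABLE scenario_item (
    id          bigint PRIMARY KEY,
    scenario_id bigint REFERENCES scenario (id) ON DELETE CASCADE
);

-- With the index in place, the RI trigger's lookup becomes an index scan:
CREATE INDEX scenario_item_scenario_idx ON scenario_item (scenario_id);
```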

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Jim C. Nasby
On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
 On Tue, 2006-04-25 at 13:14, Bill Moran wrote:
  I've been given the task of making some hardware recommendations for
  the next round of server purchases.  The machines to be purchased
  will be running FreeBSD & PostgreSQL.
  
  Where I'm stuck is in deciding whether we want to go with dual-core
  pentiums with 2M cache, or with HT pentiums with 8M cache.
 
 Given a choice between those two processors, I'd choose the AMD 64 x 2
 CPU.  It's a significantly better processor than either of the Intel
 choices.  And if you get the HT processor, you might as well turn off HT
 on a PostgreSQL machine.  I've yet to see it make postgresql run faster,
 but I've certainly seen HT make it run slower.

Actually, believe it or not, a coworker just saw HT double the
performance of pgbench on his desktop machine. Granted, not really a
representative test case, but it still blew my mind. This was with a
database that fit in his 1G of memory, and running windows XP. Both
cases were newly minted pgbench databases with a scale of 40. Testing
was 40 connections and 100 transactions. With HT he saw 47.6 TPS,
without it was 21.1.
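
[Editor's note: for reference, a test like the one described could be
reproduced with something along these lines. The database name is
invented, and exact TPS numbers will of course vary per machine; it is
also ambiguous whether "100 transactions" meant per client or in total.]

```shell
# Sketch of the pgbench run described above: scale 40, 40 clients.
pgbench -i -s 40 bench        # initialize a scale-40 pgbench database
pgbench -c 40 -t 100 bench    # 40 connections, 100 transactions per client
```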

I actually had IT build a w2k3 server on an HT box specifically so I
could do more testing.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Slow deletes in 8.1 when FKs are involved

2006-04-25 Thread Will Reese
I did double check for indexes on the referenced and referencing  
columns, and even though this database is restored and vacuum  
analyzed nightly the issue remains.  Using explain analyze in  
postgresql 8.1, I was able to see where the problem lies.  For  
performance reasons on our 7.4 server, we removed one of the 3 RI  
triggers for some constraints (the RI trigger that performs the  
SELECT ... FOR UPDATE to prevent modifications) and replaced it with a  
trigger to just prevent deletes on this data indefinitely (the data  
never gets deleted or updated in our app).  This works great in  
postgresql 7.4 and nearly eliminated our performance issue, but when  
that database is restored to postgresql 8.1 one of the remaining two  
RI triggers does not perform well at all when you try to delete from  
that table (even though it's fine in postgresql 7.4).  On the 8.1  
server I dropped the remaining two RI triggers, and added the  
constraint to recreate the 3 RI triggers.  After that the delete  
performed fine.  So it looks like the 7.4 RI triggers that carried  
over to the 8.1 server don't perform very well.  I'm hoping that the  
SELECT...FOR SHARE functionality in 8.1 will allow us to re-add our  
constraints and not suffer from the locking issues we had in  
postgresql 7.4.
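
[Editor's note: the drop-and-re-add step described above looks roughly
like this in SQL. All names below are invented; 7.4-era RI trigger names
are system-generated and must be looked up, e.g. in pg_trigger, before
dropping them.]

```sql
-- Invented names for illustration only.
BEGIN;
DROP TRIGGER "RI_ConstraintTrigger_12345" ON child_table;
DROP TRIGGER "RI_ConstraintTrigger_12346" ON parent_table;
-- Re-adding the constraint makes 8.1 create its own native RI triggers,
-- which use SELECT ... FOR SHARE rather than FOR UPDATE:
ALTER TABLE child_table
    ADD CONSTRAINT child_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent_table (id);
COMMIT;
```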


Will Reese -- http://blog.rezra.com

On Apr 23, 2006, at 10:32 PM, Tom Lane wrote:


Will Reese [EMAIL PROTECTED] writes:

... Both servers have identical postgresql.conf settings and were
restored from the same 7.4 backup. Almost everything is faster on the
8.1 server (mostly due to hardware), except one thing...deletes from
tables with many foreign keys pointing to them.


I think it's unquestionable that you have a bad FK plan in use on the
8.1 server.  Double check that you have suitable indexes on the
referencing (not referenced) columns, that you've ANALYZEd all the
tables involved, and that you've started a fresh psql session (remember
the backend tends to cache FK plans for the life of the connection).

It might help to EXPLAIN ANALYZE one of the slow deletes --- 8.1 will
break out the time spent in FK triggers, which would let you see which
one(s) are the culprit.

regards, tom lane



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] slow deletes on pgsql 7.4

2006-04-25 Thread Junaili Lie
hi,
Thanks for the answer.
I have double checked that all the foreign key that are referencing id on scenario are indexed.
I have even vacuum analyze scenario table and all the tables that referenced this table.
Something that is interesting is that: it only happens for a certain values. 
ie. delete from scenario where id='1023' is very fast, but delete from scenario where id='1099' is running forever.

Any ideas?

J


On 4/25/06, Tom Lane [EMAIL PROTECTED] wrote:
 Junaili Lie [EMAIL PROTECTED] writes:
  we encounter issues when deleting from a table based on id (primary
  key). On certain 'id', it took forever to delete and the i/o is 100% busy.

 Almost always, if delete is slow when selecting the same rows is fast,
 it's because you've got a trigger performance problem --- most commonly,
 there are foreign keys referencing this table from other tables and you
 don't have the referencing columns indexed.

 regards, tom lane


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread mark
On Tue, Apr 25, 2006 at 01:42:31PM -0500, Scott Marlowe wrote:
 On Tue, 2006-04-25 at 13:38, [EMAIL PROTECTED] wrote:
  On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote:
   Sad, cause the AMD is, on a price / performance scale, twice the
   processor for the same money as the Intel.
  Maybe a year or two ago. Prices are all coming down. Intel more
  than AMD.
  AMD still seems better - but not X2, and it depends on the workload.
  X2 sounds like bigotry against Intel... :-)
 Actually, that was from an article from this last month that compared
 the dual core intel to the amd.  for every dollar spent on the intel,
 you got about half the performance of the amd.  Not bigotry.  fact.
 But don't believe me or the other people who've seen the difference.  Go
 buy the Intel box.  No skin off my back.

AMD Opteron vs Intel Xeon is different than AMD X2 vs Pentium D.

For AMD X2 vs Pentium D - I have both - in similar price range, and
similar speed. I choose to use the AMD X2 as my server, and Pentium D
as my Windows desktop. They're both quite fast.

I made the choice I describe based on a lot of research. I was going
to go both Intel, until I noticed that the Intel prices were dropping
fast. 30% price cut in 2 months. AMD didn't drop at all during the
same time.

There are plenty of reasons to choose one over the other. Generally
the AMD comes out on top. It is *not* 2X though. Anybody who claims
this is being highly selective about which benchmarks they consider.

One article is nothing.

There is a lot of hype these days. AMD is winning the elite market,
which means that they are able to continue to sell high. Intel, losing
this market, is cutting its prices to compete. And they do compete.
Quite well.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] slow deletes on pgsql 7.4

2006-04-25 Thread Tom Lane
Junaili Lie [EMAIL PROTECTED] writes:
 ie. delete from scenario where id='1023' is very fast, but delete from
 scenario where id='1099' is running forever.

What does EXPLAIN show for each of those cases?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread mark
On Tue, Apr 25, 2006 at 08:54:40PM -0400, [EMAIL PROTECTED] wrote:
 I made the choice I describe based on a lot of research. I was going
 to go both Intel, until I noticed that the Intel prices were dropping
 fast. 30% price cut in 2 months. AMD didn't drop at all during the
 same time.

Errr.. big mistake. That was going to be - I was going to go both AMD.

 There are plenty of reasons to choose one over the other. Generally
 the AMD comes out on top. It is *not* 2X though. Anybody who claims
 this is being highly selective about which benchmarks they consider.

I have an Intel Pentium D 920, and an AMD X2 3800+. These are very
close in performance. The retail price difference is:

Intel Pentium D 920 is selling for $310 CDN
AMD X2 3800+        is selling for $347 CDN

Another benefit of Pentium D over AMD X2, at least until AMD chooses
to switch, is that Pentium D supports DDR2, whereas AMD only supports
DDR. There are a lot of technical pros and cons to each - with claims
from AMD that DDR2 can be slower than DDR - but one claim that isn't
often made, but that helped me make my choice:

1) DDR2 supports higher transfer speeds. I'm using DDR2 5400 on
   the Intel. I think I'm at 3200 or so on the AMD X2.

2) DDR2 is cheaper. I purchased 1 Gbyte DDR2 5400 for $147 CDN.
   1 Gbyte of DDR 3200 starts at around the same price, and
   stretches into $200 - $300 CDN.

Now, granted, the Intel 920 requires more electricity to run. Running
24/7 for a year might make the difference in cost.

It doesn't address point 1) though. I like my DDR2 5400.

So, unfortunately, I won't be able to do a good test for you to prove
that my Windows Pentium D box is not only cheaper to buy, but faster,
because the specs aren't exactly equivalent. In the mean time, I'm
quite enjoying my 3d games while doing other things at the same time.
I imagine my desktop load approaches that of a CPU-bound database
load. 3d games require significant I/O and CPU.

Anybody who claims that Intel is 2X more expensive for the same
performance isn't considering all factors. No question at all - the
Opteron is good, and the Xeon isn't - but the original poster didn't
ask about Opteron or Xeon, did he? For the desktop lines - X2 is not
double Pentium D. Maybe 10%. Maybe not at all. Especially now that
Intel is dropping its prices due to overstock.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] PL/pgSQL Loop Vs. Batch Update

2006-04-25 Thread Tom Lane
David Wheeler [EMAIL PROTECTED] writes:
 This post is longish and has a bit of code, but here's my question
 up-front: Why are batch queries in my PL/pgSQL functions no faster than
 iterating over a loop and executing a series of queries for each
 iteration of the loop?

You'd really have to look at the plans generated for each of the
commands in the functions to be sure.  A knee-jerk reaction is to
suggest that that NOT IN might be the core of the problem, but it's
only a guess.

It's a bit tricky to examine the behavior of a parameterized query,
which is what these will all be since they depend on local variables
of the plpgsql function (which are passed as parameters to the main
SQL executor).  The basic idea is

PREPARE foo(datatype, datatype, ...) AS SELECT ... $1 ... $2 ...

EXPLAIN ANALYZE EXECUTE foo(value, value)
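
[Editor's note: spelled out against an invented table, the recipe above
looks like this. The entry_tags schema is hypothetical, chosen to
resemble the function under discussion.]

```sql
-- Invented schema for illustration only.
PREPARE foo (bigint, bigint[]) AS
    SELECT * FROM entry_tags
     WHERE entry_id = $1
       AND tag_id = ANY ($2);

EXPLAIN ANALYZE EXECUTE foo(1, ARRAY[61,62,63,64,65,66,67]);

DEALLOCATE foo;   -- prepared statements otherwise live for the session
```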

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread chris smith
 OK.  Stop and think about what you're telling postgresql to do here.

 You're telling it to cast the field group_id to int8, then compare it to
 9.  How can it cast the group_id to int8 without fetching it?  That's
 right, you're ensuring a seq scan.  You need to put the int8 cast on the
 other side of that equality comparison, like:

Yeh that one was my fault :) I couldn't remember which way it went and
if 7.4.x had issues with int8 indexes..

--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Firebird 1.5.3 X Postgresql 8.1.3 (linux and windows)

2006-04-25 Thread Tom Lane
andremachado [EMAIL PROTECTED] writes:
 After some time experimenting on windows, the conclusion is clear:
 windows is likely crap for databases other than MS-SQL.

Maybe.  One thing that comes to mind is that you really should do some
performance tuning experiments.  In particular it'd be a good idea to
increase checkpoint_segments and try other settings for wal_sync_method.
Your fifth query,

bddnf=# explain analyze update NOTA_FISCAL  set VA_TOTAL_ITENSDNF = (select 
sum(ITEM_NOTA.VA_TOTAL) from ITEM_NOTA where ITEM_NOTA.ID_NF = 
NOTA_FISCAL.ID_NF)  where ID_NF in (select NF2.ID_NF from DECLARACAO DE inner 
join CADASTRO CAD on (CAD.ID_DECLARACAO=DE.ID_DECLARACAO) inner join 
NOTA_FISCAL  NF2 on (NF2.ID_CADASTRO=CAD.ID_CADASTRO) where DE.ID_ARQUIVO in 
(1) );

shows runtime of the plan proper as 158 seconds but total runtime as 746
seconds --- the discrepancy has to be associated with writing out the
updated rows, which there are a lot of (719746) in this query, but still
we should be able to do it faster than that.  So I surmise a bottleneck
in pushing WAL updates to disk.

The last two queries are interesting.  Does Firebird have any equivalent
of EXPLAIN, ie a way to see what sort of query plan they are using?
I suspect they are being more aggressive about optimizing the max()
functions in the sub-selects than we are.  In particular, the 8.1 code
for optimizing min/max just punts if it sees any sub-selects in the
WHERE clause, which prevents us from doing anything with these examples.

/*
 * Also reject cases with subplans or volatile functions in WHERE. This
 * may be overly paranoid, but it's not entirely clear if the
 * transformation is safe then.
 */
if (contain_subplans(parse->jointree->quals) ||
    contain_volatile_functions(parse->jointree->quals))
    return NULL;

This is something I'd wanted to go back and look at more carefully, but
never got around to.
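
[Editor's note: for the simple case, the transformation 8.1 performs
automatically can also be done by hand when the optimizer punts, e.g.
when the WHERE clause contains sub-selects. Sketched against the
NOTA_FISCAL table from the query above:]

```sql
-- Manual form of the min/max optimization: turn the aggregate into an
-- index probe on the btree index over id_nf.
SELECT max(id_nf) FROM nota_fiscal;
-- can be rewritten by hand as:
SELECT id_nf FROM nota_fiscal
 WHERE id_nf IS NOT NULL
 ORDER BY id_nf DESC
 LIMIT 1;
```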

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Leigh Dyer

[EMAIL PROTECTED] wrote:

Another benefit of Pentium D over AMD X2, at least until AMD chooses
to switch, is that Pentium D supports DDR2, whereas AMD only supports
DDR. There are a lot of technical pros and cons to each - with claims
from AMD that DDR2 can be slower than DDR - but one claim that isn't
often made, but that helped me make my choice:

They're switching quite soon though -- within the next month now it 
seems, after moving up their earlier plans to launch in June:


http://www.dailytech.com/article.aspx?newsid=1854

This Anandtech article shows the kind of performance increase we can 
expect with DDR2 on AMD's new socket:


http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2741

The short version is that it's an improvement, but not an enormous one, 
and you need to spend quite a bit of cash on 800Mhz (PC6400) DDR2 sticks 
to see the most benefit. Some brief local (Australian) price comparisons 
show 1GB PC-3200 DDR sticks starting at just over AU$100, with 1GB 
PC2-4200 DDR2 sticks around the same price, though Anandtech's tests 
showed PC2-4200 DDR2 benching generally slower than PC-3200 DDR, 
probably due to the increased latency in DDR2.


Comparing reasonable quality matched pairs of 1GB sticks, PC-3200 DDR 
still seems generally cheaper than PC2-5300 DDR2, though not by a lot, 
and I'm sure the DDR2 will start dropping even further as AMD systems 
start using it in the next month or so.


One thing's for sure though -- Intel's Pentium D prices are remarkably 
low, and at the lower end of the price range AMD has nothing that's even 
remotely competitive in terms of price/performance. The Pentium D 805, 
for instance, with its dual 2.67Ghz cores, costs just AU$180. The X2 
3800+ is a far better chip, but it's also two-and-a-half times the price.


None of this really matters much in the server space though, where 
Opteron's real advantage over Xeon is not its greater raw CPU power, or 
its better dual-core implementation (though both would be hard to 
dispute), but the improved system bandwidth provided by Hypertransport. 
Even with Intel's next-gen CPUs, which look set to address the first two 
points quite well, they still won't have an interconnect technology that 
can really compete with AMD's.


Thanks
Leigh


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PL/pgSQL Loop Vs. Batch Update

2006-04-25 Thread David Wheeler

On Apr 25, 2006, at 18:19, Tom Lane wrote:


You'd really have to look at the plans generated for each of the
commands in the functions to be sure.  A knee-jerk reaction is to
suggest that that NOT IN might be the core of the problem, but it's
only a guess.


Well, the rows are indexed (I forgot to include the indexes in my  
first post), and given that each entry_id has no more than ten  
associated tag_ids, I would expect it to be quite fast, relying on  
the primary key index to look up the entry_id first, and then the  
associated tag_ids. But that's just a guess on my part, too. Perhaps  
I should try a left outer join with tag_id IS NULL?



It's a bit tricky to examine the behavior of a parameterized query,
which is what these will all be since they depend on local variables
of the plpgsql function (which are passed as parameters to the main
SQL executor).


Right, that makes sense.


The basic idea is

PREPARE foo(datatype, datatype, ...) AS SELECT ... $1 ... $2 ...

EXPLAIN ANALYZE EXECUTE foo(value, value)


Just on a lark, I tried to get this to work:

try=# explain analyze EXECUTE foo(1, ARRAY[61,62,63,64,65,66,67]);

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=26.241..26.251 rows=1 loops=1)
 Total runtime: 27.512 ms
(2 rows)

That's not much use. Is there no way to EXPLAIN ANALYZE this stuff?

Thanks Tom.

Best,

David


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] PL/pgSQL Loop Vs. Batch Update

2006-04-25 Thread Tom Lane
David Wheeler [EMAIL PROTECTED] writes:
 Just on a lark, I tried to get this to work:

 try=# explain analyze EXECUTE foo(1, ARRAY[61,62,63,64,65,66,67]);
                                       QUERY PLAN
 ---------------------------------------------------------------------------------------
  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=26.241..26.251 rows=1 loops=1)
  Total runtime: 27.512 ms
 (2 rows)

 That's not much use.

It looks like you had something trivial as the definition of foo().
Try one of the actual queries from the plpgsql function.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Large (8M) cache vs. dual-core CPUs

2006-04-25 Thread Ron Peacetree
Another benefit of Pentium D over AMD X2, at least until AMD chooses
to switch, is that Pentium D supports DDR2, whereas AMD only supports
DDR. There are a lot of technical pros and cons to each - with claims
from AMD that DDR2 can be slower than DDR - but one claim that isn't
often made, but that helped me make my choice:

1) DDR2 supports higher transfer speeds. I'm using DDR2 5400 on
   the Intel. I think I'm at 3200 or so on the AMD X2.

2) DDR2 is cheaper. I purchased 1 Gbyte DDR2 5400 for $147 CDN.
   1 Gbyte of DDR 3200 starts at around the same price, and
   stretches into $200 - $300 CDN.

There's a logical fallacy here that needs to be noted.

THROUGHPUT is better with DDR2 if and only if there is enough data to be 
fetched in a serial fashion from memory.

LATENCY however is dependent on the base clock rate of the RAM involved.
So PC3200, 200MHz x2, is going to actually perform better than PC2-5400, 166MHz 
x4, for almost any memory access pattern except those that are highly 
sequential.

In fact, even PC2-6400, 200MHz x4, has a disadvantage compared to 200MHz x2 
memory.
The minimum latency of the two types of memory in clock cycles is always going 
to be higher for the memory type that multiplies its base clock rate by the 
most.

For the mostly random memory access patterns that comprise many DB 
applications, the base latency of the RAM involved is going to matter more than 
the peak throughput AKA the bandwidth of that RAM.

The big message here is that despite engineering tricks and marketing claims, 
the base clock rate of the RAM you use matters.

A minor point to be noted in addition here is that most DB servers under load 
are limited by their physical IO subsystem, their HDs, and not the speed of 
their RAM.

All of the above comments about the relative performance of different RAM types 
become insignificant when performance is gated by the HD subsystem. 
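
[Editor's note: the latency arithmetic above can be checked directly in
psql. The CAS cycle counts are made up for the example, not measured
figures; latency in nanoseconds is cycles / base clock, so the same
cycle count costs more wall-clock time on a slower base clock, even
while peak throughput favours the higher multiplier.]

```sql
-- Peak throughput (MB/s on a 64-bit, i.e. 8-byte, bus) favours DDR2:
SELECT 200 * 2 * 8 AS ddr400_peak_mbs,    -- 200 MHz x2 = 3200
       166 * 4 * 8 AS ddr2_667_peak_mbs;  -- 166 MHz x4 = 5312
-- ...but the same 3 cycles of latency cost more at the lower base clock:
SELECT 3 * 1000.0 / 200 AS ddr_cas3_ns,   -- 15 ns at 200 MHz
       3 * 1000.0 / 166 AS ddr2_cas3_ns;  -- ~18 ns at 166 MHz
```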


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PL/pgSQL Loop Vs. Batch Update

2006-04-25 Thread David Wheeler

On Apr 25, 2006, at 19:36, Tom Lane wrote:


It looks like you had something trivial as the definition of foo().


Yeah, the function call. :-)


Try one of the actual queries from the plpgsql function.


Oh. Duh. Will do. Tomorrow.

Best,

David


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq