[PERFORM] URGENT: pg_statistic_relid_att_index has gone

2005-10-03 Thread Harald Lau

Hi,

on a 7.2 system (SuSE Linux) I got a "duplicate key in unique
index pg_statistic_relid_att_index" error (I think it was while vacuuming).

I REINDEXed the database.
Now the index pg_statistic_relid_att_index is completely gone.

Does anybody have any advice?

tia,
Harald


Re: [PERFORM] URGENT: pg_statistic_relid_att_index has gone

2005-10-03 Thread Harald Lau

 on a 7.2 system (SuSE Linux) I got a "duplicate key in unique
 index pg_statistic_relid_att_index" error (I think it was while vacuuming).
 
 I REINDEXed the database.
 Now the index pg_statistic_relid_att_index is completely gone.

Go searching the internet first, man ...
Surprisingly, I'm not the first one to have had such a breakdown.

Found a solution provided by Tom Lane:
http://www.xy1.org/pgsql-general@postgresql.org/msg04568.html

Seems to work, many thanks
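
For the archives, the gist as I understand it (a sketch from memory, not
necessarily Tom's exact recipe): pg_statistic holds only data derived by
ANALYZE, so once the index is back its contents can be thrown away and
regenerated:

  -- (On 7.2, damaged system indexes are rebuilt with REINDEX from a
  -- standalone backend started with -O -P; see the REINDEX docs.)
  -- pg_statistic itself holds only ANALYZE output, so its rows are
  -- expendable once the index is back:
  DELETE FROM pg_statistic;
  VACUUM ANALYZE;  -- regenerates the statistics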

Sorry for the overhasty question
Harald


[PERFORM] Alternative to a temporary table

2005-10-03 Thread Steven Rosenstein

I have a PHP web-based application where a temporary list of servers and
their characteristics (each represented by a unique numeric server_id) is
extracted from a master server list based on a number of dynamic and
user-selected criteria (can the user view the server, is it on-line, is it
a member of a specific group, etc.).  Once the user selects the set of
criteria (and servers), it rarely changes during the rest of the browser
session.  The resulting temporary list of servers is then joined against
other tables with different sets of information about each of the servers,
based on the server_id.

I currently create a temporary table to hold the selected server_id's and
characteristics.  I then join this temp table with other data tables to
produce my reports.  My reason for using the temporary table method is that
the SQL for the initial server selection is dynamically created based on
the user's selections, and is complex enough that it does not lend itself
to being easily incorporated into any of the other data extraction queries
(which may also contain dynamic filtering).
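
Schematically, each report screen currently does something like this (the
table and column names here are invented stand-ins, not my real schema):

  -- Build the per-session server list from the dynamic criteria
  CREATE TEMP TABLE selected_servers AS
    SELECT server_id, server_name  -- plus other characteristics
    FROM master_server_list
    WHERE viewable AND on_line AND group_id = 42;  -- varies per user

  -- Join it against the data tables for the report
  SELECT d.*
  FROM some_data_table d, selected_servers s
  WHERE d.server_id = s.server_id;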

Unfortunately, the PHP connection to the database does not survive from
webscreen to webscreen, so I have to re-generate the temporary server_id
table each time it is needed for a report screen.  An idea I had to make
this process more efficient was, instead of re-creating the temporary table
over and over each time it is needed, to do a one-time extraction of the
list of user-selected server_id's, store the list in a PHP global variable,
and then use the list in a dynamically-created WHERE clause in the rest of
the queries.  The resulting query would look something like:

 SELECT *
 FROM some_data_table
 WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...)

Simple enough; however, in rare cases the list of server_id's can range
between 6,000 and 10,000 entries.

My question to the group is: can having so many values in a WHERE/IN clause
affect query performance?  Am I being naive about this, and is there a
different, better way?  The server_id field is of course indexed, but it is
possible that the list of selected sid's can contain almost all of the
values in the some_data_table server_id index (in the situation where _all_
of the records are requested I wouldn't use the WHERE clause in the query).
The some_data_table can contain millions of records for thousands of
servers, so every bit of efficiency helps.

If this is not the proper group for this kind of question, please point me
in the right direction.

Thanks!
--- Steve
___

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

Learn from the mistakes of others because you can't live long enough to
make them all yourself. -- Eleanor Roosevelt



Re: [PERFORM] Alternative to a temporary table

2005-10-03 Thread Steve Atkins
On Mon, Oct 03, 2005 at 11:47:52AM -0400, Steven Rosenstein wrote:

 I currently create a temporary table to hold the selected server_id's and
 characteristics.  I then join this temp table with other data tables to
 produce my reports.  My reason for using the temporary table method is that
 the SQL for the initial server selection is dynamically created based on
 the user's selections, and is complex enough that it does not lend itself
 to being easily incorporated into any of the other data extraction queries
 (which may also contain dynamic filtering).
 
 Unfortunately, the PHP connection to the database does not survive from
 webscreen to webscreen, so I have to re-generate the temporary server_id
 table each time it is needed for a report screen.  An idea I had to make
 this process more efficient was, instead of re-creating the temporary table
 over and over each time it is needed, to do a one-time extraction of the
 list of user-selected server_id's, store the list in a PHP global variable,
 and then use the list in a dynamically-created WHERE clause in the rest of
 the queries.  The resulting query would look something like:
 
  SELECT *
  FROM some_data_table
  WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...)
 
 Simple enough; however, in rare cases the list of server_id's can range
 between 6,000 and 10,000 entries.
 
 My question to the group is: can having so many values in a WHERE/IN clause
 affect query performance?

Probably, yes.  As always, benchmark a test case, but last time I
checked (in 7.4) you'd be better off creating a new temporary table for
every query than using an IN list that long.  It's a lot better in 8.0, I
believe, so you should benchmark it there.
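
A quick way to compare the two plans on your own data (the table and
column names below are just stand-ins for yours):

  -- Plan for the long IN list
  EXPLAIN ANALYZE
    SELECT * FROM some_data_table WHERE server_id IN (1, 5, 6, 17, 24);

  -- Plan for the temp-table join
  CREATE TEMP TABLE wanted_ids (server_id integer PRIMARY KEY);
  -- ... populate wanted_ids with the same ids ...
  EXPLAIN ANALYZE
    SELECT d.* FROM some_data_table d, wanted_ids w
    WHERE d.server_id = w.server_id;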

 Am I being naive about this and is there a
 different, better way?  The server_id field is of course indexed, but it is
 possible that the list of selected sid's can contain almost all of the
 values in the some_data_table server_id index (in the situation where _all_
 of the records are requested I wouldn't use the WHERE clause in the query).
 The some_data_table can contain millions of records for thousands of
 servers, so every bit of efficiency helps.

Don't use a temporary table. Instead use a permanent table that
contains the server ids you need and the PHP session token. Then you
can create your list of server_ids once and insert it into that table
associated with your sessionid. Then future queries can be simple
joins against that table.

SELECT some_data_table.*
FROM some_data_table, session_table
WHERE some_data_table.server_id = session_table.server_id
  AND session_table.session_id = 'foobar'

You'd need a reaper process to delete old data from that table to
prevent it from growing without limit, and probably a table associating
session start time with sessionid to make reaping easier.
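
For example (a sketch; the created column and the retention period are my
own inventions, adjust to taste):

  CREATE TABLE session_table (
      session_id  varchar(64)  NOT NULL,
      server_id   integer      NOT NULL,
      created     timestamptz  NOT NULL DEFAULT now(),
      PRIMARY KEY (session_id, server_id)
  );

  -- The reaper, run periodically (e.g. from cron):
  DELETE FROM session_table WHERE created < now() - interval '1 day';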

Cheers,
  Steve


Re: [PERFORM] Ultra-cheap NVRAM device

2005-10-03 Thread Dan Harris


On Oct 3, 2005, at 5:02 AM, Steinar H. Gunderson wrote:

 I thought this might be interesting, not the least due to the
 extremely low price ($150 + the price of regular DIMMs):

This has been posted before, and the main reason nobody got very
excited is that:

a) it only uses the PCI bus to provide power to the device, not for I/O
b) it is limited to SATA bandwidth
c) the benchmarks did not prove it to be noticeably faster than a
   good single SATA drive


A few of us were really excited at first too, until seeing the
benchmarks...


-Dan


Re: [PERFORM] Ultra-cheap NVRAM device

2005-10-03 Thread Dan Harris


On Oct 3, 2005, at 5:02 AM, Steinar H. Gunderson wrote:

 I thought this might be interesting, not the least due to the
 extremely low price ($150 + the price of regular DIMMs):

Replying before my other post came through...  It looks like their
benchmarks are markedly improved since the last article I read on
this.  There may be more interest now...


-Dan


Re: [PERFORM] Ultra-cheap NVRAM device

2005-10-03 Thread Jeffrey W. Baker
On Mon, 2005-10-03 at 11:15 -0600, Dan Harris wrote:
 On Oct 3, 2005, at 5:02 AM, Steinar H. Gunderson wrote:
 
  I thought this might be interesting, not the least due to the
  extremely low price ($150 + the price of regular DIMMs):
 
 This has been posted before, and the main reason nobody got very
 excited is that:
 
 a) it only uses the PCI bus to provide power to the device, not for I/O
 b) it is limited to SATA bandwidth
 c) the benchmarks did not prove it to be noticeably faster than a
    good single SATA drive
 
 A few of us were really excited at first too, until seeing the
 benchmarks...

Also, no ECC support.  You'd be crazy to use it for anything.

-jwb


Re: [PERFORM] Ultra-cheap NVRAM device

2005-10-03 Thread Ron Peacetree
Nah.  It's still not right.  It needs:
1= full PCI bandwidth, preferably at least 64-bit 133MHz PCI-X.
   A RAM card should blow the doors off the fastest commodity
   RAID setup you can build.
2= 8-16 DIMM slots
3= a standard battery type that I can pick up spares for easily
4= ECC support

If it had all those features, I'd buy it at even 2x or possibly
even 3x its current price.

8, 16, or 32GB (using 1, 2, or 4GB DIMMs respectively in an 8-slot
form factor) of very fast temporary work memory (sorting, anyone? ;-)).
Yum.

Ron

-Original Message-
From: Dan Harris [EMAIL PROTECTED]
Sent: Oct 3, 2005 1:21 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Ultra-cheap NVRAM device


On Oct 3, 2005, at 5:02 AM, Steinar H. Gunderson wrote:

 I thought this might be interesting, not the least due to the
 extremely low price ($150 + the price of regular DIMMs):

Replying before my other post came through...  It looks like their
benchmarks are markedly improved since the last article I read on
this.  There may be more interest now...

-Dan


Re: [PERFORM] Query seem to slow if table have more than 200 million rows

2005-10-03 Thread Qingqing Zhou

Ahmad Fajar [EMAIL PROTECTED] wrote:
 Hi Qingqing,

 I don't know whether the statistics I got are good or bad; these are
 the statistics:

Please do it in this way:

1. Start postmaster with stats_start_collector=true and stats_block_level=true.

2. Use psql to connect, and do something like this:

test=# select pg_stat_reset();
 pg_stat_reset
---------------
 t
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |             0 |            0
(1 row)

test=# select count(*) from test where a = 1234;
 count
-------
  7243
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |            55 |            0
(1 row)


This shows that to answer select count(*) from test where a = 1234, I
had to read 55 index blocks (there were no index block hits because I had
just restarted the postmaster, so the buffer pool was empty).


Regards,
Qingqing


Re: [PERFORM] Comparative performance

2005-10-03 Thread Joe

PFC wrote:
 Even though this query isn't that optimized, it's still only 16
 milliseconds.

 Why does it take this long for PHP to get the results?

 Can you try pg_query'ing this exact same query, FROM PHP, and timing
 it with getmicrotime()?


That query took about 27 msec in actual PHP execution time.  It turns out
the real culprit is the following query, which interestingly enough
retrieves zero rows in the case of the Economists page that I've been using
for testing, yet uses up about 1370 msec in actual runtime:


SELECT topic_id1, topic_id2, topic_name, categ_id, list_name, t.title,
       url, page_type, rel_type, inverse_id, r.description AS rel_descrip,
       r.created, r.updated
FROM relationship r, topic t, entry_type e
WHERE ((topic_id1 = topic_id AND topic_id2 = 1252) OR
       (topic_id2 = topic_id AND topic_id1 = 1252))
  AND rel_type = type_id AND e.class_id = 2
ORDER BY rel_type, list_name;

The EXPLAIN ANALYZE output, after I ran VACUUM ANALYZE on the three tables, is:

 Sort  (cost=4035.55..4035.56 rows=1 width=131) (actual time=2110.000..2110.000 rows=0 loops=1)
   Sort Key: r.rel_type, t.list_name
   ->  Nested Loop  (cost=36.06..4035.54 rows=1 width=131) (actual time=2110.000..2110.000 rows=0 loops=1)
         Join Filter: ((("inner".topic_id1 = "outer".topic_id) AND ("inner".topic_id2 = 1252)) OR (("inner".topic_id2 = "outer".topic_id) AND ("inner".topic_id1 = 1252)))
         ->  Seq Scan on topic t  (cost=0.00..38.34 rows=1234 width=90) (actual time=0.000..15.000 rows=1234 loops=1)
         ->  Materialize  (cost=36.06..37.13 rows=107 width=45) (actual time=0.000..0.509 rows=466 loops=1234)
               ->  Merge Join  (cost=30.31..35.96 rows=107 width=45) (actual time=0.000..0.000 rows=466 loops=1)
                     Merge Cond: ("outer".type_id = "inner".rel_type)
                     ->  Index Scan using entry_type_pkey on entry_type e  (cost=0.00..3.94 rows=16 width=4) (actual time=0.000..0.000 rows=15 loops=1)
                           Filter: (class_id = 2)
                     ->  Sort  (cost=30.31..31.48 rows=466 width=43) (actual time=0.000..0.000 rows=466 loops=1)
                           Sort Key: r.rel_type
                           ->  Seq Scan on relationship r  (cost=0.00..9.66 rows=466 width=43) (actual time=0.000..0.000 rows=466 loops=1)
 Total runtime: 2110.000 ms
(14 rows)

The tables are as follows:

CREATE TABLE entry_type (
  type_id SMALLINT NOT NULL PRIMARY KEY,
  title VARCHAR(32) NOT NULL,
  rel_title VARCHAR(32),
  class_id SMALLINT NOT NULL DEFAULT 1,
  inverse_id SMALLINT,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE topic (
  topic_id serial PRIMARY KEY,
  topic_name VARCHAR(48) NOT NULL UNIQUE,
  categ_id SMALLINT NOT NULL,
  parent_entity INTEGER,
  parent_concept INTEGER,
  crossref_id INTEGER,
  list_name VARCHAR(80) NOT NULL,
  title VARCHAR(80),
  description VARCHAR(255),
  url VARCHAR(64),
  page_type SMALLINT NOT NULL,
  dark_ind BOOLEAN NOT NULL DEFAULT FALSE,
  ad_code INTEGER,
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE relationship (
  topic_id1 INTEGER NOT NULL REFERENCES topic,
  topic_id2 INTEGER NOT NULL REFERENCES topic,
  rel_type INTEGER NOT NULL,
  description VARCHAR(255),
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (topic_id1, topic_id2, rel_type))
WITHOUT OIDS;

I'm thinking that perhaps I need to set up another index with topic_id2 first 
and topic_id1 second.  In addition, an index on entry_type.class_id may improve 
things.  Another possibility would be to rewrite the query as a UNION.
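
Concretely, something like this is what I have in mind (untested, and the
index names are invented):

  CREATE INDEX relationship_t2_t1_idx ON relationship (topic_id2, topic_id1);
  CREATE INDEX entry_type_class_idx ON entry_type (class_id);

  SELECT topic_id1, topic_id2, topic_name, categ_id, list_name, t.title,
         url, page_type, rel_type, inverse_id, r.description AS rel_descrip,
         r.created, r.updated
  FROM relationship r, topic t, entry_type e
  WHERE topic_id1 = topic_id AND topic_id2 = 1252
    AND rel_type = type_id AND e.class_id = 2
  UNION
  SELECT topic_id1, topic_id2, topic_name, categ_id, list_name, t.title,
         url, page_type, rel_type, inverse_id, r.description AS rel_descrip,
         r.created, r.updated
  FROM relationship r, topic t, entry_type e
  WHERE topic_id2 = topic_id AND topic_id1 = 1252
    AND rel_type = type_id AND e.class_id = 2
  ORDER BY rel_type, list_name;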


Of course, this doesn't explain how MySQL manages to execute the query in
about 9 msec.  The only minor differences in the schema are: entry_type.title
and rel_title are char(32) in MySQL, entry_type.class_id is a tinyint, and
topic.categ_id, page_type and dark_ind are also tinyints.  MySQL also
doesn't have the REFERENCES constraints.


A couple of interesting side notes from my testing.  First, pg_connect()
took about 39 msec but mysql_connect() took only 4 msec; however,
pg_pconnect() took 0.14 msec while mysql_pconnect() took 0.99 msec (all
tests were repeated five times and the quoted results are averages).
Second, PostgreSQL's performance appears to be much more consistent on
certain queries.  For example, the query that retrieves the list of
subtopics (the names and descriptions of economists) took 17 msec in PG,
with a low of 15 (three times) and a high of 21, whereas MySQL took 60 msec
on average, with a low of 22 and a high of 102 msec.


Joe
