[PERFORM] URGENT: pg_statistic_relid_att_index has gone
Hi,

on a 7.2 system (SuSE Linux) I got a "duplicate key in unique index pg_statistic_relid_att_index" error (I think it was while vacuuming). I REINDEXed the database. Now the index pg_statistic_relid_att_index is completely gone. Does anybody have any advice?

tia,
Harald

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] URGENT: pg_statistic_relid_att_index has gone
> on a 7.2 system (SuSE Linux) I got a "duplicate key in unique index
> pg_statistic_relid_att_index" error (I think it was while vacuuming).
> I REINDEXed the database. Now the index pg_statistic_relid_att_index
> is completely gone.

Go searching the internet first, man... surprisingly, I'm not the first one to have such a breakdown. I found a solution provided by Tom Lane:

http://www.xy1.org/pgsql-general@postgresql.org/msg04568.html

It seems to work, many thanks. Sorry for the overhasty question.

Harald
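For the archives: the usual recovery for damaged pg_statistic entries on 7.2 goes roughly like this (a sketch of that class of fix, not a verbatim copy of the linked post — verify against the archive before running anything):

```sql
-- Stop the postmaster, then start a standalone backend with
-- "postgres -O -P mydb" (-P ignores system indexes so the damaged
-- index is not used, -O allows system-table modifications), and
-- rebuild the damaged index:
REINDEX TABLE pg_statistic;

-- pg_statistic holds only derived statistics, so an alternative is
-- to empty it and let ANALYZE repopulate it after a normal restart:
-- DELETE FROM pg_statistic;
-- VACUUM ANALYZE;
```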
[PERFORM] Alternative to a temporary table
I have a PHP web-based application where a temporary list of servers and their characteristics (each represented by a unique numeric server_id) is extracted from a master server list based on a number of dynamic and user-selected criteria (can the user view the server, is it on-line, is it a member of a specific group, etc). Once the user selects the set of criteria (and servers), it rarely changes during the rest of the browser session. The resulting temporary list of servers is then joined against other tables with different sets of information about each of the servers, based on the server_id.

I currently create a temporary table to hold the selected server_id's and characteristics, and then join this temp table with other data tables to produce my reports. My reason for using the temporary table method is that the SQL for the initial server selection is dynamically created based on the user's selections, and is complex enough that it does not lend itself to being easily incorporated into any of the other data extraction queries (which may also contain dynamic filtering). Unfortunately, the PHP connection to the database does not survive from web screen to web screen, so I have to re-generate the temporary server_id table each time it is needed for a report screen.

An idea I had to make this process more efficient: instead of re-creating the temporary table each time it is needed, do a one-time extraction of the list of user-selected server_id's, store the list in a PHP global variable, and then use the list in a dynamically-created WHERE clause in the rest of the queries. The resulting query would look something like

  SELECT * FROM some_data_table WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...)

Simple enough; however, in rare cases the list of server_id's can range between 6,000 and 10,000. My question to the group is: can having so many values in a WHERE ... IN clause affect query performance?
Am I being naive about this, and is there a different, better way? The server_id field is of course indexed, but it is possible that the list of selected sid's can contain almost all of the values in the some_data_table server_id index (in the situation where _all_ of the records are requested I wouldn't use the WHERE clause in the query). The some_data_table can contain millions of records for thousands of servers, so every bit of efficiency helps.

If this is not the proper group for this kind of question, please point me in the right direction. Thanks!

--- Steve

Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to make them all yourself." -- Eleanor Roosevelt
Re: [PERFORM] Alternative to a temporary table
On Mon, Oct 03, 2005 at 11:47:52AM -0400, Steven Rosenstein wrote:
> I currently create a temporary table to hold the selected server_id's and
> characteristics. I then join this temp table with other data tables to
> produce my reports. [...] The resulting query would look something like
>   SELECT * FROM some_data_table WHERE server_id IN (sid1,sid5,sid6,sid17,sid24...)
> Simple enough, however in rare cases the list of server_id's can range
> between 6,000 and 10,000. My question to the group is, can having so many
> values in a WHERE/IN clause affect query performance?

Probably, yes. As always, benchmark a test case, but last time I checked (in 7.4) you'd be better off creating a new temporary table for every query than using an IN list that long. It's a lot better in 8.0, I believe, so you should benchmark it there.

> Am I being naive about this and is there a different, better way? The
> server_id field is of course indexed, but it is possible that the list of
> selected sid's can contain almost all of the values in the some_data_table
> server_id index (in the situation where _all_ of the records are requested
> I wouldn't use the WHERE clause in the query).
> The some_data_table can contain millions of records for thousands of
> servers, so every bit of efficiency helps.

Don't use a temporary table. Instead use a permanent table that contains the server ids you need and the PHP session token. Then you can create your list of server_ids once and insert it into that table associated with your sessionid. Future queries can then be simple joins against that table:

  SELECT some_data_table.*
  FROM some_data_table, session_table
  WHERE some_data_table.server_id = session_table.server_id
    AND session_table.session_id = 'foobar'

You'd need a reaper process to delete old data from that table to prevent it from growing without limit, and probably a table associating session start time with sessionid to make reaping easier.

Cheers,
Steve
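A minimal sketch of that approach (the table and column names here are illustrative, not from the original post):

```sql
-- Permanent table keyed by the PHP session token.
CREATE TABLE session_servers (
    session_id  varchar(64) NOT NULL,
    server_id   integer     NOT NULL,
    created     timestamp with time zone NOT NULL DEFAULT now(),
    PRIMARY KEY (session_id, server_id)
);

-- Populate once per session, right after the user picks their criteria
-- (the dynamic selection SQL goes in the SELECT):
INSERT INTO session_servers (session_id, server_id)
SELECT 'foobar', server_id FROM master_server_list WHERE ...;

-- Every report query is then a plain indexed join:
SELECT d.*
FROM some_data_table d
JOIN session_servers s ON s.server_id = d.server_id
WHERE s.session_id = 'foobar';

-- Reaper, run periodically: drop rows from stale sessions.
DELETE FROM session_servers WHERE created < now() - interval '1 day';
```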
Re: [PERFORM] Ultra-cheap NVRAM device
On Oct 3, 2005, at 5:02 AM, Steinar H. Gunderson wrote:
> I thought this might be interesting, not least due to the extremely low
> price ($150 + the price of regular DIMMs):

This has been posted before, and the main reason nobody got very excited is that:

a) it only uses the PCI bus to provide power to the device, not for I/O
b) it is limited to SATA bandwidth
c) the benchmarks did not prove it to be noticeably faster than a good single SATA drive

A few of us were really excited at first too, until seeing the benchmarks.

-Dan
Re: [PERFORM] Ultra-cheap NVRAM device
On Oct 3, 2005, at 5:02 AM, Steinar H. Gunderson wrote:
> I thought this might be interesting, not least due to the extremely low
> price ($150 + the price of regular DIMMs):

Replying before my other post came through... It looks like their benchmarks are markedly improved since the last article I read on this. There may be more interest now.

-Dan
Re: [PERFORM] Ultra-cheap NVRAM device
On Mon, 2005-10-03 at 11:15 -0600, Dan Harris wrote:
> This has been posted before, and the main reason nobody got very excited
> is that:
> a) it only uses the PCI bus to provide power to the device, not for I/O
> b) it is limited to SATA bandwidth
> c) the benchmarks did not prove it to be noticeably faster than a good
> single SATA drive

Also, no ECC support. You'd be crazy to use it for anything.

-jwb
Re: [PERFORM] Ultra-cheap NVRAM device
Nah, it's still not right. It needs:

1) full PCI bandwidth, preferably at least 64-bit 133MHz PCI-X. A RAM card should blow the doors off the fastest commodity RAID setup you can build.
2) 8-16 DIMM slots
3) a standard battery type that I can pick up spares for easily
4) ECC support

If it had all those features, I'd buy it at even 2x or possibly even 3x its current price. 8, 16, or 32GB (using 1, 2, or 4GB DIMMs respectively in an 8-slot form factor) of very fast temporary work memory (sorting, anyone? ;-) ). Yum.

Ron

-----Original Message-----
From: Dan Harris [EMAIL PROTECTED]
Sent: Oct 3, 2005 1:21 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Ultra-cheap NVRAM device

> Replying before my other post came through... It looks like their
> benchmarks are markedly improved since the last article I read on this.
> There may be more interest now.
Re: [PERFORM] Query seem to slow if table have more than 200 million rows
Ahmad Fajar [EMAIL PROTECTED] wrote:
> Hi Qingqing,
> I don't know whether the statistics I got are bad or good, this is the
> statistic:

Please do it in this way:

1. Start postmaster with stats_start_collector=true and stats_block_level=true.
2. Use psql to connect, and do something like this:

test=# select pg_stat_reset();
 pg_stat_reset
---------------
 t
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |             0 |            0
(1 row)

test=# select count(*) from test where a = 1234;
 count
-------
  7243
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |            55 |            0
(1 row)

This tells us that to run "select count(*) from test where a = 1234", I have to read 55 index blocks (no index block hits, since I just restarted the postmaster so the bufferpool is empty).

Regards,
Qingqing
Re: [PERFORM] Comparative performance
PFC wrote:
> Even though this query isn't that optimized, it's still only 16
> milliseconds. Why does it take this long for PHP to get the results?
> Can you try pg_query'ing this exact same query, FROM PHP, and timing
> it with getmicrotime()?

That query took about 27 msec in actual PHP execution time. It turns out the real culprit is the following query, which interestingly enough retrieves zero rows in the case of the Economists page that I've been using for testing, yet uses up about 1370 msec in actual runtime:

SELECT topic_id1, topic_id2, topic_name, categ_id, list_name, t.title,
       url, page_type, rel_type, inverse_id, r.description AS rel_descrip,
       r.created, r.updated
FROM relationship r, topic t, entry_type e
WHERE ((topic_id1 = topic_id AND topic_id2 = 1252)
       OR (topic_id2 = topic_id AND topic_id1 = 1252))
  AND rel_type = type_id
  AND e.class_id = 2
ORDER BY rel_type, list_name;

The EXPLAIN ANALYZE output, after I ran VACUUM ANALYZE on the three tables, is:

Sort  (cost=4035.55..4035.56 rows=1 width=131) (actual time=2110.000..2110.000 rows=0 loops=1)
  Sort Key: r.rel_type, t.list_name
  ->  Nested Loop  (cost=36.06..4035.54 rows=1 width=131) (actual time=2110.000..2110.000 rows=0 loops=1)
        Join Filter: (((inner.topic_id1 = outer.topic_id) AND (inner.topic_id2 = 1252)) OR ((inner.topic_id2 = outer.topic_id) AND (inner.topic_id1 = 1252)))
        ->  Seq Scan on topic t  (cost=0.00..38.34 rows=1234 width=90) (actual time=0.000..15.000 rows=1234 loops=1)
        ->  Materialize  (cost=36.06..37.13 rows=107 width=45) (actual time=0.000..0.509 rows=466 loops=1234)
              ->  Merge Join  (cost=30.31..35.96 rows=107 width=45) (actual time=0.000..0.000 rows=466 loops=1)
                    Merge Cond: (outer.type_id = inner.rel_type)
                    ->  Index Scan using entry_type_pkey on entry_type e  (cost=0.00..3.94 rows=16 width=4) (actual time=0.000..0.000 rows=15 loops=1)
                          Filter: (class_id = 2)
                    ->  Sort  (cost=30.31..31.48 rows=466 width=43) (actual time=0.000..0.000 rows=466 loops=1)
                          Sort Key: r.rel_type
                          ->  Seq Scan on relationship r  (cost=0.00..9.66 rows=466 width=43) (actual time=0.000..0.000 rows=466 loops=1)
Total runtime: 2110.000 ms
(14 rows)

The tables are as follows:

CREATE TABLE entry_type (
  type_id SMALLINT NOT NULL PRIMARY KEY,
  title VARCHAR(32) NOT NULL,
  rel_title VARCHAR(32),
  class_id SMALLINT NOT NULL DEFAULT 1,
  inverse_id SMALLINT,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE topic (
  topic_id serial PRIMARY KEY,
  topic_name VARCHAR(48) NOT NULL UNIQUE,
  categ_id SMALLINT NOT NULL,
  parent_entity INTEGER,
  parent_concept INTEGER,
  crossref_id INTEGER,
  list_name VARCHAR(80) NOT NULL,
  title VARCHAR(80),
  description VARCHAR(255),
  url VARCHAR(64),
  page_type SMALLINT NOT NULL,
  dark_ind BOOLEAN NOT NULL DEFAULT FALSE,
  ad_code INTEGER,
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE relationship (
  topic_id1 INTEGER NOT NULL REFERENCES topic,
  topic_id2 INTEGER NOT NULL REFERENCES topic,
  rel_type INTEGER NOT NULL,
  description VARCHAR(255),
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (topic_id1, topic_id2, rel_type))
WITHOUT OIDS;

I'm thinking that perhaps I need to set up another index with topic_id2 first and topic_id1 second. In addition, an index on entry_type.class_id may improve things. Another possibility would be to rewrite the query as a UNION.

Of course, this doesn't explain how MySQL manages to execute the query in about 9 msec. The only minor differences in the schema are: entry_type.title and rel_title are char(32) in MySQL, entry_type.class_id is a tinyint, and topic.categ_id, page_type and dark_ind are also tinyints. MySQL also doesn't have the REFERENCES constraints.
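The UNION rewrite mentioned above might look roughly like this (a sketch against the posted schema, not tested; UNION ALL is used on the assumption that topic_id1 and topic_id2 never both equal 1252 in one row, which avoids the duplicate-elimination sort that plain UNION would add):

```sql
-- Each branch has a plain equality on one column of relationship's
-- composite primary key, so the planner can use an index scan per
-- branch instead of the OR'd join filter in the original plan.
SELECT r.topic_id1, r.topic_id2, t.topic_name, t.categ_id, t.list_name,
       t.title, t.url, t.page_type, r.rel_type, e.inverse_id,
       r.description AS rel_descrip, r.created, r.updated
FROM relationship r
JOIN topic t      ON t.topic_id = r.topic_id1
JOIN entry_type e ON e.type_id  = r.rel_type
WHERE r.topic_id2 = 1252 AND e.class_id = 2
UNION ALL
SELECT r.topic_id1, r.topic_id2, t.topic_name, t.categ_id, t.list_name,
       t.title, t.url, t.page_type, r.rel_type, e.inverse_id,
       r.description, r.created, r.updated
FROM relationship r
JOIN topic t      ON t.topic_id = r.topic_id2
JOIN entry_type e ON e.type_id  = r.rel_type
WHERE r.topic_id1 = 1252 AND e.class_id = 2
ORDER BY rel_type, list_name;
```

The second branch needs the extra index with topic_id2 leading (or will fall back to a scan), which is the same index the post already considers adding.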
A couple of interesting side notes from my testing. First, pg_connect() took about 39 msec but mysql_connect() took only 4 msec; however, pg_pconnect() took 0.14 msec while mysql_pconnect() took 0.99 msec (all tests were repeated five times and the quoted results are averages). Second, PostgreSQL's performance appears to be much more consistent in certain queries. For example, the query that retrieves the list of subtopics (the names and descriptions of economists) took 17 msec in PG, with a low of 15 (three times) and a high of 21, whereas MySQL took 60 msec on average but had a low of 22 and a high of 102 msec.

Joe