Re: [PERFORM] serious problems with vacuuming databases
Hi Tomas,

Tomas wrote: We've decided to remove unneeded 'old' data, which means removing about 99.999% of rows from tables A, C and D (about 2 GB of data). At the beginning, the B table (containing data aggregated from A, C and D) was emptied (dropped and recreated) and filled with current data. Then, before the deletion, the data from tables A, C and D were backed up into other tables (say A_old, C_old and D_old), filled in using the steps below:

1) drop, create and fill table B (aggregated data from A, C, D)
2) copy 'old' data from A, C and D to A_old, C_old and D_old
3) delete old data from A, C, D
4) dump data from A_old, C_old and D_old
5) truncate tables A, C, D
6) vacuum full analyze tables A, C, D, A_old, C_old and D_old

I think you are doing some difficult database maintenance. Why do that, if you just want to keep a small piece of the data in your tables? Why don't you try something like:

1. create table A with no indexes (don't fill data into this table),
2. create table A_week_year inheriting table A, with the indexes you want and a condition for insertion (e.g. table A1 is used for one week of data of a given year, and so on),
3. do this step for tables B, C and D,
4. if you have relations, make the relations on the inheriting tables (optional).

I think you should read the PostgreSQL documentation for more information about table inheritance. The impact is that you might have many tables, but each table will hold only a small piece of data, for example just one week. And you don't have to do difficult database maintenance like you have done. You just need to create the tables for every week of data, do vacuum/analyze and regular backups.

Best regards, ahmad fajar, ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
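For illustration, a minimal sketch of the inheritance scheme described above (table and column names are hypothetical; the CHECK constraint only documents the intended range and does not route inserts, so the application must insert into the child table directly):

    -- parent table: no indexes, never filled directly
    CREATE TABLE a (
        id      bigint,
        payload text,
        created timestamp NOT NULL
    );

    -- one child table per week, inheriting the parent's columns
    CREATE TABLE a_2006w15 (
        CHECK (created >= '2006-04-10' AND created < '2006-04-17')
    ) INHERITS (a);
    CREATE INDEX a_2006w15_created ON a_2006w15 (created);

    -- load rows into the child; a query on the parent sees all children
    INSERT INTO a_2006w15 VALUES (1, 'some row', '2006-04-10 12:00');
    SELECT count(*) FROM a WHERE created >= '2006-04-10';

Dropping a whole week is then just a DROP TABLE on the child, with no huge DELETE and no VACUUM FULL needed.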
[PERFORM] Takes too long to fetch the data from database
Hello, I have difficulty in fetching records from the database. The database table contains more than 1 GB of data. Fetching the records takes more than 1 hour, and that is slowing down performance. Please provide some help on improving the performance: how do I run the query so that the records are fetched in less time?
Re: [PERFORM]
Hi,

I attached here a file with details about the tables, the queries and the explain analyze plans. Hope this can be helpful to analyze my problem.

10x
Doron

-Original Message-
From: Ragnar [mailto:[EMAIL PROTECTED]]
Sent: Sunday, April 09, 2006 2:37 PM
To: Doron Baranes
Subject: RE: [PERFORM]

On sun, 2006-04-09 at 14:11 +0200, Doron Baranes wrote:

Please reply to the list, not to me directly. This way others can help you too.

> I did vacuum database analyze a few days ago.

Yes, I saw that in your original post. I mentioned VACUUM FULL ANALYZE, not just VACUUM ANALYZE.

> I'll attach a few explain plans.

[explain plans deleted]

These are useless. You must show us the output of EXPLAIN ANALYZE; these are the output of plain EXPLAIN. A plan is not much use without seeing the query itself. You still have not answered the question about what indexes you have.

gnari

TABLES

Table log.msg_info
   Column   |            Type             |         Modifiers
------------+-----------------------------+-----------------------------------------------------------
 msgid      | bigint                      | not null default nextval('log.msg_info_msgid_seq'::text)
 sender     | character varying(255)      |
 subject    | text                        |
 size       | bigint                      |
 entry_time | timestamp without time zone | default now()
 source_ip  | cidr                        |
 origin     | smallint                    |
Indexes:
    msg_info_pkey primary key, btree (msgid)
    ddindx btree (date(entry_time))
    msg_info_entry_time btree (entry_time)
    msg_info_sender_index btree (sender)
    msg_info_size btree (size)
    msg_info_subject btree (subject)

Table log.msg_fate
   Column    |          Type          |           Modifiers
-------------+------------------------+-----------------------------------------------------------------
 msgid       | bigint                 | not null default nextval('log.msg_fate_msgid_seq'::text)
 grp_fate_id | bigint                 | not null default nextval('log.msg_fate_grp_fate_id_seq'::text)
 modid       | integer                |
 description | character varying(255) |
 rule_origin | bigint                 |
 action      | smallint               |
 ruleid      | integer                |
Indexes:
    msg_fate_pkey primary key, btree (grp_fate_id)
    msg_fate_action btree (action)
    msg_fate_description btree (description)
    msg_fate_modid btree (modid)
    msg_fate_msgid btree (msgid)
Foreign-key constraints:
    msgid_fkey FOREIGN KEY (msgid) REFERENCES log.msg_info(msgid) ON UPDATE CASCADE ON DELETE CASCADE

Table log.msg_fate_recipients
   Column    |            Type             |   Modifiers
-------------+-----------------------------+---------------
 grp_fate_id | bigint                      |
 recipient   | character varying(255)      |
 update_at   | timestamp without time zone | default now()
 last_action | integer                     |
 zone_id     | integer                     |
 direction   | smallint                    | default 7
Indexes:
    msg_fate_recipients_grp_fate_id btree (grp_fate_id)
    msg_fate_recipients_last_action_idx btree (last_action)
    msg_fate_recipients_recipient_idx btree (recipient)
    msg_fate_recipients_update_at btree (update_at)
    msg_fate_recipients_zone_id btree (zone_id)
Triggers:
    stats_for_domain AFTER INSERT ON log.msg_fate_recipients FOR EACH ROW EXECUTE PROCEDURE log.collect_stats_for_domain()
    stats_for_object AFTER INSERT ON log.msg_fate_recipients FOR EACH ROW EXECUTE PROCEDURE log.collect_stats_for_object()
    update_timestamp_last_action BEFORE UPDATE ON log.msg_fate_recipients FOR EACH ROW EXECUTE PROCEDURE log.recipients_status_changed_update()

Table pineapp.zones
  Column   |          Type          |          Modifiers
-----------+------------------------+--------------------------------------------------------------
 zone_id   | integer                | not null default nextval('pineapp.zones_zone_id_seq'::text)
 zone_name | character varying(20)  |
 zone_desc | character varying(255) |
 zone_type | smallint               |
Indexes:
    zones_pkey primary key, btree (zone_id)
    zones_zone_id btree (zone_id)

QUERIES
***
1) explain analyze SELECT date_trunc('hour'::text, i.entry_time) AS datetime,
COUNT(fr.grp_fate_id), SUM(i.size)
FROM log.msg_info as i, log.msg_fate as f, log.msg_fate_recipients as fr
WHERE i.origin = 1
AND i.msgid = f.msgid
AND i.entry_time > '2006-01-25'
AND f.grp_fate_id = fr.grp_fate_id
GROUP BY datetime order by datetime;

QUERY PLAN [truncated in the archive]
[PERFORM] Restore performance?
Hi

I'm currently upgrading a PostgreSQL 7.3.2 database to an 8.1.something-good. I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90 GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up? The database contains quite a lot of BLOBs, thus the size.

Jesper -- ./Jesper Krogh, [EMAIL PROTECTED], Jabber ID: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] Dump restore performance 7.3 - 8.1
Hi

I'm currently upgrading a PostgreSQL 7.3.2 database to an 8.1.something-good. I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90 GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up? Neither disk I/O (viewed using vmstat 1) nor CPU (viewed using top) seems to be the bottleneck. The database contains quite a lot of BLOBs, thus the size.

Jesper -- Jesper Krogh ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Restore performance?
Jesper Krogh wrote:

> Hi. I'm currently upgrading a PostgreSQL 7.3.2 database to an 8.1.something-good. I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90 GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up?

If you can have both database systems up at the same time, you could pg_dump | psql.

Regards, Andreas

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
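Concretely, that direct pipe could look like the following sketch (hostnames and database names are placeholders; per a later tip in this thread, use the new server's pg_dump):

    pg_dump -h oldhost olddb | psql -h newhost -d newdb

This skips both the compression step and the temporary 90 GB file entirely.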
Re: [PERFORM] Restore performance?
> If they both took the same amount of time, then you are almost certainly bottlenecked on gzip. Try a faster CPU or use gzip --fast.

gzip does not seem to be the bottleneck; on restore, psql is the no. 1 consumer of CPU time.

Jesper

Sorry for the double post. -- Jesper Krogh ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Restore performance?
> I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90 GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up? The database contains quite a lot of BLOBs, thus the size.

You could try Slony - it can do almost-zero-downtime upgrades.

Greetings Marcin Mank ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] OT: Data structure design question: How do they count so fast?
Brendan Duddridge wrote: Now, initially I thought they would just pre-compute these counts, but the problem is, when you click on any of the above attribute values, they reduce the remaining possible set of matching products (and the set of possible remaining attributes and attribute values) by the amount displayed next to the attribute value selected. You can click on any combination of attribute values to filter down the remaining set of matching products, so there's a large combination of paths you can take to arrive at a set of products you might be interested in. Do you think they are pre-computed? Or do you think they might use a query similar to the following?:

Pre-computed almost certainly, but at what level of granularity? And with application-level caching?

select pav.attribute_value_id, count(p.product_id)
from product_attribute_value pav, attribute a, product p
where a.attribute_id in (some set of attribute ids)
and pav.product_id = p.product_id
and pav.attribute_id = a.attribute_id
and p.product_id in (select product_id from category_product where category_id = some category id)
and p.is_active = 'true'
group by pav.attribute_value_id;

It would seem to me that although the above query suggests a normalized database structure, joining with 3 tables plus a 4th table in the sub-query with an IN qualifier and grouping to get the product counts would take a VERY long time, especially on a possible result set of 1,260,658 products.

Hmm - I'm not sure I'd say this was necessarily normalised. In the example you gave there were three definite types of attribute:

1. Price range (< 20, 20-50, ...)
2. Product type (lighting, rugs, ...)
3. Store (art.com, homeannex, ...)

Your example discards this type information. I'm also not sure it lets store A sell widgets for 19.99 and store B for 25.99.

So - let's look at how we might break this down into simple relations:

product_types (product_id, prod_type, prod_subtype)
product_availability (product_id, store_id, price_range)

and so on for each set of parameters. Then, if PG isn't calculating fast enough, I'd be tempted to throw in a summary table:

product_counts (store_id, price_range, prod_type, prod_subtype, ..., num_products)

Then total over this for the top-level queries. I'd also cache common top-level queries at the application level anyway. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
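For concreteness, a minimal sketch of such a summary table and a top-level rollup over it (the names follow Richard's illustrative schema; the refresh strategy, triggers versus periodic rebuild, is left open):

    -- summary: one row per combination actually present in the data
    CREATE TABLE product_counts (
        store_id     integer,
        price_range  text,
        prod_type    text,
        prod_subtype text,
        num_products integer NOT NULL
    );

    -- count of products per product type for one store,
    -- totalled over the pre-aggregated rows (store id 42 is hypothetical)
    SELECT prod_type, sum(num_products)
    FROM product_counts
    WHERE store_id = 42
    GROUP BY prod_type;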
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 2006-04-07 at 19:05 -0400, Tom Lane wrote: It's plausible though that we are seeing contention across members of the LWLock array, with the semop storm just being a higher-level symptom of the real hardware-level problem. You might try increasing LWLOCK_PADDED_SIZE to 64 or even 128, see src/backend/storage/lmgr/lwlock.c (this is something that does exist in 8.1, so it'd be easy to try). pSeries cache lines are 128 bytes wide, so I'd go straight to 128. If you're renting all 8 CPUs, I'd drop to 4 and try that instead. With 8 CPUs the contention will vary according to what each CPU is doing at any one time - when they all hit the contention spot, things will get worse. The pSeries has good CPUs and great caching, so I'd expect contention to be somewhat more apparent as a bottleneck. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Simon Riggs wrote:

> pSeries cache lines are 128 bytes wide, so I'd go straight to 128.

Hello :) OK, that line of code is:

#define LWLOCK_PADDED_SIZE (sizeof(LWLock) <= 16 ? 16 : 32)

What should I change this to? I don't understand the syntax of the <= 16 ? : stuff... would a simple

#define LWLOCK_PADDED_SIZE 128

be sufficient?

> If you're renting all 8 CPUs, I'd drop to 4 and try that instead. With 8 CPUs the contention will vary according to what each CPU is doing at any one time - when they all hit the contention spot, things will get worse.

We have a physical machine installed in our rack at the data centre, rather than renting a virtual partition of a real machine... I'm not sure how to enable/disable CPUs even with the help of 'smitty' :)

> The pSeries has good CPUs and great caching, so I'd expect contention to be somewhat more apparent as a bottleneck.

Yep, I expected 32MB of 'L3' cache would yield impressive results :)

Cheers, Gavin. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM]
On mán, 2006-04-10 at 10:30 +0200, Doron Baranes wrote:

> I attached here a file with details about the tables, the queries and the explain analyze plans. Hope this can be helpful to analyze my problem.

first query:

explain analyze SELECT date_trunc('hour'::text, i.entry_time) AS datetime,
COUNT(fr.grp_fate_id), SUM(i.size)
FROM log.msg_info as i, log.msg_fate as f, log.msg_fate_recipients as fr
WHERE i.origin = 1
AND i.msgid = f.msgid
AND i.entry_time > '2006-01-25'
AND f.grp_fate_id = fr.grp_fate_id
GROUP BY datetime order by datetime;

if i.origin has high selectivity (if very few rows in msg_info have origin=1 in this case), an index on msg_info(origin) can help. unfortunately, as you are using 7.4 and this is a smallint column, you would have to change the query slightly to make use of it:

WHERE i.origin = 1::smallint

if more than a few % of the rows have this value, then this will not help.

the index on msg_info(entry_time) is unlikely to be used, because a simple '>' comparison has little selectivity. try to add an upper limit to the query to make it easier for the planner to see that few rows would be returned (if that is the case), for example:

AND i.entry_time BETWEEN '2006-01-25' AND '2006-05-01'

this might also improve the estimated number of groups on datetime (notice: estimated rows=1485233, real=623), although I am not sure if that will help you.

I do not know how good the planner is at dealing with date_trunc('hour'::text, i.entry_time), so possibly you could get some improvement with an indexed entry_hour column populated by a trigger or by your application, and change your query to:

explain analyze SELECT i.entry_hour, COUNT(fr.grp_fate_id), SUM(i.size)
FROM log.msg_info as i, log.msg_fate as f, log.msg_fate_recipients as fr
WHERE i.origin = 1
AND i.msgid = f.msgid
AND i.entry_hour BETWEEN '2006-01-25:00:00' AND '2006-05-01:00:00'
AND f.grp_fate_id = fr.grp_fate_id
GROUP BY entry_hour order by entry_hour;

(adjust the upper limit to your reality)

do these suggestions help at all?

gnari

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
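A minimal sketch of the triggered entry_hour column gnari suggests (names are illustrative; the quoted function body is used because 7.4 predates dollar-quoting):

    ALTER TABLE log.msg_info ADD COLUMN entry_hour timestamp without time zone;
    CREATE INDEX msg_info_entry_hour ON log.msg_info (entry_hour);

    -- keep entry_hour in sync with entry_time on every insert/update
    CREATE OR REPLACE FUNCTION log.set_entry_hour() RETURNS trigger AS '
    BEGIN
        NEW.entry_hour := date_trunc(''hour'', NEW.entry_time);
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER msg_info_set_entry_hour
        BEFORE INSERT OR UPDATE ON log.msg_info
        FOR EACH ROW EXECUTE PROCEDURE log.set_entry_hour();

Existing rows would need a one-off UPDATE log.msg_info SET entry_hour = date_trunc('hour', entry_time).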
Re: [PERFORM] Restore performance?
Jesper Krogh [EMAIL PROTECTED] writes: gzip does not seem to be the bottleneck, on restore is psql the nr. 1 consumer on cpu-time. Hm. We've seen some situations where readline mistakenly decides that the input is interactive and wastes lots of cycles doing useless processing (like keeping history). Try psql -n and see if that helps. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Gavin Hamill [EMAIL PROTECTED] writes: would a simple #define LWLOCK_PADDED_SIZE 128 be sufficient? Yeah, that's fine. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Restore performance?
On 4/10/06, Jesper Krogh [EMAIL PROTECTED] wrote:

> Hi. I'm currently upgrading a PostgreSQL 7.3.2 database to an 8.1.something-good. I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90 GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up?

was the last restore successful? if so, why do you want to repeat?

some tips:

1. run the new version of postgres on a different port and pipe pg_dump to psql; this may save the CPU time of compression, and there is no need for a temporary dump file:

pg_dump | /path/to/psql813 -p 54XX newdb

2. use the new version of pg_dump to dump the old database, as the new version is supposed to be wiser.

3. make sure you are trapping the restore errors properly:

psql newdb 2>&1 | cat | tee err

works for me.

> The database contains quite a lot of BLOBs, thus the size.
> Jesper -- ./Jesper Krogh, [EMAIL PROTECTED], Jabber ID: [EMAIL PROTECTED]

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Restore performance?
Sorry for the post, I didn't see the other replies until after posting.

On 4/10/06, Rajesh Kumar Mallah [EMAIL PROTECTED] wrote: [full quote of the previous message trimmed]

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Takes too long to fetch the data from database
What is the query? Use LIMIT or a restricting WHERE clause.

regds
mallah.

On 4/10/06, soni de [EMAIL PROTECTED] wrote:

> Hello, I have difficulty in fetching records from the database. The database table contains more than 1 GB of data. Fetching the records takes more than 1 hour and that's why it is slowing down performance. Please provide some help regarding improving the performance, and how do I run the query so that records will be fetched in less time.
Re: [PERFORM] Restore performance?
4. fsync can also be turned off while loading a huge dataset, but seek others' comments too (and study the docs) as I am not sure about the reliability. I think it can make a lot of difference.

On 4/10/06, Jesper Krogh [EMAIL PROTECTED] wrote: Rajesh Kumar Mallah wrote:

>> I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90 GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up?

> was the last restore successful? if so, why do you want to repeat?

'about the same time' == estimated guess from restoring a few tables. I was running a test run, without disabling updates to the production database; the real run is scheduled for Easter, when there hopefully are no users on the system. So I need to repeat; I'm just trying to get a feeling for how much time I need to allocate for the operation.

> 1. run the new version of postgres on a different port and pipe pg_dump to psql; this may save the CPU time of compression, and there is no need for a temporary dump file: pg_dump | /path/to/psql813 -p 54XX newdb

I'll do that. It is a completely different machine anyway.

> 2. use the new version of pg_dump to dump the old database, as the new version is supposed to be wiser.

Check.

> 3. make sure you are trapping the restore errors properly: psql newdb 2>&1 | cat | tee err works for me.

That's noted. -- Jesper Krogh, [EMAIL PROTECTED]
Re: [PERFORM] Restore performance?
Rajesh Kumar Mallah wrote:

> 4. fsync can also be turned off while loading a huge dataset, but seek others' comments too (and study the docs) as I am not sure about the reliability. I think it can make a lot of difference.

Also be sure to increase maintenance_work_mem so that index creation goes faster.

-- Alvaro Herrera    http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
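Pulling those two tips together, a sketch of what one might set in postgresql.conf for the duration of the load only (the values are illustrative, not tuned; note that in 8.1 maintenance_work_mem is given in KB):

    fsync = off                     # only while loading; re-enable afterwards!
    maintenance_work_mem = 524288   # 512 MB, speeds up index creation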
[PERFORM] Better index stategy for many fields with few values
Hi,

I want to optimize something like this.

- My items table:
code int -- can take one of 100 values
property varchar(250) -- can take one of 5000 values
param01 char(10) -- can take one of 10 values
param02 char(10) -- can take one of 10 values
... [ 20 similar columns ] ...
param20 char(10) -- can take one of 10 values

- The kind of query I want to optimize:
select * from items
where code between 5 and 22
and param01 = 'P'
and param02 = 'G'
... [ all the 20 paramXX columns are used in the query ] ...
and param20 = 'C';

How can I optimize this kind of query? I was thinking about using a multicolumn index, but I have read that we should limit multicolumn indexes to at most 2 or 3 columns. If that's true, then 22 columns for a multicolumn index seems way too much. Or maybe it is workable, as every column uses only a very limited set of values? I was also thinking about using a functional index. What do you think would be the best solution in such a case?

Thanks.
Oscar
Re: [PERFORM] Better index stategy for many fields with few values
> - My items table:
> code int -- can take one of 100 values
> property varchar(250) -- can take one of 5000 values
> param01 char(10) -- can take one of 10 values
> param02 char(10) -- can take one of 10 values
> ... [ 20 similar columns ] ...
> param20 char(10) -- can take one of 10 values

Instead of 20 columns, you could instead use a param field containing an array of 20 TEXT fields. Then create a simple index on (code, param) and

SELECT WHERE code BETWEEN ... AND param = '{P,G,...,C}'

If you don't want to modify your structure, you can create a functional index on the array {param1...param20}, but your queries will be a bit uglier.

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
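A sketch of that reworked structure, following the index suggestion above (table and column names are hypothetical, and the 20 array values in the query are placeholders; whole-array equality only applies when the query pins down all 20 parameters):

    CREATE TABLE items (
        code     int,
        property varchar(250),
        param    text[]          -- param[1] .. param[20]
    );
    CREATE INDEX items_code_param ON items (code, param);

    -- all 20 parameters specified, so a single equality on the array works
    SELECT * FROM items
    WHERE code BETWEEN 5 AND 22
      AND param = '{P,G,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,C}'::text[];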
Re: [PERFORM] Takes too long to fetch the data from database
Rajesh Kumar Mallah wrote:

> what is the query? use LIMIT or a restricting where clause.

You could also use a cursor.

Joshua D. Drake

> regds mallah.
>
> On 4/10/06, soni de [EMAIL PROTECTED] wrote:
>
>> Hello, I have difficulty in fetching records from the database. The database table contains more than 1 GB of data. Fetching the records takes more than 1 hour and that's why it is slowing down performance. Please provide some help regarding improving the performance, and how do I run the query so that records will be fetched in less time.

-- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
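In case it helps, a minimal sketch of fetching in batches with a cursor (table and cursor names are placeholders; cursors live inside a transaction):

    BEGIN;
    DECLARE big_cur CURSOR FOR SELECT * FROM mytable;
    FETCH 1000 FROM big_cur;   -- repeat until it returns no rows
    CLOSE big_cur;
    COMMIT;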
Re: [PERFORM] Restore performance?
On Apr 10, 2006, at 3:55 AM, Jesper Krogh wrote:

> I'd run pg_dump | gzip > sqldump.gz on the old system. That took about 30 hours and gave me a 90 GB zipped file. Running cat sqldump.gz | gunzip | psql into the 8.1 database seems to take about the same time. Are there any tricks I can use to speed this dump+restore process up? The database contains quite a lot of BLOBs, thus the size.

Well, your pg_dump command lost your BLOBs, since the plain text format doesn't support them. But once you use the -Fc format on your dump and enable blob backups, you can speed up reloads by increasing your checkpoint segments to a big number like 256 and the checkpoint timeout to something like 10 minutes. All other normal tuning parameters should be what you plan to use for your normal operations, too. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
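Concretely, that dump and reload could look like this sketch (file and database names are placeholders):

    # custom format with blobs included
    pg_dump -Fc -b olddb > olddb.dump
    pg_restore -d newdb olddb.dump

with, in postgresql.conf on the new server during the reload:

    checkpoint_segments = 256
    checkpoint_timeout = 600    # seconds, i.e. 10 minutes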
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Tom Lane wrote: This is unfortunately not going to help you as far as getting that machine into production now (unless you're brave enough to run CVS tip as production, which I certainly am not). I'm afraid you're most likely going to have to ship that pSeries back at the end of the month, but while you've got it it'd be awfully nice if we could use it as a testbed We have PSeries boxes here that won't be going away anytime soon. If there are any specific test cases that need to run, I should be able to find the time to do it. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] OT: Data structure design question: How do they count so fast?
Hi Richard (and anyone else who wants to comment!),

I'm not sure it will really work pre-computed. At least not in an obvious way (for me! :-)) It's fine to display a pre-computed list of product counts for the initial set of attributes and attribute values, but we need to be able to display the counts of products for any combination of selected attribute values. Once an attribute value is picked that reduces the set of products to a small enough set, the queries are fast, so perhaps we just need to pre-compute the counts for the combinations of attribute values that lead to a high count of products. Hence our thought of building a decision-tree type data structure. This would store the various combinations of attributes and attribute values, along with the product count, along with a list of attributes and attribute values that are applicable for the current set of selected attribute values. This sounds like it could get rather complicated, so we were hoping someone might have an idea for a much simpler solution.

Thanks,

Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com

On Apr 10, 2006, at 3:23 AM, Richard Huxton wrote: [full quote of Richard's previous message trimmed]

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
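For what it's worth, a minimal sketch of the combination-count cache Brendan describes (all names hypothetical; the key would be the selected attribute-value ids in sorted order, and only high-count combinations would be stored):

    CREATE TABLE attr_combo_counts (
        combo_key     text PRIMARY KEY,  -- e.g. '12,87,203'
        product_count integer NOT NULL
    );

    -- lookup for a given selection; fall back to the live query on a miss
    SELECT product_count FROM attr_combo_counts WHERE combo_key = '12,87,203';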
Re: [PERFORM] bad performance on Solaris 10
Chris,

Just to make sure the x4100 config is similar to your Linux system, can you verify the default setting for the disk write cache and make sure they are both enabled or disabled? Here's how to check in Solaris. As root, run:

format -e -> pick a disk -> cache -> write_cache -> display

Not sure how to do it on Linux though!

Regards, -Robert

I don't have access to the machine for the next few days due to, eh... let's call it a firewall accident ;), but it might very well be that it was off on the x4100 (I know it's on on the smaller Linux box). That together with the bad default sync method can definitely explain the strangely slow out-of-box performance I got. So thanks again for explaining this to me :)

Bye, Chris.

Just for completeness: I checked now using the above commands and can confirm the write cache was disabled on the x4100 and enabled on Linux.

Bye, Chris. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
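For the Linux side, a common way to inspect an IDE/SATA drive's write cache is hdparm (a sketch, not from Robert's note; the device name /dev/sda is an assumption):

    hdparm -W /dev/sda     # display the current write-cache setting
    hdparm -W0 /dev/sda    # disable it; -W1 enables it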
Re: [PERFORM] Encouraging multi-table join order
Dan Harris [EMAIL PROTECTED] writes: I have a query that is intended to select from multiple small tables to get a limited subset of incidentid and then join with a very large table. One of the operations will require a sequential scan, but the planner is doing the scan on the very large table before joining the small ones, resulting in a huge amount of disk I/O. How would I make this query join the large table only after narrowing down the possible selections from the smaller tables? This is running on version 8.0.3. That's very strange --- the estimated cost of the seqscan is high enough that the planner should have chosen a nestloop with inner indexscan on the big table. I'm not sure about the join-order point, but the hash plan for the first join seems wrong in any case. Um, you do have an index on eventactivity.incidentid, right? What's the datatype(s) of the incidentid columns? What happens to the plan if you turn off enable_hashjoin and enable_mergejoin? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Encouraging multi-table join order
Tom Lane wrote:

> That's very strange --- the estimated cost of the seqscan is high enough that the planner should have chosen a nestloop with inner indexscan on the big table. I'm not sure about the join-order point, but the hash plan for the first join seems wrong in any case. Um, you do have an index on eventactivity.incidentid, right? What's the datatype(s) of the incidentid columns? What happens to the plan if you turn off enable_hashjoin and enable_mergejoin?
>
> regards, tom lane

Yes, eventactivity.incidentid is indexed. The datatype is varchar(40). Although, by checking this, I noticed that k_h.incidentid was varchar(100). Perhaps the difference in length between the keys caused the planner to not use the fastest method? I have no defense as to why those aren't the same.. I will make them so and check.

Here's the EXPLAIN ANALYZE with enable_hashjoin = off and enable_mergejoin = off:

Limit  (cost=4226535.73..4226544.46 rows=698 width=82) (actual time=74339.016..74356.521 rows=888 loops=1)
  ->  Unique  (cost=4226535.73..4226544.46 rows=698 width=82) (actual time=74339.011..74354.073 rows=888 loops=1)
        ->  Sort  (cost=4226535.73..4226537.48 rows=698 width=82) (actual time=74339.003..74344.031 rows=3599 loops=1)
              Sort Key: eventmain.entrydate, eventmain.incidentid, eventgeo.eventlocation, eventactivity.recordtext
              ->  Nested Loop  (cost=0.00..4226502.76 rows=698 width=82) (actual time=921.325..74314.959 rows=3599 loops=1)
                    ->  Nested Loop  (cost=0.00..4935.61 rows=731 width=72) (actual time=166.354..14638.308 rows=1162 loops=1)
                          ->  Nested Loop  (cost=0.00..2482.47 rows=741 width=50) (actual time=150.396..7348.013 rows=1162 loops=1)
                                ->  Index Scan using k_h_id_idx on k_h  (cost=0.00..217.55 rows=741 width=14) (actual time=129.540..1022.243 rows=1162 loops=1)
                                      Index Cond: (id = 33396)
                                      Filter: ((entrydate >= '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate < '2006-04-08 00:00:00'::timestamp without time zone))
                                ->  Index Scan using eventgeo_incidentid_idx on eventgeo  (cost=0.00..3.04 rows=1 width=36) (actual time=5.260..5.429 rows=1 loops=1162)
                                      Index Cond: ((eventgeo.incidentid)::text = ("outer".incidentid)::text)
                          ->  Index Scan using eventmain_incidentid_idx on eventmain  (cost=0.00..3.30 rows=1 width=22) (actual time=5.976..6.259 rows=1 loops=1162)
                                Index Cond: ((eventmain.incidentid)::text = ("outer".incidentid)::text)
                    ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..5774.81 rows=20 width=52) (actual time=29.768..51.334 rows=3 loops=1162)
                          Index Cond: (("outer".incidentid)::text = (eventactivity.incidentid)::text)
                          Filter: ((((' '::text || (recordtext)::text) || ' '::text) ~~ '%HAL%'::text) AND (entrydate >= '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate < '2006-04-08 00:00:00'::timestamp without time zone))

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Encouraging multi-table join order
Dan Harris [EMAIL PROTECTED] writes:

> Yes, eventactivity.incidentid is indexed. The datatype is varchar(40). Although, by checking this, I noticed that k_h.incidentid was varchar(100). Perhaps the difference in length between the keys caused the planner to not use the fastest method?

No, the planner wouldn't care about that.

> Here's the EXPLAIN ANALYZE with enable_hashjoin = off and enable_mergejoin = off:

OK, so it does consider the right plan, but it's estimating it'll take longer than the other one. One thing that's very strange is that the estimated number of rows out has changed ... did you re-ANALYZE since the previous message?

> ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..5774.81 rows=20 width=52) (actual time=29.768..51.334 rows=3 loops=1162)
>       Index Cond: (("outer".incidentid)::text = (eventactivity.incidentid)::text)
>       Filter: ((((' '::text || (recordtext)::text) || ' '::text) ~~ '%HAL%'::text) AND (entrydate >= '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate < '2006-04-08 00:00:00'::timestamp without time zone))

So it's estimating 5775 cost units per probe into eventactivity, which is pretty high --- it must think that a lot of rows will be retrieved by the index (way more than the 20 or so it thinks will get past the filter condition). What does the pg_stats entry for eventactivity.incidentid contain? It might be worth increasing the statistics target for that column to try to get a better estimate.

regards, tom lane

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
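For reference, raising the per-column statistics target looks like this sketch (the target value 200 is arbitrary; the default at the time was 10):

    ALTER TABLE eventactivity ALTER COLUMN incidentid SET STATISTICS 200;
    ANALYZE eventactivity;

    -- then inspect what the planner sees
    SELECT * FROM pg_stats
    WHERE tablename = 'eventactivity' AND attname = 'incidentid';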
[PERFORM] pgmemcache
I was wondering if anyone on the list has a successful installation of pgmemcache running that uses LISTEN/NOTIFY to signal a successfully completed transaction, i.e., to get around the fact that TRIGGERS are transaction unaware. Or perhaps any other information regarding a successful deployment of pgmemcache. The information available on the web/groups is pretty scant. Any information would be greatly appreciated! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
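For anyone unfamiliar with the pattern being asked about: NOTIFY is only delivered when the sending transaction commits, which is what makes it usable as a commit signal. A minimal sketch (all names hypothetical; an external daemon would run LISTEN and refresh memcached on each notification):

    CREATE FUNCTION note_products_changed() RETURNS trigger AS '
    BEGIN
        -- queued inside the transaction, delivered to listeners only on COMMIT
        NOTIFY products_changed;
        RETURN NULL;   -- return value is ignored for AFTER triggers
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER products_changed_trg
        AFTER INSERT OR UPDATE OR DELETE ON products
        FOR EACH ROW EXECUTE PROCEDURE note_products_changed();

    -- in the listening session / daemon:
    LISTEN products_changed;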
[PERFORM] slow IN clause
I have a slow SQL query:

SELECT * FROM mytable WHERE id IN (1, 3, 5, 7, ... 3k values here ...);

mytable is about 10k rows. If I don't use the IN clause, it costs 0.11 seconds; otherwise it costs 2.x seconds. I guess pg uses a linear search to deal with the IN clause. Is there any way to make pg use another search method with the IN clause (e.g. binary search or hash search)?

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] slow IN clause
On lun, 2006-04-10 at 12:44 +0800, Qingqing Zhou wrote:

> [EMAIL PROTECTED] wrote:
>
>> I have a slow SQL query: SELECT * FROM mytable WHERE id IN (1, 3, 5, 7, ... 3k values here ...); mytable is about 10k rows. If I don't use the IN clause, it costs 0.11 seconds; otherwise it costs 2.x seconds. I guess pg uses a linear search to deal with the IN clause. Is there any way to make pg use another search method with the IN clause (e.g. binary search or hash search)?
>
> If you can put (1, 3, .., 3k) in a table, PG may choose a hash join.

And maybe using

SELECT * FROM yourtable WHERE id < 6002 AND id % 2 = 1;

turns out to be faster, if we are allowed to extrapolate from the example.

V.

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
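If the list can't be expressed that compactly, a sketch of the put-the-ids-in-a-table approach suggested above (names are placeholders):

    CREATE TEMP TABLE wanted_ids (id int);
    -- fill with the 3k ids, e.g. via COPY or a batch of INSERTs
    COPY wanted_ids FROM stdin;
    ANALYZE wanted_ids;

    SELECT m.* FROM mytable m JOIN wanted_ids w ON m.id = w.id;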