Re: [PERFORM] Index problem
> What causes this behaviour? Is there any workaround? Suggestions?

How many rows are there in the table, and can you post the 'explain analyze'
for both queries after doing a 'vacuum verbose analyze [tablename]'?

There are about 2500 rows in that table.

1st query explain analyze:

 Seq Scan on PRIORITY_STATISTICS  (cost=0.00..491.44 rows=127 width=12)
 (actual time=98.58..98.58 rows=0 loops=1)
 Total runtime: 98.74 msec

2nd query explain analyze:

NOTICE: QUERY PLAN:

 Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
 [the same index name repeated once per OR arm]
 PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS
 (cost=0.00..394.06 rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1)
 Total runtime: 21.59 msec

Any help?

Rigmor

Cheers
Matt

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: [PERFORM] Index problem
> There are about 2500 rows in that table.
>
> 1st query explain analyze:
>  Seq Scan on PRIORITY_STATISTICS  (cost=0.00..491.44 rows=127 width=12)
>  (actual time=98.58..98.58 rows=0 loops=1)
>  Total runtime: 98.74 msec
>
> 2nd query explain analyze:
> NOTICE: QUERY PLAN:
>  Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
>  [snip]
>  PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS
>  (cost=0.00..394.06 rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1)
>  Total runtime: 21.59 msec

With only 2500 rows the planner could be deciding that it's going to have to
read every disk block to do an index scan anyway, so it might as well do a
sequential scan. If the pages are in fact in the kernel cache then the compute
time will dominate, not the IO time, so it ends up looking like a bad plan,
but it's probably not really such a bad plan...

Is your effective_cache_size set to something sensibly large?

You could also try decreasing cpu_index_tuple_cost and cpu_tuple_cost. These
will affect all your queries, though, so what you gain on one might be lost
on another.

Matt
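Those settings can be tried per-session before touching postgresql.conf. A rough sketch follows; the values are illustrative only, and the WHERE clause is a stand-in since the original queries were not posted:

```sql
-- Experiment in one session first; these values are only examples.
SET effective_cache_size = 10000;   -- measured in 8 kB pages (~80 MB here)
SET cpu_tuple_cost = 0.005;         -- default 0.01
SET cpu_index_tuple_cost = 0.0005;  -- default 0.001

-- Then re-run the problem query and compare the plans:
EXPLAIN ANALYZE SELECT * FROM PRIORITY_STATISTICS WHERE reo_id = 42;
```

If the index plan wins consistently under realistic cache conditions, the same settings can be made permanent in postgresql.conf.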
Re: [PERFORM] performance hit when joining with a view?
Palle Girgensohn [EMAIL PROTECTED] writes:

> View definition:
> SELECT p.userid, p.giver, p.first_name, p.last_name, p.email,
>        p.default_language, p.created, p.created_by, w.course_id
> FROM (person p LEFT JOIN wiol w ON ((p.userid = w.userid)));
>
> explain analyze
> select p.pim_id, p.recipient, p.sender, p.message, p.ts, p.type,
>        case when sender.userid is not null
>             then sender.first_name || ' ' || sender.last_name
>             else null end as sender_name,
>        sender.course_id is not null as is_online
> from pim p
>      left outer join person_wiol_view sender on (sender.userid = p.sender)
> where p.recipient = 'axto6551' and p.type = 0 limit 1;
>
> explain analyze
> select p.pim_id, p.recipient, p.sender, p.message, p.ts, p.type,
>        case when sender.userid is not null
>             then sender.first_name || ' ' || sender.last_name
>             else null end as sender_name,
>        w.course_id is not null as is_online
> from pim p
>      left outer join person sender on (sender.userid = p.sender)
>      left join wiol w on (w.userid = sender.userid)
> where p.recipient = 'axto6551' and p.type = 0 limit 1;

These are not actually the same query. In the former case the implicit
parenthesization of the joins is

    pim left join (person left join wiol)

whereas in the latter case the implicit parenthesization is left-to-right:

    (pim left join person) left join wiol

Since the only restriction conditions you have provided are on pim, the first
parenthesization implies forming the entire join of person and wiol :-(.

If you were using plain joins then the two queries would be logically
equivalent, but outer joins are in general not associative, so the planner
will not consider re-ordering them.

There is some work in 7.4 to make the planner smarter about outer joins, but
offhand I don't think any of it will improve results for this particular
example. I have seen some academic papers about how to prove that a
particular pair of outer join operators can safely be swapped (as I think is
true in this example).
Some knowledge of that sort may eventually get into the planner, but it ain't
there now.

regards, tom lane
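Because the planner will not re-order outer joins on its own, the practical workaround is to spell out the join order in the query rather than going through the view. A sketch, with the column list abbreviated from the original queries:

```sql
-- Force the (pim LEFT JOIN person) LEFT JOIN wiol order explicitly,
-- so the restriction on pim is applied before person/wiol are joined.
SELECT p.pim_id, p.recipient, p.sender,
       w.course_id IS NOT NULL AS is_online
FROM (pim p LEFT JOIN person sender ON sender.userid = p.sender)
     LEFT JOIN wiol w ON w.userid = sender.userid
WHERE p.recipient = 'axto6551'
  AND p.type = 0
LIMIT 1;
```

This is exactly what the second, faster query in the thread already does; the point is that the explicit order is what buys the speedup, not anything about the view mechanism itself.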
Re: [PERFORM] upping checkpoints on production server
On Wed, 2003-09-24 at 17:57, Tom Lane wrote:
> Robert Treat [EMAIL PROTECTED] writes:
>> In .conf file I have default checkpoints set to 3, but I noticed that in
>> my pg_xlog directory I always seem to have at least 8 log files. Since
>> this is more than the suggested 7, I'm wondering if this means I ought to
>> bump my checkpoint segments up to 4?
>
> Hm. What is the typical delta in the mod times of the log files? It sounds
> like you are in a regime where checkpoints are always triggered by
> checkpoint_segments and never by checkpoint_timeout, in which case
> increasing the former might be a good idea. Or decrease the latter, but
> that could put a drag on performance.

# ls -lht /var/lib/pgsql/data/pg_xlog/
total 129M
-rw-------    1 postgres postgres 16M Sep 25 11:12 006E0059
-rw-------    1 postgres postgres 16M Sep 25 11:12 006E005A
-rw-------    1 postgres postgres 16M Sep 25 11:08 006E0058
-rw-------    1 postgres postgres 16M Sep 25 11:05 006E005F
-rw-------    1 postgres postgres 16M Sep 25 11:02 006E005E
-rw-------    1 postgres postgres 16M Sep 25 10:59 006E005D
-rw-------    1 postgres postgres 16M Sep 25 10:55 006E005B
-rw-------    1 postgres postgres 16M Sep 25 10:51 006E005C

# ls -lht /var/lib/pgsql/data/pg_xlog/
total 129M
-rw-------    1 postgres postgres 16M Sep 25 10:52 006E0054
-rw-------    1 postgres postgres 16M Sep 25 10:51 006E0053
-rw-------    1 postgres postgres 16M Sep 25 10:49 006E0052
-rw-------    1 postgres postgres 16M Sep 25 10:45 006E0059
-rw-------    1 postgres postgres 16M Sep 25 10:40 006E0057
-rw-------    1 postgres postgres 16M Sep 25 10:37 006E0058
-rw-------    1 postgres postgres 16M Sep 25 10:33 006E0056
-rw-------    1 postgres postgres 16M Sep 25 10:29 006E0055

From the 7.4 docs:

    Checkpoints are fairly expensive because they force all dirty kernel
    buffers to disk using the operating system sync() call. Busy servers
    may fill checkpoint segment files too quickly, causing excessive
    checkpointing.

It goes on to mention checkpoint_warning, which I don't have in 7.3, but I
think this is a case where I'd likely see those warnings.

The server in question has a fairly high write/read ratio and is fairly busy
(over 100 tps iirc). Since more often than not I don't make it to 5 minutes,
it seems like upping checkpoint_segments is the way to go, right?

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
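The change itself is two lines in postgresql.conf; the numbers below are illustrative starting points rather than recommendations:

```
# postgresql.conf -- illustrative values, tune against your own mod-time deltas
checkpoint_segments = 6      # was 3; each segment is a 16 MB WAL file
checkpoint_timeout = 300     # seconds; leave as-is if raising segments suffices
```

After a reload, watching the modification-time spacing in pg_xlog (as above) shows whether checkpoints are now being driven by the timeout instead of by segment exhaustion.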
[PERFORM] TPC-R benchmarks
I am running TPC-R benchmarks with a scale factor of 1, which corresponds to
an approximately 1 GB database, on PostgreSQL 7.3.4 installed on Cygwin on
Windows XP. I dedicated 128 MB of shared memory to my postgres installation.

Most of the queries were able to complete in a matter of minutes, but query
17 was taking hours and hours. The query is shown below. Is there any way to
optimize it?

select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
    lineitem,
    part
where
    p_partkey = l_partkey
    and p_brand = 'Brand#11'
    and p_container = 'SM PKG'
    and l_quantity < (
        select 0.2 * avg(l_quantity)
        from lineitem
        where l_partkey = p_partkey);

Thanks.
Oleg

* This e-mail may contain privileged or confidential material intended for
the named recipient only. If you are not the named recipient, delete this
message and all attachments. Unauthorized reviewing, copying, printing,
disclosing, or otherwise using information in this e-mail is prohibited. We
reserve the right to monitor e-mail sent through our network. *
Re: [PERFORM] TPC-R benchmarks
I am running TPC-H with scale factor of 1 on Red Hat 7.2 with kernel 2.5.74.
Q17 can always finish in about 7 seconds on my system. The execution plan is:

 Aggregate  (cost=780402.43..780402.43 rows=1 width=48)
   ->  Nested Loop  (cost=0.00..780397.50 rows=1973 width=48)
         Join Filter: ("inner".l_quantity < (subplan))
         ->  Seq Scan on part  (cost=0.00..8548.00 rows=197 width=12)
               Filter: ((p_brand = 'Brand#31'::bpchar) AND (p_container = 'LG CASE'::bpchar))
         ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=36)
               Index Cond: ("outer".p_partkey = lineitem.l_partkey)
         SubPlan
           ->  Aggregate  (cost=124.40..124.40 rows=1 width=11)
                 ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=11)
                       Index Cond: (l_partkey = $0)
(11 rows)

Hope this helps,
Jenny

On Thu, 2003-09-25 at 12:40, Oleg Lebedev wrote:
> I am running TPC-R benchmarks with a scale factor of 1, which corresponds
> to an approximately 1 GB database, on PostgreSQL 7.3.4 installed on Cygwin
> on Windows XP. I dedicated 128 MB of shared memory to my postgres
> installation. Most of the queries were able to complete in a matter of
> minutes, but query 17 was taking hours and hours. The query is shown
> below. Is there any way to optimize it?
>
> select sum(l_extendedprice) / 7.0 as avg_yearly
> from lineitem, part
> where p_partkey = l_partkey
>   and p_brand = 'Brand#11'
>   and p_container = 'SM PKG'
>   and l_quantity < (select 0.2 * avg(l_quantity)
>                     from lineitem
>                     where l_partkey = p_partkey);
>
> Thanks.
> Oleg
Re: [PERFORM] TPC-R benchmarks
The index is created by:

    create index i_l_partkey on lineitem (l_partkey);

I do not have any foreign key defined. Does the spec require foreign keys?
When you create a foreign key reference, does PG create an index
automatically?

Can you try with the index?

Jenny

On Thu, 2003-09-25 at 14:39, Oleg Lebedev wrote:
> Seems like in your case postgres uses an i_l_partkey index on the lineitem
> table. I have a foreign key constraint defined between the lineitem and
> part tables, but didn't create any special indexes. Here is my query plan:
>
>  Aggregate  (cost=1517604222.32..1517604222.32 rows=1 width=31)
>    ->  Hash Join  (cost=8518.49..1517604217.39 rows=1969 width=31)
>          Hash Cond: ("outer".l_partkey = "inner".p_partkey)
>          Join Filter: ("outer".l_quantity < (subplan))
>          ->  Seq Scan on lineitem  (cost=0.00..241889.15 rows=6001215 width=27)
>          ->  Hash  (cost=8518.00..8518.00 rows=197 width=4)
>                ->  Seq Scan on part  (cost=0.00..8518.00 rows=197 width=4)
>                      Filter: ((p_brand = 'Brand#11'::bpchar) AND (p_container = 'SM PKG'::bpchar))
>          SubPlan
>            ->  Aggregate  (cost=256892.28..256892.28 rows=1 width=11)
>                  ->  Seq Scan on lineitem  (cost=0.00..256892.19 rows=37 width=11)
>                        Filter: (l_partkey = $0)
>
> -----Original Message-----
> From: Jenny Zhang [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, September 25, 2003 3:33 PM
> To: Oleg Lebedev
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [PERFORM] TPC-R benchmarks
>
> I am running TPC-H with scale factor of 1 on Red Hat 7.2 with kernel
> 2.5.74. Q17 can always finish in about 7 seconds on my system. The
> execution plan is:
> [snip: Jenny's 11-row plan and Oleg's original message, quoted in full
> earlier in the thread]
[PERFORM] Indices aren't being used
Hi guys,

I'm running a Datawarehouse benchmark (APB-1) on PostgreSQL. The objective is
to choose which of the two main DBs (PostgreSQL, MySQL) is fastest. I've run
into a small problem which I hope can be resolved here.

I'm trying to speed up this query:

select count(*)
from actvars, prodlevel
where actvars.product_level = prodlevel.code_level
  and prodlevel.division_level = 'OY3S5LAPALL6';

ACTVARS is a fact table of approximately 16 million rows; PRODLEVEL has 20700
rows. Both have btree indices.

I executed the query and it took almost half an hour! Running the same query
on MySQL, the result came 6 seconds later. As you can see, there is a large
difference between execution times.

After running an explain:

 Aggregate  (cost=3123459.62..3123459.62 rows=1 width=32)
   ->  Merge Join  (cost=3021564.79..3119827.17 rows=1452981 width=32)
         Merge Cond: ("outer".product_level = "inner".code_level)
         ->  Sort  (cost=3020875.00..3060938.81 rows=16025523 width=16)
               Sort Key: actvars.product_level
               ->  Seq Scan on actvars  (cost=0.00..365711.23 rows=16025523 width=16)
         ->  Sort  (cost=689.79..694.48 rows=1877 width=16)
               Sort Key: prodlevel.code_level
               ->  Seq Scan on prodlevel  (cost=0.00..587.75 rows=1877 width=16)
                     Filter: (division_level = 'OY3S5LAPALL6'::bpchar)

I found that the indices weren't being used. The database has been vacuumed
and analyze has been executed. I tried disabling seqscan so as to force index
usage; the planner then uses index scans, but the query still takes a very
long time to execute.

Any suggestions on resolving this would be appreciated.

P.S.: I'm running PostgreSQL 7.3.2.
Re: [PERFORM] Indices aren't being used
> I'm running a Datawarehouse benchmark (APB-1) on PostgreSQL. The objective
> is to choose which of the two main DBs (PostgreSQL, MySQL) is fastest.
> I've run into a small problem which I hope can be resolved here. I'm
> trying to speed up this query:
>
> select count(*)
> from actvars, prodlevel
> where actvars.product_level = prodlevel.code_level
>   and prodlevel.division_level = 'OY3S5LAPALL6';

How about EXPLAIN ANALYZE output?

> ACTVARS is a fact table of approximately 16 million rows; PRODLEVEL has
> 20700 rows. Both have btree indices.
>
> The database has been vacuumed and analyze has been executed.

Have the usual postgresql.conf adjustments also been made?
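One such adjustment matters directly here: the posted plan sorts all 16 million actvars rows, and with 7.3's small default sort_mem (1024 kB) that sort will spill heavily to disk. A session-level sketch, with an illustrative value:

```sql
-- Give this session more sort memory before re-testing; 65536 kB = 64 MB.
-- The right value depends on available RAM and concurrent sessions.
SET sort_mem = 65536;

EXPLAIN ANALYZE
SELECT count(*)
FROM actvars, prodlevel
WHERE actvars.product_level = prodlevel.code_level
  AND prodlevel.division_level = 'OY3S5LAPALL6';
```

If the merge join's sort step speeds up dramatically, the seqscan-plus-sort plan may turn out to be reasonable after all, index or no index.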
Re: [PERFORM] TPC-R benchmarks
Jenny,

> create index i_l_partkey on lineitem (l_partkey);
> I do not have any foreign key defined. Does the spec require foreign keys?
> When you create a foreign key reference, does PG create an index
> automatically?

No. An index is not required to enforce a foreign key, and is sometimes not
useful (for example, FK fields with only 3 possible values). So it may be
that you need to create an index on that field.

--
Josh Berkus
Aglio Database Solutions
San Francisco
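Concretely, for the schema in this thread that means creating the index by hand and re-analyzing, roughly:

```sql
-- PostgreSQL does not build this automatically for the FK reference;
-- create it explicitly, then refresh statistics so the planner sees it.
CREATE INDEX i_l_partkey ON lineitem (l_partkey);
ANALYZE lineitem;
```

(The referenced side of a foreign key must already be unique-indexed; it is only the referencing column, l_partkey here, that is left unindexed by default.)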
Re: [PERFORM] TPC-R benchmarks
Oleg Lebedev [EMAIL PROTECTED] writes:

> Seems like in your case postgres uses an i_l_partkey index on the lineitem
> table. I have a foreign key constraint defined between the lineitem and
> part tables, but didn't create any special indexes. Here is my query plan:

The planner is obviously unhappy with this plan (note the large cost
numbers), but it can't find a way to do better. An index on
lineitem.l_partkey would help, I think.

The whole query seems like it's written in a very inefficient fashion;
couldn't the estimation of '0.2 * avg(l_quantity)' be amortized across
multiple join rows? But I dunno whether the TPC rules allow for significant
manual rewriting of the given query.

regards, tom lane
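If manual rewriting were permitted, one way to amortize the average is to compute it once per part in a derived table rather than once per join row. A hypothetical rewrite of Q17 (the `agg`/`qty_limit` names are mine, and whether TPC-R allows this is exactly the open question above):

```sql
-- Compute 0.2 * avg(l_quantity) once per l_partkey, then join against it,
-- instead of re-running the correlated subquery for every lineitem row.
SELECT sum(l.l_extendedprice) / 7.0 AS avg_yearly
FROM lineitem l
     JOIN part p ON p.p_partkey = l.l_partkey
     JOIN (SELECT l_partkey, 0.2 * avg(l_quantity) AS qty_limit
           FROM lineitem
           GROUP BY l_partkey) agg ON agg.l_partkey = l.l_partkey
WHERE p.p_brand = 'Brand#11'
  AND p.p_container = 'SM PKG'
  AND l.l_quantity < agg.qty_limit;
```

This trades the per-row subplan for one aggregation pass over lineitem, which is usually a win when the outer join produces many rows per part.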