Re: [PERFORM] Index problem

2003-09-25 Thread Rigmor Ukuhe
 What causes this behaviour? Is there any workaround? Suggestions?
 

 How many rows are there in the table, and can you post the
 'explain analyze' for both queries after doing a 'vacuum verbose analyze
 [tablename]'?

There are about 2500 rows in that table.

1st query explain analyze: Seq Scan on PRIORITY_STATISTICS
(cost=0.00..491.44 rows=127 width=12) (actual time=98.58..98.58 rows=0
loops=1)
Total runtime: 98.74 msec

2nd query explain analyze: NOTICE:  QUERY PLAN:

Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
[snip - the same index name repeated 54 more times]
PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS  (cost=0.00..394.06
rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1)
Total runtime: 21.59 msec

Any help?

Rigmor



 Cheers

 Matt





---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Index problem

2003-09-25 Thread Matt Clark
 There are about 2500 rows in that table.

 1st query explain analyze: Seq Scan on PRIORITY_STATISTICS
 (cost=0.00..491.44 rows=127 width=12) (actual time=98.58..98.58 rows=0
 loops=1)
 Total runtime: 98.74 msec

 2nd query explain analyze: NOTICE:  QUERY PLAN:

 Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
[snip]
 PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS  (cost=0.00..394.06
 rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1)
 Total runtime: 21.59 msec

With only 2500 rows the planner could be deciding that it's going to have
to read every disk block to do an index scan anyway, so it might as well
do a sequential scan.  If the pages are in fact in the kernel cache then
the compute time will dominate, not the I/O time, so it ends up looking
like a bad plan, but it's probably not really such a bad plan...

Is your effective_cache_size set to something sensibly large?

You could also try decreasing cpu_index_tuple_cost and cpu_tuple_cost.
These will affect all your queries though, so what you gain on one might
be lost on another.
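
If you want to experiment in a single session first, something like this
(a sketch: the values are illustrative, not recommendations, and the final
query is a hypothetical stand-in for your real one):

SHOW effective_cache_size;          -- the 7.3 default is only 1000 8kB pages (8 MB)
SET effective_cache_size = 50000;   -- ~400 MB, if most of your data sits in the OS cache
SET cpu_tuple_cost = 0.005;         -- half the 0.01 default
SET cpu_index_tuple_cost = 0.0005;  -- half the 0.001 default
EXPLAIN ANALYZE SELECT * FROM priority_statistics WHERE reo_id = 42;  -- hypothetical predicate

Only move values into postgresql.conf once they demonstrably help more
queries than they hurt.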

Matt



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] performance hit when joining with a view?

2003-09-25 Thread Tom Lane
Palle Girgensohn [EMAIL PROTECTED] writes:
 View definition: SELECT p.userid, p.giver, p.first_name, p.last_name, p.email, 
 p.default_language, p.created, p.created_by, w.course_id FROM (person p 
 LEFT JOIN wiol w ON ((p.userid = w.userid)));

  explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts, 
 p.type, case when sender.userid is not null then sender.first_name || ' ' 
 || sender.last_name else null end as sender_name, sender.course_id is not 
 null as is_online from pim p left outer join person_wiol_view sender on 
 (sender.userid = p.sender) where p.recipient = 'axto6551' and p.type = 0 
 limit 1;

 explain  analyze select p.pim_id, p.recipient, p.sender, p.message, p.ts, 
 p.type, case when sender.userid is not null then sender.first_name || ' ' 
 || sender.last_name else null end as sender_name, w.course_id is not null 
 as is_online from pim p left outer join person sender on (sender.userid = 
 p.sender) left join wiol w on (w.userid=sender.userid) where p.recipient = 
 'axto6551' and p.type = 0  limit 1;

These are not actually the same query.  In the former case the implicit
parenthesization of the joins is
pim left join (person left join wiol)
whereas in the latter case the implicit parenthesization is left-to-right:
(pim left join person) left join wiol
Since the only restriction conditions you have provided are on pim, the
first parenthesization implies forming the entire join of person and
wiol :-(.

If you were using plain joins then the two queries would be logically
equivalent, but outer joins are in general not associative, so the
planner will not consider re-ordering them.
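
Spelled out explicitly, the two orders look like this (a sketch against
your tables, with the select list trimmed down):

-- what the view-based query means: pim LEFT JOIN (person LEFT JOIN wiol)
select p.pim_id, sender.first_name
from pim p
left join (person sender left join wiol w on w.userid = sender.userid)
  on sender.userid = p.sender
where p.recipient = 'axto6551' and p.type = 0;

-- the rewritten query: (pim LEFT JOIN person) LEFT JOIN wiol
select p.pim_id, sender.first_name
from (pim p left join person sender on sender.userid = p.sender)
left join wiol w on w.userid = sender.userid
where p.recipient = 'axto6551' and p.type = 0;

The first form has to form the whole person-to-wiol join before the pim
restriction can do any good; the second can filter pim first.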

There is some work in 7.4 to make the planner smarter about outer joins,
but offhand I don't think any of it will improve results for this
particular example.

I have seen some academic papers about how to prove that a particular
pair of outer join operators can safely be swapped (as I think is true
in this example).  Some knowledge of that sort may eventually get into
the planner, but it ain't there now.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] upping checkpoints on production server

2003-09-25 Thread Robert Treat
On Wed, 2003-09-24 at 17:57, Tom Lane wrote:
 Robert Treat [EMAIL PROTECTED] writes:
  In .conf file I have default checkpoints set to 3, but I noticed that in
  my pg_xlog directory I always seem to have at least 8 log files. Since
  this is more than the suggested 7, I'm wondering if this means I ought
  to bump my checkpoint segments up to 4?
 
 Hm.  What is the typical delta in the mod times of the log files?  It
 sounds like you are in a regime where checkpoints are always triggered
 by checkpoint_segments and never by checkpoint_timeout, in which case
 increasing the former might be a good idea.  Or decrease the latter,
 but that could put a drag on performance.
 

# ls -lht /var/lib/pgsql/data/pg_xlog/
total 129M
-rw---1 postgres postgres  16M Sep 25 11:12 006E0059
-rw---1 postgres postgres  16M Sep 25 11:12 006E005A
-rw---1 postgres postgres  16M Sep 25 11:08 006E0058
-rw---1 postgres postgres  16M Sep 25 11:05 006E005F
-rw---1 postgres postgres  16M Sep 25 11:02 006E005E
-rw---1 postgres postgres  16M Sep 25 10:59 006E005D
-rw---1 postgres postgres  16M Sep 25 10:55 006E005B
-rw---1 postgres postgres  16M Sep 25 10:51 006E005C

#ls -lht /var/lib/pgsql/data/pg_xlog/
total 129M
-rw---1 postgres postgres  16M Sep 25 10:52 006E0054
-rw---1 postgres postgres  16M Sep 25 10:51 006E0053
-rw---1 postgres postgres  16M Sep 25 10:49 006E0052
-rw---1 postgres postgres  16M Sep 25 10:45 006E0059
-rw---1 postgres postgres  16M Sep 25 10:40 006E0057
-rw---1 postgres postgres  16M Sep 25 10:37 006E0058
-rw---1 postgres postgres  16M Sep 25 10:33 006E0056
-rw---1 postgres postgres  16M Sep 25 10:29 006E0055

from the 7.4 docs:

"Checkpoints are fairly expensive because they force all dirty kernel
buffers to disk using the operating system sync() call. Busy servers may
fill checkpoint segment files too quickly, causing excessive
checkpointing."

It goes on to mention checkpoint_warning, which I don't have in 7.3, but
I think this is a case where I'd likely see those warnings. The server
in question has a fairly high write/read ratio and is fairly busy (over
100 tps IIRC).

Since more often than not I don't make it to 5 minutes, it seems like
upping checkpoint_segments is the way to go, right?
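
i.e. something like this in postgresql.conf (the numbers are a guess on
my part, not a tested recommendation):

checkpoint_segments = 10   # up from 3; peak pg_xlog usage is about (2*segments + 1) 16MB files
checkpoint_timeout = 300   # leave the default; it becomes the trigger again
                           # once the segments stop filling first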

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] TPC-R benchmarks

2003-09-25 Thread Oleg Lebedev
I am running TPC-R benchmarks with a scale factor of 1, which corresponds
to approximately 1 GB database size, on PostgreSQL 7.3.4 installed on
CygWin on Windows XP. I dedicated 128 MB of shared memory to my postgres
installation.
Most of the queries were able to complete in a matter of minutes, but
query 17 was taking hours and hours. The query is shown below. Is there
any way to optimize it?

select
 sum(l_extendedprice) / 7.0 as avg_yearly
from
 lineitem,
 part
where
 p_partkey = l_partkey
 and p_brand = 'Brand#11'
 and p_container = 'SM PKG'
 and l_quantity < (
  select
   0.2 * avg(l_quantity)
  from
   lineitem
  where
   l_partkey = p_partkey
 );

Thanks.

Oleg


Re: [PERFORM] TPC-R benchmarks

2003-09-25 Thread Jenny Zhang
I am running TPC-H with a scale factor of 1 on RedHat 7.2 with the 2.5.74
kernel.  Q17 always finishes in about 7 seconds on my system.  The
execution plan is:

 Aggregate  (cost=780402.43..780402.43 rows=1 width=48)
   ->  Nested Loop  (cost=0.00..780397.50 rows=1973 width=48)
         Join Filter: ("inner".l_quantity < (subplan))
         ->  Seq Scan on part  (cost=0.00..8548.00 rows=197 width=12)
               Filter: ((p_brand = 'Brand#31'::bpchar) AND (p_container = 'LG CASE'::bpchar))
         ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=36)
               Index Cond: ("outer".p_partkey = lineitem.l_partkey)
         SubPlan
           ->  Aggregate  (cost=124.40..124.40 rows=1 width=11)
                 ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=11)
                       Index Cond: (l_partkey = $0)
(11 rows)

Hope this helps,
Jenny
On Thu, 2003-09-25 at 12:40, Oleg Lebedev wrote:
 I am running TPC-R benchmarks with a scale factor of 1, which corresponds
 to approximately 1 GB database size on PostgreSQL 7.3.4 installed on
 CygWin on Windows XP. I dedicated 128 MB of shared memory to my postgres
 installation.
 Most of the queries were able to complete in a matter of minutes, but
 query 17 was taking hours and hours. The query is shown below. Is there
 any way to optimize it?
  
 select
  sum(l_extendedprice) / 7.0 as avg_yearly
 from
  lineitem,
  part
 where
  p_partkey = l_partkey
  and p_brand = 'Brand#11'
  and p_container = 'SM PKG'
  and l_quantity < (
   select
0.2 * avg(l_quantity)
   from
lineitem
   where
l_partkey = p_partkey
  );
  
 Thanks.
  
 Oleg
 


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] TPC-R benchmarks

2003-09-25 Thread Jenny Zhang
The index is created by:
create index i_l_partkey on lineitem (l_partkey);
I do not have any foreign key defined.  Does the spec require foreign
keys?

When you create a foreign key reference, does PG create an index
automatically?

Can you try with the index?

Jenny
On Thu, 2003-09-25 at 14:39, Oleg Lebedev wrote:
 Seems like in your case postgres uses an i_l_partkey index on the lineitem
 table. I have a foreign key constraint defined between the lineitem and
 part tables, but didn't create any special indexes. Here is my query plan:
 
 ->  Aggregate  (cost=1517604222.32..1517604222.32 rows=1 width=31)
       ->  Hash Join  (cost=8518.49..1517604217.39 rows=1969 width=31)
             Hash Cond: ("outer".l_partkey = "inner".p_partkey)
             Join Filter: ("outer".l_quantity < (subplan))
             ->  Seq Scan on lineitem  (cost=0.00..241889.15 rows=6001215 width=27)
             ->  Hash  (cost=8518.00..8518.00 rows=197 width=4)
                   ->  Seq Scan on part  (cost=0.00..8518.00 rows=197 width=4)
                         Filter: ((p_brand = 'Brand#11'::bpchar) AND (p_container = 'SM PKG'::bpchar))
             SubPlan
               ->  Aggregate  (cost=256892.28..256892.28 rows=1 width=11)
                     ->  Seq Scan on lineitem  (cost=0.00..256892.19 rows=37 width=11)
                           Filter: (l_partkey = $0)
 
 -Original Message-
 From: Jenny Zhang [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, September 25, 2003 3:33 PM
 To: Oleg Lebedev
 Cc: [EMAIL PROTECTED];
 [EMAIL PROTECTED]
 Subject: Re: [PERFORM] TPC-R benchmarks
 
 
 I am running TPC-H with a scale factor of 1 on RedHat 7.2 with the 2.5.74
 kernel.  Q17 always finishes in about 7 seconds on my system.  The
 execution plan is:
 
 
  Aggregate  (cost=780402.43..780402.43 rows=1 width=48)
    ->  Nested Loop  (cost=0.00..780397.50 rows=1973 width=48)
          Join Filter: ("inner".l_quantity < (subplan))
          ->  Seq Scan on part  (cost=0.00..8548.00 rows=197 width=12)
                Filter: ((p_brand = 'Brand#31'::bpchar) AND (p_container = 'LG CASE'::bpchar))
          ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=36)
                Index Cond: ("outer".p_partkey = lineitem.l_partkey)
          SubPlan
            ->  Aggregate  (cost=124.40..124.40 rows=1 width=11)
                  ->  Index Scan using i_l_partkey on lineitem  (cost=0.00..124.32 rows=30 width=11)
                        Index Cond: (l_partkey = $0)
 (11 rows)
 
 Hope this helps,
 Jenny
 On Thu, 2003-09-25 at 12:40, Oleg Lebedev wrote:
  I am running TPC-R benchmarks with a scale factor of 1, which
  corresponds to approximately 1 GB database size on PostgreSQL 7.3.4
  installed on CygWin on Windows XP. I dedicated 128 MB of shared memory
  to my postgres installation. Most of the queries were able to complete
  in a matter of minutes, but query 17 was taking hours and hours. The
  query is shown below. Is there any way to optimize it?
   
  select
   sum(l_extendedprice) / 7.0 as avg_yearly
  from
   lineitem,
   part
  where
   p_partkey = l_partkey
   and p_brand = 'Brand#11'
   and p_container = 'SM PKG'
   and l_quantity < (
select
 0.2 * avg(l_quantity)
from
 lineitem
where
 l_partkey = p_partkey
   );
   
  Thanks.
   
  Oleg
  
 
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Indices arent being used

2003-09-25 Thread rantunes
Hi guys,

I'm running a Datawarehouse benchmark (APB-1) on PostgreSQL. The objective
is to choose which of the two main databases (PostgreSQL, MySQL) is
fastest. I've run into a small problem which I hope can be resolved here.

I'm trying to speed up this query:

select count(*) from actvars, prodlevel where
actvars.product_level=prodlevel.code_level and
prodlevel.division_level='OY3S5LAPALL6';

ACTVARS is a fact table of approximately 16 million rows, PRODLEVEL has
20700 rows. Both have btree indices.

The query took almost half an hour to execute! Running the same query on
MySQL, the result came back after 6 seconds. As you can see, there is a
large difference between the execution times.

After running an explain:

Aggregate  (cost=3123459.62..3123459.62 rows=1 width=32)
  ->  Merge Join  (cost=3021564.79..3119827.17 rows=1452981 width=32)
        Merge Cond: ("outer".product_level = "inner".code_level)
        ->  Sort  (cost=3020875.00..3060938.81 rows=16025523 width=16)
              Sort Key: actvars.product_level
              ->  Seq Scan on actvars  (cost=0.00..365711.23 rows=16025523 width=16)
        ->  Sort  (cost=689.79..694.48 rows=1877 width=16)
              Sort Key: prodlevel.code_level
              ->  Seq Scan on prodlevel  (cost=0.00..587.75 rows=1877 width=16)
                    Filter: (division_level = 'OY3S5LAPALL6'::bpchar)

I found that the indices weren't being used.

The database has been vacuumed and analyze has been executed.

I tried disabling seqscan so as to force index usage. The planner then
uses index scans, but the query still takes a very long time to execute.
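
For reference, I forced the index scans roughly like this (session-local
settings only, nothing in postgresql.conf):

set enable_seqscan = off;
explain analyze
select count(*) from actvars, prodlevel where
actvars.product_level=prodlevel.code_level and
prodlevel.division_level='OY3S5LAPALL6';
set enable_seqscan = on;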

Any suggestions on resolving this would would be appreciated.

P.S.: I'm running PostgreSQL 7.3.2




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Indices arent being used

2003-09-25 Thread Rod Taylor
 I'm running a Datawarehouse benchmark (APB-1) on PostgreSQL. The objective
 is to choose which of the two main databases (PostgreSQL, MySQL) is
 fastest. I've run into a small problem which I hope can be resolved here.
 
 I'm trying to speed up this query:
 
 select count(*) from actvars, prodlevel where
 actvars.product_level=prodlevel.code_level and
 prodlevel.division_level='OY3S5LAPALL6';

How about EXPLAIN ANALYZE output?

 ACTVARS is a fact table of approximately 16 million rows, PRODLEVEL has
 20700 rows. Both have btree indices.

 The database has been vacuumed and analyze has been executed.

The usual postgresql.conf adjustments have also been made?
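
For a sort-heavy join like this I'd check at least these (illustrative
values only; size them to the RAM you actually have):

set sort_mem = 65536;              -- 64MB per sort; a 16M-row sort spills to disk at the 1MB default
set effective_cache_size = 65536;  -- ~512MB, counted in 8kB pages
show shared_buffers;               -- set in postgresql.conf (needs a restart); the stock value is tiny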





Re: [PERFORM] TPC-R benchmarks

2003-09-25 Thread Josh Berkus
Jenny,

 create index i_l_partkey on lineitem (l_partkey);
 I do not have any foreign key defined.  Does the spec require foreign
 keys?

 When you create a foreign key reference, does PG create an index
 automatically?

No.  An index is not required to enforce a foreign key, and is sometimes not 
useful (for example, FK fields with only 3 possible values).

So it may be that you need to create an index on that field.
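
On Oleg's schema that would be something like this, matching the index
Jenny posted earlier:

CREATE INDEX i_l_partkey ON lineitem (l_partkey);
ANALYZE lineitem;  -- refresh planner statistics after building it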


-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] TPC-R benchmarks

2003-09-25 Thread Tom Lane
Oleg Lebedev [EMAIL PROTECTED] writes:
 Seems like in your case postgres uses an i_l_partkey index on the lineitem
 table. I have a foreign key constraint defined between the lineitem and
 part tables, but didn't create any special indexes. Here is my query plan:

The planner is obviously unhappy with this plan (note the large cost
numbers), but it can't find a way to do better.  An index on
lineitem.l_partkey would help, I think.

The whole query seems like it's written in a very inefficient fashion;
couldn't the estimation of '0.2 * avg(l_quantity)' be amortized across
multiple join rows?  But I dunno whether the TPC rules allow for
significant manual rewriting of the given query.
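
If they do, one possibility is to compute the per-partkey averages once in
a derived table instead of once per join row (an untested sketch):

select sum(l.l_extendedprice) / 7.0 as avg_yearly
from lineitem l
join part p on p.p_partkey = l.l_partkey
join (select l_partkey, 0.2 * avg(l_quantity) as qty_limit
      from lineitem
      group by l_partkey) a on a.l_partkey = l.l_partkey
where p.p_brand = 'Brand#11'
  and p.p_container = 'SM PKG'
  and l.l_quantity < a.qty_limit;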

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly