Re: [PERFORM] planner/optimizer question

2004-04-28 Thread Gary Doades
I know you will shoot me down, but...

Why is there an entry in the index for a row if the row is not valid?
Wouldn't it be better for the index entry's validity to track the row's validity?
If a particular data value for a query (join, where, etc.) can be satisfied
by the index entry itself, this would be a big performance gain.

Cheers,
Gary.

On 28 Apr 2004 at 0:27, Tom Lane wrote:

 [EMAIL PROTECTED] writes:
  ... Wouldn't the most efficient plan be to scan the index regardless
  of crm_id because the only columns needed are in the index?
 
 No.  People coming from other databases often have the misconception
 that queries can be answered by looking only at an index.  That is never
 true in Postgres because row validity info is only stored in the table;
 so we must always visit the table entry to make sure the row is still
 valid/visible for the current query.
 
 Accordingly, columns added to the index that aren't constrained by the
 WHERE clause are not very useful ...
 
   regards, tom lane
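A minimal sketch of this behaviour, reusing the table and index names from
the index-usage thread further down (the data and the constant are assumed):

  CREATE TABLE ticket_crm_map (
      crm_id            integer,
      support_person_id integer
  );
  CREATE INDEX ticket_crm_map_crm_id_suppid
      ON ticket_crm_map (crm_id, support_person_id);
  ANALYZE ticket_crm_map;

  -- Even if the planner picks an index scan here, every matching index
  -- entry still costs a heap fetch for the MVCC visibility check, so the
  -- query is never answered from the index alone.
  EXPLAIN ANALYZE
  SELECT support_person_id
  FROM ticket_crm_map
  WHERE crm_id = 7;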
 





Re: [PERFORM] planner/optimizer question

2004-04-28 Thread Christopher Kings-Lynne
Why is there an entry in the index for a row if the row is not valid?
Wouldn't it be better for the index entry's validity to track the row's validity?
If a particular data value for a query (join, where, etc.) can be satisfied
by the index entry itself, this would be a big performance gain.
For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big 
performance loss.

Chris


Re: [PERFORM] planner/optimizer question

2004-04-28 Thread Gary Doades
I can understand the performance loss on non-selects from keeping the
index validity state tracking the row validity, but would that outweigh the
performance gains on selects? It depends on your mix of selects to
non-selects, I guess, but other database systems seem to imply that keeping
the index in sync is worth it overall.

Cheers,
Gary.

On 28 Apr 2004 at 15:04, Christopher Kings-Lynne wrote:

  Why is there an entry in the index for a row if the row is not valid?
  Wouldn't it be better for the index entry's validity to track the row's validity?
  If a particular data value for a query (join, where, etc.) can be satisfied
  by the index entry itself, this would be a big performance gain.
 
 For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big 
 performance loss.
 
 Chris
 





Re: [PERFORM] Join problem

2004-04-28 Thread Suller András
These two queries are not equal: query 1 returns 6 rows, query 2 returns 0
rows, because the '~*' and '=' operators are not the same. BTW, when you use
'=', the planner could use an index on item.description.
In query 1, the Seq Scan on item estimates 1 row; in query 2 it estimates
733 rows. IMHO that's why query 1 uses a nested loop and query 2 uses a hash join.
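A quick sketch of the two predicates (table and column names are from the
plans quoted below; 'CD' is the value from the original queries):

  -- Case-insensitive regex match: a plain btree index on
  -- item.description cannot be used for this.
  SELECT count(*) FROM item WHERE description ~* 'CD';

  -- Exact equality: a plain btree index on item.description is usable,
  -- but it only matches rows whose description is exactly 'CD'.
  SELECT count(*) FROM item WHERE description = 'CD';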

bye,
Suller András
Silke Trissl wrote:
Hi,
Query 1:
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..28836.75 rows=1 width=4) (actual time=65350.780..452130.702 rows=6 loops=1)
   Join Filter: ("inner".order_id = "outer".order_id)
   ->  Seq Scan on item  (cost=0.00..28814.24 rows=1 width=8) (actual time=33.180..1365.190 rows=716 loops=1)
         Filter: (description ~* 'CD'::text)
   ->  Seq Scan on orders  (cost=0.00..22.50 rows=1 width=4) (actual time=21.644..629.500 rows=18 loops=716)
         Filter: ((order_name)::text ~* 'Smith'::text)
 Total runtime: 452130.782 ms

Query 2:
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=22.50..28840.44 rows=4 width=4) (actual time=1187.798..1187.798 rows=0 loops=1)
   Hash Cond: ("outer".order_id = "inner".order_id)
   ->  Seq Scan on item  (cost=0.00..28814.24 rows=733 width=8) (actual time=542.737..542.737 rows=0 loops=1)
         Filter: (description = 'CD'::text)
   ->  Hash  (cost=22.50..22.50 rows=1 width=4) (actual time=645.042..645.042 rows=0 loops=1)
         ->  Seq Scan on orders  (cost=0.00..22.50 rows=1 width=4) (actual time=22.373..644.996 rows=18 loops=1)
               Filter: ((order_name)::text ~* 'Smith'::text)
 Total runtime: 1187.865 ms
 


Query 1 with 'set enable_nestloop to false':

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=22.50..28836.75 rows=1 width=4) (actual time=1068.593..2003.330 rows=6 loops=1)
   Hash Cond: ("outer".item_id = "inner".item_id)
   ->  Seq Scan on item  (cost=0.00..28814.24 rows=1 width=8) (actual time=33.347..1357.073 rows=716 loops=1)
         Filter: (description ~* 'CD'::text)
   ->  Hash  (cost=22.50..22.50 rows=1 width=4) (actual time=645.287..645.287 rows=0 loops=1)
         ->  Seq Scan on orders  (cost=0.00..22.50 rows=1 width=4) (actual time=22.212..645.239 rows=18 loops=1)
               Filter: ((order_name)::text ~* 'Smith'::text)
 Total runtime: 2003.409 ms




Re: [PERFORM] Simply join in PostrgeSQL takes too long

2004-04-28 Thread Manfred Koizar
On Tue, 27 Apr 2004 18:01:34 -0400, Rod Taylor [EMAIL PROTECTED] wrote:
On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote:
 Hello pgsql-performance,
 
   I discussed the whole subject for some time in DevShed and didn't
   achieve much (as for results). I wonder if any of you guys can help
   out:
 
   http://forums.devshed.com/t136202/s.html

The point is that a book cannot be of a certain genre more than once.

Rod, he has a hierarchy of genres.  Genre 1 has 6379 child genres and a
book can be in more than one of these.

Vitaly, though LIMIT makes this look like a small query, DISTINCT
requires the whole result set to be retrieved.  0.7 seconds doesn't look
so bad for several thousand rows.  Did you try with other genre_ids?

Maybe a merge join is not the best choice.  Set enable_mergejoin to
false and see whether you get a (hopefully faster) hash join, assuming
that sort_mem is large enough to keep the hash table in memory.
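For example (a hypothetical query of the same shape as the one under
discussion; the table and column names are assumed):

  SET enable_mergejoin TO false;
  EXPLAIN ANALYZE
  SELECT DISTINCT b.book_id
  FROM book b
  JOIN book_genre bg ON bg.book_id = b.book_id
  WHERE bg.genre_id = 1
  LIMIT 10;
  SET enable_mergejoin TO true;  -- restore the default afterwards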

If you send me your table contents I'll try it on Linux.

Servus
 Manfred



Re: [PERFORM] Join problem

2004-04-28 Thread Tom Lane
Silke Trissl [EMAIL PROTECTED] writes:
 I found out, that the change of the operator from '~*' to '=' for the 
 item.description brought a great boost in performance (425 secs to 1 
 sec!), but not in cost (Query plans at the end).

The main problem seems to be bad estimation of the number of rows
extracted from the item table.  Have you ANALYZEd that table lately?
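For instance (the table and column names are taken from the plans; the
statistics target of 100 is just a suggestion):

  -- Optionally collect finer-grained statistics on the badly
  -- estimated column before re-analyzing:
  ALTER TABLE item ALTER COLUMN description SET STATISTICS 100;
  ANALYZE item;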

regards, tom lane



Re: [PERFORM] planner/optimizer question

2004-04-28 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
 On Wed, 28 Apr 2004 07:35:41 +0100, Gary Doades [EMAIL PROTECTED]
 wrote:
 Why is there an entry in the index for a row if the row is not valid? 

 Because whether a row is seen as valid or not lies in the eye of the
 transaction looking at it.  Full visibility information is stored in the
 heap tuple header.  The developers' consensus is that this overhead
 should not be in every index tuple.

Storing that information would at least double the overhead space used
for each index tuple.  The resulting index bloat would significantly
slow index operations by requiring more I/O.  So it's far from clear
that this would be a win, even for those who care only about select
speed.
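One way to get a feel for the relative footprints (table and index names
are from the earlier example; pg_class is as in 7.4):

  SELECT relname, relpages, reltuples
  FROM pg_class
  WHERE relname IN ('ticket_crm_map',
                    'ticket_crm_map_crm_id_suppid');

Doubling the per-entry overhead would push the index's relpages figure
correspondingly higher, and every index scan would have to read those
extra pages.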

regards, tom lane



Re: [PERFORM] index usage

2004-04-28 Thread scott.marlowe
On Mon, 26 Apr 2004, Stephan Szabo wrote:

 
 On Fri, 23 Apr 2004 [EMAIL PROTECTED] wrote:
 
  I have a query which I think should be using an index all of the time but
  postgres only uses the index part of the time.  The index
  (ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
  first followed by the selected column (support_person_id).  Wouldn't the
  most efficient plan be to scan the index each time because the only columns
  needed are in the index?  Below is the table, 2 queries showing the
 
 Not necessarily.  The rows in the actual file still need to be checked to
 see if they're visible to the select, and if it's expected that the entire
 file (or a reasonable % of the pages, anyway) will need to be loaded,
 using the index isn't necessarily a win.

While those of us familiar with PostgreSQL are well aware that indexes
can't be used directly to garner information, but only as a lookup to a
tuple in the table, it seems this misconception is quite common among
those coming to PostgreSQL from other databases.

Is there any information that directly reflects this issue in the docs?
There are tons of hints in how they're written that it works this way, but
nothing that just comes out and says that with pgsql's MVCC
implementation an index scan still has to hit the pages that contain the
tuples, so in pgsql a seq scan is often a win where in other databases an
index scan would have been a win.

If not, where would I add it if I were going to write something up for the 
docs?  Just wondering...




Re: [PERFORM] index usage

2004-04-28 Thread Tom Lane
scott.marlowe [EMAIL PROTECTED] writes:
 There are tons of hints in how they're written that it works this way, but
 nothing that just comes out and says that with pgsql's MVCC
 implementation an index scan still has to hit the pages that contain the
 tuples, so in pgsql a seq scan is often a win where in other databases an
 index scan would have been a win.

 If not, where would I add it if I were going to write something up for the 
 docs?  Just wondering...

AFAIR the only place in the docs that mentions seqscan or indexscan at
all is the discussion of EXPLAIN in Performance Tips.  Perhaps a
suitably-enlarged version of that section could cover this.

regards, tom lane



Re: [PERFORM] [ADMIN] 7.4.2 out of memory

2004-04-28 Thread scott.marlowe
On Wed, 28 Apr 2004, Jie Liang wrote:

 All,
 After I upgraded postgres from 7.3.4 to 7.4.2, one of my programs got the following error:
 ERROR:  out of memory
 DETAIL:  Failed on request of size 92.
 
 Any idea??
 Does memory management differ much between 7.3.4 and 7.4.2?
 This program uses a chunk of shared memory and a lot of temp tables.

More than likely this is a hash aggregate problem. (Or can they spill to
disk in 7.4.2 yet? I don't think they can, but maybe we should ask Tom.)

Try setting this before running the query and see what happens:

set enable_hashagg = false;






Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-28 Thread Robert Creager
When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700),
Josh Berkus [EMAIL PROTECTED] confessed:

 Dave,
 
  After some testing: if you use the current HEAD code for s_lock.c, which
  has some mods in it to alleviate this situation, and change
  SPINS_PER_DELAY to 10, you can drastically reduce the context switches
  with Tom's test. I am seeing a slight degradation in throughput using
  pgbench -c 10 -t 1000, but it might be liveable, considering the
  alternative is unbearable in some situations.
 
  Can anyone else replicate my results?
 
 Can you produce a patch against 7.4.1?   I'd like to test your fix against a 
 real-world database.

I would like to see the same, as I have a system that exhibits the same behavior
on a production db that's running 7.4.1.

Cheers,
Rob


-- 
 18:55:22 up  1:40,  4 users,  load average: 2.00, 2.04, 2.00
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004




Re: [JDBC] [PERFORM] is a good practice to create an index on the

2004-04-28 Thread Bruno Wolff III
On Wed, Apr 28, 2004 at 10:13:14 +0200,
  Edoardo Ceccarelli [EMAIL PROTECTED] wrote:
 Do you mean that, declaring an index serial, I'd never have to deal with
 incrementing its primary key? Good to know!

That isn't what is happening. Serial is a special type. It is int plus
a default rule linked to a sequence. No index is created by default
for the serial type. Declaring a column as a primary key will, however,
create a unique index on that column.
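Roughly, and as a sketch only (the table names here are hypothetical):

  CREATE TABLE t (id serial PRIMARY KEY);

  -- behaves approximately like:

  CREATE SEQUENCE t2_id_seq;
  CREATE TABLE t2 (
      id integer NOT NULL DEFAULT nextval('t2_id_seq'),
      PRIMARY KEY (id)  -- the primary key, not serial, creates the unique index
  );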

Also note that you should only assume that the serial values are unique.
(This assumes that you don't use setval and that you don't roll a sequence
over.) Within a single session you can assume the sequence values will
be monotonically increasing. The values that end up in your table can have
gaps. Typically this happens when a transaction rolls back after obtaining
a new value from a sequence. It can also happen if you grab sequence
values in larger blocks than the default 1 (which might be more efficient
if a session normally acquires multiple values from a particular sequence).
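A hypothetical session showing how a gap appears (using the table t
sketched above):

  BEGIN;
  INSERT INTO t DEFAULT VALUES;  -- consumes one sequence value
  ROLLBACK;                      -- the row is gone, but the value is not reused
  INSERT INTO t DEFAULT VALUES;  -- gets the next value, leaving a gap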

 Anyway, in this particular situation I don't need such accurate
 behaviour: this table is filled up with a lot of data twice per week and
 it's used only to answer queries.
 I could drop it whenever I want :)

You really don't want to use oids.
