Re: [PERFORM] planner/optimizer question
I know you will shoot me down, but... why is there an entry in the index for a row if the row is not valid? Wouldn't it be better for the index entry validity to track the row validity? If a particular data value for a query (join, WHERE etc.) can be satisfied by the index entry itself, this would be a big performance gain.

Cheers,
Gary.

On 28 Apr 2004 at 0:27, Tom Lane wrote:
> [EMAIL PROTECTED] writes:
>> ... Wouldn't the most efficient plan be to scan the index regardless of
>> crm_id because the only columns needed are in the index?
>
> No. People coming from other databases often have the misconception that
> queries can be answered by looking only at an index. That is never true in
> Postgres, because row validity info is stored only in the table; so we must
> always visit the table entry to make sure the row is still valid/visible
> for the current query. Accordingly, columns added to the index that aren't
> constrained by the WHERE clause are not very useful.
>
> regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend
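Tom's point can be seen directly with EXPLAIN ANALYZE. A minimal sketch, using the table and index names from the original question (the exact plan output will vary by version and data):

```sql
-- Table and two-column index as described in the thread.
CREATE TABLE ticket_crm_map (
    crm_id            integer,
    support_person_id integer
);
CREATE INDEX ticket_crm_map_crm_id_suppid
    ON ticket_crm_map (crm_id, support_person_id);
ANALYZE ticket_crm_map;

-- Even though every column the query needs is present in the index,
-- Postgres must still visit each matching heap tuple to check MVCC
-- visibility, so the index can never answer the query by itself.
EXPLAIN ANALYZE
SELECT support_person_id
FROM ticket_crm_map
WHERE crm_id = 7;
```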
Re: [PERFORM] planner/optimizer question
> Why is there an entry in the index for a row if the row is not valid?
> Wouldn't it be better for the index entry validity to track the row
> validity? If a particular data value for a query (join, where etc.) can
> be satisfied by the index entry itself this would be a big performance
> gain.

For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big performance loss.

Chris
Re: [PERFORM] planner/optimizer question
I can understand the performance loss on non-SELECTs for keeping the index validity state tracking the row validity, but would that outweigh the performance gains on SELECTs? Depends on your mix of SELECTs to non-SELECTs, I guess, but other database systems seem to imply that keeping the index on track is worth it overall.

Cheers,
Gary.

On 28 Apr 2004 at 15:04, Christopher Kings-Lynne wrote:
>> Why is there an entry in the index for a row if the row is not valid?
>> Wouldn't it be better for the index entry validity to track the row
>> validity? If a particular data value for a query (join, where etc.) can
>> be satisfied by the index entry itself this would be a big performance
>> gain.
>
> For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big
> performance loss.
>
> Chris
Re: [PERFORM] Join problem
These two queries are not equal. Query 1 returns 6 rows, query 2 returns 0 rows, because the '~*' and '=' operators are not the same. BTW, when you use '=', it could use an index on item.description.

On query 1, the Seq Scan on item estimates 1 row; on query 2 it estimates 733 rows. IMHO that's why query 1 uses a nested loop and query 2 a hash join.

bye,
Suller Andras

Silke Trissl wrote:
> Hi,
>
> Query 1:
>                                   QUERY PLAN
> ------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..28836.75 rows=1 width=4) (actual time=65350.780..452130.702 rows=6 loops=1)
>    Join Filter: ("inner".order_id = "outer".order_id)
>    ->  Seq Scan on item  (cost=0.00..28814.24 rows=1 width=8) (actual time=33.180..1365.190 rows=716 loops=1)
>          Filter: (description ~* 'CD'::text)
>    ->  Seq Scan on orders  (cost=0.00..22.50 rows=1 width=4) (actual time=21.644..629.500 rows=18 loops=716)
>          Filter: ((order_name)::text ~* 'Smith'::text)
>  Total runtime: 452130.782 ms
>
> Query 2:
>                                   QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=22.50..28840.44 rows=4 width=4) (actual time=1187.798..1187.798 rows=0 loops=1)
>    Hash Cond: ("outer".order_id = "inner".order_id)
>    ->  Seq Scan on item  (cost=0.00..28814.24 rows=733 width=8) (actual time=542.737..542.737 rows=0 loops=1)
>          Filter: (description = 'CD'::text)
>    ->  Hash  (cost=22.50..22.50 rows=1 width=4) (actual time=645.042..645.042 rows=0 loops=1)
>          ->  Seq Scan on orders  (cost=0.00..22.50 rows=1 width=4) (actual time=22.373..644.996 rows=18 loops=1)
>                Filter: ((order_name)::text ~* 'Smith'::text)
>  Total runtime: 1187.865 ms
>
> Query 1 with 'set enable_nestloop to false':
>                                   QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=22.50..28836.75 rows=1 width=4) (actual time=1068.593..2003.330 rows=6 loops=1)
>    Hash Cond: ("outer".item_id = "inner".item_id)
>    ->  Seq Scan on item  (cost=0.00..28814.24 rows=1 width=8) (actual time=33.347..1357.073 rows=716 loops=1)
>          Filter: (description ~* 'CD'::text)
>    ->  Hash  (cost=22.50..22.50 rows=1 width=4) (actual time=645.287..645.287 rows=0 loops=1)
>          ->  Seq Scan on orders  (cost=0.00..22.50 rows=1 width=4) (actual time=22.212..645.239 rows=18 loops=1)
>                Filter: ((order_name)::text ~* 'CD'::text)
>  Total runtime: 2003.409 ms
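The behavioral difference between the two predicates can be checked in isolation (a sketch; the sample string is made up for illustration):

```sql
-- '~*' is a case-insensitive regular-expression match: it succeeds whenever
-- the pattern appears anywhere in the string, and (in 7.4) a plain b-tree
-- index on the column cannot help it.
SELECT 'Rock music CD' ~* 'cd';   -- true
-- '=' is exact equality: it can use an index on item.description, but it
-- matches a completely different (and here much smaller) set of rows.
SELECT 'Rock music CD' = 'CD';    -- false
```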
Re: [PERFORM] Simply join in PostrgeSQL takes too long
On Tue, 27 Apr 2004 18:01:34 -0400, Rod Taylor [EMAIL PROTECTED] wrote:
> On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote:
>> Hello pgsql-performance, I discussed the whole subject for some time in
>> DevShed and didn't achieve much (as for results). I wonder if any of you
>> guys can help out: http://forums.devshed.com/t136202/s.html
>
> The point is that a book cannot be of a certain genre more than once.

Rod, he has a hierarchy of genres. Genre 1 has 6379 child genres, and a book can be in more than one of these.

Vitaly, though LIMIT makes this look like a small query, DISTINCT requires the whole result set to be retrieved. 0.7 seconds doesn't look so bad for several thousand rows. Did you try with other genre_ids?

Maybe a merge join is not the best choice. Set enable_mergejoin to false and see whether you get a (hopefully faster) hash join, assuming that sort_mem is large enough to keep the hash table in memory.

If you send me your table contents I'll try it on Linux.

Servus
Manfred
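Manfred's suggested experiment, spelled out as a sketch (the problem query itself is elided; the sort_mem value is an arbitrary example):

```sql
-- Disable merge joins for this session only, so the planner has to pick
-- something else (ideally a hash join) for the problem query.
SET enable_mergejoin TO false;
-- sort_mem (in KB; the 7.4-era setting) also bounds hash table size;
-- raise it so the hash table can stay in memory, e.g. 16 MB:
SET sort_mem TO 16384;
-- ... re-run the original query under EXPLAIN ANALYZE and compare ...
-- Then restore the defaults:
RESET enable_mergejoin;
RESET sort_mem;
```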
Re: [PERFORM] Join problem
Silke Trissl [EMAIL PROTECTED] writes:
> I found out that the change of the operator from '~*' to '=' for
> item.description brought a great boost in performance (425 secs to 1
> sec!), but not in cost (query plans at the end).

The main problem seems to be bad estimation of the number of rows extracted from the item table. Have you ANALYZEd that table lately?

regards, tom lane
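The fix Tom is hinting at is a one-liner; a sketch using the table and column names from the thread:

```sql
-- Refresh the planner's statistics for the badly-estimated table:
ANALYZE item;
-- The rows= estimate in the plan should now be much closer to the
-- 716 rows the scan actually produced:
EXPLAIN SELECT order_id FROM item WHERE description ~* 'CD';
```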
Re: [PERFORM] planner/optimizer question
Manfred Koizar [EMAIL PROTECTED] writes:
> On Wed, 28 Apr 2004 07:35:41 +0100, Gary Doades [EMAIL PROTECTED] wrote:
>> Why is there an entry in the index for a row if the row is not valid?
>
> Because whether a row is seen as valid or not lies in the eye of the
> transaction looking at it. Full visibility information is stored in the
> heap tuple header. The developers' consensus is that this overhead should
> not be in every index tuple.

Storing that information would at least double the overhead space used for each index tuple. The resulting index bloat would significantly slow index operations by requiring more I/O. So it's far from clear that this would be a win, even for those who care only about select speed.

regards, tom lane
Re: [PERFORM] index usage
On Mon, 26 Apr 2004, Stephan Szabo wrote:
> On Fri, 23 Apr 2004 [EMAIL PROTECTED] wrote:
>> I have a query which I think should be using an index all of the time,
>> but Postgres only uses the index part of the time. The index
>> (ticket_crm_map_crm_id_suppid) has the WHERE clause column (crm_id)
>> listed first, followed by the selected column (support_person_id).
>> Wouldn't the most efficient plan be to scan the index each time because
>> the only columns needed are in the index? Below is the table, 2 queries
>> showing the
>
> Not necessarily. The rows in the actual file still need to be checked to
> see if they're visible to the select, and if it's expected that the
> entire file (or a reasonable % of the pages, anyway) will need to be
> loaded, using the index isn't necessarily a win.

While those of us familiar with PostgreSQL are well aware that indexes can't be used directly to garner information, but only as a lookup to a tuple in the table, it seems this misconception is quite common among those coming to PostgreSQL from other databases. Is there any information that directly reflects this issue in the docs? There are tons of hints in how they're written that it works this way, but nothing that just comes out and says that, with pgsql's MVCC implementation, an index scan still has to hit the pages that contain the tuples, so in pgsql a seq scan is often a win where in other databases an index scan would have been. If not, where would I add it if I were going to write something up for the docs? Just wondering...
Re: [PERFORM] index usage
scott.marlowe [EMAIL PROTECTED] writes:
> There are tons of hints in how they're written that it works this way,
> but nothing that just comes out and says that, with pgsql's MVCC
> implementation, an index scan still has to hit the pages that contain
> the tuples, so in pgsql a seq scan is often a win where in other
> databases an index scan would have been. If not, where would I add it if
> I were going to write something up for the docs? Just wondering...

AFAIR the only place in the docs that mentions seqscan or indexscan at all is the discussion of EXPLAIN in Performance Tips. Perhaps a suitably-enlarged version of that section could cover this.

regards, tom lane
Re: [PERFORM] [ADMIN] 7.4.2 out of memory
On Wed, 28 Apr 2004, Jie Liang wrote:
> All,
> After I upgraded Postgres from 7.3.4 to 7.4.2, one of my programs got
> the following error:
>
>   ERROR:  out of memory
>   DETAIL:  Failed on request of size 92.
>
> Any idea? Does memory management have a big difference between 7.3.4 and
> 7.4.2? This program uses a chunk of shared memory and a lot of temp
> tables.

More than likely this is a hash aggregate problem (or can they spill to disk in 7.4.2 yet? I don't think they can, but maybe we should ask Tom). Try setting this before running the query and see what happens:

set enable_hashagg = false;
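To make the suggested experiment reversible, the setting can be confined to the current session (a sketch):

```sql
-- Disable hash aggregation so aggregate/GROUP BY plans fall back to
-- sort-based grouping, which does not have to hold an in-memory hash
-- table for every group:
SET enable_hashagg = false;
-- ... run the previously-failing query here ...
-- Restore the default afterwards:
RESET enable_hashagg;
```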
Re: [PERFORM] Wierd context-switching issue on Xeon
When grilled further on (Wed, 21 Apr 2004 10:29:43 -0700), Josh Berkus [EMAIL PROTECTED] confessed:
> Dave,
>
>> After some testing, if you use the current head code for s_lock.c,
>> which has some mods in it to alleviate this situation, and change
>> SPINS_PER_DELAY to 10, you can drastically reduce the cs with Tom's
>> test. I am seeing a slight degradation in throughput using pgbench
>> -c 10 -t 1000, but it might be liveable, considering the alternative is
>> unbearable in some situations. Can anyone else replicate my results?
>
> Can you produce a patch against 7.4.1? I'd like to test your fix against
> a real-world database.

I would like to see the same, as I have a system that exhibits the same behavior on a production db that's running 7.4.1.

Cheers,
Rob

-- 
18:55:22 up 1:40, 4 users, load average: 2.00, 2.04, 2.00
Linux 2.6.5-01 #7 SMP Fri Apr 16 22:45:31 MDT 2004
Re: [JDBC] [PERFORM] is a good practice to create an index on the
On Wed, Apr 28, 2004 at 10:13:14 +0200, Edoardo Ceccarelli [EMAIL PROTECTED] wrote:
> Do you mean that, declaring an index serial, I'd never have to deal with
> incrementing its primary key? Good to know!

That isn't what is happening. Serial is a special type: it is int plus a default rule linked to a sequence. No index is created by default for the serial type. Declaring a column as a primary key will, however, create a unique index on that column.

Also note that you should only assume that the serial values are unique. (This assumes that you don't use setval and that you don't roll a sequence over.) Within a single session you can assume the sequence values will be monotonically increasing. The values that end up in your table can have gaps. Typically this happens when a transaction rolls back after obtaining a new value from a sequence. It can also happen if you grab sequence values in larger blocks than the default 1 (which might be more efficient if a session normally acquires multiple values from a particular sequence).

> Anyway, in this particular situation I don't need such accurate
> behaviour: this table is filled up with a lot of data twice per week and
> it's used only to answer queries. I could drop it whenever I want :)

You really don't want to use oids.
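What the reply describes can be written out explicitly (a sketch with a made-up table name; a `serial` column expands to roughly this, give or take the generated sequence name):

```sql
-- "id serial" is shorthand for an int column whose default draws from a
-- dedicated sequence.  Note that no index has been created at this point.
CREATE SEQUENCE mytable_id_seq;
CREATE TABLE mytable (
    id integer DEFAULT nextval('mytable_id_seq') NOT NULL
);

-- It is the PRIMARY KEY (or UNIQUE) constraint that creates a unique index:
ALTER TABLE mytable ADD PRIMARY KEY (id);
```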