Re: [PERFORM] cost-based vacuum
On Fri, 2005-07-08 at 12:25 -0400, Ian Westmacott wrote:
> I am beginning to look at Postgres 8, and am particularly interested
> in cost-based vacuum/analyze. I'm hoping someone can shed some light
> on the behavior I am seeing. Suppose there are three threads:
>
>   writer_thread:
>     every 1/15 second do
>       BEGIN TRANSACTION
>       COPY table1 FROM stdin
>       ...
>       COPY tableN FROM stdin
>       perform several UPDATEs, DELETEs and INSERTs
>       COMMIT
>
>   reader_thread:
>     every 1/15 second do
>       BEGIN TRANSACTION
>       SELECT FROM table1 ...
>       ...
>       SELECT FROM tableN ...
>       COMMIT
>
>   analyze_thread:
>     every 5 minutes do
>       ANALYZE table1
>       ...
>       ANALYZE tableN
>
> Now, Postgres 8.0.3 out-of-the-box (all default configs) on a
> particular piece of hardware runs the Postgres connection for
> writer_thread at about 15% CPU (meaningless, I know, but for
> comparison) and runs the connection for reader_thread at about 30%
> CPU. Latency for reader_thread seeing updates from writer_thread is
> well under 1/15s. The impact of analyze_thread is negligible.
>
> If I make the single configuration change of setting
> vacuum_cost_delay=1000, each iteration in analyze_thread takes much
> longer, of course. But what I also see is that the CPU usage of the
> connections for writer_thread and reader_thread spikes to well over
> 80% each (this is a dual-CPU machine) and latency degrades to 8-10s
> during the ANALYZEs. I don't understand why this would be. I don't
> think there are any lock issues, and I don't see any obvious I/O
> issues. Am I missing something? Is there any way to get some insight
> into what those connections are doing?

The ANALYZE commands hold read locks on the tables you wish to write to. If you slow them down, you merely slow down your write transactions also, and then the read transactions that wait behind them. Every time the ANALYZE sleeps it wakes up the other transactions, which then realise they can't move because of locks, and then wake up the ANALYZEs for another shot.

The end result is that you introduce more context-switching, without any chance of doing more useful work while the ANALYZEs sleep.

Don't use vacuum_cost_delay in this situation. You might try setting it to 0 for the analyze_thread only.

It sounds like you could speed things up by splitting everything into two sets of tables, with writer_thread1 and writer_thread2, etc. That way your two CPUs would be able to get through more work independently, without locking each other out.

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
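Simon's suggestion of "setting it to 0 for the analyze_thread only" works because vacuum_cost_delay can be overridden per session. A minimal sketch, assuming the thread's example table names:

```sql
-- Hedged sketch: leave the server-wide vacuum_cost_delay alone, but
-- disable cost-based sleeping in the analyze session only, so the
-- ANALYZEs run at full speed and release their locks sooner.
SET vacuum_cost_delay = 0;   -- affects this session only
ANALYZE table1;
ANALYZE tableN;
```

The SET lasts for the lifetime of the connection, so analyze_thread only needs to issue it once after connecting.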
[PERFORM] Question
In the past week, one guy of Unix Group in Colombia say: Postgrest in production is bat, if the power off in any time the datas is lost why this datas is in plain files. Postgrest no ssupport data bases with more 1 millon of records. Wath tell me in this respect?, is more best Informix as say

Ing. Alejandro Lemus G.
Radio Taxi Aeropuerto S.A.
Avenida de las Américas # 51 - 39 Bogotá - Colombia
Tel: 571-4470694 / 571-4202600 Ext. 260
Fax: 571-2624070
email: [EMAIL PROTECTED]
Re: [PERFORM] Question
Perhaps choose a better subject than "Question" next time?

Alejandro Lemus wrote:
> In the past week, one guy of Unix Group in Colombia say: Postgrest in
> production is bat, if the power off in any time the datas is lost

Wrong. And it's called PostgreSQL.

> why this datas is in plain files. Postgrest no ssupport data bases
> with more 1 millon of records.

Wrong.

> Wath tell me in this respect?, is more best Informix as say

Your contact in the Unix Group in Colombia obviously talks on subjects where he knows little. Perhaps re-evaluate anything else you've heard from him.

You can find details on PostgreSQL at http://www.postgresql.org/, including:
  the manuals: http://www.postgresql.org/docs/8.0/static/index.html
  the FAQ: http://www.postgresql.org/docs/faq/
  Spanish/Brazilian communities, which might prove useful:
  http://www.postgresql.org/community/international

PostgreSQL is licensed under the BSD licence, which means you can freely download or deploy it in a commercial setting if you desire.

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] Question
> In the past week, one guy of Unix Group in Colombia say: Postgrest in
> production is bat, if the power off in any time the datas is lost why
> this datas is in plain files. Postgrest no ssupport data bases with
> more 1 millon of records. Wath tell me in this respect?, is more best
> Informix as say

Both these statements are completely incorrect. Unlike some other database systems, PostgreSQL *does* survive power loss without any major problems, assuming you use a metadata-journaling filesystem and don't run with a non-battery-backed write cache (but no database can survive that).

And having a million records is no problem at all. You may run into considerations when you're talking billions, but you can do that as well - it just takes a bit more knowledge before you can do it right.

//Magnus
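The durability Magnus describes rests on a couple of standard postgresql.conf settings; a minimal sketch of the relevant fragment (the values shown are the safe 8.0 defaults his point relies on):

```
# postgresql.conf -- durability-related settings (8.0)
fsync = true             # force WAL writes to disk at commit time;
                         # required to survive a power cut
wal_sync_method = fsync  # how WAL writes are flushed; the best choice
                         # is platform-dependent
```

With fsync enabled, committed transactions are recovered from the write-ahead log at restart, which is why a power failure loses no committed data unless the disk hardware itself lies about write completion (the non-battery-backed write-cache case).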
Re: [PERFORM] cost-based vacuum
On Mon, 2005-07-11 at 09:07 -0400, Ian Westmacott wrote:
> On Mon, 2005-07-11 at 07:31, Simon Riggs wrote:
> > The ANALYZE commands hold read locks on the tables you wish to
> > write to. If you slow them down, you merely slow down your write
> > transactions also, and then the read transactions that wait behind
> > them. Every time the ANALYZE sleeps it wakes up the other
> > transactions, which then realise they can't move because of locks
> > and then wake up the ANALYZEs for another shot. The end result is
> > that you introduce more context-switching, without any chance of
> > doing more useful work while the ANALYZEs sleep.
>
> Let me make sure I understand. ANALYZE acquires a read lock on the
> table, which it holds until the operation is complete (including any
> sleeps). That read lock blocks the extension of that table via COPY.
> Is that right?
>
> According to the 8.0 docs, ANALYZE acquires an ACCESS SHARE lock on
> the table, and that conflicts only with ACCESS EXCLUSIVE. That's why
> I didn't think I had a lock issue, since I think COPY only needs ROW
> EXCLUSIVE. Or perhaps the transaction needs something more?

The docs are correct, but don't show catalog and buffer locks.

...but on further reading of the code there are no catalog locks or buffer locks held across the sleep points. So my explanation doesn't work as an explanation for the sleep/no-sleep difference you have observed.

Best Regards, Simon Riggs
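The lock theory can be checked empirically by watching pg_locks while the ANALYZEs run. A sketch (the view exists in 8.0; this exact query is illustrative, not from the thread):

```sql
-- Hedged sketch: show which lock modes each backend holds or awaits
-- on the tables involved while analyze_thread is active.
SELECT l.pid, c.relname, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname IN ('table1', 'tableN')
ORDER BY c.relname, l.pid;
```

If ANALYZE really were blocking COPY, the writer's ROW EXCLUSIVE request would show up here with granted = false. Since ACCESS SHARE and ROW EXCLUSIVE do not conflict, Ian's reading of the lock-mode table is correct, consistent with Simon's conclusion.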
Re: [PERFORM] Question
As a sometime Informix and PostgreSQL DBA, I disagree with the contentions below. We have many tables with tens of millions of rows in Postgres. We have had (alas) power issues with our lab on more than one occasion, and the afflicted servers have recovered like a champ, every time.

This person may not like Postgres (or very much likes Informix), but he shouldn't conjure up spurious reasons to support his/her prejudice. Informix is an excellent product, but it can be costly for web-related applications. PostgreSQL is also an excellent database. Each has differences which may make the decision between the two of them clear. But facts are necessary to have a real discussion.

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alejandro Lemus
Sent: Monday, July 11, 2005 6:00 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Question

In the past week, one guy of Unix Group in Colombia say: Postgrest in production is bat, if the power off in any time the datas is lost why this datas is in plain files. Postgrest no ssupport data bases with more 1 millon of records. Wath tell me in this respect?, is more best Informix as say

Ing. Alejandro Lemus G.
Radio Taxi Aeropuerto S.A.
Avenida de las Américas # 51 - 39 Bogotá - Colombia
Tel: 571-4470694 / 571-4202600 Ext. 260
Fax: 571-2624070
email: [EMAIL PROTECTED]
[PERFORM] join and query planner
(First of all, sorry for my English.)

Hi.

- Does LEFT JOIN restrict the order in which the planner must join
  tables? I've read about JOIN, but I'm not sure about LEFT JOIN...
- If so: can I avoid this behavior? I mean, make the planner resolve
  the query using statistics (uniqueness, data distribution) rather
  than join order.

My query looks like:

  SELECT ...
  FROM a, b
  LEFT JOIN c ON (c.key = a.key)
  LEFT JOIN d ON (d.key = a.key)
  WHERE (a.key = b.key) AND (b.column = 100)

b.column has much better selectivity, but the planner insists on resolving c.key = a.key first. Of course, I could rewrite it as something like:

  SELECT ...
  FROM (SELECT ... FROM a, b
        LEFT JOIN d ON (d.key = a.key)
        WHERE (b.column = 100)) AS aa
  LEFT JOIN c ON (c.key = aa.key)

but this query is constructed by an application with a multicolumn filter. It's dynamic: a user could choose to look for c.column = 1000, and also combinations of filters. So I need the planner to choose the best plan...

I've already changed statistics, clustered the tables with CLUSTER, ran VACUUM ANALYZE, changed work_mem, shared_buffers...

Greetings. TIA.
Re: [PERFORM] join and query planner
Dario Pudlo wrote:
> (First of all, sorry for my English.)
> Hi.
> - Does LEFT JOIN restrict the order in which the planner must join
>   tables? I've read about JOIN, but I'm not sure about LEFT JOIN...
> - If so: can I avoid this behavior? I mean, make the planner resolve
>   the query using statistics (uniqueness, data distribution) rather
>   than join order.
>
> My query looks like:
>
>   SELECT ...
>   FROM a, b
>   LEFT JOIN c ON (c.key = a.key)
>   LEFT JOIN d ON (d.key = a.key)
>   WHERE (a.key = b.key) AND (b.column = 100)
>
> b.column has much better selectivity, but the planner insists on
> resolving c.key = a.key first. Of course, I could rewrite it as
> something like:
>
>   SELECT ...
>   FROM (SELECT ... FROM a, b
>         LEFT JOIN d ON (d.key = a.key)
>         WHERE (b.column = 100)) AS aa
>   LEFT JOIN c ON (c.key = aa.key)
>
> but this query is constructed by an application with a multicolumn
> filter. It's dynamic: a user could choose to look for c.column =
> 1000, and also combinations of filters. So I need the planner to
> choose the best plan...

Probably forcing the other join earlier could help:

  SELECT ...
  FROM a
  JOIN b ON (a.key = b.key)
  LEFT JOIN c ON (c.key = a.key)
  ...

I think the problem is that PostgreSQL can't break up explicit JOIN syntax very easily. But you can make the JOIN happen earlier.

John =:->

> I've already changed statistics, clustered the tables with CLUSTER,
> ran VACUUM ANALYZE, changed work_mem, shared_buffers...
>
> Greetings. TIA.
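John's point about explicit JOIN syntax can also be steered with a planner setting; a sketch (illustrative, not from the thread, reusing the thread's table names):

```sql
-- Hedged sketch: since 7.4 the planner will reorder explicit JOINs
-- only within join_collapse_limit items; raising it gives the planner
-- freedom over the inner joins, though LEFT JOINs themselves still
-- constrain the ordering in 8.0.
SHOW join_collapse_limit;      -- default is 8
SET join_collapse_limit = 16;
EXPLAIN ANALYZE
SELECT *
FROM a
JOIN b ON (a.key = b.key)
LEFT JOIN c ON (c.key = a.key)
LEFT JOIN d ON (d.key = a.key)
WHERE b.col = 100;             -- b.col stands in for the thread's b.column
```

Comparing the EXPLAIN ANALYZE plans before and after the SET shows whether the b filter is being applied before the outer joins.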
[PERFORM] Sorting on longer key is faster ?
The 2 queries are almost the same, but ORDER BY x||t is FASTER than ORDER BY x.. How can that be possible? Btw: x and x||t are same ordered.

phoeniks=> explain analyze SELECT * FROM test WHERE i < 20 ORDER BY x || t;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=2282.65..2284.92 rows=907 width=946) (actual time=74.982..79.114 rows=950 loops=1)
   Sort Key: (x || t)
   ->  Index Scan using i_i on test  (cost=0.00..2238.09 rows=907 width=946) (actual time=0.077..51.015 rows=950 loops=1)
         Index Cond: (i < 20)
 Total runtime: 85.944 ms
(5 rows)

phoeniks=> explain analyze SELECT * FROM test WHERE i < 20 ORDER BY x;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=2280.38..2282.65 rows=907 width=946) (actual time=175.431..179.239 rows=950 loops=1)
   Sort Key: x
   ->  Index Scan using i_i on test  (cost=0.00..2235.82 rows=907 width=946) (actual time=0.024..5.378 rows=950 loops=1)
         Index Cond: (i < 20)
 Total runtime: 183.317 ms
(5 rows)

phoeniks=> \d+ test
            Table "public.test"
 Column |  Type   | Modifiers | Description
--------+---------+-----------+-------------
 i      | integer |           |
 t      | text    |           |
 x      | text    |           |
Indexes:
    "i_i" btree (i)
    "x_i" btree (xpath_string(x, 'data'::text))
    "x_ii" btree (xpath_string(x, 'movie/characters/character'::text))
Has OIDs: no
Re: [PERFORM] Sorting on longer key is faster ?
Chris Travers wrote:
> John A Meinel wrote:
>> jobapply wrote:
>>> The 2 queries are almost the same, but ORDER BY x||t is FASTER than
>>> ORDER BY x.. How can that be possible? Btw: x and x||t are same
>>> ordered.
>>>
>>> phoeniks=> explain analyze SELECT * FROM test WHERE i < 20 ORDER BY x || t;
>>
>> What types are x and t? I have the feeling x || t is actually a
>> boolean, so it is only a True/False sort, while ORDER BY x has to do
>> some sort of string comparison (which might actually be a
>> locale-dependent comparison, and strcoll can be very slow in some
>> locales).
>
> Am I reading this wrong? I would think that x || t would mean
> concatenate x and t.

Sorry, I think you are right. I was getting my operators mixed up.

> This is interesting. I never thought of writing a multicolumn sort
> this way.

I'm also surprised that the sort is faster with a merge operation. Are you using UNICODE as the database format? I'm just wondering if it is doing something funny like casting it to an easier-to-sort type.

> Best Wishes,
> Chris Travers
> Metatron Technology Consulting

PS: Don't forget to Reply All so that your messages go back to the list.
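The strcoll/locale guess above is easy to check; a sketch (illustrative, not from the thread):

```sql
-- Hedged sketch: see whether string sorts go through locale-aware
-- collation and a multibyte encoding. Under the C locale, text
-- comparison is a plain byte-wise memcmp and much cheaper than
-- strcoll()-based comparison.
SHOW lc_collate;
SHOW server_encoding;
```

If lc_collate is anything other than C (or POSIX), every comparison in the ORDER BY x sort pays the locale-collation cost, which can dominate the runtime on wide text columns like these 946-byte rows.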
Re: [PERFORM] Sorting on longer key is faster ?
jobapply wrote:
> The 2 queries are almost the same, but ORDER BY x||t is FASTER than
> ORDER BY x.. How can that be possible? Btw: x and x||t are same
> ordered.
>
> phoeniks=> explain analyze SELECT * FROM test WHERE i < 20 ORDER BY x || t;

I also thought of another possibility. Are there a lot of similar entries in x? Meaning that the same value is repeated over and over? It is possible that the sort code has a weakness when sorting equal values.

For instance, if it were doing a hash aggregation, you would have the same hash repeated. (It isn't; I'm just mentioning a case where it might affect something.) If it is creating a tree representation, it might cause some sort of pathological worst-case behavior, where all entries keep adding to the same side of the tree, rather than being more balanced.

I don't know the internals of PostgreSQL sorting; these are just some ideas.

John =:->
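The duplicate-values hypothesis can be tested directly on the table; a sketch (illustrative, not from the thread):

```sql
-- Hedged sketch: measure how heavily duplicated x is. If a few values
-- dominate, the ORDER BY x sort repeatedly compares equal keys, while
-- ORDER BY x || t breaks those ties up front with the t suffix.
SELECT x, count(*) AS n
FROM test
GROUP BY x
ORDER BY n DESC
LIMIT 10;
```

A top value with a large share of the 950 rows would support the equal-keys explanation; an almost-unique x would rule it out and point back at the collation theory.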