Re: [PERFORM] Linux machine aggressively clearing cache
On 03/30/2012 05:51 PM, Josh Berkus wrote:
> So this turned out to be a Linux kernel issue. Will document it on
> www.databasesoup.com.

Anytime soon? About to build two PostgreSQL servers and wondering if you have uncovered a kernel version or similar issue to avoid.

Cheers,
Steve

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Random performance hit, unknown cause.
Hi all,

OS: Linux 64 bit 2.6.32
PostgreSQL 9.0.5 installed from Ubuntu packages
8 CPU cores
64 GB system memory
Database cluster is on a RAID 10 direct attached drive, using an HP P800 controller card.

I have a system that has been having occasional performance hits: the load on the system skyrockets, all queries take longer to execute, and a hot standby slave I have set up via streaming replication starts to fall behind. I'm having trouble pinpointing the exact issue.

This morning, during our nightly backup process (where we grab a copy of the data directory), we started having this same issue. The main thing I see in all of these episodes is high disk wait on the system. When we are performing 'well', the %wa from top is usually around 30%, and our load is around 12 - 15. This morning we saw a load of 21 - 23, and a %wa jumping between 60% and 75%. The top process at pretty much all times is the WAL sender process; is this normal?

From what I can tell, my access patterns on the database have not changed: the same average number of inserts, updates, and deletes, and nothing on the system has changed in any way. No abnormal autovacuum processes beyond those normally running.

So what can I do to track down the issue? Currently the system has returned to a 'good' state and performance looks great, but I would like to know how to prevent this, as well as be able to grab good stats if it does happen again in the future.

Has anyone had any issues with the HP P800 controller card in a Postgres environment? Is there anything that can help us maximise performance to disk in this case, as it seems to be one of our major bottlenecks? I do plan on moving pg_xlog to a separate drive down the road; the cluster is extremely active, so that will help out a ton.
Some I/O stats:

$ iostat -d -x 5 3
Device: rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
dev1      1.99   75.24  651.06  438.04  41668.57   8848.18     46.38      0.60    3.68   0.70  76.36
dev2      0.00    0.00  653.05  513.43  41668.57   8848.18     43.31      2.18    4.78   0.65  76.35

Device: rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
dev1      0.00   35.20  676.20  292.00  35105.60   5688.00     42.13     67.76   70.73   1.03 100.00
dev2      0.00    0.00  671.80  295.40  35273.60   4843.20     41.48     73.41   76.62   1.03 100.00

Device: rrqm/s  wrqm/s     r/s     w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
dev1      1.20   40.80  865.40  424.80  51355.20   8231.00     46.18     37.87   29.22   0.77  99.80
dev2      0.00    0.00  867.40  465.60  51041.60   8231.00     44.47     38.28   28.58   0.75  99.80

Thanks in advance,
Brian F
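For tracking this down the next time it happens, something like the following might help correlate the WAL sender activity with replication lag. This is a sketch for the 9.0 series described above (the pg_stat_activity column names changed in later releases, and pg_stat_replication only exists from 9.1 on, hence the manual xlog-location comparison):

```sql
-- On the primary: see what each backend (including the WAL sender) is doing
SELECT procpid, usename, waiting, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;

-- On the primary: the current WAL write position
SELECT pg_current_xlog_location();

-- On the standby: the last WAL positions received and replayed.
-- Comparing these against the primary's position shows how far
-- behind the hot standby has fallen.
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();
```

Sampling these alongside iostat during an episode would show whether the disk wait is driven by WAL generation (vacuum, bulk writes) or by the backup's sequential reads.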
Re: [PERFORM] Random performance hit, unknown cause.
On Thu, Apr 12, 2012 at 3:41 PM, Brian Fehrle bri...@consistentstate.com wrote:
> This morning, during our nightly backup process (where we grab a copy of
> the data directory), we started having this same issue. The main thing
> that I see in all of these is a high disk wait on the system. When we are
> performing 'well', the %wa from top is usually around 30%, and our load
> is around 12 - 15. This morning we saw a load 21 - 23, and an %wa jumping
> between 60% and 75%. The top process pretty much at all times is the WAL
> Sender Process, is this normal?

Sounds like vacuum to me.
Re: [PERFORM] Random performance hit, unknown cause.
Claudio Freire klaussfre...@gmail.com wrote:
> On Thu, Apr 12, 2012 at 3:41 PM, Brian Fehrle bri...@consistentstate.com wrote:
>> This morning, during our nightly backup process (where we grab a copy
>> of the data directory), we started having this same issue. The main
>> thing that I see in all of these is a high disk wait on the system.
>> [...] The top process pretty much at all times is the WAL Sender
>> Process, is this normal?
>
> Sounds like vacuum to me.

More particularly, it seems consistent with autovacuum finding a large number of tuples which had reached their freeze threshold. Rewriting the tuple in place with a frozen xmin is a WAL-logged operation.

-Kevin
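One way to check whether an anti-wraparound freeze pass is plausible here (a sketch; the queries only read catalog views, so they are safe to run on the live system):

```sql
-- Tables whose oldest unfrozen transaction ID is approaching
-- autovacuum_freeze_max_age (default 200 million) will be picked up
-- by autovacuum for a freeze pass, which rewrites and WAL-logs tuples.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;

-- The threshold to compare against
SHOW autovacuum_freeze_max_age;
```

If the busiest tables show xid_age near the threshold right before an episode, freezing is likely the source of the extra WAL volume the sender has to ship.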
Re: [PERFORM] Random performance hit, unknown cause.
On Thu, Apr 12, 2012 at 3:41 PM, Brian Fehrle bri...@consistentstate.com wrote:
> Is there anything that can help us maximise the performance to disk in
> this case, as it seems to be one of our major bottlenecks?

If it's indeed autovacuum, like I think it is, you can try limiting it with pg's autovacuum_cost_delay params.
Re: [PERFORM] Random performance hit, unknown cause.
Interesting, that is very likely. In this system I have a table that is extremely active. On a 'normal' day, the autovacuum process takes about 7 hours to complete on this table, and once it's complete, the system performs an autoanalyze on the table, finding that we have millions of new dead rows. Once this happens, it kicks off the autovacuum again, so we basically always have a vacuum running on this table at any given time.

If I were to tweak the autovacuum_vacuum_cost_delay parameter, what would that be doing? Would it be limiting what the current autovacuum is allowed to do, or does it simply space out the time between autovacuum runs? In my case, with 7-hour-long autovacuums (sometimes 14 hours), a few milliseconds between each vacuum wouldn't mean anything to me.

If that parameter does limit the amount of work autovacuum can do, it may cause the system to perform better at the time, but it would prolong the length of the autovacuum, right? That's already an issue I'm dealing with, and I wouldn't want to make the autovacuum any longer if I don't need to.

- Brian F

On 04/12/2012 01:52 PM, Kevin Grittner wrote:
> Claudio Freire klaussfre...@gmail.com wrote:
>> Sounds like vacuum to me.
>
> More particularly, it seems consistent with autovacuum finding a
> large number of tuples which had reached their freeze threshold.
> Rewriting the tuple in place with a frozen xmin is a WAL-logged
> operation.
>
> -Kevin
Re: [PERFORM] Random performance hit, unknown cause.
Brian Fehrle bri...@consistentstate.com wrote:
> In this system I have a table that is extremely active. On a 'normal'
> day, the autovacuum process takes about 7 hours to complete on this
> table, and once it's complete, the system performs an autoanalyze on
> the table, finding that we have millions of new dead rows. Once this
> happens, it kicks off the autovacuum again, so we basically always
> have a vacuum running on this table at any given time.
>
> If I were to tweak the autovacuum_vacuum_cost_delay parameter, what
> would that be doing?

That controls how long an autovacuum worker naps after it has done enough work to hit the autovacuum_cost_limit. As tuning knobs go, this one is pretty coarse.

> Would it be limiting what the current autovacuum is allowed to do?

No, just how fast it does it.

> Or does it simply space out the time between autovacuum runs?

Not that either; it's part of pacing the work of a run.

> In my case, with 7 hour long autovacuums (sometimes 14 hours), a few
> milliseconds between each vacuum wouldn't mean anything to me.

Generally, I find that the best way to tune it is to pick 10ms to 20ms for autovacuum_cost_delay, and adjust autovacuum_cost_limit to tune from there. A small change in the former can cause a huge change in pacing; the latter is better for fine-tuning.

> It may cause the system to perform better at that time, but would
> prolong the length of the autovacuum right?

Right.

-Kevin
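As a concrete sketch of the tuning approach described above (the numbers are illustrative starting points, not recommendations, and the table name is hypothetical):

```sql
-- In postgresql.conf, globally:
--   autovacuum_vacuum_cost_delay = 10ms
--   autovacuum_vacuum_cost_limit = 400

-- Or per table, via storage parameters, so only the hot table is
-- paced differently (delay is in milliseconds):
ALTER TABLE busy_table SET (
    autovacuum_vacuum_cost_delay = 10,
    autovacuum_vacuum_cost_limit = 400
);
```

Per-table settings are handy here since only one extremely active table is involved; raising its cost limit lets its vacuums finish sooner, while the small fixed delay keeps them from saturating the disks.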
[PERFORM] Slow fulltext query plan
Hi,

I would like to understand why the following query execution doesn't use any fulltext indexes and takes more than 300 s (using lots of temporary files):

EXPLAIN ANALYZE SELECT hierarchy.id
  FROM hierarchy
  JOIN fulltext ON fulltext.id = hierarchy.id,
  TO_TSQUERY('whatever') query1,
  TO_TSQUERY('whatever') query2
  WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext))
     OR (query2 @@ nx_to_tsvector(fulltext.fulltext_title));

The query plan is here: http://explain.depesz.com/s/YgP

While if I replace query2 by query1 in the second clause:

EXPLAIN ANALYZE SELECT hierarchy.id
  FROM hierarchy
  JOIN fulltext ON fulltext.id = hierarchy.id,
  TO_TSQUERY('whatever') query1,
  TO_TSQUERY('whatever') query2
  WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext))
     OR (query1 @@ nx_to_tsvector(fulltext.fulltext_title));

it is five orders of magnitude faster (15 ms), using the GIN indexes: http://explain.depesz.com/s/RLa

nx_to_tsvector is an immutable function with the following code:

SELECT TO_TSVECTOR('english', SUBSTR($1, 1, 25))

Here is the list of indexes:

hierarchy: hierarchy_pk PRIMARY KEY, btree (id)
fulltext: fulltext_fulltext_idx gin (nx_to_tsvector(fulltext::character varying))
fulltext: fulltext_fulltext_title_idx gin (nx_to_tsvector(fulltext_title::character varying))

fulltext and fulltext_title are of type text.

And some PostgreSQL configuration:

PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu
shared_buffers: 4GB
effective_cache_size: 10GB
work_mem: 20MB

Thanks for your work and enlightenment,
ben
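For reference, the expression-index setup described above would look roughly like this. This is a sketch reconstructed from the fragments in the post; the argument type and LANGUAGE clause of the function are assumptions, since only its body was given:

```sql
-- The wrapper must be IMMUTABLE so it can be used in an expression index
CREATE OR REPLACE FUNCTION nx_to_tsvector(character varying)
RETURNS tsvector AS $$
  SELECT TO_TSVECTOR('english', SUBSTR($1, 1, 25))
$$ LANGUAGE sql IMMUTABLE;

-- GIN indexes on the function's result; a query only matches these
-- indexes if it applies nx_to_tsvector() with the same cast
CREATE INDEX fulltext_fulltext_idx
  ON fulltext USING gin (nx_to_tsvector(fulltext::character varying));
CREATE INDEX fulltext_fulltext_title_idx
  ON fulltext USING gin (nx_to_tsvector(fulltext_title::character varying));
```

Note that SUBSTR($1, 1, 25) means only the first 25 characters of each document are ever indexed, which may or may not be intentional.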
Re: [PERFORM] Slow fulltext query plan
On 13/04/2012 00:25, Tom Lane wrote:
> Benoit Delbosc bdelb...@nuxeo.com writes:
>> EXPLAIN ANALYZE SELECT hierarchy.id
>>   FROM hierarchy
>>   JOIN fulltext ON fulltext.id = hierarchy.id,
>>   TO_TSQUERY('whatever') query1,
>>   TO_TSQUERY('whatever') query2
>>   WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext))
>>      OR (query2 @@ nx_to_tsvector(fulltext.fulltext_title));
>
> Is there a reason why you're writing the query in such a
> non-straightforward way, rather than just
>
> EXPLAIN ANALYZE SELECT hierarchy.id
>   FROM hierarchy
>   JOIN fulltext ON fulltext.id = hierarchy.id
>   WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext))
>      OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title));
>
> ?

This query is written by a framework, and I also thought that this is a common pattern that can be found in the documentation:
http://www.postgresql.org/docs/9.1/interactive/textsearch-controls.html

If you think this is the wrong way to do it, then I will try to fix the framework. By the way, your version takes 15 ms :)

Thanks,
ben