Re: [PERFORM] Linux machine aggressively clearing cache

2012-04-12 Thread Steve Crawford

On 03/30/2012 05:51 PM, Josh Berkus wrote:


So this turned out to be a Linux kernel issue.  Will document it on
www.databasesoup.com.
Anytime soon? About to build two PostgreSQL servers and wondering if you 
have uncovered a kernel version or similar issue to avoid.


Cheers,
Steve


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Random performance hit, unknown cause.

2012-04-12 Thread Brian Fehrle

Hi all,

OS: Linux 64 bit 2.6.32
PostgreSQL 9.0.5 installed from Ubuntu packages.
8 CPU cores
64 GB system memory
Database cluster is on raid 10 direct attached drive, using a HP p800 
controller card.



I have a system that has been having occasional performance hits, where 
the load on the system skyrockets, all queries take longer to execute 
and a hot standby slave I have set up via streaming replication starts 
to get behind. I'm having trouble pinpointing where the exact issue is.


This morning, during our nightly backup process (where we grab a copy of 
the data directory), we started having this same issue. The main thing 
that I see in all of these is a high disk wait on the system. When we 
are performing 'well', the %wa from top is usually around 30%, and our 
load is around 12 - 15. This morning we saw a load  21 - 23, and an %wa 
jumping between 60% and 75%.


The top process pretty much at all times is the WAL Sender Process, is 
this normal?


From what I can tell, my access patterns on the database has not 
changed, same average number of inserts, updates, deletes, and had 
nothing on the system changed in any way. No abnormal autovacuum 
processes that aren't normally already running.


So what things can I do to track down what an issue is? Currently the 
system has returned to a 'good' state, and performance looks great. But 
I would like to know how to prevent this, as well as be able to grab 
good stats if it does happen again in the future.


Has anyone had any issues with the HP p800 controller card in a postgres 
environment? Is there anything that can help us maximise the performance 
to disk in this case, as it seems to be one of our major bottlenecks? I 
do plan on moving the pg_xlog to a separate drive down the road, the 
cluster is extremely active so that will help out a ton.


some IO stats:

$ iostat -d -x 5 3
Device:rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s 
avgrq-sz avgqu-sz   await  svctm  %util
dev11.9975.24  651.06  438.04 41668.57  8848.18
46.38 0.603.68   0.70  76.36
dev20.00 0.00  653.05  513.43 41668.57  8848.18
43.31 2.184.78   0.65  76.35


Device:rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s 
avgrq-sz avgqu-sz   await  svctm  %util
dev10.0035.20  676.20  292.00 35105.60  5688.00
42.1367.76   70.73   1.03 100.00
dev20.00 0.00  671.80  295.40 35273.60  4843.20
41.4873.41   76.62   1.03 100.00


Device:rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s 
avgrq-sz avgqu-sz   await  svctm  %util
dev11.2040.80  865.40  424.80 51355.20  8231.00
46.1837.87   29.22   0.77  99.80
dev20.00 0.00  867.40  465.60 51041.60  8231.00
44.4738.28   28.58   0.75  99.80


Thanks in advance,
Brian F


Re: [PERFORM] Random performance hit, unknown cause.

2012-04-12 Thread Claudio Freire
On Thu, Apr 12, 2012 at 3:41 PM, Brian Fehrle
bri...@consistentstate.com wrote:
 This morning, during our nightly backup process (where we grab a copy of the
 data directory), we started having this same issue. The main thing that I
 see in all of these is a high disk wait on the system. When we are
 performing 'well', the %wa from top is usually around 30%, and our load is
 around 12 - 15. This morning we saw a load  21 - 23, and an %wa jumping
 between 60% and 75%.

 The top process pretty much at all times is the WAL Sender Process, is this
 normal?

Sounds like vacuum to me.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random performance hit, unknown cause.

2012-04-12 Thread Kevin Grittner
Claudio Freire klaussfre...@gmail.com wrote:
 On Thu, Apr 12, 2012 at 3:41 PM, Brian Fehrle
 bri...@consistentstate.com wrote:
 This morning, during our nightly backup process (where we grab a
 copy of the data directory), we started having this same issue.
 The main thing that I see in all of these is a high disk wait on
 the system. When we are performing 'well', the %wa from top is
 usually around 30%, and our load is around 12 - 15. This morning
 we saw a load  21 - 23, and an %wa jumping between 60% and 75%.

 The top process pretty much at all times is the WAL Sender
 Process, is this normal?
 
 Sounds like vacuum to me.
 
More particularly, it seems consistent with autovacuum finding a
large number of tuples which had reached their freeze threshold. 
Rewriting the tuple in place with a frozen xmin is a WAL-logged
operation.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random performance hit, unknown cause.

2012-04-12 Thread Claudio Freire
On Thu, Apr 12, 2012 at 3:41 PM, Brian Fehrle
bri...@consistentstate.com wrote:
 Is there anything that can help us maximise the performance to disk in this
 case, as it seems to be one of our major bottlenecks?

If it's indeed autovacuum, like I think it is, you can try limiting it
with pg's autovacuum_cost_delay params.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random performance hit, unknown cause.

2012-04-12 Thread Brian Fehrle

Interesting, that is very likely.

In this system I have a table that is extremely active. On a 'normal' 
day, the autovacuum process takes about 7 hours to complete on this 
table, and once it's complete, the system performs an autoanalyze on the 
table, finding that we have millions of new dead rows. Once this 
happens, it kicks off the autovacuum again, so we basically always have 
a vacuum running on this table at any given time.


If I were to tweak the autovacuum_vacuum_cost_delay parameter, what 
would that be doing? Would it be limiting what the current autovacuum is 
allowed to do? Or does it simply space out the time between autovacuum 
runs? In my case, with 7 hour long autovacuums (sometimes 14 hours), a 
few milliseconds between each vacuum wouldn't mean anything to me.


If that parameter does limit the amount of work autovacuum can do, It 
may cause the system to perform better at that time, but would prolong 
the length of the autovacuum right? That's an issue I'm already having 
issue with, and wouldn't want to make the autovacuum any longer if I 
don't need to.


- Brian F


On 04/12/2012 01:52 PM, Kevin Grittner wrote:

Claudio Freireklaussfre...@gmail.com  wrote:

On Thu, Apr 12, 2012 at 3:41 PM, Brian Fehrle
bri...@consistentstate.com  wrote:

This morning, during our nightly backup process (where we grab a
copy of the data directory), we started having this same issue.
The main thing that I see in all of these is a high disk wait on
the system. When we are performing 'well', the %wa from top is
usually around 30%, and our load is around 12 - 15. This morning
we saw a load  21 - 23, and an %wa jumping between 60% and 75%.

The top process pretty much at all times is the WAL Sender
Process, is this normal?

Sounds like vacuum to me.


More particularly, it seems consistent with autovacuum finding a
large number of tuples which had reached their freeze threshold.
Rewriting the tuple in place with a frozen xmin is a WAL-logged
operation.

-Kevin



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random performance hit, unknown cause.

2012-04-12 Thread Kevin Grittner
Brian Fehrle bri...@consistentstate.com wrote:
 
 In this system I have a table that is extremely active. On a
 'normal' day, the autovacuum process takes about 7 hours to
 complete on this table, and once it's complete, the system
 performs an autoanalyze on the table, finding that we have
 millions of new dead rows. Once this happens, it kicks off the
 autovacuum again, so we basically always have a vacuum running on
 this table at any given time.
 
 If I were to tweak the autovacuum_vacuum_cost_delay parameter,
 what would that be doing?
 
That controls how long an autovacuum worker naps after it has done
enough work to hit the autovacuum_cost_limit.  As tuning knobs go,
this one is pretty coarse.
 
 Would it be limiting what the current autovacuum is allowed to do?
 
No, just how fast it does it.
 
 Or does it simply space out the time between autovacuum runs?
 
Not that either; it's part of pacing the work of a run.
 
 In my case, with 7 hour long autovacuums (sometimes 14 hours), a 
 few milliseconds between each vacuum wouldn't mean anything to me.
 
Generally, I find that the best way to tune it is to pick 10ms to
20ms for autovacuum_cost_delay, and adjust adjust
autovacuum_cost_limit to tune from there.  A small change in the
former can cause a huge change in pacing; the latter is better for
fine-tuning.
 
 It may cause the system to perform better at that time, but would
 prolong the length of the autovacuum right?
 
Right.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow fulltext query plan

2012-04-12 Thread Benoit Delbosc

Hi,

I would like to understand why the following query execution don't use 
any fulltext indexes

and takes more than 300s (using lot of temporary files):

  EXPLAIN ANALYZE SELECT hierarchy.id
  FROM hierarchy
  JOIN fulltext ON fulltext.id = hierarchy.id,
  TO_TSQUERY('whatever') query1,
  TO_TSQUERY('whatever') query2
  WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@ 
nx_to_tsvector(fulltext.fulltext_title));


The query plan is here:
  http://explain.depesz.com/s/YgP

While if I replace the query2 by query1 in the second clause:

  EXPLAIN ANALYZE SELECT hierarchy.id
  FROM hierarchy
  JOIN fulltext ON fulltext.id = hierarchy.id,
  TO_TSQUERY('whatever') query1,
  TO_TSQUERY('whatever') query2
  WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query1 @@ 
nx_to_tsvector(fulltext.fulltext_title));


It is 5 order of magniude faster (15ms) using the gin indexes:
  http://explain.depesz.com/s/RLa

The nx_to_tsvector is an immutable function with the following code:
  SELECT TO_TSVECTOR('english', SUBSTR($1, 1, 25))

Here is the list of indexes:
  hierarchy: hierarchy_pk PRIMARY KEY, btree (id)
  fulltext: fulltext_fulltext_idx gin 
(nx_to_tsvector(fulltext::character varying))
  fulltext: fulltext_fulltext_title_idx gin 
(nx_to_tsvector(fulltext_title::character varying))


fulltext and fulltext_title are text type.

And some PostgreSQL configuration:
  PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu
  shared_buffers: 4GB
  effective_cache_size: 10GB
  work_mem: 20MB

Thanks for your work and enlightenment

ben

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow fulltext query plan

2012-04-12 Thread Benoit Delbosc

On 13/04/2012 00:25, Tom Lane wrote:

Benoit Delboscbdelb...@nuxeo.com  writes:

EXPLAIN ANALYZE SELECT hierarchy.id
FROM hierarchy
JOIN fulltext ON fulltext.id = hierarchy.id,
TO_TSQUERY('whatever') query1,
TO_TSQUERY('whatever') query2
WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@
nx_to_tsvector(fulltext.fulltext_title));

Is there a reason why you're writing the query in such a
non-straightforward way, rather than just

EXPLAIN ANALYZE SELECT hierarchy.id
FROM hierarchy
JOIN fulltext ON fulltext.id = hierarchy.id
WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext))
   OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title));

?

This query is written by a framework, also I thought that is a common 
pattern that can be found in the documentation:


  http://www.postgresql.org/docs/9.1/interactive/textsearch-controls.html

if you think this a wrong way to do it then I will try to fix the framework.

btw your version takes 15ms :)

Thanks

ben

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance