subject:"\[PERFORM\] Really really slow select count\(\*\)"

Maciek Sakrejda msakre...@truviso.com wrote:
 Well it said Failed to shutdown ...  and then
 returned control. and then proceeded to run for about an hour.
 I'm not sure how graceful that is.

 Ah, but that was just the control script that sends the database
 the command to shut down. The 'graceful' part, is that the
 database is being nice to everyone trying to do things with the
 data inside.

 The control script has a timeout. So it'll send the command, wait
 a few seconds to see if the database responds, and then gives up.
 
 For what it's worth, I think that's the not-so-graceful part. The
 control script gives up, but the actual shutdown still occurs
 eventually, after all current connections have ended. I think most
 users will take pg_ctl at its word, and assume Failed to
 shutdown means I couldn't shut down with this command, maybe you
 should try something else, and not I couldn't shut down right
 now, although I'll get to it as soon as everyone disconnects..
 
Yeah, current behavior with that shutdown option is the opposite of
smart for any production environment I've seen.  (I can see where it
would be handy in development, though.)  What's best in production
is the equivalent of the fast option with escalation to immediate if
necessary to ensure shutdown within the time limit.
 
In my world, telling PostgreSQL to shut down PostgreSQL is most
often because in a few minutes someone is going to pull the plug to
move the server, an electrician is going to flip the circuit off to
do some wiring, or (in one recent event) the building is on fire and
the fire department is about to cut electrical power.  In such
situations, patiently waiting for a long-running query to complete
is a Very Bad Idea, much less waiting for a connection pool to cycle
all connections out.  Telling the user that the shutdown failed,
when what is really happening is that it will block new connections
and keep waiting around indefinitely, with an actual shutdown at
some ill-defined future moment is adding insult to injury.
 
In my view, anyway
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-08 Thread Marti Raudsepp

On Tue, Feb 8, 2011 at 18:36, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Yeah, current behavior with that shutdown option is the opposite of
 smart for any production environment I've seen.  (I can see where it
 would be handy in development, though.)  What's best in production
 is the equivalent of the fast option with escalation to immediate if
 necessary to ensure shutdown within the time limit.

+1, we should call it dumb :)

Not accepting new connections with the database system is shutting
down makes it even worse -- it means you can't log in to the server
to inspect who's querying it or call pg_terminate_backend() on them.

I couldn't find any past discussions about changing the default to fast.
Are there any reasons why that cannot be done in a future release?

Regards,
Marti

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-08 Thread Scott Marlowe

On Tue, Feb 8, 2011 at 9:50 AM, Marti Raudsepp ma...@juffo.org wrote:
 On Tue, Feb 8, 2011 at 18:36, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Yeah, current behavior with that shutdown option is the opposite of
 smart for any production environment I've seen.  (I can see where it
 would be handy in development, though.)  What's best in production
 is the equivalent of the fast option with escalation to immediate if
 necessary to ensure shutdown within the time limit.

 +1, we should call it dumb :)

 Not accepting new connections with the database system is shutting
 down makes it even worse -- it means you can't log in to the server
 to inspect who's querying it or call pg_terminate_backend() on them.

 I couldn't find any past discussions about changing the default to fast.
 Are there any reasons why that cannot be done in a future release?

Or at least throw a hint the user's way that -m fast might be needed.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-08 Thread Maciek Sakrejda

 I couldn't find any past discussions about changing the default to fast.
 Are there any reasons why that cannot be done in a future release?

 Or at least throw a hint the user's way that -m fast might be needed.

I think there are several issues here:

1. Does pg_ctl give a clear indication of the outcome of a failed
smart mode shutdown?
2. Is the current smart shutdown mode behavior useful?
3. Should the default shutdown mode be changed to fast?

I think felix mainly complained about (1), and that's what I was
talking about as well. The current message (I have only an 8.3 handy,
but I don't imagine this has changed much) is:

pg_ctl stop -t5
waiting for server to shut down failed
pg_ctl: server does not shut down

This leaves out crucial information (namely, but it will stop
accepting new connections and shut down when all current connections
are closed). It seems like something along those lines should be
added to the error message, or perhaps at least to pg_ctl
documentation. Currently, the docs page (
http://www.postgresql.org/docs/current/static/app-pg-ctl.html ) only
hints at this, and pg_ctl --help does not really mention this at all.

Of the two other issues, (3) seems reasonable (I have no strong
feelings there either way), and (2) is probably a moot point (the
behavior won't change in a backward-incompatible manner now, and if
it's dethroned as default, that doesn't really matter).

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

Marti Raudsepp ma...@juffo.org wrote:
 
 I couldn't find any past discussions about changing the default to
 fast.
 
It's not entirely unrelated to the Linux LSB init script in August
and September of 1009:
 
http://archives.postgresql.org/pgsql-hackers/2009-08/msg01843.php
 
http://archives.postgresql.org/pgsql-hackers/2009-09/msg01963.php
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-08 Thread Greg Smith


Marti Raudsepp wrote:

I couldn't find any past discussions about changing the default to fast.
Are there any reasons why that cannot be done in a future release?
  


Well, it won't actually help as much as you might think.  It's possible 
for clients to be in a state where fast shutdown doesn't work, either.  
You either have to kill them manually or use an immediate shutdown.


Kevin and I both suggested a fast plus timeout then immediate behavior 
is what many users seem to want.  My comments were at 
http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for 
an example of how fast shutdown can fail see 
http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-08 Thread Marti Raudsepp

On Tue, Feb 8, 2011 at 22:09, Greg Smith g...@2ndquadrant.com wrote:
 Kevin and I both suggested a fast plus timeout then immediate behavior is
 what many users seem to want.  My comments were at
 http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for an
 example of how fast shutdown can fail see
 http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php

True, I've hit that a few times too.

Seems that a better solution would be implementing a new -m option
that does this transparently?

Regards,
Marti

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

Marti Raudsepp ma...@juffo.org wrote:
 Greg Smith g...@2ndquadrant.com wrote:
 Kevin and I both suggested a fast plus timeout then immediate
 behavior is what many users seem to want.
 
 Seems that a better solution would be implementing a new -m option
 that does this transparently?
 
Maybe.  Another option might be to use -t or some new switch (or -t
in combination with some new switch) as a time limit before
escalating to the next shutdown mode.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-08 Thread Scott Marlowe

On Tue, Feb 8, 2011 at 1:09 PM, Greg Smith g...@2ndquadrant.com wrote:
 Marti Raudsepp wrote:

 I couldn't find any past discussions about changing the default to fast.
 Are there any reasons why that cannot be done in a future release?

 Kevin and I both suggested a fast plus timeout then immediate behavior is
 what many users seem to want.  My comments were at
 http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for an
 example of how fast shutdown can fail see
 http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php

Are there any settings in postgresql.conf that would make it unsafe to
use -m immediate?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

Scott Marlowe scott.marl...@gmail.com wrote:
 Greg Smith g...@2ndquadrant.com wrote:
 
 Kevin and I both suggested a fast plus timeout then immediate
 behavior is what many users seem to want.
 
 Are there any settings in postgresql.conf that would make it
 unsafe to use -m immediate?
 
I don't think so.  There could definitely be problems if someone
cuts power before your shutdown completes, though.  (I hear that
those firefighters like to cut power to a building before they grab
those big brass nozzles to spray a stream of water into a building. 
Go figure...)
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-08 Thread Greg Smith


Scott Marlowe wrote:

Are there any settings in postgresql.conf that would make it unsafe to
use -m immediate?
  


Two concerns:

-Clients will be killed without any review, and data related to them lost

-The server will have to go through recovery to start back up again, 
which could potentially take a long time.  If you manage a successful 
shutdown that doesn't happen.


Shouldn't be unsafe, just has those issues.

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-08 Thread Scott Marlowe

On Tue, Feb 8, 2011 at 3:08 PM, Greg Smith g...@2ndquadrant.com wrote:
 Scott Marlowe wrote:

 Are there any settings in postgresql.conf that would make it unsafe to
 use -m immediate?


 Two concerns:

 -Clients will be killed without any review, and data related to them lost

 -The server will have to go through recovery to start back up again, which
 could potentially take a long time.  If you manage a successful shutdown
 that doesn't happen.

 Shouldn't be unsafe, just has those issues.

Good, I was kinda worried about full_page_writes being off or fsync or
something like that being a problem.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-07 Thread Marti Raudsepp

On Mon, Feb 7, 2011 at 05:03, Craig Ringer cr...@postnewspapers.com.au wrote:
 What would possibly help would be if Pg could fall back to lower
 shared_buffers automatically, screaming about it in the logs but still
 launching. OTOH, many people don't check the logs, so they'd think their
 new setting had taken effect and it hadn't - you've traded one usability
 problem for another. Even if Pg issued WARNING messages to each client
 that connected, lots of (non-psql) clients don't display them, so many
 users would never know.

 Do you have a suggestion about how to do this better? The current
 approach is known to be rather unlovely, but nobody's come up with a
 better one that works reasonably and doesn't trample on other System V
 shared memory users that may exist on the system.

We could do something similar to what Apache does -- provide distros
with a binary to check the configuration file in advance. This check
program is launched before the restart command, and if it fails, the
server is not restarted.

Regards,
Marti

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-07 Thread felix

+1

this is exactly what I was looking for at the time:  a -t (configtest)
option to pg_ctl

and I think it should fall back to lower shared buffers and log it.

SHOW ALL; would show the used value



On Mon, Feb 7, 2011 at 11:30 AM, Marti Raudsepp ma...@juffo.org wrote:

 On Mon, Feb 7, 2011 at 05:03, Craig Ringer cr...@postnewspapers.com.au
 wrote:
  What would possibly help would be if Pg could fall back to lower
  shared_buffers automatically, screaming about it in the logs but still
  launching. OTOH, many people don't check the logs, so they'd think their
  new setting had taken effect and it hadn't - you've traded one usability
  problem for another. Even if Pg issued WARNING messages to each client
  that connected, lots of (non-psql) clients don't display them, so many
  users would never know.
 
  Do you have a suggestion about how to do this better? The current
  approach is known to be rather unlovely, but nobody's come up with a
  better one that works reasonably and doesn't trample on other System V
  shared memory users that may exist on the system.

 We could do something similar to what Apache does -- provide distros
 with a binary to check the configuration file in advance. This check
 program is launched before the restart command, and if it fails, the
 server is not restarted.

 Regards,
 Marti

Re: [PERFORM] Really really slow select count(*)

2011-02-07 Thread Scott Marlowe

On Mon, Feb 7, 2011 at 8:05 AM, felix crucialfe...@gmail.com wrote:
 +1
 this is exactly what I was looking for at the time:  a -t (configtest)
 option to pg_ctl
 and I think it should fall back to lower shared buffers and log it.
 SHOW ALL; would show the used value

however, much like apache, this might not have gotten caught.  In
order to catch it we'd have to see how much shared mem was available,
and I think you have to actually allocate it to find out if you can.
Since pg is already running, allocating shared_buffers / fsm twice
might fail when allocating it once would succeed.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-07 Thread Greg Smith


Craig Ringer wrote:

What would possibly help would be if Pg could fall back to lower
shared_buffers automatically, screaming about it in the logs but still
launching.


This is exactly what initdb does when it produces an initial setting for 
shared_buffers that goes into the postgresql.conf file.  It wouldn't be 
hard to move that same logic into a loop that executed when startup 
failed to allocated enough memory.


There are two problems here, one almost solved, the other more 
philosphical.  It used to be that max_fsm_pages and wal_buffers could be 
large enough components to the allocation that reducing them might 
actually be a necessary fix, too.  With the removal of the former and a 
method to automatically set the latter now available, the remaining 
components to the shared memory sizing computation are probably possible 
to try and fix automatically if the kernel limits are too low.


But it's unclear whether running in a degraded mode, where performance 
might be terrible, with only a log message is preferrable to stopping 
and forcing the DBA's attention toward the mistake that was made 
immediately.  Log files get rotated out, and it's not hard to imagine 
this problem coming to haunt someone only a month or two later--by which 
time the change to shared_buffers is long forgotten, and the log message 
complaining about it lost too.  Accordingly I would expect any serious 
attempt to add some auto-reduction behavior to be beset with argument, 
and I'd never consider writing such a thing as a result.  Too many 
non-controversial things I could work on instead.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-07 Thread Craig Ringer


On 02/07/2011 06:30 PM, Marti Raudsepp wrote:

On Mon, Feb 7, 2011 at 05:03, Craig Ringercr...@postnewspapers.com.au  wrote:

What would possibly help would be if Pg could fall back to lower
shared_buffers automatically, screaming about it in the logs but still
launching. OTOH, many people don't check the logs, so they'd think their
new setting had taken effect and it hadn't - you've traded one usability
problem for another. Even if Pg issued WARNING messages to each client
that connected, lots of (non-psql) clients don't display them, so many
users would never know.

Do you have a suggestion about how to do this better? The current
approach is known to be rather unlovely, but nobody's come up with a
better one that works reasonably and doesn't trample on other System V
shared memory users that may exist on the system.


We could do something similar to what Apache does -- provide distros
with a binary to check the configuration file in advance. This check
program is launched before the restart command, and if it fails, the
server is not restarted.


That would work for config file errors (and would probably be a good 
idea) but won't help with bad shared memory configuration. When Pg is 
already running, it's usually not possible for a test program to claim 
the amount of shared memory the config file says to allocate, because Pg 
is already using it. Nonetheless, Pg will work fine when restarted.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-07 Thread Craig Ringer


On 02/08/2011 03:05 AM, Greg Smith wrote:


Accordingly I would expect any serious
attempt to add some auto-reduction behavior to be beset with argument,
and I'd never consider writing such a thing as a result. Too many
non-controversial things I could work on instead.


Yep. I expressed my own doubts in the post I suggested that in.

If Pg did auto-correct down, it'd be necessary to scream about it 
angrily and continuously, not just once during startup. Given that it's 
clear many people never even look at the logs (what logs? where are 
they?) I think Pg would also have to send notices to the client. 
Problem is, many clients don't process notices/warnings, so particularly 
slack admins won't see that either.


I'm not particularly excited about the idea.

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-07 Thread felix

On Mon, Feb 7, 2011 at 6:05 AM, Shaun Thomas stho...@peak6.com wrote:


 That’s one of the things I talked about. To be safe, PG will start to shut
 down but disallow new connections, and **that’s all**. Old connections are
 grandfathered in until they disconnect, and when they all go away, it shuts
 down gracefully.



Well it said Failed to shutdown ...  and then returned
control.
and then proceeded to run for about an hour.

I'm not sure how graceful that is.

I generally take programs at their word.  Failed is clearly past tense.



So far as your Django install, have you activated the memcache contrib.
 module? Your pages should be lazy-caching and rarely depend on the DB, if
 they can.


yes thanks my web app is very finely tuned and is working splendidly.
I've been working on very large sites sites since 1998 and this client has
been with me for 10 years already.  its a fairly high traffic site.

I've only been using postgres since we migrated in May

but it is one particular table on postgres that has shit the sock drawer.




 You should also rarely be doing count(*) on a 300k row table, even if
 everything is cached and speedy.


I'm not

this is a test query that is obviously way out of bounds for acceptable
response.

there is something very very wrong with this table and I need to solve it
ASAP.
other tables that have less updates but similar sizes are not having this
problem.

there are foreign keys pointing to this table so its a bit tricky to just
refill it, but I can think of one way.  I'll have to do that.

its only conjecture that the issue is file space bloat or free map problems.
 those are overall issues that I will get to as soon as I can. but this is
table specific.


 That’s an application design issue you need to address before it’s too
 late, or you have to rush and implement a hasty fix.


it is not an application design issue, though there are always improvements
being made.

Being a DBA sucks sometimes. J


I am not a DBA, I'm just trying to query a 300k row table.

though I am happy to learn more. I know an awful lot about a lot of things.
 but you can't specialize in everything

Re: [PERFORM] Really really slow select count(*)

2011-02-07 Thread Scott Marlowe

On Mon, Feb 7, 2011 at 8:17 PM, felix crucialfe...@gmail.com wrote:

 On Mon, Feb 7, 2011 at 6:05 AM, Shaun Thomas stho...@peak6.com wrote:

 That’s one of the things I talked about. To be safe, PG will start to shut
 down but disallow new connections, and *that’s all*. Old connections are
 grandfathered in until they disconnect, and when they all go away, it shuts
 down gracefully.

 Well it said Failed to shutdown ...  and then returned
 control.
 and then proceeded to run for about an hour.
 I'm not sure how graceful that is.
 I generally take programs at their word.  Failed is clearly past tense.

I agree that here what pg_ctl said and what it didn't aren't exactly
the same thing.

 but it is one particular table on postgres that has shit the sock drawer.

What queries are running slow, and what does explain analyze have to
say about them?

 You should also rarely be doing count(*) on a 300k row table, even if
 everything is cached and speedy.

 I'm not
 this is a test query that is obviously way out of bounds for acceptable
 response.
 there is something very very wrong with this table and I need to solve it
 ASAP.
 other tables that have less updates but similar sizes are not having this
 problem.

Is this the same problem you had at the beginning and were trying to
fix with clustering and increasing fsm, or is this now a different
table and a different problem?

 there are foreign keys pointing to this table so its a bit tricky to just
 refill it, but I can think of one way.  I'll have to do that.
 its only conjecture that the issue is file space bloat or free map problems.
  those are overall issues that I will get to as soon as I can. but this is
 table specific.

What does the query you ran before that shows bloat show on this table now?

  That’s an application design issue you need to address before it’s too
 late, or you have to rush and implement a hasty fix.

 it is not an application design issue, though there are always improvements
 being made.

If your application is doing select count(*) with either no where
clause or with a very non-selective one, then it is somewhat of a
design issue, and there are ways to make that faster.  if it's a
different query, show us what it and its explain analyze look like.

 Being a DBA sucks sometimes. J

 I am not a DBA, I'm just trying to query a 300k row table.
 though I am happy to learn more. I know an awful lot about a lot of things.
  but you can't specialize in everything

Well the good news is that there's a LOT less arcana involved in keep
pgsql happy than there is in keeping something like Oracle happy.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread felix

BRUTAL


http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html
max_fsm_pages

See Section 
17.4.1http://www.postgresql.org/docs/8.3/interactive/kernel-resources.html#SYSVIPC
for
information on how to adjust those parameters, if necessary.

I see absolutely nothing in there about how to set those parameters.

several hours later (
where is my data directory ?  8.4 shows it in SHOW ALL; 8.3 does not.
conf files ? in the data directory no, its in /etc/postgres/8.3/main
where is pg_ctl ?
what user do I need to be ? postgres
then why was it installed in the home dir of a user that does not have
permissions to use it ??
)


cd /home/crucial/bin

/home/crucial/bin/pg_ctl -D /var/lib/postgresql/8.3/main reload

reload does not reset max_fsm_pages, I need to actually restart the server.

postgres@nestseekers:/home/crucial/bin$ /home/crucial/bin/pg_ctl -D
/var/lib/postgresql/8.3/main restart
waiting for server to shut
down... failed
pg_ctl: server does not shut down


OK, my mistake.   probably I have to disconnect all clients.  I don't want
to do a planned maintenance right now.

so I go to sleep

the server restarts itself an hour later.

but no, it fails to restart because this memory setting you recommend is not
possible without reconfiguring the kernel.


postgres@nestseekers:/home/crucial/bin$ 2011-02-06 05:18:00 EST LOG:  could
not load root certificate file root.crt: No such file or directory
2011-02-06 05:18:00 EST DETAIL:  Will not verify client certificates.
2011-02-06 05:18:00 EST FATAL:  could not create shared memory segment:
Invalid argument
2011-02-06 05:18:00 EST DETAIL:  Failed system call was shmget(key=5432001,
size=35463168, 03600).
2011-02-06 05:18:00 EST HINT:  This error usually means that PostgreSQL's
request for a shared memory segment exceeded your kernel's SHMMAX parameter.
 You can either reduce the request size or reconfigure the kernel with
larger SHMMAX.  To reduce the request size (currently 35463168 bytes),
reduce PostgreSQL's shared_buffers parameter (currently 3072) and/or its
max_connections parameter (currently 103).
If the request size is already small, it's possible that it is less than
your kernel's SHMMIN parameter, in which case raising the request size or
reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory
configuration.
^C

*and the website is down for the next 6 hours while I sleep.*

total disaster

after a few tries I get it to take an max_fsm_pages of 300k

postgres@nestseekers:/home/crucial/bin$ 2011-02-06 05:19:26 EST LOG:  could
not load root certificate file root.crt: No such file or directory
2011-02-06 05:19:26 EST DETAIL:  Will not verify client certificates.
2011-02-06 05:19:26 EST LOG:  database system was shut down at 2011-02-06
00:07:41 EST
2011-02-06 05:19:27 EST LOG:  autovacuum launcher started
2011-02-06 05:19:27 EST LOG:  database system is ready to accept connections
^C



2011-02-06 05:33:45 EST LOG:  checkpoints are occurring too frequently (21
seconds apart)
2011-02-06 05:33:45 EST HINT:  Consider increasing the configuration
parameter checkpoint_segments.


??


From my perspective: the defaults for postgres 8.3 result in a database that
does not scale and fails dramatically after 6 months.  changing that default
is brutally difficult and can only really be done by adjusting something in
the kernel.


I have clustered that table, its still unbelievably slow.
I still don't know if this bloat due to the small free space map has
anything to do with why the table is performing like this.


On Fri, Feb 4, 2011 at 5:35 PM, Shaun Thomas stho...@peak6.com wrote:


 You can stop the bloating by setting the right max_fsm_pages setting,







 but you'll either have to go through and VACUUM FULL every table in your
 database, or dump/restore to regain all the lost space and performance (the
 later would actually be faster). Before I even touch an older PostgreSQL DB,
 I set it to some value over 3-million just as a starting value to be on the
 safe side. A little used memory is a small price to pay for stopping gradual
 expansion.

Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread Ray Stell

On Sun, Feb 06, 2011 at 11:48:50AM +0100, felix wrote:
 BRUTAL
 

Did the changes work in your test environment?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

On Sun, Feb 6, 2011 at 3:48 AM, felix crucialfe...@gmail.com wrote:
 BRUTAL

SNIP

 OK, my mistake.   probably I have to disconnect all clients.  I don't want
 to do a planned maintenance right now.
 so I go to sleep
 the server restarts itself an hour later.
 but no, it fails to restart because this memory setting you recommend is not
 possible without reconfiguring the kernel.

SNIP

 and the website is down for the next 6 hours while I sleep.
 total disaster

Let's review:
1: No test or staging system used before production
2: DB left in an unknown state (trying to shut down, not able)
3: No monitoring software to tell you when the site is down
4: I'm gonna just go ahead and guess no backups were taken either, or
are regularly taken.

This website can't be very important, if that's the way you treat it.
Number 1 up there becomes even worse because it was your first time
trying to make this particular change in Postgresql.  If it is
important, you need to learn how to start treating it that way.  Even
the most junior of sys admins or developers I work with know we test
it a couple times outside of production before just trying it there.
And my phone starts complaining a minute after the site stops
responding if something does go wrong the rest of the time.  Do not
lay this at anyone else's feet.

 From my perspective: the defaults for postgres 8.3 result in a database that
 does not scale and fails dramatically after 6 months.

Agreed.  Welcome to using shared memory and the ridiculously low
defaults on most flavors of unix or linux.

  changing that default
 is brutally difficult and can only really be done by adjusting something in
 the kernel.

Please, that's a gross exaggeration.  The sum totoal to changing them is:

run sysctl -a|grep shm
copy out proper lines to cahnge
edit sysctl.conf
put new lines in there with changes
sudo sysctl -p  # applies changes
edit the appropriate postgresql.conf, make changes
sudo /etc/init.d/postgresql-8.3 stop
sudo /etc/init.d/postgresql-8.3 start

 I have clustered that table, its still unbelievably slow.

Did you actually delete the old entries before clustering it?  if it's
still got 4G of old sessions or whatever in it, clustering ain't gonna
help.

 I still don't know if this bloat due to the small free space map has
 anything to do with why the table is performing like this.

Since you haven't show us what changes, if any, have happened to the
table, neither do we :)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

On Fri, Feb 4, 2011 at 1:14 PM, felix crucialfe...@gmail.com wrote:
  schema_name |            table_name            |  row_count  | mb_used |
 total_mb_used
 -+--+-+-+---
  public      | django_session                   | 1.47843e+07 |    4122 |
       18832

So does this row still have 15M rows in it?  Any old ones you can
delete, then run cluster on the table?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread Pierre C




I have clustered that table, its still unbelievably slow.


Did you actually delete the old entries before clustering it?  if it's
still got 4G of old sessions or whatever in it, clustering ain't gonna
help.


Also, IMHO it is a lot better to store sessions in something like  
memcached, rather than imposing this rather large load on the main  
database...


PS : if your site has been down for 6 hours, you can TRUNCATE your  
sessions table...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

On Sun, Feb 6, 2011 at 12:19 PM, Pierre C li...@peufeu.com wrote:

 I have clustered that table, its still unbelievably slow.

 Did you actually delete the old entries before clustering it?  if it's
 still got 4G of old sessions or whatever in it, clustering ain't gonna
 help.

 Also, IMHO it is a lot better to store sessions in something like memcached,
 rather than imposing this rather large load on the main database...

 PS : if your site has been down for 6 hours, you can TRUNCATE your sessions
 table...

Agreed.  When I started where I am sessions were on pg and falling
over all the time.  Because I couldn't change it at the time, I was
forced to make autovac MUCH more aggressive.  I didn't have to crank
up fsm a lot really but did a bit. Then just ran a vacuum full /
reindex across the sessions table and everything was fine after that.
But we could handle 100x time the load for sessions with memcached I
bet.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread felix

On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe scott.marl...@gmail.comwrote:

 Let's review:



1: No test or staging system used before production


no, I do not have a full ubuntu machine replicating the exact memory and
application load of the production server.

this was changing one configuration parameter. something I was advised to
do, read about quite a bit, tested on my development server (mac) and then
proceeded to do at 6 am on Sunday morning, our slowest time.


2: DB left in an unknown state (trying to shut down, not able)


what ?

I checked the site, everything was normal.  I went in via psql and tried
some queries for about half an hour and continued to monitor the site.  then
I went to bed at 7am (EU time).

Why did it shutdown so much later ?

I have never restarted postgres before, so this was all new to me.  I
apologize that I wasn't born innately with such knowledge.

So is it normal for postgres to report that it failed to shut down, operate
for an hour and then go ahead and restart itself ?

3: No monitoring software to tell you when the site is down


of course I have monitoring software.  both external and internal.  but it
doesn't come and kick me out of bed.  yes, I need an automated cel phone
call.  that was the first thing I saw to afterwards.


4: I'm gonna just go ahead and guess no backups were taken either, or
 are regularly taken.


WTF ?   of course I have backups.  I just went through a very harsh down
period event.  I fail to see why it is now necessary for you to launch such
an attack on me.

Perhaps the tone of my post sounded like I was blaming you, or at least you
felt that way.  Why do you feel that way ?

Why not respond with:  ouch !  did you check this ... that  say
something nice and helpful.  correct my mistakes




 This website can't be very important, if that's the way you treat it.


just to let you know, that is straight up offensive

This is high traffic real estate site.  Downtime is unacceptable.  I had
less downtime than this when I migrated to the new platform.

I spent rather a large amount of time reading and questioning here.  I asked
many questions for clarification and didn't do ANYTHING until I was sure it
was the correct solution.  I didn't just pull some shit off a blog and start
changing settings at random.

I double checked opinions against different people and I searched for more
docs on that param.  Amazingly none of the ones I found commented on the
shared memory issue and I didn't even understand the docs discussing shared
memory because it didn't seem to apply to what I was doing.  that's my
misunderstanding.  I come her to share my misunderstanding.




 And my phone starts complaining a minute after the site stops
 responding if something does go wrong the rest of the time.  Do not
 lay this at anyone else's feet.


I didn't.  There is not even the slightest hint of that in my post.

I came here and posted the details of where I went wrong and what confused
me about the documentation that I followed.  That's so other people can
follow it and so somebody here can comment on it.



  changing that default
  is brutally difficult and can only really be done by adjusting something
 in
  the kernel.

 Please, that's a gross exaggeration.  The sum totoal to changing them is:

 run sysctl -a|grep shm
 copy out proper lines to cahnge
 edit sysctl.conf
 put new lines in there with changes
 sudo sysctl -p  # applies changes
 edit the appropriate postgresql.conf, make changes
 sudo /etc/init.d/postgresql-8.3 stop
 sudo /etc/init.d/postgresql-8.3 start


Considering how splendidly the experiment with changing fsm_max_pages went,
I think you can understand that I have no desire to experiment with kernel
settings.

It is easy for you because you ALREADY KNOW everything involved.  I am not a
sysadmin and we don't have one.  My apologies for that.

so does the above mean that I don't have to restart the entire server, just
postgres ?  I assumed that changing kernel settings means rebooting the
server.



 I have clustered that table, its still unbelievably slow.

 Did you actually delete the old entries before clustering it?  if it's
 still got 4G of old sessions or whatever in it, clustering ain't gonna
 help.


its a different table.  the problem one has only 300k rows

the problem is not the size, the problem is the speed is catastrophic



 I still don't know if this bloat due to the small free space map has
  anything to do with why the table is performing like this.

 Since you haven't show us what changes, if any, have happened to the
 table, neither do we :)


sorry, it didn't seem to be the most important topic when I got out of bed

Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread felix

yeah, it already uses memcached with db save.  nothing important in session
anyway

the session table is not the issue

and I never clustered that one or ever will

thanks for the tip, also the other one about HOT


On Sun, Feb 6, 2011 at 8:19 PM, Pierre C li...@peufeu.com wrote:


  I have clustered that table, its still unbelievably slow.


 Did you actually delete the old entries before clustering it?  if it's
 still got 4G of old sessions or whatever in it, clustering ain't gonna
 help.


 Also, IMHO it is a lot better to store sessions in something like
 memcached, rather than imposing this rather large load on the main
 database...

 PS : if your site has been down for 6 hours, you can TRUNCATE your sessions
 table...

Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread Craig Ringer

On 07/02/11 09:52, felix wrote:

 So is it normal for postgres to report that it failed to shut down,
 operate for an hour and then go ahead and restart itself ?

That's pretty wacky. Did you shut it down via  pg_ctl or using an init
script / service command in your OS?

It shouldn't matter, but it'd be good to know. If the problem is with an
init script, then knowing which OS and version you're on would help. If
it was with psql directly, that's something that can be looked into.

 this was changing one configuration parameter. something I was advised
 to do, read about quite a bit, tested on my development server (mac) and
 then proceeded to do at 6 am on Sunday morning, our slowest time.

System V shared memory is awful - but it's really the only reasonable
alternative for a multi-process (rather than multi-threaded) server.

PostgreSQL could use mmap()ed temp files, but that'd add additional
overheads and they'd potentially get flushed from main memory unless the
memory was mlock()ed. As mlock() has similar limits and configuration
methods to system V shared memory, you get back to the same problem in a
slightly different form.

What would possibly help would be if Pg could fall back to lower
shared_buffers automatically, screaming about it in the logs but still
launching. OTOH, many people don't check the logs, so they'd think their
new setting had taken effect and it hadn't - you've traded one usability
problem for another. Even if Pg issued WARNING messages to each client
that connected, lots of (non-psql) clients don't display them, so many
users would never know.

Do you have a suggestion about how to do this better? The current
approach is known to be rather unlovely, but nobody's come up with a
better one that works reasonably and doesn't trample on other System V
shared memory users that may exist on the system.

 so does the above mean that I don't have to restart the entire server,
 just postgres ?  I assumed that changing kernel settings means rebooting
 the server.

Nope. sysctl settings like shmmax may be changed on the fly.

-- 
System  Network Administrator
POST Newspapers

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

On Sun, Feb 6, 2011 at 6:52 PM, felix crucialfe...@gmail.com wrote:
 On Sun, Feb 6, 2011 at 4:23 PM, Scott Marlowe scott.marl...@gmail.com
 wrote:

 Let's review:

 1: No test or staging system used before production

 no, I do not have a full ubuntu machine replicating the exact memory and
 application load of the production server.
 this was changing one configuration parameter. something I was advised to
 do, read about quite a bit, tested on my development server (mac) and then
 proceeded to do at 6 am on Sunday morning, our slowest time.

I would strongly suggest you at least test these changes out
elsewhere.  It doesn't have to exactly match, but if you had a machine
that was even close to test on you'd have known what to expect.
Virtual machines are dirt simple to set up now.  So not having one
inexcusable.

 2: DB left in an unknown state (trying to shut down, not able)

 what ?

You told it to restart, which is a stop and a start.  It didn't stop.
It was in an unknown state.  With settings in its config file you
didn't know whether or not they worked because you hadn't tested them
already on somthing similar.

 Why did it shutdown so much later ?

Because that's when the last open connection from before when you told
it to shutdown / restart.

 I have never restarted postgres before, so this was all new to me.

Which is why you use a virtual machine to build a test lab so you CAN
make these changes somewhere other than produciton.

  I apologize that I wasn't born innately with such knowledge.

Guess what!?  Neither was I!  I do however know how to setup a test
system so I don't test things on my production machine.

 So is it normal for postgres to report that it failed to shut down, operate
 for an hour and then go ahead and restart itself ?

Yes.  It eventually finished your restart you told it to do.

 3: No monitoring software to tell you when the site is down

 of course I have monitoring software.  both external and internal.  but it
 doesn't come and kick me out of bed.  yes, I need an automated cel phone
 call.  that was the first thing I saw to afterwards.

Monitoring software that can't send you emails when things break is in
need of having that feature enabled.


 4: I'm gonna just go ahead and guess no backups were taken either, or
 are regularly taken.

 WTF ?   of course I have backups.  I just went through a very harsh down
 period event.  I fail to see why it is now necessary for you to launch such
 an attack on me.

No, it just seemed like your admin skills were pretty sloppy, so a
lack of a backup wouldn't surprise me.

 Perhaps the tone of my post sounded like I was blaming you, or at least you
 felt that way.

It felt more like you were blaming PostgreSQL for being overly
complex, but I wasn't taking it all that personally.

  Why do you feel that way ?

I don't.

 Why not respond with:  ouch !  did you check this ... that  say
 something nice and helpful.  correct my mistakes

I'd be glad to, but your message wasn't looking for help.  go back and
read it.  It's one long complaint.

 This website can't be very important, if that's the way you treat it.

 just to let you know, that is straight up offensive

Really?  I'd say performing maintenance with no plan or pre-testing is
far more offensive.

 This is high traffic real estate site.  Downtime is unacceptable.  I had
 less downtime than this when I migrated to the new platform.

I expect you did more planning an testing?

 I spent rather a large amount of time reading and questioning here.  I asked
 many questions for clarification and didn't do ANYTHING until I was sure it
 was the correct solution.  I didn't just pull some shit off a blog and start
 changing settings at random.

But yet you failed to test it on even the simplest similar system
setup.  And so you lacked the practical knowledge of how to make this
change in production safely.

 I double checked opinions against different people and I searched for more
 docs on that param.  Amazingly none of the ones I found commented on the
 shared memory issue and I didn't even understand the docs discussing shared
 memory because it didn't seem to apply to what I was doing.  that's my
 misunderstanding.  I come her to share my misunderstanding.

Well, that's useful.  And I can see where there could be some changes
made to the docs or a user friendly howto on how to increase shared
memory and fsm and all that.

 Please, that's a gross exaggeration.  The sum totoal to changing them is:

 run sysctl -a|grep shm
 copy out proper lines to cahnge
 edit sysctl.conf
 put new lines in there with changes
 sudo sysctl -p  # applies changes
 edit the appropriate postgresql.conf, make changes
 sudo /etc/init.d/postgresql-8.3 stop
 sudo /etc/init.d/postgresql-8.3 start

 Considering how splendidly the experiment with changing fsm_max_pages went,
 I think you can understand that I have no desire to experiment with kernel
 settings.

Experimenting is what you do on a test machine,

Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread Shaun Thomas

You really got screwed by the default settings. You don’t actually need to 
“hack” the kernel, but you do have to make these changes, because the amount of 
memory PG has on your system is laughable. That might actually be the majority 
of your problem.

In your /etc/sysctl.conf, you need these lines:

kernel.shmmax = 68719476736
kernel.shmall = 4294967296

Then you need to run

sysctl -p

These changes can only be made as root, by the way. That will give you more 
than enough shared memory to restart PG. But it also tells me you’re using the 
default memory settings. If you have more than 4GB on that system, you need to 
set shared_buffers to 1G or so. In addition, you need to bump your 
effective_cache_size to something representing the remaining inode cache in 
your system. Run ‘free’ to see that.

You also need to know something about unix systems. If you’re running an ubuntu 
system, your control files are in /etc/init.d, and you can invoke them with:

service pg_cluster restart

or the more ghetto:

/etc/init.d/pg_cluster restart

It may also be named postgres, postgresql, or some other variant.

The problem you’ll run into with this is that PG tries to play nice, so it’ll 
wait for all connections to disconnect before it shuts down to restart. That 
means, of course, you need to do a fast shutdown, which forces all connections 
to disconnect, but the service control script won’t do that. So you’re left 
with the pg_ctl command again.

pg_ctl –D /my/pg/dir –m fast

And yeah, your checkpoint segments probably are too low. Based on your session 
table, you should probably have that at 25 or higher.

But that’s part of the point. I highly recommend you scan around Google for 
pages on optimizing PostgreSQL installs. These are pretty much covered in all 
of them. Fixing the shmall and shmax kernel settings are also pretty well known 
in database circles, because they really are set to ridiculously low defaults 
for any machine that may eventually be a server of anything. I was surprised it 
blocked the memory request for the max_fsm_pages setting, but that just proves 
your system was unoptimized in several different ways that may have been 
slowing down your count(*) statements, among other things.

Please, for your own sanity and the safety of your systems, look this stuff up 
to the point you can do most of it without looking. You can clearly do well, 
because you picked your way through the manuals to know about the kernel 
settings, and that you could call pg_ctl, and so on.


__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread Shaun Thomas

 I checked the site, everything was normal.  I went in via psql and tried some
 queries for about half an hour and continued to monitor the site.  then I went
 to bed at 7am (EU time).

 Why did it shutdown so much later ?

That’s one of the things I talked about. To be safe, PG will start to shut down 
but disallow new connections, and *that’s all*. Old connections are 
grandfathered in until they disconnect, and when they all go away, it shuts 
down gracefully.

pg_ctl –D /my/pg/dir stop –m fast
pg_ctl –D /my/pg/dir start

Is what you wanted.

 I have never restarted postgres before, so this was all new to me.  I 
 apologize
 that I wasn't born innately with such knowledge.

Forget about it. But you need to learn your tools. Restarting the DB server is 
something you’ll need to do occasionally. Just like restarting your Django 
proxy or app. You need to be fully knowledgeable about every part of your 
tool-chain, or at least the parts you’re responsible for.

 I double checked opinions against different people and I searched for more 
 docs
 on that param.  Amazingly none of the ones I found commented on the shared
 memory issue and I didn't even understand the docs discussing shared memory
 because it didn't seem to apply to what I was doing.

That’s no coincidence. I’ve seen that complaint if you increase shared_buffers, 
but not for max_fsm_pages. I guess I’m so used to bumping up shmmax and shmall 
that I forget how low default systems leave those values. But you do need to 
increase them. Every time. They’re crippling your install in more ways than 
just postgres.

So far as your Django install, have you activated the memcache contrib. module? 
Your pages should be lazy-caching and rarely depend on the DB, if they can. You 
should also rarely be doing count(*) on a 300k row table, even if everything is 
cached and speedy. 300k row tables have nasty habits of becoming 3M row tables 
(or more) after enough time, and no amount of cache will save you from counting 
that. It’ll take 1 second or more every time eventually, and then you’ll be in 
real trouble. That’s an application design issue you need to address before 
it’s too late, or you have to rush and implement a hasty fix.

I suggest setting your log_min_duration to 1000, so every query that takes 
longer than 1 second to execute is logged in your postgres logs. You can use 
that to track down trouble spots before they get really bad. That’s normally 
aggressive enough to catch the real problem queries without flooding your logs 
with too much output.

Being a DBA sucks sometimes. ☺


__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread Greg Smith


felix wrote:
So is it normal for postgres to report that it failed to shut down, 
operate for an hour and then go ahead and restart itself ?


You've already gotten a few explanations for why waiting for connections 
can cause this.  I'll only add that it is critical to be watching the 
database log file when doing work like this with PostgreSQL.  Go back 
and check it if you still have the data from when your problematic 
restart attempt happened, normally you'll get some warnings about it 
starting to shutdown.  Try to look for the actual server shutdown 
message and then the restart one after doing this sort of thing.  If you 
don't see them when you do this again, you'll know something unexpected 
is happening, and then to look into what that is.


Also, as a general downtime commentary born from years of being the 
receiving end of outages, I'd recommend against ever doing any server 
maintenance operation for the first time just before bedtime.  While 
that may be convienent from a less users are using the site 
perspective, the downside is what you've seen here:  mistakes can mean 
rather extended outages.  Better to get up early and do this sort of 
thing instead, so you can watch the site afterwards for a few hours to 
make sure nothing is broken.  For similar reasons I try to avoid ever 
doing major changes on a Friday.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-05 Thread Pierre C


On Fri, 04 Feb 2011 21:37:56 +0100, Shaun Thomas stho...@peak6.com wrote:


On 02/04/2011 02:14 PM, felix wrote:


oh and there in the footnotes to django they say dont' forget to run
the delete expired sessions management every once in a while.
thanks guys.


Oh Django... :)


it won't run now because its too big, I can delete them from psql though


You might be better off deleting the inverse. You know, start a  
transaction, select all the sessions that *aren't* expired, truncate the  
table, insert them back into the session table, and commit.


Note that for a session table, that is updated very often, you can use the  
postgres' HOT feature which will create a lot less dead rows. Look it up  
in the docs.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Really really slow select count(*)

I am having huge performance problems with a table. Performance deteriorates
every day and I have to run REINDEX and ANALYZE on it every day.  auto
vacuum is on.  yes, I am reading the other thread about count(*) :)

but obviously I'm doing something wrong here


explain analyze select count(*) from fastadder_fastadderstatus;

Aggregate  (cost=62458.73..62458.74 rows=1 width=0) (actual
time=77130.000..77130.000 rows=1 loops=1)
   -  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61701.18
rows=303018 width=0) (actual time=50.000..76930.000 rows=302479 loops=1)
 Total runtime: *77250.000 ms*

directly after REINDEX and ANALYZE:

 Aggregate  (cost=62348.70..62348.71 rows=1 width=0) (actual
time=15830.000..15830.000 rows=1 loops=1)
   -  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61613.16
rows=294216 width=0) (actual time=30.000..15570.000 rows=302479 loops=1)
 Total runtime: 15830.000 ms

still very bad for a 300k row table

a similar table:

explain analyze select count(*) from fastadder_fastadderstatuslog;

 Aggregate  (cost=8332.53..8332.54 rows=1 width=0) (actual
time=1270.000..1270.000 rows=1 loops=1)
   -  Seq Scan on fastadder_fastadderstatuslog  (cost=0.00..7389.02
rows=377402 width=0) (actual time=0.000..910.000 rows=377033 loops=1)
 Total runtime: 1270.000 ms


It gets updated quite a bit each day, and this is perhaps the problem.
To me it doesn't seem like that many updates

100-500 rows inserted per day
no deletes

10k-50k updates per day
mostly of this sort:   set priority=1 where id=12345

is it perhaps this that is causing the performance problem ?

I could rework the app to be more efficient and do updates using batches
where id IN (1,2,3,4...)

I assume that means a more efficient index update compared to individual
updates.

There is one routine that updates position_in_queue using a lot (too many)
update statements.
Is that likely to be the culprit ?

*What else can I do to investigate ?*


   Table
public.fastadder_fastadderstatus
  Column   |   Type   |
  Modifiers
---+--+
 id| integer  | not null default
nextval('fastadder_fastadderstatus_id_seq'::regclass)
 apt_id| integer  | not null
 service_id| integer  | not null
 agent_priority| integer  | not null
 priority  | integer  | not null
 last_validated| timestamp with time zone |
 last_sent | timestamp with time zone |
 last_checked  | timestamp with time zone |
 last_modified | timestamp with time zone | not null
 running_status| integer  |
 validation_status | integer  |
 position_in_queue | integer  |
 sent  | boolean  | not null default false
 built | boolean  | not null default false
 webid_suffix  | integer  |
 build_cache   | text |
Indexes:
fastadder_fastadderstatus_pkey PRIMARY KEY, btree (id)
fastadder_fastadderstatus_apt_id_key UNIQUE, btree (apt_id,
service_id)
fastadder_fastadderstatus_agent_priority btree (agent_priority)
fastadder_fastadderstatus_apt_id btree (apt_id)
fastadder_fastadderstatus_built btree (built)
fastadder_fastadderstatus_last_checked btree (last_checked)
fastadder_fastadderstatus_last_validated btree (last_validated)
fastadder_fastadderstatus_position_in_queue btree (position_in_queue)
fastadder_fastadderstatus_priority btree (priority)
fastadder_fastadderstatus_running_status btree (running_status)
fastadder_fastadderstatus_service_id btree (service_id)
Foreign-key constraints:
fastadder_fastadderstatus_apt_id_fkey FOREIGN KEY (apt_id) REFERENCES
nsproperties_apt(id) DEFERRABLE INITIALLY DEFERRED
fastadder_fastadderstatus_service_id_fkey FOREIGN KEY (service_id)
REFERENCES fastadder_fastadderservice(id) DEFERRABLE INITIALLY DEFERRED


thanks !

Re: [PERFORM] Really really slow select count(*)

2011-02-04 Thread hubert depesz lubaczewski

On Fri, Feb 04, 2011 at 03:46:35PM +0100, felix wrote:
 directly after REINDEX and ANALYZE:
 
  Aggregate  (cost=62348.70..62348.71 rows=1 width=0) (actual
 time=15830.000..15830.000 rows=1 loops=1)
-  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61613.16
 rows=294216 width=0) (actual time=30.000..15570.000 rows=302479 loops=1)
  Total runtime: 15830.000 ms

do run vacuum of the table. reindex doesn't matter for seq scans, and
analyze, while can help choose different plan - will not help here
anyway.

Best regards,

depesz


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-04 Thread Greg Smith


felix wrote:

explain analyze select count(*) from fastadder_fastadderstatus;

Aggregate  (cost=62458.73..62458.74 rows=1 width=0) (actual 
time=77130.000..77130.000 rows=1 loops=1)
   -  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61701.18 
rows=303018 width=0) (actual time=50.000..76930.000 rows=302479 loops=1)

 Total runtime: *77250.000 ms*



PostgreSQL version?  If you're running on 8.3 or earlier, I would be 
suspicous that your Free Space Map has been overrun.


What you are seeing is that the table itself is much larger on disk than 
it's supposed to be.  That can be caused by frequent UPDATEs if you 
don't have vacuum cleanup working effectively, you'll get lots of dead 
sections left behind from UPDATEs in the middle.  The best way to fix 
all this is to run CLUSTER on the table.  That will introduce a bit of 
downtime while it holds a lock on the table (only a few minutes based on 
what you've shown here), but the copy you'll have afterwards won't be 
spread all over disk anymore.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

Re: [PERFORM] Really really slow select count(*)


On 02/04/2011 08:46 AM, felix wrote:


explain analyze select count(*) from fastadder_fastadderstatus;

Aggregate  (cost=62458.73..62458.74 rows=1 width=0) (actual
time=77130.000..77130.000 rows=1 loops=1)
-  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61701.18
rows=303018 width=0) (actual time=50.000..76930.000 rows=302479 loops=1)
  Total runtime: *77250.000 ms*


How big is this table when it's acting all bloated and ugly?

SELECT relpages*8/1024 FROM pg_class
 WHERE relname='fastadder_fastadderstatus';

That's the number of MB it's taking up that would immediately affect a 
count statement.



directly after REINDEX and ANALYZE:

  Aggregate  (cost=62348.70..62348.71 rows=1 width=0) (actual
time=15830.000..15830.000 rows=1 loops=1)
-  Seq Scan on fastadder_fastadderstatus  (cost=0.00..61613.16
rows=294216 width=0) (actual time=30.000..15570.000 rows=302479 loops=1)
  Total runtime: 15830.000 ms


That probably put it into cache, explaining the difference, but yeah... 
that is pretty darn slow. Is this the only thing running when you're 
doing your tests? What does your disk IO look like?



10k-50k updates per day
mostly of this sort:   set priority=1 where id=12345


Well... that's up to 16% turnover per day, but even then, regular 
vacuuming should keep it manageable.



I could rework the app to be more efficient and do updates using batches
where id IN (1,2,3,4...)


No. Don't do that. You'd be better off loading everything into a temp 
table and doing this:


UPDATE fastadder_fastadderstatus s
   SET priority = 1
  FROM temp_statuses t
 WHERE t.id=s.id;

It's a better practice, but still doesn't really explain your 
performance issues.



fastadder_fastadderstatus_pkey PRIMARY KEY, btree (id)
fastadder_fastadderstatus_apt_id_key UNIQUE, btree (apt_id, service_id)
fastadder_fastadderstatus_agent_priority btree (agent_priority)
fastadder_fastadderstatus_apt_id btree (apt_id)
fastadder_fastadderstatus_built btree (built)
fastadder_fastadderstatus_last_checked btree (last_checked)
fastadder_fastadderstatus_last_validated btree (last_validated)
fastadder_fastadderstatus_position_in_queue btree (position_in_queue)
fastadder_fastadderstatus_priority btree (priority)
fastadder_fastadderstatus_running_status btree (running_status)
fastadder_fastadderstatus_service_id btree (service_id)


Whoh! Hold on, here. That looks like *way* too many indexes. Definitely 
will slow down your insert/update performance. The index on 'built' for 
example, is a boolean. If it's evenly distributed, that's 150k matches 
for true or false, rendering it useless, yet still requiring space and 
maintenance. I'm guessing the story is similar for quite a few of the 
others.


It doesn't really explain your count speed, but it certainly isn't helping.

Something seems fishy, here.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)


On 02/04/2011 08:56 AM, Greg Smith wrote:


PostgreSQL version?  If you're running on 8.3 or earlier, I would be
suspicous that your Free Space Map has been overrun.


That's my first inclination. If he says autovacuum is running, there's 
no way it should be bloating the table that much.


Felix, If you're running a version before 8.4, what is your 
max_fsm_pages setting? If it's too low, autovacuum won't save you, and 
your tables will continue to grow daily unless you vacuum full 
regularly, and I wouldn't recommend that to my worst enemy. ;)


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Fwd: [PERFORM] Really really slow select count(*)

sorry, reply was meant to go to the list.

-- Forwarded message --
From: felix crucialfe...@gmail.com
Date: Fri, Feb 4, 2011 at 5:17 PM
Subject: Re: [PERFORM] Really really slow select count(*)
To: stho...@peak6.com

On Fri, Feb 4, 2011 at 4:00 PM, Shaun Thomas stho...@peak6.com wrote:

 How big is this table when it's acting all bloated and ugly?

458MB

 Is this the only thing running when you're doing your tests? What does your
 disk IO look like?

this is on a live site.  best not to scare the animals.

I have the same config on the dev environment but not the same table size.

  10k-50k updates per day
 mostly of this sort:   set priority=1 where id=12345

 Well... that's up to 16% turnover per day, but even then, regular vacuuming
 should keep it manageable.

something is definitely amiss with this table.

I'm not sure if its something that happened at one point when killing an
task that was writing to it or if its something about the way the app is
updating.  it SHOULDN'T be that much of a problem, though I can find ways to
improve it.

No. Don't do that. You'd be better off loading everything into a temp table
 and doing this:

 UPDATE fastadder_fastadderstatus s
   SET priority = 1
  FROM temp_statuses t
  WHERE t.id=s.id;

ok, that is one the solutions I was thinking about.

are updates of the where id IN (1,2,3,4) generally not efficient ?
how about for select queries ?

 fastadder_fastadderstatus_pkey PRIMARY KEY, btree (id)
 fastadder_fastadderstatus_apt_id_key UNIQUE, btree (apt_id, service_id)
 fastadder_fastadderstatus_agent_priority btree (agent_priority)
 fastadder_fastadderstatus_apt_id btree (apt_id)
 fastadder_fastadderstatus_built btree (built)
 fastadder_fastadderstatus_last_checked btree (last_checked)
 fastadder_fastadderstatus_last_validated btree (last_validated)
 fastadder_fastadderstatus_position_in_queue btree (position_in_queue)
 fastadder_fastadderstatus_priority btree (priority)
 fastadder_fastadderstatus_running_status btree (running_status)
 fastadder_fastadderstatus_service_id btree (service_id)

 Whoh! Hold on, here. That looks like *way* too many indexes.

I actually just added most of those yesterday in an attempt to improve
performance. priority and agent_priority were missing indexes and that was a
big mistake.

overall performance went way up on my primary selects

 Definitely will slow down your insert/update performance.

there are a lot more selects happening throughout the day

 The index on 'built' for example, is a boolean. If it's evenly distributed,
 that's 150k matches for true or false,

ok,

built True is in the minority.

here is the test query that caused me to add indices to the booleans.  this
is a 30k table which is doing selects on two booleans constantly.  again:
True is the minority

explain analyze SELECT nsproperties_apt.id,
nsproperties_apt.display_address, nsproperties_apt.apt_num,
nsproperties_apt.bldg_id, nsproperties_apt.is_rental,
nsproperties_apt.is_furnished, nsproperties_apt.listing_type,
nsproperties_apt.list_on_web, nsproperties_apt.is_approved,
nsproperties_apt.status, nsproperties_apt.headline,
nsproperties_apt.slug, nsproperties_apt.cross_street,
nsproperties_apt.show_apt_num, nsproperties_apt.show_building_name,
nsproperties_apt.external_url, nsproperties_apt.listed_on,
nsproperties_bldg.id, nsproperties_bldg.name FROM nsproperties_apt
LEFT OUTER JOIN nsproperties_bldg ON (nsproperties_apt.bldg_id =
nsproperties_bldg.id) WHERE (nsproperties_apt.list_on_web = True AND
nsproperties_apt.is_available = True ) ;
   QUERY PLAN

 Hash Left Join  (cost=408.74..10062.18 rows=3344 width=152) (actual
time=12.688..2442.542 rows=2640 loops=1)
   Hash Cond: (nsproperties_apt.bldg_id = nsproperties_bldg.id)
   -  Seq Scan on nsproperties_apt  (cost=0.00..9602.52 rows=3344
width=139) (actual time=0.025..2411.644 rows=2640 loops=1)
 Filter: (list_on_web AND is_available)
   -  Hash  (cost=346.66..346.66 rows=4966 width=13) (actual
time=12.646..12.646 rows=4966 loops=1)
 -  Seq Scan on nsproperties_bldg  (cost=0.00..346.66 rows=4966
width=13) (actual time=0.036..8.236 rows=4966 loops=1)
 Total runtime: 2444.067 ms
(7 rows)

=

 Hash Left Join  (cost=1232.45..9784.18 rows=5690 width=173) (actual
time=30.000..100.000 rows=5076 loops=1)
   Hash Cond: (nsproperties_apt.bldg_id = nsproperties_bldg.id)
   -  Bitmap Heap Scan on nsproperties_apt  (cost=618.23..9075.84 rows=5690
width=157) (actual time=10.000..60.000 rows=5076 loops=1)
 Filter: (list_on_web AND is_available)
 -  BitmapAnd  (cost=618.23..618.23 rows=5690 width=0) (actual
time=10.000..10.000 rows=0 loops=1)
   -  Bitmap Index Scan on nsproperties_apt_is_available
 (cost=0.00..131.81 rows=6874 width=0) (actual time=0.000..0.000 rows

[PERFORM] Really really slow select count(*)

reply was meant for the list

-- Forwarded message --
From: felix crucialfe...@gmail.com
Date: Fri, Feb 4, 2011 at 4:39 PM
Subject: Re: [PERFORM] Really really slow select count(*)
To: Greg Smith g...@2ndquadrant.com

On Fri, Feb 4, 2011 at 3:56 PM, Greg Smith g...@2ndquadrant.com wrote:

 PostgreSQL version?  If you're running on 8.3 or earlier, I would be
 suspicous that your Free Space Map has been overrun.

8.3

 What you are seeing is that the table itself is much larger on disk than
 it's supposed to be.

which part of the explain told you that ?

 shaun thomas

SELECT relpages*8/1024 FROM pg_class
 WHERE relname='fastadder_fastadderstatus';

458MB

way too big. build_cache is text between 500-1k chars

  That can be caused by frequent UPDATEs if you don't have vacuum cleanup
 working effectively, you'll get lots of dead sections left behind from
 UPDATEs in the middle.

ok, I just vacuumed it (did this manually a few times as well). and auto is
on.

still:
32840.000ms
and still 458MB

 The best way to fix all this is to run CLUSTER on the table.

http://www.postgresonline.com/journal/archives/10-How-does-CLUSTER-ON-improve-index-performance.html

now that would order the data on disk by id (primary key)
the usage of the table is either by a query or by position_in_queue which is
rewritten often (I might change this part of the app and pull it out of this
table)

is this definitely the best way to fix this ?

thanks for your help !

That will introduce a bit of downtime while it holds a lock on the table
 (only a few minutes based on what you've shown here), but the copy you'll
 have afterwards won't be spread all over disk anymore.

 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
 PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

Re: [PERFORM] Really really slow select count(*)

2011-02-04 Thread Kenneth Marshall

On Fri, Feb 04, 2011 at 05:20:27PM +0100, felix wrote:
 reply was meant for the list

 -- Forwarded message --
 From: felix crucialfe...@gmail.com
 Date: Fri, Feb 4, 2011 at 4:39 PM
 Subject: Re: [PERFORM] Really really slow select count(*)
 To: Greg Smith g...@2ndquadrant.com

 On Fri, Feb 4, 2011 at 3:56 PM, Greg Smith g...@2ndquadrant.com wrote:

  PostgreSQL version?  If you're running on 8.3 or earlier, I would be
  suspicous that your Free Space Map has been overrun.

 8.3

  What you are seeing is that the table itself is much larger on disk than
  it's supposed to be.

 which part of the explain told you that ?

  shaun thomas

 SELECT relpages*8/1024 FROM pg_class
  WHERE relname='fastadder_fastadderstatus';

 458MB

 way too big. build_cache is text between 500-1k chars

As has been suggested, you really need to CLUSTER the table
to remove dead rows. VACUUM will not do that, VACUUM FULL will
but will take a full table lock and then you would need to
REINDEX to fix index bloat. CLUSTER will do this in one shot.
You almost certainly have your free space map way too small,
which is how you bloated in the first place.

Cheers,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)


On 02/04/2011 10:17 AM, felix wrote:


 How big is this table when it's acting all bloated and ugly?

458MB


Wow! There's no way a table with 300k records should be that big unless 
it's just full of text. 70-seconds seems like a really long time to read 
half a gig, but that might be because it's fighting for IO with other 
processes.


For perspective, we have several 1-2 million row tables smaller than 
that. Heck, I have a 11-million row table that's only 30% larger.



are updates of the where id IN (1,2,3,4) generally not efficient ?
how about for select queries ?


Well, IN is notorious for being inefficient. It's been getting better, 
but even EXISTS is a better bet than using IN. We've got a lot of stuff 
using IN here, and we're slowly phasing it out. Every time I get rid of 
it, things get faster.



I actually just added most of those yesterday in an attempt to improve
performance. priority and agent_priority were missing indexes and that
was a big mistake.


Haha. Well, that can always be true. Ironically one of the things you 
actually did by creating the indexes is create fast lookup values to 
circumvent your table bloat. It would help with anything except sequence 
scans, which you saw with your count query.



ok,
built True is in the minority.


Ok, in that case, use a partial index. If a boolean value is only 1% of 
your table or something, why bother indexing the rest anyway?


CREATE INDEX fastadder_fastadderstatus_built
 ON fastadder_fastadderstatus
  WHERE built;

But only if it really is the vast minority. Check this way:

SELECT built, count(1)
   FROM fastadder_fastadderstatus
  GROUP BY 1;

We used one of these to ignore a status that was over 90% of the table, 
where the other statuses combined were less than 10%. The index was 10x 
smaller and much faster than before.


If you know both booleans are used together often, you can combine them 
into a single index, again using a partial where it only indexes if both 
values are true. Much smaller, much faster index if it's more selective 
than the other indexes.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)


On 02/04/2011 10:03 AM, felix wrote:


  max_fsm_pages   | 153600 | Sets the
maximum number of disk pages for which free space is tracked.
  max_fsm_relations   | 1000   | Sets the
maximum number of tables and indexes for which free space is tracked.

how do I determine the best size or if that's the problem ?


Well, the best way is to run:

vacuumdb -a -v -z vacuum.log

And at the end of the log, it'll tell you how many pages it wants, and 
how many pages were available.


 From the sounds of your database, 150k is way too small. If a single 
table is getting 10-50k updates per day, it's a good chance a ton of 
other tables are getting similar traffic. With max_fsm_pages at that 
setting, any update beyond 150k effectively gets forgotten, and 
forgotten rows aren't reused by new inserts or updates.


Your database has probably been slowly expanding for months without you 
realizing it. The tables that get the most turnover will be hit the 
hardest, as it sounds like what happened here.


You can stop the bloating by setting the right max_fsm_pages setting, 
but you'll either have to go through and VACUUM FULL every table in your 
database, or dump/restore to regain all the lost space and performance 
(the later would actually be faster). Before I even touch an older 
PostgreSQL DB, I set it to some value over 3-million just as a starting 
value to be on the safe side. A little used memory is a small price to 
pay for stopping gradual expansion.


Your reindex was a good idea. Indexes do sometimes need that. But your 
base tables need work too. Unless you're on 8.4 or above, auto_vacuum 
isn't enough.


Just to share an anecdote, I was with a company about five years ago and 
they also used the default max_fsm_pages setting. Their DB had expanded 
to 40GB and was filling their disk, only a couple weeks before 
exhausting it. I set the max_fsm_pages setting to 2-million, set up a 
bunch of scripts to vacuum-full the tables from smallest to largest (to 
make enough space for the larger tables, you see) and the database ended 
up at less than 20GB.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

On Fri, Feb 4, 2011 at 5:35 PM, Shaun Thomas stho...@peak6.com wrote:



 vacuumdb -a -v -z vacuum.log

 And at the end of the log, it'll tell you how many pages it wants, and how
 many pages were available.


this is the dev, not live. but this is after it gets done with that table:

CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing public.fastadder_fastadderstatus
INFO:  fastadder_fastadderstatus: scanned 2492 of 2492 pages, containing
154378 live rows and 0 dead rows; 3 rows in sample, 154378 estimated
total rows

and there's nothing at the end of the whole vacuum output about pages

actual command:

vacuumdb -U postgres -W -v -z djns4  vacuum.log

I tried it with all databases too

?

thanks

Re: [PERFORM] Really really slow select count(*)

On Fri, Feb 4, 2011 at 10:38 AM, felix crucialfe...@gmail.com wrote:


 On Fri, Feb 4, 2011 at 5:35 PM, Shaun Thomas stho...@peak6.com wrote:


 vacuumdb -a -v -z vacuum.log

 And at the end of the log, it'll tell you how many pages it wants, and how
 many pages were available.

 this is the dev, not live. but this is after it gets done with that table:
 CPU 0.00s/0.00u sec elapsed 0.00 sec.
 INFO:  analyzing public.fastadder_fastadderstatus
 INFO:  fastadder_fastadderstatus: scanned 2492 of 2492 pages, containing
 154378 live rows and 0 dead rows; 3 rows in sample, 154378 estimated
 total rows
 and there's nothing at the end of the whole vacuum output about pages
 actual command:
 vacuumdb -U postgres -W -v -z djns4  vacuum.log
 I tried it with all databases too

I believe you have to run it on the whole db to get that output.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

vacuumdb -a -v -z -U postgres -W  vacuum.log

that's all, isn't it ?

it did each db

8.3 in case that matters

the very end:

There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing public.seo_partnerlinkcategory
INFO:  seo_partnerlinkcategory: scanned 0 of 0 pages, containing 0 live
rows and 0 dead rows; 0 rows in sample, 0 estimated total rows



On Fri, Feb 4, 2011 at 6:40 PM, Scott Marlowe scott.marl...@gmail.comwrote:


  I tried it with all databases too

 I believe you have to run it on the whole db to get that output.

Re: [PERFORM] Really really slow select count(*)


On 02/04/2011 11:38 AM, felix wrote:


CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing public.fastadder_fastadderstatus
INFO: fastadder_fastadderstatus: scanned 2492 of 2492 pages,
containing 154378 live rows and 0 dead rows; 3 rows in sample,
154378 estimated total rows

and there's nothing at the end of the whole vacuum output about pages


I'm not sure if it gives it to you if you pick a single DB, but if you 
use -a for all, you should get something at the very end like this:


INFO:  free space map contains 1365918 pages in 1507 relations
DETAIL:  A total of 1326656 page slots are in use (including overhead).
1326656 page slots are required to track all free space.
Current limits are:  300 page slots, 3500 relations, using 38784 kB.
VACUUM

That's on our dev system. Your dev table seems properly sized, but prod 
probably isn't. If you run an all-database vacuum after-hours, you'll 
see the stuff at the end. And if your 'page slots are required' is 
greater than your 'page slots are in use,' you've got a problem.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

vacuumdb -a -v -z -U postgres -W  vacuum.log
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
Password:
cruxnu:nsbuildout crucial$

do you think its possible that it just doesn't have anything to complain
about ?
or the password is affecting it ?

In any case I'm not sure I want to run this even at night on production.

what is the downside to estimating max_fsm_pages too high ?

300 should be safe
its certainly not 150k

I have one very large table (10m) that is being analyzed before I warehouse
it.
that could've been the monster that ate the free map.
I think today I've learned that even unused tables affect postgres
performance.


and do you agree that I should turn CLUSTER ON ?
I have no problem to stop all tasks to this table at night and just reload
it



On Fri, Feb 4, 2011 at 6:47 PM, Shaun Thomas stho...@peak6.com wrote:

 On 02/04/2011 11:44 AM, felix wrote:

  the very end:

 There were 0 unused item pointers.
 0 pages are entirely empty.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.
 INFO:  analyzing public.seo_partnerlinkcategory
 INFO: seo_partnerlinkcategory: scanned 0 of 0 pages, containing 0 live
 rows and 0 dead rows; 0 rows in sample, 0 estimated total rows


 That looks to me like it didn't finish. Did you fork it off with '' or run
 it and wait until it gave control back to you?

 It really should be telling you how many pages it wanted, and are in use.
 If not, something odd is going on.


 --
 Shaun Thomas
 OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
 312-676-8870
 stho...@peak6.com

 __

 See  http://www.peak6.com/email_disclaimer.php
 for terms and conditions related to this email

Re: [PERFORM] Really really slow select count(*)

You can run vacuum verbose on just the postgres database and get the
global numbers at the end.  gotta be a superuser as well.

# \c postgres postgres
postgres=# vacuum verbose;
 lots deleted.
DETAIL:  A total of 7664 page slots are in use (including overhead).
7664 page slots are required to track all free space.
Current limits are:  1004800 page slots, 5000 relations, using 6426 kB.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)


On 02/04/2011 12:14 PM, felix wrote:


do you think its possible that it just doesn't have anything to
complain about ? or the password is affecting it ?


Why is it asking for the password over and over again? It shouldn't be 
doing that. And also, are you running this as a user with superuser 
privileges? You might want to think about setting up a .pgpass file, or 
setting up local trust for the postgres user so you can run maintenance 
without having to manually enter a password.



In any case I'm not sure I want to run this even at night on
production.


You should be. Even with auto vacuum turned on, all of our production 
systems get a nightly vacuum over the entire list of databases. It's non 
destructive, and about the only thing that happens is disk IO. If your 
app has times where it's not very busy, say 3am, it's a good time.


This is especially true since your free space map is behind.

We actually turn off autovacuum because we have a very transactionally 
intense DB, and if autovacuum launches on a table in the middle of the 
day, our IO totally obliterates performance. We only run a nightly 
vacuum over all the databases when very few users or scripts are using 
anything.



what is the downside to estimating max_fsm_pages too high ?


Nothing really. It uses more memory to track it, but on modern servers, 
it's not a concern. The only risk is that you don't know what the real 
setting should be, so you may not completely stop your bloating.



and do you agree that I should turn CLUSTER ON ?


Cluster isn't really something you turn on, but something you do. It's 
like vacuum full, in that it basically rebuilds the table and all 
indexes from scratch. The major issue you'll run into is that it 
reorders the table by the index you chose, so you'd best select the 
primary key unless you have reasons to use something else. And you have 
to do it table by table, which will really suck since we already know 
your whole db has bloated, not just one or two tables.


You're going to be doing some scripting, buddy. :) Well, unless you just 
do a dump/restore and start over with sane postgresql.conf settings.



I have no problem to stop all tasks to this table at night and just
reload it


That will work for this table. Just keep in mind all your tables have 
been suffering since you installed this database. Tables with the 
highest turnover were hit hardest, but they all have non-ideal sizes 
compared to what they would be if your maintenance was working.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

2011-02-04 Thread Greg Smith


felix wrote:

and do you agree that I should turn CLUSTER ON ?
I have no problem to stop all tasks to this table at night and just 
reload it


You don't turn it on; it's a one time operation that does a cleanup.  It 
is by far the easiest way to clean up the mess you have right now.  
Moving forward, if you have max_fsm_pages set to an appropriate number, 
you shouldn't end up back in this position again.  But VACUUM along 
won't get you out of there, and VACUUM FULL is always a worse way to 
clean this up than CLUSTER.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

On Fri, Feb 4, 2011 at 11:38 AM, Greg Smith g...@2ndquadrant.com wrote:
 You don't turn it on; it's a one time operation that does a cleanup.  It is
 by far the easiest way to clean up the mess you have right now.  Moving
 forward, if you have max_fsm_pages set to an appropriate number, you
 shouldn't end up back in this position again.  But VACUUM along won't get
 you out of there, and VACUUM FULL is always a worse way to clean this up
 than CLUSTER.

note that for large, randomly ordered tables, cluster can be pretty
slow, and you might want to do the old:

begin;
select * into temporaryholdingtable order by somefield;
truncate oldtable;
insert into oldtables select * from temporaryholdingtable;
commit;

for fastest performance.  I've had Cluster take hours to do that the
above does in 1/4th the time.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)


On 02/04/2011 01:01 PM, Scott Marlowe wrote:


begin;
select * into temporaryholdingtable order by somefield;
truncate oldtable;
insert into oldtables select * from temporaryholdingtable;
commit;


That's usually how I do it, except for larger tables, I also throw in a 
DROP INDEX for all the indexes on the table before the insert, and 
CREATE INDEX statements afterwards.


Which actually brings up a question I've been wondering to myself that I 
may submit to [HACKERS]: Can we add a a parallel option to the reindexdb 
command? We added one to pg_restore, so we already know it works.


I have a bunch of scripts that get all the indexes in the database and 
order them by size (so they're distributed evenly), round-robin them 
into separate REINDEX sql files, and launches them all in parallel 
depending on how many threads you want, but that's so hacky I feel dirty 
every time I use it.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

On Fri, Feb 4, 2011 at 7:34 PM, Shaun Thomas stho...@peak6.com wrote:

Why is it asking for the password over and over again? It shouldn't be
doing that.

because I asked it to: -W
on the production server I need to enter password and I'm testing on dev
first.

I just sudo tried it but still no report

and do you agree that I should turn CLUSTER ON ?

Cluster isn't really something you turn on, but something you do.

djns4=# cluster fastadder_fastadderstatus;
ERROR: there is no previously clustered index for table
fastadder_fastadderstatus

http://www.postgresonline.com/journal/archives/10-How-does-CLUSTER-ON-improve-index-performance.html

djns4=# alter table fastadder_fastadderstatus CLUSTER ON
fastadder_fastadderstatus_pkey; ALTER TABLE djns4=# CLUSTER
fastadder_fastadderstatus; CLUSTER

ok, that's why I figured I was turning something on. the table has been
altered.

it will be pk ordered, new entries always at the end and no deletes

but this means I have to manually run cluster from time to time, right ? not
that there will be much or any reordering. or it should be fine going
forward with vacuum and enlarging the free space memory map.

It's like vacuum full, in that it basically rebuilds the table and all
indexes from scratch. The major issue you'll run into is that it reorders
the table by the index you chose, so you'd best select the primary key
unless you have reasons to use something else. And you have to do it table
by table, which will really suck since we already know your whole db has
bloated, not just one or two tables.

do we know that ? many of the tables are fairly static.

only this one is seriously borked, and yet other related tables seem to be
fine.

You're going to be doing some scripting, buddy. :) Well, unless you just do
a dump/restore and start over with sane postgresql.conf settings.

well who knew the defaults were unsane ? :)

scripting this is trivial, I already have the script

I have made the mistake of doing VACUUM FULL in the past. in fact on this
table, and it had to be killed because it took down my entire website !
that may well be the major borking event. a credit to postgres that the
table still functions if that's the case.

scott marlowe:

begin;
select * into temporaryholdingtable order by somefield;
truncate oldtable;
insert into oldtables select * from temporaryholdingtable;
commit;

that sounds like a good approach.

gentlemen, 300,000 + thanks for your generous time !
(a small number, I know)

-felix

Re: [PERFORM] Really really slow select count(*)

On Fri, Feb 4, 2011 at 12:26 PM, felix crucialfe...@gmail.com wrote:
 I just sudo tried it but still no report

It's not about who you are in Unix / Linux, it's about who you are in
Postgresql.  \du will show you who is a superusr.  psql -U username
will let you connect as that user.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)


On 02/04/2011 01:26 PM, felix wrote:


because I asked it to: -W on the production server I need to enter
password and I'm testing on dev first.


Right. I'm just surprised it threw up the prompt so many times.


I just sudo tried it but still no report


Nono... you have to run the vacuum command with the -U for a superuser 
in the database. Like the postgres user.



but this means I have to manually run cluster from time to time, right ?
not that there will be much or any reordering.  or it should be fine
going forward with vacuum and enlarging the free space memory map.


It should be fine going forward. You only need to re-cluster if you want 
to force the table to remain in the order you chose, since it doesn't 
maintain the order for updates and new inserts. Since you're only doing 
it as a cleanup, that's not a concern for you.



do we know that ?  many of the tables are fairly static. only this
one is seriously borked, and yet other related tables seem to be
fine.


Probably not in your case. I just mean that any non-static table is 
going to have this problem. If you know what those are, great. I don't 
usually have that luxury, so I err on the side of assuming the whole DB 
is borked. :)


Also, here's a query you may find useful in the future. It reports the 
top 20 tables by size, but also reports the row counts and what not. 
It's a good way to find possibly bloated tables, or tables you could 
archive:


SELECT n.nspname AS schema_name, c.relname AS table_name,
   c.reltuples AS row_count,
   c.relpages*8/1024 AS mb_used,
   pg_total_relation_size(c.oid)/1024/1024 AS total_mb_used
  FROM pg_class c
  JOIN pg_namespace n ON (n.oid=c.relnamespace)
 WHERE c.relkind = 'r'
 ORDER BY total_mb_used DESC
 LIMIT 20;

The total_mb_used column is the table + all of the indexes and toast 
table space. The mb_used is just for the table itself. This will also 
help you see index bloat, or if a table has too much toasted data.



well who knew the defaults were unsane ? :)


Not really unsane, but for any large database, they're not ideal. This 
also goes for the default_statistics_target setting. If you haven't 
already, you may want to bump this up to 100 from the default of 10. Not 
enough stats can make the planner ignore indexes and other bad things, 
and it sounds like your DB is big enough to benefit from that.


Later versions have made 100 the default, so you'd just be catching up. :)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

ah right, duh.
yes, I did it as -U postgres, verified as a superuser

just now did it from inside psql as postgres

\c djns4
vacuum verbose analyze;

still no advice on the pages



On Fri, Feb 4, 2011 at 8:34 PM, Scott Marlowe scott.marl...@gmail.comwrote:

 On Fri, Feb 4, 2011 at 12:26 PM, felix crucialfe...@gmail.com wrote:
  I just sudo tried it but still no report

 It's not about who you are in Unix / Linux, it's about who you are in
 Postgresql.  \du will show you who is a superusr.  psql -U username
 will let you connect as that user.

Re: [PERFORM] Really really slow select count(*)


On 02/04/2011 01:59 PM, felix wrote:



still no advice on the pages


I think it just hates you.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Really really slow select count(*)

it probably has good reason to hate me.



ns= SELECT n.nspname AS schema_name, c.relname AS table_name,
ns-   c.reltuples AS row_count,
ns-   c.relpages*8/1024 AS mb_used,
ns-   pg_total_relation_size(c.oid)/1024/1024 AS total_mb_used
ns-  FROM pg_class c
ns-  JOIN pg_namespace n ON (n.oid=c.relnamespace)
ns-  WHERE c.relkind = 'r'
ns-  ORDER BY total_mb_used DESC
ns-  LIMIT 20;
 schema_name |table_name|  row_count  | mb_used |
total_mb_used
-+--+-+-+---
 public  | django_session   | 1.47843e+07 |4122 |
  18832
 public  | traffic_tracking2010 | 9.81985e+06 | 811 |
   1653
 public  | mailer_mailingmessagelog | 7.20214e+06 | 441 |
   1082
 public  | auth_user| 3.20077e+06 | 572 |
791
 public  | fastadder_fastadderstatus|  302479 | 458 |
693
 public  | registration_registrationprofile | 3.01345e+06 | 248 |
404
 public  | reporting_dp_6c93734c|  1.1741e+06 |  82 |
224
 public  | peoplez_contact  |   79759 |  18 |
221
 public  | traffic_tracking201101   | 1.49972e+06 | 163 |
204
 public  | reporting_dp_a3439e2a| 1.32739e+06 |  82 |
187
 public  | nsproperties_apthistory  |   44906 |  69 |
126
 public  | nsproperties_apt |   30780 |  71 |
125
 public  | clients_showingrequest   |   85175 |  77 |
103
 public  | reporting_dp_4ffe04ad|  330252 |  26 |
 63
 public  | fastadder_fastadderstatuslog |  377402 |  28 |
 60
 public  | nsmailings_officememotoagent |  268345 |  15 |
 52
 public  | celery_taskmeta  |5041 |  12 |
 32
 public  | mailer_messagelog|  168298 |  24 |
 32
 public  | datapoints_job   |9167 |  12 |
 23
 public  | fastadder_fastadderstatus_errors |  146314 |   7 |
 21

oh and there in the footnotes to django they say dont' forget to run the
delete expired sessions management every once in a while. thanks guys.

it won't run now because its too big, I can delete them from psql though

well just think how sprightly my website will run tomorrow once I fix these.




On Fri, Feb 4, 2011 at 9:00 PM, Shaun Thomas stho...@peak6.com wrote:

 On 02/04/2011 01:59 PM, felix wrote:


  still no advice on the pages


 I think it just hates you.


 --
 Shaun Thomas
 OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
 312-676-8870
 stho...@peak6.com

 __

 See  http://www.peak6.com/email_disclaimer.php
 for terms and conditions related to this email

Re: [PERFORM] Really really slow select count(*)