Re: [HACKERS] Performance Monitoring

2007-06-16 Thread Robert Treat
On Friday 15 June 2007 13:29, Greg Smith wrote:
 On Fri, 15 Jun 2007, Umar Farooq wrote:
  Surprisingly, no matter what type of query I execute, when I use strace
  to monitor the system calls generated they turn out to be the same for
  ALL sorts of queries.

 How are you calling strace?  The master postgres process forks off new
 processes for each of the clients; you need to make sure you include stats
 from all of them as well to get anything.

 I normally use

 strace -c -f -p [main postgres process]

 which is helpful to collect basic info, but even that's not quite right;
 you don't get the work done by the other processes (logger, writer, stats
 collector).  To get everything, you need to start some process that you
 attach strace to like this, then have that process start the server.

 I haven't found strace to be a great tool for this type of work.  You
 might take a look at systemtap instead (
 http://sourceware.org/systemtap/wiki ), which is a bit immature but is
 going in the right direction.

 I will now bow my head and wait for someone to suggest you move to an OS
 that supports dtrace.


You know that's what I was thinking :-)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Performance Monitoring

2007-06-15 Thread Umar Farooq
Hello All,

Recently, I have been involved in some work that requires me to monitor 
low-level performance counters for pgsql. Specifically, when I execute a 
particular query I want to be able to tell how many system calls get executed 
on behalf of that query and the time of each syscall. The idea is to provide a 
breakdown of the total query running time in terms of CPU time (user+system), 
IO (wait time + service time), and any other contributing factors.

I have been using various performance monitoring tools in Linux, including 
vmstat, mpstat, iostat and strace (to monitor system calls). Surprisingly, no 
matter what type of query I execute, when I use strace to monitor the system 
calls generated they turn out to be the same for ALL sorts of queries. Now 
this could happen because (a) the tool (strace) is not robust, or (b) 
something the pgsql code is doing eludes strace. At this point I have been 
unable to narrow it down. Any help in this regard will be greatly appreciated. 
Also, if somebody knows a better way of achieving the same goal, please let 
me know.

Thanks for reading.
-Umar
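One way to get the CPU (user+system) versus wait-time split described above, without strace at all, is to sample resource usage around a chunk of work. The sketch below is plain Python against the current process, not a PostgreSQL backend; the `cpu_profile` helper is illustrative, not any pgsql API:

```python
import resource
import time

# Hedged sketch: split a chunk of work into user CPU, system CPU, and
# wall time via getrusage(). Wall time minus CPU time approximates
# time spent waiting (e.g. on I/O or the scheduler).
def cpu_profile(fn):
    r0 = resource.getrusage(resource.RUSAGE_SELF)
    t0 = time.monotonic()
    fn()
    wall = time.monotonic() - t0
    r1 = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "wall": wall,
        "user": r1.ru_utime - r0.ru_utime,    # user CPU seconds
        "system": r1.ru_stime - r0.ru_stime,  # system CPU seconds
    }

prof = cpu_profile(lambda: sum(i * i for i in range(200000)))
print(sorted(prof))
```

The same idea applied to a real backend would mean reading that backend's counters (e.g. from /proc on Linux) before and after the query.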

Re: [HACKERS] Performance Monitoring

2007-06-15 Thread Greg Smith

On Fri, 15 Jun 2007, Umar Farooq wrote:

Surprisingly, no matter what type of query I execute, when I use strace 
to monitor the system calls generated they turn out to be the same for 
ALL sorts of queries.


How are you calling strace?  The master postgres process forks off new 
processes for each of the clients; you need to make sure you include stats 
from all of them as well to get anything.
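A minimal stand-alone illustration of that point (plain Python, nothing to do with PostgreSQL itself): the interesting work happens in a forked child, so a tracer watching only the parent pid sees almost nothing, which is why strace's -f flag matters.

```python
import os

# Parent forks a child that does the "real work", mirroring how the
# postmaster forks a backend per client connection.
pid = os.fork()
if pid == 0:
    # Child: this is where the per-query work would happen.
    total = sum(range(100000))
    os._exit(0 if total == 4999950000 else 1)

# Parent: just waits, doing no real work itself. A monitor attached
# only here would miss the child's activity entirely.
_, status = os.waitpid(pid, 0)
child_ok = os.WEXITSTATUS(status) == 0
print("child did the work:", child_ok)
```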


I normally use

strace -c -f -p [main postgres process]

which is helpful to collect basic info, but even that's not quite right; 
you don't get the work done by the other processes (logger, writer, stats 
collector).  To get everything, you need to start some process that you 
attach strace to like this, then have that process start the server.


I haven't found strace to be a great tool for this type of work.  You 
might take a look at systemtap instead ( 
http://sourceware.org/systemtap/wiki ), which is a bit immature but is 
going in the right direction.


I will now bow my head and wait for someone to suggest you move to an OS 
that supports dtrace.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Performance monitoring

2007-05-14 Thread Zdenek Kotala

Heikki Linnakangas wrote:

Jim C. Nasby wrote:




There are two counters for checkpoints in pgstats: the number of timed 
(triggered by checkpoint_timeout) and requested (triggered by 
checkpoint_segments) checkpoints.


Maybe we should improve the stats system so that we can collect events 
with timestamps and durations, but in my experience log files actually 
are the most reliable and universal way to collect real-time performance 
information. Any serious tool has a generic log parser. The other 
alternative is SNMP. I welcome the efforts on pgsnmpd..


What about adding some DTrace probes?

Zdenek




Re: [HACKERS] Performance monitoring

2007-05-13 Thread Heikki Linnakangas

Jim C. Nasby wrote:

Moving to -hackers.

On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
If you know when the checkpoint ended, and you know how long each of the 
pieces took, you can reconstruct the other times easily.  The way you 
describe this, it is true--that the summary is redundant given the 
detail--but if you put yourself in the shoes of a log file parser the 
other way around is easier to work with.  Piecing together log entries 
is a pain; splitting them is easy.


If I had to only keep one line out of this, it would be the one with the 
summary.  It would be nice to have it logged at INFO.
Yeah, if we have the summary line we don't need the other lines and vice 
versa. I have sympathy for parsing log files, I've done that a lot in 
the past and I can see what you mean. Having the individual lines is 
nice when you're monitoring a running system; you don't get the summary 
line until the checkpoint is finished. I suppose we can have both the 
individual lines and the summary, the extra lines shouldn't hurt anyone, 
and you won't get them unless you turn on the new log_checkpoints 
parameter anyway.


Not to beat a dead horse, but do we really want to force folks to be
parsing logs for performance monitoring? Especially if that log parsing
is just going to result in data being inserted into a table anyway?

I know there's concern about performance of the stats system and maybe
that needs to be addressed, but pushing users to log parsing is a lot of
extra effort, non-standard, likely to be overlooked, and doesn't play
well with other tools. It also conflicts with all the existing
statistics framework.


There are two counters for checkpoints in pgstats: the number of timed 
(triggered by checkpoint_timeout) and requested (triggered by 
checkpoint_segments) checkpoints.


Maybe we should improve the stats system so that we can collect events 
with timestamps and durations, but in my experience log files actually 
are the most reliable and universal way to collect real-time performance 
information. Any serious tool has a generic log parser. The other 
alternative is SNMP. I welcome the efforts on pgsnmpd..
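As a sketch of the generic log-parsing approach mentioned above (the log line below is made up for illustration; the actual log_checkpoints output format may differ):

```python
import re

# Hypothetical checkpoint summary line, for illustration only.
line = ("LOG:  checkpoint complete: wrote 3412 buffers; "
        "write=1.2 s, sync=0.4 s, total=1.7 s")

# Pull the numeric fields out of the summary line with named groups.
pattern = re.compile(
    r"wrote (?P<buffers>\d+) buffers; "
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s"
)
m = pattern.search(line)
stats = {k: float(v) for k, v in m.groupdict().items()}
print(stats)
```

A real tool would apply a pattern like this line by line and feed the extracted numbers into whatever store or graphing system it uses.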


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Performance monitoring

2007-05-13 Thread Magnus Hagander
Heikki Linnakangas wrote:
 Yeah, if we have the summary line we don't need the other lines and
 vice versa. I have sympathy for parsing log files, I've done that a
 lot in the past and I can see what you mean. Having the individual
 lines is nice when you're monitoring a running system; you don't get
 the summary line until the checkpoint is finished. I suppose we can
 have both the individual lines and the summary, the extra lines
 shouldn't hurt anyone, and you won't get them unless you turn on the
 new log_checkpoints parameter anyway.

 Not to beat a dead horse, but do we really want to force folks to be
 parsing logs for performance monitoring? Especially if that log parsing
 is just going to result in data being inserted into a table anyway?

 I know there's concern about performance of the stats system and maybe
 that needs to be addressed, but pushing users to log parsing is a lot of
 extra effort, non-standard, likely to be overlooked, and doesn't play
 well with other tools. It also conflicts with all the existing
 statistics framework.
 
 There are two counters for checkpoints in pgstats: the number of timed
 (triggered by checkpoint_timeout) and requested (triggered by
 checkpoint_segments) checkpoints.
 
 Maybe we should improve the stats system so that we can collect events
 with timestamps and durations, but in my experience log files actually
 are the most reliable and universal way to collect real-time performance
 information. Any serious tool has a generic log parser. The other
 alternative is SNMP. I welcome the efforts on pgsnmpd..

pgsnmpd can't provide any information that's not in the backend. Unless
we'd turn it into a log parser, which is not really something I think is
a good idea.

Log files are great for one kind of thing, live statistics through
SNMP or the statistics collector for another. It only goes wrong
when you put them in the wrong place. Things you poll regularly make a
lot more sense in some kind of live view than in a log file.

//Magnus



Re: [HACKERS] Performance monitoring

2007-05-13 Thread Jim C. Nasby
On Sun, May 13, 2007 at 07:54:20AM +0100, Heikki Linnakangas wrote:
 Maybe we should improve the stats system so that we can collect events 
 with timestamps and durations, but in my experience log files actually 
 are the most reliable and universal way to collect real-time performance 
 information. Any serious tool has a generic log parser. The other 
 alternative is SNMP. I welcome the efforts on pgsnmpd..

Having timing information in the stats system would be useful, but I'm
not sure how it could actually be done. But at least if the information
is in the stats system it's easy to programmatically collect and process.
SNMP is just one example of that (fwiw I agree with Magnus that it
probably doesn't make sense to turn pgsnmpd into a log parser...)
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



[HACKERS] Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)

2007-05-12 Thread Jim C. Nasby
Moving to -hackers.

On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
 If you know when the checkpoint ended, and you know how long each of the 
 pieces took, you can reconstruct the other times easily.  The way you 
 describe this, it is true--that the summary is redundant given the 
 detail--but if you put yourself in the shoes of a log file parser the 
 other way around is easier to work with.  Piecing together log entries 
 is a pain; splitting them is easy.
 
 If I had to only keep one line out of this, it would be the one with the 
 summary.  It would be nice to have it logged at INFO.
 
 Yeah, if we have the summary line we don't need the other lines and vice 
 versa. I have sympathy for parsing log files, I've done that a lot in 
 the past and I can see what you mean. Having the individual lines is 
 nice when you're monitoring a running system; you don't get the summary 
 line until the checkpoint is finished. I suppose we can have both the 
 individual lines and the summary, the extra lines shouldn't hurt anyone, 
 and you won't get them unless you turn on the new log_checkpoints 
 parameter anyway.

Not to beat a dead horse, but do we really want to force folks to be
parsing logs for performance monitoring? Especially if that log parsing
is just going to result in data being inserted into a table anyway?

I know there's concern about performance of the stats system and maybe
that needs to be addressed, but pushing users to log parsing is a lot of
extra effort, non-standard, likely to be overlooked, and doesn't play
well with other tools. It also conflicts with all the existing
statistics framework.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Performance monitoring

2007-05-12 Thread Joshua D. Drake



Not to beat a dead horse, but do we really want to force folks to be
parsing logs for performance monitoring? Especially if that log parsing
is just going to result in data being inserted into a table anyway?

I know there's concern about performance of the stats system and maybe
that needs to be addressed, but pushing users to log parsing is a lot of
extra effort, non-standard, likely to be overlooked, and doesn't play
well with other tools. It also conflicts with all the existing
statistics framework.



One thing that doesn't seem to be getting looked at is the cost of 
logging. Logging is very expensive. I don't know if it is more expensive 
than the stats system, but you can cut your tps in half by having any 
level of verbose logging on.


Yes, that can be offset by pushing the logging to another spindle and 
being careful about what you are logging, but still.


Either way, we are taking the hit, it is just a matter of where. IMO it 
would be better to have the information in the database where it makes 
sense, than pushing out to a log that:


A. Will likely be forgotten
B. Is only accessible if you have shell access to the machine (not as 
common as all of us would like to think)


Sincerely,

Joshua D. Drake


--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Performance monitoring

2007-05-12 Thread Greg Smith

On Sat, 12 May 2007, Joshua D. Drake wrote:

One thing that doesn't seem to be getting looked at is the cost of 
logging.


If any of this executed at something like the query level, sure, that 
would be really important.  The majority of the logging I suggested here is 
of things that happen at checkpoint time.  The presumption is that the 
overhead of that is considerably greater than writing a log line or even 
five.


The least intensive event I would like to be loggable is when a new WAL 
segment is created and cleared, which is again a pretty small bit of log 
compared to the 16MB write.  I wouldn't mind seeing that exposed under 
pg_stats instead, just had more interesting things to statify first.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Performance monitoring

2007-05-12 Thread Neil Conway
On Sat, 2007-05-12 at 14:26 -0700, Joshua D. Drake wrote:
 Either way, we are taking the hit, it is just a matter of where. IMO it 
 would be better to have the information in the database where it makes 
 sense, than pushing out to a log

If performance monitoring information is provided as a database object,
what would the right interface be? IMHO the problem with cleanly
presenting monitoring information within a normal database system is
that this sort of data is fundamentally dynamic and continuous: to
determine how the performance of the system changes over time, you need
to repeatedly rescan the table/view/SRF and recompute your analysis
essentially from scratch. Trying to get even simple information like
queries per second from pg_stat_activity is an example of how this can
be painful.
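The polling pattern Neil describes can be sketched as follows: stats views expose monotonically growing counters, so a rate like queries-per-second has to be computed by sampling twice and differencing. `FakeCounter` below is a stub standing in for a query against a stats view, not a real API:

```python
import time

class FakeCounter:
    """Stub for a monotonically growing stats counter."""
    def __init__(self):
        self.total = 0

    def read(self):
        self.total += 50  # pretend 50 queries completed since last poll
        return self.total

counter = FakeCounter()
# Sample the counter twice with timestamps, then difference the two
# snapshots to recover a rate -- the analysis is redone from scratch
# on every polling interval.
c0, t0 = counter.read(), time.monotonic()
time.sleep(0.01)
c1, t1 = counter.read(), time.monotonic()
qps = (c1 - c0) / (t1 - t0)
print(qps > 0)
```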

<plug>
BTW, if the system included the concept of a continuous data *stream* as
a kind of database object, this problem would be much more tractable :)
In fact, there is some code in a version of TelegraphCQ that exposes
various information about the runtime state of the system as a set of
system-defined data streams -- like any other stream, users could then
use those streams in arbitrary queries.
</plug>

-Neil


