Re: [HACKERS] postgresql and process titles

2006-06-17 Thread Jim Nasby

Moving to osdldbt-general and dropping Tom and Marc.

On Jun 13, 2006, at 1:18 PM, Kris Kennaway wrote:

On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:


Unless supersmack has improved substantially, you're unlikely to find
much interest. Last I heard it was a pretty brain-dead benchmark.  
DBT2/3
(http://sourceforge.net/projects/osdldbt) is much more realistic  
(based

on TPC-C and TPC-H).


Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
least) will need a moderate amount of hacking - there are some Linux
assumptions in the source and the configure script makes assumptions
about where things are installed that cannot be overridden on the
commandline.


Yeah, there's a number of issues that would need to be addressed. The  
biggest problem as I see it is that all the framework to run tests is  
pretty tied together, without any modularity. The run script itself  
launches all the stats collecting stuff, assumes it needs to happen  
on the local machine, etc.


There is consensus that this kind of thing needs to be done to  
improve dbt*, it's just a matter of doing it. Also, it's desirable to  
improve the benchmarks so that they can store results in a database,  
and set them up so that the monitoring stuff is common to all the  
different tests.

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] postgresql and process titles

2006-06-17 Thread Jim Nasby

On Jun 13, 2006, at 9:42 PM, Kris Kennaway wrote:
BTW, there's another FBSD performance odditiy I've run across.  
Running


pg_dump -t email_contrib -COx stats | bzip2  ec.sql.bz2 

which dumps the email_contrib table to bzip2 then to disk, the OS
won't use more than 1 CPU on an SMP system... unless the data is
cached. According to both gstat and systat -v, the system isn't I/O
bound; both are reporting the RAID10 with that table on it as only
about 10% busy. If I let that command run for a bit then cancel it
and re-start it so that the beginning of that table is in cache, it
will use one entire CPU for bzip2, which is what I'd expect to  
happen.


Hmm, odd..maybe something with the scheduler.  I'd need access to a
test case to be able to figure it out though.


well, pg_dump of any sizeable database piped through bzip2 or gzip  
should show this. Try:


pg_dump large_database | bzip2  databasedump.sql.bz2
--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] postgresql and process titles

2006-06-16 Thread Rod Taylor
  I did have dbt2 pretty close to functional on FreeBSD a year ago but
  it's probably gone back into linuxisms since then.
 
 :(
 
 I won't have the chance to work on this further for another 2 months,
 but if you have patches I could see about picking up on them when I
 get back.

Everything has been applied to the dbt2 tree.

-- 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] postgresql and process titles

2006-06-16 Thread Kris Kennaway
On Thu, Jun 15, 2006 at 11:34:52PM -0400, Rod Taylor wrote:
 On Tue, 2006-06-13 at 14:18 -0400, Kris Kennaway wrote:
  On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
  
   Unless supersmack has improved substantially, you're unlikely to find
   much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
   (http://sourceforge.net/projects/osdldbt) is much more realistic (based
   on TPC-C and TPC-H).
  
  Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
  least) will need a moderate amount of hacking - there are some Linux
  assumptions in the source and the configure script makes assumptions
  about where things are installed that cannot be overridden on the
  commandline.
 
 I did have dbt2 pretty close to functional on FreeBSD a year ago but
 it's probably gone back into linuxisms since then.

:(

I won't have the chance to work on this further for another 2 months,
but if you have patches I could see about picking up on them when I
get back.

Kris



pgpseP2tfCg1z.pgp
Description: PGP signature


Re: [HACKERS] postgresql and process titles

2006-06-16 Thread Kris Kennaway
On Fri, Jun 16, 2006 at 07:56:30AM -0400, Rod Taylor wrote:
   I did have dbt2 pretty close to functional on FreeBSD a year ago but
   it's probably gone back into linuxisms since then.
  
  :(
  
  I won't have the chance to work on this further for another 2 months,
  but if you have patches I could see about picking up on them when I
  get back.
 
 Everything has been applied to the dbt2 tree.

Cool!

Kris


pgpv6XcmPo9Br.pgp
Description: PGP signature


Re: [HACKERS] postgresql and process titles

2006-06-15 Thread Kris Kennaway
On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:

   Can you provide the actual commands you used to setup and run the test?
  
  I actually forget all the steps I needed to do to get super-smack
  working with postgresql since it required a lot of trial and error for
  a database newbie like me (compiling it from the
  benchmarks/super-smack port was trivial, but unlike mysql it required
  configuring the database by hand - this should hopefully be more
  obvious to someone familiar with pgsql though).
  
  It would be great if someone on your end could make this easier, BTW -
  e.g. at least document the steps.  Also super-smack should be changed
  to allow use via a local socket with pgsql (this is the default with
  mysql) - this avoids measuring network stack overhead.
  
 Unless supersmack has improved substantially, you're unlikely to find
 much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
 (http://sourceforge.net/projects/osdldbt) is much more realistic (based
 on TPC-C and TPC-H).

Thanks for the reference, I'll check it out.  Brain-dead or not though
(and I've also seen evidence that it is, e.g. it does a hell of a lot
of 1-byte reads), it's a well known benchmark that seems to be fairly
widely used, so there's still some value in making it perform well
(but not at the expense of other things).  However my interest is not
so much for measuring database performance as for measuring kernel
performance and trying to optimize bottlenecks.

FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.
   
   Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
   of procs in either semwait or semlock. :(
  
  Part of that is Giant contention again; for example on 6.x semop() and
  setproctitle() both want to acquire it, so they'll fight with each
  other and with anything else on the system that wants Giant
  (e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
  issue here going forward, but there is still as much lock contention
  just between semop() calls running on different CPUs.  It may be
  possible for someone to implement more fine-grained locking here, but
  I don't know if there is available interest.
 
 FWIW, getting turning off stats_command_string substantially reduced
 this contention, so it appears the issue is somewhere in the stats code.
 This code sends stats messages to a different process via a socket (or
 is it UDP?), with the intention that if the system gets heavily loaded
 we'll lose some stats in the interest of not bogging down all the
 backends. It seems that doesn't work so hot on FreeBSD. :(

It could be it's making assumptions about what is cheap and what is
not, which are not true universally.  e.g. you might be adding a new
contention point that is even worse.  I didn't notice a change in
behaviour as I varied the load, so perhaps it was not visible on this
benchmark or I wasn't looking in the right place.

Kris


pgp1IoeCchkVo.pgp
Description: PGP signature


Re: [HACKERS] postgresql and process titles

2006-06-15 Thread Kris Kennaway
On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:

 Unless supersmack has improved substantially, you're unlikely to find
 much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
 (http://sourceforge.net/projects/osdldbt) is much more realistic (based
 on TPC-C and TPC-H).

Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
least) will need a moderate amount of hacking - there are some Linux
assumptions in the source and the configure script makes assumptions
about where things are installed that cannot be overridden on the
commandline.

Kris


pgpMykh0d2JDQ.pgp
Description: PGP signature


Re: [HACKERS] postgresql and process titles

2006-06-15 Thread Kris Kennaway
On Tue, Jun 13, 2006 at 03:55:38PM -0500, Jim Nasby wrote:
 
 On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:
 FYI, the biggest source of contention is via semop() - it might be
 possible to optimize that some more in FreeBSD, I don't know.
 
 Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with  
 a lot
 of procs in either semwait or semlock. :(
 
 Part of that is Giant contention again; for example on 6.x semop() and
 setproctitle() both want to acquire it, so they'll fight with each
 other and with anything else on the system that wants Giant
 (e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
 issue here going forward, but there is still as much lock contention
 just between semop() calls running on different CPUs.  It may be
 possible for someone to implement more fine-grained locking here, but
 I don't know if there is available interest.
 
 BTW, there's another FBSD performance odditiy I've run across. Running
 
 pg_dump -t email_contrib -COx stats | bzip2  ec.sql.bz2 
 
 which dumps the email_contrib table to bzip2 then to disk, the OS  
 won't use more than 1 CPU on an SMP system... unless the data is  
 cached. According to both gstat and systat -v, the system isn't I/O  
 bound; both are reporting the RAID10 with that table on it as only  
 about 10% busy. If I let that command run for a bit then cancel it  
 and re-start it so that the beginning of that table is in cache, it  
 will use one entire CPU for bzip2, which is what I'd expect to happen.

Hmm, odd..maybe something with the scheduler.  I'd need access to a
test case to be able to figure it out though.

Kris


pgpG0gnuPegHa.pgp
Description: PGP signature


Re: [HACKERS] postgresql and process titles

2006-06-15 Thread Rod Taylor
On Tue, 2006-06-13 at 14:18 -0400, Kris Kennaway wrote:
 On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
 
  Unless supersmack has improved substantially, you're unlikely to find
  much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
  (http://sourceforge.net/projects/osdldbt) is much more realistic (based
  on TPC-C and TPC-H).
 
 Have you tried to compile this on FreeBSD?  It looks like it (dbt1 at
 least) will need a moderate amount of hacking - there are some Linux
 assumptions in the source and the configure script makes assumptions
 about where things are installed that cannot be overridden on the
 commandline.

I did have dbt2 pretty close to functional on FreeBSD a year ago but
it's probably gone back into linuxisms since then.
-- 


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Martijn van Oosterhout
On Tue, Jun 13, 2006 at 10:36:27PM -0400, Tom Lane wrote:
 * not clear whether the interrupt happens when waiting for I/O.  I
 already mentioned that this would be a problem for EXPLAIN ANALYZE,
 but it might be no big deal for the other uses.

If we're going y UNIX signal semantics, in theory any interruptable
function (as listed on the signal manpage) would be interrupted every N
milliseconds to handle the interrupt. So yes, if possible the interrupt
will be serviced anyway, even during I/O, by aborting a restarting the
I/O continuously. Not terribly efficient.

With one big exception: sometimes I/O is non-interruptable (the good
old D state). In this case the interrupts will happen but will simply
be queued and in fact will all be dropped except the last one
(non-realtime signals are never stacked). The handler will probably be
called the moment it returns to user-space.

Note: Do we want SIGPROF, SIGALRM or SIGVTALRM? There's apparently a
distinction, see: http://en.wikipedia.org/wiki/SIGPROF

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Greg Stark

Martijn van Oosterhout kleptog@svana.org writes:

 With one big exception: sometimes I/O is non-interruptable (the good
 old D state). In this case the interrupts will happen but will simply
 be queued and in fact will all be dropped except the last one
 (non-realtime signals are never stacked). The handler will probably be
 called the moment it returns to user-space.

If backends store their current status in shared memory then a separate
process entirely can receive the interrupts, scan through the shared memory
process states and do the accounting. It would never be interuppting i/o since
that process would never be doing i/o. It could use real-time signals, reads
on /dev/rtc, or whatever other methods exist for scheduling periodic events
since it would be doing nothing but handling these interrupts.

The neat thing about this is that it would be possible to look at a process
from outside instead of having to hijack the application to get feedback. You
could find information like total time spent in i/o versus cpu aggregated
across all queries from a single backend or for all backends.

The downside is that to handle things like EXPLAIN ANALYZE you would have to
provide enough information about plans to this accounting process for it to
store the information. Plans are currently purely local to the backend running
them.

Perhaps it would be enough to provide a unique id for the plan (sequential id
per backend would be enough). And a unique id for the plan node. The
accounting process wouldn't know anything more about what that node
represented but the backend could later query to fetch the information and
associate it with the plan.

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 If backends store their current status in shared memory then a separate
 process entirely can receive the interrupts, scan through the shared memory
 process states and do the accounting.

This sounds good until you think about locking.  It'd be quite
impractical to implement anything as fine-grained as EXPLAIN ANALYZE
this way, because of the overhead involved in taking and releasing
spinlocks.

It could be practical as a replacement for stats_command_string
messages, though.

I'm not sure about replacing ps_status with this.  I don't think there
is a way for one process to set another's status (on most platforms
anyway).  You might argue that we could abandon ps_status reporting
altogether if we had something better, but I'm unconvinced ...

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 Greg Stark [EMAIL PROTECTED] writes:
  If backends store their current status in shared memory then a separate
  process entirely can receive the interrupts, scan through the shared memory
  process states and do the accounting.
 
 This sounds good until you think about locking.  It'd be quite
 impractical to implement anything as fine-grained as EXPLAIN ANALYZE
 this way, because of the overhead involved in taking and releasing
 spinlocks.

I'm not entirely convinced. The only other process that would be looking at
the information would be the statistics accumulator which would only be waking
up every 100ms or so. There would be no contention with other backends
reporting their info.

I was also thinking of reporting only basic information like which of a small
number of states the backend was in and which node of the plan is being
executed. If those are just integers then it might be possible to get away
without locking at all, storing them as sig_atomic_t. 

I think there would still be a fair amount of cache coherency overhead like we
see with the tas instruction on Xeons but only if that backend's monitor block
is actually being touched by the statistics accumulator process which would
presumably be only when running an EXPLAIN ANALYZE or other tracing facility.

-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 This sounds good until you think about locking.  It'd be quite
 impractical to implement anything as fine-grained as EXPLAIN ANALYZE
 this way, because of the overhead involved in taking and releasing
 spinlocks.

 I'm not entirely convinced. The only other process that would be looking at
 the information would be the statistics accumulator which would only be waking
 up every 100ms or so. There would be no contention with other backends
 reporting their info.

The numbers I've been looking at lately say that heavy lock traffic is
expensive, particularly on SMP machines, even with zero contention.
Seems the cache coherency protocol costs a lot even when it's not doing
anything...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Jim C. Nasby
On Wed, Jun 14, 2006 at 03:51:28PM -0400, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] writes:
  This sounds good until you think about locking.  It'd be quite
  impractical to implement anything as fine-grained as EXPLAIN ANALYZE
  this way, because of the overhead involved in taking and releasing
  spinlocks.
 
  I'm not entirely convinced. The only other process that would be looking at
  the information would be the statistics accumulator which would only be 
  waking
  up every 100ms or so. There would be no contention with other backends
  reporting their info.
 
 The numbers I've been looking at lately say that heavy lock traffic is
 expensive, particularly on SMP machines, even with zero contention.
 Seems the cache coherency protocol costs a lot even when it's not doing
 anything...

Are there any ways we could avoid the locking?

One idea would be to keep something akin to a FIFO, where the backend
would write records instead of updating/over-writing them, and the
reader process would only read records where there was no risk that they
were still being written. That would mean that the reader would need to
stay at least one record behind the backend, but that's probably
manageable.

The downside is more memory usage, but I think we could limit that by
also keeping track of what record the reader had last read. The backend
would check that, and if the reader was more than X records behind, the
backend would update the most recent record it had written, instead of
writing a new one. That would place an effective limit on how much
memory was consumed.

But... I have no idea how exactly shared memory works, so maybe this
plan is fundamentally broken. But hopefully there's some way to handle
the locking problems, because a seperate reader process does sound like
an interesting possibility.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Martijn van Oosterhout
On Wed, Jun 14, 2006 at 03:21:55PM -0500, Jim C. Nasby wrote:
 One idea would be to keep something akin to a FIFO, where the backend
 would write records instead of updating/over-writing them, and the
 reader process would only read records where there was no risk that they
 were still being written. That would mean that the reader would need to
 stay at least one record behind the backend, but that's probably
 manageable.

The problem with a FIFO is that the pointers controlling where the
start/end are become the contention.

The only thing I can think of is to have the backends only write atomic
types and give each backend their own cache-line. The profiler would
simply wake up every N ms and read each value, accumulating them
somewhere.

No locking and there would be a maximum of one cache-line bounce per
backend per N ms. I'm not sure you can get better than that under the
situation.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Greg Stark

Jim C. Nasby [EMAIL PROTECTED] writes:

  The numbers I've been looking at lately say that heavy lock traffic is
  expensive, particularly on SMP machines, even with zero contention.
  Seems the cache coherency protocol costs a lot even when it's not doing
  anything...
 
 Are there any ways we could avoid the locking?

Well if all we want to do is reproduce the current functionality of EXPLAIN
ANALYZE, all you need is a single sig_atomic_t int that you store the address
of the current node in. When you're done with the query you ask the central
statistics accumulator for how the counts of how many times it saw each node
address.

But that's sort of skipping some steps. The reason tas is slow is because it
needs to ensure cache coherency. Even just storing an int into shared memory
when its cache line is owned by the processor running the stats accumulator
would will have the same overhead.

That said I don't think it would actually be as much. TAS has to ensure no
other processor overwrites the data from this processor. Whereas merely
storing into a sig_atomic_t just has to ensure that you don't get a partially
written integer.

And it seems sort of silly to be doing this much work just to reproduce the
current functionality even if it's lower overhead. The big advantage of a
separate process would be that it could do much more. But that would require
pushing some representation of the plan out to the stats collector, storing
more information about the state than just what node you're at currently such
as whether you're in the midst of index lookups, sequential scans, operator
and function execution, etc.

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Jim C. Nasby
On Wed, Jun 14, 2006 at 10:37:59PM +0200, Martijn van Oosterhout wrote:
 On Wed, Jun 14, 2006 at 03:21:55PM -0500, Jim C. Nasby wrote:
  One idea would be to keep something akin to a FIFO, where the backend
  would write records instead of updating/over-writing them, and the
  reader process would only read records where there was no risk that they
  were still being written. That would mean that the reader would need to
  stay at least one record behind the backend, but that's probably
  manageable.
 
 The problem with a FIFO is that the pointers controlling where the
 start/end are become the contention.
 
Even if what's being written is a single byte/word and we're taking
intentional steps to make sure that reading a pointer that's off by 1
isn't an issue? Is the issue that if CPU A writes a pointer and CPU B
immediately tries to read it that it could get old data from it's cache?

 The only thing I can think of is to have the backends only write atomic
 types and give each backend their own cache-line. The profiler would
 simply wake up every N ms and read each value, accumulating them
 somewhere.
 
Would everything we need fit in a cache line?

 No locking and there would be a maximum of one cache-line bounce per
 backend per N ms. I'm not sure you can get better than that under the
 situation.

What scheduler timer resolution do modern OSes use? It's likely that on
a system with more runnable processes than processors, each process will
only run for N ms anyway...

Speaking of which... doesn't the variables involved in scheduling mean
we couldn't use sampling to calculate cost in EXPLAIN ANALYZE? Or is the
assumption just that it will affect all nodes evenly?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Well if all we want to do is reproduce the current functionality of EXPLAIN
 ANALYZE, all you need is a single sig_atomic_t int that you store the address
 of the current node in.

Do I need to point out that sig_atomic_t can't portably be assumed to be
wider than char?

We do currently assume that TransactionId can be read/written
atomically, but (a) that's an int not a pointer, and (b) the assumption
is cruft that we really ought to get rid of.

In any case, speculating about whether we can do something useful with
atomic types ignores the main real problem the thread is about.  Anybody
remember process titles and current command strings?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] postgresql and process titles

2006-06-14 Thread Greg Stark

Jim C. Nasby [EMAIL PROTECTED] writes:

 Even if what's being written is a single byte/word and we're taking
 intentional steps to make sure that reading a pointer that's off by 1
 isn't an issue? Is the issue that if CPU A writes a pointer and CPU B
 immediately tries to read it that it could get old data from it's cache?

Well for applications like this you don't care whether you get the old data or
the new data. You just want to be sure you don't get half of each.

However:

Tom Lane [EMAIL PROTECTED] writes:

 Do I need to point out that sig_atomic_t can't portably be assumed to be
 wider than char?

I didn't actually realize that. That would mean EXPLAIN ANALYZE would be
limited to counting times for 256 plan nodes and wouldn't really be much of an
improvement over the existing infrastructure.

 In any case, speculating about whether we can do something useful with
 atomic types ignores the main real problem the thread is about.  Anybody
 remember process titles and current command strings?

Oops, I thought this was the ANALYZE patch thread.

-- 
greg


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Kris Kennaway
On Mon, Jun 12, 2006 at 10:08:22AM -0500, Jim C. Nasby wrote:
 On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
  On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
   On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
Kris Kennaway [EMAIL PROTECTED] writes:
 On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
 Let's see the evidence.

 The calls to setproctitle() (it looks like 4 setproctitle syscalls per
 DB query) are causing contention on the Giant lock 25% of the time on
 a dual p4 + HTT.  Disabling process title setting completely gives an
 8% peak performance boost to the super-smack select benchmark.

I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case?  And what did you do to disable process title
setting completely?

The reason I'm being doubting Thomas here is that I've never seen any
   
   Ba-da-bum!
   
indication on any other platform that ps_status is a major bottleneck.
Now maybe FreeBSD really sucks, or maybe you're onto something of
interest, but let's see the proof in a form that someone else can
check and reproduce.
   
   It's also important to find out what version of FreeBSD this is. A lot
   of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
   possible this isn't an issue in newer versions.
  
 Can you provide the actual commands you used to setup and run the test?

I actually forget all the steps I needed to do to get super-smack
working with postgresql since it required a lot of trial and error for
a database newbie like me (compiling it from the
benchmarks/super-smack port was trivial, but unlike mysql it required
configuring the database by hand - this should hopefully be more
obvious to someone familiar with pgsql though).

It would be great if someone on your end could make this easier, BTW -
e.g. at least document the steps.  Also super-smack should be changed
to allow use via a local socket with pgsql (this is the default with
mysql) - this avoids measuring network stack overhead.

The only thing I had to change on FreeBSD was to edit the
select-key.smack and change localhost to 127.0.0.1 in two
locations to avoid possibly using IPv6 transport.

 This would allow others to duplicate your results and debug the
 situation on their own. This is also important because we've generally
 found HTT to be a loss (except on Windows), so it'd be good to see what
 impact this has on AMD hardware. It would also be very useful to have
 the raw test result numbers you obtained.

They were posted previously.  This is Intel hardware (AMD doesn't do
HTT), and I didn't retest without HTT.  I'll try to do so if I have
the time (however previously when profiling mysql, HTT did give a small
positive change).

  It's still true in 6.x and 7.x.  I have a patch that removes Giant
  from the sysctl in question, and I have also removed it from another
  relevant part of the kernel (semop() is bogusly acquiring Giant when
  it is supposed to be mpsafe).
  
 What affect did that patch have on the numbers? And where is it, in case
 anyone here wants to try it?

I didn't yet retest with the patch.  It's in my perforce branch:

  
http://perforce.freebsd.org/changeList.cgi?FSPC=//depot/user/kris/contention/...

although you probably need a combination of the changes in order for
it to be usable.

  When it's possible to commit that patch (should be in time for 7.0,
  but not sure if it will make it into 6.2) it will eliminate the worst
  part of the problem, but it still leaves postgresql making thousands
  of syscalls per second to flip its process titles back and forth,
  which needs to be looked at carefully for a performance impact.  For
  now, users of FreeBSD who want that 8% should turn it off though (or
  maybe one of the alternative methods is usable).
 
 We have a similar issue internally with stats_command_string. The issue
 is that it's very helpful to be able to see what a 'long running' (more
 than a second or so) statement is doing, but there's very little value
 in doing all that work for a statement that's only going to run for a
 few ms. If there's a very cheap way to set some kind of a timer that
 would update this information once a statement's been around long enough
 that might be a way to handle this (I don't know if we're already using
 ALRM in the backend, or if that's a cheap enough solution).

I don't know what the best way to implement it would be, but limiting
the frequency of these updates does seem to be the way to go.

  FYI, the biggest source of contention is via semop() - it might be
  possible to optimize that some more in FreeBSD, I don't know.
 
 Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
 of procs in either semwait or semlock. :(

Part of that is Giant contention again; for example on 6.x 

Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Jim C. Nasby
On Mon, Jun 12, 2006 at 11:38:01AM -0400, Kris Kennaway wrote:
 On Mon, Jun 12, 2006 at 10:08:22AM -0500, Jim C. Nasby wrote:
  On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
   On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
 Kris Kennaway [EMAIL PROTECTED] writes:
  On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
  Let's see the evidence.
 
  The calls to setproctitle() (it looks like 4 setproctitle syscalls 
  per
  DB query) are causing contention on the Giant lock 25% of the time 
  on
  a dual p4 + HTT.  Disabling process title setting completely gives 
  an
  8% peak performance boost to the super-smack select benchmark.
 
 I think you misunderstood me: I asked for evidence, not 
 interpretation.
 What are you measuring, and with what tool, and what are the numbers?
 On what benchmark case?  And what did you do to disable process title
 setting completely?
 
 The reason I'm being doubting Thomas here is that I've never seen any

Ba-da-bum!

 indication on any other platform that ps_status is a major bottleneck.
 Now maybe FreeBSD really sucks, or maybe you're onto something of
 interest, but let's see the proof in a form that someone else can
 check and reproduce.

It's also important to find out what version of FreeBSD this is. A lot
of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
possible this isn't an issue in newer versions.
   
  Can you provide the actual commands you used to setup and run the test?
 
 I actually forget all the steps I needed to do to get super-smack
 working with postgresql since it required a lot of trial and error for
 a database newbie like me (compiling it from the
 benchmarks/super-smack port was trivial, but unlike mysql it required
 configuring the database by hand - this should hopefully be more
 obvious to someone familiar with pgsql though).
 
 It would be great if someone on your end could make this easier, BTW -
 e.g. at least document the steps.  Also super-smack should be changed
 to allow use via a local socket with pgsql (this is the default with
 mysql) - this avoids measuring network stack overhead.
 
Unless supersmack has improved substantially, you're unlikely to find
much interest. Last I heard it was a pretty brain-dead benchmark. DBT2/3
(http://sourceforge.net/projects/osdldbt) is much more realistic (based
on TPC-C and TPC-H).
   FYI, the biggest source of contention is via semop() - it might be
   possible to optimize that some more in FreeBSD, I don't know.
  
  Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
  of procs in either semwait or semlock. :(
 
 Part of that is Giant contention again; for example on 6.x semop() and
 setproctitle() both want to acquire it, so they'll fight with each
 other and with anything else on the system that wants Giant
 (e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
 issue here going forward, but there is still as much lock contention
 just between semop() calls running on different CPUs.  It may be
 possible for someone to implement more fine-grained locking here, but
 I don't know if there is available interest.

FWIW, getting turning off stats_command_string substantially reduced
this contention, so it appears the issue is somewhere in the stats code.
This code sends stats messages to a different process via a socket (or
is it UDP?), with the intention that if the system gets heavily loaded
we'll lose some stats in the interest of not bogging down all the
backends. It seems that doesn't work so hot on FreeBSD. :(
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Bruce Momjian
Jim C. Nasby wrote:
 FWIW, getting turning off stats_command_string substantially reduced
 this contention, so it appears the issue is somewhere in the stats code.
 This code sends stats messages to a different process via a socket (or
 is it UDP?), with the intention that if the system gets heavily loaded
 we'll lose some stats in the interest of not bogging down all the
 backends. It seems that doesn't work so hot on FreeBSD. :(

I am working on a patch for 8.2 to fix that for all platforms.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Jim C. Nasby
On Tue, Jun 13, 2006 at 02:10:15PM -0400, Bruce Momjian wrote:
 Jim C. Nasby wrote:
  FWIW, getting turning off stats_command_string substantially reduced
  this contention, so it appears the issue is somewhere in the stats code.
  This code sends stats messages to a different process via a socket (or
  is it UDP?), with the intention that if the system gets heavily loaded
  we'll lose some stats in the interest of not bogging down all the
  backends. It seems that doesn't work so hot on FreeBSD. :(
 
 I am working on a patch for 8.2 to fix that for all platforms.

Excellent. Did I miss discussion of that or have you not mentioned it?
I'm curious as to how you're fixing it...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Bruce Momjian
Jim C. Nasby wrote:
 On Tue, Jun 13, 2006 at 02:10:15PM -0400, Bruce Momjian wrote:
  Jim C. Nasby wrote:
   FWIW, getting turning off stats_command_string substantially reduced
   this contention, so it appears the issue is somewhere in the stats code.
   This code sends stats messages to a different process via a socket (or
   is it UDP?), with the intention that if the system gets heavily loaded
   we'll lose some stats in the interest of not bogging down all the
   backends. It seems that doesn't work so hot on FreeBSD. :(
  
  I am working on a patch for 8.2 to fix that for all platforms.
 
 Excellent. Did I miss discussion of that or have you not mentioned it?
 I'm curious as to how you're fixing it...

The patches are in the hold queue:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Title is Stats collector performance improvement.  I need someone to
test my patches on a non-BSD platform to move forward.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Jim C. Nasby wrote:
 Excellent. Did I miss discussion of that or have you not mentioned it?
 I'm curious as to how you're fixing it...

 The patches are in the hold queue:
   http://momjian.postgresql.org/cgi-bin/pgpatches_hold

That's talking about the stats code, which has approximately zip to do
with setproctitle (ps_status.c).  But IIRC that patch is on hold because
nobody particularly liked the approach it's taking.  I think we should
investigate rewriting the stats communication architecture entirely ---
in particular, do we really need the stats buffer process at all?  It'd
be interesting to see what happens if we just make the collector process
read the UDP socket directly.  Or alternatively drop the UDP socket in
favor of having the backends write directly to the collector process'
input pipe (not sure if this would port to Windows though).

As far as Kris' complaint goes, one thing that might be interesting is
to delay both the setproctitle call and the stats command message send
until the current command has been running a little while (say 100ms
or so).  The main objection I see to this is that it replaces a kernel
call that actually does some work with a kernel call to start a timer,
plus possibly a later kernel call to actually do the work.  Not clear
that there's a win there.  (If you're using statement_timeout it might
not matter, but if you aren't...)

Also not clear how you make the necessary actions happen when the timer
expires --- I seriously doubt it'd be safe to try to do either action
directly in a signal handler.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Martijn van Oosterhout
On Tue, Jun 13, 2006 at 04:35:24PM -0400, Tom Lane wrote:
 ...  The main objection I see to this is that it replaces a kernel
 call that actually does some work with a kernel call to start a timer,
 plus possibly a later kernel call to actually do the work.  Not clear
 that there's a win there.
 
And ofcourse it's an almost guarenteed loss on systems that don't
require a syscall to set the proc title.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Magnus Hagander
 That's talking about the stats code, which has approximately 
 zip to do with setproctitle (ps_status.c).  But IIRC that 
 patch is on hold because nobody particularly liked the 
 approach it's taking.  I think we should investigate 
 rewriting the stats communication architecture entirely --- 
 in particular, do we really need the stats buffer process at 
 all?  It'd be interesting to see what happens if we just make 
 the collector process read the UDP socket directly.  Or 
 alternatively drop the UDP socket in favor of having the 
 backends write directly to the collector process'
 input pipe (not sure if this would port to Windows though).

(Yes,  Iremember saying I was planning to look at this. As is probably
obvious by now, I haven't had the time to do that (yet)).

As for your question, it will be a bit painful to port to windows. We
did have a lot of problems with the pgstat pipe in the initial porting
work, and I'm not convinced that there aren't some small issues still
lurking there under heavy load. The point is that the whole concept of
sharing socket descriptors doesn't really play well between processes on
Windows.

Using UDP would make that a whole lot better. Without knowing anything,
I would assume the overhead of a localhost UDP packet isn't very large
on a reasonably modern platform.


//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Jim C. Nasby wrote:
  Excellent. Did I miss discussion of that or have you not mentioned it?
  I'm curious as to how you're fixing it...
 
  The patches are in the hold queue:
  http://momjian.postgresql.org/cgi-bin/pgpatches_hold
 
 That's talking about the stats code, which has approximately zip to do
 with setproctitle (ps_status.c).  But IIRC that patch is on hold because

I thought the bug reporter was asking about the stats code was well.

 nobody particularly liked the approach it's taking.  I think we should
 investigate rewriting the stats communication architecture entirely ---
 in particular, do we really need the stats buffer process at all?  It'd
 be interesting to see what happens if we just make the collector process
 read the UDP socket directly.  Or alternatively drop the UDP socket in

Agreed, that's what I would prefer, and tested something like that, but
even pulling the packet into the buffer and throwing them away had
significant overhead, so I think the timeout trick has to be employed as
well as going to a single process.

 favor of having the backends write directly to the collector process'
 input pipe (not sure if this would port to Windows though).
 
 As far as Kris' complaint goes, one thing that might be interesting is
 to delay both the setproctitle call and the stats command message send
 until the current command has been running a little while (say 100ms
 or so).  The main objection I see to this is that it replaces a kernel
 call that actually does some work with a kernel call to start a timer,
 plus possibly a later kernel call to actually do the work.  Not clear
 that there's a win there.  (If you're using statement_timeout it might
 not matter, but if you aren't...)
 
 Also not clear how you make the necessary actions happen when the timer
 expires --- I seriously doubt it'd be safe to try to do either action
 directly in a signal handler.

Right.  What if the postmaster signals the backend once a second to do
their reporting.  I am not sure what the final solution will be, but we
_need_ one based on the performance numbers I and others have seen. 
Could we have PGPROC have a reporting boolean that is set every second
and somehow checked by each backend?

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Jim C. Nasby
On Tue, Jun 13, 2006 at 05:05:31PM -0400, Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian pgman@candle.pha.pa.us writes:
   Jim C. Nasby wrote:
   Excellent. Did I miss discussion of that or have you not mentioned it?
   I'm curious as to how you're fixing it...
  
   The patches are in the hold queue:
 http://momjian.postgresql.org/cgi-bin/pgpatches_hold
  
  That's talking about the stats code, which has approximately zip to do
  with setproctitle (ps_status.c).  But IIRC that patch is on hold because
 
 I thought the bug reporter was asking about the stats code was well.
 
It did get brought up...

  As far as Kris' complaint goes, one thing that might be interesting is
  to delay both the setproctitle call and the stats command message send
  until the current command has been running a little while (say 100ms
  or so).  The main objection I see to this is that it replaces a kernel
  call that actually does some work with a kernel call to start a timer,
  plus possibly a later kernel call to actually do the work.  Not clear
  that there's a win there.  (If you're using statement_timeout it might
  not matter, but if you aren't...)
  
  Also not clear how you make the necessary actions happen when the timer
  expires --- I seriously doubt it'd be safe to try to do either action
  directly in a signal handler.
 
 Right.  What if the postmaster signals the backend once a second to do
 their reporting.  I am not sure what the final solution will be, but we
 _need_ one based on the performance numbers I and others have seen. 
 Could we have PGPROC have a reporting boolean that is set every second
 and somehow checked by each backend?

One second might be a bit more delay than some folks want... it would be
nice if this was tuneable. Also, what would the overhead on this look
like if there's a large number of idle backends?

It does sound more appealing than setting a timer every time you start a
transaction, though...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Jim Nasby


On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:

FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.


Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with  
a lot

of procs in either semwait or semlock. :(


Part of that is Giant contention again; for example on 6.x semop() and
setproctitle() both want to acquire it, so they'll fight with each
other and with anything else on the system that wants Giant
(e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
issue here going forward, but there is still as much lock contention
just between semop() calls running on different CPUs.  It may be
possible for someone to implement more fine-grained locking here, but
I don't know if there is available interest.


BTW, there's another FBSD performance odditiy I've run across. Running

pg_dump -t email_contrib -COx stats | bzip2  ec.sql.bz2 

which dumps the email_contrib table to bzip2 then to disk, the OS  
won't use more than 1 CPU on an SMP system... unless the data is  
cached. According to both gstat and systat -v, the system isn't I/O  
bound; both are reporting the RAID10 with that table on it as only  
about 10% busy. If I let that command run for a bit then cancel it  
and re-start it so that the beginning of that table is in cache, it  
will use one entire CPU for bzip2, which is what I'd expect to happen.

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Marc G. Fournier

On Tue, 13 Jun 2006, Jim Nasby wrote:



On Jun 12, 2006, at 10:38 AM, Kris Kennaway wrote:

FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.


Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
of procs in either semwait or semlock. :(


Part of that is Giant contention again; for example on 6.x semop() and
setproctitle() both want to acquire it, so they'll fight with each
other and with anything else on the system that wants Giant
(e.g. IPv6, or the USB stack, etc).  As I mentioned Giant is not an
issue here going forward, but there is still as much lock contention
just between semop() calls running on different CPUs.  It may be
possible for someone to implement more fine-grained locking here, but
I don't know if there is available interest.


BTW, there's another FBSD performance odditiy I've run across. Running

pg_dump -t email_contrib -COx stats | bzip2  ec.sql.bz2 

which dumps the email_contrib table to bzip2 then to disk, the OS won't use 
more than 1 CPU on an SMP system... unless the data is cached. According to 
both gstat and systat -v, the system isn't I/O bound; both are reporting the 
RAID10 with that table on it as only about 10% busy. If I let that command 
run for a bit then cancel it and re-start it so that the beginning of that 
table is in cache, it will use one entire CPU for bzip2, which is what I'd 
expect to happen.


What version of FreeBSD are you dealing with here?  I'm guessing at least 
6.x, but just figured I'd clarify ...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Right.  What if the postmaster signals the backend once a second to do
 their reporting.  I am not sure what the final solution will be, but we
 _need_ one based on the performance numbers I and others have seen. 
 Could we have PGPROC have a reporting boolean that is set every second
 and somehow checked by each backend?

I don't see any point in involving the postmaster in it.  What might be
interesting is to replace the current backend timer-interrupt handling
by a free-running cyclic interrupt every N milliseconds (the resolution
of the statement_timeout and deadlock_check delays would then be no
better than N milliseconds, since those actions would occur at the next
cyclic interrupt after the desired time expires).  We could possibly
drive stats reports and ps_status updates from this, as well as sampling
EXPLAIN ANALYZE if anyone cares to pursue that.

Complaints I can foresee:

* lots of cycles wasted in idle backends.  Possibly a backend that's not
received any command for a second or two could shut down its interrupt
until it next gets a command.

* not clear whether the interrupt happens when waiting for I/O.  I
already mentioned that this would be a problem for EXPLAIN ANALYZE,
but it might be no big deal for the other uses.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] postgresql and process titles

2006-06-13 Thread Jim C. Nasby
On Tue, Jun 13, 2006 at 11:13:55PM -0300, Marc G. Fournier wrote:
 BTW, there's another FBSD performance odditiy I've run across. Running
 
 pg_dump -t email_contrib -COx stats | bzip2  ec.sql.bz2 
 
 which dumps the email_contrib table to bzip2 then to disk, the OS won't 
 use more than 1 CPU on an SMP system... unless the data is cached. 
 According to both gstat and systat -v, the system isn't I/O bound; both 
 are reporting the RAID10 with that table on it as only about 10% busy. If 
 I let that command run for a bit then cancel it and re-start it so that 
 the beginning of that table is in cache, it will use one entire CPU for 
 bzip2, which is what I'd expect to happen.
 
 What version of FreeBSD are you dealing with here?  I'm guessing at least 
 6.x, but just figured I'd clarify ...

FreeBSD 6.0-STABLE #6: Fri Dec  9 19:14:19 UTC 2005
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] postgresql and process titles

2006-06-12 Thread Kris Kennaway
On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
 On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
  Kris Kennaway [EMAIL PROTECTED] writes:
   On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
   Let's see the evidence.
  
   The calls to setproctitle() (it looks like 4 setproctitle syscalls per
   DB query) are causing contention on the Giant lock 25% of the time on
   a dual p4 + HTT.  Disabling process title setting completely gives an
   8% peak performance boost to the super-smack select benchmark.
  
  I think you misunderstood me: I asked for evidence, not interpretation.
  What are you measuring, and with what tool, and what are the numbers?
  On what benchmark case?  And what did you do to disable process title
  setting completely?
  
  The reason I'm being doubting Thomas here is that I've never seen any
 
 Ba-da-bum!
 
  indication on any other platform that ps_status is a major bottleneck.
  Now maybe FreeBSD really sucks, or maybe you're onto something of
  interest, but let's see the proof in a form that someone else can
  check and reproduce.
 
 It's also important to find out what version of FreeBSD this is. A lot
 of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
 possible this isn't an issue in newer versions.

It's still true in 6.x and 7.x.  I have a patch that removes Giant
from the sysctl in question, and I have also removed it from another
relevant part of the kernel (semop() is bogusly acquiring Giant when
it is supposed to be mpsafe).

When it's possible to commit that patch (should be in time for 7.0,
but not sure if it will make it into 6.2) it will eliminate the worst
part of the problem, but it still leaves postgresql making thousands
of syscalls per second to flip its process titles back and forth,
which needs to be looked at carefully for a performance impact.  For
now, users of FreeBSD who want that 8% should turn it off though (or
maybe one of the alternative methods is usable).

BTW, another promising performance/scalability change on BSD systems
would be to convert pgsql to use kqueue instead of select, since mutex
profiling traces show a lot of contention on the select lock in
FreeBSD.

FYI, the biggest source of contention is via semop() - it might be
possible to optimize that some more in FreeBSD, I don't know.

Kris


pgpoSFs8N7PwV.pgp
Description: PGP signature


Re: [HACKERS] postgresql and process titles

2006-06-12 Thread Jim C. Nasby
On Mon, Jun 12, 2006 at 12:24:36AM -0400, Kris Kennaway wrote:
 On Sun, Jun 11, 2006 at 10:07:13PM -0500, Jim C. Nasby wrote:
  On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
   Kris Kennaway [EMAIL PROTECTED] writes:
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
Let's see the evidence.
   
The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT.  Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.
   
   I think you misunderstood me: I asked for evidence, not interpretation.
   What are you measuring, and with what tool, and what are the numbers?
   On what benchmark case?  And what did you do to disable process title
   setting completely?
   
   The reason I'm being doubting Thomas here is that I've never seen any
  
  Ba-da-bum!
  
   indication on any other platform that ps_status is a major bottleneck.
   Now maybe FreeBSD really sucks, or maybe you're onto something of
   interest, but let's see the proof in a form that someone else can
   check and reproduce.
  
  It's also important to find out what version of FreeBSD this is. A lot
  of things have been pulled out of GIANT in 5.x and 6.x, so it's entirely
  possible this isn't an issue in newer versions.
 
Can you provide the actual commands you used to setup and run the test?
This would allow others to duplicate your results and debug the
situation on their own. This is also important because we've generally
found HTT to be a loss (except on Windows), so it'd be good to see what
impact this has on AMD hardware. It would also be very useful to have
the raw test result numbers you obtained.

 It's still true in 6.x and 7.x.  I have a patch that removes Giant
 from the sysctl in question, and I have also removed it from another
 relevant part of the kernel (semop() is bogusly acquiring Giant when
 it is supposed to be mpsafe).
 
What affect did that patch have on the numbers? And where is it, in case
anyone here wants to try it?

 When it's possible to commit that patch (should be in time for 7.0,
 but not sure if it will make it into 6.2) it will eliminate the worst
 part of the problem, but it still leaves postgresql making thousands
 of syscalls per second to flip its process titles back and forth,
 which needs to be looked at carefully for a performance impact.  For
 now, users of FreeBSD who want that 8% should turn it off though (or
 maybe one of the alternative methods is usable).

We have a similar issue internally with stats_command_string. The issue
is that it's very helpful to be able to see what a 'long running' (more
than a second or so) statement is doing, but there's very little value
in doing all that work for a statement that's only going to run for a
few ms. If there's a very cheap way to set some kind of a timer that
would update this information once a statement's been around long enough
that might be a way to handle this (I don't know if we're already using
ALRM in the backend, or if that's a cheap enough solution).

 FYI, the biggest source of contention is via semop() - it might be
 possible to optimize that some more in FreeBSD, I don't know.

Yeah, I've seen PostgreSQL on FreeBSD fall over at high load with a lot
of procs in either semwait or semlock. :(
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] postgresql and process titles

2006-06-11 Thread Marc G. Fournier


The only way that I'm aware of for disabling this is at compile time ... 
after running configure, you want to modify:


src/include/pg_config.h

and undef HAVE_SETPROCTITLE ...

I'm CC'ng -hackers about this though, since I think you are the first to 
point to setproctitle() as being a serious performance bottleneck ...




On Sun, 11 Jun 2006, Kris Kennaway wrote:


Why does postgresql change its process title so frequently and how can
this be disabled?  Profiling suggests it's a fairly serious
performance bottleneck.

Kris




Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] postgresql and process titles

2006-06-11 Thread Tom Lane
 On Sun, 11 Jun 2006, Kris Kennaway wrote:
 Why does postgresql change its process title so frequently and how can
 this be disabled?  Profiling suggests it's a fairly serious
 performance bottleneck.

Let's see the evidence.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] postgresql and process titles

2006-06-11 Thread Tom Lane
Kris Kennaway [EMAIL PROTECTED] writes:
 On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
 Let's see the evidence.

 The calls to setproctitle() (it looks like 4 setproctitle syscalls per
 DB query) are causing contention on the Giant lock 25% of the time on
 a dual p4 + HTT.  Disabling process title setting completely gives an
 8% peak performance boost to the super-smack select benchmark.

I think you misunderstood me: I asked for evidence, not interpretation.
What are you measuring, and with what tool, and what are the numbers?
On what benchmark case?  And what did you do to disable process title
setting completely?

The reason I'm being doubting Thomas here is that I've never seen any
indication on any other platform that ps_status is a major bottleneck.
Now maybe FreeBSD really sucks, or maybe you're onto something of
interest, but let's see the proof in a form that someone else can
check and reproduce.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] postgresql and process titles

2006-06-11 Thread Kris Kennaway
On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
  On Sun, 11 Jun 2006, Kris Kennaway wrote:
  Why does postgresql change its process title so frequently and how can
  this be disabled?  Profiling suggests it's a fairly serious
  performance bottleneck.
 
 Let's see the evidence.

The calls to setproctitle() (it looks like 4 setproctitle syscalls per
DB query) are causing contention on the Giant lock 25% of the time on
a dual p4 + HTT.  Disabling process title setting completely gives an
8% peak performance boost to the super-smack select benchmark.

It's actually worse than that in stock FreeBSD because in certain
mixed workloads with other sources of Giant contention present they
will contend with each other to make overall machine performance much
worse; that's the serious aspect, although 8% performance loss just
in case someone should ever want to run 'ps' to check the state of the
database processes is nothing to laugh at either.

Kris


pgpyrvGOB2EdD.pgp
Description: PGP signature


Re: [HACKERS] postgresql and process titles

2006-06-11 Thread Kris Kennaway
On Sun, Jun 11, 2006 at 09:58:33PM -0400, Tom Lane wrote:
 Kris Kennaway [EMAIL PROTECTED] writes:
  On Sun, Jun 11, 2006 at 07:43:03PM -0400, Tom Lane wrote:
  Let's see the evidence.
 
  The calls to setproctitle() (it looks like 4 setproctitle syscalls per
  DB query) are causing contention on the Giant lock 25% of the time on
  a dual p4 + HTT.  Disabling process title setting completely gives an
  8% peak performance boost to the super-smack select benchmark.
 
 I think you misunderstood me: I asked for evidence, not interpretation.
 What are you measuring, and with what tool, and what are the numbers?
 On what benchmark case?

As I said, I'm using the super-smack select benchmark; presumably you
are aware of it.

 And what did you do to disable process title setting completely?

Added this to ps_status.c:

#undef PS_USE_SETPROCTITLE
#undef PS_USE_PSTAT
#undef PS_USE_PS_STRINGS
#undef PS_USE_CHANGE_ARGV
#undef PS_USE_CLOBBER_ARGV
#define PS_USE_NONE

Here are the queries/second data, before and after:

x pg
+ pg-noproctitle
++
|x  xx  +|
|x  xx  +  + |
|xx xxx++ +  |
| |AM||AM___||
++
N   Min   MaxMedian   AvgStddev
x  11   3399.293425.1   3413.93 3411.8418 9.4287675
+  10   3615.63   3699.32  3685.515  3679.344 24.894177
Difference at 95.0% confidence
267.502 +/- 16.871
7.8404% +/- 0.494483%
(Student's t, pooled s = 18.4484)

Kris


pgporT3d4xNO0.pgp
Description: PGP signature