AW: AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-19 Thread Zeugswetter Andreas SB


  It's great as long as you never block, but it sucks for making things
  wait, because the wait interval will be some multiple of 10 msec rather
  than just the time till the lock comes free.
 
  On the AIX platform usleep(3) is able to really sleep microseconds without
  busying the cpu when called for more than approx. 100 us (the longer the interval,
  the less busy the cpu gets).
  Would this not be ideal for spin_lock, or is usleep not very common?
  Linux says it is in the BSD 4.3 standard.
 
 HPUX has usleep, but the man page says
 
  The usleep() function is included for its historical usage. The
  setitimer() function is preferred over this function.

I doubt that setitimer has microsecond precision on HPUX.

 In any case, I would expect that all these functions offer accuracy
 no better than the scheduler's regular clock cycle (~ 100Hz) on most
 kernels.

Not on AIX, and I don't believe that holds for the majority of other UNIX platforms either.
I do however suspect that some implementations need a busy loop, which would,
if at all, only be acceptable on an SMP system.

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Xu Yifeng

Hello Alfred,

Friday, March 16, 2001, 3:21:09 PM, you wrote:

AP * Xu Yifeng [EMAIL PROTECTED] [010315 22:25] wrote:

 Could anyone consider forking a syncer process to sync data to disk?
 Build a shared sync queue; when a daemon process wants to sync after
 write() is called, it just puts a sync request on the queue. This releases
 the process from blocking on the write as soon as possible. Multiple sync
 requests for one file can be merged when a request is inserted into
 the queue.

AP I suggested this about a year ago. :)

AP The problem is that you need that process to potentially open and close
AP many files over and over.

AP I still think it's somewhat of a good idea.

I am not a DBMS guru.
couldn't the syncer process cache opened files? is there any problem I
didn't consider ?
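
A minimal sketch of the shared sync queue with request merging proposed above. The names SyncQueue, sync_queue_push, sync_queue_pop and the integer file_id are hypothetical and are not PostgreSQL code. A real queue would live in shared memory and be protected by a lock; both are left out here to keep the merging logic visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SYNC_QUEUE_MAX 64          /* hypothetical capacity */

/* One pending request: "please fsync this file". */
typedef struct {
    int file_id;                   /* hypothetical identifier for a data file */
} SyncRequest;

typedef struct {
    SyncRequest items[SYNC_QUEUE_MAX];
    size_t count;
} SyncQueue;

void sync_queue_init(SyncQueue *q)
{
    q->count = 0;
}

/*
 * Backend side: ask for file_id to be synced.
 * Returns true if the request is covered (newly queued or merged into an
 * existing entry), false if the queue is full.
 */
bool sync_queue_push(SyncQueue *q, int file_id)
{
    for (size_t i = 0; i < q->count; i++)
        if (q->items[i].file_id == file_id)
            return true;           /* merged: one fsync covers both writes */
    if (q->count == SYNC_QUEUE_MAX)
        return false;
    q->items[q->count++].file_id = file_id;
    return true;
}

/*
 * Syncer side: take the oldest request.
 * Returns false if the queue is empty.
 */
bool sync_queue_pop(SyncQueue *q, int *file_id)
{
    if (q->count == 0)
        return false;
    *file_id = q->items[0].file_id;
    for (size_t i = 1; i < q->count; i++)
        q->items[i - 1] = q->items[i];
    q->count--;
    return true;
}
```

In this sketch a backend would call sync_queue_push() after write(), and the syncer would loop on sync_queue_pop() and call fsync() on the matching descriptor.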

-- 
Best regards,
Xu Yifeng






Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Xu Yifeng [EMAIL PROTECTED] [010316 01:15] wrote:
 Hello Alfred,
 
 Friday, March 16, 2001, 3:21:09 PM, you wrote:
 
 AP * Xu Yifeng [EMAIL PROTECTED] [010315 22:25] wrote:
 
  Could anyone consider forking a syncer process to sync data to disk?
  Build a shared sync queue; when a daemon process wants to sync after
  write() is called, it just puts a sync request on the queue. This releases
  the process from blocking on the write as soon as possible. Multiple sync
  requests for one file can be merged when a request is inserted into
  the queue.
 
 AP I suggested this about a year ago. :)
 
 AP The problem is that you need that process to potentially open and close
 AP many files over and over.
 
 AP I still think it's somewhat of a good idea.
 
 I am not a DBMS guru.

Hah, same here. :)

 couldn't the syncer process cache opened files? is there any problem I
 didn't consider ?

1) IPC latency, the amount of time it takes to call fsync will
   increase by at least two context switches.

2) a working set (number of files needed to be fsync'd) that
   is larger than the amount of files you wish to keep open.
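
Point 2 can be illustrated with a small descriptor cache that evicts the least recently used file once it is full. This is only a sketch: FdCache, fd_cache_get and the two-slot size are hypothetical, and O_RDONLY is used purely for illustration. When the working set exceeds the slot count, every eviction costs a close() plus a later open().

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define FD_CACHE_SLOTS 2           /* deliberately tiny, for illustration */
#define FD_CACHE_PATH_MAX 256

typedef struct {
    char path[FD_CACHE_PATH_MAX];
    int fd;                        /* -1 when the slot is empty */
    unsigned long last_used;       /* logical clock value for LRU */
} FdSlot;

typedef struct {
    FdSlot slots[FD_CACHE_SLOTS];
    unsigned long clock;
    unsigned long opens;           /* number of open() calls made so far */
} FdCache;

void fd_cache_init(FdCache *c)
{
    memset(c, 0, sizeof(*c));
    for (int i = 0; i < FD_CACHE_SLOTS; i++)
        c->slots[i].fd = -1;
}

/* Return an open descriptor for path, or -1 on failure. */
int fd_cache_get(FdCache *c, const char *path)
{
    int victim = 0;

    if (strlen(path) >= FD_CACHE_PATH_MAX)
        return -1;

    for (int i = 0; i < FD_CACHE_SLOTS; i++) {
        FdSlot *s = &c->slots[i];
        if (s->fd >= 0 && strcmp(s->path, path) == 0) {
            s->last_used = ++c->clock;
            return s->fd;          /* cache hit, no system call needed */
        }
        /* Prefer an empty slot, otherwise the least recently used one. */
        if (c->slots[victim].fd >= 0 &&
            (s->fd < 0 || s->last_used < c->slots[victim].last_used))
            victim = i;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    c->opens++;

    FdSlot *v = &c->slots[victim];
    if (v->fd >= 0)
        close(v->fd);              /* eviction: working set exceeded the cache */
    strcpy(v->path, path);
    v->fd = fd;
    v->last_used = ++c->clock;
    return fd;
}
```

The opens counter makes the cost visible: cycling through more files than there are slots forces a fresh open() on nearly every request.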

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Tom Lane

Alfred Perlstein [EMAIL PROTECTED] writes:
 couldn't the syncer process cache opened files? is there any problem I
 didn't consider ?

 1) IPC latency, the amount of time it takes to call fsync will
increase by at least two context switches.

 2) a working set (number of files needed to be fsync'd) that
is larger than the amount of files you wish to keep open.

These days we're really only interested in fsync'ing the current WAL
log file, so working set doesn't seem like a problem anymore.  However
context-switch latency is likely to be a big problem.  One thing we'd
definitely need before considering this is to replace the existing
spinlock mechanism with something more efficient.

Vadim has designed the WAL stuff in such a way that a separate
writer/syncer process would be easy to add; in fact it's almost that way
already, in that any backend can write or sync data that's been added
to the queue by any other backend.  The question is whether it'd
actually buy anything to have another process.  Good stuff to experiment
with for 7.2.

regards, tom lane




Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Alfred Perlstein

* Tom Lane [EMAIL PROTECTED] [010316 08:16] wrote:
 Alfred Perlstein [EMAIL PROTECTED] writes:
  couldn't the syncer process cache opened files? is there any problem I
  didn't consider ?
 
  1) IPC latency, the amount of time it takes to call fsync will
 increase by at least two context switches.
 
  2) a working set (number of files needed to be fsync'd) that
 is larger than the amount of files you wish to keep open.
 
 These days we're really only interested in fsync'ing the current WAL
 log file, so working set doesn't seem like a problem anymore.  However
 context-switch latency is likely to be a big problem.  One thing we'd
 definitely need before considering this is to replace the existing
 spinlock mechanism with something more efficient.

What sort of problems are you seeing with the spinlock code?

 Vadim has designed the WAL stuff in such a way that a separate
 writer/syncer process would be easy to add; in fact it's almost that way
 already, in that any backend can write or sync data that's been added
 to the queue by any other backend.  The question is whether it'd
 actually buy anything to have another process.  Good stuff to experiment
 with for 7.2.

The delayed/coalesced fsync looked interesting.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]





Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Tom Lane

Alfred Perlstein [EMAIL PROTECTED] writes:
 definitely need before considering this is to replace the existing
 spinlock mechanism with something more efficient.

 What sort of problems are you seeing with the spinlock code?

It's great as long as you never block, but it sucks for making things
wait, because the wait interval will be some multiple of 10 msec rather
than just the time till the lock comes free.

We've speculated about using Posix semaphores instead, on platforms
where those are available.  I think Bruce was concerned about the
possible overhead of pulling in a whole thread-support library just to
get semaphores, however.

regards, tom lane




Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread The Hermit Hacker

On Fri, 16 Mar 2001, Tom Lane wrote:

 Alfred Perlstein [EMAIL PROTECTED] writes:
  definitely need before considering this is to replace the existing
  spinlock mechanism with something more efficient.

  What sort of problems are you seeing with the spinlock code?

 It's great as long as you never block, but it sucks for making things
 wait, because the wait interval will be some multiple of 10 msec rather
 than just the time till the lock comes free.

 We've speculated about using Posix semaphores instead, on platforms
 where those are available.  I think Bruce was concerned about the
 possible overhead of pulling in a whole thread-support library just to
 get semaphores, however.

But, with shared libraries, are you really pulling in a "whole
thread-support library"?  My understanding of shared libraries (although it
may be totally off) was that instead of pulling in a whole library, you
pulled in the bits that you needed, pretty much as you needed them ...







RE: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Mikheev, Vadim

 We've speculated about using Posix semaphores instead, on platforms

For spinlocks we should use pthread mutexes.

 where those are available.  I think Bruce was concerned about the

And mutexes are more portable than semaphores.

Vadim




Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Tom Lane

Larry Rosenman [EMAIL PROTECTED] writes:
 But, with shared libraries, are you really pulling in a "whole
 thread-support library"?

 Yes, you are.  On UnixWare, you need to add -Kthread, which CHANGES a LOT 
 of primitives to go through threads wrappers and scheduling.

Right, it's not so much that we care about referencing another shlib,
it's that -lpthreads may cause you to get a whole new thread-aware
version of libc, with attendant overhead that we don't need or want.

regards, tom lane




AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Zeugswetter Andreas SB


  definitely need before considering this is to replace the existing
  spinlock mechanism with something more efficient.
 
  What sort of problems are you seeing with the spinlock code?
 
 It's great as long as you never block, but it sucks for making things

I like optimistic approaches :-)

 wait, because the wait interval will be some multiple of 10 msec rather
 than just the time till the lock comes free.

On the AIX platform usleep(3) is able to really sleep microseconds without
busying the cpu when called for more than approx. 100 us (the longer the interval,
the less busy the cpu gets).
Would this not be ideal for spin_lock, or is usleep not very common?
Linux says it is in the BSD 4.3 standard.

postgres@s0188000zeu:/usr/postgres time ustest # with 100 us
real    0m10.95s
user    0m0.40s
sys     0m0.74s

postgres@s0188000zeu:/usr/postgres time ustest # with 10 us
real    0m18.62s
user    0m1.37s
sys     0m5.73s
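
The attached ustest.c is not preserved in this archive, so the following is not a reconstruction of it. It is only a sketch of how such a measurement could be made; measure_usleep and its parameters are hypothetical names. It reports the average wall-clock time per usleep() call, which can then be compared with the requested interval.

```c
#define _DEFAULT_SOURCE            /* for usleep() under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Call usleep(usec) the given number of times and return the average
 * wall-clock microseconds each call took, or -1.0 for a bad argument.
 */
double measure_usleep(unsigned int usec, int iterations)
{
    struct timeval start, end;

    if (iterations <= 0)
        return -1.0;

    gettimeofday(&start, NULL);
    for (int i = 0; i < iterations; i++)
        usleep(usec);
    gettimeofday(&end, NULL);

    double elapsed = (end.tv_sec - start.tv_sec) * 1e6
                   + (end.tv_usec - start.tv_usec);
    return elapsed / iterations;
}
```

An average close to the requested value suggests true microsecond sleeps; an average near 10000 us would instead point to rounding up to a 100 Hz scheduler tick. Running the caller under time(1), as above, shows how much of the interval is spent busy in user or system mode.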

Andreas

PS: sorry off for weekend now :-) Current looks good on AIX.


 ustest.c





Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Doug McNaught

Tom Lane [EMAIL PROTECTED] writes:

 Alfred Perlstein [EMAIL PROTECTED] writes:
  definitely need before considering this is to replace the existing
  spinlock mechanism with something more efficient.
 
  What sort of problems are you seeing with the spinlock code?
 
 It's great as long as you never block, but it sucks for making things
 wait, because the wait interval will be some multiple of 10 msec rather
 than just the time till the lock comes free.

Plus, using select() for the timeout is putting you into the kernel
multiple times in a short period, and causing a reschedule every time,
which is a big lose.  This was discussed in the linux-kernel thread
that was referred to a few days ago.
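
The pattern under discussion can be sketched as follows: spin on a test-and-set, and after a number of failed attempts give up the CPU with a select() timeout. This is an illustration using C11 atomics, not the actual PostgreSQL spinlock code; SpinLock, SPINS_BEFORE_SLEEP and SLEEP_USEC are hypothetical names and values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

#define SPINS_BEFORE_SLEEP 100     /* hypothetical tuning constant */
#define SLEEP_USEC 10000           /* 10 msec, the granularity mentioned above */

typedef struct {
    atomic_flag held;
} SpinLock;

void spin_init(SpinLock *l)
{
    atomic_flag_clear(&l->held);
}

/* One test-and-set attempt; returns nonzero if the lock was obtained. */
int spin_try(SpinLock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

void spin_release(SpinLock *l)
{
    atomic_flag_clear(&l->held);
}

/*
 * Acquire the lock, sleeping via select() whenever spinning fails for a
 * while. Returns how many times the caller had to sleep.
 */
int spin_acquire(SpinLock *l)
{
    int spins = 0;
    int sleeps = 0;

    while (!spin_try(l)) {
        if (++spins >= SPINS_BEFORE_SLEEP) {
            /* select() may modify the timeout, so rebuild it every time. */
            struct timeval delay = { 0, SLEEP_USEC };
            select(0, NULL, NULL, NULL, &delay);   /* one kernel entry per sleep */
            sleeps++;
            spins = 0;
        }
    }
    return sleeps;
}
```

The uncontended path costs a single atomic operation and returns 0 sleeps. A waiter that does sleep stays asleep for the full timeout even if the lock is released a few microseconds later, which is the cost described above.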

 We've speculated about using Posix semaphores instead, on platforms
 where those are available.  I think Bruce was concerned about the
 possible overhead of pulling in a whole thread-support library just to
 get semaphores, however.

Are Posix semaphores faster by definition than SysV semaphores (which
are described as "slow" in the source comments)?  I can't see how
they'd be much faster unless locking/unlocking an uncontended
semaphore avoids a system call, in which case you might run into the
same problems with userland backoff...

Just looked, and on Linux pthreads and POSIX semaphores are both
already in the C library.  Unfortunately, the Linux C library doesn't
support the PROCESS_SHARED attribute for either pthreads mutexes or
POSIX semaphores.  Grumble.  What's the point then?
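
For reference, this is roughly what requesting a process-shared mutex looks like on a platform that supports the attribute. It is a sketch only: shared_mutex_create is a hypothetical helper, and the anonymous shared mapping stands in for whatever shared memory segment the backends would really use. On a C library without PROCESS_SHARED support, one of the calls is expected to fail and the helper returns NULL.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Create a mutex in memory that child processes created by fork() will
 * share. Returns NULL if the mapping or the PROCESS_SHARED attribute is
 * not available.
 */
pthread_mutex_t *shared_mutex_create(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (m == MAP_FAILED)
        return NULL;
    if (pthread_mutexattr_init(&attr) != 0)
        goto fail;
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutex_init(m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        goto fail;
    }
    pthread_mutexattr_destroy(&attr);
    return m;

fail:
    munmap(m, sizeof(*m));
    return NULL;
}
```

A caller would create the mutex before fork() and then use pthread_mutex_lock() and pthread_mutex_unlock() from each process. Depending on the platform, this may require linking with the thread library, which is the overhead concern raised elsewhere in this thread.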

Just some ignorant ramblings, thanks for listening...

-Doug




Re: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Bruce Momjian

 Yes, you are.  On UnixWare, you need to add -Kthread, which CHANGES a LOT 
 of primitives to go through threads wrappers and scheduling.

This was my concern;  the change that happens on startup and lib calls
when thread support comes in through a library.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026




Re: AW: Re[4]: [HACKERS] Allowing WAL fsync to be done via O_SYNC

2001-03-16 Thread Tom Lane

Zeugswetter Andreas SB  [EMAIL PROTECTED] writes:
 It's great as long as you never block, but it sucks for making things
 wait, because the wait interval will be some multiple of 10 msec rather
 than just the time till the lock comes free.

 On the AIX platform usleep(3) is able to really sleep microseconds without
 busying the cpu when called for more than approx. 100 us (the longer the interval,
 the less busy the cpu gets).
 Would this not be ideal for spin_lock, or is usleep not very common?
 Linux says it is in the BSD 4.3 standard.

HPUX has usleep, but the man page says

 The usleep() function is included for its historical usage. The
 setitimer() function is preferred over this function.

In any case, I would expect that all these functions offer accuracy
no better than the scheduler's regular clock cycle (~ 100Hz) on most
kernels.

regards, tom lane
