Re: [HACKERS] Stuck spins in current

2001-03-21 Thread Vadim Mikheev

  BTW, I've got ~320tps with 50 clients inserting (int4, text[1-256])
  records into 50 tables (-B 16384, wal_buffers = 256) on Ultra10
  with 512Mb RAM, IDE (clients run on the same host as server).
 
 Not bad.  What were you getting before these recent changes?

As I already reported - with O_DSYNC this test shows 30% better
performance than with fsync.

(BTW, seems in all my tests I was using -O0 flag...)

Vadim



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Stuck spins in current

2001-03-21 Thread Bruce Momjian

   BTW, I've got ~320tps with 50 clients inserting (int4, text[1-256])
   records into 50 tables (-B 16384, wal_buffers = 256) on Ultra10
   with 512Mb RAM, IDE (clients run on the same host as server).
  
  Not bad.  What were you getting before these recent changes?
 
 As I already reported - with O_DSYNC this test shows 30% better
 performance than with fsync.
 
 (BTW, seems in all my tests I was using -O0 flag...)

Good data point.  I could never understand why we would ever use the
normal sync if we had a data-only sync option available.  I can imagine
the data-only being the same, but never slower.
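
For readers less familiar with the two sync styles being compared, here is a minimal Python sketch. It is not PostgreSQL code. It assumes a POSIX system, and the function names are made up for illustration. os.O_DSYNC is not defined on every platform, so the sketch falls back to a plain open followed by os.fsync where the flag is missing.

```python
import os

def write_with_fsync(path: str, data: bytes) -> int:
    """Write data, then flush file data and metadata with an explicit fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)  # separate system call after the write
        return written
    finally:
        os.close(fd)

def write_with_o_dsync(path: str, data: bytes) -> int:
    """Write through a descriptor opened with O_DSYNC, when the platform has it."""
    dsync = getattr(os, "O_DSYNC", 0)  # 0 means the flag is unavailable here
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | dsync, 0o600)
    try:
        # With O_DSYNC, the write itself should not return until the data
        # (and only the metadata needed to read it back) is on stable storage.
        written = os.write(fd, data)
        if not dsync:
            os.fsync(fd)  # fallback so the sketch still syncs without O_DSYNC
        return written
    finally:
        os.close(fd)
```

Timing the two functions in a loop on the same disk would be one rough way to reproduce a comparison like the one reported above, though results will depend heavily on the OS, filesystem, and drive.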

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




RE: [HACKERS] Stuck spins in current

2001-03-19 Thread Mikheev, Vadim

 "Vadim Mikheev" [EMAIL PROTECTED] writes:
  Anyway, the deadlocks in my tests are strongly correlated with new log file
  creation - something is probably still wrong...
 
 Well, if you can reproduce it easily, seems like you could 
 get in there and verify or disprove my theory about where
 the deadlock is.

You were right - deadlock disappeared.

BTW, I've got ~320tps with 50 clients inserting (int4, text[1-256])
records into 50 tables (-B 16384, wal_buffers = 256) on Ultra10
with 512Mb RAM, IDE (clients run on the same host as server).

Vadim




Re: [HACKERS] Stuck spins in current

2001-03-19 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 "Vadim Mikheev" [EMAIL PROTECTED] writes:
 Anyway, the deadlocks in my tests are strongly correlated with new log file
 creation - something is probably still wrong...
 
 Well, if you can reproduce it easily, seems like you could 
 get in there and verify or disprove my theory about where
 the deadlock is.

 You were right - deadlock disappeared.

Okay, good.  I'll bet the correlation to new-log-file was just because
the WAL insert_lck gets held for a longer time than usual if XLogInsert
is forced to call XLogWrite and that in turn is forced to make a new
log file.  Were you running with wal_files = 0?  The problem would
likely not have shown up at all if logfiles were created in advance...

 BTW, I've got ~320tps with 50 clients inserting (int4, text[1-256])
 records into 50 tables (-B 16384, wal_buffers = 256) on Ultra10
 with 512Mb RAM, IDE (clients run on the same host as server).

Not bad.  What were you getting before these recent changes?

regards, tom lane




Re: [HACKERS] Stuck spins in current

2001-03-17 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 xlog.c revision 1.55 from Feb 26 already had log file
 zero-filling, so ...
 
 Oh, you're right, I didn't study the CVS log carefully enough.  Hmm,
 maybe the control file lock isn't the problem.  The abort() in
 s_lock_stuck should have left a core file --- what is the backtrace?

 After increasing DEFAULT_TIMEOUT in s_lock.c tenfold,
 I got an abort at xlog.c:626 - waiting for insert_lck.
 But the problem is near the new log file creation code: the system
 goes to sleep just after a new one is created.

Have you learned any more about this?  Or can you send your test program
so other people can try it?

In the meantime, even if it turns out there's a different problem here,
it seems clear to me that it's a bad idea to use a plain spinlock to
interlock xlog segment creation.  The spinlock timeouts are not set
high enough to be safe for something that could take several seconds.
Unless someone objects, I will go ahead and work on the change I
suggested yesterday to not hold the ControlFileLockId spinlock while
we are zero-filling the new segment.
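
To illustrate why a timeout-bounded spinlock is risky around slow work, here is a small Python sketch. It is not the s_lock.c implementation; the names StuckSpinlockError, spin_acquire, and waiter_outcome are hypothetical, and a sleep stands in for zero-filling a segment. The waiter gives up with a "stuck" error whenever the holder keeps the lock longer than the timeout.

```python
import threading
import time

class StuckSpinlockError(RuntimeError):
    """Raised when a waiter gives up, loosely like the stuck-spinlock abort."""

def spin_acquire(lock: threading.Lock, timeout_s: float, poll_s: float = 0.001) -> None:
    """Poll the lock until it is acquired or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while not lock.acquire(blocking=False):
        if time.monotonic() >= deadline:
            raise StuckSpinlockError("stuck spinlock")
        time.sleep(poll_s)

def waiter_outcome(hold_s: float, timeout_s: float) -> str:
    """Run one holder and one waiter; report whether the waiter got the lock."""
    lock = threading.Lock()
    held = threading.Event()

    def holder() -> None:
        with lock:
            held.set()
            time.sleep(hold_s)  # stands in for slow work done under the lock

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # make sure the holder owns the lock before we try
    try:
        spin_acquire(lock, timeout_s)
        lock.release()
        return "acquired"
    except StuckSpinlockError:
        return "stuck"
    finally:
        t.join()
```

For example, waiter_outcome(0.3, 0.05) models a holder that is slower than the timeout, while waiter_outcome(0.01, 1.0) models the usual short critical section.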

regards, tom lane




Re: [HACKERS] Stuck spins in current

2001-03-17 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 And you know - I've run same tests on ~ Mar 9 snapshot
 without any problems.

Oh, I see it:

Process A is doing GetSnapShotData.  It holds SInvalLock and calls
ReadNewTransactionId, which wants XidGenLockId.

Process B is doing GetNewTransactionId.  It holds XidGenLockId and
has run out of XIDs, so it needs to write a NEXTXID log record.
Therefore, it calls XLogInsert which wants the insert_lck.

Process C is inside XLogInsert on its first xlog entry of a transaction.
It holds the insert_lck and wants to put its XID into MyProc->logRec,
for which it needs SInvalLock.

Ooops.
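
The three-way cycle described above can be checked mechanically with a wait-for graph. The following Python sketch is only an illustration: it assumes each process holds exactly one lock and wants at most one, which matches this description but not locking in general, and find_wait_cycle is a made-up helper name.

```python
from typing import Dict, List, Optional, Tuple

# process -> (lock it holds, lock it wants), taken from the description above
PROCESSES: Dict[str, Tuple[str, Optional[str]]] = {
    "A": ("SInvalLock", "XidGenLockId"),
    "B": ("XidGenLockId", "insert_lck"),
    "C": ("insert_lck", "SInvalLock"),
}

def find_wait_cycle(procs: Dict[str, Tuple[str, Optional[str]]]) -> Optional[List[str]]:
    """Return a list of processes forming a wait cycle, or None if there is none."""
    holder_of = {held: name for name, (held, _) in procs.items()}
    # waits_for[p] is the process holding the lock that p wants (or None)
    waits_for = {name: holder_of.get(wanted) for name, (_, wanted) in procs.items()}
    for start in procs:
        path = [start]
        current = waits_for[start]
        while current is not None and current not in path:
            path.append(current)
            current = waits_for[current]
        if current == start:
            return path  # we walked the chain back to where we began
    return None

# Same processes, but B no longer needs insert_lck (no NEXTXID record to write).
WITHOUT_NEXTXID = dict(PROCESSES, B=("XidGenLockId", None))
```

Under these assumptions, find_wait_cycle(PROCESSES) reports the A, B, C cycle, and find_wait_cycle(WITHOUT_NEXTXID) finds none, which is the effect of removing the NEXTXID record from B's path.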

At this point I must humbly say "yes, you told me so", because if I
hadn't insisted that we needed NEXTXID records then we wouldn't have
this deadlock.

It looks to me like the simplest answer is to take NEXTXID records
out again.  (Fortunately, there doesn't seem to be any comparable
cycle involving OidGenLock, or we'd need to think of a better answer.)
I shall retire to lick my wounds, and make the changes tomorrow ...

regards, tom lane




Re: [HACKERS] Stuck spins in current

2001-03-17 Thread Vadim Mikheev

 At this point I must humbly say "yes, you told me so", because if I

No, I didn't - I must humbly say that I didn't foresee this deadlock,
so "I didn't tell you so" -:)

Anyway, the deadlocks in my tests are strongly correlated with new log file
creation - something is probably still wrong...

Vadim






Re: [HACKERS] Stuck spins in current

2001-03-17 Thread Tom Lane

"Vadim Mikheev" [EMAIL PROTECTED] writes:
 Anyway, the deadlocks in my tests are strongly correlated with new log file
 creation - something is probably still wrong...

Well, if you can reproduce it easily, seems like you could get in there
and verify or disprove my theory about where the deadlock is.

Or send the testbed and I'll try ...

regards, tom lane




[HACKERS] Stuck spins in current

2001-03-16 Thread Mikheev, Vadim

Got it at spin.c:156 with 50 clients doing inserts into
50 tables (int4, text[1-256 bytes]).
-B 16384, -wal_buffers=256 (with the other WAL params at their defaults).

Vadim




Re: [HACKERS] Stuck spins in current

2001-03-16 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 Got it at spin.c:156 with 50 clients doing inserts into
 50 tables (int4, text[1-256 bytes]).
 -B 16384, -wal_buffers=256 (with the other WAL params at their defaults).

SpinAcquire() ... but on which lock?

regards, tom lane




Re: [HACKERS] Stuck spins in current

2001-03-16 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 Got it at spin.c:156 with 50 clients doing inserts into
 50 tables (int4, text[1-256 bytes]).
 -B 16384, -wal_buffers=256 (with the other WAL params at their defaults).

 SpinAcquire() ... but on which lock?

After a little bit of thought I'll bet it's ControlFileLockId.

Likely we shouldn't be using a spinlock at all for that, but the
short-term solution might be a longer timeout for this particular lock.
Alternatively, could we avoid holding that lock while initializing a
new log segment?

regards, tom lane




RE: [HACKERS] Stuck spins in current

2001-03-16 Thread Mikheev, Vadim

  How to synchronize with checkpoint-er if wal_files > 0?
 
 I was sort of visualizing assigning the created xlog files 
 dynamically:
 
   create a temp file of a PID-dependent name
   fill it with zeroes and fsync it
   acquire ControlFileLockId
   rename temp file into place as next uncreated segment
   update pg_control
   release ControlFileLockId
 
 Since the things are just filled with 0's, there's no need to 
 know which segment it will be while you're filling it.
 
 This would leave you sometimes with more advance files than you really
 needed, but so what ...

Yes, it makes sense, but:

  And you know - I've run same tests on ~ Mar 9 snapshot
  without any problems.
 
 That was before I changed the code to pre-fill the file --- 
 now it takes longer to init a log segment.  And we're only
 using a plain SpinAcquire, not the flavor with a longer timeout.

xlog.c revision 1.55 from Feb 26 already had log file
zero-filling, so ...

Vadim




Re: [HACKERS] Stuck spins in current

2001-03-16 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 Alternatively, could we avoid holding that lock while initializing a
 new log segment?

 How to synchronize with checkpoint-er if wal_files > 0?

I was sort of visualizing assigning the created xlog files dynamically:

create a temp file of a PID-dependent name
fill it with zeroes and fsync it
acquire ControlFileLockId
rename temp file into place as next uncreated segment
update pg_control
release ControlFileLockId

Since the things are just filled with 0's, there's no need to know which
segment it will be while you're filling it.

This would leave you sometimes with more advance files than you really
needed, but so what ...
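
The sequence above can be sketched in Python as follows. This is only an illustration of the ordering, not the xlog.c code: a threading.Lock stands in for ControlFileLockId, a small dict stands in for pg_control, and the file names, the create_segment helper, and the tiny default segment size are all assumptions made for the example.

```python
import os
import threading

control_lock = threading.Lock()   # stands in for ControlFileLockId
control = {"next_segment": 0}     # stands in for the state kept in pg_control

def create_segment(directory: str, size: int = 8192) -> str:
    """Pre-create one zero-filled segment, holding the lock only briefly."""
    # Create a temp file with a PID-dependent name (hypothetical naming).
    tmp_path = os.path.join(directory, f"xlogtemp.{os.getpid()}")

    # Fill it with zeroes and fsync it. No lock is held during the slow part.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"\0" * size)
        os.fsync(fd)
    finally:
        os.close(fd)

    # Acquire the lock, rename into place as the next uncreated segment,
    # update the control state, and release the lock.
    with control_lock:
        seg_no = control["next_segment"]
        final_path = os.path.join(directory, f"segment_{seg_no:08d}")
        os.rename(tmp_path, final_path)
        control["next_segment"] = seg_no + 1
    return final_path
```

Because the segment number is chosen only at rename time, the slow zero-fill never needs to know which segment it is producing, which is the point made above. The PID-based temp name assumes one creator per process, as with separate backends.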

 And you know - I've run same tests on ~ Mar 9 snapshot
 without any problems.

That was before I changed the code to pre-fill the file --- now it takes
longer to init a log segment.  And we're only using a plain SpinAcquire,
not the flavor with a longer timeout.

regards, tom lane




RE: [HACKERS] Stuck spins in current

2001-03-16 Thread Mikheev, Vadim

  Got it at spin.c:156 with 50 clients doing inserts into
  50 tables (int4, text[1-256 bytes]).
  -B 16384, -wal_buffers=256 (with the other WAL params at their defaults).
 
  SpinAcquire() ... but on which lock?
 
 After a little bit of thought I'll bet it's ControlFileLockId.

I see "XLogWrite: new log file created..." in postmaster's log -
backend writes this after releasing ControlFileLockId.

 Likely we shouldn't be using a spinlock at all for that, but the
 short-term solution might be a longer timeout for this 
 particular lock.
 Alternatively, could we avoid holding that lock while initializing a
 new log segment?

How to synchronize with checkpoint-er if wal_files > 0?
And you know - I've run same tests on ~ Mar 9 snapshot
without any problems.

Vadim




Re: [HACKERS] Stuck spins in current

2001-03-16 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 And you know - I've run same tests on ~ Mar 9 snapshot
 without any problems.
 
 That was before I changed the code to pre-fill the file --- 
 now it takes longer to init a log segment.  And we're only
 using a plain SpinAcquire, not the flavor with a longer timeout.

 xlog.c revision 1.55 from Feb 26 already had log file
 zero-filling, so ...

Oh, you're right, I didn't study the CVS log carefully enough.  Hmm,
maybe the control file lock isn't the problem.  The abort() in
s_lock_stuck should have left a core file --- what is the backtrace?

regards, tom lane




RE: [HACKERS] Stuck spins in current

2001-03-16 Thread Mikheev, Vadim

  And you know - I've run same tests on ~ Mar 9 snapshot
  without any problems.
  
  That was before I changed the code to pre-fill the file --- 
  now it takes longer to init a log segment.  And we're only
  using a plain SpinAcquire, not the flavor with a longer timeout.
 
  xlog.c revision 1.55 from Feb 26 already had log file
  zero-filling, so ...
 
 Oh, you're right, I didn't study the CVS log carefully enough.  Hmm,
 maybe the control file lock isn't the problem.  The abort() in
 s_lock_stuck should have left a core file --- what is the backtrace?

After increasing DEFAULT_TIMEOUT in s_lock.c tenfold,
I got an abort at xlog.c:626 - waiting for insert_lck.
But the problem is near the new log file creation code: the system
goes to sleep just after a new one is created.

Vadim
