On Tue, 2005-07-26 at 19:15 -0400, Tom Lane wrote:
Josh Berkus josh@agliodbs.com writes:
We should run tests with much higher wal_buffers numbers to nullify the
effect described above and reduce contention. That way we will move
towards the log disk speed being the limiting factor, patch or
Simon,
I guess I'd be concerned that the poor bgwriter can't do all of this
work. I was thinking about a separate log writer, so we could have both
bgwriter and logwriter active simultaneously on I/O. It has taken a
while to get bgwriter to perform its duties efficiently, so I'd rather
not
On Fri, 2005-07-22 at 19:11 -0400, Tom Lane wrote:
Hmm. Eyeballing the NOTPM trace for cases 302912 and 302909, it sure
looks like the post-checkpoint performance recovery is *slower* in
the latter. And why is 302902 visibly slower overall than 302905?
I thought for a bit that you had gotten
Simon,
We should run tests with much higher wal_buffers numbers to nullify the
effect described above and reduce contention. That way we will move
towards the log disk speed being the limiting factor, patch or no patch.
I've run such tests, at a glance they do seem to improve performance. I
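For concreteness, the kind of configuration being tested here would look something like this in postgresql.conf; the values below are illustrative test settings, not recommendations from this thread:

```ini
# postgresql.conf -- illustrative test settings, not recommendations
wal_buffers = 1024           # materially higher than the default of 8
checkpoint_timeout = 300     # seconds between automatic checkpoints
wal_sync_method = fdatasync  # one of several methods worth comparing
```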
Josh Berkus josh@agliodbs.com writes:
We should run tests with much higher wal_buffers numbers to nullify the
effect described above and reduce contention. That way we will move
towards the log disk speed being the limiting factor, patch or no patch.
I've run such tests, at a glance they do
Tom,
I have no idea whether the DBT benchmarks would benefit at all, but
given that they are affected positively by increasing wal_buffers,
they must have a fair percentage of not-small transactions.
Even if they don't, we'll have serious tests for DW here at GreenPlum soon,

and I'll bet
On Fri, 22 Jul 2005 19:11:36 -0400
Tom Lane [EMAIL PROTECTED] wrote:
BTW, I'd like to look at 302906, but its [Details] link is broken.
Ugh, I tried digging into the internal systems and it looks like they
were destroyed (or not saved) somehow. It'll have to be rerun.
Sorry...
Mark
Tom,
This will remove just the CRC calculation work associated with backed-up
pages. Note that any attempt to recover from the WAL will fail, but I
assume you don't need that for the purposes of the test run.
Looks like the CRC calculation work isn't the issue. I did test runs of
no-CRC
Did you test with full_page_writes on and off?
---
Josh Berkus wrote:
Tom,
This will remove just the CRC calculation work associated with backed-up
pages. Note that any attempt to recover from the WAL will fail, but
Bruce,
Did you test with full_page_writes on and off?
I didn't use your full_page_writes version because Tom said it was
problematic. This is CVS from July 3rd.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus wrote:
Bruce,
Did you test with full_page_writes on and off?
I didn't use your full_page_writes version because Tom said it was
problematic. This is CVS from July 3rd.
I think we need those tests run.
--
Bruce Momjian| http://candle.pha.pa.us
Bruce,
I think we need those tests run.
Sure. What CVS day should I grab? What's the option syntax? ( -c
full_page_writes=false)?
I have about 20 tests in queue right now but can stack yours up behind
them.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus wrote:
Bruce,
I think we need those tests run.
Sure. What CVS day should I grab? What's the option syntax? ( -c
full_page_writes=false)?
Yes. You can grab any from the day Tom fixed it, which was I think two
weeks ago.
I have about 20 tests in queue right now but
Josh Berkus josh@agliodbs.com writes:
I think this test run http://khack.osdl.org/stp/302903/results/0/, with a
30-min checkpoint shows pretty clearly that the behavior of the
performance drop is consistent with needing to re-prime the WAL with
full page images. Each checkpoint drops
Josh Berkus josh@agliodbs.com writes:
Looks like the CRC calculation work isn't the issue. I did test runs of
no-CRC vs. regular DBT2 with different checkpoint timeouts, and didn't
discern any statistical difference. See attached spreadsheet chart (the
two different runs are on two
Josh Berkus josh@agliodbs.com writes:
Bruce,
Did you test with full_page_writes on and off?
I didn't use your full_page_writes version because Tom said it was
problematic. This is CVS from July 3rd.
We already know the results: should be equivalent to the hack Josh
tried first.
So what
Greg,
For any benchmarking to be meaningful you have to set the checkpoint
interval to something more realistic. Something like 5 minutes. That way
when the final checkpoint cycle isn't completely included in the timing
data you'll at least be missing a statistically insignificant portion of
Tom,
Um, where are the test runs underlying this spreadsheet? I don't have a
whole lot of confidence in looking at full-run average TPM numbers to
discern whether transient dropoffs in TPM are significant or not.
On the Web, in the form of:
http://khack.osdl.org/stp/#test_number#/
Where
Greg Stark [EMAIL PROTECTED] writes:
For any benchmarking to be meaningful you have to set the checkpoint interval
to something more realistic. Something like 5 minutes. That way when the final
checkpoint cycle isn't completely included in the timing data you'll at least
be missing a
Josh Berkus josh@agliodbs.com writes:
Um, where are the test runs underlying this spreadsheet? I don't have a
whole lot of confidence in looking at full-run average TPM numbers to
discern whether transient dropoffs in TPM are significant or not.
On the Web, in the form of:
Tom,
There's something awfully weird going on here. I was prepared to see
no statistically-significant differences, but not multiple cases that
seem to be going the wrong direction.
There's a lot of variance in the tests. I'm currently running a variance
test battery on one machine to
Tom,
Josh, I see that all of those runs seem to be using wal_buffers = 8.
Have you tried materially increasing wal_buffers (say to 100 or 1000)
and/or experimenting with different wal_sync_method options since we
fixed the bufmgrlock problem? I am wondering if the real issue is
WAL buffer
Josh Berkus josh@agliodbs.com writes:
So, now that we know what the performance bottleneck is, how do we fix it?
Josh, I see that all of those runs seem to be using wal_buffers = 8.
Have you tried materially increasing wal_buffers (say to 100 or 1000)
and/or experimenting with different
Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
I don't think we should care too much about indexes. We can rebuild
them...but losing heap sectors means *data loss*.
If you're so concerned about *data loss* then none of this will be
acceptable to you at all. We are talking about
I don't think our problem is partial writes of WAL, which we already
check, but heap/index page writes, which we currently do not check for
partial writes.
---
Kevin Brown wrote:
Tom Lane wrote:
Simon Riggs [EMAIL
Bruce Momjian wrote:
I don't think our problem is partial writes of WAL, which we already
check, but heap/index page writes, which we currently do not check for
partial writes.
Hmm...I've read through the thread again and thought about the problem
further, and now think I understand what
On Fri, 2005-07-08 at 14:45 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
I don't think we should care too much about indexes. We can rebuild
them...but losing heap sectors means *data loss*.
If you're so concerned about *data loss* then none of this will be
acceptable to
On Fri, 2005-07-08 at 09:34 +0200, Zeugswetter Andreas DAZ SD wrote:
The point here is that fsync-off is only realistic for development or
playpen installations. You don't turn it off in a production
machine, and I can't see that you'd turn off the full-page-write
option either. So
Simon, Tom,
Will do. Results in a few days.
Actually, between the bad patch on the 5th and ongoing STP issues, I don't
think I will have results before I leave town. Will e-mail you offlist to
give you info to retrieve results.
Any chance you'd be able to do this with
ext3 and a
On R, 2005-07-08 at 14:45 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
I don't think we should care too much about indexes. We can rebuild
them...but losing heap sectors means *data loss*.
There might be some merit in idea to disabling WAL/PITR for indexes,
where one can
The point here is that fsync-off is only realistic for development or
playpen installations. You don't turn it off in a production
machine, and I can't see that you'd turn off the full-page-write
option either. So we have not solved anyone's performance problem.
Yes, this is basically
On Thu, 2005-07-07 at 11:59 -0400, Bruce Momjian wrote:
Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
Tom Lane wrote:
The point here is that fsync-off is only realistic for development
or playpen installations. You don't turn it off in a production
machine, and I
On 7/7/05, Bruce Momjian pgman@candle.pha.pa.us wrote:
One idea would be to just tie its behavior directly to fsync and remove
the option completely (that was the original TODO), or we can adjust it
so it doesn't have the same risks as fsync, or the same lack of failure
reporting as fsync.
I
Simon Riggs [EMAIL PROTECTED] writes:
Is there also a potential showstopper in the redo machinery? We work on
the assumption that the post-checkpoint block is available in WAL as a
before image. Redo for all actions merely replays the write action again
onto the block. If we must reapply the
On Thu, 7 Jul 2005, Tom Lane wrote:
We still don't know enough about the situation to know what a solution
might look like. Is the slowdown Josh is seeing due to the extra CPU
cost of the CRCs, or the extra I/O cost, or excessive locking of the
WAL-related data structures while we do this
Tom,
Great. BTW, don't bother testing snapshots between 2005/07/05 2300 EDT
and just now --- Bruce's full_page_writes patch introduced a large
random negative component into the timing ...
Ach. Starting over, then.
--Josh
--
Josh Berkus
Aglio Database Solutions
San Francisco
On Fri, 2005-07-08 at 09:47 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
Having raised that objection, ISTM that checking for torn pages can be
accomplished reasonably well using a few rules...
I have zero confidence in this; the fact that you can think of
(incomplete,
Simon Riggs [EMAIL PROTECTED] writes:
I don't think we should care too much about indexes. We can rebuild
them...but losing heap sectors means *data loss*.
If you're so concerned about *data loss* then none of this will be
acceptable to you at all. We are talking about going from a system
that
On Wed, 2005-07-06 at 17:17 -0700, Joshua D. Drake wrote:
Tom, I think you're the only person that could or would be trusted to
make such a change. Even past the 8.1 freeze, I say we need to do
something now on this issue.
I think if we document full_page_writes as similar to fsync in
On Thu, 2005-07-07 at 00:29 -0400, Bruce Momjian wrote:
Tom Lane wrote:
Bruno Wolff III [EMAIL PROTECTED] writes:
Are you sure about that? That would probably be the normal case, but are
you promised that the hardware will write all of the sectors of a block
in order?
I don't
Are you sure about that? That would probably be the normal case, but
are you promised that the hardware will write all of the sectors of a
block in order?
I don't think you can possibly assume that. If the block
crosses a cylinder boundary then it's certainly an unsafe
assumption,
Here's an idea:
We read the page that we would have backed up, calc the CRC and
write a short WAL record with just the CRC, not the block. When
we recover we re-read the database page, calc its CRC and
compare it with the CRC from the transaction log. If they
differ, we know that the
Zeugswetter Andreas DAZ SD [EMAIL PROTECTED] writes:
Only workable solution would imho be to write the LSN to each 512
byte block (not that I am propagating that idea).
We're not doing anything like that, as it would create an impossible
space-management problem (or are you happy with limiting
I wrote:
We still don't know enough about the situation to know what a solution
might look like. Is the slowdown Josh is seeing due to the extra CPU
cost of the CRCs, or the extra I/O cost, or excessive locking of the
WAL-related data structures while we do this stuff, or ???. Need more
Only workable solution would imho be to write the LSN to each 512 byte
block (not that I am propagating that idea).
"Only workable" was a stupid formulation; I meant a solution that works
with an LSN.
We're not doing anything like that, as it would create an
impossible space-management
Simon Riggs wrote:
SCSI tagged queueing certainly allows 512-byte blocks to be reordered
during writes.
Then a torn-page tell-tale is required that will tell us of any change
to any of the 512-byte sectors that make up a block/page.
Here's an idea:
We read the page that we would have
Bruce Momjian pgman@candle.pha.pa.us writes:
Yes, that is a good idea!
... which was shot down in the very next message.
regards, tom lane
Zeugswetter Andreas DAZ SD wrote:
Are you sure about that? That would probably be the normal case, but
are you promised that the hardware will write all of the sectors of a
block in order?
I don't think you can possibly assume that. If the block
crosses a cylinder boundary then
Tom Lane [EMAIL PROTECTED] writes:
Zeugswetter Andreas DAZ SD [EMAIL PROTECTED] writes:
Only workable solution would imho be to write the LSN to each 512
byte block (not that I am propagating that idea).
We're not doing anything like that, as it would create an impossible
Greg Stark [EMAIL PROTECTED] writes:
Tom Lane [EMAIL PROTECTED] writes:
What we *could* do is calculate a page-level CRC and
store it in the page header just before writing out. Torn pages
would then manifest as a wrong CRC on read. No correction ability,
but at least a reliable detection
Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
As far as #2, my posted proposal was to write the full pages to WAL when
they are written to the file system, and not when they are first
modified in the shared buffers ---
That is *completely* unworkable. Or were you planning
Bruce Momjian pgman@candle.pha.pa.us writes:
Tom Lane wrote:
The point here is that fsync-off is only realistic for development
or playpen installations. You don't turn it off in a production
machine, and I can't see that you'd turn off the full-page-write
option either. So we have not
Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
Tom Lane wrote:
The point here is that fsync-off is only realistic for development
or playpen installations. You don't turn it off in a production
machine, and I can't see that you'd turn off the full-page-write
option
Just to make my position perfectly clear: I don't want to see this
option shipped in 8.1. It's reasonable to have it in there for now
as an aid to our performance investigations, but I don't see that it
has any value for production.
Well, this is the first I am hearing that, and of course
Joshua D. Drake wrote:
Just to make my position perfectly clear: I don't want to see this
option shipped in 8.1. It's reasonable to have it in there for now
as an aid to our performance investigations, but I don't see that it
has any value for production.
Well, this is the first I
Simon Riggs wrote:
On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
Well, I added #1 yesterday as 'full_page_writes', and it has the same
warnings as fsync (namely, on crash, be prepared to recover or check
your system thoroughly).
Yes, which is why I comment now that the GUC
Tom,
Josh, is OSDL up enough that you can try another comparison run?
Thankfully, yes.
If so, please undo the previous patch (which disabled page dumping
entirely) and instead try removing this block of code, starting
at about xlog.c line 620 in CVS tip:
Will do. Results in a few days.
On Thu, Jul 07, 2005 at 11:36:40AM -0400, Tom Lane wrote:
Greg Stark [EMAIL PROTECTED] writes:
Tom Lane [EMAIL PROTECTED] writes:
What we *could* do is calculate a page-level CRC and
store it in the page header just before writing out. Torn pages
would then manifest as a wrong CRC on
Josh Berkus josh@agliodbs.com writes:
If so, please undo the previous patch (which disabled page dumping
entirely) and instead try removing this block of code, starting
at about xlog.c line 620 in CVS tip:
Will do. Results in a few days.
Great. BTW, don't bother testing snapshots between
On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote:
Josh Berkus josh@agliodbs.com writes:
Uh, what exactly did you cut out? I suggested dropping the dumping of
full page images, but not removing CRCs altogether ...
Attached is the patch I used.
OK, thanks for the clarification. So it
Simon Riggs wrote:
On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote:
Josh Berkus josh@agliodbs.com writes:
Uh, what exactly did you cut out? I suggested dropping the dumping of
full page images, but not removing CRCs altogether ...
Attached is the patch I used.
OK, thanks for
On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
Well, I added #1 yesterday as 'full_page_writes', and it has the same
warnings as fsync (namely, on crash, be prepared to recover or check
your system thoroughly).
Yes, which is why I comment now that the GUC alone is not enough.
There
Tom, I think you're the only person that could or would be trusted to
make such a change. Even past the 8.1 freeze, I say we need to do
something now on this issue.
I think if we document full_page_writes as similar to fsync in risk, we
are OK for 8.1, but if something can be done easily, it
Simon Riggs wrote:
I agree we *must* have the GUC, but we also *must* have a way for crash
recovery to tell us for certain that it has definitely worked, not just
maybe worked.
Doesn't the same argument apply to the existing fsync = off case? i.e.
we already have a case where we don't provide
Simon Riggs [EMAIL PROTECTED] writes:
On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
Well, I added #1 yesterday as 'full_page_writes', and it has the same
warnings as fsync (namely, on crash, be prepared to recover or check
your system thoroughly).
Yes, which is why I comment now
On Wed, Jul 06, 2005 at 21:48:44 +0100,
Simon Riggs [EMAIL PROTECTED] wrote:
We could implement the torn-pages option, but that seems a lot of work.
Another way of implementing a tell-tale would be to append the LSN again
as a data page trailer as the last 4 bytes of the page. Thus the LSN
Bruno Wolff III [EMAIL PROTECTED] writes:
Are you sure about that? That would probably be the normal case, but are
you promised that the hardware will write all of the sectors of a block
in order?
I don't think you can possibly assume that. If the block crosses a
cylinder boundary then it's
Tom Lane wrote:
Bruno Wolff III [EMAIL PROTECTED] writes:
Are you sure about that? That would probably be the normal case, but are
you promised that the hardware will write all of the sectors of a block
in order?
I don't think you can possibly assume that. If the block crosses a
Bruce Momjian pgman@candle.pha.pa.us writes:
I have an idea! Currently we write the backup pages (copies of pages
modified since the last checkpoint) when we write the WAL changes as
part of the commit. See the XLogCheckBuffer() call in XLogInsert().
Can someone explain exactly what the
On Sun, 3 Jul 2005 04:47 pm, Greg Stark wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
I have an idea! Currently we write the backup pages (copies of pages
modified since the last checkpoint) when we write the WAL changes as
part of the commit. See the XLogCheckBuffer() call in
Greg Stark [EMAIL PROTECTED] writes:
Can someone explain exactly what the problem is that's being defeated by
writing whole pages to the WAL log?
Partial writes. Without the full-page image, we do not have enough
information in WAL to reconstruct the correct page contents.
A further optimization would
Tom Lane [EMAIL PROTECTED] writes:
Greg Stark [EMAIL PROTECTED] writes:
Can someone explain exactly what the problem is that's being defeated by
writing whole pages to the WAL log?
Partial writes. Without the full-page image, we do not have enough
information in WAL to reconstruct the correct
Greg Stark [EMAIL PROTECTED] writes:
Tom Lane [EMAIL PROTECTED] writes:
Partial writes. Without the full-page image, we do not have enough
information in WAL to reconstruct the correct page contents.
Sure, but why not?
If an 8k page contains 16 low level segments on disk and the old data is
Tom Lane wrote:
Josh Berkus josh@agliodbs.com writes:
Uh, what exactly did you cut out? I suggested dropping the dumping of
full page images, but not removing CRCs altogether ...
Attached is the patch I used.
OK, thanks for the clarification. So it does seem that dumping full
page
Magnus Hagander [EMAIL PROTECTED] writes:
FWIW, MSSQL deals with this using Torn Page Detection. This is off by
default (no check at all!), but can be enabled on a per-database level.
Note that it only *detects* torn pages. If it finds one, it won't start,
and tells you to recover from backup. It
2. Think of a better defense against partial-page writes.
I like #2, or would if I could think of a better defense.
Ideas anyone?
FWIW, MSSQL deals with this using Torn Page Detection. This is off by
default (no check at all!), but can be enabled on a per-database level.
Note that it only
Tom,
Database pages. The current theory is that we can completely
reconstruct from WAL data every page that's been modified since the
last checkpoint. So the first write of any page after a checkpoint
dumps a full image of the page into WAL; subsequent writes only write
differences.
What
Josh Berkus josh@agliodbs.com writes:
What I'm confused about is that this shouldn't be anything new for 8.1. Yet
8.1 has *worse* performance on the STP machines than 8.0 does, and it's
pretty much entirely due to this check.
That's simply not believable --- better recheck your analysis.
Tom,
What I'm confused about is that this shouldn't be anything new for
8.1. Yet 8.1 has *worse* performance on the STP machines than 8.0
does, and it's pretty much entirely due to this check.
That's simply not believable --- better recheck your analysis. If 8.1
is worse it's not
Tom, All:
Ok, finally managed through the persistent efforts of Mark Wong to get some
tests through. Here are two tests with the CRC and WAL buffer checking
completely cut out of the code, as Tom suggested:
5-min checkpoint:
http://khack.osdl.org/stp/302738/results/0/
Josh Berkus josh@agliodbs.com writes:
Ok, finally managed through the persistent efforts of Mark Wong to get some
tests through. Here are two tests with the CRC and WAL buffer checking
completely cut out of the code, as Tom suggested:
Uh, what exactly did you cut out? I suggested dropping
Tom,
Uh, what exactly did you cut out? I suggested dropping the dumping of
full page images, but not removing CRCs altogether ...
Attached is the patch I used. (it's a -Urn patch 'cause that's what STP
takes)
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
diff -urN
Josh Berkus josh@agliodbs.com writes:
Uh, what exactly did you cut out? I suggested dropping the dumping of
full page images, but not removing CRCs altogether ...
Attached is the patch I used.
OK, thanks for the clarification. So it does seem that dumping full
page images is a pretty big
Josh Berkus josh@agliodbs.com writes:
1. Offer a GUC to turn off full-page-image dumping, which you'd use only
if you really trust your hardware :-(
Are these just WAL pages? Or database pages as well?
Database pages. The current theory is that we can completely
reconstruct from WAL data
Tom,
1. Offer a GUC to turn off full-page-image dumping, which you'd use only
if you really trust your hardware :-(
Are these just WAL pages? Or database pages as well?
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco