Re: [HACKERS] Moving more work outside WALInsertLock

2011-12-24 Thread Fujii Masao
On Sat, Dec 24, 2011 at 4:54 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Sorry. Last-minute changes, didn't retest properly. Here's another attempt.

When I tested the patch, initdb failed:

$ initdb -D data

initializing dependencies ... PANIC:  could not locate a valid checkpoint record

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] CLOG contention

2011-12-24 Thread Simon Riggs
On Thu, Dec 22, 2011 at 4:20 PM, Robert Haas robertmh...@gmail.com wrote:

 You mentioned latency so this morning I ran pgbench with -l and
 graphed the output.  There are latency spikes every few seconds.  I'm
 attaching the overall graph as well as the graph of the last 100
 seconds, where the spikes are easier to see clearly.  Now, here's the
 problem: it seems reasonable to hypothesize that the spikes are due to
 CLOG page replacement since the frequency is at least plausibly right,
 but this is obviously not enough to prove that conclusively.  Ideas?

Thanks. That illustrates the effect I explained earlier very clearly,
so now we all know I wasn't speculating.

 Also, if it is that, what do we do about it?  I don't think any of the
 ideas proposed so far are going to help much.

If you don't like guessing, don't guess, don't think. Just measure.

Does increasing the number of buffers solve the problems you see? That
must be the first port of call: is that enough, or not? If not, we
can discuss the various ideas, write patches and measure them.
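
For concreteness, the buffer count in question here is the CLOG buffer
pool, which at this point is a compile-time constant rather than a GUC,
so measuring means rebuilding with a larger value and re-running the
pgbench latency test. A rough sketch, with the value quoted from memory
rather than checked against the tree:

/*
 * src/include/access/clog.h (9.1-era; the exact definition should be
 * checked against the source tree).
 */
#define NUM_CLOG_BUFFERS	8		/* try e.g. 16, 32 or 64 and re-measure */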

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] 16-bit page checksums for 9.2

2011-12-24 Thread Simon Riggs
After the various recent discussions on list, I present what I believe
to be a working patch implementing 16-bit checksums on all buffer
pages.

page_checksums = on | off (default)

There are no required block changes; checksums are optional, so some
blocks may have a checksum and others not. This means that the patch
allows pg_upgrade.

That capability also limits us to 16-bit checksums. Fletcher-16 is
used in this patch and seems rather quick, though it is easily
replaceable/tunable if desired, perhaps even via a parameter enum.
This patch is a step on the way to 32-bit checksums in a future
redesign of the page layout, though that is neither a required future
change nor prevented by this one.
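
For reference, a minimal sketch of a Fletcher-16-style sum over a page
image is shown below; the function name is illustrative and this is not
the patch's actual code:

#include <stddef.h>
#include <stdint.h>

/* Minimal Fletcher-16 sketch: two running 8-bit sums modulo 255,
 * combined into one 16-bit value.  Illustrative only. */
static uint16_t
page_checksum_fletcher16(const uint8_t *data, size_t len)
{
	uint32_t	sum1 = 0;
	uint32_t	sum2 = 0;
	size_t		i;

	for (i = 0; i < len; i++)
	{
		sum1 = (sum1 + data[i]) % 255;
		sum2 = (sum2 + sum1) % 255;
	}
	return (uint16_t) ((sum2 << 8) | sum1);
}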

The checksum is set whenever the buffer is flushed to disk, and checked
when the page is read back in from disk. It is not set at other times,
so for much of the time it may not be accurate. This follows the
earlier discussion from 2010-12-22 and is covered in detail in the
patch comments.

Note that this works with buffer manager pages, which includes shared
and local data buffers, but not SLRU pages (yet; an easy addition, but
one that needs separate discussion around contention).

Note that all this does is detect bit errors on the page; it doesn't
identify where the error is or how bad it is, and certainly not what
caused it or when it happened.

The main body of the patch involves changes to bufpage.c/.h, so for
technical reasons it differs completely from the VMware patch. Also
included is a LockBufferForHints() facility, with usage in various
AMs, to avoid hint bits being set while the checksum is being
calculated.

In my view this is a fully working, committable patch, but I'm not in
a hurry to commit it given the holiday season.

Hopefully it's a gift and not a turkey, and therefore a challenge for
some to prove that wrong. Enjoy it either way,

Merry Christmas,

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/access/hash/hash.c
--- b/src/backend/access/hash/hash.c
***
*** 282,288  hashgettuple(PG_FUNCTION_ARGS)
--- 282,290 
  			/*
  			 * Yes, so mark it by setting the LP_DEAD state in the item flags.
  			 */
+ 			LockBufferForHints(buf);
  			ItemIdMarkDead(PageGetItemId(page, offnum));
+ 			UnlockBufferForHints(buf);
  
  			/*
  			 * Since this can be redone later if needed, it's treated the same
*** a/src/backend/access/heap/pruneheap.c
--- b/src/backend/access/heap/pruneheap.c
***
*** 259,266  heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
--- 259,268 
  		if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
  			PageIsFull(page))
  		{
+ 			LockBufferForHints(buffer);
  			((PageHeader) page)-pd_prune_xid = prstate.new_prune_xid;
  			PageClearFull(page);
+ 			UnlockBufferForHints(buffer);
  			SetBufferCommitInfoNeedsSave(buffer);
  		}
  	}
*** a/src/backend/access/nbtree/nbtinsert.c
--- b/src/backend/access/nbtree/nbtinsert.c
***
*** 403,415  _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
--- 403,427 
  	 * everyone, so we may as well mark the index entry
  	 * killed.
  	 */
+ 	if (nbuf != InvalidBuffer)
+ 		LockBufferForHints(nbuf);
+ 	else
+ 		LockBufferForHints(buf);
+ 
  	ItemIdMarkDead(curitemid);
  	opaque->btpo_flags |= BTP_HAS_GARBAGE;
+ 
  	/* be sure to mark the proper buffer dirty... */
  	if (nbuf != InvalidBuffer)
+ 	{
+ 		UnlockBufferForHints(nbuf);
  		SetBufferCommitInfoNeedsSave(nbuf);
+ 	}
  	else
+ 	{
+ 		UnlockBufferForHints(buf);
  		SetBufferCommitInfoNeedsSave(buf);
+ 	}
  }
  			}
  		}
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
***
*** 1040,1046  restart:
--- 1040,1048 
  			if (vstate->cycleid != 0 &&
				opaque->btpo_cycleid == vstate->cycleid)
  			{
+ LockBufferForHints(buf);
  opaque-btpo_cycleid = 0;
+ UnlockBufferForHints(buf);
  SetBufferCommitInfoNeedsSave(buf);
  			}
  		}
*** a/src/backend/commands/sequence.c
--- b/src/backend/commands/sequence.c
***
*** 1092,1101  read_info(SeqTable elm, Relation rel, Buffer *buf)
--- 1092,1103 
  	 */
  	if (HeapTupleHeaderGetXmax(tuple.t_data) != InvalidTransactionId)
  	{
+ 		LockBufferForHints(*buf);
  		HeapTupleHeaderSetXmax(tuple.t_data, InvalidTransactionId);
  		tuple.t_data->t_infomask &= ~HEAP_XMAX_COMMITTED;
  		tuple.t_data->t_infomask |= HEAP_XMAX_INVALID;
  		SetBufferCommitInfoNeedsSave(*buf);
+ 		UnlockBufferForHints(*buf);
  	}
  
  	seq = (Form_pg_sequence) GETSTRUCT(tuple);
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
***
*** 440,446  ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
  
  			/* check for garbage data */
! 			if 

Re: [HACKERS] Page Checksums + Double Writes

2011-12-24 Thread Simon Riggs
On Thu, Dec 22, 2011 at 9:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Dec 22, 2011 at 9:50 AM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:

 Simon, does it sound like I understand your proposal?

 Yes, thanks for restating.

I've implemented that proposal, posting patch on a separate thread.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] patch: bytea_agg

2011-12-24 Thread Merlin Moncure
On Fri, Dec 23, 2011 at 12:30 PM, Robert Haas robertmh...@gmail.com wrote:
 Well, because it doesn't operate on strings.

 I argued when we added string_agg that it ought to be called
 concat_agg, or something like that, but I got shouted down.  So now
 here we are.

+1.  Using the input type names to name the function is a mistake and
should be stopped... enough.  It's verbose and unnecessary; everything
else in SQL is heavily overloaded (we don't have int_max() and
float_max()).

merlin



Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-24 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 After the various recent discussions on list, I present what I believe
 to be a working patch implementing 16-bit checksums on all buffer
 pages.

I think locking around hint-bit-setting is likely to be unworkable from
a performance standpoint.  I also wonder whether it might not result in
deadlocks.

Also, as far as I can see this patch usurps the page version field,
which I find unacceptably short-sighted.  Do you really think this is
the last page layout change we'll ever make?

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-24 Thread Andres Freund
On Saturday, December 24, 2011 03:46:16 PM Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  After the various recent discussions on list, I present what I believe
  to be a working patch implementing 16-bit checksums on all buffer
  pages.
 
 I think locking around hint-bit-setting is likely to be unworkable from
 a performance standpoint.  I also wonder whether it might not result in
 deadlocks.
Why don't you use the same tricks as the former patch and copy the buffer, 
compute the checksum on that, and then write out that copy (you can even do 
both at the same time). I have a hard time believing that the additional copy 
is more expensive than the locking.
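
As a rough illustration of that suggestion, a sketch follows; the write
routine, the checksum routine (the Fletcher-16 sketch from the earlier
mail) and the checksum's offset within the page are all stand-ins, not
the patch's actual API:

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE		8192
#define CHECKSUM_OFFSET	8		/* assumed location of the 16-bit field */

/* Hypothetical helpers: the Fletcher-16 sketch above and a stand-in
 * for the storage-layer write. */
extern uint16_t page_checksum_fletcher16(const uint8_t *data, size_t len);
extern void write_block_to_disk(uint32_t blkno, const uint8_t *image);

/*
 * Sketch of the "copy first" approach: snapshot the shared buffer,
 * checksum the private copy and write that out, so hint-bit setters
 * never need to be locked out.  A concurrent hint-bit update may be
 * missed by this particular write, but the image on disk and the
 * checksum it carries always agree with each other.
 */
static void
flush_page_copy(uint32_t blkno, const uint8_t *shared_page)
{
	uint8_t		copy[PAGE_SIZE];
	uint16_t	crc;

	memcpy(copy, shared_page, PAGE_SIZE);	/* may race with hint-bit setters */
	memset(copy + CHECKSUM_OFFSET, 0, sizeof(crc));	/* zero the field first */
	crc = page_checksum_fletcher16(copy, PAGE_SIZE);
	memcpy(copy + CHECKSUM_OFFSET, &crc, sizeof(crc));
	write_block_to_disk(blkno, copy);
}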


Andres



Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-24 Thread Simon Riggs
On Sat, Dec 24, 2011 at 2:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 After the various recent discussions on list, I present what I believe
 to be a working patch implementing 16-bit checksums on all buffer
 pages.

 I think locking around hint-bit-setting is likely to be unworkable from
 a performance standpoint.

Anyone choosing page_checksums = on has already made a
performance-reducing decision in favour of reliability, so they
understand and accept the impact. There is no locking when the
parameter is off.

A safe alternative is to use LockBuffer, which has a much greater
performance impact.

I did think about optimistically checking after the write, but if we
crash at that point we will then see a block with an invalid checksum.
It's faster, but you may get a checksum failure if you crash; and since
one important aspect of this feature is to spot problems in exactly the
crash case, that seems unacceptable.

 I also wonder whether it might not result in
 deadlocks.

If you can see how, please say. I can't see any way that could happen myself.

 Also, as far as I can see this patch usurps the page version field,
 which I find unacceptably short-sighted.  Do you really think this is
 the last page layout change we'll ever make?

No, I don't. I hope and expect the next page layout change to
reintroduce such a field.

But since we're now agreed that upgrading is important, a page format
change isn't likely to happen until we get an online upgrade process,
so future changes are much less likely. If they do happen, we have
some spare flag bits that can be used to indicate later versions.
It's not the prettiest thing in the world, but it's a small ugliness
in return for an important feature. If there were a way to do this
without it, I would have chosen that.

pg_filedump will need more changes than usual, but the version field
isn't used anywhere else in the server code.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-24 Thread Simon Riggs
On Sat, Dec 24, 2011 at 3:54 PM, Andres Freund and...@anarazel.de wrote:
 On Saturday, December 24, 2011 03:46:16 PM Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  After the various recent discussions on list, I present what I believe
  to be a working patch implementing 16-bit checksums on all buffer
  pages.

 I think locking around hint-bit-setting is likely to be unworkable from
 a performance standpoint.  I also wonder whether it might not result in
 deadlocks.

 Why don't you use the same tricks as the former patch and copy the buffer,
 compute the checksum on that, and then write out that copy (you can even do
 both at the same time). I have a hard time believing that the additional copy
 is more expensive than the locking.

We would copy every time we write, yet lock only every time we set hint bits.

If that option is favoured, I'll write another version after Christmas.

ISTM we can't write and copy at the same time because the checksum is
not a trailer field.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-24 Thread Simon Riggs
On Sat, Dec 24, 2011 at 3:51 PM, Aidan Van Dyk ai...@highrise.ca wrote:

 Not an expert here, but after reading through the patch quickly, I
 don't see anything that changes the torn-page problem though, right?

 Hint bits aren't wal-logged, and FPW isn't forced on the hint-bit-only
 dirty, right?

Checksums merely detect a problem, whereas FPWs correct a problem if
it happens, but only in crash situations.

So this does nothing to remove the need for FPWs, though checksum
detection could be used with double-write buffers as well.

Checksums work even when there is no crash, so if your disk goes bad
and corrupts data then you'll know about it as soon as it happens.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-24 Thread Andres Freund
On Saturday, December 24, 2011 05:01:02 PM Simon Riggs wrote:
 On Sat, Dec 24, 2011 at 3:54 PM, Andres Freund and...@anarazel.de wrote:
  On Saturday, December 24, 2011 03:46:16 PM Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
   After the various recent discussions on list, I present what I believe
   to be a working patch implementing 16-bit checksums on all buffer
   pages.
  
  I think locking around hint-bit-setting is likely to be unworkable from
  a performance standpoint.  I also wonder whether it might not result in
  deadlocks.
  
  Why don't you use the same tricks as the former patch and copy the
  buffer, compute the checksum on that, and then write out that copy (you
  can even do both at the same time). I have a hard time believing that
  the additional copy is more expensive than the locking.
 
 We would copy every time we write, yet lock only every time we set hint
 bits.
Isn't setting hint bits also a rather frequent operation? At least in a well-
cached workload where most writeout happens due to checkpoints.

 If that option is favoured, I'll write another version after Christmas.
Seems less complicated (wrt deadlocking et al) to me. But I haven't read your 
patch, so I will shut up now ;)

Andres



Re: [HACKERS] reprise: pretty print viewdefs

2011-12-24 Thread Greg Stark
On Thu, Dec 22, 2011 at 5:52 PM, Andrew Dunstan and...@dunslane.net wrote:
 I've looked at that, and it was discussed a bit previously. It's more
 complex because it requires that we keep track of (or calculate) where we
 are on the line,

You might try a compromise: just spit out all the columns on one line
*unless* either the previous or next column is longer than something
like 30 characters. So if you have a long list of short columns it just
gets wrapped by your terminal, but if you have complex expressions like
CASE expressions, casts and so on, they go on a line by themselves.
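
A sketch of that heuristic in plain C, quite apart from how ruleutils.c
would actually do it (the names and the 30-character threshold are
illustrative):

#include <stdio.h>
#include <string.h>

#define LONG_COLUMN	30			/* "something like 30" from above */

/* Print a SELECT target list on one running line, but give an entry
 * its own line whenever it, or the entry before or after it, is long. */
static void
print_target_list(const char **cols, int ncols)
{
	int			i;

	for (i = 0; i < ncols; i++)
	{
		int		prev_long = (i > 0) && strlen(cols[i - 1]) > LONG_COLUMN;
		int		this_long = strlen(cols[i]) > LONG_COLUMN;
		int		next_long = (i + 1 < ncols) && strlen(cols[i + 1]) > LONG_COLUMN;

		if (i > 0)
			fputs(",", stdout);
		if (i > 0 && (prev_long || this_long || next_long))
			fputs("\n       ", stdout);	/* long neighbourhood: break the line */
		else if (i > 0)
			fputs(" ", stdout);
		fputs(cols[i], stdout);
	}
	fputs("\n", stdout);
}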

-- 
greg



Re: [HACKERS] Moving more work outside WALInsertLock

2011-12-24 Thread Greg Stark
On Fri, Dec 16, 2011 at 3:27 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 On its own that sounds dangerous, but its not. When we need to confirm
 the prev link we already know what we expect it to be, so CRC-ing it
 is overkill. That isn't true of any other part of the WAL record, so
 the prev link is the only thing we can relax, but thats OK because we
 can CRC check everything else outside of the locked section.

 That isn't my idea, but I'm happy to put it on the table since I'm not shy.

 I'm glad it's not your idea, because it's a bad one.

I'll take the blame or credit here.

 A large part of
 the point of CRC'ing WAL records is to guard against torn-page problems
 in the WAL files, and doing things like that would give up a significant
 part of that protection, because there would no longer be any assurance
 that the body of a WAL record had anything to do with its prev_link.

Hm, I hadn't considered the possibility of a prev_link being the only
thing left over from a torn page. As Heikki pointed out, having the CRC
and the rest of the record on opposite sides of the prev_link does
seem like convincing protection, but it's a lot more fiddly and the
dependencies are harder to explain that way.

Another thought that was discussed at the same dinner was separating
the CRC into a separate record that would cover all the WAL since the
last CRC. These would only need to be emitted when there's a WAL sync,
not on every record. I think someone showed some benchmarks claiming
that a significant part of the CRC overhead was the startup and
finishing time for doing lots of small chunks. If it processed larger
blocks it might be able to make more efficient use of the memory
bandwidth. I'm not entirely convinced of that myself, but it bears
some experimentation.
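
A conceptual sketch of that idea, with the incremental CRC primitives
and the record-emitting call as hypothetical stand-ins for whatever the
real WAL code would use:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical incremental CRC primitives and a stand-in for writing
 * the separate CRC record; none of these are existing WAL APIs. */
extern void crc32_init(uint32_t *crc);
extern void crc32_update(uint32_t *crc, const void *data, size_t len);
extern void crc32_finish(uint32_t *crc);
extern void emit_crc_record(uint32_t crc);

static uint32_t wal_running_crc;	/* crc32_init() assumed at startup */

/* Fold each inserted record into one running sum rather than starting
 * and finishing a CRC for every record... */
static void
wal_insert_record(const void *rec, size_t len)
{
	crc32_update(&wal_running_crc, rec, len);
	/* ...and append rec to the WAL buffers as usual (not shown). */
}

/* Only materialise a CRC record when the WAL is synced; it covers
 * everything written since the previous CRC record. */
static void
wal_sync(void)
{
	uint32_t	crc = wal_running_crc;

	crc32_finish(&crc);
	emit_crc_record(crc);
	crc32_init(&wal_running_crc);	/* start the next stretch */
}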

-- 
greg



Re: [HACKERS] 16-bit page checksums for 9.2

2011-12-24 Thread Greg Stark
On Sat, Dec 24, 2011 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Checksums merely detect a problem, whereas FPWs correct a problem if
 it happens, but only in crash situations.

 So this does nothing to remove the need for FPWs, though checksum
 detection could be used for double write buffers also.

This is missing the point. If you have a torn page on a page that is
only dirty due to hint bits, then the checksum will show a spurious
failure: it will detect a problem that isn't there.

The problem is that there is no WAL record indicating the hint bit
change. And if the torn page includes the new checksum but not the new
hint bit, or vice versa, it will be a checksum mismatch.

The strategy discussed in the past was moving all the hint bits to a
common area and skipping them in the checksum. No amount of double
writing or buffering or locking will avoid this problem.
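
To make that concrete, here is a sketch of a checksum that ignores a
single consolidated hint-bit region; the offsets are invented for
illustration, since today's hint bits are scattered through the page
rather than collected in one place:

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE			8192
#define HINT_AREA_OFFSET	32		/* invented layout: one hint-bit region */
#define HINT_AREA_LEN		64

/* Fletcher-16 over the page with the hint area treated as zeroes, so a
 * torn write that changes only hint bits cannot fail the check. */
static uint16_t
checksum_skipping_hints(const uint8_t *page)
{
	uint32_t	sum1 = 0;
	uint32_t	sum2 = 0;
	size_t		i;

	for (i = 0; i < PAGE_SIZE; i++)
	{
		uint8_t		byte = page[i];

		if (i >= HINT_AREA_OFFSET && i < HINT_AREA_OFFSET + HINT_AREA_LEN)
			byte = 0;			/* hint-only changes are ignored */
		sum1 = (sum1 + byte) % 255;
		sum2 = (sum2 + sum1) % 255;
	}
	return (uint16_t) ((sum2 << 8) | sum1);
}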

-- 
greg
