Jonah H. Harris wrote:
Rather than potentially letting this slide past 8.4, I threw together
an extremely quick-hack patch at the smgr-layer for block-level
checksums.
One hard problem is how to deal with torn pages with non-WAL-logged
changes. Like heap hint bit updates, killing tuples in
* Gregory Stark:
I've also seen single-bit errors caused by bad memory in a network interface.
*Twice*. Particularly nasty since the CRC on TCP/IP packets is only 16-bit so
a large enough ftp transfer would eventually finish despite the packet loss
but with the occasional bits flipped. In
Dave Page wrote:
On Tue, Sep 30, 2008 at 2:53 PM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:
Should pg_relation_indexes_size() include the FSMs of the indexes? Should
pg_relation_toast_size() include the toast index and FSM as well?
It might be worth revisiting the near identical discussions
Zdenek Kotala wrote:
Heikki Linnakangas wrote:
The FSM is not updated during WAL replay. That means that after crash
recovery, the FSM won't be completely up-to-date, but at roughly the
state it was at last checkpoint. In a warm stand-by, the FSM will
reflect the situation at last full
On Thu, Oct 2, 2008 at 8:56 AM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:
It might be worth revisiting the near identical discussions we had
when Andreas and I integrated this stuff into the backend for 8.1.
Good point. The previous discussions evolved to having two functions,
On Thursday 02 October 2008, Heikki Linnakangas wrote:
pg_relation_size('footable') for size of the main data fork
pg_relation_size('footable', 'fsm') for FSM size
As good as possible, if you ask me!
Regards,
--
dim
On Thu, 2008-10-02 at 09:35 +0300, Heikki Linnakangas wrote:
Jonah H. Harris wrote:
Rather than potentially letting this slide past 8.4, I threw together
an extremely quick-hack patch at the smgr-layer for block-level
checksums.
One hard problem is how to deal with torn pages with
Heikki Linnakangas wrote:
Zdenek Kotala wrote:
Heikki Linnakangas wrote:
The FSM is not updated during WAL replay. That means that after crash
recovery, the FSM won't be completely up-to-date, but at roughly the
state it was at last checkpoint. In a warm stand-by, the FSM will
On 2 Oct 2008, at 05:44 AM, Tom Lane [EMAIL PROTECTED] wrote:
Hitoshi Harada [EMAIL PROTECTED] writes:
Hmm, I've looked over the patch. Logically window functions can access
arbitrary rows that have been stored in a frame. Thus I had thought
tuplestore should hold all the positions and
Seems to me there's a bug in HEAD (and probably old branches
as well) when compiled with HAVE_INT64_TIMESTAMP. As shown below
It sometimes shows things like -6.-70 secs where 8.3
showed -6.70 secs.
I think the attached one-liner patch fixes this, as well as
another roundoff regression between
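For readers skimming the archive, here is a minimal sketch of how output like -6.-70 can arise and one way to avoid it. It is not the one-liner patch from the message. It assumes the value is held as whole seconds plus a signed fractional part in microseconds, and both function names are made up for illustration.

```python
def format_secs_naive(sec, fsec_us):
    # Prints both parts with their own sign. int() truncates toward
    # zero, like C integer division.
    return "%d.%02d" % (sec, int(fsec_us / 10000))


def format_secs(sec, fsec_us):
    # Emit a single leading sign, then print magnitudes only.
    sign = "-" if (sec < 0 or fsec_us < 0) else ""
    return "%s%d.%02d" % (sign, abs(sec), abs(fsec_us) // 10000)


print(format_secs_naive(-6, -700000))
print(format_secs(-6, -700000))
```

The sketch truncates to two fractional digits for brevity; it does not try to reproduce the rounding behaviour discussed later in the thread.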
while reading documentation for pg_freespacemap contrib module i found a
small mistake - the functions are named pg_freespace and not
pg_freespacemap.
attached patch changes the sgml file with documentation.
best regards,
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog:
hubert depesz lubaczewski wrote:
while reading documentation for pg_freespacemap contrib module i found a
small mistake - the functions are named pg_freespace and not
pg_freespacemap.
attached patch changes the sgml file with documentation.
Fixed, thanks.
--
Heikki Linnakangas
Ron Mayer wrote:
Ron Mayer wrote:
Tom Lane wrote:
...GUC that selected PG traditional, SQL-standard... interval output
format seems like it could be a good idea.
This is an update to the earlier SQL-standard-interval-literal output
patch that I submitted here:
Ron Mayer wrote:
Tom Lane wrote:
In fact, given that we are now
somewhat SQL-compliant on interval input, a GUC that selected
PG traditional, SQL-standard, or ISO 8601 interval output format seems
like it could be a good idea.
This patch (that works on top of the IntervalStyle patch I
posted
Tom Lane wrote:
[EMAIL PROTECTED] (Peter Eisentraut) writes:
Allow pg_regress to be run outside the build tree. Look for input files
in both input and output dir, to handle vpath builds more simply.
Buildfarm doesn't like this patch much :-(. It's definitely not working
on the MSVC setup,
Peter Eisentraut [EMAIL PROTECTED] writes:
vpath build should be fixed now. MSVC will need to update the
pg_regress call sites, but I'll leave that to those maintaining that
build system. In particular, the --dlpath option needs to be added.
contrib is still pretty broken. One part of it
Heikki Linnakangas [EMAIL PROTECTED] writes:
Following that philosophy, I think the idea of adding a new optional
fork name argument to pg_relation_size() is the right thing to do:
pg_relation_size('footable') for size of the main data fork
pg_relation_size('footable', 'fsm') for FSM size
I have a stupid question wrt hint bits and CRC checksums- it seems to me
that it should be possible, if you change the hint bits, to be able to
very easily calculate what the change in the CRC checksum should be.
The basic idea of the CRC checksum is that, given a message x, the
checksum is x
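The property behind this question can be sketched as follows. For two buffers of equal length, CRC-32 is affine over XOR, so the checksum after flipping bits can be derived from the old checksum and the bit pattern of the change alone. This is only an illustration using Python's zlib.crc32, not anything from the patch; the function name and the 8192-byte page size are assumptions.

```python
import zlib


def crc_after_flip(old_crc, length, byte_index, bit_mask):
    """CRC-32 of a buffer after XOR-ing bit_mask into one byte,
    computed without looking at the buffer contents."""
    delta = bytearray(length)          # all zeros except the changed bits
    delta[byte_index] = bit_mask
    zeros = bytes(length)
    # For equal-length inputs: crc(a ^ d) == crc(a) ^ crc(d) ^ crc(zeros)
    return old_crc ^ zlib.crc32(bytes(delta)) ^ zlib.crc32(zeros)


page = bytearray((i * 31 + 7) % 256 for i in range(8192))
old_crc = zlib.crc32(bytes(page))
page[100] ^= 0x04                      # stand-in for a hint bit update
print(crc_after_flip(old_crc, len(page), 100, 0x04) == zlib.crc32(bytes(page)))
```

The sketch still runs a CRC over a full-length delta buffer, so it shows the property rather than a speedup, and it says nothing about several processes changing hint bits at once.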
On Thu, Oct 2, 2008 at 9:07 AM, Brian Hurt [EMAIL PROTECTED] wrote:
I have a stupid question wrt hint bits and CRC checksums- it seems to me
that it should be possible, if you change the hint bits, to be able to very
easily calculate what the change in the CRC checksum should be.
Doesn't the
Tom Lane wrote:
Peter Eisentraut [EMAIL PROTECTED] writes:
vpath build should be fixed now. MSVC will need to update the
pg_regress call sites, but I'll leave that to those maintaining that
build system. In particular, the --dlpath option needs to be added.
contrib is still pretty broken.
Peter Eisentraut [EMAIL PROTECTED] writes:
Tom Lane wrote:
contrib is still pretty broken. One part of it is needing a --srcdir
option in pgxs.mk, which I fixed. But some of the modules still
fail --- it looks like the problem is with the ones having data/
subdirectories. You seem to have
Jonah H. Harris wrote:
On Thu, Oct 2, 2008 at 9:07 AM, Brian Hurt [EMAIL PROTECTED] wrote:
I have a stupid question wrt hint bits and CRC checksums- it seems to me
that it should be possible, if you change the hint bits, to be able to very
easily calculate what the change in the CRC checksum
Jonah H. Harris [EMAIL PROTECTED] writes:
On Thu, Oct 2, 2008 at 9:07 AM, Brian Hurt [EMAIL PROTECTED] wrote:
I have a stupid question wrt hint bits and CRC checksums- it seems to me
that it should be possible, if you change the hint bits, to be able to very
easily calculate what the change
On Thu, Oct 2, 2008 at 9:36 AM, Brian Hurt [EMAIL PROTECTED] wrote:
Another possibility is to just not checksum the hint bits...
Seems like that would just complicate matters and prevent a viable checksum.
--
Jonah H. Harris, Senior DBA
myYearbook.com
--
Sent via pgsql-hackers mailing list
On Thu, Oct 2, 2008 at 9:42 AM, Gregory Stark [EMAIL PROTECTED] wrote:
It's even worse than that. Two processes can both be fiddling hint bits on
different tuples (or even the same tuple) at the same time.
Agreed. Back to the double-buffer idea, we could have a temporary
BLCKSZ buffer we could
Brian Hurt wrote:
Jonah H. Harris wrote:
On Thu, Oct 2, 2008 at 9:07 AM, Brian Hurt [EMAIL PROTECTED] wrote:
I have a stupid question wrt hint bits and CRC checksums- it seems to me
that it should be possible, if you change the hint bits, to be able to very
easily calculate what the change
Ron Mayer [EMAIL PROTECTED] writes:
Seems to me there's a bug in HEAD (and probably old branches
as well) when compiled with HAVE_INT64_TIMESTAMP. As shown below
It sometimes shows things like -6.-70 secs where 8.3
showed -6.70 secs.
I think the attached one-liner patch fixes this, as well
Heikki Linnakangas [EMAIL PROTECTED] writes:
Brian Hurt wrote:
Another possibility is to just not checksum the hint bits...
That would work. But I'm afraid it'd make the implementation a lot more
invasive, and also slower. The buffer manager would have to know what
kind of a page it's
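A rough sketch of what "not checksumming the hint bits" could look like: clear the hint bits in a scratch copy before computing the CRC. The offsets and masks below are invented; in a real page they would depend on the page type and tuple layout, which is the knowledge the buffer manager would need to have.

```python
import zlib


def checksum_ignoring_bits(page, masks):
    """CRC-32 of the page with selected bits forced to zero.

    masks maps a byte offset to the bit mask to ignore at that offset
    (hypothetical hint bit positions).
    """
    scrubbed = bytearray(page)         # scratch copy, original untouched
    for offset, mask in masks.items():
        scrubbed[offset] &= ~mask & 0xFF
    return zlib.crc32(bytes(scrubbed))


page = bytes(8192)
hinted = bytearray(page)
hinted[40] |= 0x01                     # pretend this is a hint bit
masks = {40: 0x01}
print(checksum_ignoring_bits(page, masks) ==
      checksum_ignoring_bits(bytes(hinted), masks))
```

The extra copy and the per-tuple masking give a feel for the cost side of the trade-off raised in the message.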
Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
Brian Hurt wrote:
Another possibility is to just not checksum the hint bits...
That would work. But I'm afraid it'd make the implementation a lot more
invasive, and also slower. The buffer manager would have to know what
kind of
Jonah H. Harris wrote:
On Thu, Oct 2, 2008 at 1:29 AM, Jonah H. Harris [EMAIL PROTECTED] wrote:
I ran the regressions and several concurrent benchmark tests which
passed successfully, but I'm sure I'm missing quite a bit due to the
fact that it's late, it's just a quick hack, and I haven't
On Thu, Oct 2, 2008 at 10:09 AM, Andrew Chernow [EMAIL PROTECTED] wrote:
I read through this patch and am curious why 0xdeadbeef was used as an
uninitialized value for the page crc. Is this value somehow less likely to
have collisions than zero (or any other arbitrary value)?
It was just an
Jonah H. Harris wrote:
On Thu, Oct 2, 2008 at 10:09 AM, Andrew Chernow [EMAIL PROTECTED] wrote:
Would it not be better to add a boolean bit or byte to indicate the crc
state?
Ideally, though we don't have any spare bits to play with in MAXALIGN=4.
In the page header? There's plenty of free
Tom Lane wrote:
I think the right fix would be to convert those .sql files to
input/*.source files and have pg_regress substitute the absolute
directory names in them, like it is done for the backend.
Ugh. I don't think it's acceptable to make contrib modules have to do
that. Even if we fix
On Thu, Oct 2, 2008 at 10:27 AM, Heikki Linnakangas
[EMAIL PROTECTED] wrote:
Ideally, though we don't have any spare bits to play with in MAXALIGN=4.
In the page header? There's plenty of free bits in pd_flags.
Ahh, didn't see that. Good catch!
But isn't it a bit dangerous to have a single
Peter Eisentraut [EMAIL PROTECTED] writes:
Tom Lane wrote:
Ugh. I don't think it's acceptable to make contrib modules have to do
that. Even if we fix all the ones in core, what of other people relying
on the pgxs infrastructure?
Yeah, true. Maybe copy the data directory over, but let
Heikki Linnakangas [EMAIL PROTECTED] writes:
Tom Lane wrote:
The problem we still have to solve is torn pages when writing back a
hint-bit update ...
Not checksumming the hint bits *is* a solution to the torn page problem.
Yeah, but it has enough drawbacks that I'd like to keep looking for
Andrew Chernow [EMAIL PROTECTED] writes:
I read through this patch and am curious why 0xdeadbeef was used as an
uninitialized value for the page crc. Is this value somehow less likely
to have collisions than zero (or any other arbitrary value)?
Actually, because that's a favorite bit pattern
On Thu, Oct 2, 2008 at 10:41 AM, Tom Lane [EMAIL PROTECTED] wrote:
Not checksumming the hint bits *is* a solution to the torn page problem.
Yeah, but it has enough drawbacks that I'd like to keep looking for
alternatives.
Agreed.
One argument that I've not seen raised is that not
So, it comes down to a few possible designs, each with its own set of challenges.
Just to see where to go from here... I want to make sure the options
I've seen in this thread are laid out clearly:
1. Hold an exclusive lock on the buffer during the call to smgrwrite
OR
2. Doublebuffer the write
OR
On Thu, Oct 2, 2008 at 8:40 PM, Alvaro Herrera
[EMAIL PROTECTED]wrote:
Reg Me Please wrote:
On Thursday 02 October 2008 16:15:10 Alvaro Herrera wrote:
You can nest blocks arbitrarily, giving you the chance to selectively
rollback pieces of the function. It's only a bit more
On Thursday 02 October 2008 08:37:59 Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
Following that philosophy, I think the idea of adding a new optional
fork name argument to pg_relation_size() is the right thing to do:
pg_relation_size('footable') for size of the main data
Jonah H. Harris [EMAIL PROTECTED] writes:
Just to see where to go from here... I want to make sure the options
I've seen in this thread are laid out clearly:
1. Hold an exclusive lock on the buffer during the call to smgrwrite
OR
2. Doublebuffer the write
OR
3. Do some crufty magic to
Gurjeet Singh wrote:
I have seen this feature being asked for, and this work-around suggested so
many times. If plpgsql does it internally, why not provide a clean interface
for this? Is there some road-block, or that nobody has ever tried it?
Initially we aimed at just exposing SAVEPOINT
Have you looked at creating a function in perl and creating a new
connection? Or using a dblink query which can create a new connection?
These two methods work. I have used them to insert to a log table regardless
of the parent transaction being commited or rolled back.
An old example I posted of
On Tuesday 30 September 2008 17:17:10 Decibel! wrote:
On Sep 30, 2008, at 1:48 PM, Heikki Linnakangas wrote:
Don't some filesystems include a per-block CRC, which would
achieve the same thing? ZFS?
Sure, some do. We're on linux and can't run ZFS. And I'll argue that
no linux FS is
* Tom Lane [EMAIL PROTECTED] [081002 11:40]:
Jonah H. Harris [EMAIL PROTECTED] writes:
Just to see where to go from here... I want to make sure the options
I've seen in this thread are laid out clearly:
1. Hold an exclusive lock on the buffer during the call to smgrwrite
OR
2.
On Thu, 02 Oct 2008 11:57:30 -0400
Robert Treat [EMAIL PROTECTED] wrote:
Actually we had someone on irc yesterday explaining how they were
able to run zfs on debian linux, so that option might be closer than
you think.
It's user mode. Not sure I would suggest that for a production server
On Thu, Oct 2, 2008 at 12:05 PM, Aidan Van Dyk [EMAIL PROTECTED] wrote:
How does your current write strategy handle this situation? I mean,
how do you currently guarantee that between when you call write() and
the kernel copies the buffer internally, no hint bits are updated?
Working on the
* Jonah H. Harris [EMAIL PROTECTED] [081002 12:43]:
#define write(fd, buf, count) buffer_crc_write(fd, buf, count)
I certainly wouldn't interpose the write() call itself; that's just
asking for trouble.
Of course not, that was only to show that whatever you currently protect
write() with,
On Thu, Oct 2, 2008 at 12:51 PM, Aidan Van Dyk [EMAIL PROTECTED] wrote:
But I thought you didn't really care about hint-bit updates, even in the
current strategy... but I'm fully ignorant about the code, sorry...
The current implementation does not take it into account.
So if PG currently
Jonah H. Harris wrote:
PG doesn't care because hint bits aren't logged and, during
normal WAL replay, the old page will be pulled from the WAL. I
believe what Tom is referring to is that the buffer PG sends to
write() can still be modified by way of SetHintBits between the time
Tom Lane wrote:
Yeah, bug all the way back --- applied.
I don't much like the forced rounding to two digits here, but changing
that doesn't seem like material for back-patching. Are you going to
fix that up while working on your other patches?
Gladly. I hate that too.
I think I can also
On Tuesday 30 September 2008 02:19:31 Robins Tharakan wrote:
Hi,
While making a complex database back-end, I have at-hand about 200 odd
functions and frankly 'management of functions' is already getting quite
tedious. Since the count is certain to rise, I am looking for a good tool
to do
* Bruce Momjian [EMAIL PROTECTED] [081002 13:07]:
Jonah H. Harris wrote:
PG doesn't care because hint bits aren't logged and, during
normal WAL replay, the old page will be pulled from the WAL. I
believe what Tom is referring to is that the buffer PG sends to
write() can still be
On Thu, Oct 2, 2008 at 1:07 PM, Bruce Momjian [EMAIL PROTECTED] wrote:
If we're double-buffering the write, I don't see where we could be
introducing a torn-page, as we'd actually be writing a copied version
of the buffer. Will look into this.
The torn page is during kernel write to disk, I
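The double-buffering idea can be sketched like this: snapshot the shared buffer into a private copy, checksum the copy, and hand only the copy to the write. Everything here is a stand-in (the function name, the 4-byte checksum prefix, the write callback); it is not the patch under discussion. The sketch only shows that a hint bit set on the shared buffer after the snapshot cannot make the written image disagree with its own checksum. It does not address what happens if the device writes only part of the image.

```python
import zlib


def write_page_double_buffered(shared, write):
    private = bytes(shared)            # private snapshot of the page
    crc = zlib.crc32(private)          # checksum the snapshot, not shared
    write(crc.to_bytes(4, "little") + private)


shared = bytearray(8192)
written = []


def racing_write(image):
    # Simulate another backend setting a hint bit during the write.
    shared[100] |= 0x01
    written.append(image)


write_page_double_buffered(shared, racing_write)
image = written[0]
print(int.from_bytes(image[:4], "little") == zlib.crc32(image[4:]))
```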
On 2 Oct 2008, at 05:51 PM, Aidan Van Dyk [EMAIL PROTECTED] wrote:
So if PG currently doesn't care about the hint bits being updated during
the write, then why should introducing a double-buffer introduce the
torn-page problem Tom mentions? I admit, I'm fishing for information
from those
Jonah H. Harris wrote:
On Thu, Oct 2, 2008 at 1:07 PM, Bruce Momjian [EMAIL PROTECTED] wrote:
If we're double-buffering the write, I don't see where we could be
introducing a torn-page, as we'd actually be writing a copied version
of the buffer. Will look into this.
The torn page is
It's not the buffering, it's the checksum. The problem arises if a page is
read in but no wal logged modifications are done against it. If a hint bit
is modified it won't be wal logged but the page is marked dirty.
Ah. Thanks Greg. Let me look into this a bit before I respond :)
--
Greg Stark wrote:
Writing this explanation did bring to mind one solution which we had
already discussed for other reasons: not marking blocks dirty after hint
bit setting.
How about when a hint bit is set and the page is not already dirty, set
the checksum to the always valid value?
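As a rough illustration of this suggestion, assume a reserved checksum value that verification always accepts. Reusing 0xDEADBEEF as that value is only an assumption borrowed from the uninitialized-CRC value mentioned earlier in the thread; the function names are hypothetical.

```python
import zlib

ALWAYS_VALID = 0xDEADBEEF              # assumed sentinel, not from the patch


def stored_checksum(data, only_hint_bits_changed):
    # Pages dirtied only by hint bit updates get the sentinel.
    if only_hint_bits_changed:
        return ALWAYS_VALID
    return zlib.crc32(data)


def verify(data, stored):
    # The sentinel disables checking for that page.
    return stored == ALWAYS_VALID or zlib.crc32(data) == stored


good = bytes(8192)
damaged = b"\x01" + good[1:]
print(verify(damaged, stored_checksum(good, False)))
print(verify(damaged, stored_checksum(good, True)))
```

The second call returns True even though the page is damaged, which is the cost of the approach: any page carrying the sentinel is excluded from checking.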
* Greg Stark [EMAIL PROTECTED] [081002 13:37]:
It's not the buffering, it's the checksum. The problem arises if a page
is read in but no wal logged modifications are done against it. If a
hint bit is modified it won't be wal logged but the page is marked
dirty.
When we write the page
Aidan Van Dyk [EMAIL PROTECTED] writes:
WAL-logged changes are safe because of full_page_writes. Hint bits are
safe because either the old or the new value will be on disk and we
don't care which. It doesn't matter if some hint bits are set and some
aren't.
However the checksum won't
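The failure mode being described can be simulated in a few lines. The layout is invented for the example (a 4-byte CRC-32 prefix, 512-byte sectors, a hint bit somewhere in the second half of the page), so treat it as a sketch of the argument rather than of PostgreSQL's page format.

```python
import zlib

SECTOR = 512                           # assumed atomic write unit


def with_checksum(body):
    return zlib.crc32(body).to_bytes(4, "little") + body


def verify(page):
    return int.from_bytes(page[:4], "little") == zlib.crc32(page[4:])


def torn_write(old, new, sectors_written):
    # Only the first sectors of the new image reach disk.
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


old_body = bytes(8188)
new_body = bytearray(old_body)
new_body[5000] |= 0x01                 # hint bit set, no WAL record
old_page = with_checksum(old_body)
new_page = with_checksum(bytes(new_body))
torn = torn_write(old_page, new_page, 1)
print(verify(old_page), verify(new_page), verify(torn))
```

The torn page holds the new checksum but the old hint bit. Without a checksum either hint bit state would be acceptable; with one, the page fails verification and there is no full-page image in WAL to restore it from.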
On Thu, Oct 2, 2008 at 1:58 PM, Gregory Stark [EMAIL PROTECTED] wrote:
On recovery after a torn-page write, won't the recovery of the
full_page_write WAL + WAL changes get us back to the page as it was
before the buffer+checksum+write?
Hint bit setting doesn't trigger a WAL record.
Hence, no
On Thu, Oct 2, 2008 at 1:44 PM, Alvaro Herrera
[EMAIL PROTECTED] wrote:
How about when a hint bit is set and the page is not already dirty, set
the checksum to the always valid value? The problem I have with this
idea is that there would be lots of pages excluded from the CRC checks,
a
* Jonah H. Harris [EMAIL PROTECTED] [081002 14:01]:
On Thu, Oct 2, 2008 at 1:58 PM, Gregory Stark [EMAIL PROTECTED] wrote:
On recovery after a torn-page write, won't the recovery of the
full_page_write WAL + WAL changes get us back to the page as it was
before the buffer+checksum+write?
Robert Treat wrote:
On Thursday 02 October 2008 08:37:59 Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
There's currently two variants of both pg_relation_size and
pg_total_relation_size, one takes an OID and one takes a relation name
as argument. Any objections to having just
On Wednesday 01 October 2008 10:27:52 Tom Lane wrote:
[EMAIL PROTECTED] writes:
No, it's all about time penalties and loss of concurrency.
I don't think that the amount of time it would take to calculate and test
the sum is even important. It may be in older CPUs, but these days CPUs
Robert Treat wrote:
On Wednesday 01 October 2008 10:27:52 Tom Lane wrote:
Your optimism is showing ;-). XLogInsert routinely shows up as a major
CPU hog in any update-intensive test, and AFAICT that's mostly from the
CRC calculation for WAL records.
Yeah... for those who run on
Jonah H. Harris wrote:
On Thu, Oct 2, 2008 at 1:44 PM, Alvaro Herrera
[EMAIL PROTECTED] wrote:
How about when a hint bit is set and the page is not already dirty, set
the checksum to the always valid value? The problem I have with this
idea is that there would be lots of pages excluded
Ron Mayer wrote:
Tom Lane wrote:
Yeah, bug all the way back --- applied.
I don't much like the forced rounding to two digits here, but changing
that doesn't seem like material for back-patching. Are you going to
fix that up while working on your other patches?
Gladly. I hate that too.
* Alvaro Herrera [EMAIL PROTECTED] [081002 16:18]:
And, this still seems to have an issue with WAL, unless Simon's
original idea somehow included recording hint bit settings/dirtying
the page in WAL.
I have to admit I don't remember exactly how it worked :-) I think the
idea was
On Thu, 2008-10-02 at 16:18 -0400, Alvaro Herrera wrote:
Maybe we could mix this with Simon's approach to counting hint bit
setting, and calculate a valid CRC on the page every n-th non-logged
change.
I still think we should only calculate checksums on the actual write.
Well, if
On Thu, Oct 2, 2008 at 7:42 PM, Jonah H. Harris [EMAIL PROTECTED] wrote:
It's not the buffering, it's the checksum. The problem arises if a page is
read in but no wal logged modifications are done against it. If a hint bit
is modified it won't be wal logged but the page is marked dirty.
On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
Version 7
After reading this for awhile, I realized that there is a rather
fundamental problem with it: it switches into consistent recovery
mode as soon as it's read WAL beyond
On Oct 1, 2008, at 2:03 PM, Sam Mason wrote:
I know you said detecting memory errors wasn't being attempted, but
bad memory accounts for a reasonable number of reports of database
corruption on -general so I was wondering if moving the checks around
could catch some of these.
Down the road, I
Dawid Kuroczko [EMAIL PROTECTED] writes:
On Thu, Oct 2, 2008 at 7:42 PM, Jonah H. Harris [EMAIL PROTECTED] wrote:
if checksum mismatch {
flip the hint bits [1]
I did try to make something like that work. But I didn't get anywhere. There
could easily be dozens of bits to flip. The
On Thu, 2008-09-11 at 17:58 +0300, Heikki Linnakangas wrote:
BTW, we haven't talked about how to acquire a snapshot in the slave.
You'll somehow need to know which transactions have not yet committed,
but will in the future. In the master, we keep track of in-progress
transaction in the
On Oct 2, 2008, at 3:18 PM, Alvaro Herrera wrote:
I have to admit I don't remember exactly how it worked :-) I think the
idea was avoiding setting the page dirty until a certain number of hint
bit setting operations had been done (which I think means it's not
useful for the present
Simon Riggs wrote:
On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
Version 7
After reading this for awhile, I realized that there is a rather
fundamental problem with it: it switches into consistent recovery
mode as soon as it's read
I have just completed a test of the patch I posted a few days ago.
The test is a 2GB dump file that restores to a 22GB database. The
database is very complex, with some 28,000 objects.
The baseline test was run in a single transaction:
pg_restore --use-list tlist -1 -d mdata
Hello
I would like to do better testing of the orafce package, but currently I
can't get access to hardware other than x86 (thanks to helpers, there are
some tests on other platforms). Is it possible to use the PostgreSQL build
farm for this purpose?
Regards
Pavel Stehule
Pavel Stehule wrote:
Hello
I would like to do better testing of the orafce package, but currently I
can't get access to hardware other than x86 (thanks to helpers, there are
some tests on other platforms). Is it possible to use the PostgreSQL build
farm for this purpose?
Not currently. The client software is very
ok
Pavel
2008/10/3 Andrew Dunstan [EMAIL PROTECTED]:
Pavel Stehule wrote:
Hello
I would like to do better testing of the orafce package, but currently I
can't get access to hardware other than x86 (thanks to helpers, there are
some tests on other platforms). Is it possible to use the PostgreSQL build
farm for this