subject:"\[HACKERS\] WAL replay bugs"

Re: [HACKERS] WAL replay bugs

2015-06-17 Thread Alvaro Herrera

Michael Paquier wrote:

 From 077d675795b4907904fa4e85abed8c4528f4666f Mon Sep 17 00:00:00 2001
 From: Michael Paquier mich...@otacoo.com
 Date: Sat, 19 Jul 2014 10:40:20 +0900
 Subject: [PATCH 3/3] Buffer capture facility: check WAL replay consistency

Is there a newer version of this tech?


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2015-06-17 Thread Michael Paquier

On Thu, Jun 18, 2015 at 3:39 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Michael Paquier wrote:

 From 077d675795b4907904fa4e85abed8c4528f4666f Mon Sep 17 00:00:00 2001
 From: Michael Paquier mich...@otacoo.com
 Date: Sat, 19 Jul 2014 10:40:20 +0900
 Subject: [PATCH 3/3] Buffer capture facility: check WAL replay consistency

 Is there a newer version of this tech?

Not yet.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-11-05 Thread Alvaro Herrera

Michael Paquier wrote:

 Now, do we really want this feature in-core? That's somewhat a duplicate of
 what is mentioned here:
 http://www.postgresql.org/message-id/CAB7nPqQMq=4ejak317mxz4has0i+1rslbqu29zx18jwlb2j...@mail.gmail.com
 Of course both things do not have the same coverage as the former is for
 buildfarm and dev, while the latter is dedidated to production systems, but
 could be used for development as well.

Oh, I had forgotten that other patch.

 The patch sent there is a bit outdated, but a potential implementation gets
 simpler with XLogReadBufferForRedo able to return flags about each block
 state during redo. I am still planning to come back to it for this cycle,
 though I stopped for now waiting for the WAL format patches finish to shape
 the APIs this feature would rely on.

I agree it makes sense to wait until the WAL reworks are done -- glad
to hear you're putting some time in this area.

Thanks,

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-11-05 Thread Peter Eisentraut

On 11/4/14 10:50 PM, Michael Paquier wrote:
 Except pg_upgrade, are there other tests using bash?

There are a few obscure things under src/test/.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-11-05 Thread Michael Paquier

On Thu, Nov 6, 2014 at 5:41 AM, Peter Eisentraut pete...@gmx.net wrote:

 On 11/4/14 10:50 PM, Michael Paquier wrote:
  Except pg_upgrade, are there other tests using bash?

 There are a few obscure things under src/test/.


Oh, I see. There is quite a number here, and each script is doing quite
different things..
$ git grep /sh src/test/
src/test/locale/de_DE.ISO8859-1/runall:#! /bin/sh
src/test/locale/gr_GR.ISO8859-7/runall:#! /bin/sh
src/test/locale/koi8-r/runall:#! /bin/sh
src/test/locale/koi8-to-win1251/runall:#! /bin/sh
src/test/mb/mbregress.sh:#! /bin/sh
src/test/performance/start-pgsql.sh:#!/bin/sh
src/test/regress/regressplans.sh:#! /bin/sh
-- 
Michael

Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Alvaro Herrera

Michael Paquier wrote:
 On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
  Although I doubt necessity of the flexibility seeing the current
  testing framework, I don't have so strong objection about
  that. Nevertheless, perhaps you are appreciated to put a notice
  on.. README or somewhere.
 Hm, well... Fine, I added it in this updated series.

Did this go anywhere?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Alvaro Herrera

Michael Paquier wrote:
 On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
  Although I doubt necessity of the flexibility seeing the current
  testing framework, I don't have so strong objection about
  that. Nevertheless, perhaps you are appreciated to put a notice
  on.. README or somewhere.
 Hm, well... Fine, I added it in this updated series.

FWIW I gave this a trial run and found I needed some tweaks to test.sh
and the Makefile in order to make it work on VPATH; mainly replace ./
with `dirname $0` in a couple test.sh in a couple of places, and
something similar in the Makefile.  Also you have $PG_ROOT_DIR somewhere
which doesn't work.

Also you have the Makefile checking for -DBUFFER_CAPTURE exactly but for
some reason I used -DBUFFER_CAPTURE=1 which wasn't well received by your
$(filter) stuff.  Instead of checking CFLAGS it might make more sense to
expose it as a read-only GUC and grep `postmaster -C buffer_capture` or
similar.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Peter Eisentraut

On 11/4/14 3:21 PM, Alvaro Herrera wrote:
 FWIW I gave this a trial run and found I needed some tweaks to test.sh
 and the Makefile in order to make it work on VPATH; mainly replace ./
 with `dirname $0` in a couple test.sh in a couple of places, and
 something similar in the Makefile.  Also you have $PG_ROOT_DIR somewhere
 which doesn't work.

I also saw some bashisms in the script.

Maybe the time for shell-based test scripts has passed?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Michael Paquier

Thanks for the tests.

On Wed, Nov 5, 2014 at 5:21 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

Michael Paquier wrote:
On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
Although I doubt necessity of the flexibility seeing the current
testing framework, I don't have so strong objection about
that. Nevertheless, perhaps you are appreciated to put a notice
on.. README or somewhere.
Hm, well... Fine, I added it in this updated series.

FWIW I gave this a trial run and found I needed some tweaks to test.sh
and the Makefile in order to make it work on VPATH; mainly replace ./
with `dirname $0` in a couple test.sh in a couple of places, and
something similar in the Makefile. Also you have $PG_ROOT_DIR somewhere
which doesn't work.

Ah thanks, forgot that.

Also you have the Makefile checking for -DBUFFER_CAPTURE exactly but for
some reason I used -DBUFFER_CAPTURE=1 which wasn't well received by your
$(filter) stuff. Instead of checking CFLAGS it might make more sense to
expose it as a read-only GUC and grep `postmaster -C buffer_capture` or
similar.

Yes that's a good idea.

Now, do we really want this feature in-core? That's somewhat a duplicate of
what is mentioned here:
http://www.postgresql.org/message-id/CAB7nPqQMq=4ejak317mxz4has0i+1rslbqu29zx18jwlb2j...@mail.gmail.com
Of course both things do not have the same coverage as the former is for
buildfarm and dev, while the latter is dedidated to production systems, but
could be used for development as well.

The patch sent there is a bit outdated, but a potential implementation gets
simpler with XLogReadBufferForRedo able to return flags about each block
state during redo. I am still planning to come back to it for this cycle,
though I stopped for now waiting for the WAL format patches finish to shape
the APIs this feature would rely on.
Regards,
--
Michael

Re: [HACKERS] WAL replay bugs

2014-11-04 Thread Michael Paquier

On Wed, Nov 5, 2014 at 6:29 AM, Peter Eisentraut pete...@gmx.net wrote:

 On 11/4/14 3:21 PM, Alvaro Herrera wrote:
  FWIW I gave this a trial run and found I needed some tweaks to test.sh
  and the Makefile in order to make it work on VPATH; mainly replace ./
  with `dirname $0` in a couple test.sh in a couple of places, and
  something similar in the Makefile.  Also you have $PG_ROOT_DIR somewhere
  which doesn't work.

 I also saw some bashisms in the script.

 Maybe the time for shell-based test scripts has passed?

Except pg_upgrade, are there other tests using bash?
-- 
Michael

Re: [HACKERS] WAL replay bugs

2014-07-25 Thread Kyotaro HORIGUCHI

Hi, I'm not so confident whether it's the time to do this...

I mark this as 'Ready for Committer' since no additional comment
or objection was put by others on this patch.

  After all, I have no more comment about this patch. I will mark
 this as 'Ready for committer' unless no comment comes from others
 for a few days.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-22 Thread Kyotaro HORIGUCHI

Hello,

  Although I doubt necessity of the flexibility seeing the current
  testing framework, I don't have so strong objection about
  that. Nevertheless, perhaps you are appreciated to put a notice
  on.. README or somewhere.
 Hm, well... Fine, I added it in this updated series.

Thank you for your patience:)

 After all, I have no more comment about this patch. I will mark
this as 'Ready for committer' unless no comment comes from others
for a few days.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-14 Thread Kyotaro HORIGUCHI

Hello, Let me apologize for continuing the discussion even though
the deadline is approaching.

At Fri, 11 Jul 2014 09:49:55 +0900, Michael Paquier michael.paqu...@gmail.com 
wrote in cab7npqtjezoz-fotibsejyg0eabngx2plqywodchyfxzfqy...@mail.gmail.com
 Updated patches attached.
 
 On Thu, Jul 10, 2014 at 7:13 PM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
 
  The usage of this is a bit irregular as a (extension) module but
  it is the nature of this 'contrib'. The rearranged page type
  detection logic seems neater and keeps to work as intended. This
  logic will be changed after the new page type detection scheme
  becomes ready by the another patch.
 
 No disk format changes will be allowed just to make page detection
 easier (Tom mentioned that earlier in this thread btw). We'll have to
 live with what current code offers, 
 especially considering that adding
 new bytes for page detection for gin pages would double the size of
 its special area after applying MAXALIGN to it.

That's awkward, but I agree with it. By the way, I found
PageHeaderData.pd_flags to have 9 bits room.  It seems to be
usable if no other usage is in sight right now, if the formal
method to identify page types is worth the 3-4 bits there.

# This is a separate discussion from this patch itself.

  - make check runs regress --use-existing but IMHO the make
target which is expected to do that is installcheck. I had
fooled to convince that it should run the postgres which is
built dedicatedly:(
 
 Yes, the patch is abusing of that. --use-existing is specified in this
 case because the test itself is controlling Postgres servers to build
 and fetch the buffer captures. This allows more flexible machinery
 IMO.

Although I doubt necessity of the flexibility seeing the current
testing framework, I don't have so strong objection about
that. Nevertheless, perhaps you are appreciated to put a notice
on.. README or somewhere.

  - postgres processes are left running after
test_default(custom).sh has stopped halfway. This can be fixed
with the attached patch, but, to be honest, this seems too
much. I'll follow your decision whether or not to do this.
(bufcapt_test_sh_20140710.patch)
 
 I had considered that first, thinking that it was the user
 responsibility if things are screwed up with his custom scripts. I
 guess that the way you are doing it is a safeguard simple enough
 though, so included with some editing, particularly reporting to the
 user the error code returned by the test script.

Agreed.

  - test_default.sh is not only an example script which will run
while utilize this facility, but the test script for this
facility itself.
So I think it would be better be added some queries so that all
possible page types available for the default testing. What do
you think about the attached patch?  (hash index is unlogged
but I dared to put it for clarity.)
 
 Yeah, having a default set of queries run just to let the user get an
 idea of how it works improves things.
 Once again thanks for taking the time to look at that.

Thank you.


regardes,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-10 Thread Kyotaro HORIGUCHI

Hello, The new patch looks good for me.

The usage of this is a bit irregular as a (extension) module but
it is the nature of this 'contrib'. The rearranged page type
detection logic seems neater and keeps to work as intended. This
logic will be changed after the new page type detection scheme
becomes ready by the another patch.


I have some additional comments, which should be the last
ones. All of the comments are about test.sh.

- A question mark seems missing from the end of the message Has
  build been done with -DBUFFER_CAPTURE included in CFLAGS in
  test.sh.

- make check runs regress --use-existing but IMHO the make
  target which is expected to do that is installcheck. I had
  fooled to convince that it should run the postgres which is
  built dedicatedly:(

- postgres processes are left running after
  test_default(custom).sh has stopped halfway. This can be fixed
  with the attached patch, but, to be honest, this seems too
  much. I'll follow your decision whether or not to do this.
  (bufcapt_test_sh_20140710.patch)

- test_default.sh is not only an example script which will run
  while utilize this facility, but the test script for this
  facility itself.

  So I think it would be better be added some queries so that all
  possible page types available for the default testing. What do
  you think about the attached patch?  (hash index is unlogged
  but I dared to put it for clarity.)

  (bufcapt_test_default_sh_20140710.patch)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/buffer_capture_cmp/test.sh b/contrib/buffer_capture_cmp/test.sh
index 89740bb..ba5e290 100644
--- a/contrib/buffer_capture_cmp/test.sh
+++ b/contrib/buffer_capture_cmp/test.sh
@@ -117,16 +117,27 @@ pg_ctl -w -D $TEST_STANDBY start
 # Check the presence of custom tests and kick them in priority. If not,
 # fallback to the default tests. Tests need only to be run on the master
 # node.
+
 if [ -f ./test-custom.sh ]; then
-	. ./test-custom.sh
+	TEST_SCRIPT=test-custom.sh
 else
-	. ./test-default.sh
+	TEST_SCRIPT=test-default.sh
 fi
 
+set +e
+bash -e $TEST_SCRIPT
+EXITSTATUS=$?
+set -e
+
 # Time to stop the nodes as tests have run
 pg_ctl -w -D $TEST_MASTER stop
 pg_ctl -w -D $TEST_STANDBY stop
 
+if [ $EXITSTATUS != 0 ]; then
+	echo $TEST_SCRIPT exited by error
+	exit 1;
+fi
+
 DIFF_FILE=capture_differences.txt
 
 # Check if the capture files exist. If not, build may have not been
diff --git a/contrib/buffer_capture_cmp/test-default.sh b/contrib/buffer_capture_cmp/test-default.sh
index 5bec503..24091ff 100644
--- a/contrib/buffer_capture_cmp/test-default.sh
+++ b/contrib/buffer_capture_cmp/test-default.sh
@@ -11,4 +11,16 @@
 # cd $ROOT_DIR
 
 # Create a simple table
-psql -c 'CREATE TABLE aa AS SELECT generate_series(1, 10) AS a'
+psql -c 'CREATE TABLE tbtree AS SELECT generate_series(1, 10) AS a'
+psql -c 'CREATE INDEX i_tbtree ON tbtree USING btree(a)'
+psql -c 'CREATE TABLE thash AS SELECT generate_series(1, 10) AS a'
+psql -c 'CREATE INDEX i_thash ON thash USING hash(a)'
+psql -c 'CREATE TABLE tgist AS SELECT POINT(a, a) AS p1 FROM generate_series(0, 10) a'
+psql -c 'CREATE INDEX i_tgist ON tgist USING gist(p1)'
+psql -c 'CREATE TABLE tspgist AS SELECT POINT(a, a) AS p1 FROM generate_series(0, 10) a'
+psql -c 'CREATE INDEX i_tspgist ON tspgist USING spgist(p1)'
+psql -c 'CREATE TABLE tgin AS SELECT ARRAY[a/10, a%10] as a1 from generate_series(0, 10) a'
+psql -c 'CREATE INDEX i_tgin ON tgin USING gin(a1)'
+psql -c 'CREATE SEQUENCE sq1'
+
+

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-04 Thread Kyotaro HORIGUCHI

Hello, thank you for considering my comments, including somewhat
impractical ones.

I'll have a look at the latest patch sooner.


At Fri, 4 Jul 2014 15:29:51 +0900, Michael Paquier michael.paqu...@gmail.com 
wrote in cab7npqt_fs_3jlnhywc6nzej4sbl6dgslfvcg0jbukgjep9...@mail.gmail.com
 OK, I have been working more on this patch, improving on-the-fly the
 following things on top of what Horiguchi-san has reported:
 - Moved sequence page opaque data to sequence.h, renaming it at the same time.
 - Improvement of page type identification, particularly for sequences
 using a correct opaque data structure. For gin the process is not that
 cool, but I guess that there is nothing much to do as it has no proper
 page identifier :(

Year, there seems to be no choice than that.

 On Thu, Jul 3, 2014 at 7:34 PM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
  = bufcapt.c:
 
  - buffer_capture_remember() or so.
...
 Yes, it is worth mentioning and a comment in bufcapt.h seems enough.
 
  - buffer_capture_forget()
...
 Yesh, this seems informative.
 
  - buffer_capture_is_changed()
...
 Hm, yes. This name looks better fine as it remains static within bufcapt.c.
 
  == bufmgr.c:
 
  - ConditionalLockBuffer()
...
 Fixed.
 
  - LockBuffer()
...
   lwlock.h: #define LWLockHoldedExclusive(l) ((l)-exclusive  0)
   lwlock.h: #define LWLockHoldedShared(l) ((l)-shared  0)
 I don't think that there is much to gain with such macros as of now
 LWLock-exclusive is only used in the code this patch introduces.

 Year, I think so, too:p That's simply for the case if you
 thought that.

   If there isn't any particular concern, 'XXX:' should be removed.
 Well yes.

That's great.

  = bufpage.c:
  = bufcapt.h:
 
  - header comment
 
The file name is misspelled as 'bufcaptr.h'.
 Nicely spotted.

Thank you ;)

  - The includes in this header except for buf.h seem not to be
necessary.
 Yep.
 
  = init_env.sh:
 
  - pg_get_test_port()
It determines server port using PG_VERSION_NUM, but is it
necessary? Addition to it, the theoretical maximum initial
PGPORT seems to be 65535 but this script search for free port
up to the number larger by 16 from the start, which would be
beyond the edge.
 Hm, no. As of now, there is still some margin:
 PG_VERSION_NUM = 90500
 PG_VERSION_NUM % 16384 + 49152 = 57732

Yes, it's practically no problem. I said about the theroretical
max value seeing it without any preassumption about the value of
PG_VERSION_NUM. There's in reality no problem before the
PostgreSQL 9.82.88 comes, and 10.0.0 doesn't cause problem. So
I'm not so dissapointed if it is not fixed.

  - pg_get_test_port()
 
It stops with error after 16 times of failure, but the message
complains only about the final attempt. If you want to mention
the port numbers, it might should be 'port $PGPORTSTART-$PGPORT
..' or would be 'All 16 ports attempted failed' or something..
 Hm. You could complain about pg_upgrade as well now for the same
 thing. But I guess that it doesn't hurt to complain back to caller
 about the range of ports already in use, so changed this way.

Yes, this comment is also comes from a kind of
fastidiousness. I'm satisified not to fixed if you think so.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-04 Thread Kyotaro HORIGUCHI

Hello,

At Thu, 3 Jul 2014 14:48:50 +0900, Michael Paquier michael.paqu...@gmail.com 
wrote in cab7npqq2y3qkapasac6oxxqtwbvkkxcrftua0w+dx-j3i-l...@mail.gmail.com
 TODO
...
Each type of page can be confirmed by the following way *if*
its type is previously *hinted* except for gin.
 
btree: 32bit magic at pd-opaque
gin  : No magic
gist : 16-bit magic at ((GISTPageOpaque*)pd-opaque)-gist_page_id
spgist   : 16-bit magic at ((SpGistPageOpaque*)pd-opaque)-spgist_page_id
hash : 16-bit magic at ((HashPageOpaque*)pd-paque)-hasho_page_id
sequence : 16-bit magic at pd-opaque, the last 2 bytes of the page.
 
# Is it comprehensive? and correct?
 Sequence pages use the last 4 bytes. Have a look at sequence_magic in
 sequence.c.
 For btree pages we can use the last 2 bytes and a check on MAX_BT_CYCLE_ID.
 For gin, I'll investigate if it is possible to add a identifier like
 GIN_PAGE_ID, it would make the page analysis more consistent with the
 others. I am not sure for what the 8 bytes allocated for the special
 area are used now for though.
 
The majority is 16-bit magic at the TAIL of opaque struct. If
we can unify them into , say, 16-bit magic at
*(uint16*)(pd-opaque) the sizeof() labyrinth become stable and
simple and other tools which should identify the type of pages
will be happy. Do you think we can do that?
 Yes I think so. I'll raise a different thread though as this is a
 different problem that what this patch is targeting. I would even
 imagine a macro in bufpage.c able to handle that well.

Ok, that being the case, this topic should be stashed and I'll
look into there regardless of it. Thank you.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-03 Thread Tom Lane

Michael Paquier michael.paqu...@gmail.com writes:
 On Wed, Jul 2, 2014 at 5:32 PM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
 Pehaps this is showing that no tidy or generalized way to tell
 what a page is used for. Many of the modules which have their
 own page format has a magic value and the values seem to be
 selected carefully. But no one-for-all method to retrieve that.

 You have a point here.

Yeah, it's a bit messy, but I believe it's currently always possible to
tell which access method a PG page belongs to.  Look at pg_filedump.
The last couple times we added index access methods, we took pains to
make sure pg_filedump could figure out what their pages were.  (IIRC,
it's a combination of the special-space size and contents, but I'm too
tired to go check the details right now.)

 For gin, I'll investigate if it is possible to add a identifier like
 GIN_PAGE_ID, it would make the page analysis more consistent with the
 others. I am not sure for what the 8 bytes allocated for the special
 area are used now for though.

There is exactly zero chance that anyone will accept an on-disk format
change just to make this prettier.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-03 Thread Michael Paquier

On Thu, Jul 3, 2014 at 3:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 On Wed, Jul 2, 2014 at 5:32 PM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
 Pehaps this is showing that no tidy or generalized way to tell
 what a page is used for. Many of the modules which have their
 own page format has a magic value and the values seem to be
 selected carefully. But no one-for-all method to retrieve that.

 You have a point here.

 Yeah, it's a bit messy, but I believe it's currently always possible to
 tell which access method a PG page belongs to.  Look at pg_filedump.
 The last couple times we added index access methods, we took pains to
 make sure pg_filedump could figure out what their pages were.  (IIRC,
 it's a combination of the special-space size and contents, but I'm too
 tired to go check the details right now.)
Yes, that's what the current code does btw, in this *messy* way.

 For gin, I'll investigate if it is possible to add a identifier like
 GIN_PAGE_ID, it would make the page analysis more consistent with the
 others. I am not sure for what the 8 bytes allocated for the special
 area are used now for though.

 There is exactly zero chance that anyone will accept an on-disk format
 change just to make this prettier.
Yeah thought so.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-03 Thread Kyotaro HORIGUCHI

Hello, This is the additional comments for other part.

I haven't see significant defect in the code so far.


= bufcapt.c: 

- buffer_capture_remember() or so.

 Pages for unlogged tables are avoided to be written taking
 advantage that the lsn for such pages stays 0/0. I'd like to see
 a comment mentioning for, say, buffer_capture_is_changed? or
 buffer_capture_forget or somewhere.

- buffer_capture_forget()

 However this error is likely not to occur, in the error message
 could not find image..., the buffer id seems to bring no
 information. LSN, or quadraplet of LSN, rnode, forknum and
 blockno seems to be more informative.

- buffer_capture_is_changed()

 The name for the function semes to be misleading. What this
 function does is comparing LSNs between stored page image and
 current page.  lsn_is_changed(BufferImage) or something like
 would be more clearly.

== bufmgr.c:

- ConditionalLockBuffer()

 Sorry for a trivial comment, but the variable 'res' conceales
 the meaning. acquired seems to be more preferable, isn't it?

- LockBuffer()

 There is a 'XXX' comment.  The discussion written there seems to
 be right, for me. If you mind that peeking into there is a bad
 behavior, adding a macro into lwlock.h would help:p

 lwlock.h: #define LWLockHoldedExclusive(l) ((l)-exclusive  0)
 lwlock.h: #define LWLockHoldedShared(l) ((l)-shared  0)

# I don't see this usable so much..
 
 bufmgr.c: if (LWLockHoldedExclusive(buf-content_lock))

 If there isn't any particular concern, 'XXX:' should be removed.

= bufpage.c:

-  #include storage/bufcapt.h

  The include seems to be needless.

= bufcapt.h:

- header comment

  The file name is misspelled as 'bufcaptr.h'.
  Copyright notice of UC is needed? (Other files are the same)

- The includes in this header except for buf.h seem not to be
  necessary.

= init_env.sh:

- pg_get_test_port()

  It determines server port using PG_VERSION_NUM, but is it
  necessary? Addition to it, the theoretical maximum initial
  PGPORT semes to be 65535 but this script search for free port
  up to the number larger by 16 from the start, which would be
  beyond the edge.

- pg_get_test_port()

  It stops with error after 16 times of failure, but the message
  complains only about the final attempt. If you want to mention
  the port numbers, it might should be 'port $PGPORTSTART-$PGPORT
  ..' or would be 'All 16 ports attempted failed' or something..



regards,


-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-02 Thread Kyotaro HORIGUCHI

Hello,

 Thanks for your comments. Looking forward to seeing some more input.

Thank you. bufcapt.c was a poser.

bufcapt.c: 326 memcpy(tail, page[BLCKSZ - 2], 2);

  This seems duzzling..  Isn't *(uint16*)(page[BLCKSZ - 2]) applicable?

bufcapt.c: 331 else if (PageGetSpecial

  Generally saying, the code to identify the type of page is too
  heuristic and seems fragile.

  Pehaps this is showing that no tidy or generalized way to tell
  what a page is used for. Many of the modules which have their
  own page format has a magic value and the values seem to be
  selected carefully. But no one-for-all method to retrieve that.

  Each type of page can be confirmed by the following way *if*
  its type is previously *hinted* except for gin.

  btree: 32bit magic at pd-opaque
  gin  : No magic
  gist : 16-bit magic at ((GISTPageOpaque*)pd-opaque)-gist_page_id
  spgist   : 16-bit magic at ((SpGistPageOpaque*)pd-opaque)-spgist_page_id
  hash : 16-bit magic at ((HashPageOpaque*)pd-paque)-hasho_page_id
  sequence : 16-bit magic at pd-opaque, the last 2 bytes of the page.

  # Is it comprehensive? and correct?

  The majority is 16-bit magic at the TAIL of opaque struct. If
  we can unify them into , say, 16-bit magic at
  *(uint16*)(pd-opaque) the sizeof() labyrinth become stable and
  simple and other tools which should identify the type of pages
  will be happy. Do you think we can do that?


# Sorry, time's up for today.


  - contrib/buffer_capture_cmp/README
..
 Yeah right... This was a rest of some previous hacking on this feature.
 Paragraph was rather unclear so I rewrote it, mentioning that the custom
 script can use PGPORT to connect to the node where tests can be run.
 
 - contrib/buffer_capture_cmp/Makefile
 
   make check does nothing when BUFFER_CAPTURE is not defined, as
...
 Sure, I added such a message in the makefile.
 
 - buffer_capture_cmp.c
 
This source generates void executable when BUFFER_CAPTURE is
..
 Done. The compilation of this utility is now independent on BUFFER_CAPTURE.
 At the same time I made test.sh a bit smarter to have it grab the value of
 BUFFER_CAPTURE_FILE directly from bufcapt.h.
 
 - buffer_capture_cmp.c/main()
 
The parameters for this command are the parent directories for
...
 Fixed. I changed back the utility to directly file names instead of data
 folders as arguments.
 
 Updated patches addressing those comments are attached.
 Regards,

Thank you I'll look into them later.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-02 Thread Michael Paquier

TODO

On Wed, Jul 2, 2014 at 5:32 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 bufcapt.c: 326 memcpy(tail, page[BLCKSZ - 2], 2);

   This seems duzzling..  Isn't *(uint16*)(page[BLCKSZ - 2]) applicable?
Well yes it is.

   Pehaps this is showing that no tidy or generalized way to tell
   what a page is used for. Many of the modules which have their
   own page format has a magic value and the values seem to be
   selected carefully. But no one-for-all method to retrieve that.
You have a point here.

   Each type of page can be confirmed by the following way *if*
   its type is previously *hinted* except for gin.

   btree: 32bit magic at pd-opaque
   gin  : No magic
   gist : 16-bit magic at ((GISTPageOpaque*)pd-opaque)-gist_page_id
   spgist   : 16-bit magic at ((SpGistPageOpaque*)pd-opaque)-spgist_page_id
   hash : 16-bit magic at ((HashPageOpaque*)pd-paque)-hasho_page_id
   sequence : 16-bit magic at pd-opaque, the last 2 bytes of the page.

   # Is it comprehensive? and correct?
Sequence pages use the last 4 bytes. Have a look at sequence_magic in
sequence.c.
For btree pages we can use the last 2 bytes and a check on MAX_BT_CYCLE_ID.
For gin, I'll investigate if it is possible to add a identifier like
GIN_PAGE_ID, it would make the page analysis more consistent with the
others. I am not sure for what the 8 bytes allocated for the special
area are used now for though.

   The majority is 16-bit magic at the TAIL of opaque struct. If
   we can unify them into , say, 16-bit magic at
   *(uint16*)(pd-opaque) the sizeof() labyrinth become stable and
   simple and other tools which should identify the type of pages
   will be happy. Do you think we can do that?
Yes I think so. I'll raise a different thread though as this is a
different problem that what this patch is targeting. I would even
imagine a macro in bufpage.c able to handle that well.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-07-01 Thread Kyotaro HORIGUCHI

Hello, I had a look on this patch.

 Let me show you some comments about the README, Makefile and
buffer_capture_cmp of the second part for the present. A
continuation of this comment would be seen later..


- contrib/buffer_capture_cmp/README

 - 'contains' seems duplicate in the first paragraph.

 - The second pragraph says that 'This script can use the node
   number of the master node available as the first argument of
   the script when it is run within the test suite.' But test.sh
   seems not giving such a parameter.

- contrib/buffer_capture_cmp/Makefile

 make check does nothing when BUFFER_CAPTURE is not defined, as
 described in itself. But I trapped by that after build the
 server by 'make CFLAGS=-DBUFFER_CAPTURE':( It would be better
 that 'make check' without defining it prints some message.


- buffer_capture_cmp.c

  This source generates void executable when BUFFER_CAPTURE is
  not defined. The restriction seems to be put only to use
  BUFFER_CAPTURE_FILE in bufcapt.h. If so, changing the parameter
  of the executable as described in the following comment for
  main() would blow off the necessity for the restriction.


- buffer_capture_cmp.c/main()

  The parameters for this command are the parent directories for
  each capture file. This is a bit incovenient for separate
  use. For example, when I want to gather the capture files from
  multiple servers then compare them, I should unwillingly make
  their own directories for each capture file. If no particular
  reason exists for the design, I suppose it would be more
  convenient that the parameters are the names of the capture
  files themselves.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-06-27 Thread Michael Paquier

On Fri, Jun 27, 2014 at 2:57 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 Here are rebased patches, their was a conflict with a recent commit in
 contrib/pg_upgrade.

I am resending patch 2 as it contained a rebase conflict not correctly
resolved (Thanks Alvaro).

Regards,
-- 
Michael
From d4f0289ffcece54a78e51e8b707c41e994d549ee Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Fri, 27 Jun 2014 23:35:29 +0900
Subject: [PATCH 2/3] Extract generic bash initialization process from
 pg_upgrade

Such initialization is useful as well for some other utilities and makes
test settings consistent.
---
 contrib/pg_upgrade/test.sh | 47 
 src/test/shell/init_env.sh | 60 ++
 2 files changed, 64 insertions(+), 43 deletions(-)
 create mode 100644 src/test/shell/init_env.sh

diff --git a/contrib/pg_upgrade/test.sh b/contrib/pg_upgrade/test.sh
index 7bbd2c7..2e1c61a 100644
--- a/contrib/pg_upgrade/test.sh
+++ b/contrib/pg_upgrade/test.sh
@@ -9,24 +9,14 @@
 # Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
 
-set -e
-
-: ${MAKE=make}
-
-# Guard against parallel make issues (see comments in pg_regress.c)
-unset MAKEFLAGS
-unset MAKELEVEL
-
-# Establish how the server will listen for connections
-testhost=`uname -s`
+# Initialize test
+. ../../src/test/shell/init_env.sh
 
 case $testhost in
 	MINGW*)
-		LISTEN_ADDRESSES=localhost
 		PGHOST=; unset PGHOST
 		;;
 	*)
-		LISTEN_ADDRESSES=
 		# Select a socket directory.  The algorithm is from the configure
 		# script; the outcome mimics pg_regress.c:make_temp_sockdir().
 		PGHOST=$PG_REGRESS_SOCK_DIR
@@ -102,37 +92,8 @@ logdir=$PWD/log
 rm -rf $logdir
 mkdir $logdir
 
-# Clear out any environment vars that might cause libpq to connect to
-# the wrong postmaster (cf pg_regress.c)
-#
-# Some shells, such as NetBSD's, return non-zero from unset if the variable
-# is already unset. Since we are operating under 'set -e', this causes the
-# script to fail. To guard against this, set them all to an empty string first.
-PGDATABASE=;unset PGDATABASE
-PGUSER=;unset PGUSER
-PGSERVICE=; unset PGSERVICE
-PGSSLMODE=; unset PGSSLMODE
-PGREQUIRESSL=;  unset PGREQUIRESSL
-PGCONNECT_TIMEOUT=; unset PGCONNECT_TIMEOUT
-PGHOSTADDR=;unset PGHOSTADDR
-
-# Select a non-conflicting port number, similarly to pg_regress.c
-PG_VERSION_NUM=`grep '#define PG_VERSION_NUM' $newsrc/src/include/pg_config.h | awk '{print $3}'`
-PGPORT=`expr $PG_VERSION_NUM % 16384 + 49152`
-export PGPORT
-
-i=0
-while psql -X postgres /dev/null 2/dev/null
-do
-	i=`expr $i + 1`
-	if [ $i -eq 16 ]
-	then
-		echo port $PGPORT apparently in use
-		exit 1
-	fi
-	PGPORT=`expr $PGPORT + 1`
-	export PGPORT
-done
+# Get a port to run the tests
+pg_get_test_port $newsrc
 
 # buildfarm may try to override port via EXTRA_REGRESS_OPTS ...
 EXTRA_REGRESS_OPTS=$EXTRA_REGRESS_OPTS --port=$PGPORT
diff --git a/src/test/shell/init_env.sh b/src/test/shell/init_env.sh
new file mode 100644
index 000..d37eb69
--- /dev/null
+++ b/src/test/shell/init_env.sh
@@ -0,0 +1,60 @@
+#!/bin/sh
+
+# src/test/shell/init.sh
+#
+# Utility initializing environment for tests to be conducted in shell.
+# The initialization done here is made to be platform-proof.
+
+set -e
+
+: ${MAKE=make}
+
+# Guard against parallel make issues (see comments in pg_regress.c)
+unset MAKEFLAGS
+unset MAKELEVEL
+
+# Set listen_addresses desirably
+testhost=`uname -s`
+case $testhost in
+	MINGW*)	LISTEN_ADDRESSES=localhost ;;
+	*)		LISTEN_ADDRESSES= ;;
+esac
+
+# Clear out any environment vars that might cause libpq to connect to
+# the wrong postmaster (cf pg_regress.c)
+#
+# Some shells, such as NetBSD's, return nonzero from unset if the variable
+# is already unset. Since we are operating under 'set e', this causes the
+# script to fail. To guard against this, set them all to an empty string first.
+PGDATABASE=;unset PGDATABASE
+PGUSER=;unset PGUSER
+PGSERVICE=; unset PGSERVICE
+PGSSLMODE=; unset PGSSLMODE
+PGREQUIRESSL=;  unset PGREQUIRESSL
+PGCONNECT_TIMEOUT=; unset PGCONNECT_TIMEOUT
+PGHOST=;unset PGHOST
+PGHOSTADDR=;unset PGHOSTADDR
+
+# Select a non-conflicting port number, similarly to pg_regress.c, and
+# save its value in PGPORT. Caller should either save or use this value
+# for the tests.
+pg_get_test_port()
+{
+	PG_ROOT_DIR=$1
+	PG_VERSION_NUM=`grep '#define PG_VERSION_NUM' $PG_ROOT_DIR/src/include/pg_config.h | awk '{print $3}'`
+	PGPORT=`expr $PG_VERSION_NUM % 16384 + 49152`
+	export PGPORT
+
+	i=0
+	while psql -X postgres /dev/null 2/dev/null
+	do
+		i=`expr $i + 1`
+		if [ $i -eq 16 ]
+		then
+			echo port $PGPORT apparently in use
+			exit 1
+		fi
+		PGPORT=`expr $PGPORT + 1`
+		export PGPORT
+	done
+}
-- 
2.0.0


-- 
Sent via pgsql-hackers mailing list

Re: [HACKERS] WAL replay bugs

2014-06-18 Thread Robert Haas

On Tue, Jun 17, 2014 at 5:40 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jun 18, 2014 at 1:40 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jun 2, 2014 at 8:55 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 I'm not sure if this is reasonably possible, but one thing that would
 make this tool a whole lot easier to use would be if you could make
 all the magic happen in a single server.  For example, suppose you had
 a background process that somehow got access to the pre and post
 images for every buffer change, and the associated WAL record, and
 tried applying the WAL record to the pre-image to see whether it got
 the corresponding post-image.  Then you could run 'make check' or so
 and afterwards do something like psql -c 'SELECT * FROM
 wal_replay_problems()' and hopefully get no rows back.
 So your point is to have a 3rd independent server in the process that
 would compare images taken from a master and its standby? Seems to
 complicate the machinery.

No, I was trying to get it down form 2 servers to 1, not 2 servers up to 3.

 Don't get me wrong, having this tool at all sounds great.  But I think
 to really get the full benefit out of it we need to be able to run it
 in the buildfarm, so that if people break stuff it gets noticed
 quickly.
 The patch I sent has included a regression test suite making the tests
 rather facilitated: that's only a matter of running actually make
 check in the contrib repository containing the binary able to compare
 buffer captures between a master and a standby.

Cool!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-06-17 Thread Robert Haas

On Mon, Jun 2, 2014 at 8:55 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Apr 23, 2014 at 9:43 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 And here is the tool itself. It consists of two parts:

 1. Modifications to the backend to write the page images
 2. A post-processing tool to compare the logged images between master and
 standby.
 Having that into Postgres at the disposition of developers would be
 great, and I believe that it would greatly reduce the occurrence of
 bugs caused by WAL replay during recovery. So, with the permission of
 the author, I have been looking at this facility for a cleaner
 integration into Postgres.

I'm not sure if this is reasonably possible, but one thing that would
make this tool a whole lot easier to use would be if you could make
all the magic happen in a single server.  For example, suppose you had
a background process that somehow got access to the pre and post
images for every buffer change, and the associated WAL record, and
tried applying the WAL record to the pre-image to see whether it got
the corresponding post-image.  Then you could run 'make check' or so
and afterwards do something like psql -c 'SELECT * FROM
wal_replay_problems()' and hopefully get no rows back.

Don't get me wrong, having this tool at all sounds great.  But I think
to really get the full benefit out of it we need to be able to run it
in the buildfarm, so that if people break stuff it gets noticed
quickly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-06-17 Thread Michael Paquier

On Wed, Jun 18, 2014 at 1:40 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jun 2, 2014 at 8:55 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 I'm not sure if this is reasonably possible, but one thing that would
 make this tool a whole lot easier to use would be if you could make
 all the magic happen in a single server.  For example, suppose you had
 a background process that somehow got access to the pre and post
 images for every buffer change, and the associated WAL record, and
 tried applying the WAL record to the pre-image to see whether it got
 the corresponding post-image.  Then you could run 'make check' or so
 and afterwards do something like psql -c 'SELECT * FROM
 wal_replay_problems()' and hopefully get no rows back.
So your point is to have a 3rd independent server in the process that
would compare images taken from a master and its standby? Seems to
complicate the machinery.

 Don't get me wrong, having this tool at all sounds great.  But I think
 to really get the full benefit out of it we need to be able to run it
 in the buildfarm, so that if people break stuff it gets noticed
 quickly.
The patch I sent has included a regression test suite making the tests
rather facilitated: that's only a matter of running actually make
check in the contrib repository containing the binary able to compare
buffer captures between a master and a standby.

Thanks,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-06-13 Thread Heikki Linnakangas


On 06/13/2014 10:14 AM, Michael Paquier wrote:

On Mon, Jun 2, 2014 at 9:55 PM, Michael Paquier
michael.paqu...@gmail.com wrote:

On Wed, Apr 23, 2014 at 9:43 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
Perhaps there are parts of what is proposed here that could be made
more generalized, like the masking functions. So do not hesitate if
you have any opinion on the matter.

OK, attached is the result of this hacking:

Buffer capture facility: check WAL replay consistency

It is a tool aimed to be used by developers and buildfarm machines
that can be used to check for consistency at page level when replaying
WAL files among several nodes of a cluster (generally master and
standby node).

This facility is made of two parts:
- A server part, where all the changes happening at page level are
captured and inserted in a file called buffer_captures located at the
root of PGDATA. Each buffer entry is masked to make the comparison
across node consistent (flags like hint bits for example) and then
each buffer is captured is with the following format as a single line
of the output file:
LSN: %08X/%08X page: PAGE_IN_HEXA
Hexadecimal format makes it easier to detect differences between
pages, and format is chosen to facilitate comparison between buffer
entries.
- A client part, located in contrib/buffer_capture_cmp, that can be
used to compare buffer captures between nodes.


Oh, you moved the masking code from the client tool to the backend. Why? 
When debugging, it's useful to have the genuine, non-masked page image 
available.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-06-13 Thread Michael Paquier

On Fri, Jun 13, 2014 at 4:48 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 06/13/2014 10:14 AM, Michael Paquier wrote:

 On Mon, Jun 2, 2014 at 9:55 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 On Wed, Apr 23, 2014 at 9:43 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 Perhaps there are parts of what is proposed here that could be made
 more generalized, like the masking functions. So do not hesitate if
 you have any opinion on the matter.

 OK, attached is the result of this hacking:

 Buffer capture facility: check WAL replay consistency

 It is a tool aimed to be used by developers and buildfarm machines
 that can be used to check for consistency at page level when replaying
 WAL files among several nodes of a cluster (generally master and
 standby node).

 This facility is made of two parts:
 - A server part, where all the changes happening at page level are
 captured and inserted in a file called buffer_captures located at the
 root of PGDATA. Each buffer entry is masked to make the comparison
 across node consistent (flags like hint bits for example) and then
 each buffer is captured is with the following format as a single line
 of the output file:
 LSN: %08X/%08X page: PAGE_IN_HEXA
 Hexadecimal format makes it easier to detect differences between
 pages, and format is chosen to facilitate comparison between buffer
 entries.
 - A client part, located in contrib/buffer_capture_cmp, that can be
 used to compare buffer captures between nodes.


 Oh, you moved the masking code from the client tool to the backend. Why?
 When debugging, it's useful to have the genuine, non-masked page image
 available.
My thought is to share the CPU effort of masking between backends...
That's not a big deal to move them back to the client tool though.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-06-13 Thread Michael Paquier

On Fri, Jun 13, 2014 at 4:50 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, Jun 13, 2014 at 4:48 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 On 06/13/2014 10:14 AM, Michael Paquier wrote:

 On Mon, Jun 2, 2014 at 9:55 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 On Wed, Apr 23, 2014 at 9:43 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 Perhaps there are parts of what is proposed here that could be made
 more generalized, like the masking functions. So do not hesitate if
 you have any opinion on the matter.

 OK, attached is the result of this hacking:

 Buffer capture facility: check WAL replay consistency

 It is a tool aimed to be used by developers and buildfarm machines
 that can be used to check for consistency at page level when replaying
 WAL files among several nodes of a cluster (generally master and
 standby node).

 This facility is made of two parts:
 - A server part, where all the changes happening at page level are
 captured and inserted in a file called buffer_captures located at the
 root of PGDATA. Each buffer entry is masked to make the comparison
 across node consistent (flags like hint bits for example) and then
 each buffer is captured is with the following format as a single line
 of the output file:
 LSN: %08X/%08X page: PAGE_IN_HEXA
 Hexadecimal format makes it easier to detect differences between
 pages, and format is chosen to facilitate comparison between buffer
 entries.
 - A client part, located in contrib/buffer_capture_cmp, that can be
 used to compare buffer captures between nodes.


 Oh, you moved the masking code from the client tool to the backend. Why?
 When debugging, it's useful to have the genuine, non-masked page image
 available.
 My thought is to share the CPU effort of masking between backends...
And that having a set of API to do page masking on the server side
would be useful for extensions as well. Now that I recall this was one
of the first things that came to my mind when looking at this
facility, thinking that it would be useful to have them in a separate
file, with a dedicated header.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-06-02 Thread Michael Paquier

On Wed, Apr 23, 2014 at 9:43 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 And here is the tool itself. It consists of two parts:

 1. Modifications to the backend to write the page images
 2. A post-processing tool to compare the logged images between master and
 standby.
Having that into Postgres at the disposition of developers would be
great, and I believe that it would greatly reduce the occurrence of
bugs caused by WAL replay during recovery. So, with the permission of
the author, I have been looking at this facility for a cleaner
integration into Postgres.

Roughly, this utility is made of three parts:
1) A set of masking functions that can be used on page images to
normalize them. This is used to put magic numbers or enforce flag
values to make page content consistent across nodes. This is for
example the case of the free space between pd_lower and pd_upper,
pd_flags, etc. Of course this depends on the type of page (btree,
heap, etc.).
2) Facility to memorize, analyze if they have been modified, and flush
page images to a dedicated file. This interacts with the buffer
manager mainly.
3) Facility to reorder page images within the same WAL record as
master/standby may not write them in the same order on a standby or a
master due to for example lock released in different order. This is
part of the binary analyzing the diffs between master and standby.

As of now, 2) is integrated in the backend, 1) and 3) are part of the
contrib module. However I am thinking that 1) and 2) should be done in
core using an ifdef similar to CLOBBER_FREED_MEMORY, to mask the page
images and write them in a dedicated file (in global/ ?), while 3)
would be fine as a separate binary in contrib/. An essential thing to
add would be to have a set of regression tests that developers and
buildfarm machines could directly use.

Perhaps there are parts of what is proposed here that could be made
more generalized, like the masking functions. So do not hesitate if
you have any opinion on the matter.

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-04-23 Thread Heikki Linnakangas


On 04/17/2014 07:59 PM, Heikki Linnakangas wrote:

On 04/08/2014 06:41 AM, Michael Paquier wrote:

On Tue, Apr 8, 2014 at 3:16 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


I've been playing with a little hack that records a before and after image
of every page modification that is WAL-logged, and writes the images to a
file along with the LSN of the corresponding WAL record. I set up a
master-standby replication with that hack in place in both servers, and ran
the regression suite. Then I compared the after images after every WAL
record, as written on master, and as replayed by the standby.

Assuming that adding some dedicated hooks in the core able to do
actions before and after a page modification occur is not *that*
costly (well I imagine that it is not acceptable in terms of
performance), could it be possible to get that in the shape of a
extension that could be used to test WAL record consistency? This may
be an idea to think about...


Yeah, working on it. It can live as a patch set if nothing else.

This has been very fruitful, I just committed another fix for a bug I
found with this earlier today.

There are quite a few things that cause differences between master and
standby. We have hint bits in many places, unused space that isn't
zeroed etc.


[a few more fixed bugs later]

Ok, I'm now getting clean output when running the regression suite with 
this tool.


And here is the tool itself. It consists of two parts:

1. Modifications to the backend to write the page images
2. A post-processing tool to compare the logged images between master 
and standby.


The attached diff contains both parts. The postprocessing tool is in 
contrib/page_image_logging. See contrib/page_image_logging/README for 
instructions. Let me know if you have any questions or need further help 
running the tool.


I've also pushed this to my git repository at 
git://git.postgresql.org/git/users/heikki/postgres.git, branch 
page_image_logging. I intend to keep it up-to-date with current master.


This is a pretty ugly hack, so I'm not proposing to commit this in the 
current state. But perhaps this could be done more cleanly, by adding 
some hooks in the backend as Michael suggested.

- Heikki

diff --git a/contrib/page_image_logging/Makefile b/contrib/page_image_logging/Makefile
new file mode 100644
index 000..9c68bbc
--- /dev/null
+++ b/contrib/page_image_logging/Makefile
@@ -0,0 +1,20 @@
+# contrib/page_image_logging/Makefile
+
+PGFILEDESC = postprocess-images - 
+
+PROGRAM = postprocess-images
+OBJS	= postprocess-images.o
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS = $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/postprocess-images
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/page_image_logging/README b/contrib/page_image_logging/README
new file mode 100644
index 000..2c3d271
--- /dev/null
+++ b/contrib/page_image_logging/README
@@ -0,0 +1,50 @@
+Usage
+-
+
+1. Apply the patch
+
+2. Set up a master and standby.
+
+3. stop master, then standby.
+
+4. Remove $PGDATA/buffer-images from both servers.
+
+5. Start master and standby
+
+6. Run make installcheck, or whatever you want to test
+
+7. Stop master, then standby
+
+8. compare the logged page images using the postprocessing tool:
+
+./postprocess-images ~/data-master/buffer-images ~/data-standby/buffer-images   differences
+
+9. The 'differences' file should be empty. If not, investigate.
+
+Tips
+
+
+The page images take up a lot of disk space! The PostgreSQL regression
+suite generates about 11GB - double that when the same is generated also
+in a standby.
+
+Always stop the master first, then standby. Otherwise, when you restart
+the standby, it will start WAL replay from the previous checkpoint, and
+log some page images already. Stopping the master creates a checkpoint
+record, avoiding the problem.
+
+If you get errors like this from postprocess-images:
+
+could not reorder line XXX
+
+It can be caused by an all-zeros page being logged with XLOG HEAP_NEWPAGE
+records. Look at the line in the buffer-image file, see if it's all-zeros.
+This can happen e.g when you change the tablespace of a table. See
+log_newpage() in heapam.c.
+
+You can use pg_xlogdump to see which WAL record a page image corresponds
+to. But beware that the LSN in the page image points to the *end* of the
+WAL record, while the LSN that pg_xlogdump prints is the *beginning* of
+the WAL record. So to find which WAL record a page image corresponds to,
+find the LSN from the page image in pg_xlogdump output, and back off one
+record. (you can't just grep for the line containing the LSN).
diff --git a/contrib/page_image_logging/postprocess-images.c b/contrib/page_image_logging/postprocess-images.c
new file mode 100644
index 000..6b4ab4c
--- /dev/null
+++

Re: [HACKERS] WAL replay bugs

2014-04-18 Thread Peter Geoghegan

On Thu, Apr 17, 2014 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Any objections to changing those two?

 Not here.  I've always suspected #2 was going to bite us someday anyway.

+1


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-04-17 Thread Heikki Linnakangas


On 04/08/2014 06:41 AM, Michael Paquier wrote:

On Tue, Apr 8, 2014 at 3:16 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


I've been playing with a little hack that records a before and after image
of every page modification that is WAL-logged, and writes the images to a
file along with the LSN of the corresponding WAL record. I set up a
master-standby replication with that hack in place in both servers, and ran
the regression suite. Then I compared the after images after every WAL
record, as written on master, and as replayed by the standby.

Assuming that adding some dedicated hooks in the core able to do
actions before and after a page modification occur is not *that*
costly (well I imagine that it is not acceptable in terms of
performance), could it be possible to get that in the shape of a
extension that could be used to test WAL record consistency? This may
be an idea to think about...


Yeah, working on it. It can live as a patch set if nothing else.

This has been very fruitful, I just committed another fix for a bug I 
found with this earlier today.


There are quite a few things that cause differences between master and 
standby. We have hint bits in many places, unused space that isn't 
zeroed etc.


Two things that are not bugs, but I'd like to change just to make this 
tool easier to maintain, and to generally clean things up:


1. When creating a sequence, we first use simple_heap_insert() to insert 
the sequence tuple, which creates a WAL record. Then we write a new 
sequence RM WAL record about the same thing. The reason is that the WAL 
record written by regular heap_insert is bogus for a sequence tuple. 
After replaying just the heap insertion, but not the other record, the 
page doesn't have the magic value indicating that it's a sequence, i.e. 
it's broken as a sequence page. That's OK because we only do this when 
creating a new sequence, so if we crash between those two records, the 
whole relation is not visible to anyone. Nevertheless, I'd like to fix 
that by using PageAddItem directly to insert the tuple, instead of 
simple_heap_insert. We have to override the xmin field of the tuple 
anyway, and we don't need any of the other services like finding the 
insert location, toasting, visibility map or freespace map updates, that 
simple_heap_insert() provides.


2. _bt_restore_page, when restoring a B-tree page split record. It adds 
tuples to the page in reverse order compared to how it's done in master. 
There is a comment noting that, and it asks Is it worth changing just 
on general principles?. Yes, I think it is.


Any objections to changing those two?

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-04-17 Thread Tom Lane

Heikki Linnakangas hlinnakan...@vmware.com writes:
 Two things that are not bugs, but I'd like to change just to make this 
 tool easier to maintain, and to generally clean things up:

 1. When creating a sequence, we first use simple_heap_insert() to insert 
 the sequence tuple, which creates a WAL record. Then we write a new 
 sequence RM WAL record about the same thing. The reason is that the WAL 
 record written by regular heap_insert is bogus for a sequence tuple. 
 After replaying just the heap insertion, but not the other record, the 
 page doesn't have the magic value indicating that it's a sequence, i.e. 
 it's broken as a sequence page. That's OK because we only do this when 
 creating a new sequence, so if we crash between those two records, the 
 whole relation is not visible to anyone. Nevertheless, I'd like to fix 
 that by using PageAddItem directly to insert the tuple, instead of 
 simple_heap_insert. We have to override the xmin field of the tuple 
 anyway, and we don't need any of the other services like finding the 
 insert location, toasting, visibility map or freespace map updates, that 
 simple_heap_insert() provides.

 2. _bt_restore_page, when restoring a B-tree page split record. It adds 
 tuples to the page in reverse order compared to how it's done in master. 
 There is a comment noting that, and it asks Is it worth changing just 
 on general principles?. Yes, I think it is.

 Any objections to changing those two?

Not here.  I've always suspected #2 was going to bite us someday anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-04-10 Thread sachin kotwal


I executed given  steps many times to produce this bug.
But still I unable to hit this bug.
I used attached scripts to produce this bug.

Can I get scripts to produce this bug?


wal_replay_bug.sh
http://postgresql.1045698.n5.nabble.com/file/n5799512/wal_replay_bug.sh  



-
Thanks and Regards,

Sachin Kotwal
NTT-DATA-OSS Center (Pune)
--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/WAL-replay-bugs-tp5799053p5799512.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-04-10 Thread Heikki Linnakangas


On 04/10/2014 10:52 AM, sachin kotwal wrote:


I executed given  steps many times to produce this bug.
But still I unable to hit this bug.
I used attached scripts to produce this bug.

Can I get scripts to produce this bug?


wal_replay_bug.sh
http://postgresql.1045698.n5.nabble.com/file/n5799512/wal_replay_bug.sh


Oh, I can't reproduce it using that script either. I must've used some 
variation of it, and posted wrong script.


The attached seems to do the trick. I changed the INSERT statements 
slightly, so that all the new rows have the same key.


Thanks for verifying this!

- Heikki


wal_replay_bug.sh
Description: application/shellscript

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-04-10 Thread Sachin D. Kotwal

On Thu, Apr 10, 2014 at 6:21 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 04/10/2014 10:52 AM, sachin kotwal wrote:


 I executed given  steps many times to produce this bug.
 But still I unable to hit this bug.
 I used attached scripts to produce this bug.

 Can I get scripts to produce this bug?



 Oh, I can't reproduce it using that script either. I must've used some
 variation of it, and posted wrong script.

 The attached seems to do the trick. I changed the INSERT statements
 slightly, so that all the new rows have the same key.

 Thanks for verifying this!


Thanks to explain the case to produce this bug.
I am able to produce this bug by using latest scripts from last mail.
I applied patch submitted for this bug and re-run the scripts.
Now it is giving correct result.


Thanks and Regards,

Sachin Kotwal

[HACKERS] WAL replay bugs

2014-04-07 Thread Heikki Linnakangas



I've been playing with a little hack that records a before and after 
image of every page modification that is WAL-logged, and writes the 
images to a file along with the LSN of the corresponding WAL record. I 
set up a master-standby replication with that hack in place in both 
servers, and ran the regression suite. Then I compared the after images 
after every WAL record, as written on master, and as replayed by the 
standby.


The idea is that the page content in the standby after replaying a WAL 
record should be identical to the page in the master, when the WAL 
record was generated. There are some known cases where that doesn't 
hold, but it's a useful sanity check. To reduce noise, I've been 
focusing on one access method at a time, filtering out others.


I did that for GIN first, and indeed found a bug in my new 
incomplete-split code, see commit 594bac42. After fixing that, and 
zeroing some padding bytes (38a2b95c), I'm now getting a clean run with 
that.



Next, I took on GiST, and lo-and-behold found a bug there pretty quickly 
as well. This one has been there ever since we got Hot Standby: the redo 
of a page update (e.g an insertion) resets the right-link of the page. 
If there is a concurrent scan, in a hot standby server, that scan might 
still need the rightlink, and will hence miss some tuples. This can be 
reproduced like this:


1. in master, create test table.

CREATE TABLE gisttest (id int4);
CREATE INDEX gisttest_idx ON gisttest USING gist (id);
INSERT INTO gisttest SELECT g * 1000 from generate_series(1, 10) g;

-- Test function. Starts a scan, fetches one row from it, then waits 10 
seconds until fetching the rest of the rows.

-- Returns the number of rows scanned. Should be 10 if you follow
-- these test instructions.
CREATE OR REPLACE FUNCTION gisttestfunc() RETURNS int AS
$$
declare
  i int4;
  t text;
  cur CURSOR FOR SELECT 'foo' FROM gisttest WHERE id = 0;
begin
  set enable_seqscan=off; set enable_bitmapscan=off;

  i = 0;
  OPEN cur;
  FETCH cur INTO t;

  perform pg_sleep(10);

  LOOP
EXIT WHEN NOT FOUND; -- this is bogus on first iteration
i = i + 1;
FETCH cur INTO t;
  END LOOP;
  CLOSE cur;
  RETURN i;
END;
$$ LANGUAGE plpgsql;

2. in standby

SELECT gisttestfunc();
blocks

3. Quickly, before the scan in standby continues, cause some page splits:

INSERT INTO gisttest SELECT g * 1000+1 from generate_series(1, 10) g;

4. The scan in standby finishes. It should return 10, but will 
return a lower number if you hit the bug.



At a quick glance, I think fixing that is just a matter of not resetting 
the right-link. I'll take a closer look tomorrow, but for now I just 
wanted to report what I've been doing. I'll post the scripts I've been 
using later too - nag me if I don't.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-04-07 Thread Josh Berkus

On 04/07/2014 02:16 PM, Heikki Linnakangas wrote:
 I've been playing with a little hack that records a before and after
 image of every page modification that is WAL-logged, and writes the
 images to a file along with the LSN of the corresponding WAL record. I
 set up a master-standby replication with that hack in place in both
 servers, and ran the regression suite. Then I compared the after images
 after every WAL record, as written on master, and as replayed by the
 standby.

This is awesome ... thank you for doing this.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-04-07 Thread Michael Paquier

On Tue, Apr 8, 2014 at 3:16 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

 I've been playing with a little hack that records a before and after image
 of every page modification that is WAL-logged, and writes the images to a
 file along with the LSN of the corresponding WAL record. I set up a
 master-standby replication with that hack in place in both servers, and ran
 the regression suite. Then I compared the after images after every WAL
 record, as written on master, and as replayed by the standby.
Assuming that adding some dedicated hooks in the core able to do
actions before and after a page modification occur is not *that*
costly (well I imagine that it is not acceptable in terms of
performance), could it be possible to get that in the shape of a
extension that could be used to test WAL record consistency? This may
be an idea to think about...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

40 matches

Mail list logo