Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-03-02 Thread daveg
On Wed, Mar 02, 2011 at 04:20:24PM -0800, bricklen wrote:
> On Wed, Mar 2, 2011 at 3:53 PM, daveg  wrote:
> >> > Postgresql version is 8.4.4.
> >>
> >> I don't see how this could be related, but since you're running on NFS,
> >> maybe it is, somehow:
> >> http://archives.postgresql.org/message-id/4d40ddb7.1010...@credativ.com
> >> (for example what if the visibility map fork's last page is overwritten?)
> >
> > Running on ISCSI, not nfs. But it is still a Netapp, so who knows. I'll 
> > look.
> > Also, we are not seeing any of the "unexpected data beyond EOF" errors,
> > just thousands per day of the PD_ALL_VISIBLE error.
> >
> > -dg
> 
> FWIW, we had a couple occurrences of that message about a month ago on 9.0.2
> 
> http://archives.postgresql.org/pgsql-general/2011-01/msg00887.php
> 
> Haven't seen it since we ran a cluster-wide vacuum.

We did a shutdown and restart to clear the buffer cache (but did not reboot
the host) and a vacuum on all dbs in the cluster last night. That cleared it
up for a couple hours, but we are still getting lots of these messages.

Most of them are pg_statistic and we create and drop hundreds of thousands of
temp tables daily, so there is a good chance there is a concurrancy issue.

-dg
 

-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-03-02 Thread bricklen
On Wed, Mar 2, 2011 at 3:53 PM, daveg  wrote:
>> > Postgresql version is 8.4.4.
>>
>> I don't see how this could be related, but since you're running on NFS,
>> maybe it is, somehow:
>> http://archives.postgresql.org/message-id/4d40ddb7.1010...@credativ.com
>> (for example what if the visibility map fork's last page is overwritten?)
>
> Running on ISCSI, not nfs. But it is still a Netapp, so who knows. I'll look.
> Also, we are not seeing any of the "unexpected data beyond EOF" errors,
> just thousands per day of the PD_ALL_VISIBLE error.
>
> -dg

FWIW, we had a couple occurrences of that message about a month ago on 9.0.2

http://archives.postgresql.org/pgsql-general/2011-01/msg00887.php

Haven't seen it since we ran a cluster-wide vacuum.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-03-02 Thread daveg
On Wed, Mar 02, 2011 at 06:45:13PM -0300, Alvaro Herrera wrote:
> Excerpts from daveg's message of mié mar 02 18:30:34 -0300 2011:
> 
> > After a restart and vacuum of all dbs with no other activity things were
> > quiet for a couple hours and then we started seeing these PD_ALL_VISIBLE
> > messages again. 
> > 
> > Going back through the logs we have been getting these since at least before
> > mid January. Oddly, this only happens on four systems which are all new Dell
> > 32 core Nehalem 512GB machines using iscsi partitions served off a Netapp.
> > Our older 8 core 64GB hosts have never logged any of these errors. I'm not
> > saying it is related to the hw, as these hosts are doing a lot more work 
> > than
> > the old hosts so it may be a concurrency problem that just never came up at
> > lower levels before.
> > 
> > Postgresql version is 8.4.4.
> 
> I don't see how this could be related, but since you're running on NFS,
> maybe it is, somehow:
> http://archives.postgresql.org/message-id/4d40ddb7.1010...@credativ.com
> (for example what if the visibility map fork's last page is overwritten?)

Running on ISCSI, not nfs. But it is still a Netapp, so who knows. I'll look.
Also, we are not seeing any of the "unexpected data beyond EOF" errors,
just thousands per day of the PD_ALL_VISIBLE error.

-dg
 

-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-03-02 Thread Alvaro Herrera
Excerpts from daveg's message of mié mar 02 18:30:34 -0300 2011:

> After a restart and vacuum of all dbs with no other activity things were
> quiet for a couple hours and then we started seeing these PD_ALL_VISIBLE
> messages again. 
> 
> Going back through the logs we have been getting these since at least before
> mid January. Oddly, this only happens on four systems which are all new Dell
> 32 core Nehalem 512GB machines using iscsi partitions served off a Netapp.
> Our older 8 core 64GB hosts have never logged any of these errors. I'm not
> saying it is related to the hw, as these hosts are doing a lot more work than
> the old hosts so it may be a concurrency problem that just never came up at
> lower levels before.
> 
> Postgresql version is 8.4.4.

I don't see how this could be related, but since you're running on NFS,
maybe it is, somehow:
http://archives.postgresql.org/message-id/4d40ddb7.1010...@credativ.com
(for example what if the visibility map fork's last page is overwritten?)

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-03-02 Thread daveg
On Tue, Mar 01, 2011 at 01:20:43PM -0800, daveg wrote:
> On Tue, Mar 01, 2011 at 12:00:54AM +0200, Heikki Linnakangas wrote:
> > On 28.02.2011 23:28, daveg wrote:
> > >On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:
> > >>We'll likely need to go back and forth a few times with various
> > >>debugging patches until we get to the heart of this..
> > >
> > >Anything new on this? I'm seeing at on one of my clients production boxes.
> > 
> > I haven't heard anything from the OP since.
> > 
> > >Also, what is the significance, ie what is the risk or damage potential if
> > >this flag is set incorrectly?
> > 
> > Sequential scans will honor the flag, so you might see some dead rows 
> > incorrectly returned by a sequential scan. That's the only "damage", but 
> > an incorrectly set flag could be a sign of something more sinister, like 
> > corrupt tuple headers. The flag should never be set incorrectly, so if 
> > you see that message you have hit a bug in PostgreSQL, or you have bad 
> > hardware.
> > 
> > This flag is quite new, so a bug in PostgreSQL is quite possible. If you 
> > still have a backup that contains those incorrectly set flags, I'd like 
> > to see what the page looks like.
> 
> 
> I ran vacuums on all the affected tables last night. I plan to take a downtime
> to clear the buffer cache and then to run vacuums on all the dbs in the
> cluster.
> 
> Most but not all the tables involved are catalogs.
> 
> However, I could probably pick up your old patch sometime next week if it
> recurrs and send you page images.

After a restart and vacuum of all dbs with no other activity things were
quiet for a couple hours and then we started seeing these PD_ALL_VISIBLE
messages again. 

Going back through the logs we have been getting these since at least before
mid January. Oddly, this only happens on four systems which are all new Dell
32 core Nehalem 512GB machines using iscsi partitions served off a Netapp.
Our older 8 core 64GB hosts have never logged any of these errors. I'm not
saying it is related to the hw, as these hosts are doing a lot more work than
the old hosts so it may be a concurrency problem that just never came up at
lower levels before.

Postgresql version is 8.4.4.

I'll pick up Heikkis page logging patch and run it for a bit to get some
damaged page images. What else could I be doing to track this down?

-dg

-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-03-01 Thread daveg
On Tue, Mar 01, 2011 at 12:00:54AM +0200, Heikki Linnakangas wrote:
> On 28.02.2011 23:28, daveg wrote:
> >On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:
> >>We'll likely need to go back and forth a few times with various
> >>debugging patches until we get to the heart of this..
> >
> >Anything new on this? I'm seeing at on one of my clients production boxes.
> 
> I haven't heard anything from the OP since.
> 
> >Also, what is the significance, ie what is the risk or damage potential if
> >this flag is set incorrectly?
> 
> Sequential scans will honor the flag, so you might see some dead rows 
> incorrectly returned by a sequential scan. That's the only "damage", but 
> an incorrectly set flag could be a sign of something more sinister, like 
> corrupt tuple headers. The flag should never be set incorrectly, so if 
> you see that message you have hit a bug in PostgreSQL, or you have bad 
> hardware.
> 
> This flag is quite new, so a bug in PostgreSQL is quite possible. If you 
> still have a backup that contains those incorrectly set flags, I'd like 
> to see what the page looks like.


I ran vacuums on all the affected tables last night. I plan to take a downtime
to clear the buffer cache and then to run vacuums on all the dbs in the
cluster.

Most but not all the tables involved are catalogs.

However, I could probably pick up your old patch sometime next week if it
recurrs and send you page images.

-dg

-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-03-01 Thread daveg
On Mon, Feb 28, 2011 at 07:43:39PM -0600, David Christensen wrote:
> 
> On Feb 28, 2011, at 3:28 PM, daveg wrote:
> 
> > Anything new on this? I'm seeing at on one of my clients production boxes.
> > Also, what is the significance, ie what is the risk or damage potential if
> > this flag is set incorrectly?
> 
> 
> Was this cluster upgraded to 8.4.4 from 8.4.0?  It sounds to me like a known 
> bug in 8.4.0 which was fixed by this commit:
> 
> commit 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
> Author: Tom Lane 
> Date:   Mon Aug 24 02:18:32 2009 +
> 
> Fix a violation of WAL coding rules in the recent patch to include an
> "all tuples visible" flag in heap page headers.  The flag update *must*
> be applied before calling XLogInsert, but heap_update and the tuple
> moving routines in VACUUM FULL were ignoring this rule.  A crash and
> replay could therefore leave the flag incorrectly set, causing rows
> to appear visible in seqscans when they should not be.  This might explain
> recent reports of data corruption from Jeff Ross and others.
> 
> In passing, do a bit of editorialization on comments in visibilitymap.c.
> 
> oy:postgresql machack$ git describe --tag 
> 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
> REL8_4_0-190-g7fc7a7c
> 
> If the flag got twiddled while running as 8.4.0, the incorrect PD_ALL_VISIBLE 
> flag would (obviously) not be fixed by the upgrade to 8.4.4.  (Is this a 
> separate issue?)

This cluster was installed with 8.4.4. So it is still an existing problem.
Also, to my recollection, this cluster has never crashed.

-dg


-- 
David Gould   da...@sonic.net  510 536 1443510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-03-01 Thread Maxim Boguk
Hi

On Tue, Mar 1, 2011 at 11:00 AM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> On 28.02.2011 23:28, daveg wrote:
>
>> On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:
>>
>>> We'll likely need to go back and forth a few times with various
>>> debugging patches until we get to the heart of this..
>>>
>>
>> Anything new on this? I'm seeing at on one of my clients production boxes.
>>
>
> I haven't heard anything from the OP since.
>
>
>  Also, what is the significance, ie what is the risk or damage potential if
>> this flag is set incorrectly?
>>
>
> Sequential scans will honor the flag, so you might see some dead rows
> incorrectly returned by a sequential scan. That's the only "damage", but an
> incorrectly set flag could be a sign of something more sinister, like
> corrupt tuple headers. The flag should never be set incorrectly, so if you
> see that message you have hit a bug in PostgreSQL, or you have bad hardware.
>
> This flag is quite new, so a bug in PostgreSQL is quite possible. If you
> still have a backup that contains those incorrectly set flags, I'd like to
> see what the page looks like.
>
>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>


Sorry, I was a bit busy lately.
This effect on my production servers seems have floating nature (some weeks
it happen each day many time and some weeks - not at all... phase of the
moon may be). Last 2 weeks I don't see that error in database logs so I
can't add anything more.

What make situation even harder to debug it is I can't compile and install
patched version of database engine (because it is production server and I
physically don't have access to changing software there). So I trying to
reproduce error on single connect  (so I can attach GDB and look inside) but
without luck during last week.

-- 
Maxim Boguk
Senior Postgresql DBA.

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

МойКруг: http://mboguk.moikrug.ru/
Сила солому ломит, но не все в нашей жизни - солома, да и сила далеко не
все.


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-02-28 Thread Fujii Masao
On Tue, Mar 1, 2011 at 10:43 AM, David Christensen  wrote:
> Was this cluster upgraded to 8.4.4 from 8.4.0?  It sounds to me like a known 
> bug in 8.4.0 which was fixed by this commit:
>
> commit 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
> Author: Tom Lane 
> Date:   Mon Aug 24 02:18:32 2009 +
>
>    Fix a violation of WAL coding rules in the recent patch to include an
>    "all tuples visible" flag in heap page headers.  The flag update *must*
>    be applied before calling XLogInsert, but heap_update and the tuple
>    moving routines in VACUUM FULL were ignoring this rule.  A crash and
>    replay could therefore leave the flag incorrectly set, causing rows
>    to appear visible in seqscans when they should not be.  This might explain
>    recent reports of data corruption from Jeff Ross and others.
>
>    In passing, do a bit of editorialization on comments in visibilitymap.c.
>
> oy:postgresql machack$ git describe --tag 
> 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
> REL8_4_0-190-g7fc7a7c
>
> If the flag got twiddled while running as 8.4.0, the incorrect PD_ALL_VISIBLE 
> flag would (obviously) not be fixed by the upgrade to 8.4.4.  (Is this a 
> separate issue?)

Yes, it's a different issue. I observed it on 8.4.2.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-02-28 Thread David Christensen

On Feb 28, 2011, at 3:28 PM, daveg wrote:

> On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:
>> On 12.01.2011 06:21, Fujii Masao wrote:
>>> On Sat, Dec 25, 2010 at 2:09 PM, Maxim Boguk  wrote:
 While I trying create reproducible test case for BUG #5798 I
 encountered very strange effect on two of my servers (both servers
 have same hardware platform/OS (freebsd 7.2) and PostgreSQL 8.4.4).
 
 Very simple test table created as:
 CREATE TABLE test (id integer);
 INSERT INTO test select generate_series(0,1);
 
 And I trying repeateble vacuum of that table with script:
 perl -e "foreach (1..10) {system \"psql -d test -h -c 'vacuum 
 test'\";}"
 
 And once per like an minute (really random intervals can be 5 minutes
 without problems can be 3 vacuum in row show same error)  I getting
 next errors:
 WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 
 1
 ...
 WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test"
 page 30 for all pages of the relation.
>> 
>> Oh, interesting. This is the first time anyone can reliably reproducible 
>> that. I can't reproduce that on my laptop with that script, though, so 
>> I'm going to need your help to debug this.
>> 
>> Can you compile PostgreSQL with the attached patch, and rerun the test? 
>> It will dump the pages with incorrectly set flags to files in /tmp/, and 
>> adds a bit more detail in the WARNING.  Please run the test until you 
>> get those warnings, and tar up the the created "/tmp/pageimage*" files, 
>> and post them along with the warning generated.
>> 
>> We'll likely need to go back and forth a few times with various 
>> debugging patches until we get to the heart of this..
> 
> Anything new on this? I'm seeing at on one of my clients production boxes.
> Also, what is the significance, ie what is the risk or damage potential if
> this flag is set incorrectly?


Was this cluster upgraded to 8.4.4 from 8.4.0?  It sounds to me like a known 
bug in 8.4.0 which was fixed by this commit:

commit 7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
Author: Tom Lane 
Date:   Mon Aug 24 02:18:32 2009 +

Fix a violation of WAL coding rules in the recent patch to include an
"all tuples visible" flag in heap page headers.  The flag update *must*
be applied before calling XLogInsert, but heap_update and the tuple
moving routines in VACUUM FULL were ignoring this rule.  A crash and
replay could therefore leave the flag incorrectly set, causing rows
to appear visible in seqscans when they should not be.  This might explain
recent reports of data corruption from Jeff Ross and others.

In passing, do a bit of editorialization on comments in visibilitymap.c.

oy:postgresql machack$ git describe --tag 
7fc7a7c4d082bfbd579f49e92b046dd51f1faf5f
REL8_4_0-190-g7fc7a7c

If the flag got twiddled while running as 8.4.0, the incorrect PD_ALL_VISIBLE 
flag would (obviously) not be fixed by the upgrade to 8.4.4.  (Is this a 
separate issue?)

Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-02-28 Thread daveg
On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:
> On 12.01.2011 06:21, Fujii Masao wrote:
> >On Sat, Dec 25, 2010 at 2:09 PM, Maxim Boguk  wrote:
> >>While I trying create reproducible test case for BUG #5798 I
> >>encountered very strange effect on two of my servers (both servers
> >>have same hardware platform/OS (freebsd 7.2) and PostgreSQL 8.4.4).
> >>
> >>Very simple test table created as:
> >>CREATE TABLE test (id integer);
> >>INSERT INTO test select generate_series(0,1);
> >>
> >>And I trying repeateble vacuum of that table with script:
> >>  perl -e "foreach (1..10) {system \"psql -d test -h -c 'vacuum 
> >>  test'\";}"
> >>
> >>And once per like an minute (really random intervals can be 5 minutes
> >>without problems can be 3 vacuum in row show same error)  I getting
> >>next errors:
> >>WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 
> >>1
> >>...
> >>WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test"
> >>page 30 for all pages of the relation.
> 
> Oh, interesting. This is the first time anyone can reliably reproducible 
> that. I can't reproduce that on my laptop with that script, though, so 
> I'm going to need your help to debug this.
> 
> Can you compile PostgreSQL with the attached patch, and rerun the test? 
> It will dump the pages with incorrectly set flags to files in /tmp/, and 
> adds a bit more detail in the WARNING.  Please run the test until you 
> get those warnings, and tar up the the created "/tmp/pageimage*" files, 
> and post them along with the warning generated.
> 
> We'll likely need to go back and forth a few times with various 
> debugging patches until we get to the heart of this..

Anything new on this? I'm seeing at on one of my clients production boxes.
Also, what is the significance, ie what is the risk or damage potential if
this flag is set incorrectly?

Thanks

-dg


-- 
David Gould   da...@sonic.net
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-02-28 Thread Heikki Linnakangas

On 28.02.2011 23:28, daveg wrote:

On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:

We'll likely need to go back and forth a few times with various
debugging patches until we get to the heart of this..


Anything new on this? I'm seeing at on one of my clients production boxes.


I haven't heard anything from the OP since.


Also, what is the significance, ie what is the risk or damage potential if
this flag is set incorrectly?


Sequential scans will honor the flag, so you might see some dead rows 
incorrectly returned by a sequential scan. That's the only "damage", but 
an incorrectly set flag could be a sign of something more sinister, like 
corrupt tuple headers. The flag should never be set incorrectly, so if 
you see that message you have hit a bug in PostgreSQL, or you have bad 
hardware.


This flag is quite new, so a bug in PostgreSQL is quite possible. If you 
still have a backup that contains those incorrectly set flags, I'd like 
to see what the page looks like.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-01-12 Thread Heikki Linnakangas

On 12.01.2011 06:21, Fujii Masao wrote:

On Sat, Dec 25, 2010 at 2:09 PM, Maxim Boguk  wrote:

While I trying create reproducible test case for BUG #5798 I
encountered very strange effect on two of my servers (both servers
have same hardware platform/OS (freebsd 7.2) and PostgreSQL 8.4.4).

Very simple test table created as:
CREATE TABLE test (id integer);
INSERT INTO test select generate_series(0,1);

And I trying repeateble vacuum of that table with script:
  perl -e "foreach (1..10) {system \"psql -d test -h -c 'vacuum test'\";}"

And once per like an minute (really random intervals can be 5 minutes
without problems can be 3 vacuum in row show same error)  I getting
next errors:
WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 1
...
WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test"
page 30 for all pages of the relation.


Oh, interesting. This is the first time anyone can reliably reproducible 
that. I can't reproduce that on my laptop with that script, though, so 
I'm going to need your help to debug this.


Can you compile PostgreSQL with the attached patch, and rerun the test? 
It will dump the pages with incorrectly set flags to files in /tmp/, and 
adds a bit more detail in the WARNING.  Please run the test until you 
get those warnings, and tar up the the created "/tmp/pageimage*" files, 
and post them along with the warning generated.


We'll likely need to go back and forth a few times with various 
debugging patches until we get to the heart of this..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 67e0be9..0e88aa5 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -48,6 +48,7 @@
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
+#include "storage/fd.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "utils/inval.h"
@@ -668,13 +669,26 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		/* Update the all-visible flag on the page */
 		if (!PageIsAllVisible(page) && all_visible)
 		{
+			elog(WARNING, "debugging: setting PD_ALL_VISIBLE in relation \"%s\" on page %u (OldestXmin %u)", relname, blkno, OldestXmin);
 			PageSetAllVisible(page);
 			SetBufferCommitInfoNeedsSave(buf);
 		}
 		else if (PageIsAllVisible(page) && !all_visible)
 		{
-			elog(WARNING, "PD_ALL_VISIBLE flag was incorrectly set in relation \"%s\" page %u",
- relname, blkno);
+			elog(WARNING, "PD_ALL_VISIBLE flag was incorrectly set in relation \"%s\" page %u (OldestXmin %u)",
+ relname, blkno, OldestXmin);
+			{
+char fname[MAXPGPATH];
+FILE *fp;
+
+/* dump the raw page to a file */
+snprintf(fname, sizeof(fname), "/tmp/pageimage_%s_%d",
+		 relname, blkno);
+fp = AllocateFile(fname, "wb");
+fwrite(page, 1, BLCKSZ, fp);
+FreeFile(fp);
+
+			}
 			PageClearAllVisible(page);
 			SetBufferCommitInfoNeedsSave(buf);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [ADMIN] PD_ALL_VISIBLE flag was incorrectly set happend during repeatable vacuum

2011-01-11 Thread Fujii Masao
On Sat, Dec 25, 2010 at 2:09 PM, Maxim Boguk  wrote:
> While I trying create reproducible test case for BUG #5798 I
> encountered very strange effect on two of my servers (both servers
> have same hardware platform/OS (freebsd 7.2) and PostgreSQL 8.4.4).
>
> Very simple test table created as:
> CREATE TABLE test (id integer);
> INSERT INTO test select generate_series(0,1);
>
> And I trying repeateble vacuum of that table with script:
>  perl -e "foreach (1..10) {system \"psql -d test -h -c 'vacuum test'\";}"
>
> And once per like an minute (really random intervals can be 5 minutes
> without problems can be 3 vacuum in row show same error)  I getting
> next errors:
> WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test" page 1
> ...
> WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "test"
> page 30 for all pages of the relation.

The same problem happened on my customer's PostgreSQL 8.4.2
environment. Here are the error messages:

Dec 13 13:11:59 test postgres[28249]: [2-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 38
Dec 13 13:11:59 test postgres[28249]: [2-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [3-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 61
Dec 13 13:11:59 test postgres[28249]: [3-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [4-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 88
Dec 13 13:11:59 test postgres[28249]: [4-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [5-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 96
Dec 13 13:11:59 test postgres[28249]: [5-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [6-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 98
Dec 13 13:11:59 test postgres[28249]: [6-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:11:59 test postgres[28249]: [7-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 107
Dec 13 13:11:59 test postgres[28249]: [7-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
...
Dec 13 13:15:59 test postgres[9640]: [2-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 38
Dec 13 13:15:59 test postgres[9640]: [2-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [3-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 61
Dec 13 13:15:59 test postgres[9640]: [3-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [4-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 88
Dec 13 13:15:59 test postgres[9640]: [4-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [5-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 93
Dec 13 13:15:59 test postgres[9640]: [5-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [6-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 96
Dec 13 13:15:59 test postgres[9640]: [6-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
Dec 13 13:15:59 test postgres[9640]: [7-1] WARNING:  01000:
PD_ALL_VISIBLE flag was incorrectly set in relation "pg_statistic"
page 107
Dec 13 13:15:59 test postgres[9640]: [7-2] LOCATION:  lazy_scan_heap,
vacuumlazy.c:676
...

This problem was reported some times, but has not been resolved yet.
http://archives.postgresql.org/message-id/4C23A3CF.4080506%40frolix.muddywaters.org
http://archives.postgresql.org/message-id/g2o4b46b5f01004010610ib8625426uae6ee90ac1435ba1%40mail.gmail.com

Though I investigated the source code around PD_ALL_VISIBLE flag,
I could not identify the cause.

Does anyone have any ideas what the cause is?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers