Re: [HACKERS] Reviewing freeze map code

2016-06-14 Thread Thomas Munro
On Tue, Jun 14, 2016 at 4:02 AM, Robert Haas  wrote:
> On Sat, Jun 11, 2016 at 5:00 PM, Robert Haas  wrote:
>>> How about changing the return tuple of heap_prepare_freeze_tuple to
>>> a bitmap?  Two flags: "Freeze [not] done" and "[No] more freezing
>>> needed"
>>
>> Yes, I think something like that sounds about right.
>
> Here's a patch.  I took the approach of adding a separate bool out
> parameter instead.  I am also attaching an update of the
> check-visibility patch which responds to assorted review comments and
> adjusting it for the problems found on Friday which could otherwise
> lead to false positives.  I'm still getting occasional TIDs from the
> pg_check_visible() function during pgbench runs, though, so evidently
> not all is well with the world.

I'm still working out how half this stuff works, but I managed to get
pg_check_visible() to spit out a row every few seconds with the
following brute force approach:

CREATE TABLE foo (n int);
INSERT INTO foo SELECT generate_series(1, 10);

Three client threads (see attached script):
1.  Run VACUUM in a tight loop.
2.  Run UPDATE foo SET n = n + 1 in a tight loop.
3.  Run SELECT pg_check_visible('foo'::regclass) in a tight loop, and
print out any rows it produces.

I noticed that the tuples that it reported were always offset 1 in a
page, and that the page always had a maxoff over a couple of hundred,
and that we called record_corrupt_item because VM_ALL_VISIBLE returned
true but HeapTupleSatisfiesVacuum on the first tuple returned
HEAPTUPLE_DELETE_IN_PROGRESS instead of the expected HEAPTUPLE_LIVE.
It did that because HEAP_XMAX_COMMITTED was not set and
TransactionIdIsInProgress returned true for xmax.

Not sure how much of this was already obvious!  I will poke at it some
more tomorrow.

-- 
Thomas Munro
http://www.enterprisedb.com
# create table foo (n int);
# insert into foo select generate_series(1, 10);

import psycopg2
import threading
import time

keep_running = True

def vacuum_thread(dsn):
  conn = psycopg2.connect(dsn)
  conn.set_isolation_level(0)
  cursor = conn.cursor()
  while keep_running:
cursor.execute("VACUUM")

def update_thread(dsn):
  conn = psycopg2.connect(dsn)
  cursor = conn.cursor()
  while keep_running:
cursor.execute("UPDATE foo SET n = n + 1")
conn.commit()

def check_thread(dsn):
  conn = psycopg2.connect(dsn)
  conn.set_isolation_level(0)
  cursor = conn.cursor()
  while keep_running:
cursor.execute("SELECT pg_check_visible('foo'::regclass)")
for row in cursor:
  print row
conn.rollback()

if __name__ == "__main__":
  dsn = "dbname=postgres"
  t1 = threading.Thread(target = vacuum_thread, args = [dsn])
  t1.start()
  t2 = threading.Thread(target = update_thread, args = [dsn])
  t2.start()
  t3 = threading.Thread(target = check_thread, args = [dsn])
  t3.start()
  try:
time.sleep(86400)
  except KeyboardInterrupt:
keep_running = False
  t1.join()
  t2.join()
  t3.join()


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-13 Thread Andres Freund


On June 13, 2016 11:02:42 AM CDT, Robert Haas  wrote:

>(Official status update: I'm hoping that senior hackers will carefully
>review these patches for defects.  If they do not, I plan to commit
>the patches anyway neither less than 48 nor more than 60 hours from
>now after re-reviewing them myself.)

I'm traveling today and tomorrow, but will look after that.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-13 Thread Robert Haas
On Sat, Jun 11, 2016 at 5:00 PM, Robert Haas  wrote:
>> How about changing the return tuple of heap_prepare_freeze_tuple to
>> a bitmap?  Two flags: "Freeze [not] done" and "[No] more freezing
>> needed"
>
> Yes, I think something like that sounds about right.

Here's a patch.  I took the approach of adding a separate bool out
parameter instead.  I am also attaching an update of the
check-visibility patch which responds to assorted review comments and
adjusting it for the problems found on Friday which could otherwise
lead to false positives.  I'm still getting occasional TIDs from the
pg_check_visible() function during pgbench runs, though, so evidently
not all is well with the world.

(Official status update: I'm hoping that senior hackers will carefully
review these patches for defects.  If they do not, I plan to commit
the patches anyway neither less than 48 nor more than 60 hours from
now after re-reviewing them myself.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


fix-freeze-map-v1.patch
Description: binary/octet-stream


check-visibility-v4.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-12 Thread Amit Kapila
On Sat, Jun 11, 2016 at 1:24 AM, Robert Haas  wrote:
>
> 3. vacuumlazy.c includes this code:
>
> if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
>   MultiXactCutoff,
[nfrozen]))
> frozen[nfrozen++].offset = offnum;
> else if (heap_tuple_needs_eventual_freeze(tuple.t_data))
> all_frozen = false;
>
> That's wrong, because a "true" return value from
> heap_prepare_freeze_tuple() means only that it has done *some*
> freezing work on the tuple, not that it's done all of the freezing
> work that will ever need to be done.  So, if the tuple's xmin can be
> frozen and is aborted but not older than vacuum_freeze_min_age, then
> heap_prepare_freeze_tuple() won't free xmax, but the page will still
> be marked all-frozen, which is bad.
>

To clarify, are you talking about a case where insertion has aborted?
Won't in such a case all_visible flag be set to false due to return value
from HeapTupleSatisfiesVacuum() and if so, later code shouldn't mark it as
all_frozen?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-11 Thread Robert Haas
On Fri, Jun 10, 2016 at 4:55 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> 3. vacuumlazy.c includes this code:
>>
>> if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
>>   MultiXactCutoff, [nfrozen]))
>> frozen[nfrozen++].offset = offnum;
>> else if (heap_tuple_needs_eventual_freeze(tuple.t_data))
>> all_frozen = false;
>>
>> That's wrong, because a "true" return value from
>> heap_prepare_freeze_tuple() means only that it has done *some*
>> freezing work on the tuple, not that it's done all of the freezing
>> work that will ever need to be done.  So, if the tuple's xmin can be
>> frozen and is aborted but not older than vacuum_freeze_min_age, then
>> heap_prepare_freeze_tuple() won't free xmax, but the page will still
>> be marked all-frozen, which is bad.  I think it normally won't matter
>> because the xmax will probably be hinted invalid anyway, since we just
>> pruned the page which should have set hint bits everywhere, but if
>> those hint bits were lost then we'd eventually end up with an
>> accessible xmax pointing off into space.
>
> Good catch.  Also consider multixact freezing: if there is a
> long-running transaction which is a lock-only member of tuple's Xmax,
> and the multixact needs freezing because it's older than the multixact
> cutoff, we set the xmax to a new multixact which includes that old
> locker.  See FreezeMultiXactId.
>
>> My first thought was to just delete the "else" but that would be bad
>> because we'd fail to set all-frozen immediately in a lot of cases
>> where we should.  This needs a bit more thought than I have time to
>> give it right now.
>
> How about changing the return tuple of heap_prepare_freeze_tuple to
> a bitmap?  Two flags: "Freeze [not] done" and "[No] more freezing
> needed"

Yes, I think something like that sounds about right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-10 Thread Alvaro Herrera
Robert Haas wrote:

> 3. vacuumlazy.c includes this code:
> 
> if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
>   MultiXactCutoff, [nfrozen]))
> frozen[nfrozen++].offset = offnum;
> else if (heap_tuple_needs_eventual_freeze(tuple.t_data))
> all_frozen = false;
> 
> That's wrong, because a "true" return value from
> heap_prepare_freeze_tuple() means only that it has done *some*
> freezing work on the tuple, not that it's done all of the freezing
> work that will ever need to be done.  So, if the tuple's xmin can be
> frozen and is aborted but not older than vacuum_freeze_min_age, then
> heap_prepare_freeze_tuple() won't free xmax, but the page will still
> be marked all-frozen, which is bad.  I think it normally won't matter
> because the xmax will probably be hinted invalid anyway, since we just
> pruned the page which should have set hint bits everywhere, but if
> those hint bits were lost then we'd eventually end up with an
> accessible xmax pointing off into space.

Good catch.  Also consider multixact freezing: if there is a
long-running transaction which is a lock-only member of tuple's Xmax,
and the multixact needs freezing because it's older than the multixact
cutoff, we set the xmax to a new multixact which includes that old
locker.  See FreezeMultiXactId.

> My first thought was to just delete the "else" but that would be bad
> because we'd fail to set all-frozen immediately in a lot of cases
> where we should.  This needs a bit more thought than I have time to
> give it right now.

How about changing the return tuple of heap_prepare_freeze_tuple to
a bitmap?  Two flags: "Freeze [not] done" and "[No] more freezing
needed"

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-10 Thread Robert Haas
On Fri, Jun 10, 2016 at 1:59 PM, Andres Freund  wrote:
>> Master, but with an existing standby. So it could be related to
>> hot_standby_feedback or such.
>
> I just managed to trigger it again.
>
>
> #1  0x7fa1a73778da in __GI_abort () at abort.c:89
> #2  0x7f9f1395e59c in record_corrupt_item (items=items@entry=0x2137be0, 
> tid=0x7f9fb8681c0c)
> at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:612
> #3  0x7f9f1395ead5 in collect_corrupt_items (relid=relid@entry=29449, 
> all_visible=all_visible@entry=0 '\000', all_frozen=all_frozen@entry=1 '\001')
> at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:572
> #4  0x7f9f1395f476 in pg_check_frozen (fcinfo=0x7ffe5343a200) at 
> /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:292
> #5  0x005fdbec in ExecMakeTableFunctionResult (funcexpr=0x2168630, 
> econtext=0x2168320, argContext=, expectedDesc=0x2168ef0,
> randomAccess=0 '\000') at 
> /home/andres/src/postgresql/src/backend/executor/execQual.c:2211
> #6  0x00616992 in FunctionNext (node=node@entry=0x2168210) at 
> /home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:94
> #7  0x005ffdcb in ExecScanFetch (recheckMtd=0x6166f0 
> , accessMtd=0x616700 , node=0x2168210)
> at /home/andres/src/postgresql/src/backend/executor/execScan.c:95
> #8  ExecScan (node=node@entry=0x2168210, accessMtd=accessMtd@entry=0x616700 
> , recheckMtd=recheckMtd@entry=0x6166f0 )
> at /home/andres/src/postgresql/src/backend/executor/execScan.c:145
> #9  0x006169e4 in ExecFunctionScan (node=node@entry=0x2168210) at 
> /home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:268
>
> the error happened just after I restarted a standby, so it's not
> unlikely to be related to hot_standby_feedback.

After some off-list discussion and debugging, Andres and I have
managed to identify three issues here (so far).  Two are issues in the
testing, and one is a data-corrupting bug in the freeze map code.

1. pg_check_visible keeps on using the same OldestXmin for all its
checks even though the real OldestXmin may advance in the meantime.
This can lead to spurious problem reports: pg_check_visible() thinks
that the tuple isn't all visible yet and reports it as corruption, but
in reality there's no problem.

2. pg_check_visible includes the same check for heap-xmin-committed
that vacuumlazy.c uses, but hint bits aren't crash safe, so this could
lead to a spurious trouble report in a scenario involving a crash.

3. vacuumlazy.c includes this code:

if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
  MultiXactCutoff, [nfrozen]))
frozen[nfrozen++].offset = offnum;
else if (heap_tuple_needs_eventual_freeze(tuple.t_data))
all_frozen = false;

That's wrong, because a "true" return value from
heap_prepare_freeze_tuple() means only that it has done *some*
freezing work on the tuple, not that it's done all of the freezing
work that will ever need to be done.  So, if the tuple's xmin can be
frozen and is aborted but not older than vacuum_freeze_min_age, then
heap_prepare_freeze_tuple() won't free xmax, but the page will still
be marked all-frozen, which is bad.  I think it normally won't matter
because the xmax will probably be hinted invalid anyway, since we just
pruned the page which should have set hint bits everywhere, but if
those hint bits were lost then we'd eventually end up with an
accessible xmax pointing off into space.

My first thought was to just delete the "else" but that would be bad
because we'd fail to set all-frozen immediately in a lot of cases
where we should.  This needs a bit more thought than I have time to
give it right now.

(I will update on the status of this open item again no later than
Monday; probably sooner.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-10 Thread Andres Freund
On 2016-06-09 23:39:24 -0700, Andres Freund wrote:
> On 2016-06-10 11:58:26 +0530, Amit Kapila wrote:
> > I have tried in multiple ways by running pgbench with read-write tests, but
> > could not see any such behaviour.
> 
> It took over an hour of pgbench on a fast laptop till I saw it.
> 
> 
> > I have tried by even crashing and
> > restarting the server and then again running pgbench.  Do you see these
> > records on master or slave?
> 
> Master, but with an existing standby. So it could be related to
> hot_standby_feedback or such.

I just managed to trigger it again.


#1  0x7fa1a73778da in __GI_abort () at abort.c:89
#2  0x7f9f1395e59c in record_corrupt_item (items=items@entry=0x2137be0, 
tid=0x7f9fb8681c0c)
at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:612
#3  0x7f9f1395ead5 in collect_corrupt_items (relid=relid@entry=29449, 
all_visible=all_visible@entry=0 '\000', all_frozen=all_frozen@entry=1 '\001')
at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:572
#4  0x7f9f1395f476 in pg_check_frozen (fcinfo=0x7ffe5343a200) at 
/home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:292
#5  0x005fdbec in ExecMakeTableFunctionResult (funcexpr=0x2168630, 
econtext=0x2168320, argContext=, expectedDesc=0x2168ef0, 
randomAccess=0 '\000') at 
/home/andres/src/postgresql/src/backend/executor/execQual.c:2211
#6  0x00616992 in FunctionNext (node=node@entry=0x2168210) at 
/home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:94
#7  0x005ffdcb in ExecScanFetch (recheckMtd=0x6166f0 , 
accessMtd=0x616700 , node=0x2168210)
at /home/andres/src/postgresql/src/backend/executor/execScan.c:95
#8  ExecScan (node=node@entry=0x2168210, accessMtd=accessMtd@entry=0x616700 
, recheckMtd=recheckMtd@entry=0x6166f0 )
at /home/andres/src/postgresql/src/backend/executor/execScan.c:145
#9  0x006169e4 in ExecFunctionScan (node=node@entry=0x2168210) at 
/home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:268

the error happened just after I restarted a standby, so it's not
unlikely to be related to hot_standby_feedback.


(gdb) p *tuple.t_data
$5 = {t_choice = {t_heap = {t_xmin = 9105470, t_xmax = 26049273, t_field3 = 
{t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 9105470, 
  datum_typmod = 26049273, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi 
= 1, bi_lo = 19765}, ip_posid = 3}, t_infomask2 = 4, t_infomask = 770, 
  t_hoff = 24 '\030', t_bits = 0x7f9fb8681c17 ""}

Infomask is:
#define HEAP_XMIN_COMMITTED 0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID   0x0200  /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN(HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_HASVARWIDTH0x0002  /* has variable-width 
attribute(s) */

This indeed looks borked.  Such a tuple should never survive
if (check_frozen && !VM_ALL_FROZEN(rel, blkno, ))
check_frozen = false;
especially not when
(gdb) p PageIsAllVisible(page)
$3 = 4

(fwiw, checking PD_ALL_VISIBLE in those functions sounds like a good plan)


I've got another earlier case (that I somehow missed seeing), below
check_visible:

(gdb) p *tuple->t_data 
$2 = {t_choice = {t_heap = {t_xmin = 13616549, t_xmax = 25210801, t_field3 = 
{t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 13616549, 
  datum_typmod = 25210801, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi 
= 0, bi_lo = 52320}, ip_posid = 67}, t_infomask2 = 32772, t_infomask = 8962, 
  t_hoff = 24 '\030', t_bits = 0x7f9fda2f8717 ""}

infomask is:
#define HEAP_UPDATED0x2000  /* this is UPDATEd version of 
row */
#define HEAP_XMIN_COMMITTED 0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID   0x0200  /* t_xmin invalid/aborted */
#define HEAP_HASVARWIDTH0x0002  /* has variable-width 
attribute(s) */
infomask2 is:
#define HEAP_ONLY_TUPLE 0x8000  /* this is heap-only tuple */

I'll run again, with a debugger attached, maybe I can get some more
information.


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-10 Thread Amit Kapila
On Fri, Jun 10, 2016 at 12:09 PM, Andres Freund  wrote:
>
> On 2016-06-10 11:58:26 +0530, Amit Kapila wrote:
>
>
> > While looking at code in this area, I observed that during replay of
> > records (heap_xlog_delete), we first clear the vm, then update the page.
> > So we don't have Buffer lock while updating the vm where as in the patch
> > (collect_corrupt_items()), we are relying on the fact that for clearing
vm
> > bit one needs to acquire buffer lock.  Can that cause a problem?
>
> Unsetting a vm bit is always safe, right?
>

I think so, which means this should not be a problem area.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-10 Thread Masahiko Sawada
On Fri, Jun 10, 2016 at 1:11 AM, Robert Haas  wrote:
> On Thu, Jun 9, 2016 at 5:48 AM, Amit Kapila  wrote:
>> Attached patch implements the above 2 functions.  I have addressed the
>> comments by Sawada San and you in latest patch and updated the documentation
>> as well.
>
> I made a number of changes to this patch.  Here is the new version.
>
> 1. The algorithm you were using for growing the array size is unsafe
> and can easily overrun the array.  Suppose that each of the first two
> pages have some corrupt tuples, more than 50% of MaxHeapTuplesPerPage
> but less than the full value of MaxTuplesPerPage.  Your code will
> conclude that the array does need to be enlarged after processing the
> first page.  I switched this to what I consider the normal coding
> pattern for such problems.
>
> 2. The all-visible checks seemed to me to be incorrect and incomplete.
> I made the check match the logic in lazy_scan_heap.
>
> 3. Your 1.0 -> 1.1 upgrade script was missing copies of the REVOKE
> statements you added to the 1.1 script.  I added them.
>
> 4. The tests as written were not safe under concurrency; they could
> return spurious results if the page changed between the time you
> checked the visibility map and the time you actually examined the
> tuples.  I think people will try running these functions on live
> systems, so I changed the code to recheck the VM bits after locking
> the page.  Unfortunately, there's either still a concurrency-related
> problem here or there's a bug in the all-frozen code itself because I
> once managed to get pg_check_frozen('pgbench_accounts') to return a
> TID while pgbench was running concurrently.  That's a bit alarming,
> but since I can't reproduce it I don't really have a clue how to track
> down the problem.
>
> 5. I made various cosmetic improvements.
>
> If there are not objections, I will go ahead and commit this tomorrow,
> because even if there is a bug (see point #4 above) I think it's
> better to have this in the tree than not.  However, code review and/or
> testing with these new functions seems like it would be an extremely
> good idea.
>

Thank you for working on this.
Here are some minor comments.

---
+/*
+ * Return the TIDs of not-all-visible tuples in pages marked all-visible

If there is even one non-visible tuple in pages marked all-visible,
the database might be corrupted.
Is it better "not-visible" or "non-visible" instead of "not-all-visible"?
---
Do we need to check page header flag?
I think that database also might be corrupt in case where there is
non-visible tuple in page set PD_ALL_VISIBLE.
We could emit the WARNING log in such case.

Also, using attached tool which allows us to set spurious visibility
map status without actual modifying the tuple , I manually made the
some situations where database is corrupted and tested it, but ISTM
that this tool works fine.
It doesn't mean proposing as a new feature of course, but please use
it as appropriate.

Regards,

--
Masahiko Sawada


corrupted_test.sql
Description: Binary data


set_spurious_vm_status.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-10 Thread Amit Kapila
On Thu, Jun 9, 2016 at 9:41 PM, Robert Haas  wrote:
>
>
>
> 2. The all-visible checks seemed to me to be incorrect and incomplete.
> I made the check match the logic in lazy_scan_heap.
>

Okay, I thought we just want to check for dead-tuples.  If we want the
logic similar to lazy_scan_heap(), then I think we should also consider
applying snapshot old threshold limit to oldestxmin.   We currently do that
in vacuum_set_xid_limits() for Vacuum.  Is there a reason for not
considering it for visibility check function?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-10 Thread Andres Freund
On 2016-06-10 11:58:26 +0530, Amit Kapila wrote:
> I have tried in multiple ways by running pgbench with read-write tests, but
> could not see any such behaviour.

It took over an hour of pgbench on a fast laptop till I saw it.


> I have tried by even crashing and
> restarting the server and then again running pgbench.  Do you see these
> records on master or slave?

Master, but with an existing standby. So it could be related to
hot_standby_feedback or such.


> While looking at code in this area, I observed that during replay of
> records (heap_xlog_delete), we first clear the vm, then update the page.
> So we don't have Buffer lock while updating the vm where as in the patch
> (collect_corrupt_items()), we are relying on the fact that for clearing vm
> bit one needs to acquire buffer lock.  Can that cause a problem?

Unsetting a vm bit is always safe, right?  The invariant is that the VM
may never falsely say all_visible/frozen, but it's perfectly ok for a
page to be all_visible/frozen, without the VM bit being present.


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-10 Thread Amit Kapila
On Fri, Jun 10, 2016 at 8:27 AM, Andres Freund  wrote:

>
>
> On June 9, 2016 7:46:06 PM PDT, Amit Kapila 
> wrote:
> >On Fri, Jun 10, 2016 at 8:08 AM, Andres Freund 
> >wrote:
> >
> >> On 2016-06-09 19:33:52 -0700, Andres Freund wrote:
> >> > I played with it for a while, and besides
> >> > finding intentionally caused corruption, it didn't flag anything
> >> > (besides crashing on a standby, as in 2)).
> >>
> >> Ugh. Just sends after I sent that email:
> >>
> >>oid|t_ctid
> >> --+--
> >>  pgbench_accounts | (889641,33)
> >>  pgbench_accounts | (893854,56)
> >>  pgbench_accounts | (924226,13)
> >>  pgbench_accounts | (1073457,51)
> >>  pgbench_accounts | (1084904,16)
> >>  pgbench_accounts | (996,26)
> >> (6 rows)
> >>
> >>  oid | t_ctid
> >> -+
> >> (0 rows)
> >>
> >>oid|t_ctid
> >> --+--
> >>  pgbench_accounts | (739198,13)
> >>  pgbench_accounts | (887254,11)
> >>  pgbench_accounts | (1050391,6)
> >>  pgbench_accounts | (1158640,46)
> >>  pgbench_accounts | (1238067,18)
> >>  pgbench_accounts | (1273282,22)
> >>  pgbench_accounts | (1355816,54)
> >>  pgbench_accounts | (1361880,33)
> >> (8 rows)
> >>
> >>
> >Is this output of pg_check_visible()  or pg_check_frozen()?
>
> Unfortunately I don't know. I was running a union of both, I didn't really
> expect to hit an issue... I guess I'll put a PANIC in the relevant places
> and check whether I cab reproduce.
>
>

I have tried in multiple ways by running pgbench with read-write tests, but
could not see any such behaviour.  I have tried by even crashing and
restarting the server and then again running pgbench.  Do you see these
records on master or slave?

While looking at code in this area, I observed that during replay of
records (heap_xlog_delete), we first clear the vm, then update the page.
So we don't have Buffer lock while updating the vm where as in the patch
(collect_corrupt_items()), we are relying on the fact that for clearing vm
bit one needs to acquire buffer lock.  Can that cause a problem?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-09 Thread Andres Freund


On June 9, 2016 7:46:06 PM PDT, Amit Kapila  wrote:
>On Fri, Jun 10, 2016 at 8:08 AM, Andres Freund 
>wrote:
>
>> On 2016-06-09 19:33:52 -0700, Andres Freund wrote:
>> > I played with it for a while, and besides
>> > finding intentionally caused corruption, it didn't flag anything
>> > (besides crashing on a standby, as in 2)).
>>
>> Ugh. Just sends after I sent that email:
>>
>>oid|t_ctid
>> --+--
>>  pgbench_accounts | (889641,33)
>>  pgbench_accounts | (893854,56)
>>  pgbench_accounts | (924226,13)
>>  pgbench_accounts | (1073457,51)
>>  pgbench_accounts | (1084904,16)
>>  pgbench_accounts | (996,26)
>> (6 rows)
>>
>>  oid | t_ctid
>> -+
>> (0 rows)
>>
>>oid|t_ctid
>> --+--
>>  pgbench_accounts | (739198,13)
>>  pgbench_accounts | (887254,11)
>>  pgbench_accounts | (1050391,6)
>>  pgbench_accounts | (1158640,46)
>>  pgbench_accounts | (1238067,18)
>>  pgbench_accounts | (1273282,22)
>>  pgbench_accounts | (1355816,54)
>>  pgbench_accounts | (1361880,33)
>> (8 rows)
>>
>>
>Is this output of pg_check_visible()  or pg_check_frozen()?

Unfortunately I don't know. I was running a union of both, I didn't really 
expect to hit an issue... I guess I'll put a PANIC in the relevant places and 
check whether I cab reproduce. 

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-09 Thread Amit Kapila
On Fri, Jun 10, 2016 at 8:08 AM, Andres Freund  wrote:

> On 2016-06-09 19:33:52 -0700, Andres Freund wrote:
> > I played with it for a while, and besides
> > finding intentionally caused corruption, it didn't flag anything
> > (besides crashing on a standby, as in 2)).
>
> Ugh. Just sends after I sent that email:
>
>oid|t_ctid
> --+--
>  pgbench_accounts | (889641,33)
>  pgbench_accounts | (893854,56)
>  pgbench_accounts | (924226,13)
>  pgbench_accounts | (1073457,51)
>  pgbench_accounts | (1084904,16)
>  pgbench_accounts | (996,26)
> (6 rows)
>
>  oid | t_ctid
> -+
> (0 rows)
>
>oid|t_ctid
> --+--
>  pgbench_accounts | (739198,13)
>  pgbench_accounts | (887254,11)
>  pgbench_accounts | (1050391,6)
>  pgbench_accounts | (1158640,46)
>  pgbench_accounts | (1238067,18)
>  pgbench_accounts | (1273282,22)
>  pgbench_accounts | (1355816,54)
>  pgbench_accounts | (1361880,33)
> (8 rows)
>
>
Is this output of pg_check_visible()  or pg_check_frozen()?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-09 Thread Andres Freund
On 2016-06-09 19:33:52 -0700, Andres Freund wrote:
> I played with it for a while, and besides
> finding intentionally caused corruption, it didn't flag anything
> (besides crashing on a standby, as in 2)).

Ugh. Just sends after I sent that email:

   oid|t_ctid
--+--
 pgbench_accounts | (889641,33)
 pgbench_accounts | (893854,56)
 pgbench_accounts | (924226,13)
 pgbench_accounts | (1073457,51)
 pgbench_accounts | (1084904,16)
 pgbench_accounts | (996,26)
(6 rows)

 oid | t_ctid 
-+
(0 rows)

   oid|t_ctid
--+--
 pgbench_accounts | (739198,13)
 pgbench_accounts | (887254,11)
 pgbench_accounts | (1050391,6)
 pgbench_accounts | (1158640,46)
 pgbench_accounts | (1238067,18)
 pgbench_accounts | (1273282,22)
 pgbench_accounts | (1355816,54)
 pgbench_accounts | (1361880,33)
(8 rows)

 oid | t_ctid 
-+
(0 rows)

Seems to be correlated with a concurrent vacuum, but it's hard to tell,
because I didn't have psql output a timestamp.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-09 Thread Andres Freund
Hi,


I found two, relatively minor, issues.

1) I think we should perform a relkind check in
   collect_corrupt_items(). Atm we'll "gladly" run against an index. If
   we actually entered the main portion of the loop in
   collect_corrupt_items(), that could end up corrupting the table (via
   HeapTupleSatisfiesVacuum()). But it's probably safe, because the vm
   fork doesn't exist for anything but heap/toast relations.

2) GetOldestXmin() currently specifies a relation, which can cause
   trouble in recovery:

/*
 * If we're not computing a relation specific limit, or if a shared
 * relation has been passed in, backends in all databases have to be
 * considered.
 */
allDbs = rel == NULL || rel->rd_rel->relisshared;

/* Cannot look for individual databases during recovery */
Assert(allDbs || !RecoveryInProgress());

  I think that needs to be fixed.

3) Harmless here, but I think it's bad policy to release locks
   on normal relations before the end of xact.
+   relation_close(rel, AccessShareLock);
+
  

   i.e. we'll Assert out.

4) 
+   if (check_visible)
+   {
+   HTSV_Result state;
+
+   state = HeapTupleSatisfiesVacuum(, 
OldestXmin, buffer);
+   if (state != HEAPTUPLE_LIVE ||
+   
!HeapTupleHeaderXminCommitted(tuple.t_data))
+   record_corrupt_item(items, 
_data->t_ctid);
+   else

This theoretically could give false positives, if GetOldestXmin() went
backwards. But I think that's ok.

5) There's a bunch of whitespace damage in the diff, like
Oid relid = PG_GETARG_OID(0);
-   MemoryContext   oldcontext;
+   MemoryContext oldcontext;


Otherwise this looks good. I played with it for a while, and besides
finding intentionally caused corruption, it didn't flag anything
(besides crashing on a standby, as in 2)).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-09 Thread Andres Freund
Hi Robert, Amit,

thanks for working on this.


On 2016-06-09 12:11:15 -0400, Robert Haas wrote:
> 4. The tests as written were not safe under concurrency; they could
> return spurious results if the page changed between the time you
> checked the visibility map and the time you actually examined the
> tuples.  I think people will try running these functions on live
> systems, so I changed the code to recheck the VM bits after locking
> the page.  Unfortunately, there's either still a concurrency-related
> problem here or there's a bug in the all-frozen code itself because I
> once managed to get pg_check_frozen('pgbench_accounts') to return a
> TID while pgbench was running concurrently.  That's a bit alarming,
> but since I can't reproduce it I don't really have a clue how to track
> down the problem.

Ugh, that's a bit concerning.


> If there are not objections, I will go ahead and commit this tomorrow,
> because even if there is a bug (see point #4 above) I think it's
> better to have this in the tree than not.  However, code review and/or
> testing with these new functions seems like it would be an extremely
> good idea.

I'll try to spend some time on that today (code review & testing).


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-09 Thread Robert Haas
On Thu, Jun 9, 2016 at 5:48 AM, Amit Kapila  wrote:
> Attached patch implements the above 2 functions.  I have addressed the
> comments by Sawada San and you in latest patch and updated the documentation
> as well.

I made a number of changes to this patch.  Here is the new version.

1. The algorithm you were using for growing the array size is unsafe
and can easily overrun the array.  Suppose that each of the first two
pages have some corrupt tuples, more than 50% of MaxHeapTuplesPerPage
but less than the full value of MaxTuplesPerPage.  Your code will
conclude that the array does need to be enlarged after processing the
first page.  I switched this to what I consider the normal coding
pattern for such problems.

2. The all-visible checks seemed to me to be incorrect and incomplete.
I made the check match the logic in lazy_scan_heap.

3. Your 1.0 -> 1.1 upgrade script was missing copies of the REVOKE
statements you added to the 1.1 script.  I added them.

4. The tests as written were not safe under concurrency; they could
return spurious results if the page changed between the time you
checked the visibility map and the time you actually examined the
tuples.  I think people will try running these functions on live
systems, so I changed the code to recheck the VM bits after locking
the page.  Unfortunately, there's either still a concurrency-related
problem here or there's a bug in the all-frozen code itself because I
once managed to get pg_check_frozen('pgbench_accounts') to return a
TID while pgbench was running concurrently.  That's a bit alarming,
but since I can't reproduce it I don't really have a clue how to track
down the problem.

5. I made various cosmetic improvements.

If there are not objections, I will go ahead and commit this tomorrow,
because even if there is a bug (see point #4 above) I think it's
better to have this in the tree than not.  However, code review and/or
testing with these new functions seems like it would be an extremely
good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


check-visibility-v3.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-09 Thread Amit Kapila
On Thu, Jun 9, 2016 at 8:48 AM, Amit Kapila  wrote:
>
> On Wed, Jun 8, 2016 at 6:31 PM, Robert Haas  wrote:
> >
> >
> > Here's my proposal:
> >
> > 1. You already implemented a function to find non-frozen tuples on
> > supposedly all-frozen pages.  Great.
> >
> > 2. Let's implement a second function to find dead tuples on supposedly
> > all-visible pages.
> >
>
> I am planning to name them as pg_check_frozen and pg_check_visible, let
me know if you something else suits better?
>

Attached patch implements the above 2 functions.  I have addressed the
comments by Sawada San and you in latest patch and updated the
documentation as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pg_check_visibility_func_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-08 Thread Amit Kapila
On Wed, Jun 8, 2016 at 6:31 PM, Robert Haas  wrote:
>
>
> Here's my proposal:
>
> 1. You already implemented a function to find non-frozen tuples on
> supposedly all-frozen pages.  Great.
>
> 2. Let's implement a second function to find dead tuples on supposedly
> all-visible pages.
>

I am planning to name them as pg_check_frozen and pg_check_visible, let me
know if you something else suits better?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-08 Thread Amit Kapila
On Wed, Jun 8, 2016 at 6:31 PM, Robert Haas  wrote:
>
> On Wed, Jun 8, 2016 at 4:01 AM, Amit Kapila 
wrote:
> > If we want to address both page level and tuple level inconsistencies, I
> > could see below possibility.
> >
> > 1. An API that returns setof records containing a block that have
> > inconsistent vm bit, a block where visible page contains dead tuples
and a
> > block where vm bit indicates frozen, but page contains non-frozen
tuples.
> > Three separate block numbers are required in record to distinguish the
> > problem with block.
> >
> > Signature of API will be something like:
> > pg_check_visibility_blocks(regclass, corrupt_vm_blkno OUT bigint,
> > corrupt_dead_blkno OUT bigint, corrupt_frozen_blkno OUT boolean) RETURNS
> > SETOF record
>
> I don't understand this,

This new API was to address Andres's concern of checking block level
inconsistency as we do in lazy_scan_heap.  It returns set of inconsistent
blocks.

>
>   The function that just returned non-frozen TIDs on
> supposedly-frozen pages was simple.  Now we're trying to redesign this
> into a general-purpose integrity checker on the eve of beta2, and I
> think that's a bad idea.  We don't have time to figure that out, get
> consensus on it, and do it well, and I don't want to be stuck
> supporting something half-baked from now until eternity.  Let's scale
> back our goals here to something that can realistically be done well
> in the time available.
>
> Here's my proposal:
>
> 1. You already implemented a function to find non-frozen tuples on
> supposedly all-frozen pages.  Great.
>
> 2. Let's implement a second function to find dead tuples on supposedly
> all-visible pages.
>
> 3. And then let's call it good.
>

Your proposal sounds good, will send an updated patch, if there are no
further concerns.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-08 Thread Robert Haas
On Wed, Jun 8, 2016 at 4:01 AM, Amit Kapila  wrote:
> If we want to address both page level and tuple level inconsistencies, I
> could see below possibility.
>
> 1. An API that returns setof records containing a block that have
> inconsistent vm bit, a block where visible page contains dead tuples and a
> block where vm bit indicates frozen, but page contains non-frozen tuples.
> Three separate block numbers are required in record to distinguish the
> problem with block.
>
> Signature of API will be something like:
> pg_check_visibility_blocks(regclass, corrupt_vm_blkno OUT bigint,
> corrupt_dead_blkno OUT bigint, corrupt_frozen_blkno OUT boolean) RETURNS
> SETOF record

I don't understand this, and I think we're making this too
complicated.  The function that just returned non-frozen TIDs on
supposedly-frozen pages was simple.  Now we're trying to redesign this
into a general-purpose integrity checker on the eve of beta2, and I
think that's a bad idea.  We don't have time to figure that out, get
consensus on it, and do it well, and I don't want to be stuck
supporting something half-baked from now until eternity.  Let's scale
back our goals here to something that can realistically be done well
in the time available.

Here's my proposal:

1. You already implemented a function to find non-frozen tuples on
supposedly all-frozen pages.  Great.

2. Let's implement a second function to find dead tuples on supposedly
all-visible pages.

3. And then let's call it good.

If we start getting into the game of "well, that's not enough because
you can also check for X", that's an infinite treadmill.  There will
always be more things we can check.  But that's the project of
building an integrity checker, which while worthwhile, is out of scope
for 9.6.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-08 Thread Amit Kapila
On Wed, Jun 8, 2016 at 11:39 AM, Andres Freund  wrote:
>
> On 2016-06-08 10:04:56 +0530, Amit Kapila wrote:
> > On Tue, Jun 7, 2016 at 11:01 PM, Andres Freund 
wrote:>
> > > I think if we go with the pg_check_visibility approach, we should also
> > > copy the other consistency checks from vacuumlazy.c, given they can't
> > > easily be triggered.
> >
> > Are you referring to checks that are done in lazy_scan_heap() for each
> > block?
>
> Yes.
>
>
> > I think the meaning full checks in this context could be (a) page
> > is marked as visible, but corresponding vm is not marked. (b) page is
> > marked as all visible and has dead tuples. (c) vm bit indicates frozen,
but
> > page contains non-frozen tuples.
>
> Yes.
>

If we want to address both page level and tuple level inconsistencies, I
could see below possibility.

1. An API that returns setof records containing a block that have
inconsistent vm bit, a block where visible page contains dead tuples and a
block where vm bit indicates frozen, but page contains non-frozen tuples.
Three separate block numbers are required in record to distinguish the
problem with block.

Signature of API will be something like:
pg_check_visibility_blocks(regclass, corrupt_vm_blkno OUT bigint,
corrupt_dead_blkno OUT bigint, corrupt_frozen_blkno OUT boolean) RETURNS
SETOF record

2. An API that provides information of non-frozen tuples on a frozen page

Signature of API:
CREATE FUNCTION pg_check_visibility_tuples(regclass, t_ctid OUT tid)
RETURNS SETOF tid

This is same as what is present in current patch [1].
In this, user can use first API to find corrupt blocks if any and if
further information is required, second API can be used.

Does that address your concern?  If you, Robert and others are okay with
above idea, then I will send an update patch.



[1] -
https://www.postgresql.org/message-id/CAA4eK1JHz%3DOB4Ya%2B_1dMRqgxrKCt4LxiSyukgm3ZzuxF2ONqGA%40mail.gmail.com


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-08 Thread Masahiko Sawada
On Wed, Jun 8, 2016 at 12:19 PM, Amit Kapila  wrote:
> On Tue, Jun 7, 2016 at 10:10 PM, Masahiko Sawada 
> wrote:
>>
>> Thank you for implementing the patch.
>>
>> I've not test it deeply but here are some comments.
>> This check tool only checks if the frozen page has live-unfrozen tuple.
>> That is, it doesn't care in case where the all-frozen page mistakenly
>> has dead-frozen tuple.
>>
>
> Do you mean to say that we should have a check for ItemIdIsDead() and then
> if item is found to be dead, then add it to array of non_frozen items?

Yes.

>  If so, earlier I thought we might not need this check as we are already using
> heap_tuple_needs_eventual_freeze(),

You're right. Sorry, I had misunderstood.

> but now again looking at it, it seems
> wise to check for dead items separately as those won't be covered by other
> check.

Sounds good.

>>
>> +   /* Clean up. */
>> +   if (vmbuffer != InvalidBuffer)
>> +   ReleaseBuffer(vmbuffer);
>>
>> I think that we should use BufferIsValid() here.
>>
>
> We can use BufferIsValid() as well, but I am trying to be consistent with
> nearby code, refer collect_visibility_data().  We can change at all places
> together if people prefer that way.
>

In vacuumlazy.c we use it like BufferisValid(vmbuffer), so I think we
can replace all these thing to be more safety if there is not specific
reason.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-08 Thread Andres Freund
On 2016-06-08 10:04:56 +0530, Amit Kapila wrote:
> On Tue, Jun 7, 2016 at 11:01 PM, Andres Freund  wrote:>
> > I think if we go with the pg_check_visibility approach, we should also
> > copy the other consistency checks from vacuumlazy.c, given they can't
> > easily be triggered.
> 
> Are you referring to checks that are done in lazy_scan_heap() for each
> block?

Yes.


> I think the meaning full checks in this context could be (a) page
> is marked as visible, but corresponding vm is not marked. (b) page is
> marked as all visible and has dead tuples. (c) vm bit indicates frozen, but
> page contains non-frozen tuples.

Yes.


> I think right now the design of pg_visibility is such that it returns the
> required information at page level to user by means of various functions
> like pg_visibility, pg_visibility_map, etc.  If we want to add page level
> checks in this new routine as well, then we have to think what should be
> the output if such checks fails, shall we issue warning, shall we return
> information in some other way.

Right.


> Also, I think there will be some duplicity
> with the already provided information via other functions of this module.

Don't think that's a problem. One part of the functionality then is
returning the available information, the other is checking for problems
and only returning problematic blocks.


> > Wonder how we can report both block and tuple
> > level issues. Kinda inclined to report everything as a block level
> > issue?
> >
> 
> The way currently this module provides information, it seems better to have
> separate API's for block and tuple level inconsistency.  For block level, I
> think most of the information can be retrieved by existing API's and for
> tuple level, this new API can be used.

I personally think simplicity is more important than detail here; but
it's not that important.  If this reports a problem, you can look into
the nitty gritty using existing functions.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-07 Thread Amit Kapila
On Tue, Jun 7, 2016 at 11:01 PM, Andres Freund  wrote:>
> I think if we go with the pg_check_visibility approach, we should also
> copy the other consistency checks from vacuumlazy.c, given they can't
> easily be triggered.

Are you referring to checks that are done in lazy_scan_heap() for each
block?  I think the meaning full checks in this context could be (a) page
is marked as visible, but corresponding vm is not marked. (b) page is
marked as all visible and has dead tuples. (c) vm bit indicates frozen, but
page contains non-frozen tuples.

I think right now the design of pg_visibility is such that it returns the
required information at page level to user by means of various functions
like pg_visibility, pg_visibility_map, etc.  If we want to add page level
checks in this new routine as well, then we have to think what should be
the output if such checks fails, shall we issue warning, shall we return
information in some other way.  Also, I think there will be some duplicity
with the already provided information via other functions of this module.

>
> Wonder how we can report both block and tuple
> level issues. Kinda inclined to report everything as a block level
> issue?
>

The way currently this module provides information, it seems better to have
separate API's for block and tuple level inconsistency.  For block level, I
think most of the information can be retrieved by existing API's and for
tuple level, this new API can be used.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-07 Thread Amit Kapila
On Wed, Jun 8, 2016 at 8:37 AM, Robert Haas  wrote:
>
> On Tue, Jun 7, 2016 at 10:19 AM, Amit Kapila 
wrote:
> > I have implemented the above function in attached patch.  Currently, it
> > returns SETOF tupleids, but if we want some variant of same, that should
> > also be possible.
>
> I think we'd want to bump the pg_visibility version to 1.1 and do the
> upgrade dance, since the existing thing was in beta1.
>

Okay, will do it in next version of patch.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-07 Thread Amit Kapila
On Tue, Jun 7, 2016 at 10:10 PM, Masahiko Sawada 
wrote:
>
> Thank you for implementing the patch.
>
> I've not test it deeply but here are some comments.
> This check tool only checks if the frozen page has live-unfrozen tuple.
> That is, it doesn't care in case where the all-frozen page mistakenly
> has dead-frozen tuple.
>

Do you mean to say that we should have a check for ItemIdIsDead() and then
if item is found to be dead, then add it to array of non_frozen items?  If
so, earlier I thought we might not need this check as we are already using
heap_tuple_needs_eventual_freeze(), but now again looking at it, it seems
wise to check for dead items separately as those won't be covered by other
check.

>
> +   /* Clean up. */
> +   if (vmbuffer != InvalidBuffer)
> +   ReleaseBuffer(vmbuffer);
>
> I think that we should use BufferIsValid() here.
>

We can use BufferIsValid() as well, but I am trying to be consistent with
nearby code, refer collect_visibility_data().  We can change at all places
together if people prefer that way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reviewing freeze map code

2016-06-07 Thread Robert Haas
On Tue, Jun 7, 2016 at 10:19 AM, Amit Kapila  wrote:
> I have implemented the above function in attached patch.  Currently, it
> returns SETOF tupleids, but if we want some variant of same, that should
> also be possible.

I think we'd want to bump the pg_visibility version to 1.1 and do the
upgrade dance, since the existing thing was in beta1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-07 Thread Jim Nasby

On 6/6/16 3:57 PM, Peter Geoghegan wrote:

On Mon, Jun 6, 2016 at 11:35 AM, Andres Freund  wrote:

We need a read-only utility which checks that the system is in a correct
and valid state.  There are a few of those which have been built for
different pieces, I believe, and we really should have one for the
visibility map, but I don't think it makes sense to imply in any way
that VACUUM can or should be used for that.


Meh. This is vacuum behaviour that *has existed* up to this point. You
essentially removed it. Sure, I'm all for adding a verification
tool. But that's just pie in the skie at this point.  We have a complex,
data loss threatening feature, which just about nobody can verify at
this point. That's crazy.


FWIW, I agree with the general sentiment. Building a stress-testing
suite would have been a good idea. In general, testability is a design
goal that I'd be willing to give up other things for.


Related to that, I suspect it would be helpful if it was possible to 
test boundary cases in this kind of critical code by separating the 
logic from the underlying implementation. It becomes very hard to verify 
the system does the right thing in some of these scenarios, because it's 
so difficult to put the system into that state to begin with. Stuff that 
depends on burning through a large number of XIDs is an example of that. 
(To be clear, I'm talking about unit-test kind of stuff here, not 
validating an existing system.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-07 Thread Andres Freund
On 2016-06-07 19:49:59 +0530, Amit Kapila wrote:
> On Tue, Jun 7, 2016 at 2:52 AM, Robert Haas  wrote:
> >
> > On Mon, Jun 6, 2016 at 5:06 PM, Andres Freund  wrote:
> >
> >
> > > I'd also be ok with adding & documenting (beta release notes)
> > > CREATE EXTENSION pg_visibility;
> > > SELECT relname FROM pg_class WHERE relkind IN ('r', 'm') AND NOT
> pg_check_visibility(oid);
> > > or something olong those lines.
> >
> > That wouldn't be too useful as-written in my book, because it gives
> > you no detail on what exactly the problem was.  Maybe it could be
> > "pg_check_visibility(regclass) RETURNS SETOF tid", where the returned
> > TIDs are non-frozen TIDs on frozen pages.  Then I think something like
> > this would work:
> >
> > SELECT c.oid, pg_check_visibility(c.oid) FROM pg_class WHERE relkind
> > IN ('r', 't', 'm');
> >
> 
> I have implemented the above function in attached patch.  Currently, it
> returns SETOF tupleids, but if we want some variant of same, that should
> also be possible.

Cool!

I think if we go with the pg_check_visibility approach, we should also
copy the other consistency checks from vacuumlazy.c, given they can't
easily be triggered.  Wonder how we can report both block and tuple
level issues. Kinda inclined to report everything as a block level
issue?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-07 Thread Masahiko Sawada
On Tue, Jun 7, 2016 at 11:19 PM, Amit Kapila  wrote:
> On Tue, Jun 7, 2016 at 2:52 AM, Robert Haas  wrote:
>>
>> On Mon, Jun 6, 2016 at 5:06 PM, Andres Freund  wrote:
>>
>>
>> > I'd also be ok with adding & documenting (beta release notes)
>> > CREATE EXTENSION pg_visibility;
>> > SELECT relname FROM pg_class WHERE relkind IN ('r', 'm') AND NOT
>> > pg_check_visibility(oid);
>> > or something olong those lines.
>>
>> That wouldn't be too useful as-written in my book, because it gives
>> you no detail on what exactly the problem was.  Maybe it could be
>> "pg_check_visibility(regclass) RETURNS SETOF tid", where the returned
>> TIDs are non-frozen TIDs on frozen pages.  Then I think something like
>> this would work:
>>
>> SELECT c.oid, pg_check_visibility(c.oid) FROM pg_class WHERE relkind
>> IN ('r', 't', 'm');
>>
>
> I have implemented the above function in attached patch.  Currently, it
> returns SETOF tupleids, but if we want some variant of same, that should
> also be possible.
>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

Thank you for implementing the patch.

I've not test it deeply but here are some comments.
This check tool only checks if the frozen page has live-unfrozen tuple.
That is, it doesn't care in case where the all-frozen page mistakenly
has dead-frozen tuple.
I think this tool should check such case, otherwise the function name
would need to be changed.

+   /* Clean up. */
+   if (vmbuffer != InvalidBuffer)
+   ReleaseBuffer(vmbuffer);

I think that we should use BufferIsValid() here.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-07 Thread Amit Kapila
On Tue, Jun 7, 2016 at 2:52 AM, Robert Haas  wrote:
>
> On Mon, Jun 6, 2016 at 5:06 PM, Andres Freund  wrote:
>
>
> > I'd also be ok with adding & documenting (beta release notes)
> > CREATE EXTENSION pg_visibility;
> > SELECT relname FROM pg_class WHERE relkind IN ('r', 'm') AND NOT
pg_check_visibility(oid);
> > or something olong those lines.
>
> That wouldn't be too useful as-written in my book, because it gives
> you no detail on what exactly the problem was.  Maybe it could be
> "pg_check_visibility(regclass) RETURNS SETOF tid", where the returned
> TIDs are non-frozen TIDs on frozen pages.  Then I think something like
> this would work:
>
> SELECT c.oid, pg_check_visibility(c.oid) FROM pg_class WHERE relkind
> IN ('r', 't', 'm');
>

I have implemented the above function in attached patch.  Currently, it
returns SETOF tupleids, but if we want some variant of same, that should
also be possible.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pg_check_visibility_func_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Andres Freund
Hi,

On 2016-06-06 17:22:38 -0400, Robert Haas wrote:
> > I'd also be ok with adding & documenting (beta release notes)
> > CREATE EXTENSION pg_visibility;
> > SELECT relname FROM pg_class WHERE relkind IN ('r', 'm') AND NOT 
> > pg_check_visibility(oid);
> > or something olong those lines.
> 
> That wouldn't be too useful as-written in my book, because it gives
> you no detail on what exactly the problem was.

True. I don't think that's a big issue though, because we'd likely want
a lot more detail after a report anyway; to analyze things properly.

> Maybe it could be
> "pg_check_visibility(regclass) RETURNS SETOF tid", where the returned
> TIDs are non-frozen TIDs on frozen pages.  Then I think something like
> this would work:
> 
> SELECT c.oid, pg_check_visibility(c.oid) FROM pg_class WHERE relkind
> IN ('r', 't', 'm');
> 
> If you get any rows back, you've got trouble.

That'd work too; with the slight danger of returning way too much data.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Andres Freund
Hi,


On 2016-06-06 17:00:19 -0400, Robert Haas wrote:
> 1. I think it is pretty misleading to say that those checks aren't
> reachable any more.  It's not like we freeze every page when we mark
> it all-visible.

True. What I mean is that you can't force the checks (and some that I
think should be added) to occur anymore. Once a page is frozen it'll be
kinda hard to predict whether vacuum touches it (due to the skip logic).


> 2. With the new pg_visibility extension, you can actually check the
> same thing that first warning checks like this:
> 
> select * from pg_visibility('t1'::regclass) where all_visible and not
> pd_all_visible;

Right, but not the second.


> IMHO, that's a substantial improvement over running VACUUM and
> checking whether it spits out a WARNING.

I think it's a mixed bag. I do think that WARNINGS are a lot easier to
understand for a casual user/tester; rather than having to write/copy
queries which return results where you don't know what the expected
result is.  I agree that it's better to have that in a non-modifying way
- although I'm afraid atm it's not really possible to do a
HeapTupleSatisfies* without modifications :(.


> 3. If you think there are analogous checks that I should add for the
> frozen case, or that you want to add yourself, please say what they
> are specifically.  I *did* think about it when I wrote that code and I
> didn't see how to make it work.  If I had, I would have added them.
> The whole point of review here is, hopefully, to illuminate what
> should have been done differently - if I'd known how to do it better,
> I would have done so.  Provide an idea, or better yet, provide a
> patch.  If you see how to do it, coding it up shouldn't be the hard
> part.

I think it's pretty important (and not hard) to add a check for
(all_frozen_according_to_vm && has_unfrozen_tuples). Checking for
VM_ALL_FROZEN && !VM_ALL_VISIBLE looks worthwhile as well, especially as
we could check that always, without a measurable overhead.  But the
former primarily makes sense if we have a way to force the check to
occur in a way that's not dependant on the state of neighbouring pages.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 5:06 PM, Andres Freund  wrote:
> On 2016-06-06 16:41:19 -0400, Robert Haas wrote:
>> I really don't understand how you can not weigh in on the original
>> thread leading up to my mid-March commits and say "hey, this needs a
>> better testing tool", and then when you finally get around to
>> reviewing it in May, I'm supposed to drop everything and write one
>> immediately.
>
> Meh. Asking you to "drop everything" and starting to push a month later
> are very different things. The reason I'm pushing is because this atm
> seems likely to slip enough that we'll decide "can't do this for
> 9.6". And I think that'd be seriously bad.

To be clear, I'm not objecting to you pushing on this.  I just think
your tone sounds a bit, uh, antagonized.

>> Why do you get two months from the time of commit to weigh in but I
>> get no time to respond?
>
> Really? You've started to apply pressure to fix things days after
> they've been discovered. It's been a month.

Yes, it would have been nice if I had gotten to this one sooner.  But
it's not like you said "hey, hurry up" before I started working on it.
You waited until I did start working on it and *then* complained that
I didn't get to it sooner.  I cannot rewind time.

>> For my part, I thought I *had*
>> written a testing tool - that's what pg_visibility is and that's what
>> I used to test the feature before committing it.
>
> I think looking only at page level data, and not at row level data is is
> insufficient. And I think we need to make $tool output the data in a way
> that only returns data if things are wrong (that can be a pre-canned
> query).

OK.  I didn't think that was necessary, but it sure can't hurt.

>> I know there's a patch.  Both Tom and I are skeptical about whether it
>> adds value, and I really don't think you've spelled out in as much
>> detail why you think it will help as I have why I think it won't.
>
> The primary reason I think it'll help because it allows users/testers to
> run a simple one-line command (VACUUM (scan_all);)in their database, and
> they'll get a clear "WARNING: XXX is bad" message if something's broken,
> and nothing if things are ok.  Vacuum isn't a bad place for that,
> because it'll be the place that removes dead item pointers and such if
> things were wrongly labeled; and because we historically have emitted
> warnings from there.   The more complex stuff we ask testers to run, the
> less likely it is that they'll actually do that.

OK, now I understand.  Let's see if there is general agreement on this
and then we can decide how to proceed.  I think the main danger here
is that people will think that this option is more useful than it
really is and start using it in all kinds of cases where it isn't
really necessary in the hopes that it will fix problems it really
can't fix.  I think we need to write the documentation in such a way
as to be deeply discouraging to people who might otherwise be prone to
unwarranted optimism.  Otherwise, 5 years from now, we're going to be
fielding complaints from people who are unhappy that there's no way to
make autovacuum run with (even_frozen_pages true).

> I'd also be ok with adding & documenting (beta release notes)
> CREATE EXTENSION pg_visibility;
> SELECT relname FROM pg_class WHERE relkind IN ('r', 'm') AND NOT 
> pg_check_visibility(oid);
> or something olong those lines.

That wouldn't be too useful as-written in my book, because it gives
you no detail on what exactly the problem was.  Maybe it could be
"pg_check_visibility(regclass) RETURNS SETOF tid", where the returned
TIDs are non-frozen TIDs on frozen pages.  Then I think something like
this would work:

SELECT c.oid, pg_check_visibility(c.oid) FROM pg_class WHERE relkind
IN ('r', 't', 'm');

If you get any rows back, you've got trouble.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Andres Freund
On 2016-06-06 16:41:19 -0400, Robert Haas wrote:
> I really don't understand how you can not weigh in on the original
> thread leading up to my mid-March commits and say "hey, this needs a
> better testing tool", and then when you finally get around to
> reviewing it in May, I'm supposed to drop everything and write one
> immediately.

Meh. Asking you to "drop everything" and starting to push a month later
are very different things. The reason I'm pushing is because this atm
seems likely to slip enough that we'll decide "can't do this for
9.6". And I think that'd be seriously bad.


> Why do you get two months from the time of commit to weigh in but I
> get no time to respond?

Really? You've started to apply pressure to fix things days after
they've been discovered. It's been a month.


> For my part, I thought I *had*
> written a testing tool - that's what pg_visibility is and that's what
> I used to test the feature before committing it.

I think looking only at page level data, and not at row level data is is
insufficient. And I think we need to make $tool output the data in a way
that only returns data if things are wrong (that can be a pre-canned
query).


> Now, you think that's not good enough, and I respect your opinion, but
> it's not as if you said this back when this was being committed.  Or
> at least if you did, I don't remember it.

I think I mentioned testing ages ago, but not around the commit, no. I
kind of had assumed that it was there.  I don't think that's really
relevant though. Backend flushing was discussed and benchmarked over
months as well; and while I don't agree with your, conclusion it's
absolutely sane of you to push for changing the default on that; even if
you didn't immediately push back.


> I know there's a patch.  Both Tom and I are skeptical about whether it
> adds value, and I really don't think you've spelled out in as much
> detail why you think it will help as I have why I think it won't.

The primary reason I think it'll help because it allows users/testers to
run a simple one-line command (VACUUM (scan_all);)in their database, and
they'll get a clear "WARNING: XXX is bad" message if something's broken,
and nothing if things are ok.  Vacuum isn't a bad place for that,
because it'll be the place that removes dead item pointers and such if
things were wrongly labeled; and because we historically have emitted
warnings from there.   The more complex stuff we ask testers to run, the
less likely it is that they'll actually do that.

I'd also be ok with adding & documenting (beta release notes)
CREATE EXTENSION pg_visibility;
SELECT relname FROM pg_class WHERE relkind IN ('r', 'm') AND NOT 
pg_check_visibility(oid);
or something olong those lines.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 4:27 PM, Andres Freund  wrote:
> On 2016-06-06 16:18:19 -0400, Stephen Frost wrote:
>> That could be as simple as a query with the right things installed, or
>> it might be an independent tool, but not having any way to check isn't
>> good.  That said, trying to make VACUUM do that doesn't make sense to me
>> either.
>
> The point is that VACUUM *has* these types of checks. And had so for
> many years:
> else if (all_visible_according_to_vm && 
> !PageIsAllVisible(page)
>  && VM_ALL_VISIBLE(onerel, blkno, ))
> {
> elog(WARNING, "page is not marked all-visible but 
> visibility map bit is set in relation \"%s\" page %u",
>  relname, blkno);
> visibilitymap_clear(onerel, blkno, vmbuffer);
> }
> ...
> else if (PageIsAllVisible(page) && has_dead_tuples)
> {
> elog(WARNING, "page containing dead tuples is marked 
> as all-visible in relation \"%s\" page %u",
>  relname, blkno);
> PageClearAllVisible(page);
> MarkBufferDirty(buf);
> visibilitymap_clear(onerel, blkno, vmbuffer);
> }
>
> the point is that, after the introduction of the freeze bit, there's no
> way to reach them anymore (and they're missing a useful extension of
> these warnings, but ...); these warnings have caught bugs.  I don't
> think it'd advocate for the vacuum option otherwise.

So a couple of things:

1. I think it is pretty misleading to say that those checks aren't
reachable any more.  It's not like we freeze every page when we mark
it all-visible.  In most cases, I think that what will happen is that
the page will be marked all-visible and then, because it is
all-visible, skipped by subsequent vacuums, so that it doesn't get
marked all-frozen until a few hundred million transactions later.  Of
course there will be some cases when a page gets marked all-visible
and all-frozen at the same time, but I don't see why we should expect
that to be the norm.

2. With the new pg_visibility extension, you can actually check the
same thing that first warning checks like this:

select * from pg_visibility('t1'::regclass) where all_visible and not
pd_all_visible;

IMHO, that's a substantial improvement over running VACUUM and
checking whether it spits out a WARNING.

The second one, you can't currently trigger for all-frozen pages.  The
query I just sent in my other email could perhaps be adapted to that
purpose, but maybe this is a good-enough reason to add VACUUM
(even_frozen_pages).

3. If you think there are analogous checks that I should add for the
frozen case, or that you want to add yourself, please say what they
are specifically.  I *did* think about it when I wrote that code and I
didn't see how to make it work.  If I had, I would have added them.
The whole point of review here is, hopefully, to illuminate what
should have been done differently - if I'd known how to do it better,
I would have done so.  Provide an idea, or better yet, provide a
patch.  If you see how to do it, coding it up shouldn't be the hard
part.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Peter Geoghegan
On Mon, Jun 6, 2016 at 11:35 AM, Andres Freund  wrote:
>> We need a read-only utility which checks that the system is in a correct
>> and valid state.  There are a few of those which have been built for
>> different pieces, I believe, and we really should have one for the
>> visibility map, but I don't think it makes sense to imply in any way
>> that VACUUM can or should be used for that.
>
> Meh. This is vacuum behaviour that *has existed* up to this point. You
> essentially removed it. Sure, I'm all for adding a verification
> tool. But that's just pie in the skie at this point.  We have a complex,
> data loss threatening feature, which just about nobody can verify at
> this point. That's crazy.

FWIW, I agree with the general sentiment. Building a stress-testing
suite would have been a good idea. In general, testability is a design
goal that I'd be willing to give up other things for.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 2:35 PM, Andres Freund  wrote:
>> > Why would they have to write the complex query?  Wouldn't they just
>> > need to run that we wrote for them?
>
> Then write that query. Verify that that query performs halfway
> reasonably fast. Document that it should be run against databases after
> subjecting them to tests. That'd address my concern as well.

Here is a first attempt at such a query.  It requires that the
pageinspect and pg_visibility extensions be installed.

SELECT c.oid, v.blkno, array_agg(hpi.lp) AS affect_lps FROM pg_class
c, LATERAL ROWS FROM (pg_visibility(c.oid)) v, LATERAL ROWS FROM
(heap_page_items(get_raw_page(c.oid::regclass::text, blkno::int4)))
hpi WHERE c.relkind IN ('r', 't', 'm') AND v.all_frozen AND
(((hpi.t_infomask & 768) != 768 AND hpi.t_xmin NOT IN (1, 2)) OR
(hpi.t_infomask & 2048) != 2048) GROUP BY 1, 2 ORDER BY 1, 2;

I am not sure this is 100% correct, especially the XMAX-checking part:
is HEAP_XMAX_INVALID guaranteed to be set on a fully-frozen tuple?  Is
the method of constructing the first argument to get_raw_page() going
to be robust in all cases?

I'm not sure what the performance will be on a large table, either.
That will have to be checked.  And I obviously have not done extensive
stress runs yet.  But maybe it's a start.  Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 4:06 PM, Andres Freund  wrote:
>> I do appreciate you reviewing this code, very much, and genuinely, and
>> it would be great if more people wanted to review it.
>
>> But this kind of reads like you think that I'm being a jerk, which I'm
>> trying pretty hard not to be
>
> I don't think you're a jerk. But I am loosing a good bit of my patience
> here. I've posted these issues a month ago, and for a long while the
> only thing that happened was bikeshedding about the name of something
> that wasn't even decided to happen yet (obviously said bikeshedding
> isn't your fault).

No, the bikeshedding is not my fault.

As for the timing, you posted your first comments exactly a week
before beta1 when I was still busy addressing issues that were
reported before you reported yours, and I did not think it was
realistic to get them addressed in the time available.  If you'd sent
them two weeks sooner, I would probably have done so.  Now, it's been
four weeks since beta1 wrapped, one of which was PGCon.  As far as I
understand at this point in time, your review identified exactly zero
potential data loss bugs.  (We thought there was one, but it looks
like there isn't.)  All of the non-critical defects you identified
have now been fixed, apart from the lack of a better testing tool.
And since there is ongoing discussion (call it bikeshedding if you
want) about what would actually help in that area, I really don't feel
like anything very awful is happening here.

I really don't understand how you can not weigh in on the original
thread leading up to my mid-March commits and say "hey, this needs a
better testing tool", and then when you finally get around to
reviewing it in May, I'm supposed to drop everything and write one
immediately.  Why do you get two months from the time of commit to
weigh in but I get no time to respond?  For my part, I thought I *had*
written a testing tool - that's what pg_visibility is and that's what
I used to test the feature before committing it.  Now, you think
that's not good enough, and I respect your opinion, but it's not as if
you said this back when this was being committed.  Or at least if you
did, I don't remember it.

>> and like you have the right to tell assign me arbitrary work, which I
>> think you don't.
>
> It's not like adding a parameter for this would be a lot of work,
> there's even a patch out there.  I'm getting impatient because I feel
> the issue of this critical feature not being testable is getting ignored
> and/or played down.  And then sidetracked into a general "let's add a
> database consistency checker" type discussion. Which we need, but won't
> get in 9.6.

I know there's a patch.  Both Tom and I are skeptical about whether it
adds value, and I really don't think you've spelled out in as much
detail why you think it will help as I have why I think it won't.
Initially, I was like "ok, sure, we should have that", but the more I
thought about it (another advantage of time passing: you can think
about things more) the less convinced I was that it did anything
useful.  I don't think that's very unreasonable.  The importance of
the feature is exactly why we *should* think carefully about what is
best here and not just do the first thing that pops into our head.

> If you say: "I agree with the feature in principle, but I don't want to
> spend time to review/commit it." - ok, that's fair enough. But at the
> moment that isn't what I'm reading between the lines.

No, what I'm saying is "I'm not confident that this feature adds
value, and I'm afraid that by adding it we are making ourselves feel
better without solving any real problem".  I'm also saying "let's try
to agree on what problems we need to solve first and then decide on
the solutions".

>> If you want to have a
>> reasonable conversation about what the options are for making this
>> better, great.
>
> Yes, I want that.

Great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Andres Freund
On 2016-06-06 16:18:19 -0400, Stephen Frost wrote:
> That could be as simple as a query with the right things installed, or
> it might be an independent tool, but not having any way to check isn't
> good.  That said, trying to make VACUUM do that doesn't make sense to me
> either.

The point is that VACUUM *has* these types of checks. And had so for
many years:
else if (all_visible_according_to_vm && !PageIsAllVisible(page)
 && VM_ALL_VISIBLE(onerel, blkno, ))
{
elog(WARNING, "page is not marked all-visible but 
visibility map bit is set in relation \"%s\" page %u",
 relname, blkno);
visibilitymap_clear(onerel, blkno, vmbuffer);
}
...
else if (PageIsAllVisible(page) && has_dead_tuples)
{
elog(WARNING, "page containing dead tuples is marked as 
all-visible in relation \"%s\" page %u",
 relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_clear(onerel, blkno, vmbuffer);
}

the point is that, after the introduction of the freeze bit, there's no
way to reach them anymore (and they're missing a useful extension of
these warnings, but ...); these warnings have caught bugs.  I don't
think it'd advocate for the vacuum option otherwise.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Stephen Frost
Andres, all,

* Andres Freund (and...@anarazel.de) wrote:
> On 2016-06-06 15:16:10 -0400, Robert Haas wrote:
> > On Mon, Jun 6, 2016 at 2:35 PM, Andres Freund  wrote:
> > and like you have the right to tell assign me arbitrary work, which I
> > think you don't.
> 
> It's not like adding a parameter for this would be a lot of work,
> there's even a patch out there.  I'm getting impatient because I feel
> the issue of this critical feature not being testable is getting ignored
> and/or played down.  And then sidetracked into a general "let's add a
> database consistency checker" type discussion. Which we need, but won't
> get in 9.6.

To be clear, I was pointing out that we've had similar types of
consistency checkers implemented for other big features (eg: Heikki's
work on checking that WAL works) and that it'd be good to have one here
also.

That could be as simple as a query with the right things installed, or
it might be an independent tool, but not having any way to check isn't
good.  That said, trying to make VACUUM do that doesn't make sense to me
either.

Perhaps that's not an option due to the lateness of the hour or the lack
of manpower behind it, but that doesn't seem to be what has been said so
far.

> > If you want to me to do some work to help improve things on a patch I
> > committed, that is 100% fair.  But I don't know what I did to earn
> > this response which, to me, reads as rather demanding and rather
> > exasperated.
> 
> I don't think it's absurd to make some demands on the committer of a
> impact-heavy feature, about at least finding a realistic path towards
> the new feature being realistically testable.  This is a scary (but
> *REALLY IMPORTANT*) patch, and I don't understand why it's ok that we
> can't push a it through a couple wraparounds under high concurrency, and
> easily verify that the freeze map is in sync with the actual data.
> 
> And yes, I *am* exasperated, that I'm the only one that appears to be
> scared by the lack of that capability.  I think the feature is in a
> *lot* better shape than multixacts, but it certainly has the potential
> to do even more damage in ways that'll essentially be unrecoverable.

Not having a straightforward way to ensure that it's working properly is
certainly concerning to me as well.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Andres Freund
Hi,

On 2016-06-06 15:16:10 -0400, Robert Haas wrote:
> On Mon, Jun 6, 2016 at 2:35 PM, Andres Freund  wrote:
> >> > Why would they have to write the complex query?  Wouldn't they just
> >> > need to run that we wrote for them?
> >
> > Then write that query. Verify that that query performs halfway
> > reasonably fast. Document that it should be run against databases after
> > subjecting them to tests. That'd address my concern as well.
> 
> You know, I am starting to lose a teeny bit of patience here.

Same here.


> I do appreciate you reviewing this code, very much, and genuinely, and
> it would be great if more people wanted to review it.

> But this kind of reads like you think that I'm being a jerk, which I'm
> trying pretty hard not to be

I don't think you're a jerk. But I am loosing a good bit of my patience
here. I've posted these issues a month ago, and for a long while the
only thing that happened was bikeshedding about the name of something
that wasn't even decided to happen yet (obviously said bikeshedding
isn't your fault).


> and like you have the right to tell assign me arbitrary work, which I
> think you don't.

It's not like adding a parameter for this would be a lot of work,
there's even a patch out there.  I'm getting impatient because I feel
the issue of this critical feature not being testable is getting ignored
and/or played down.  And then sidetracked into a general "let's add a
database consistency checker" type discussion. Which we need, but won't
get in 9.6.

If you say: "I agree with the feature in principle, but I don't want to
spend time to review/commit it." - ok, that's fair enough. But at the
moment that isn't what I'm reading between the lines.


> If you want to have a
> reasonable conversation about what the options are for making this
> better, great.

Yes, I want that.


> If you want to me to do some work to help improve things on a patch I
> committed, that is 100% fair.  But I don't know what I did to earn
> this response which, to me, reads as rather demanding and rather
> exasperated.

I don't think it's absurd to make some demands on the committer of a
impact-heavy feature, about at least finding a realistic path towards
the new feature being realistically testable.  This is a scary (but
*REALLY IMPORTANT*) patch, and I don't understand why it's ok that we
can't push a it through a couple wraparounds under high concurrency, and
easily verify that the freeze map is in sync with the actual data.

And yes, I *am* exasperated, that I'm the only one that appears to be
scared by the lack of that capability.  I think the feature is in a
*lot* better shape than multixacts, but it certainly has the potential
to do even more damage in ways that'll essentially be unrecoverable.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 2:35 PM, Andres Freund  wrote:
>> > Why would they have to write the complex query?  Wouldn't they just
>> > need to run that we wrote for them?
>
> Then write that query. Verify that that query performs halfway
> reasonably fast. Document that it should be run against databases after
> subjecting them to tests. That'd address my concern as well.

You know, I am starting to lose a teeny bit of patience here.  I do
appreciate you reviewing this code, very much, and genuinely, and it
would be great if more people wanted to review it.  But this kind of
reads like you think that I'm being a jerk, which I'm trying pretty
hard not to be, and like you have the right to tell assign me
arbitrary work, which I think you don't.  If you want to have a
reasonable conversation about what the options are for making this
better, great.  If you want to me to do some work to help improve
things on a patch I committed, that is 100% fair.  But I don't know
what I did to earn this response which, to me, reads as rather
demanding and rather exasperated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Andres Freund
On 2016-06-06 14:24:14 -0400, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Mon, Jun 6, 2016 at 11:44 AM, Andres Freund  wrote:
> > >> In terms of diagnostic tools, you can get the VM bits and
> > >> page-level bits using the pg_visibility extension; I wrote it
> > >> precisely because of concerns like the ones you raise here.  If you
> > >> want to cross-check the page-level bits against the tuple-level bits,
> > >> you can do that with the pageinspect extension.  And if you do those
> > >> things, you can actually find out whether stuff is broken.
> > >
> > > That's WAY out ouf reach of any "normal users". Adding a vacuum option
> > > is doable, writing complex queries is not.
> > 
> > Why would they have to write the complex query?  Wouldn't they just
> > need to run that we wrote for them?

Then write that query. Verify that that query performs halfway
reasonably fast. Document that it should be run against databases after
subjecting them to tests. That'd address my concern as well.


> > I mean, I'm not 100% dead set against this option you want, but in all
> > honestly, I would never, ever tell anyone to use it.  Unleashing
> > VACUUM on possibly-damaged data is just asking it to decide to prune
> > away tuples you don't want gone.  I would try very hard to come up
> > with something to give that user that was only going to *read* the
> > possibly-damaged data with as little chance of modifying or erasing it
> > as possible.

I'm more concerned about actually being able to verify that the freeze
logic does actually something meaningful, in situation where we'd *NOT*
expect any problems. If we're not trusting vacuum in that situation,
well ... 

> I certainly agree with this.
> 
> We need a read-only utility which checks that the system is in a correct
> and valid state.  There are a few of those which have been built for
> different pieces, I believe, and we really should have one for the
> visibility map, but I don't think it makes sense to imply in any way
> that VACUUM can or should be used for that.

Meh. This is vacuum behaviour that *has existed* up to this point. You
essentially removed it. Sure, I'm all for adding a verification
tool. But that's just pie in the skie at this point.  We have a complex,
data loss threatening feature, which just about nobody can verify at
this point. That's crazy.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Jun 6, 2016 at 11:44 AM, Andres Freund  wrote:
> >> In terms of diagnostic tools, you can get the VM bits and
> >> page-level bits using the pg_visibility extension; I wrote it
> >> precisely because of concerns like the ones you raise here.  If you
> >> want to cross-check the page-level bits against the tuple-level bits,
> >> you can do that with the pageinspect extension.  And if you do those
> >> things, you can actually find out whether stuff is broken.
> >
> > That's WAY out ouf reach of any "normal users". Adding a vacuum option
> > is doable, writing complex queries is not.
> 
> Why would they have to write the complex query?  Wouldn't they just
> need to run that we wrote for them?
> 
> I mean, I'm not 100% dead set against this option you want, but in all
> honestly, I would never, ever tell anyone to use it.  Unleashing
> VACUUM on possibly-damaged data is just asking it to decide to prune
> away tuples you don't want gone.  I would try very hard to come up
> with something to give that user that was only going to *read* the
> possibly-damaged data with as little chance of modifying or erasing it
> as possible.

I certainly agree with this.

We need a read-only utility which checks that the system is in a correct
and valid state.  There are a few of those which have been built for
different pieces, I believe, and we really should have one for the
visibility map, but I don't think it makes sense to imply in any way
that VACUUM can or should be used for that.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 11:44 AM, Andres Freund  wrote:
>> In terms of diagnostic tools, you can get the VM bits and
>> page-level bits using the pg_visibility extension; I wrote it
>> precisely because of concerns like the ones you raise here.  If you
>> want to cross-check the page-level bits against the tuple-level bits,
>> you can do that with the pageinspect extension.  And if you do those
>> things, you can actually find out whether stuff is broken.
>
> That's WAY out ouf reach of any "normal users". Adding a vacuum option
> is doable, writing complex queries is not.

Why would they have to write the complex query?  Wouldn't they just
need to run that we wrote for them?

I mean, I'm not 100% dead set against this option you want, but in all
honestly, I would never, ever tell anyone to use it.  Unleashing
VACUUM on possibly-damaged data is just asking it to decide to prune
away tuples you don't want gone.  I would try very hard to come up
with something to give that user that was only going to *read* the
possibly-damaged data with as little chance of modifying or erasing it
as possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Andres Freund
On 2016-06-06 05:34:32 -0400, Robert Haas wrote:
> On Mon, Jun 6, 2016 at 5:11 AM, Michael Paquier
>  wrote:
> >> Attached is a sample patch that controls full page vacuum by new GUC 
> >> parameter.
> >
> > Don't we want a reloption for that? Just wondering...
> 
> Why?  Just for consistency?  I think the bigger question here is
> whether we need to do anything at all.  It's true that, without some
> new option, we'll lose the ability to forcibly vacuum every page in
> the relation, even if all-frozen.  But there's not much use case for
> that in the first place.  It will be potentially helpful if it turns
> out that we have a bug that sets the all-frozen bit on pages that are
> not, in fact, all-frozen.  Otherwise, what's the use?

Except that we right now don't have any realistic way to figure out
whether this new feature actually does the right thing. Which makes
testing this *considerably* harder than just VACUUM (dwim). I think it's
unacceptable to release this feature without a way that'll tell that it
so far has/has not corrupted the database.  Would that, in a perfect
world, be vacuum? No, probably not. But since we're not in a perfect world...

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jun 6, 2016 at 11:28 AM, Andres Freund  wrote:
>> Except that we right now don't have any realistic way to figure out
>> whether this new feature actually does the right thing.

> I just don't see how running VACUUM on the all-frozen pages is going
> to help.

Yes.  I don't see that any of the proposed features would be very useful
for answering the question "is my VM incorrect".  Maybe they would fix
problems, and maybe not, but in any case you couldn't rely on VACUUM
to tell you about a problem.  (Even if you've got warning messages in
there, they might disappear into the postmaster log during an
auto-vacuum.  Warning messages in VACUUM are not a good debugging
technology.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 11:28 AM, Andres Freund  wrote:
> On 2016-06-06 05:34:32 -0400, Robert Haas wrote:
>> On Mon, Jun 6, 2016 at 5:11 AM, Michael Paquier
>>  wrote:
>> >> Attached is a sample patch that controls full page vacuum by new GUC 
>> >> parameter.
>> >
>> > Don't we want a reloption for that? Just wondering...
>>
>> Why?  Just for consistency?  I think the bigger question here is
>> whether we need to do anything at all.  It's true that, without some
>> new option, we'll lose the ability to forcibly vacuum every page in
>> the relation, even if all-frozen.  But there's not much use case for
>> that in the first place.  It will be potentially helpful if it turns
>> out that we have a bug that sets the all-frozen bit on pages that are
>> not, in fact, all-frozen.  Otherwise, what's the use?
>
> Except that we right now don't have any realistic way to figure out
> whether this new feature actually does the right thing. Which makes
> testing this *considerably* harder than just VACUUM (dwim). I think it's
> unacceptable to release this feature without a way that'll tell that it
> so far has/has not corrupted the database.  Would that, in a perfect
> world, be vacuum? No, probably not. But since we're not in a perfect world...

I just don't see how running VACUUM on the all-frozen pages is going
to help.  In terms of diagnostic tools, you can get the VM bits and
page-level bits using the pg_visibility extension; I wrote it
precisely because of concerns like the ones you raise here.  If you
want to cross-check the page-level bits against the tuple-level bits,
you can do that with the pageinspect extension.  And if you do those
things, you can actually find out whether stuff is broken.  Vacuuming
the all-frozen pages won't tell you that.  It will either do nothing
(which doesn't tell you that things are OK) or it will change
something (possibly without reporting any message, and possibly making
a bad situation worse instead of better).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Andres Freund
On 2016-06-06 11:37:25 -0400, Robert Haas wrote:
> On Mon, Jun 6, 2016 at 11:28 AM, Andres Freund  wrote:
> > Except that we right now don't have any realistic way to figure out
> > whether this new feature actually does the right thing. Which makes
> > testing this *considerably* harder than just VACUUM (dwim). I think it's
> > unacceptable to release this feature without a way that'll tell that it
> > so far has/has not corrupted the database.  Would that, in a perfect
> > world, be vacuum? No, probably not. But since we're not in a perfect 
> > world...
> 
> I just don't see how running VACUUM on the all-frozen pages is going
> to help.

Because we can tell people in the beta2 announcement or some wiki page
"please run VACUUM(scan_all)" and check whether it emits WARNINGs. And
if we suspect freeze map in bug reports, we can just ask reporters to
run a VACUUM (scan_all).


> In terms of diagnostic tools, you can get the VM bits and
> page-level bits using the pg_visibility extension; I wrote it
> precisely because of concerns like the ones you raise here.  If you
> want to cross-check the page-level bits against the tuple-level bits,
> you can do that with the pageinspect extension.  And if you do those
> things, you can actually find out whether stuff is broken.

That's WAY out ouf reach of any "normal users". Adding a vacuum option
is doable, writing complex queries is not.


> Vacuuming the all-frozen pages won't tell you that.  It will either do
> nothing (which doesn't tell you that things are OK) or it will change
> something (possibly without reporting any message, and possibly making
> a bad situation worse instead of better).

We found a number of bugs for the equivalent issues in all-visible
handling via the vacuum error reporting around those.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Tom Lane
Robert Haas  writes:
> I'm intuitively sympathetic to the idea that we should have an option
> for this, but I can't figure out in what case we'd actually tell
> anyone to use it.  It would be useful for the kinds of bugs listed
> above to have VACUUM (rebuild_vm) to blow away the VM fork and rebuild
> it, but that's different semantics than what we proposed for VACUUM
> (even_frozen_pages).  And I'd be sort of inclined to handle that case
> by providing some other way to remove VM forks (like a new function in
> the pg_visibilitymap contrib module, maybe?) and then just tell people
> to run regular VACUUM afterwards, rather than putting the actual VM
> fork removal into VACUUM.

There's a lot to be said for that approach.  If we do it, I'd be a bit
inclined to offer an option to blow away the FSM as well.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 10:18 AM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> My gut feeling on this is to either do nothing or add a VACUUM option
>> (not a GUC, not a reloption) called even_frozen_pages, default false.
>> What is your opinion?
>
> +1 for that approach -- I thought that was already agreed weeks ago and
> the only question was what to name that option.  even_frozen_pages
> sounds better than SCANALL, SCAN_ALL, FREEZE, FORCE (the other
> options I saw proposed in that subthread), so +1 for that naming
> too.
>
> I don't like doing nothing; that means that when we discover a bug we'll
> have to tell users to rm a file whose name requires a complicated
> catalog query to find out, so -1 for that.

So... I agree that it is definitely not good if we have to tell users
to rm a file, but I am not quite sure how this new option would
prevent us from having to say that?  Here are some potential kinds of
bugs we might have:

1. Sometimes, the all-frozen bit doesn't get set when it should.
2. Sometimes, the all-frozen bit gets sit when it shouldn't.
3. Some combination of (1) and (2), so that the VM fork can't be
trusted in either direction.

If (1) happens, removing the VM fork is not a good idea; what people
will want to do is re-run a VACUUM FREEZE.

If (2) or (3) happens, removing the VM fork might be a good idea, but
it's not really clear that VACUUM (even_frozen_pages) will help much.
For one thing, if there are actually unfrozen tuples on those pages
and the clog pages which they reference are already gone or recycled,
rerunning VACUUM on the table in any form might permanently lose data,
or maybe it will just fail.

If because of the nature of the bug you somehow know that case doesn't
pertain, then I suppose the bug is that the tuple-level and page-level
state is out of sync.  VACUUM (even_frozen_pages) probably won't help
with that much either, because VACUUM never clears the all-frozen bit
without also clearing the all-visible bit, and that only if the page
contains dead tuples, which in this case it probably doesn't.

I'm intuitively sympathetic to the idea that we should have an option
for this, but I can't figure out in what case we'd actually tell
anyone to use it.  It would be useful for the kinds of bugs listed
above to have VACUUM (rebuild_vm) to blow away the VM fork and rebuild
it, but that's different semantics than what we proposed for VACUUM
(even_frozen_pages).  And I'd be sort of inclined to handle that case
by providing some other way to remove VM forks (like a new function in
the pg_visibilitymap contrib module, maybe?) and then just tell people
to run regular VACUUM afterwards, rather than putting the actual VM
fork removal into VACUUM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Fri, Jun 3, 2016 at 11:41 PM, Robert Haas  wrote:
> (Status update for Noah: I expect Masahiko Sawada will respond
> quickly, but if not I'll give some kind of update by Monday COB
> anyhow.)

I believe this open item is now closed, unless Andres has more
comments or wishes to discuss any point further, with the exception
that we still need to decide whether to add VACUUM (even_frozen_pages)
or some variant of that.  I have added a new open item for that issue
and marked this one as resolved.

My intended strategy as the presumptive owner of the new items is to
do nothing unless more of a consensus emerges than we have presently.
We do not seem to have clear agreement on whether to add the new
option; whether to make it a GUC, a reloption, a VACUUM syntax option,
or some combination of those things; and whether it should blow up the
existing VM and rebuild it (as proposed by Sawada-san) or just force
frozen pages to be scanned in the hope that something good will happen
(as proposed by Andres).  In the absence of consensus, doing nothing
is a reasonable choice here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Alvaro Herrera
Robert Haas wrote:

> My gut feeling on this is to either do nothing or add a VACUUM option
> (not a GUC, not a reloption) called even_frozen_pages, default false.
> What is your opinion?

+1 for that approach -- I thought that was already agreed weeks ago and
the only question was what to name that option.  even_frozen_pages
sounds better than SCANALL, SCAN_ALL, FREEZE, FORCE (the other
options I saw proposed in that subthread), so +1 for that naming
too.

I don't like doing nothing; that means that when we discover a bug we'll
have to tell users to rm a file whose name requires a complicated
catalog query to find out, so -1 for that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jun 6, 2016 at 9:53 AM, Tom Lane  wrote:
>> Taking out GUCs is not
>> easier than taking out statement parameters; you risk breaking
>> applications either way.

> Agreed, but that doesn't really answer the question of which one we
> should have, if either.  My gut feeling on this is to either do
> nothing or add a VACUUM option (not a GUC, not a reloption) called
> even_frozen_pages, default false.  What is your opinion?

That's about where I stand, with some preference for "do nothing".
I'm not convinced we need this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 9:53 AM, Tom Lane  wrote:
> Masahiko Sawada  writes:
>> On Sat, Jun 4, 2016 at 1:46 PM, Masahiko Sawada  
>> wrote:
>>> So other idea is to have GUC parameter, vacuum_even_frozen_page for example.
>>> If this parameter is set true (false by default), we do vacuum whole
>>> table forcibly and re-generate visibility map.
>>> The advantage of this idea is that we don't necessary to expand VACUUM
>>> syntax and relatively easily can remove this parameter if it's not
>>> necessary anymore.
>
>> Attached is a sample patch that controls full page vacuum by new GUC 
>> parameter.
>
> I find this approach fairly ugly ... it's randomly inconsistent with other
> VACUUM parameters for no very defensible reason.

Just to be sure I understand, in what way is it inconsistent?

> Taking out GUCs is not
> easier than taking out statement parameters; you risk breaking
> applications either way.

Agreed, but that doesn't really answer the question of which one we
should have, if either.  My gut feeling on this is to either do
nothing or add a VACUUM option (not a GUC, not a reloption) called
even_frozen_pages, default false.  What is your opinion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 7:46 AM, Robert Haas  wrote:
> On Sat, Jun 4, 2016 at 12:18 AM, Masahiko Sawada  
> wrote:
>> Attached updated patch.
>
> The error-checking enhancements here look good to me, except that you
> forgot to initialize totalBytesRead.  I've committed those changes
> with a fix for that problem and will look at the rest of this
> separately.

Committed that now, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Tom Lane
Masahiko Sawada  writes:
> On Sat, Jun 4, 2016 at 1:46 PM, Masahiko Sawada  wrote:
>> So other idea is to have GUC parameter, vacuum_even_frozen_page for example.
>> If this parameter is set true (false by default), we do vacuum whole
>> table forcibly and re-generate visibility map.
>> The advantage of this idea is that we don't necessary to expand VACUUM
>> syntax and relatively easily can remove this parameter if it's not
>> necessary anymore.

> Attached is a sample patch that controls full page vacuum by new GUC 
> parameter.

I find this approach fairly ugly ... it's randomly inconsistent with other
VACUUM parameters for no very defensible reason.  Taking out GUCs is not
easier than taking out statement parameters; you risk breaking
applications either way.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Sat, Jun 4, 2016 at 12:18 AM, Masahiko Sawada  wrote:
> Attached updated patch.

The error-checking enhancements here look good to me, except that you
forgot to initialize totalBytesRead.  I've committed those changes
with a fix for that problem and will look at the rest of this
separately.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Masahiko Sawada
On Mon, Jun 6, 2016 at 6:34 PM, Robert Haas  wrote:
> On Mon, Jun 6, 2016 at 5:11 AM, Michael Paquier
>  wrote:
>>> Attached is a sample patch that controls full page vacuum by new GUC 
>>> parameter.
>>
>> Don't we want a reloption for that? Just wondering...
>
> Why?  Just for consistency?  I think the bigger question here is
> whether we need to do anything at all.  It's true that, without some
> new option, we'll lose the ability to forcibly vacuum every page in
> the relation, even if all-frozen.  But there's not much use case for
> that in the first place.  It will be potentially helpful if it turns
> out that we have a bug that sets the all-frozen bit on pages that are
> not, in fact, all-frozen.  Otherwise, what's the use?
>

I cannot agree with using this parameter as a reloption.
We set it true only when the serious bug is discovered and we want to
re-generate the visibility maps of specific tables.
I thought that control by GUC parameter would be convenient rather
than adding the new option.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Robert Haas
On Mon, Jun 6, 2016 at 5:11 AM, Michael Paquier
 wrote:
>> Attached is a sample patch that controls full page vacuum by new GUC 
>> parameter.
>
> Don't we want a reloption for that? Just wondering...

Why?  Just for consistency?  I think the bigger question here is
whether we need to do anything at all.  It's true that, without some
new option, we'll lose the ability to forcibly vacuum every page in
the relation, even if all-frozen.  But there's not much use case for
that in the first place.  It will be potentially helpful if it turns
out that we have a bug that sets the all-frozen bit on pages that are
not, in fact, all-frozen.  Otherwise, what's the use?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Michael Paquier
On Mon, Jun 6, 2016 at 5:44 PM, Masahiko Sawada  wrote:
> On Sat, Jun 4, 2016 at 1:46 PM, Masahiko Sawada  wrote:
>> On Sat, Jun 4, 2016 at 12:59 AM, Robert Haas  wrote:
>>> On Fri, Jun 3, 2016 at 11:21 AM, Masahiko Sawada  
>>> wrote:
> Can you submit that part as a separate patch?

 Attached.
>>>
>>> Thanks, committed.
>>>
>> I'm address the review comment of 7087166 commit, and will post the 
>> patch.
>
> When?

 On Saturday.
>>>
>>> Great.  Will that address everything for this open item, then?
>>>
>>
>> Attached patch for commit 7087166 on another mail.
>> I think that only the test tool for visibility map is remaining and
>> under the discussion.
>> Even if we have verification tool or function for visibility map, we
>> cannot repair the contents of visibility map if we turned out that
>> contents of visibility map is something wrong.
>> So I think we should have the way that re-generates the visibility map.
>> For this purpose, doing vacuum while ignoring visibility map by a new
>> option or new function is one idea.
>> But IMHO, it's not good idea to allow a function to do vacuum, and
>> expanding the VACUUM syntax might be somewhat overkill.
>>
>> So other idea is to have GUC parameter, vacuum_even_frozen_page for example.
>> If this parameter is set true (false by default), we do vacuum whole
>> table forcibly and re-generate visibility map.
>> The advantage of this idea is that we don't necessary to expand VACUUM
>> syntax and relatively easily can remove this parameter if it's not
>> necessary anymore.
>>
>
> Attached is a sample patch that controls full page vacuum by new GUC 
> parameter.

Don't we want a reloption for that? Just wondering...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-06 Thread Masahiko Sawada
On Sat, Jun 4, 2016 at 1:46 PM, Masahiko Sawada  wrote:
> On Sat, Jun 4, 2016 at 12:59 AM, Robert Haas  wrote:
>> On Fri, Jun 3, 2016 at 11:21 AM, Masahiko Sawada  
>> wrote:
 Can you submit that part as a separate patch?
>>>
>>> Attached.
>>
>> Thanks, committed.
>>
> I'm address the review comment of 7087166 commit, and will post the patch.

 When?
>>>
>>> On Saturday.
>>
>> Great.  Will that address everything for this open item, then?
>>
>
> Attached patch for commit 7087166 on another mail.
> I think that only the test tool for visibility map is remaining and
> under the discussion.
> Even if we have verification tool or function for visibility map, we
> cannot repair the contents of visibility map if we turned out that
> contents of visibility map is something wrong.
> So I think we should have the way that re-generates the visibility map.
> For this purpose, doing vacuum while ignoring visibility map by a new
> option or new function is one idea.
> But IMHO, it's not good idea to allow a function to do vacuum, and
> expanding the VACUUM syntax might be somewhat overkill.
>
> So other idea is to have GUC parameter, vacuum_even_frozen_page for example.
> If this parameter is set true (false by default), we do vacuum whole
> table forcibly and re-generate visibility map.
> The advantage of this idea is that we don't necessary to expand VACUUM
> syntax and relatively easily can remove this parameter if it's not
> necessary anymore.
>

Attached is a sample patch that controls full page vacuum by new GUC parameter.

Regards,

--
Masahiko Sawada
diff --git a/src/backend/commands/vacuumlazy.c 
b/src/backend/commands/vacuumlazy.c
index 784c3e9..03f026d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -125,6 +125,8 @@ typedef struct LVRelStats
boollock_waiter_detected;
 } LVRelStats;
 
+/* GUC parameter */
+bool vacuum_even_frozen_page; /* should we scan all pages forcibly? */
 
 /* A few variables that don't seem worth passing around as parameters */
 static int elevel = -1;
@@ -138,7 +140,7 @@ static BufferAccessStrategy vac_strategy;
 
 /* non-export function prototypes */
 static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
-  Relation *Irel, int nindexes, bool aggressive);
+  Relation *Irel, int nindexes, bool aggressive, bool 
scan_all);
 static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
@@ -246,7 +248,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams 
*params,
vacrelstats->hasindex = (nindexes > 0);
 
/* Do the vacuuming */
-   lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, aggressive);
+   lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, aggressive,
+  vacuum_even_frozen_page);
 
/* Done with indexes */
vac_close_indexes(nindexes, Irel, NoLock);
@@ -261,7 +264,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams 
*params,
if ((vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
< vacrelstats->rel_pages)
{
-   Assert(!aggressive);
+   Assert(!aggressive && !vacuum_even_frozen_page);
scanned_all_unfrozen = false;
}
else
@@ -442,7 +445,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats 
*vacrelstats)
  */
 static void
 lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
-  Relation *Irel, int nindexes, bool aggressive)
+  Relation *Irel, int nindexes, bool aggressive, bool 
scan_all)
 {
BlockNumber nblocks,
blkno;
@@ -513,6 +516,10 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 * such pages do not need freezing and do not affect the value that we 
can
 * safely set for relfrozenxid or relminmxid.
 *
+* When scan_all is set, we have to scan all pages forcibly while 
ignoring
+* visibility map status, and then we can safely set for relfrozenxid or
+* relminmxid.
+*
 * Before entering the main loop, establish the invariant that
 * next_unskippable_block is the next block number >= blkno that's not 
we
 * can't skip based on the visibility map, either all-visible for a
@@ -639,11 +646,12 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
/*
 * The current block is potentially skippable; if we've 
seen a
 * long enough run of skippable blocks to justify 
skipping it, and
-* we're not forced to check it, then go ahead and skip.
-* Otherwise, the page 

Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Masahiko Sawada
On Sat, Jun 4, 2016 at 12:59 AM, Robert Haas  wrote:
> On Fri, Jun 3, 2016 at 11:21 AM, Masahiko Sawada  
> wrote:
>>> Can you submit that part as a separate patch?
>>
>> Attached.
>
> Thanks, committed.
>
 I'm address the review comment of 7087166 commit, and will post the patch.
>>>
>>> When?
>>
>> On Saturday.
>
> Great.  Will that address everything for this open item, then?
>

Attached patch for commit 7087166 on another mail.
I think that only the test tool for visibility map is remaining and
under the discussion.
Even if we have verification tool or function for visibility map, we
cannot repair the contents of visibility map if we turned out that
contents of visibility map is something wrong.
So I think we should have the way that re-generates the visibility map.
For this purpose, doing vacuum while ignoring visibility map by a new
option or new function is one idea.
But IMHO, it's not good idea to allow a function to do vacuum, and
expanding the VACUUM syntax might be somewhat overkill.

So other idea is to have GUC parameter, vacuum_even_frozen_page for example.
If this parameter is set true (false by default), we do vacuum whole
table forcibly and re-generate visibility map.
The advantage of this idea is that we don't necessary to expand VACUUM
syntax and relatively easily can remove this parameter if it's not
necessary anymore.

Thought?

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Masahiko Sawada
On Sat, Jun 4, 2016 at 12:41 PM, Robert Haas  wrote:
> On Fri, Jun 3, 2016 at 10:25 PM, Masahiko Sawada  
> wrote:
 +   charnew_vmbuf[BLCKSZ];
 +   char   *new_cur = new_vmbuf;
 +   boolempty = true;
 +   boolold_lastpart;
 +
 +   /* Copy page header in advance */
 +   memcpy(new_vmbuf, , 
 SizeOfPageHeaderData);

 Shouldn't we zero out new_vmbuf? Afaics we're not necessarily zeroing it
 with old_lastpart && !empty, right?
>>>
>>> Oh, dear.  That seems like a possible data corruption bug.  Maybe we'd
>>> better fix that right away (although I don't actually have time before
>>> the wrap).
>
> Actually, on second thought, I'm not seeing the bug here.  It seems to
> me that the loop commented this way:
>
> /* Process old page bytes one by one, and turn it into new page. 
> */
>
> ...should always write to every byte in new_vmbuf, because we process
> exactly half the bytes in the old block at a time, and so that's going
> to generate exactly one full page of new bytes.  Am I missing
> something?

Yeah, you're right.
the rewriteVisibilityMap() always exactly writes whole new_vmbuf.

>
>> Since the force is always set true, I removed the force from argument
>> of copyFile() and rewriteVisibilityMap().
>> And destination file is always opened with O_RDWR, O_CREAT, O_TRUNC flags .
>
> I'm not happy with this.  I think we should always open with O_EXCL,
> because the new file is not expected to exist and if it does,
> something's probably broken.  I think we should default to the safe
> behavior (which is failing) rather than the unsafe behavior (which is
> clobbering data).

I specified O_EXCL instead of O_TRUNC.

Attached updated patch.

Regards,

--
Masahiko Sawada


fix_freeze_map_7087166_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Robert Haas
On Fri, Jun 3, 2016 at 10:25 PM, Masahiko Sawada  wrote:
>>> +   charnew_vmbuf[BLCKSZ];
>>> +   char   *new_cur = new_vmbuf;
>>> +   boolempty = true;
>>> +   boolold_lastpart;
>>> +
>>> +   /* Copy page header in advance */
>>> +   memcpy(new_vmbuf, , 
>>> SizeOfPageHeaderData);
>>>
>>> Shouldn't we zero out new_vmbuf? Afaics we're not necessarily zeroing it
>>> with old_lastpart && !empty, right?
>>
>> Oh, dear.  That seems like a possible data corruption bug.  Maybe we'd
>> better fix that right away (although I don't actually have time before
>> the wrap).

Actually, on second thought, I'm not seeing the bug here.  It seems to
me that the loop commented this way:

/* Process old page bytes one by one, and turn it into new page. */

...should always write to every byte in new_vmbuf, because we process
exactly half the bytes in the old block at a time, and so that's going
to generate exactly one full page of new bytes.  Am I missing
something?

> Since the force is always set true, I removed the force from argument
> of copyFile() and rewriteVisibilityMap().
> And destination file is always opened with O_RDWR, O_CREAT, O_TRUNC flags .

I'm not happy with this.  I think we should always open with O_EXCL,
because the new file is not expected to exist and if it does,
something's probably broken.  I think we should default to the safe
behavior (which is failing) rather than the unsafe behavior (which is
clobbering data).

(Status update for Noah: I expect Masahiko Sawada will respond
quickly, but if not I'll give some kind of update by Monday COB
anyhow.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Masahiko Sawada
On Sat, May 7, 2016 at 5:42 AM, Robert Haas  wrote:
> On Thu, May 5, 2016 at 2:20 PM, Andres Freund  wrote:
>> On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
>>> 7087166 pg_upgrade: Convert old visibility map format to new format.
>>
>> +const char *
>> +rewriteVisibilityMap(const char *fromfile, const char *tofile, bool force)
>> ...
>>
>> +   while ((bytesRead = read(src_fd, buffer, BLCKSZ)) == BLCKSZ)
>> +   {
>> ..
>>
>> Uh, shouldn't we actually fail if we read incompletely? Rather than
>> silently ignoring the problem? Ok, this causes no corruption, but it
>> indicates that something went significantly wrong.
>
> Sure, that's reasonable.
>

Fixed.

>> +   charnew_vmbuf[BLCKSZ];
>> +   char   *new_cur = new_vmbuf;
>> +   boolempty = true;
>> +   boolold_lastpart;
>> +
>> +   /* Copy page header in advance */
>> +   memcpy(new_vmbuf, , SizeOfPageHeaderData);
>>
>> Shouldn't we zero out new_vmbuf? Afaics we're not necessarily zeroing it
>> with old_lastpart && !empty, right?
>
> Oh, dear.  That seems like a possible data corruption bug.  Maybe we'd
> better fix that right away (although I don't actually have time before
> the wrap).

Since the force is always set true, I removed the force from argument
of copyFile() and rewriteVisibilityMap().
And destination file is always opened with O_RDWR, O_CREAT, O_TRUNC flags .

>> +   if ((dst_fd = open(tofile, O_RDWR | O_CREAT | (force ? 0 : O_EXCL), 
>> S_IRUSR | S_IWUSR)) < 0)
>> +   {
>> +   close(src_fd);
>> +   return getErrorText();
>> +   }
>>
>> I know you guys copied this, but what's the force thing about?
>> Expecially as it's always set to true by the callers (i.e. what is the
>> parameter even about?)?  Wouldn't we at least have to specify O_TRUNC in
>> the force case?
>
> I just work here.
>
>> +   old_cur += BITS_PER_HEAPBLOCK_OLD;
>> +   new_cur += BITS_PER_HEAPBLOCK;
>>
>> I'm not sure I'm understanding the point of the BITS_PER_HEAPBLOCK_OLD
>> stuff - as long as it's hardcoded into rewriteVisibilityMap() we'll not
>> be able to have differing ones anyway, should we decide to add a third
>> bit?
>
> I think that's just a matter of style.

So this comments is not incorporated.

Attached patch, please review it.

Regards,

--
Masahiko Sawada


fix_freeze_map_7087166.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Robert Haas
On Fri, Jun 3, 2016 at 11:21 AM, Masahiko Sawada  wrote:
>> Can you submit that part as a separate patch?
>
> Attached.

Thanks, committed.

>>> I'm address the review comment of 7087166 commit, and will post the patch.
>>
>> When?
>
> On Saturday.

Great.  Will that address everything for this open item, then?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Masahiko Sawada
On Sat, Jun 4, 2016 at 12:08 AM, Robert Haas  wrote:
> On Fri, Jun 3, 2016 at 10:49 AM, Masahiko Sawada  
> wrote:
>> That patch also incorporates the following review comment.
>> We can push at least this fix.
>
> Can you submit that part as a separate patch?

Attached.

>> I'm address the review comment of 7087166 commit, and will post the patch.
>
> When?
>

On Saturday.

Regards,

--
Masahiko Sawada


fix_freeze_map_fd31cd2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Robert Haas
On Fri, Jun 3, 2016 at 10:49 AM, Masahiko Sawada  wrote:
> That patch also incorporates the following review comment.
> We can push at least this fix.

Can you submit that part as a separate patch?

> I'm address the review comment of 7087166 commit, and will post the patch.

When?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Masahiko Sawada
On Fri, Jun 3, 2016 at 11:03 PM, Robert Haas  wrote:
> On Thu, Jun 2, 2016 at 11:24 AM, Masahiko Sawada  
> wrote:
>> Attached patch optimises skipping pages logic so that blkno can jump to
>> next_unskippable_block directly while counting the number of all_visible
>> and all_frozen pages. So we can avoid double checking visibility map.
>
> I think this is 9.7 material.  This patch has already won the
> "scariest patch" tournament.  Changing the logic more than necessary
> at this late date seems like it just increases the scariness.  I think
> this is an opportunity for further optimization, not a defect.
>

I agree with you.
I'll submit this as a improvement for 9.7.
That patch also incorporates the following review comment.
We can push at least this fix.
>> /*
>>  * Compute whether we actually scanned the whole relation. If we 
>> did, we
>>  * can adjust relfrozenxid and relminmxid.
>>  *
>>  * NB: We need to check this before truncating the relation, because 
>> that
>>  * will change ->rel_pages.
>>  */
>>
>> Comment is out-of-date now.

I'm address the review comment of 7087166 commit, and will post the patch.
And testing feature for freeze map is under the discussion.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Robert Haas
On Wed, Jun 1, 2016 at 3:50 AM, Masahiko Sawada  wrote:
> Attached patch fixes only above comments, other are being addressed now.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-03 Thread Robert Haas
On Thu, Jun 2, 2016 at 11:24 AM, Masahiko Sawada  wrote:
> Attached patch optimises skipping pages logic so that blkno can jump to
> next_unskippable_block directly while counting the number of all_visible
> and all_frozen pages. So we can avoid double checking visibility map.

I think this is 9.7 material.  This patch has already won the
"scariest patch" tournament.  Changing the logic more than necessary
at this late date seems like it just increases the scariness.  I think
this is an opportunity for further optimization, not a defect.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-06-02 Thread Masahiko Sawada
On Sat, May 7, 2016 at 5:40 AM, Robert Haas  wrote:
> On Wed, May 4, 2016 at 8:08 PM, Andres Freund  wrote:
>> On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
>>> 77a1d1e Department of second thoughts: remove PD_ALL_FROZEN.
>>
>> Nothing to say here.
>>
>>> fd31cd2 Don't vacuum all-frozen pages.
>>
>> Hm. I do wonder if it's going to bite us that we don't have a way to
>> actually force vacuuming of the whole table (besides manually rm'ing the
>> VM). I've more than once seen VACUUM used to try to do some integrity
>> checking of the database.  How are we actually going to test that the
>> feature works correctly? They'd have to write checks ontop of
>> pg_visibility to see whether things are borked.
>
> Let's add VACUUM (FORCE) or something like that.
>
>> /*
>>  * Compute whether we actually scanned the whole relation. If we 
>> did, we
>>  * can adjust relfrozenxid and relminmxid.
>>  *
>>  * NB: We need to check this before truncating the relation, because 
>> that
>>  * will change ->rel_pages.
>>  */
>>
>> Comment is out-of-date now.
>
> OK.

Fixed.

>> -   if (blkno == next_not_all_visible_block)
>> +   if (blkno == next_unskippable_block)
>> {
>> -   /* Time to advance next_not_all_visible_block */
>> -   for (next_not_all_visible_block++;
>> -next_not_all_visible_block < nblocks;
>> -next_not_all_visible_block++)
>> +   /* Time to advance next_unskippable_block */
>> +   for (next_unskippable_block++;
>> +next_unskippable_block < nblocks;
>> +next_unskippable_block++)
>>
>> Hm. So we continue with the course of re-processing pages, even if
>> they're marked all-frozen. For all-visible there at least can be a
>> benefit by freezing earlier, but for all-frozen pages there's really no
>> point.  I don't really buy the arguments for the skipping logic. But
>> even disregarding that, maybe we should skip processing a block if it's
>> all-frozen (without preventing the page from being read?); as there's no
>> possible benefit?  Acquring the exclusive/content lock and stuff is far
>> from free.
>
> I wanted to tinker with this logic as little as possible in the
> interest of ending up with something that worked.  I would not have
> written it this way.
>
>> Not really related to this patch, but the FORCE_CHECK_PAGE is rather
>> ugly.
>
> +1.
>> +   /*
>> +* The current block is potentially skippable; if 
>> we've seen a
>> +* long enough run of skippable blocks to justify 
>> skipping it, and
>> +* we're not forced to check it, then go ahead and 
>> skip.
>> +* Otherwise, the page must be at least all-visible 
>> if not
>> +* all-frozen, so we can set 
>> all_visible_according_to_vm = true.
>> +*/
>> +   if (skipping_blocks && !FORCE_CHECK_PAGE())
>> +   {
>> +   /*
>> +* Tricky, tricky.  If this is in aggressive 
>> vacuum, the page
>> +* must have been all-frozen at the time we 
>> checked whether it
>> +* was skippable, but it might not be any 
>> more.  We must be
>> +* careful to count it as a skipped 
>> all-frozen page in that
>> +* case, or else we'll think we can't update 
>> relfrozenxid and
>> +* relminmxid.  If it's not an aggressive 
>> vacuum, we don't
>> +* know whether it was all-frozen, so we 
>> have to recheck; but
>> +* in this case an approximate answer is OK.
>> +*/
>> +   if (aggressive || VM_ALL_FROZEN(onerel, 
>> blkno, ))
>> +   vacrelstats->frozenskipped_pages++;
>> continue;
>> +   }
>>
>> Hm. This indeed seems a bit tricky.  Not sure how to make it easier
>> though without just ripping out the SKIP_PAGES_THRESHOLD stuff.
>
> Yep, I had the same problem.
>> Hm. This also doubles the number of VM accesses. While I guess that's
>> not noticeable most of the time, it's still not nice; especially when a
>> large relation is entirely frozen, because it'll mean we'll sequentially
>> go through the visibilityma twice.
>
> Compared to what we're saving, that's obviously a trivial cost.
> That's not to say that we might not want to improve it, but it's
> hardly a disaster.
>
> In short: wah, wah, wah.
>

Attached 

Re: [HACKERS] Reviewing freeze map code

2016-06-01 Thread Masahiko Sawada
On Sat, May 7, 2016 at 5:34 AM, Robert Haas  wrote:
> On Mon, May 2, 2016 at 8:25 PM, Andres Freund  wrote:
>> + * heap_tuple_needs_eventual_freeze
>> + *
>> + * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
>> + * will eventually require freezing.  Similar to heap_tuple_needs_freeze,
>> + * but there's no cutoff, since we're trying to figure out whether freezing
>> + * will ever be needed, not whether it's needed now.
>> + */
>> +bool
>> +heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
>>
>> Wouldn't redefining this to heap_tuple_is_frozen() and then inverting the
>> checks be easier to understand?
>
> I thought it much safer to keep this as close to a copy of
> heap_tuple_needs_freeze() as possible.  Copying a function and
> inverting all of the return values is much more likely to introduce
> bugs, IME.

I agree.

>> +   /*
>> +* If xmax is a valid xact or multixact, this tuple is also not 
>> frozen.
>> +*/
>> +   if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
>> +   {
>> +   MultiXactId multi;
>> +
>> +   multi = HeapTupleHeaderGetRawXmax(tuple);
>> +   if (MultiXactIdIsValid(multi))
>> +   return true;
>> +   }
>>
>> Hm. What's the test inside the if() for? There shouldn't be any case
>> where xmax is invalid if HEAP_XMAX_IS_MULTI is set.   Now there's a
>> check like that outside of this commit, but it seems strange to me
>> (Alvaro, perhaps you could comment on this?).
>
> Here again I was copying existing code, with appropriate simplifications.
>
>> + *
>> + * Clearing both visibility map bits is not separately WAL-logged.  The 
>> callers
>>   * must make sure that whenever a bit is cleared, the bit is cleared on WAL
>>   * replay of the updating operation as well.
>>
>> I think including "both" here makes things less clear, because it
>> differentiates clearing one bit from clearing both. There's no practical
>> differentce atm, but still.
>
> I agree.

Fixed.

>>   *
>>   * VACUUM will normally skip pages for which the visibility map bit is set;
>>   * such pages can't contain any dead tuples and therefore don't need 
>> vacuuming.
>> - * The visibility map is not used for anti-wraparound vacuums, because
>> - * an anti-wraparound vacuum needs to freeze tuples and observe the latest 
>> xid
>> - * present in the table, even on pages that don't have any dead tuples.
>>   *
>>
>> I think the remaining sentence isn't entirely accurate, there's now more
>> than one bit, and they're different with regard to scan_all/!scan_all
>> vacuums (or will be - maybe this updated further in a later commit? But
>> if so, that sentence shouldn't yet be removed...).
>
> We can adjust the language, but I don't really see a big problem here.

This comment is not incorporate this patch so far.

>> -/* Number of heap blocks we can represent in one byte. */
>> -#define HEAPBLOCKS_PER_BYTE 8
>> -
>> Hm, why was this moved to the header? Sounds like something the outside
>> shouldn't care about.
>
> Oh... yeah.  Let's undo that.

Fixed.

>> #define HEAPBLK_TO_MAPBIT(x) (((x) % HEAPBLOCKS_PER_BYTE) * 
>> BITS_PER_HEAPBLOCK)
>>
>> Hm. This isn't really a mapping to an individual bit anymore - but I
>> don't really have a better name in mind. Maybe TO_OFFSET?
>
> Well, it sorta is... but we could change it, I suppose.
>
>> +static const uint8 number_of_ones_for_visible[256] = {
>> ...
>> +};
>> +static const uint8 number_of_ones_for_frozen[256] = {
>> ...
>>  };
>>
>> Did somebody verify the new contents are correct?
>
> I admit that I didn't.  It seemed like an unlikely place for a goof,
> but I guess we should verify.
>> /*
>> - * visibilitymap_clear - clear a bit in visibility map
>> + * visibilitymap_clear - clear all bits in visibility map
>>   *
>>
>> This seems rather easy to misunderstand, as this really only clears all
>> the bits for one page, not actually all the bits.
>
> We could change "in" to "for one page in the".

Fixed.

>>   * the bit for heapBlk, or InvalidBuffer. The caller is responsible for
>> - * releasing *buf after it's done testing and setting bits.
>> + * releasing *buf after it's done testing and setting bits, and must pass 
>> flags
>> + * for which it needs to check the value in visibility map.
>>   *
>>   * NOTE: This function is typically called without a lock on the heap page,
>>   * so somebody else could change the bit just after we look at it.  In fact,
>> @@ -327,17 +351,16 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, 
>> Buffer heapBuf,
>>
>> I'm not seing what flags the above comment change is referring to?
>
> Ugh.  I think that's leftover cruft from an earlier patch version that
> should have been excised from what got committed.

Fixed.

>> /*
>> -* A single-bit read is atomic.  There could be memory-ordering 
>> effects
>> +* A single byte read is atomic.  There could be 

Re: [HACKERS] Reviewing freeze map code

2016-05-31 Thread Robert Haas
On Sun, May 29, 2016 at 1:44 AM, Noah Misch  wrote:
> On Fri, May 06, 2016 at 04:42:48PM -0400, Robert Haas wrote:
>> On Thu, May 5, 2016 at 2:20 PM, Andres Freund  wrote:
>> > On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
>> > +   charnew_vmbuf[BLCKSZ];
>> > +   char   *new_cur = new_vmbuf;
>> > +   boolempty = true;
>> > +   boolold_lastpart;
>> > +
>> > +   /* Copy page header in advance */
>> > +   memcpy(new_vmbuf, , 
>> > SizeOfPageHeaderData);
>> >
>> > Shouldn't we zero out new_vmbuf? Afaics we're not necessarily zeroing it
>> > with old_lastpart && !empty, right?
>>
>> Oh, dear.  That seems like a possible data corruption bug.  Maybe we'd
>> better fix that right away (although I don't actually have time before
>> the wrap).
>
> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.

I am going to try to find time to look at this later this week, but
realistically it's going to be a little bit difficult to find that
time.  I was away over Memorial Day weekend and was in meetings most
of today.  I have a huge pile of email to catch up on.  I will send
another status update no later than Friday.  If Andres or anyone else
wants to jump in and fix this up meanwhile, that would be great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-30 Thread Michael Paquier
On Tue, May 31, 2016 at 4:40 AM, Jeff Janes  wrote:
> On Wed, May 18, 2016 at 3:57 PM, Alvaro Herrera
>  wrote:
>> Andres Freund wrote:
>>
>>>
>>> If we had a checking module for all this it'd possibly be sufficient,
>>> but we don't.
>>
>> Here's an idea.  We need core-blessed extensions (src/extensions/, you
>> know I've proposed this before), so why not take this opportunity to
>> create our first such and make it carry a function to scan a table
>> completely to do this task.
>>
>> Since we were considering a new VACUUM option, surely this is serious
>> enough to warrant more than just contrib.
>
> What does "core-blessed" mean?  The commit rights for contrib/ are the
> same as they are for src/

Personally I understand contrib/ modules as third-part plugins that
are considered as not enough mature to be part of src/backend or
src/bin, but one day they could become so. See pg_upgrade's recent
move for example. src/extensions/ includes third-part plugins that are
thought as useful, are part of the main server package, but are not
something that we want to enable by default.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-30 Thread Jeff Janes
On Wed, May 18, 2016 at 3:57 PM, Alvaro Herrera
 wrote:
> Andres Freund wrote:
>
>>
>> If we had a checking module for all this it'd possibly be sufficient,
>> but we don't.
>
> Here's an idea.  We need core-blessed extensions (src/extensions/, you
> know I've proposed this before), so why not take this opportunity to
> create our first such and make it carry a function to scan a table
> completely to do this task.
>
> Since we were considering a new VACUUM option, surely this is serious
> enough to warrant more than just contrib.

What does "core-blessed" mean?  The commit rights for contrib/ are the
same as they are for src/

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-29 Thread Masahiko Sawada
On Sun, May 29, 2016 at 2:44 PM, Noah Misch  wrote:
> On Fri, May 06, 2016 at 04:42:48PM -0400, Robert Haas wrote:
>> On Thu, May 5, 2016 at 2:20 PM, Andres Freund  wrote:
>> > On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
>> > +   charnew_vmbuf[BLCKSZ];
>> > +   char   *new_cur = new_vmbuf;
>> > +   boolempty = true;
>> > +   boolold_lastpart;
>> > +
>> > +   /* Copy page header in advance */
>> > +   memcpy(new_vmbuf, , 
>> > SizeOfPageHeaderData);
>> >
>> > Shouldn't we zero out new_vmbuf? Afaics we're not necessarily zeroing it
>> > with old_lastpart && !empty, right?
>>
>> Oh, dear.  That seems like a possible data corruption bug.  Maybe we'd
>> better fix that right away (although I don't actually have time before
>> the wrap).
>
> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.
>
> [1] 
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

Thank you for notification.

Regarding check tool for visibility map is still under the discussion.
I'm going to address other review comments, and send the patch ASAP.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-28 Thread Noah Misch
On Fri, May 06, 2016 at 04:42:48PM -0400, Robert Haas wrote:
> On Thu, May 5, 2016 at 2:20 PM, Andres Freund  wrote:
> > On 2016-05-02 14:48:18 -0700, Andres Freund wrote:
> > +   charnew_vmbuf[BLCKSZ];
> > +   char   *new_cur = new_vmbuf;
> > +   boolempty = true;
> > +   boolold_lastpart;
> > +
> > +   /* Copy page header in advance */
> > +   memcpy(new_vmbuf, , 
> > SizeOfPageHeaderData);
> >
> > Shouldn't we zero out new_vmbuf? Afaics we're not necessarily zeroing it
> > with old_lastpart && !empty, right?
> 
> Oh, dear.  That seems like a possible data corruption bug.  Maybe we'd
> better fix that right away (although I don't actually have time before
> the wrap).

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] 
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Peter Geoghegan
On Wed, May 18, 2016 at 3:57 PM, Alvaro Herrera
 wrote:
> Since we were considering a new VACUUM option, surely this is serious
> enough to warrant more than just contrib.

I would like to see us consider the long-term best place for amcheck's
functionality at the same time. Ideally, verification would be a
somewhat generic operation, with AM-specific code invoked as
appropriate.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Alvaro Herrera
Andres Freund wrote:

> > (AFAIK, "select count(*) from table" would offer a similar amount of
> > sanity checking as a full-table VACUUM scan does, so it's not like
> > we've removed functionality with no near-term replacement.)
> 
> I don't think that'd do anything comparable to
>   /*
>* As of PostgreSQL 9.2, the visibility map bit should never be 
> set if
>* the page-level bit is clear.  However, it's possible that 
> the bit
>* got cleared after we checked it and before we took the buffer
>* content lock, so we must recheck before jumping to the 
> conclusion
>* that something bad has happened.
>*/
>   else if (all_visible_according_to_vm && !PageIsAllVisible(page)
>&& VM_ALL_VISIBLE(onerel, blkno, ))
>   {
>   elog(WARNING, "page is not marked all-visible but 
> visibility map bit is set in relation \"%s\" page %u",
>relname, blkno);
>   visibilitymap_clear(onerel, blkno, vmbuffer);
>   }
> 
> If we had a checking module for all this it'd possibly be sufficient,
> but we don't.

Here's an idea.  We need core-blessed extensions (src/extensions/, you
know I've proposed this before), so why not take this opportunity to
create our first such and make it carry a function to scan a table
completely to do this task.

Since we were considering a new VACUUM option, surely this is serious
enough to warrant more than just contrib.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Andres Freund
On 2016-05-18 18:42:16 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-05-18 18:25:39 -0400, Tom Lane wrote:
> >> Yes, I've been wondering that too.  VACUUM is not meant as a corruption
> >> checker, and should not be made into one, so what is the point of this
> >> flag exactly?
> 
> > Well, so far a VACUUM FREEZE (or just setting vacuum_freeze_table_age =
> > 0) verified the correctness of the visibility map; and that found a
> > number of bugs.  Now visibilitymap grew additional responsibilities,
> > with a noticeable risk of data eating bugs, and there's no way to verify
> > whether visibilitymap's frozen bits are set correctly.
> 
> Meh.  I'm not sure we should grow a rather half-baked feature we'll never
> be able to remove as a substitute for a separate sanity checker.  The
> latter is really the right place for this kind of thing.

It's not a new feature, it's a feature we removed as a side effect. And
one that allows us to evaluate whether the new feature actually works.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Tom Lane
Andres Freund  writes:
> On 2016-05-18 18:25:39 -0400, Tom Lane wrote:
>> Yes, I've been wondering that too.  VACUUM is not meant as a corruption
>> checker, and should not be made into one, so what is the point of this
>> flag exactly?

> Well, so far a VACUUM FREEZE (or just setting vacuum_freeze_table_age =
> 0) verified the correctness of the visibility map; and that found a
> number of bugs.  Now visibilitymap grew additional responsibilities,
> with a noticeable risk of data eating bugs, and there's no way to verify
> whether visibilitymap's frozen bits are set correctly.

Meh.  I'm not sure we should grow a rather half-baked feature we'll never
be able to remove as a substitute for a separate sanity checker.  The
latter is really the right place for this kind of thing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Andres Freund
On 2016-05-18 18:25:39 -0400, Tom Lane wrote:
> Josh berkus  writes:
> > Maybe this is the wrong perspective.  I mean, is there a reason we even
> > need this option, other than a lack of any other way to do a full table
> > scan to check for corruption, etc.?  If we're only doing this for
> > integrity checking, then maybe it's better if it becomes a function,
> > which could be later extended with additional forensic features?
> 
> Yes, I've been wondering that too.  VACUUM is not meant as a corruption
> checker, and should not be made into one, so what is the point of this
> flag exactly?

Well, so far a VACUUM FREEZE (or just setting vacuum_freeze_table_age =
0) verified the correctness of the visibility map; and that found a
number of bugs.  Now visibilitymap grew additional responsibilities,
with a noticeable risk of data eating bugs, and there's no way to verify
whether visibilitymap's frozen bits are set correctly.


> (AFAIK, "select count(*) from table" would offer a similar amount of
> sanity checking as a full-table VACUUM scan does, so it's not like
> we've removed functionality with no near-term replacement.)

I don't think that'd do anything comparable to
/*
 * As of PostgreSQL 9.2, the visibility map bit should never be 
set if
 * the page-level bit is clear.  However, it's possible that 
the bit
 * got cleared after we checked it and before we took the buffer
 * content lock, so we must recheck before jumping to the 
conclusion
 * that something bad has happened.
 */
else if (all_visible_according_to_vm && !PageIsAllVisible(page)
 && VM_ALL_VISIBLE(onerel, blkno, ))
{
elog(WARNING, "page is not marked all-visible but 
visibility map bit is set in relation \"%s\" page %u",
 relname, blkno);
visibilitymap_clear(onerel, blkno, vmbuffer);
}

If we had a checking module for all this it'd possibly be sufficient,
but we don't.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Tom Lane
Josh berkus  writes:
> Maybe this is the wrong perspective.  I mean, is there a reason we even
> need this option, other than a lack of any other way to do a full table
> scan to check for corruption, etc.?  If we're only doing this for
> integrity checking, then maybe it's better if it becomes a function,
> which could be later extended with additional forensic features?

Yes, I've been wondering that too.  VACUUM is not meant as a corruption
checker, and should not be made into one, so what is the point of this
flag exactly?

(AFAIK, "select count(*) from table" would offer a similar amount of
sanity checking as a full-table VACUUM scan does, so it's not like
we've removed functionality with no near-term replacement.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Josh berkus
On 05/18/2016 03:51 PM, Peter Geoghegan wrote:
> On Wed, May 18, 2016 at 8:52 AM, Jeff Janes  wrote:
>> How about going with something that says more about why we are doing
>> it, rather than trying to describe in one or two words what it is
>> doing?
>>
>> VACUUM (FORENSIC)
>>
>> VACUUM (DEBUG)
>>
>> VACUUM (LINT)
> 
> +1

Maybe this is the wrong perspective.  I mean, is there a reason we even
need this option, other than a lack of any other way to do a full table
scan to check for corruption, etc.?  If we're only doing this for
integrity checking, then maybe it's better if it becomes a function,
which could be later extended with additional forensic features?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Peter Geoghegan
On Wed, May 18, 2016 at 8:52 AM, Jeff Janes  wrote:
> How about going with something that says more about why we are doing
> it, rather than trying to describe in one or two words what it is
> doing?
>
> VACUUM (FORENSIC)
>
> VACUUM (DEBUG)
>
> VACUUM (LINT)

+1


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Jeff Janes
On Wed, May 18, 2016 at 7:09 AM, Joe Conway  wrote:
> On 05/18/2016 09:55 AM, Victor Yegorov wrote:
>> 2016-05-18 16:45 GMT+03:00 Robert Haas > >:
>>
>> No, that's what the existing FREEZE option does.  This new option is
>> about unnecessarily vacuuming pages that don't need it.  The
>> expectation is that vacuuming all-frozen pages will be a no-op.
>>
>>
>> VACUUM (INCLUDING ALL) ?
>
> VACUUM (FORCE ALL) ?


How about going with something that says more about why we are doing
it, rather than trying to describe in one or two words what it is
doing?

VACUUM (FORENSIC)

VACUUM (DEBUG)

VACUUM (LINT)

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Joe Conway
On 05/18/2016 09:55 AM, Victor Yegorov wrote:
> 2016-05-18 16:45 GMT+03:00 Robert Haas  >:
> 
> No, that's what the existing FREEZE option does.  This new option is
> about unnecessarily vacuuming pages that don't need it.  The
> expectation is that vacuuming all-frozen pages will be a no-op.
> 
> 
> VACUUM (INCLUDING ALL) ?

VACUUM (FORCE ALL) ?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Victor Yegorov
2016-05-18 16:45 GMT+03:00 Robert Haas :

> No, that's what the existing FREEZE option does.  This new option is
> about unnecessarily vacuuming pages that don't need it.  The
> expectation is that vacuuming all-frozen pages will be a no-op.
>

VACUUM (INCLUDING ALL) ?


-- 
Victor Y. Yegorov


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Robert Haas
On Wed, May 18, 2016 at 9:42 AM, Joshua D. Drake  wrote:
>> It's not a bad thought, but I do think it might be a bit confusing.
>> My main priority for this new option is that people aren't tempted to
>> use it very often, and I think a name like "even_frozen_pages" is more
>> likely to accomplish that than just "frozen".
>
> freeze_all_pages?

No, that's what the existing FREEZE option does.  This new option is
about unnecessarily vacuuming pages that don't need it.  The
expectation is that vacuuming all-frozen pages will be a no-op.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Joshua D. Drake

On 05/18/2016 05:51 AM, Robert Haas wrote:

On Wed, May 18, 2016 at 8:41 AM, David Steele  wrote:

I think we should give this a name that hints more strongly at this
being an exceptional thing, like vacuum (even_frozen_pages).


How about just FROZEN?  Perhaps it's too confusing to have that and FREEZE,
but I thought I would throw it out there.


It's not a bad thought, but I do think it might be a bit confusing.
My main priority for this new option is that people aren't tempted to
use it very often, and I think a name like "even_frozen_pages" is more
likely to accomplish that than just "frozen".



freeze_all_pages?

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Robert Haas
On Wed, May 18, 2016 at 8:41 AM, David Steele  wrote:
>> I think we should give this a name that hints more strongly at this
>> being an exceptional thing, like vacuum (even_frozen_pages).
>
> How about just FROZEN?  Perhaps it's too confusing to have that and FREEZE,
> but I thought I would throw it out there.

It's not a bad thought, but I do think it might be a bit confusing.
My main priority for this new option is that people aren't tempted to
use it very often, and I think a name like "even_frozen_pages" is more
likely to accomplish that than just "frozen".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread David Steele

On 5/18/16 6:37 AM, Robert Haas wrote:

On Tue, May 17, 2016 at 5:47 PM, Gavin Flower
 wrote:

On 18/05/16 09:34, Vik Fearing wrote:

On 17/05/16 21:32, Alvaro Herrera wrote:


Is SCAN_ALL really the best we can do here?  The business of having an
underscore in an option name has no precedent (other than
CURRENT_DATABASE and the like).


ALTER DATABASE has options for ALLOW_CONNECTIONS, CONNECTION_LIMIT, and
IS_TEMPLATE.


How about COMPLETE, TOTAL, or WHOLE?


Sure, I'll play this game.  I like EXHAUSTIVE.


I prefer 'WHOLE', as it seems more obvious (and not because of the pun
relating to 'wholesomeness'!!!)


I think that users might believe that they need VACUUM (WHOLE) a lot
more often than they will actually need this option.  "Of course I
want to vacuum my whole table!"

I think we should give this a name that hints more strongly at this
being an exceptional thing, like vacuum (even_frozen_pages).


How about just FROZEN?  Perhaps it's too confusing to have that and 
FREEZE, but I thought I would throw it out there.


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-18 Thread Robert Haas
On Tue, May 17, 2016 at 5:47 PM, Gavin Flower
 wrote:
> On 18/05/16 09:34, Vik Fearing wrote:
>> On 17/05/16 21:32, Alvaro Herrera wrote:
>>>
>>> Is SCAN_ALL really the best we can do here?  The business of having an
>>> underscore in an option name has no precedent (other than
>>> CURRENT_DATABASE and the like).
>>
>> ALTER DATABASE has options for ALLOW_CONNECTIONS, CONNECTION_LIMIT, and
>> IS_TEMPLATE.
>>
>>> How about COMPLETE, TOTAL, or WHOLE?
>>
>> Sure, I'll play this game.  I like EXHAUSTIVE.
>
> I prefer 'WHOLE', as it seems more obvious (and not because of the pun
> relating to 'wholesomeness'!!!)

I think that users might believe that they need VACUUM (WHOLE) a lot
more often than they will actually need this option.  "Of course I
want to vacuum my whole table!"

I think we should give this a name that hints more strongly at this
being an exceptional thing, like vacuum (even_frozen_pages).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reviewing freeze map code

2016-05-17 Thread Gavin Flower

On 18/05/16 09:34, Vik Fearing wrote:

On 17/05/16 21:32, Alvaro Herrera wrote:

Is SCAN_ALL really the best we can do here?  The business of having an
underscore in an option name has no precedent (other than
CURRENT_DATABASE and the like).

ALTER DATABASE has options for ALLOW_CONNECTIONS, CONNECTION_LIMIT, and
IS_TEMPLATE.


How about COMPLETE, TOTAL, or WHOLE?

Sure, I'll play this game.  I like EXHAUSTIVE.


I prefer 'WHOLE', as it seems more obvious (and not because of the pun 
relating to 'wholesomeness'!!!)




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


<    1   2   3   >