Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Andres Freund
Hi,

On 2014-01-02 05:26:26 +, Greg Stark wrote:
 2) refetching a row could conceivably end up retrieving different data than
 was present when the row was originally read. (In some cases that might
 actually be the intended behaviour)

That's possible with system columns as well. In the normal cases we'll
only have copied the HeapTuple, not the HeapTupleHeader, so it will be
re-fetched from the (pinned) buffer.

 If this came up earlier I'm sorry but I suppose it's too hard to have a
 function like foo(tab.*) which magically can tell that the record is a heap
 tuple and look in the header? And presumably throw an error if passed a non
 heap tuple.

I don't see how that could be a good API. What happens if you have two
relations in a query?
Even if that wouldn't be a query, why would this be a helpful? Seems
like a poor reinvention of system columns.

Andres

PS: Could you not always include the full quoted message below your --
signature?

Greetings,

Andres Freund

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Robert Haas
On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark st...@mit.edu wrote:
 I fail to see why.  What is so ugly about this:

 select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;

 Two points:

 1) it's a bit weird to go to this effort to eliminate system columns by
 using a scheme that depends on having a system column -- ctid

 2) refetching a row could conceivably end up retrieving different data than
 was present when the row was originally read. (In some cases that might
 actually be the intended behaviour)

 If this came up earlier I'm sorry but I suppose it's too hard to have a
 function like foo(tab.*) which magically can tell that the record is a heap
 tuple and look in the header? And presumably throw an error if passed a non
 heap tuple.

Yeah, that was tried.  Doesn't work:

http://www.postgresql.org/message-id/ca+tgmozwzzw_wr8adhwc8iejweihxlpxranqyrfb8mw0sch...@mail.gmail.com

At any rate, my goal isn't really to get rid of system columns, but to
address Jim Nasby's gripe that the changes thus far committed make it
difficult to use system columns to determine whether a given tuple has
been frozen.  It sounds like the consensus is that a system column is
a better way of doing that than a function, so I suppose the next step
is to try to hammer out the details.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Andres Freund
On 2014-01-02 07:35:11 -0500, Robert Haas wrote:
 On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark st...@mit.edu wrote:
  I fail to see why.  What is so ugly about this:
 
  select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;
 
  Two points:
 
  1) it's a bit weird to go to this effort to eliminate system columns by
  using a scheme that depends on having a system column -- ctid
 
  2) refetching a row could conceivably end up retrieving different data than
  was present when the row was originally read. (In some cases that might
  actually be the intended behaviour)
 
  If this came up earlier I'm sorry but I suppose it's too hard to have a
  function like foo(tab.*) which magically can tell that the record is a heap
  tuple and look in the header? And presumably throw an error if passed a non
  heap tuple.
 
 Yeah, that was tried.  Doesn't work:
 
 http://www.postgresql.org/message-id/ca+tgmozwzzw_wr8adhwc8iejweihxlpxranqyrfb8mw0sch...@mail.gmail.com
 
 At any rate, my goal isn't really to get rid of system columns, but to
 address Jim Nasby's gripe that the changes thus far committed make it
 difficult to use system columns to determine whether a given tuple has
 been frozen.  It sounds like the consensus is that a system column is
 a better way of doing that than a function, so I suppose the next step
 is to try to hammer out the details.

So, I think something like my POC patch would need to get in before we
can go there, right? I won't be able to work on it in the next 10days or
so, there's a bunch of stuff that's more pressing unfortunately. There's
a fair bit of work left there...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Robert Haas
On Thu, Jan 2, 2014 at 7:40 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-01-02 07:35:11 -0500, Robert Haas wrote:
 On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark st...@mit.edu wrote:
  I fail to see why.  What is so ugly about this:
 
  select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;
 
  Two points:
 
  1) it's a bit weird to go to this effort to eliminate system columns by
  using a scheme that depends on having a system column -- ctid
 
  2) refetching a row could conceivably end up retrieving different data than
  was present when the row was originally read. (In some cases that might
  actually be the intended behaviour)
 
  If this came up earlier I'm sorry but I suppose it's too hard to have a
  function like foo(tab.*) which magically can tell that the record is a heap
  tuple and look in the header? And presumably throw an error if passed a non
  heap tuple.

 Yeah, that was tried.  Doesn't work:

 http://www.postgresql.org/message-id/ca+tgmozwzzw_wr8adhwc8iejweihxlpxranqyrfb8mw0sch...@mail.gmail.com

 At any rate, my goal isn't really to get rid of system columns, but to
 address Jim Nasby's gripe that the changes thus far committed make it
 difficult to use system columns to determine whether a given tuple has
 been frozen.  It sounds like the consensus is that a system column is
 a better way of doing that than a function, so I suppose the next step
 is to try to hammer out the details.

 So, I think something like my POC patch would need to get in before we
 can go there, right? I won't be able to work on it in the next 10days or
 so, there's a bunch of stuff that's more pressing unfortunately. There's
 a fair bit of work left there...

Well, that's the question, I guess.  I think the options are:

1. Add pg_infomask now.  Leave your patch for a future optimization.
Suck up the catalog bloat.
2. Wait for your patch to be done.  Then add pg_infomask afterwards.
Accept that it may not make 9.4, or else accept that we may have to
accept that patch after the nominal deadline.
3. Decide that exposing the frozen status of tuples is not a priority
and just don't worry about it.

One concern I have is that if we spend too much time now worrying
about these system column problems, it may reduce the amount of
optimization that gets done on top of this optimization in the 9.4
time frame.  And that, I think, would be sad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark st...@mit.edu wrote:
 1) it's a bit weird to go to this effort to eliminate system columns by
 using a scheme that depends on having a system column -- ctid

 At any rate, my goal isn't really to get rid of system columns, but to
 address Jim Nasby's gripe that the changes thus far committed make it
 difficult to use system columns to determine whether a given tuple has
 been frozen.  It sounds like the consensus is that a system column is
 a better way of doing that than a function, so I suppose the next step
 is to try to hammer out the details.

Actually, I thought the function approach was a good proposal.  You are
right that func(tab.*) isn't going to work, because it's going to get a
Datum-ified tuple not a pointer to raw on-disk storage.  But an inspection
function that's handed a ctid could work.

I don't think that Greg's objection above has much force.  The fact of the
matter is that we are never going to be able to get rid of the exposed
tableoid, oid, or ctid columns, because there are too many situations
where those are useful, and are being used, in actual client-application
queries.  This is less true however of xmin/xmax/cmin/cmax; we might hope
to deprecate those at the SQL level someday, especially in view of the
fact that we keep whacking around their detailed meanings, with few user
complaints.

And I really really *don't* want to see us bloating pg_attribute, nor
infringing further on application column namespace, by adding even more
exposed columns.  So -1 for just blindly doing that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Andres Freund
On 2014-01-02 09:40:54 -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark st...@mit.edu wrote:
  1) it's a bit weird to go to this effort to eliminate system columns by
  using a scheme that depends on having a system column -- ctid
 
  At any rate, my goal isn't really to get rid of system columns, but to
  address Jim Nasby's gripe that the changes thus far committed make it
  difficult to use system columns to determine whether a given tuple has
  been frozen.  It sounds like the consensus is that a system column is
  a better way of doing that than a function, so I suppose the next step
  is to try to hammer out the details.
 
 Actually, I thought the function approach was a good proposal.  You are
 right that func(tab.*) isn't going to work, because it's going to get a
 Datum-ified tuple not a pointer to raw on-disk storage.  But an inspection
 function that's handed a ctid could work.

Well, we discussed that upthread, and the overhead of going through a
function is quite noticeable because the tuple needs to be fetched from
the heap again.
Check 
http://archives.postgresql.org/message-id/CA%2BTgmoYdWOZa_UiewbqE73AQCM2etpMjukkcmYPkGatH8kPTow%40mail.gmail.com

To get rid of part of the performance problem, we could possibly invent
a way to pass a HeapTuple instead of a blessed HeapTupleHeader to
functions, but that might not be so easy.

 And I really really *don't* want to see us bloating pg_attribute, nor
 infringing further on application column namespace, by adding even more
 exposed columns.  So -1 for just blindly doing that.

Upthread there's a POC patch of mine, that started to explore what's
necessary to simply never store system columns (except maybe oid) in
pg_attribute. While it passes the regression tests it's not complete,
but the amount of work looks reasonable. Check
http://archives.postgresql.org/message-id/20131223124504.GB19795%40alap2.anarazel.de

Now, that obviously doesn't get rid of the namespace problem. I'd
suggested a system: prefix, Robert _pg_. Neither is pretty, but imo
should work.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-02 09:40:54 -0500, Tom Lane wrote:
 Actually, I thought the function approach was a good proposal.  You are
 right that func(tab.*) isn't going to work, because it's going to get a
 Datum-ified tuple not a pointer to raw on-disk storage.  But an inspection
 function that's handed a ctid could work.

 Well, we discussed that upthread, and the overhead of going through a
 function is quite noticeable because the tuple needs to be fetched from
 the heap again.

Yeah, I read those results, but that seems like it could probably be
optimized.  I'm guessing the function was doing a new heap_open every
time?  There's probably a way around that.

In any case, upon further reflection I'm not convinced that doing this
with a SELECT-based query is the right thing, no matter whether the query
looks at a function or a system column; because by definition, you'll only
be able to see tuples that are visible to your current snapshot.  For real
forensics work, you need to be able to see all tuples, which makes me
think that something akin to pgstattuple is the right API; that is return
a set of the header info for all tuples on such-and-such pages of this
relation.  That should dodge any performance problem, because the
heap_open overhead could be amortized across lots of tuples, and it also
sidesteps all problems with adding new system columns.

 Upthread there's a POC patch of mine, that started to explore what's
 necessary to simply never store system columns (except maybe oid) in
 pg_attribute. While it passes the regression tests it's not complete,
 but the amount of work looks reasonable.

I think this will inevitably break a lot of code, not all of it ours,
so I'm not in favor of pursuing that direction.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Robert Haas
On Thu, Jan 2, 2014 at 12:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-02 09:40:54 -0500, Tom Lane wrote:
 Actually, I thought the function approach was a good proposal.  You are
 right that func(tab.*) isn't going to work, because it's going to get a
 Datum-ified tuple not a pointer to raw on-disk storage.  But an inspection
 function that's handed a ctid could work.

 Well, we discussed that upthread, and the overhead of going through a
 function is quite noticeable because the tuple needs to be fetched from
 the heap again.

 Yeah, I read those results, but that seems like it could probably be
 optimized.  I'm guessing the function was doing a new heap_open every
 time?  There's probably a way around that.

Yeah, it was.  I don't see any plausible way around that, but I'm all ears.

 In any case, upon further reflection I'm not convinced that doing this
 with a SELECT-based query is the right thing, no matter whether the query
 looks at a function or a system column; because by definition, you'll only
 be able to see tuples that are visible to your current snapshot.  For real
 forensics work, you need to be able to see all tuples, which makes me
 think that something akin to pgstattuple is the right API; that is return
 a set of the header info for all tuples on such-and-such pages of this
 relation.  That should dodge any performance problem, because the
 heap_open overhead could be amortized across lots of tuples, and it also
 sidesteps all problems with adding new system columns.

I both agree and disagree with this.  I think that pgstattuple is
useful, and I further agree that adding a stat to it about how much of
the heap is frozen would be worthwhile.  However, an aggregate number
isn't always what you want, and being able to scrutinize specific
tuples is, I think, a useful thing.  I'm not sure that it needs to be
fast, which is why I think a function rather than a system column
might be OK, but I think it ought to be possible.

 Upthread there's a POC patch of mine, that started to explore what's
 necessary to simply never store system columns (except maybe oid) in
 pg_attribute. While it passes the regression tests it's not complete,
 but the amount of work looks reasonable.

 I think this will inevitably break a lot of code, not all of it ours,
 so I'm not in favor of pursuing that direction.

I thought that that approach had been discussed previously and found
desirable on the grounds that it would cut down on the size of
pg_attribute, but it's not something I want to rush through when we
have a release to get out the door.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 2, 2014 at 12:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In any case, upon further reflection I'm not convinced that doing this
 with a SELECT-based query is the right thing, no matter whether the query
 looks at a function or a system column; because by definition, you'll only
 be able to see tuples that are visible to your current snapshot.  For real
 forensics work, you need to be able to see all tuples, which makes me
 think that something akin to pgstattuple is the right API; that is return
 a set of the header info for all tuples on such-and-such pages of this
 relation.  That should dodge any performance problem, because the
 heap_open overhead could be amortized across lots of tuples, and it also
 sidesteps all problems with adding new system columns.

 I both agree and disagree with this.  I think that pgstattuple is
 useful, and I further agree that adding a stat to it about how much of
 the heap is frozen would be worthwhile.  However, an aggregate number
 isn't always what you want, and being able to scrutinize specific
 tuples is, I think, a useful thing.

Oh, I guess I referenced the wrong contrib module, because what I was
trying to propose is a function that returns a setof record, one row for
each physical tuple it finds.  There are some functions in
contrib/pageinspect that work like that, but not pgstattuple.  I don't
think pageinspect's API is ideal because it's tightly tied to processing
one page at a time, but it does show information a bit like what we need
here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Andres Freund
On 2014-01-02 12:46:34 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-01-02 09:40:54 -0500, Tom Lane wrote:
  Actually, I thought the function approach was a good proposal.  You are
  right that func(tab.*) isn't going to work, because it's going to get a
  Datum-ified tuple not a pointer to raw on-disk storage.  But an inspection
  function that's handed a ctid could work.
 
  Well, we discussed that upthread, and the overhead of going through a
  function is quite noticeable because the tuple needs to be fetched from
  the heap again.
 
 Yeah, I read those results, but that seems like it could probably be
 optimized.  I'm guessing the function was doing a new heap_open every
 time?  There's probably a way around that.

How? Everything I can think of is either fragile or too ugly.

 In any case, upon further reflection I'm not convinced that doing this
 with a SELECT-based query is the right thing, no matter whether the query
 looks at a function or a system column; because by definition, you'll only
 be able to see tuples that are visible to your current snapshot.

I think it really depends - quite often you really want to just
investigate tuples you actually can see, because you can easily write
normal queries and such. It sure wouldn't replace pageinspect.

  For real
 forensics work, you need to be able to see all tuples, which makes me
 think that something akin to pgstattuple is the right API; that is return
 a set of the header info for all tuples on such-and-such pages of this
 relation.  That should dodge any performance problem, because the
 heap_open overhead could be amortized across lots of tuples, and it also
 sidesteps all problems with adding new system columns.

The biggest problem with such an API is that it's painful to use - I've
used pageinspect a fair bit, and not being able to easily get the
content of the rows you're looking at makes it really far less useful in
many scenarios. That could partially be improved by a neater API
(returning t_self would help greatly, accepting a page without a wrapper
function as well), but it still pretty fundamentally sucks.

And I really don't see any page-at-a-time access that's going to be
convenient. A big step would be returning the tuple's contents in a
normal fashion, but that's not easy without big notational overhead.

  Upthread there's a POC patch of mine, that started to explore what's
  necessary to simply never store system columns (except maybe oid) in
  pg_attribute. While it passes the regression tests it's not complete,
  but the amount of work looks reasonable.
 
 I think this will inevitably break a lot of code, not all of it ours,
 so I'm not in favor of pursuing that direction.

Are you thinking of client or extension code? From what I've looked at I
don't think it's all that likely too break much of either. Extension
code isn't likely to do its own catalog lookups and thus doesn't really
care. Most client code isn't interested in system catalog attribute
numbers - the few examples of client code looking at pg_attribute I
remember all explicitly excluded attnum  0.

It would make pg_attribute a fair bit smaller, especially on systems
with lots of narrow relations. There's also the possibility of only
going that route for new system catalog attributes, and leave the old
ones where they are and removed them in a release or two.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Robert Haas
On Thu, Jan 2, 2014 at 2:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I both agree and disagree with this.  I think that pgstattuple is
 useful, and I further agree that adding a stat to it about how much of
 the heap is frozen would be worthwhile.  However, an aggregate number
 isn't always what you want, and being able to scrutinize specific
 tuples is, I think, a useful thing.

 Oh, I guess I referenced the wrong contrib module, because what I was
 trying to propose is a function that returns a setof record, one row for
 each physical tuple it finds.  There are some functions in
 contrib/pageinspect that work like that, but not pgstattuple.  I don't
 think pageinspect's API is ideal because it's tightly tied to processing
 one page at a time, but it does show information a bit like what we need
 here.

Sure, Álvaro mentioned the same thing upthread.  Also, if you notice,
the function I was proposing to add does basically the same thing as
pageinsect, except one tuple at a time rather than one page at a time.
 I think both are useful, though.  pageinspect is superuser-only, and
needing work by pages isn't always convenient.  I thought about making
the pg_tuple_header() function I proposed scan using SnapshotAny
rather than the active snapshot, but then I think it'd need to be
superuser-only.  I also thought about making it use SnapshotAny for
superusers and the active snapshot for everybody else, but that seemed
like it could be confusing.

We could certainly add a function that returns SETOF record, taking
e.g. regclass as an argument, but it doesn't seem a stretch to me to
think that you might want to get tuple header information for some but
not all tuples in the relation, and I don't see any real good way to
tell the function exactly what tuples you want except by invoking it
once per TID.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-02 12:46:34 -0500, Tom Lane wrote:
 For real
 forensics work, you need to be able to see all tuples, which makes me
 think that something akin to pgstattuple is the right API; that is return
 a set of the header info for all tuples on such-and-such pages of this
 relation.  That should dodge any performance problem, because the
 heap_open overhead could be amortized across lots of tuples, and it also
 sidesteps all problems with adding new system columns.

 The biggest problem with such an API is that it's painful to use - I've
 used pageinspect a fair bit, and not being able to easily get the
 content of the rows you're looking at makes it really far less useful in
 many scenarios. That could partially be improved by a neater API

Surely.  Why couldn't you join against the table on ctid?

 And I really don't see any page-at-a-time access that's going to be
 convenient.

As I commented to Robert, the page-at-a-time behavior of pageinspect
is not an API detail we'd want to copy for this.  I envision something
like

   select hdr.*, foo.*
 from tuple_header_details('foo'::regclass) as hdr
  left join foo on hdr.ctid = foo.ctid;

On a large table you might want a version that restricts its scan
to pages M through N, but that's just optimization.  More useful
would be to improve the planner's intelligence about joins on ctid ...

 [ removing system columns from pg_attribute ]]
 I think this will inevitably break a lot of code, not all of it ours,
 so I'm not in favor of pursuing that direction.

 Are you thinking of client or extension code? From what I've looked at I
 don't think it's all that likely too break much of either.

It will break anything that assumes that every column is represented in
pg_attribute.  I think if you think this assumption is easily removed,
you've not looked hard enough.

 It would make pg_attribute a fair bit smaller, especially on systems
 with lots of narrow relations.

I'd like to do that too, but I think getting rid of xmin/xmax/cmin/cmax
would be enough to get most of the benefit, and we could do that without
any inconsistency if we stopped exposing those as system columns.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Robert Haas
On Thu, Jan 2, 2014 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 As I commented to Robert, the page-at-a-time behavior of pageinspect
 is not an API detail we'd want to copy for this.  I envision something
 like

select hdr.*, foo.*
  from tuple_header_details('foo'::regclass) as hdr
   left join foo on hdr.ctid = foo.ctid;

 On a large table you might want a version that restricts its scan
 to pages M through N, but that's just optimization.  More useful
 would be to improve the planner's intelligence about joins on ctid ...

/me blinks.

Surely you're not serious.  That's going to scan the whole darn table
even if we only want the details for one row.  And making the planner
smarter about joins on CTID doesn't help a bit, unless you can push
the join qual down *inside the tuple_header_details() function call*.
That'd be pretty a pretty sweet thing to allow for SRFs in general,
but it doesn't sound like something we're going to conjure up at the
drop of a hat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 We could certainly add a function that returns SETOF record, taking
 e.g. regclass as an argument, but it doesn't seem a stretch to me to
 think that you might want to get tuple header information for some but
 not all tuples in the relation, and I don't see any real good way to
 tell the function exactly what tuples you want except by invoking it
 once per TID.

I have no objection to having a function that retrieves the details for
a given TID alongside one that does it for a whole relation.  The point
here is just that we should be headed in the direction of removing as
many system columns as we can, not adding more; especially not ones that
(a) have no purpose except forensics and (b) are virtually certain to
change across system versions.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 2, 2014 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 As I commented to Robert, the page-at-a-time behavior of pageinspect
 is not an API detail we'd want to copy for this.  I envision something
 like
 
 select hdr.*, foo.*
 from tuple_header_details('foo'::regclass) as hdr
 left join foo on hdr.ctid = foo.ctid;
 
 On a large table you might want a version that restricts its scan
 to pages M through N, but that's just optimization.  More useful
 would be to improve the planner's intelligence about joins on ctid ...

 /me blinks.

 Surely you're not serious.  That's going to scan the whole darn table
 even if we only want the details for one row.

And?  The complaint about the other function was that it was inefficient
when getting the details for a whole table, so I don't think you can
complain about an approach that handles that case well.  Use the other
function if you just want info for one row (that you can see).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Robert Haas
On Thu, Jan 2, 2014 at 2:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 2, 2014 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 As I commented to Robert, the page-at-a-time behavior of pageinspect
 is not an API detail we'd want to copy for this.  I envision something
 like

 select hdr.*, foo.*
 from tuple_header_details('foo'::regclass) as hdr
 left join foo on hdr.ctid = foo.ctid;

 On a large table you might want a version that restricts its scan
 to pages M through N, but that's just optimization.  More useful
 would be to improve the planner's intelligence about joins on ctid ...

 /me blinks.

 Surely you're not serious.  That's going to scan the whole darn table
 even if we only want the details for one row.

 And?  The complaint about the other function was that it was inefficient
 when getting the details for a whole table, so I don't think you can
 complain about an approach that handles that case well.  Use the other
 function if you just want info for one row (that you can see).

Well, that's fair enough.  I don't mind having two functions.  Should
the whole-table function also include invisible tuples?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Well, that's fair enough.  I don't mind having two functions.  Should
 the whole-table function also include invisible tuples?

Certainly, that's exactly why I was proposing it.  You can do a join
if you want to suppress them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-02 Thread Andres Freund
On 2014-01-02 14:44:34 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-01-02 12:46:34 -0500, Tom Lane wrote:
  For real
  forensics work, you need to be able to see all tuples, which makes me
  think that something akin to pgstattuple is the right API; that is return
  a set of the header info for all tuples on such-and-such pages of this
  relation.  That should dodge any performance problem, because the
  heap_open overhead could be amortized across lots of tuples, and it also
  sidesteps all problems with adding new system columns.
 
  The biggest problem with such an API is that it's painful to use - I've
  used pageinspect a fair bit, and not being able to easily get the
  content of the rows you're looking at makes it really far less useful in
  many scenarios. That could partially be improved by a neater API
 
 Surely.  Why couldn't you join against the table on ctid?

For the case of pageinspect it's because pageinspect doesn't return the
ctid of a tuple in a useful way - its t_ctid is HeapTupleHeader-t_ctid,
not HeapTuple-t_self...
In many cases bulk access really isn't all that useful - you do a SELECT
searching for data that's looking strange and then need the forensic
data for those. That's just painful with any of the proposed fast APIs
afaics.

  And I really don't see any page-at-a-time access that's going to be
  convenient.
 
 As I commented to Robert, the page-at-a-time behavior of pageinspect
 is not an API detail we'd want to copy for this.  I envision something
 like
 
select hdr.*, foo.*
  from tuple_header_details('foo'::regclass) as hdr
   left join foo on hdr.ctid = foo.ctid;
 
 On a large table you might want a version that restricts its scan
 to pages M through N, but that's just optimization.  More useful
 would be to improve the planner's intelligence about joins on ctid ...

That really makes for but ugly queries. E.g. the database I found the
multixact bugs on was ~300GB and I had to look about 80GB of it. So I
would have had to write chunking code for individual tables. Not what
you want to do when shit has hit the fan.

  [ removing system columns from pg_attribute ]]
  I think this will inevitably break a lot of code, not all of it ours,
  so I'm not in favor of pursuing that direction.
 
  Are you thinking of client or extension code? From what I've looked at I
  don't think it's all that likely too break much of either.
 
 It will break anything that assumes that every column is represented in
 pg_attribute.  I think if you think this assumption is easily removed,
 you've not looked hard enough.

Uh. And how much code actually is that? Note that system columns already
aren't in a Relation's TupleDesc. So it's not they would be missing from
that - and if that were the issue we could easily add them there when
the cache entry is built.

There really isn't much code in postgres itself that iterates over all
columns including system columns. Some bits around heap.c, tablecmd.c,
lsyscache.c do lookups by name, but they are easily converted to
SystemAttributeByName()/SystemAttributeDefinition().

Most non DDL code doesn't care at all.

  It would make pg_attribute a fair bit smaller, especially on systems
  with lots of narrow relations.
 
 I'd like to do that too, but I think getting rid of xmin/xmax/cmin/cmax
 would be enough to get most of the benefit, and we could do that without
 any inconsistency if we stopped exposing those as system columns.

Well, I have yet to see any realistic proposal to get rid of
them. Having to write significantly more complex and/or significantly more
expensive queries doesn't qualify.

And there really is code out there using xmin/xmax as part of their
code, so I think the rumor of nobody crying about their near death is
just that, a rumor.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-01 Thread Robert Haas
On Fri, Dec 27, 2013 at 9:24 PM, Stephen Frost sfr...@snowman.net wrote:
 I'm not sure what you mean by doesn't work, because it clearly does
 work.  I've already posted a patch.  You may find it ugly, but that's
 not the same as not working.

 I meant *practically*, it doesn't work.  By which, I mean, it sucks as
 a solution. :)

I fail to see why.  What is so ugly about this:

select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;

It may be true that that query wouldn't have worked before Tom added
LATERAL, but he did add LATERAL and we have it now and nobody's going
to care about this function on versions that haven't got LATERAL,
because it won't exist there anyway.  The whole point of adding
features like LATERAL (which doesn't even appear in that query,
interestingly enough) is that it was really annoying to use functions
like this before we had it.  But now we *do* have it, so the fact a
function like this would have been annoying to use in older releases
isn't a reason not to add it now.

I will admit that performance may be a reason.  After pgbench -i:

rhaas=# explain analyze select xmax from pgbench_accounts;
   QUERY PLAN

 Seq Scan on pgbench_accounts  (cost=0.00..2640.00 rows=10
width=4) (actual time=0.009..14.950 rows=10 loops=1)
 Total runtime: 20.354 ms
(2 rows)

rhaas=# explain analyze select (pg_tuple_header(tableoid,ctid)).xmax
from pgbench_accounts;
QUERY PLAN
--
 Seq Scan on pgbench_accounts  (cost=0.00..2890.00 rows=10
width=10) (actual time=0.023..314.783 rows=10 loops=1)
 Total runtime: 322.714 ms
(2 rows)

  Hindsight being what it is, perhaps we should have stuffed the system
  columns into a complex type instead of having individual columns, but
  I'm not sure changing that now would be worth the backwards
  compatibility break (yes, I know they're system columns, but I've seen
  more than one case of using ctid to break ties in otherwise identical
  rows..).

 Well, if the consensus is in favor of adding more system columns,
 that's not my first choice, but I'm OK with it.  However, I wonder how
 we plan to name them.  If we add pg_infomask and pg_infomask2, it
 won't be consistent with the existing naming convention which doesn't
 include any kind of pg-prefix, but if we don't use such a prefix then
 every column we add further pollutes the namespace.

 Yeah, I agree that it gets a bit ugly...  What would you think of doing
 *both*?  Keep the existing system columns for backwards compatibility,
 but then also have a complex 'pg_header' type which provides all of the
 existing columns, as well as infomask  infomask2 ...?

I don't really see what that accomplishes, TBH.  If we further modify
the header format in the future, we'll need to modify the definition
of that type, and that will be a backward-compatibility break for
anyone using it.  Adding new system columns is probably less
intrusive.

Maybe it's best to just add pg_infomask and pg_infomask2 as system
columns and not worry about the inconsistency with the naming
convention for existing columns.

Other opinions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2014-01-01 Thread Greg Stark
 I fail to see why.  What is so ugly about this:

 select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;

Two points:

1) it's a bit weird to go to this effort to eliminate system columns by
using a scheme that depends on having a system column -- ctid

2) refetching a row could conceivably end up retrieving different data than
was present when the row was originally read. (In some cases that might
actually be the intended behaviour)

If this came up earlier I'm sorry but I suppose it's too hard to have a
function like foo(tab.*) which magically can tell that the record is a heap
tuple and look in the header? And presumably throw an error if passed a non
heap tuple.

-- 
greg
On 1 Jan 2014 21:34, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Dec 27, 2013 at 9:24 PM, Stephen Frost sfr...@snowman.net wrote:
  I'm not sure what you mean by doesn't work, because it clearly does
  work.  I've already posted a patch.  You may find it ugly, but that's
  not the same as not working.
 
  I meant *practically*, it doesn't work.  By which, I mean, it sucks as
  a solution. :)

 I fail to see why.  What is so ugly about this:

 select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;

 It may be true that that query wouldn't have worked before Tom added
 LATERAL, but he did add LATERAL and we have it now and nobody's going
 to care about this function on versions that haven't got LATERAL,
 because it won't exist there anyway.  The whole point of adding
 features like LATERAL (which doesn't even appear in that query,
 interestingly enough) is that it was really annoying to use functions
 like this before we had it.  But now we *do* have it, so the fact a
 function like this would have been annoying to use in older releases
 isn't a reason not to add it now.

 I will admit that performance may be a reason.  After pgbench -i:

 rhaas=# explain analyze select xmax from pgbench_accounts;
QUERY PLAN

 
  Seq Scan on pgbench_accounts  (cost=0.00..2640.00 rows=10
 width=4) (actual time=0.009..14.950 rows=10 loops=1)
  Total runtime: 20.354 ms
 (2 rows)

 rhaas=# explain analyze select (pg_tuple_header(tableoid,ctid)).xmax
 from pgbench_accounts;
 QUERY PLAN

 --
  Seq Scan on pgbench_accounts  (cost=0.00..2890.00 rows=10
 width=10) (actual time=0.023..314.783 rows=10 loops=1)
  Total runtime: 322.714 ms
 (2 rows)

   Hindsight being what it is, perhaps we should have stuffed the system
   columns into a complex type instead of having individual columns, but
   I'm not sure changing that now would be worth the backwards
   compatibility break (yes, I know they're system columns, but I've seen
   more than one case of using ctid to break ties in otherwise identical
   rows..).
 
  Well, if the consensus is in favor of adding more system columns,
  that's not my first choice, but I'm OK with it.  However, I wonder how
  we plan to name them.  If we add pg_infomask and pg_infomask2, it
  won't be consistent with the existing naming convention which doesn't
  include any kind of pg-prefix, but if we don't use such a prefix then
  every column we add further pollutes the namespace.
 
  Yeah, I agree that it gets a bit ugly...  What would you think of doing
  *both*?  Keep the existing system columns for backwards compatibility,
  but then also have a complex 'pg_header' type which provides all of the
  existing columns, as well as infomask  infomask2 ...?

 I don't really see what that accomplishes, TBH.  If we further modify
 the header format in the future, we'll need to modify the definition
 of that type, and that will be a backward-compatibility break for
 anyone using it.  Adding new system columns is probably less
 intrusive.

 Maybe it's best to just add pg_infomask and pg_infomask2 as system
 columns and not worry about the inconsistency with the naming
 convention for existing columns.

 Other opinions?

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] preserving forensic information when we freeze

2013-12-27 Thread Andres Freund
On 2013-12-26 15:25:41 -0800, Robert Haas wrote:
 On Tue, Dec 24, 2013 at 6:22 AM, Andres Freund and...@2ndquadrant.com wrote:
  If system columns don't have an overhead anymore, I fail to see the
  advantage that functions have over simply accessing parts of the row in
  the normal way parts of rows are accessed. The only reasoning I can see
  is lessening the likelihood of conflicts - but that's essentially only
  solved via namespaced pg_ prefixes for functions as well.
 
 I dunno, I just have an uneasy feeling about it.  I guess if
 everyone's happy to add pg_infomask and pg_infomask2 as new system
 columns, we could go that route.  For my part, I think that functions
 are a real extensibility mechanism and hidden system columns are a
 crock I'd rather not propagate.  But I just work here, and I'll give
 way if the consensus is otherwise.

I am happy enough with functions as well, so I certainly won't
fundamentally block that path after a bit more discussion. My problems
with the function approach may in parts even be fixable, making me
prefer it:
* It's slower. It refetches the tuple from s_b/disk, it builds a row
  type to return, which then needs to get accessed for the individual
  columns.
* By nature of being a record returning function it's awkward to use if you
  want the individual columns because you essentially need to use
  LATERAL or you re-execute the function for each column.
* It's awkward to use because you need to need to pass
  tableoid/ctid. That's quite some notational overhead, relying on
  system columns itself...
* It's imo harder to spot differenced between versions in record types than
  columns.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-27 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2013-12-26 15:25:41 -0800, Robert Haas wrote:
  I dunno, I just have an uneasy feeling about it.  I guess if
  everyone's happy to add pg_infomask and pg_infomask2 as new system
  columns, we could go that route.  For my part, I think that functions
  are a real extensibility mechanism and hidden system columns are a
  crock I'd rather not propagate.  But I just work here, and I'll give
  way if the consensus is otherwise.
 
 I am happy enough with functions as well, so I certainly won't
 fundamentally block that path after a bit more discussion. My problems
 with the function approach may in parts even be fixable, making me
 prefer it:
 * It's slower. It refetches the tuple from s_b/disk, it builds a row
   type to return, which then needs to get accessed for the individual
   columns.
 * By nature of being a record returning function it's awkward to use if you
   want the individual columns because you essentially need to use
   LATERAL or you re-execute the function for each column.
 * It's awkward to use because you need to need to pass
   tableoid/ctid. That's quite some notational overhead, relying on
   system columns itself...
 * It's imo harder to spot differenced between versions in record types than
   columns.

For my 2c, I tend to agree w/ Andres on this one.  I like the *idea* of
having a function instead, but, practically, it doesn't really work.
Perhaps if the 'function' approach was one-function-per-column and we
could remove the need for the function to take any arguments, but I'm
not sure we've really got something better than what we have now.

Hindsight being what it is, perhaps we should have stuffed the system
columns into a complex type instead of having individual columns, but
I'm not sure changing that now would be worth the backwards
compatibility break (yes, I know they're system columns, but I've seen
more than one case of using ctid to break ties in otherwise identical
rows..).

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] preserving forensic information when we freeze

2013-12-27 Thread Robert Haas
On Fri, Dec 27, 2013 at 6:15 AM, Stephen Frost sfr...@snowman.net wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
 On 2013-12-26 15:25:41 -0800, Robert Haas wrote:
  I dunno, I just have an uneasy feeling about it.  I guess if
  everyone's happy to add pg_infomask and pg_infomask2 as new system
  columns, we could go that route.  For my part, I think that functions
  are a real extensibility mechanism and hidden system columns are a
  crock I'd rather not propagate.  But I just work here, and I'll give
  way if the consensus is otherwise.

 I am happy enough with functions as well, so I certainly won't
 fundamentally block that path after a bit more discussion. My problems
 with the function approach may in parts even be fixable, making me
 prefer it:
 * It's slower. It refetches the tuple from s_b/disk, it builds a row
   type to return, which then needs to get accessed for the individual
   columns.
 * By nature of being a record returning function it's awkward to use if you
   want the individual columns because you essentially need to use
   LATERAL or you re-execute the function for each column.
 * It's awkward to use because you need to need to pass
   tableoid/ctid. That's quite some notational overhead, relying on
   system columns itself...
 * It's imo harder to spot differenced between versions in record types than
   columns.

 For my 2c, I tend to agree w/ Andres on this one.  I like the *idea* of
 having a function instead, but, practically, it doesn't really work.
 Perhaps if the 'function' approach was one-function-per-column and we
 could remove the need for the function to take any arguments, but I'm
 not sure we've really got something better than what we have now.

I'm not sure what you mean by doesn't work, because it clearly does
work.  I've already posted a patch.  You may find it ugly, but that's
not the same as not working.

 Hindsight being what it is, perhaps we should have stuffed the system
 columns into a complex type instead of having individual columns, but
 I'm not sure changing that now would be worth the backwards
 compatibility break (yes, I know they're system columns, but I've seen
 more than one case of using ctid to break ties in otherwise identical
 rows..).

Well, if the consensus is in favor of adding more system columns,
that's not my first choice, but I'm OK with it.  However, I wonder how
we plan to name them.  If we add pg_infomask and pg_infomask2, it
won't be consistent with the existing naming convention which doesn't
include any kind of pg-prefix, but if we don't use such a prefix then
every column we add further pollutes the namespace.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-27 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Fri, Dec 27, 2013 at 6:15 AM, Stephen Frost sfr...@snowman.net wrote:
  For my 2c, I tend to agree w/ Andres on this one.  I like the *idea* of
  having a function instead, but, practically, it doesn't really work.
  Perhaps if the 'function' approach was one-function-per-column and we
  could remove the need for the function to take any arguments, but I'm
  not sure we've really got something better than what we have now.
 
 I'm not sure what you mean by doesn't work, because it clearly does
 work.  I've already posted a patch.  You may find it ugly, but that's
 not the same as not working.

I meant *practically*, it doesn't work.  By which, I mean, it sucks as
a solution. :)

  Hindsight being what it is, perhaps we should have stuffed the system
  columns into a complex type instead of having individual columns, but
  I'm not sure changing that now would be worth the backwards
  compatibility break (yes, I know they're system columns, but I've seen
  more than one case of using ctid to break ties in otherwise identical
  rows..).
 
 Well, if the consensus is in favor of adding more system columns,
 that's not my first choice, but I'm OK with it.  However, I wonder how
 we plan to name them.  If we add pg_infomask and pg_infomask2, it
 won't be consistent with the existing naming convention which doesn't
 include any kind of pg-prefix, but if we don't use such a prefix then
 every column we add further pollutes the namespace.

Yeah, I agree that it gets a bit ugly...  What would you think of doing
*both*?  Keep the existing system columns for backwards compatibility,
but then also have a complex 'pg_header' type which provides all of the
existing columns, as well as infomask  infomask2 ...?

I've not looked into any of the implementation complexity, but it might
give us a path to providing *just* the 'pg_header' (or whatever color we
end up making that bike shed...) complex type with all the system
columns available under it, maybe only using that in 10.0?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] preserving forensic information when we freeze

2013-12-26 Thread Robert Haas
On Tue, Dec 24, 2013 at 6:22 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-12-23 14:15:29 -0500, Robert Haas wrote:
 On Mon, Dec 23, 2013 at 1:57 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Well, all of the fundamental changes (combocids, the initial multixact
  introduction) have been quite some time ago. I think backward compat has
  a much higher value these days (I also don't see much point in looking
  at cmin/cmax for anything but diagnostic purposes). I don't think any of
  the usecases I've seen would be broken by either fk-locks (multixacts
  have been there before, doesn't matter much that they now contain
  updates) nor by forensic freezing. The latter because they really only
  checked that xmin/xmax were the same, and we hopefully haven't broken
  that...
 
  But part of my point really was the usability, not only the
  performance. Requiring LATERAL queries to be usable sensibly causes a
  Meh from my side ;)

 I simply can't imagine that we're going to want to add a system column
 every time we change something, or even enough of them to cover all of
 the things we've already got.  We'd need at least infomask, infomask2,
 and hoff, and maybe some functions for peeking through mxacts to find
 the updater and locker XIDs and lock modes.  With a couple of
 functions we can do all that and not look back.

 If system columns don't have an overhead anymore, I fail to see the
 advantage that functions have over simply accessing parts of the row in
 the normal way parts of rows are accessed. The only reasoning I can see
 is lessening the likelihood of conflicts - but that's essentially only
 solved via namespaced pg_ prefixes for functions as well.

I dunno, I just have an uneasy feeling about it.  I guess if
everyone's happy to add pg_infomask and pg_infomask2 as new system
columns, we could go that route.  For my part, I think that functions
are a real extensibility mechanism and hidden system columns are a
crock I'd rather not propagate.  But I just work here, and I'll give
way if the consensus is otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-24 Thread Andres Freund
On 2013-12-23 14:15:29 -0500, Robert Haas wrote:
 On Mon, Dec 23, 2013 at 1:57 PM, Andres Freund and...@2ndquadrant.com wrote:
  Well, all of the fundamental changes (combocids, the initial multixact
  introduction) have been quite some time ago. I think backward compat has
  a much higher value these days (I also don't see much point in looking
  at cmin/cmax for anything but diagnostic purposes). I don't think any of
  the usecases I've seen would be broken by either fk-locks (multixacts
  have been there before, doesn't matter much that they now contain
  updates) nor by forensic freezing. The latter because they really only
  checked that xmin/xmax were the same, and we hopefully haven't broken
  that...
 
  But part of my point really was the usability, not only the
  performance. Requiring LATERAL queries to be usable sensibly causes a
  Meh from my side ;)
 
 I simply can't imagine that we're going to want to add a system column
 every time we change something, or even enough of them to cover all of
 the things we've already got.  We'd need at least infomask, infomask2,
 and hoff, and maybe some functions for peeking through mxacts to find
 the updater and locker XIDs and lock modes.  With a couple of
 functions we can do all that and not look back.

If system columns don't have an overhead anymore, I fail to see the
advantage that functions have over simply accessing parts of the row in
the normal way parts of rows are accessed. The only reasoning I can see
is lessening the likelihood of conflicts - but that's essentially only
solved via namespaced pg_ prefixes for functions as well.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-23 Thread Robert Haas
On Fri, Dec 20, 2013 at 9:56 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund and...@2ndquadrant.com wrote:
 I wondered that, too, but it's not well-defined for all tuples.  What
 happens if you pass in constructed tuple rather than an on-disk tuple?

 Those should be discernible I think, t_self/t_tableOid won't be set for
 generated tuples.

 I went looking for existing precedent for code that does things like
 this and found record_out, which does this:

 HeapTupleHeader rec = PG_GETARG_HEAPTUPLEHEADER(0);
 ...
 /* Extract type info from the tuple itself */
 tupType = HeapTupleHeaderGetTypeId(rec);
 tupTypmod = HeapTupleHeaderGetTypMod(rec);
 tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
 ncolumns = tupdesc-natts;

 /* Build a temporary HeapTuple control structure */
 tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
 ItemPointerSetInvalid((tuple.t_self));
 tuple.t_tableOid = InvalidOid;
 tuple.t_data = rec;

 This appears to be a typical pattern, although interestingly I noticed
 that row_to_json() doesn't bother setting t_tableOid or t_self, which
 I think it's supposed to do.  The problem I see here is that this code
 seems to imply that a function passed a record doesn't actually have
 enough information to know what sort of a thing it's getting.  The use
 of HeapTupleHeaderGetTypeId and HeapTupleHeaderGetTypMod implies that
 it's safe to assume that t_choice will contain DatumTupleFields rather
 than HeapTupleFields, which doesn't seem to bode well for your
 approach.

 Am I missing a trick?

If not, here's a patch done the way I originally proposed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pg-tuple-header.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-23 Thread Andres Freund
On 2013-12-20 13:22:07 +0100, Andres Freund wrote:
 On 2013-12-20 07:12:01 -0500, Robert Haas wrote:
  I think the root of the problem is that nobody's very eager to add
  more hidden system catalog columns because each one bloats
  pg_attribute significantly.
 
 I think that part is actually solveable - there's no real need for them
 to be real columns, present in pg_attribute, things could easily enough be
 setup during parse analysis.

I've spent some time yesterday hacking up a prototype for this. The
amount of effort seems reasonable, although the attached patch certainly
doesn't do all the neccessary things. The regression tests pass.

In the prototype only oid keeps it's pg_attribute entry, everything else
uses the static entries via
SystemAttributeDefinition()/SystemAttributeByName(). We might want to
remove oids as well, but it seems reasonable to continue allowing
defining column permissions on it. And it's likely the attribute that's
most likely ingrained deeply in user applications.

 The bigger problem I see is that adding
 more useful columns will cause name clashes, which will probably
 prohibit improvements in that direction.

I wonder if we could solve that by naming new columns like
system:attribute or similar. Not pretty, but maybe better than the
alternatives.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 3a5692575c0077601cfba938239f4dd6f74de3c5 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 23 Dec 2013 13:44:36 +0100
Subject: [PATCH] prototype-patch: remove system columns from pg_attribute

---
 src/backend/access/common/heaptuple.c |  14 
 src/backend/catalog/aclchk.c  |  27 +++---
 src/backend/catalog/heap.c|  18 ++--
 src/backend/commands/tablecmds.c  | 154 +++---
 src/backend/parser/parse_relation.c   |  65 ++
 src/backend/utils/cache/lsyscache.c   |  11 ++-
 src/include/access/sysattr.h  |   3 +-
 7 files changed, 188 insertions(+), 104 deletions(-)

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 347d616..99471ab 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -59,7 +59,9 @@
 
 #include access/sysattr.h
 #include access/tuptoaster.h
+#include access/transam.h
 #include executor/tuptable.h
+#include storage/procarray.h
 
 
 /* Does att's datatype allow packing into the 1-byte-header varlena format? */
@@ -286,6 +288,7 @@ heap_attisnull(HeapTuple tup, int attnum)
 		case MinCommandIdAttributeNumber:
 		case MaxTransactionIdAttributeNumber:
 		case MaxCommandIdAttributeNumber:
+		case CookedMaxTransactionIdAttributeNumber:
 			/* these are never null */
 			break;
 
@@ -558,6 +561,17 @@ heap_getsysattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, bool *isnull)
 		case TableOidAttributeNumber:
 			result = ObjectIdGetDatum(tup-t_tableOid);
 			break;
+		case CookedMaxTransactionIdAttributeNumber:
+			{
+TransactionId xid;
+xid = HeapTupleHeaderGetUpdateXid(tup-t_data);
+if (TransactionIdIsValid(xid) 
+	!TransactionIdIsInProgress(xid) 
+	!TransactionIdDidCommit(xid))
+	xid = InvalidTransactionId;
+result = TransactionIdGetDatum(xid);
+			}
+			break;
 		default:
 			elog(ERROR, invalid attnum: %d, attnum);
 			result = 0;			/* keep compiler quiet */
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 5a46fd9..807e274 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -1532,16 +1532,19 @@ expand_all_col_privileges(Oid table_oid, Form_pg_class classForm,
 		if (classForm-relkind == RELKIND_VIEW  curr_att  0)
 			continue;
 
-		attTuple = SearchSysCache2(ATTNUM,
-   ObjectIdGetDatum(table_oid),
-   Int16GetDatum(curr_att));
-		if (!HeapTupleIsValid(attTuple))
-			elog(ERROR, cache lookup failed for attribute %d of relation %u,
- curr_att, table_oid);
-
-		isdropped = ((Form_pg_attribute) GETSTRUCT(attTuple))-attisdropped;
-
-		ReleaseSysCache(attTuple);
+		if (curr_att  0)
+			isdropped = false;
+		else
+		{
+			attTuple = SearchSysCache2(ATTNUM,
+	   ObjectIdGetDatum(table_oid),
+	   Int16GetDatum(curr_att));
+			if (!HeapTupleIsValid(attTuple))
+elog(ERROR, cache lookup failed for attribute %d of relation %u,
+	 curr_att, table_oid);
+			isdropped = ((Form_pg_attribute) GETSTRUCT(attTuple))-attisdropped;
+			ReleaseSysCache(attTuple);
+		}
 
 		/* ignore dropped columns */
 		if (isdropped)
@@ -1579,6 +1582,10 @@ ExecGrant_Attribute(InternalGrant *istmt, Oid relOid, const char *relname,
 	Oid		   *oldmembers;
 	Oid		   *newmembers;
 
+	/* FIXME: don't set column permissions on system columns */
+	if (attnum  0)
+		return;
+
 	attr_tuple = SearchSysCache2(ATTNUM,
  ObjectIdGetDatum(relOid),
  Int16GetDatum(attnum));
diff --git 

Re: [HACKERS] preserving forensic information when we freeze

2013-12-23 Thread Andres Freund
On 2013-12-20 21:56:42 -0500, Robert Haas wrote:
 On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund and...@2ndquadrant.com wrote:
  I wondered that, too, but it's not well-defined for all tuples.  What
  happens if you pass in constructed tuple rather than an on-disk tuple?
 
  Those should be discernible I think, t_self/t_tableOid won't be set for
  generated tuples.
 
 I went looking for existing precedent for code that does things like
 this and found record_out, which does this:

I think there's plenty precedent for C code, but here the problem seems
to be that, when we pass a record that's originally a plain row, it's
coerced (via IO functions) into a record with typeid/typemod set.

So, for this to work via SQL, we'd need to add a notation that allows
passing table rows to functions without being modified. That might be a
good idea in general, but likely more work than it's work tackling in
this context.

 This appears to be a typical pattern, although interestingly I noticed
 that row_to_json() doesn't bother setting t_tableOid or t_self, which
 I think it's supposed to do.

Yes, that seems to be a minor bug. Andrew?

 Am I missing a trick?

No, don't think so, I wasn't thinking on the SQL level.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-23 Thread Robert Haas
On Mon, Dec 23, 2013 at 7:45 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-12-20 13:22:07 +0100, Andres Freund wrote:
 On 2013-12-20 07:12:01 -0500, Robert Haas wrote:
  I think the root of the problem is that nobody's very eager to add
  more hidden system catalog columns because each one bloats
  pg_attribute significantly.

 I think that part is actually solveable - there's no real need for them
 to be real columns, present in pg_attribute, things could easily enough be
 setup during parse analysis.

 I've spent some time yesterday hacking up a prototype for this. The
 amount of effort seems reasonable, although the attached patch certainly
 doesn't do all the neccessary things. The regression tests pass.

Well, I'm all in favor of de-bloating pg_attribute, but I don't
particularly like cooked_xmax.  Even if it were better-named
(updatexid?), I'm still not sure I'd like it very much.  And I
definitely think that getting rid of the pg_attribute entries ought to
be a separate patch from adding any new system columns we want to
have.

 In the prototype only oid keeps it's pg_attribute entry, everything else
 uses the static entries via
 SystemAttributeDefinition()/SystemAttributeByName(). We might want to
 remove oids as well, but it seems reasonable to continue allowing
 defining column permissions on it. And it's likely the attribute that's
 most likely ingrained deeply in user applications.

Seems fine to me.

 The bigger problem I see is that adding
 more useful columns will cause name clashes, which will probably
 prohibit improvements in that direction.

 I wonder if we could solve that by naming new columns like
 system:attribute or similar. Not pretty, but maybe better than the
 alternatives.

If we're going to add a prefix, I think it should be one that doesn't
require quoting the identifier.  In retrospect pg_xmin or _pg_xmin or
__pg_xmin would have been a better name than xmin.

But TBH, I'm kind of enamored of the function-based approach at the
moment.  It just seems a lot more flexible.  Adding a function is
nearly free and we can add as many as we need doing whatever we want,
and they can even go into contrib modules if we like it better that
way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-23 Thread Andres Freund
On 2013-12-23 10:26:49 -0500, Robert Haas wrote:
 On Mon, Dec 23, 2013 at 7:45 AM, Andres Freund and...@2ndquadrant.com wrote:
  I've spent some time yesterday hacking up a prototype for this. The
  amount of effort seems reasonable, although the attached patch certainly
  doesn't do all the neccessary things. The regression tests pass.
 
 Well, I'm all in favor of de-bloating pg_attribute, but I don't
 particularly like cooked_xmax.  Even if it were better-named
 (updatexid?), I'm still not sure I'd like it very much.  And I
 definitely think that getting rid of the pg_attribute entries ought to
 be a separate patch from adding any new system columns we want to
 have.

Yea, that was just a demo attribute, I am far from sure we should add
any. It was just important to me to test that we're easily able to do
so. I don't even think it's semantics are necessarily all that useful.

I think we should do this, independent from adding additional
attributes. The reduction of rows in pg_attribute is quite noticeable,
and I can't see us just dropping the old system columns.

 But TBH, I'm kind of enamored of the function-based approach at the
 moment.  It just seems a lot more flexible.  Adding a function is
 nearly free and we can add as many as we need doing whatever we want,
 and they can even go into contrib modules if we like it better that
 way.

Well, it really depends on the usecase. Reading
SELECT header.xmin, header.xmax
FROM sometable tbl,
pg_tuple_header(tbl.tableoid, tbl.ctid) header;
is surely harder to understand than
SELECT tbl.xmin, tbl.xmax
FROM sometable tbl;
and the latter is noticeably cheaper since we're not re-fetching the
tuple, especially when the tuple is the result of a more complicated
query including a seqscan, we might often need to re-fetch the tuple
from disk, thanks to the ringbuffer.

That really makes me less than convinced that the function based
approach is the right tool for everything.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-23 Thread Robert Haas
On Mon, Dec 23, 2013 at 10:42 AM, Andres Freund and...@2ndquadrant.com wrote:
 But TBH, I'm kind of enamored of the function-based approach at the
 moment.  It just seems a lot more flexible.  Adding a function is
 nearly free and we can add as many as we need doing whatever we want,
 and they can even go into contrib modules if we like it better that
 way.

 Well, it really depends on the usecase. Reading
 SELECT header.xmin, header.xmax
 FROM sometable tbl,
 pg_tuple_header(tbl.tableoid, tbl.ctid) header;
 is surely harder to understand than
 SELECT tbl.xmin, tbl.xmax
 FROM sometable tbl;
 and the latter is noticeably cheaper since we're not re-fetching the
 tuple, especially when the tuple is the result of a more complicated
 query including a seqscan, we might often need to re-fetch the tuple
 from disk, thanks to the ringbuffer.

 That really makes me less than convinced that the function based
 approach is the right tool for everything.

Meh.  Who are all of these people who are fetching xmin, xmax, cmin,
and cmax in complex queries, and why are they doing that?  If those
columns are just for forensic purposes, then whatever performance
disadvantages the function-based approach has don't really matter.  If
people are using them as integral parts of their application, then
that's more of a concern, but how are those people not getting broken
every release or three anyway?  We keep inventing things like
combo-cids and multi-xacts and multi-xacts that also contain an update
and now freezing without changing xmin, and if you're actually relying
on those columns for anything but forensics your application is
presumably going to break when we change whatever part it depends on.
Is anyone really doing that, and if so, why?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-23 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:

 Meh.  Who are all of these people who are fetching xmin, xmax, cmin,
 and cmax in complex queries, and why are they doing that?  If those
 columns are just for forensic purposes, then whatever performance
 disadvantages the function-based approach has don't really matter.  If
 people are using them as integral parts of their application, then
 that's more of a concern, but how are those people not getting broken
 every release or three anyway?  We keep inventing things like
 combo-cids and multi-xacts and multi-xacts that also contain an update
 and now freezing without changing xmin, and if you're actually relying
 on those columns for anything but forensics your application is
 presumably going to break when we change whatever part it depends on.
 Is anyone really doing that, and if so, why?

I've seen an ORM use xmin for a sort of optimistic concurrency
control scheme.  Let's say that you bring up an address by primary
key, and change an incorrect apartment number.  At the same time,
someone else is entering a change of address, to a whole new
location where a totally different apartment number is supplied.
The developers want to prevent portions of the two addresses from
being intermingled, they don't want to hold a database transaction
open for the time the user has the window up, they want to avoid
blocking as much as possible, and they don't want to store an extra
column with a version number or timestamp just to do that -- so
they use xmin.  When they go to rewrite the row they use the PK
values plus the xmin in the UPDATE statement's WHERE clause.  If
the row is not found, the user gets an error message.  Currently
that does mean that the users can get a spurious error message when
there is a tuple freeze while the window is open, so the current
changes are probably good news for those using this approach.

Other than that and ctid (for similar reasons) I have not seen
system columns used in applications or application frameworks.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-23 Thread Andres Freund
On 2013-12-23 10:56:07 -0500, Robert Haas wrote:
  Well, it really depends on the usecase. Reading
  SELECT header.xmin, header.xmax
  FROM sometable tbl,
  pg_tuple_header(tbl.tableoid, tbl.ctid) header;
  is surely harder to understand than
  SELECT tbl.xmin, tbl.xmax
  FROM sometable tbl;
  and the latter is noticeably cheaper since we're not re-fetching the
  tuple, especially when the tuple is the result of a more complicated
  query including a seqscan, we might often need to re-fetch the tuple
  from disk, thanks to the ringbuffer.
 
  That really makes me less than convinced that the function based
  approach is the right tool for everything.

 Meh.  Who are all of these people who are fetching xmin, xmax, cmin,
 and cmax in complex queries, and why are they doing that?

I have seen it done to avoid ABA problems in cache invalidation
mechanisms by simply checking ctid, xmin, xmax to stay the same.

 If those
 columns are just for forensic purposes, then whatever performance
 disadvantages the function-based approach has don't really matter.  If
 people are using them as integral parts of their application, then
 that's more of a concern, but how are those people not getting broken
 every release or three anyway?  We keep inventing things like
 combo-cids and multi-xacts and multi-xacts that also contain an update
 and now freezing without changing xmin, and if you're actually relying
 on those columns for anything but forensics your application is
 presumably going to break when we change whatever part it depends on.

Well, all of the fundamental changes (combocids, the initial multixact
introduction) have been quite some time ago. I think backward compat has
a much higher value these days (I also don't see much point in looking
at cmin/cmax for anything but diagnostic purposes). I don't think any of
the usecases I've seen would be broken by either fk-locks (multixacts
have been there before, doesn't matter much that they now contain
updates) nor by forensic freezing. The latter because they really only
checked that xmin/xmax were the same, and we hopefully haven't broken
that...

But part of my point really was the usability, not only the
performance. Requiring LATERAL queries to be usable sensibly causes a
Meh from my side ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-23 Thread Robert Haas
On Mon, Dec 23, 2013 at 1:57 PM, Andres Freund and...@2ndquadrant.com wrote:
 Well, all of the fundamental changes (combocids, the initial multixact
 introduction) have been quite some time ago. I think backward compat has
 a much higher value these days (I also don't see much point in looking
 at cmin/cmax for anything but diagnostic purposes). I don't think any of
 the usecases I've seen would be broken by either fk-locks (multixacts
 have been there before, doesn't matter much that they now contain
 updates) nor by forensic freezing. The latter because they really only
 checked that xmin/xmax were the same, and we hopefully haven't broken
 that...

 But part of my point really was the usability, not only the
 performance. Requiring LATERAL queries to be usable sensibly causes a
 Meh from my side ;)

I simply can't imagine that we're going to want to add a system column
every time we change something, or even enough of them to cover all of
the things we've already got.  We'd need at least infomask, infomask2,
and hoff, and maybe some functions for peeking through mxacts to find
the updater and locker XIDs and lock modes.  With a couple of
functions we can do all that and not look back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-22 Thread Robert Haas
On Fri, Dec 20, 2013 at 8:01 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-12-20 07:58:46 -0500, Robert Haas wrote:
 I think the immediate problem is to decide whether this patch ought to
 make the xmin column display the result of GetXmin() or GetRawXmin().
 Thoughts on that?

 I slightly favor GetRawXmin().

Committed that way for now.  It seems more consistent with what we do
elsewhere.  We neglected to write documentation for this change, so
here's a patch for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8f9f61d..0f2f2bf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5264,8 +5264,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   listitem
para
 Specifies the cutoff age (in transactions) that commandVACUUM/
-should use to decide whether to replace transaction IDs with
-literalFrozenXID/ while scanning a table.
+should use to decide whether to freeze row versions
+while scanning a table.
 The default is 50 million transactions.  Although
 users can set this value anywhere from zero to one billion,
 commandVACUUM/ will silently limit the effective value to half
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index fb94596..324ed0e 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -397,8 +397,12 @@
 
para
 The reason that periodic vacuuming solves the problem is that
-productnamePostgreSQL/productname reserves a special XID
-as literalFrozenXID/.  This XID does not follow the normal XID
+commandVACUUM/ will mark rows as emphasisfrozen/, indicating that
+they were inserted by a transaction which committed sufficiently far in
+the past that the effects of the inserting transaction is certain to be
+visible, from an MVCC perspective, to all current and future transactions.
+productnamePostgreSQL/ reserves a special XID,
+literalFrozenTransactionId/, which does not follow the normal XID
 comparison rules and is always considered older
 than every normal XID. Normal XIDs are
 compared using modulo-2superscript32/ arithmetic. This means
@@ -410,20 +414,19 @@
 the next two billion transactions, no matter which normal XID we are
 talking about. If the row version still exists after more than two billion
 transactions, it will suddenly appear to be in the future. To
-prevent this, old row versions must be reassigned the XID
-literalFrozenXID/ sometime before they reach the
-two-billion-transactions-old mark. Once they are assigned this
-special XID, they will appear to be quotein the past/ to all
-normal transactions regardless of wraparound issues, and so such
-row versions will be valid until deleted, no matter how long that is.
-This reassignment of old XIDs is handled by commandVACUUM/.
+prevent this, frozen row versions are treated as if the inserting XID were
+literalFrozenTransactionId/, so that they will appear to be
+quotein the past/ to all normal transactions regardless of wraparound
+issues, and so such row versions will be valid until deleted, no matter
+how long that is.
/para
 
para
 xref linkend=guc-vacuum-freeze-min-age
-controls how old an XID value has to be before it's replaced with
-literalFrozenXID/.  Larger values of this setting
-preserve transactional information longer, while smaller values increase
+controls how old an XID value has to be before its row version will be
+frozen.  Increasing this setting may avoid unnecessary work if the
+rows that would otherwise be frozen will soon be modified again,
+but decreasing this setting increases
 the number of transactions that can elapse before the table must be
 vacuumed again.
/para
@@ -431,8 +434,8 @@
para
 commandVACUUM/ normally skips pages that don't have any dead row
 versions, but those pages might still have row versions with old XID
-values.  To ensure all old XIDs have been replaced by
-literalFrozenXID/, a scan of the whole table is needed.
+values.  To ensure all old row versions have been frozen, a
+scan of the whole table is needed.
 xref linkend=guc-vacuum-freeze-table-age controls when
 commandVACUUM/ does that: a whole table sweep is forced if
 the table hasn't been fully scanned for varnamevacuum_freeze_table_age/
@@ -447,8 +450,8 @@
 the time commandVACUUM/ last scanned the whole table.  If it were to go
 unvacuumed for longer than
 that, data loss could result.  To ensure that this does not happen,
-autovacuum is invoked on any table that might contain XIDs older than the
-age specified by the configuration parameter xref
+autovacuum is invoked on any table that might contain unfrozen rows 

Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Thu, Dec 19, 2013 at 9:22 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Jim Nasby escribió:
 One thing users will lose in this patch is the ability to reliably see if a 
 tuple is frozen via SQL. Today you can do that just by selecting xmin from 
 the table.

 Obviously people don't generally need to do that... but it's one of those 
 things that when you do need it it's incredibly handy to have... would it be 
 difficult to expose infomask(2) via SQL, the same way xmin et all are?

 It's already exposed via the pageinspect extension.  It's doubtful that
 there are many valid cases where you need infomask but don't have access
 to that module.

 The real fix seems to ensure that the module is always available.

We could make the code that displays this column do GetXmin rather
than GetRawXmin, which would allow this question to be answered, but
lose the ability to find the original xmin once the tuple is frozen
and break precedent with the xmax and cmin/cmax fields, which return
the raw value from the tuple header.  But I'm OK to go whichever
direction people prefer.

I think the root of the problem is that nobody's very eager to add
more hidden system catalog columns because each one bloats
pg_attribute significantly.  Currently, we have xmin, xmax, cmin, and
cmax columns, and that's basically just a historical artifact at this
point.  cmin and cmax always return the same value, and the value
returned may be neither a cmin nor a cmax but a combo CID.  And if
you're looking for information that's only in the infomask, good luck.

Maybe what we should do is add a function something like
pg_tuple_header(tableoid, ctid) that returns a record, maybe something
like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
int, hoff int).  Or perhaps some slightly more cooked version of that
information.  And then delete the xmin, xmax, cmin, and cmax system
columns.  That'd save significantly on pg_attribute entries while, at
the same time, actually providing more information than we do today.

pageinspect is useful, too, but it seems to deal mostly with pages, so
I'm not sure it's a natural substitute for something like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Andres Freund
On 2013-12-20 07:12:01 -0500, Robert Haas wrote:
 I think the root of the problem is that nobody's very eager to add
 more hidden system catalog columns because each one bloats
 pg_attribute significantly.

I think that part is actually solveable - there's no real need for them
to be real columns, present in pg_attribute, things could easily enough be
setup during parse analysis. The bigger problem I see is that adding
more useful columns will cause name clashes, which will probably
prohibit improvements in that direction.

 Maybe what we should do is add a function something like
 pg_tuple_header(tableoid, ctid) that returns a record, maybe something
 like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
 int, hoff int).  Or perhaps some slightly more cooked version of that
 information.  And then delete the xmin, xmax, cmin, and cmax system
 columns.  That'd save significantly on pg_attribute entries while, at
 the same time, actually providing more information than we do today.

I was wondering whether we couldn't just pass pg_tuple_header() a whole
row, instead of having the user manually pass in reloid and ctid. I
think that should actually work in the interesting scenarios.

 pageinspect is useful, too, but it seems to deal mostly with pages, so
 I'm not sure it's a natural substitute for something like this.

Agreed. I also think that we really need something to investigate
problems in core, not in a contrib module people won't have installed,
especially in bigger setups. That said I plan to either submit some
improvements to pageinspect myself, or convince somebody else to do so
in the not too far away future.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund and...@2ndquadrant.com wrote:
 Maybe what we should do is add a function something like
 pg_tuple_header(tableoid, ctid) that returns a record, maybe something
 like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
 int, hoff int).  Or perhaps some slightly more cooked version of that
 information.  And then delete the xmin, xmax, cmin, and cmax system
 columns.  That'd save significantly on pg_attribute entries while, at
 the same time, actually providing more information than we do today.

 I was wondering whether we couldn't just pass pg_tuple_header() a whole
 row, instead of having the user manually pass in reloid and ctid. I
 think that should actually work in the interesting scenarios.

I wondered that, too, but it's not well-defined for all tuples.  What
happens if you pass in constructed tuple rather than an on-disk tuple?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Andres Freund
On 2013-12-20 07:47:17 -0500, Robert Haas wrote:
 On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund and...@2ndquadrant.com wrote:
  Maybe what we should do is add a function something like
  pg_tuple_header(tableoid, ctid) that returns a record, maybe something
  like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
  int, hoff int).  Or perhaps some slightly more cooked version of that
  information.  And then delete the xmin, xmax, cmin, and cmax system
  columns.  That'd save significantly on pg_attribute entries while, at
  the same time, actually providing more information than we do today.
 
  I was wondering whether we couldn't just pass pg_tuple_header() a whole
  row, instead of having the user manually pass in reloid and ctid. I
  think that should actually work in the interesting scenarios.
 
 I wondered that, too, but it's not well-defined for all tuples.  What
 happens if you pass in constructed tuple rather than an on-disk tuple?

Those should be discernible I think, t_self/t_tableOid won't be set for
generated tuples.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-12-20 07:47:17 -0500, Robert Haas wrote:
 On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Maybe what we should do is add a function something like
  pg_tuple_header(tableoid, ctid) that returns a record, maybe something
  like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
  int, hoff int).  Or perhaps some slightly more cooked version of that
  information.  And then delete the xmin, xmax, cmin, and cmax system
  columns.  That'd save significantly on pg_attribute entries while, at
  the same time, actually providing more information than we do today.
 
  I was wondering whether we couldn't just pass pg_tuple_header() a whole
  row, instead of having the user manually pass in reloid and ctid. I
  think that should actually work in the interesting scenarios.

 I wondered that, too, but it's not well-defined for all tuples.  What
 happens if you pass in constructed tuple rather than an on-disk tuple?

 Those should be discernible I think, t_self/t_tableOid won't be set for
 generated tuples.

Hmm.  That might work.

I think the immediate problem is to decide whether this patch ought to
make the xmin column display the result of GetXmin() or GetRawXmin().
Thoughts on that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Andres Freund
On 2013-12-20 07:58:46 -0500, Robert Haas wrote:
 I think the immediate problem is to decide whether this patch ought to
 make the xmin column display the result of GetXmin() or GetRawXmin().
 Thoughts on that?

I slightly favor GetRawXmin().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Alvaro Herrera
Robert Haas escribió:

 Maybe what we should do is add a function something like
 pg_tuple_header(tableoid, ctid) that returns a record, maybe something
 like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
 int, hoff int).  Or perhaps some slightly more cooked version of that
 information.  And then delete the xmin, xmax, cmin, and cmax system
 columns.  That'd save significantly on pg_attribute entries while, at
 the same time, actually providing more information than we do today.

+1 for this general idea.  I proposed this some time ago and got shot
down because of pageinspect.  I don't know about taking the system
columns out completely -- not sure how much third party code we're going
to break that way, certainly a lot -- but the extra functionality would
be useful nonetheless.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Alvaro Herrera
Robert Haas escribió:
 On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund and...@2ndquadrant.com wrote:
  Maybe what we should do is add a function something like
  pg_tuple_header(tableoid, ctid) that returns a record, maybe something
  like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
  int, hoff int).  Or perhaps some slightly more cooked version of that
  information.  And then delete the xmin, xmax, cmin, and cmax system
  columns.  That'd save significantly on pg_attribute entries while, at
  the same time, actually providing more information than we do today.
 
  I was wondering whether we couldn't just pass pg_tuple_header() a whole
  row, instead of having the user manually pass in reloid and ctid. I
  think that should actually work in the interesting scenarios.
 
 I wondered that, too, but it's not well-defined for all tuples.  What
 happens if you pass in constructed tuple rather than an on-disk tuple?

I assume without checking that passing reloid/ctid would allow this to
work for tuples in a RETURNING clause; and if we ever have an OLD
reference for the RETURNING clause of an UPDATE, that it would work
there, too, showing the post-update status of the updated tuple.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas escribió:
 On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Maybe what we should do is add a function something like
  pg_tuple_header(tableoid, ctid) that returns a record, maybe something
  like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
  int, hoff int).  Or perhaps some slightly more cooked version of that
  information.  And then delete the xmin, xmax, cmin, and cmax system
  columns.  That'd save significantly on pg_attribute entries while, at
  the same time, actually providing more information than we do today.
 
  I was wondering whether we couldn't just pass pg_tuple_header() a whole
  row, instead of having the user manually pass in reloid and ctid. I
  think that should actually work in the interesting scenarios.

 I wondered that, too, but it's not well-defined for all tuples.  What
 happens if you pass in constructed tuple rather than an on-disk tuple?

 I assume without checking that passing reloid/ctid would allow this to
 work for tuples in a RETURNING clause; and if we ever have an OLD
 reference for the RETURNING clause of an UPDATE, that it would work
 there, too, showing the post-update status of the updated tuple.

I don't understand what you're saying here.  Are you saying that
reloid/ctid is a better approach, a worse approach, or just a
different approach?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Alvaro Herrera
Robert Haas escribió:
 On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  I assume without checking that passing reloid/ctid would allow this to
  work for tuples in a RETURNING clause; and if we ever have an OLD
  reference for the RETURNING clause of an UPDATE, that it would work
  there, too, showing the post-update status of the updated tuple.
 
 I don't understand what you're saying here.  Are you saying that
 reloid/ctid is a better approach, a worse approach, or just a
 different approach?

That probably wasn't worded very well.  I am just saying that whatever
approach we end up with, it would be nice that it worked somehow with
RETURNING clauses.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 2:17 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas escribió:
 On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  I assume without checking that passing reloid/ctid would allow this to
  work for tuples in a RETURNING clause; and if we ever have an OLD
  reference for the RETURNING clause of an UPDATE, that it would work
  there, too, showing the post-update status of the updated tuple.

 I don't understand what you're saying here.  Are you saying that
 reloid/ctid is a better approach, a worse approach, or just a
 different approach?

 That probably wasn't worded very well.  I am just saying that whatever
 approach we end up with, it would be nice that it worked somehow with
 RETURNING clauses.

Hmm.  It's not evident to me why that particular case would be any
different than anything else, but that might be my ignorance showing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-20 Thread Robert Haas
On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund and...@2ndquadrant.com wrote:
 I wondered that, too, but it's not well-defined for all tuples.  What
 happens if you pass in constructed tuple rather than an on-disk tuple?

 Those should be discernible I think, t_self/t_tableOid won't be set for
 generated tuples.

I went looking for existing precedent for code that does things like
this and found record_out, which does this:

HeapTupleHeader rec = PG_GETARG_HEAPTUPLEHEADER(0);
...
/* Extract type info from the tuple itself */
tupType = HeapTupleHeaderGetTypeId(rec);
tupTypmod = HeapTupleHeaderGetTypMod(rec);
tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
ncolumns = tupdesc-natts;

/* Build a temporary HeapTuple control structure */
tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
ItemPointerSetInvalid((tuple.t_self));
tuple.t_tableOid = InvalidOid;
tuple.t_data = rec;

This appears to be a typical pattern, although interestingly I noticed
that row_to_json() doesn't bother setting t_tableOid or t_self, which
I think it's supposed to do.  The problem I see here is that this code
seems to imply that a function passed a record doesn't actually have
enough information to know what sort of a thing it's getting.  The use
of HeapTupleHeaderGetTypeId and HeapTupleHeaderGetTypMod implies that
it's safe to assume that t_choice will contain DatumTupleFields rather
than HeapTupleFields, which doesn't seem to bode well for your
approach.

Am I missing a trick?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-19 Thread Andres Freund
On 2013-12-18 21:42:25 -0500, Robert Haas wrote:
 On Wed, Dec 18, 2013 at 5:54 PM, Andres Freund and...@2ndquadrant.com wrote:
if (frz-frzflags  XLH_FREEZE_XVAC)
  + {
HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
  + /* If we somehow haven't hinted the tuple previously, do it 
  now. */
  + HeapTupleHeaderSetXminCommitted(tuple);
  + }
 
  What's the reasoning behind adding HeapTupleHeaderSetXminCommitted()
  here?
 
 I'm just copying the existing logic.  See the final stanza of
 heap_prepare_freeze_tuple.

Yes, but why don't you keep that in heap_prepare_freeze_tuple()? Just
because of HeapTupleHeaderSetXminCommitted()? I dislike transporting the
infomask in the wal record and then changing it away from that again afterwards.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-19 Thread Robert Haas
On Thu, Dec 19, 2013 at 5:44 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-12-18 21:42:25 -0500, Robert Haas wrote:
 On Wed, Dec 18, 2013 at 5:54 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
if (frz-frzflags  XLH_FREEZE_XVAC)
  + {
HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
  + /* If we somehow haven't hinted the tuple previously, do it 
  now. */
  + HeapTupleHeaderSetXminCommitted(tuple);
  + }
 
  What's the reasoning behind adding HeapTupleHeaderSetXminCommitted()
  here?

 I'm just copying the existing logic.  See the final stanza of
 heap_prepare_freeze_tuple.

 Yes, but why don't you keep that in heap_prepare_freeze_tuple()? Just
 because of HeapTupleHeaderSetXminCommitted()?

Yes, that's pretty much it.

 I dislike transporting the
 infomask in the wal record and then changing it away from that again 
 afterwards.

I don't really see a problem with it.  Relying on the macros to tweak
the bits seems more future-proof than inventing some other way to do
it (that might even get copied into other parts of the code where it's
even less safe).  I actually think transporting the infomask is kind
of a funky way to handle this in the first instance, but I don't think
it's this patch's job to kibitz that decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-19 Thread Andres Freund
On 2013-12-19 07:40:40 -0500, Robert Haas wrote:
 On Thu, Dec 19, 2013 at 5:44 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-12-18 21:42:25 -0500, Robert Haas wrote:
 
  I dislike transporting the
  infomask in the wal record and then changing it away from that again 
  afterwards.
 
 I don't really see a problem with it.  Relying on the macros to tweak
 the bits seems more future-proof than inventing some other way to do
 it (that might even get copied into other parts of the code where it's
 even less safe).

Then there should be a macro to twiddle the infomask, without touching
the tuple.

  I actually think transporting the infomask is kind
 of a funky way to handle this in the first instance, but I don't think
 it's this patch's job to kibitz that decision.

It's not nice, I grant you that, but I don't see how to do it
otherwise. We can't yet set the hint bits in
heap_prepare_freeze_tuple(), as we're not in a critical section, and
thus haven't replaced eventual multixacts by plain xids.
Running it inside a critical section isn't really realistic, as we'd
either have to iterate over the whole page, including memory
allocations, inside one, or we'd have to WAL log each individual item.

We could obviously encode all the infomask setting required in flags
instructing heap_execute_freeze_tuple() to set them, but that seems more
complex without accompanying benefit.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-19 Thread Robert Haas
On Thu, Dec 19, 2013 at 9:19 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-12-19 07:40:40 -0500, Robert Haas wrote:
 On Thu, Dec 19, 2013 at 5:44 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-12-18 21:42:25 -0500, Robert Haas wrote:

  I dislike transporting the
  infomask in the wal record and then changing it away from that again 
  afterwards.

 I don't really see a problem with it.  Relying on the macros to tweak
 the bits seems more future-proof than inventing some other way to do
 it (that might even get copied into other parts of the code where it's
 even less safe).

 Then there should be a macro to twiddle the infomask, without touching
 the tuple.

Sure, we can invent that.  I personally don't like it as well.

  I actually think transporting the infomask is kind
 of a funky way to handle this in the first instance, but I don't think
 it's this patch's job to kibitz that decision.

 It's not nice, I grant you that, but I don't see how to do it
 otherwise. We can't yet set the hint bits in
 heap_prepare_freeze_tuple(), as we're not in a critical section, and
 thus haven't replaced eventual multixacts by plain xids.
 Running it inside a critical section isn't really realistic, as we'd
 either have to iterate over the whole page, including memory
 allocations, inside one, or we'd have to WAL log each individual item.

 We could obviously encode all the infomask setting required in flags
 instructing heap_execute_freeze_tuple() to set them, but that seems more
 complex without accompanying benefit.

Abstraction is a benefit unto itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-19 Thread Robert Haas
On Thu, Dec 19, 2013 at 9:37 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Dec 19, 2013 at 9:19 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-12-19 07:40:40 -0500, Robert Haas wrote:
 On Thu, Dec 19, 2013 at 5:44 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-12-18 21:42:25 -0500, Robert Haas wrote:

  I dislike transporting the
  infomask in the wal record and then changing it away from that again 
  afterwards.

 I don't really see a problem with it.  Relying on the macros to tweak
 the bits seems more future-proof than inventing some other way to do
 it (that might even get copied into other parts of the code where it's
 even less safe).

 Then there should be a macro to twiddle the infomask, without touching
 the tuple.

 Sure, we can invent that.  I personally don't like it as well.

After some off-list discussion via IM I propose the following
compromise: it reverts my changes to do some of the infomask bit
twaddling in the execute function, but doesn't invent new macros
either.  My main concern about inventing new macros is that I don't
want to encourage people to write code that knows specifically which
parts of the heap tuple header are in which fields.  I think it may
have been a mistake to divide responsibility between the prepare and
execute functions the way we did in this case, because it doesn't
appear to be a clean separation of concerns.  But it's not this
patch's job to kibitz that decision, so this version just fits in with
the way things are already being done there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 6d8f6f1..a78cff3 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -162,7 +162,7 @@ heap_page_items(PG_FUNCTION_ARGS)
 
 			tuphdr = (HeapTupleHeader) PageGetItem(page, id);
 
-			values[4] = UInt32GetDatum(HeapTupleHeaderGetXmin(tuphdr));
+			values[4] = UInt32GetDatum(HeapTupleHeaderGetRawXmin(tuphdr));
 			values[5] = UInt32GetDatum(HeapTupleHeaderGetRawXmax(tuphdr));
 			values[6] = UInt32GetDatum(HeapTupleHeaderGetRawCommandId(tuphdr)); /* shared with xvac */
 			values[7] = PointerGetDatum(tuphdr-t_ctid);
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index e39b977..347d616 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -539,7 +539,7 @@ heap_getsysattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, bool *isnull)
 			result = ObjectIdGetDatum(HeapTupleGetOid(tup));
 			break;
 		case MinTransactionIdAttributeNumber:
-			result = TransactionIdGetDatum(HeapTupleHeaderGetXmin(tup-t_data));
+			result = TransactionIdGetDatum(HeapTupleHeaderGetRawXmin(tup-t_data));
 			break;
 		case MaxTransactionIdAttributeNumber:
 			result = TransactionIdGetDatum(HeapTupleHeaderGetRawXmax(tup-t_data));
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index db683b1..deacd7c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1738,7 +1738,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 		 */
 		if (TransactionIdIsValid(prev_xmax) 
 			!TransactionIdEquals(prev_xmax,
- HeapTupleHeaderGetXmin(heapTuple-t_data)))
+ HeapTupleHeaderGetRawXmin(heapTuple-t_data)))
 			break;
 
 		/*
@@ -1908,7 +1908,7 @@ heap_get_latest_tid(Relation relation,
 		 * tuple.  Check for XMIN match.
 		 */
 		if (TransactionIdIsValid(priorXmax) 
-		  !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(tp.t_data)))
+		  !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(tp.t_data)))
 		{
 			UnlockReleaseBuffer(buffer);
 			break;
@@ -2257,13 +2257,10 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup-t_data-t_infomask = ~(HEAP_XACT_MASK);
 	tup-t_data-t_infomask2 = ~(HEAP2_XACT_MASK);
 	tup-t_data-t_infomask |= HEAP_XMAX_INVALID;
+	HeapTupleHeaderSetXmin(tup-t_data, xid);
 	if (options  HEAP_INSERT_FROZEN)
-	{
-		tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED;
-		HeapTupleHeaderSetXmin(tup-t_data, FrozenTransactionId);
-	}
-	else
-		HeapTupleHeaderSetXmin(tup-t_data, xid);
+		HeapTupleHeaderSetXminFrozen(tup-t_data);
+
 	HeapTupleHeaderSetCmin(tup-t_data, cid);
 	HeapTupleHeaderSetXmax(tup-t_data, 0);		/* for cleanliness */
 	tup-t_tableOid = RelationGetRelid(relation);
@@ -5094,7 +5091,7 @@ l4:
 		 * the end of the chain, we're done, so return success.
 		 */
 		if (TransactionIdIsValid(priorXmax) 
-			!TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data),
+			!TransactionIdEquals(HeapTupleHeaderGetRawXmin(mytup.t_data),
  priorXmax))
 		{
 			UnlockReleaseBuffer(buf);
@@ -5724,13 +5721,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 	if (TransactionIdIsNormal(xid) 
 		TransactionIdPrecedes(xid, cutoff_xid))
 	{
-		frz-frzflags |= 

Re: [HACKERS] preserving forensic information when we freeze

2013-12-19 Thread Alvaro Herrera
Robert Haas escribió:

 I think it may have been a mistake to divide responsibility between
 the prepare and execute functions the way we did in this case, because
 it doesn't appear to be a clean separation of concerns.  But it's not
 this patch's job to kibitz that decision, so this version just fits in
 with the way things are already being done there.

If you want to change how it works, feel free to propose a patch; I'm
not in love with what we're doing, honestly, and I did propose the idea
of using some flag bits instead of the whole mask, but didn't get any
traction.  (Not that that thread had a lot of participants.)

Or are you suggesting that I should do it?  I would have welcomed
feedback when I was on that, but I have moved on to other things now,
and I don't want to be a blocker for the forensic freeze patch.

Anyway I think if we want to change it, the time is now, before we
release 9.3.3.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-19 Thread Robert Haas
On Thu, Dec 19, 2013 at 3:36 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas escribió:
 I think it may have been a mistake to divide responsibility between
 the prepare and execute functions the way we did in this case, because
 it doesn't appear to be a clean separation of concerns.  But it's not
 this patch's job to kibitz that decision, so this version just fits in
 with the way things are already being done there.

 If you want to change how it works, feel free to propose a patch; I'm
 not in love with what we're doing, honestly, and I did propose the idea
 of using some flag bits instead of the whole mask, but didn't get any
 traction.  (Not that that thread had a lot of participants.)

 Or are you suggesting that I should do it?  I would have welcomed
 feedback when I was on that, but I have moved on to other things now,
 and I don't want to be a blocker for the forensic freeze patch.

 Anyway I think if we want to change it, the time is now, before we
 release 9.3.3.

I am sorry I wasn't able to follow that thread in more detail at the
time, and I'm not trying to create extra work for you now.  You've put
a lot of work into stabilizing the fkeylocks stuff and it's not my
purpose to cast aspersions on that, nor do I think that this is so bad
we can't live with it.  The somewhat ambiguous phrasing of that email
is attributable to the fact that I really don't have a clear idea what
would be better than what you've got here now, and even if I did, I'm
not eager to be the guy who insists on refactoring and breaks things
again in the process.

It's tempting to think that the prepare function log a set of flags
indicating what logical operations should be performed on the target
tuple, rather than the new infomask and infomask2 fields per se; or
else that we ought to just log the whole HeapTupleHeader, so that the
execute function just copies the data in, splat.  In other words, make
the logging either purely logical, or purely physical, not a mix.  But
making it purely logical would involve translating between multiple
sets of flags in a way that might not accomplish much beyond
increasing the possibility for error, and making it purely physical
would make the WAL record bigger for no real gain.  So perhaps the way
you've chosen is best after all, despite my reservations.

But in either case, it was not my purpose to hijack this thread to
talk about that patch, just to explain how I've updated this patch to
(hopefully) satisfy Andres's concerns.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-19 Thread Jim Nasby

One thing users will lose in this patch is the ability to reliably see if a 
tuple is frozen via SQL. Today you can do that just by selecting xmin from the 
table.

Obviously people don't generally need to do that... but it's one of those 
things that when you do need it it's incredibly handy to have... would it be 
difficult to expose infomask(2) via SQL, the same way xmin et all are?
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-19 Thread Alvaro Herrera
Jim Nasby escribió:
 One thing users will lose in this patch is the ability to reliably see if a 
 tuple is frozen via SQL. Today you can do that just by selecting xmin from 
 the table.
 
 Obviously people don't generally need to do that... but it's one of those 
 things that when you do need it it's incredibly handy to have... would it be 
 difficult to expose infomask(2) via SQL, the same way xmin et all are?

It's already exposed via the pageinspect extension.  It's doubtful that
there are many valid cases where you need infomask but don't have access
to that module.

The real fix seems to ensure that the module is always available.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-18 Thread Andres Freund
Hi,

On 2013-12-17 16:00:14 -0500, Robert Haas wrote:
 @@ -5874,19 +5858,27 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, 
 TransactionId cutoff_xid,
  void
  heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
  {
 + tuple-t_infomask = frz-t_infomask;
 + tuple-t_infomask2 = frz-t_infomask2;
 +
   if (frz-frzflags  XLH_FREEZE_XMIN)
 - HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);
 + HeapTupleHeaderSetXminFrozen(tuple);
  
   HeapTupleHeaderSetXmax(tuple, frz-xmax);
  
   if (frz-frzflags  XLH_FREEZE_XVAC)
 + {
   HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
 + /* If we somehow haven't hinted the tuple previously, do it 
 now. */
 + HeapTupleHeaderSetXminCommitted(tuple);
 + }

What's the reasoning behind adding HeapTupleHeaderSetXminCommitted()
here?

 @@ -823,14 +823,14 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
* NB: Like with per-tuple hint bits, 
 we can't set the
* PD_ALL_VISIBLE flag if the inserter 
 committed
* asynchronously. See SetHintBits for 
 more info. Check
 -  * that the HEAP_XMIN_COMMITTED hint 
 bit is set because of
 -  * that.
 +  * that the tuple is hinted 
 xmin-committed hint bit because
 +  * of that.
*/

Looks like there's a hint bit too much here.

 @@ -65,6 +65,9 @@ manage to be a conflict it would merely mean that one 
 bit-update would
  be lost and need to be done again later.  These four bits are only hints
  (they cache the results of transaction status lookups in pg_clog), so no
  great harm is done if they get reset to zero by conflicting updates.
 +Note, however, that a tuple is frozen by setting both HEAP_XMIN_INVALID
 +and HEAP_XMIN_COMMITTED; this is a critical update and accordingly requires
 +an exclusive buffer lock.

I think it'd be appropriate to mention that this needs to be preserved
via WAL logging, it sounds like it's suficient to set a hint bit without
persistenc guarantees.

(not sure if I already wrote this, but whatever)

Looking good.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-18 Thread Robert Haas
On Wed, Dec 18, 2013 at 5:54 PM, Andres Freund and...@2ndquadrant.com wrote:
   if (frz-frzflags  XLH_FREEZE_XVAC)
 + {
   HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
 + /* If we somehow haven't hinted the tuple previously, do it 
 now. */
 + HeapTupleHeaderSetXminCommitted(tuple);
 + }

 What's the reasoning behind adding HeapTupleHeaderSetXminCommitted()
 here?

I'm just copying the existing logic.  See the final stanza of
heap_prepare_freeze_tuple.

 [ snip ]

Will fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-17 Thread Robert Haas
On Wed, Dec 11, 2013 at 5:25 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Andres Freund escribió:
 What's your plan to commit this? I'd prefer to wait till Alvaro's
 freezing changes get in, so his patch will look the same in HEAD and
 9.3. But I think he plans to commit soon.

 Yes, I do.  I'm waiting on feedback on the patch I posted this
 afternoon, so if there's nothing more soon I will push it.

That's done now, so I've rebased this patch and hacked on it a bit
more.  The latest version is attached.  Review would be appreciate in
case I've goofed up anything critical, especially around adjusting
things over top of Alvaro's freezing changes.  But I think this is
more or less ready to go.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 6d8f6f1..a78cff3 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -162,7 +162,7 @@ heap_page_items(PG_FUNCTION_ARGS)
 
 			tuphdr = (HeapTupleHeader) PageGetItem(page, id);
 
-			values[4] = UInt32GetDatum(HeapTupleHeaderGetXmin(tuphdr));
+			values[4] = UInt32GetDatum(HeapTupleHeaderGetRawXmin(tuphdr));
 			values[5] = UInt32GetDatum(HeapTupleHeaderGetRawXmax(tuphdr));
 			values[6] = UInt32GetDatum(HeapTupleHeaderGetRawCommandId(tuphdr)); /* shared with xvac */
 			values[7] = PointerGetDatum(tuphdr-t_ctid);
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index e39b977..347d616 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -539,7 +539,7 @@ heap_getsysattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, bool *isnull)
 			result = ObjectIdGetDatum(HeapTupleGetOid(tup));
 			break;
 		case MinTransactionIdAttributeNumber:
-			result = TransactionIdGetDatum(HeapTupleHeaderGetXmin(tup-t_data));
+			result = TransactionIdGetDatum(HeapTupleHeaderGetRawXmin(tup-t_data));
 			break;
 		case MaxTransactionIdAttributeNumber:
 			result = TransactionIdGetDatum(HeapTupleHeaderGetRawXmax(tup-t_data));
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index db683b1..a9fcd98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1738,7 +1738,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 		 */
 		if (TransactionIdIsValid(prev_xmax) 
 			!TransactionIdEquals(prev_xmax,
- HeapTupleHeaderGetXmin(heapTuple-t_data)))
+ HeapTupleHeaderGetRawXmin(heapTuple-t_data)))
 			break;
 
 		/*
@@ -1908,7 +1908,7 @@ heap_get_latest_tid(Relation relation,
 		 * tuple.  Check for XMIN match.
 		 */
 		if (TransactionIdIsValid(priorXmax) 
-		  !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(tp.t_data)))
+		  !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(tp.t_data)))
 		{
 			UnlockReleaseBuffer(buffer);
 			break;
@@ -2257,13 +2257,10 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup-t_data-t_infomask = ~(HEAP_XACT_MASK);
 	tup-t_data-t_infomask2 = ~(HEAP2_XACT_MASK);
 	tup-t_data-t_infomask |= HEAP_XMAX_INVALID;
+	HeapTupleHeaderSetXmin(tup-t_data, xid);
 	if (options  HEAP_INSERT_FROZEN)
-	{
-		tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED;
-		HeapTupleHeaderSetXmin(tup-t_data, FrozenTransactionId);
-	}
-	else
-		HeapTupleHeaderSetXmin(tup-t_data, xid);
+		HeapTupleHeaderSetXminFrozen(tup-t_data);
+
 	HeapTupleHeaderSetCmin(tup-t_data, cid);
 	HeapTupleHeaderSetXmax(tup-t_data, 0);		/* for cleanliness */
 	tup-t_tableOid = RelationGetRelid(relation);
@@ -5094,7 +5091,7 @@ l4:
 		 * the end of the chain, we're done, so return success.
 		 */
 		if (TransactionIdIsValid(priorXmax) 
-			!TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data),
+			!TransactionIdEquals(HeapTupleHeaderGetRawXmin(mytup.t_data),
  priorXmax))
 		{
 			UnlockReleaseBuffer(buf);
@@ -5725,12 +5722,6 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 		TransactionIdPrecedes(xid, cutoff_xid))
 	{
 		frz-frzflags |= XLH_FREEZE_XMIN;
-
-		/*
-		 * Might as well fix the hint bits too; usually XMIN_COMMITTED will
-		 * already be set here, but there's a small chance not.
-		 */
-		frz-t_infomask |= HEAP_XMIN_COMMITTED;
 		changed = true;
 	}
 
@@ -5837,13 +5828,6 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 frz-frzflags |= XLH_INVALID_XVAC;
 			else
 frz-frzflags |= XLH_FREEZE_XVAC;
-
-			/*
-			 * Might as well fix the hint bits too; usually XMIN_COMMITTED
-			 * will already be set here, but there's a small chance not.
-			 */
-			Assert(!(tuple-t_infomask  HEAP_XMIN_INVALID));
-			frz-t_infomask |= HEAP_XMIN_COMMITTED;
 			changed = true;
 		}
 	}
@@ -5874,19 +5858,27 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 void
 heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
 {

Re: [HACKERS] preserving forensic information when we freeze

2013-12-11 Thread Robert Haas
On Thu, Nov 21, 2013 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-11-21 15:59:35 -0500, Robert Haas wrote:
  * Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId?
It seems quite possible that people think they've delt with frozen
xmin entirely after checking, but they still might get
FrozenTransactionId back in a pg_upgraded cluster.

 The reason I originally wrote the patch the way I did, rather than the
 way that you prefer, is that it minimizes the number of places where
 we might perform extra tests that are known not to be needed in
 context.  These code paths are hot.

 The patch as sent shouldn't really do that in any of paths I know to be
 hot - it uses *RawXmin() there.

 If you do this sort of thing,  then after macro expansion we may end up with 
 a lot of things like:
 (flags  FROZEN) || (rawxid == 2) ? 2 : rawxid.  I want to avoid that.

 But in which cases would that actually be slower? There'll be no
 additional code executed if the hint bits for frozen are set, and in
 case not it will usually safe us an external function call to
 TransactionIdPrecedes().

Dunno.  It's at least more code generation.

  That macros is intended, specifically, to be a test for flag bits,
 and I think it should do precisely that.  If that's not what you want,
 then don't use that macro.

 That's a fair argument. Although there's several HeapTupleHeader* macros
 that muck with stuff besides infomask.

Sure, but that doesn't mean they ALL have to.

  * Existing htup_details boolean checks contain an 'Is', but
HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid,
HeapTupleHeaderXminFrozen don't contain any verb. Not sure.

 We could say XminIsComitted, XminIsInvalid, XminIsFrozen, etc.  I
 don't particularly care for it, but I can see the argument for it.

 I don't have a clear preference either, I just noticed the inconsistency
 and wasn't sure whether it was intentional.

It was intentional enough.  :-)

  I think once we have this we should start opportunistically try to
  freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
  the page is already dirty.

 Separate patch, but yeah, something like that.  If we have to mark the
 page all-visible, we might as well freeze it while we're there.  We
 should think about how it interacts with Heikki's freeze-without-write
 patch though.

 Definitely separate yes. And I agree, it's partially moot if Heikki's
 patch gets in, but I am not sure it will make it into 9.4. There seems
 to be quite some work left.

I haven't heard anything further from Heikki, so I'm thinking we
should proceed with this approach.  It seems to be the path of least
resistance, if not essential, for making CLUSTER freeze everything
automatically, a change almost everyone seems to really want.  Even if
we did have Heikki's stuff, making cluster freeze more aggressively is
still a good argument for doing this.  The pages can then be marked
all-visible (something Bruce is working on) and never need to be
revisited.  Without this, I don't think we can get there.  If we also
handle the vacuum-dirtied-it-already case as you propose here, I think
we'd have quite a respectable improvement in vacuum behavior for 9.4,
even without Heikki's stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-11 Thread Heikki Linnakangas

On 12/11/2013 09:17 PM, Robert Haas wrote:

On Thu, Nov 21, 2013 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote:

On 2013-11-21 15:59:35 -0500, Robert Haas wrote:

Separate patch, but yeah, something like that.  If we have to mark the
page all-visible, we might as well freeze it while we're there.  We
should think about how it interacts with Heikki's freeze-without-write
patch though.


Definitely separate yes. And I agree, it's partially moot if Heikki's
patch gets in, but I am not sure it will make it into 9.4. There seems
to be quite some work left.


I haven't heard anything further from Heikki, so I'm thinking we
should proceed with this approach.


+1. It seems unlikely that my patch is going to make it into 9.4.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-11 Thread Andres Freund
On 2013-12-11 14:17:27 -0500, Robert Haas wrote:
  But in which cases would that actually be slower? There'll be no
  additional code executed if the hint bits for frozen are set, and in
  case not it will usually safe us an external function call to
  TransactionIdPrecedes().
 
 Dunno.  It's at least more code generation.

IMO it looks cleaner and is less error prone, so an additional
instruction or two in the slow path doesn't seem a high price. And we're
really not talking about more.

But I won't do more than roll my eyes loudly if you decide to go ahead
with your version as long as you change rewriteheap.c accordingly.

   I think once we have this we should start opportunistically try to
   freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
   the page is already dirty.
 
  Separate patch, but yeah, something like that.  If we have to mark the
  page all-visible, we might as well freeze it while we're there.  We
  should think about how it interacts with Heikki's freeze-without-write
  patch though.
 
  Definitely separate yes. And I agree, it's partially moot if Heikki's
  patch gets in, but I am not sure it will make it into 9.4. There seems
  to be quite some work left.
 
 I haven't heard anything further from Heikki, so I'm thinking we
 should proceed with this approach.

Yea, I think by now it's pretty unlikely to get into 9.4. I hope for
early 9.5 tho.

What's your plan to commit this? I'd prefer to wait till Alvaro's
freezing changes get in, so his patch will look the same in HEAD and
9.3. But I think he plans to commit soon.

 If we also
 handle the vacuum-dirtied-it-already case as you propose here, I think
 we'd have quite a respectable improvement in vacuum behavior for 9.4,
 even without Heikki's stuff.

Yes, and it seems like a realistic goal, so let's go for it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-12-11 Thread Alvaro Herrera
Andres Freund escribió:

 What's your plan to commit this? I'd prefer to wait till Alvaro's
 freezing changes get in, so his patch will look the same in HEAD and
 9.3. But I think he plans to commit soon.

Yes, I do.  I'm waiting on feedback on the patch I posted this
afternoon, so if there's nothing more soon I will push it.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-11-24 Thread Simon Riggs
On 21 November 2013 16:51, Andres Freund and...@2ndquadrant.com wrote:

 Definitely separate yes. And I agree, it's partially moot if Heikki's
 patch gets in, but I am not sure it will make it into 9.4. There seems
 to be quite some work left.

I'd prefer to do all 3 patches in one release. Don't mind which one.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-11-21 Thread Andres Freund
On 2013-11-21 15:59:35 -0500, Robert Haas wrote:
  * Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId?
It seems quite possible that people think they've delt with frozen
xmin entirely after checking, but they still might get
FrozenTransactionId back in a pg_upgraded cluster.
 
 The reason I originally wrote the patch the way I did, rather than the
 way that you prefer, is that it minimizes the number of places where
 we might perform extra tests that are known not to be needed in
 context.  These code paths are hot.

The patch as sent shouldn't really do that in any of paths I know to be
hot - it uses *RawXmin() there.

 If you do this sort of thing,  then after macro expansion we may end up with 
 a lot of things like:
 (flags  FROZEN) || (rawxid == 2) ? 2 : rawxid.  I want to avoid that.

But in which cases would that actually be slower? There'll be no
additional code executed if the hint bits for frozen are set, and in
case not it will usually safe us an external function call to
TransactionIdPrecedes().

  That macros is intended, specifically, to be a test for flag bits,
 and I think it should do precisely that.  If that's not what you want,
 then don't use that macro.

That's a fair argument. Although there's several HeapTupleHeader* macros
that muck with stuff besides infomask.

  * Existing htup_details boolean checks contain an 'Is', but
HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid,
HeapTupleHeaderXminFrozen don't contain any verb. Not sure.
 
 We could say XminIsComitted, XminIsInvalid, XminIsFrozen, etc.  I
 don't particularly care for it, but I can see the argument for it.

I don't have a clear preference either, I just noticed the inconsistency
and wasn't sure whether it was intentional.

  * EvalPlanQualFetch() uses priorXmax like logic to find updated versions
of tuples. I am not 100% sure there aren't cases where that's
problematic even with the current code, but I think it's better not to
use the raw xid there - priorXmax can never be FrozenTransactionId, so
comparing it to the GetXmin() seems better.
It also has the following wrong comment:
   *   (Should be safe to examine xmin without getting
   * buffer's content lock, since xmin never changes in an existing
   * tuple.)
 
 Hmm.  The new tuple can't be frozen unless it's all-visible, and it
 can't be all-visible if our scan can still see its predecessor.  That
 would imply that if it's frozen, it must be an unrelated tuple.  I
 think.

Yes, that's my reasoning as well - and why I think we should check for
new-style frozen xids. Either via my version of GetXmin() or
HeapTupleHeaderXminFrozen() if we go with yours.

  I think once we have this we should start opportunistically try to
  freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
  the page is already dirty.
 
 Separate patch, but yeah, something like that.  If we have to mark the
 page all-visible, we might as well freeze it while we're there.  We
 should think about how it interacts with Heikki's freeze-without-write
 patch though.

Definitely separate yes. And I agree, it's partially moot if Heikki's
patch gets in, but I am not sure it will make it into 9.4. There seems
to be quite some work left.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-11-21 Thread Andres Freund
Hi,

As promised I'm currently updating the patch. Some questions arose
during that:

* Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId?
  It seems quite possible that people think they've delt with frozen
  xmin entirely after checking, but they still might get
  FrozenTransactionId back in a pg_upgraded cluster.
* Existing htup_details boolean checks contain an 'Is', but
  HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid,
  HeapTupleHeaderXminFrozen don't contain any verb. Not sure.
* EvalPlanQualFetch() uses priorXmax like logic to find updated versions
  of tuples. I am not 100% sure there aren't cases where that's
  problematic even with the current code, but I think it's better not to
  use the raw xid there - priorXmax can never be FrozenTransactionId, so
  comparing it to the GetXmin() seems better.
  It also has the following wrong comment:
 *   (Should be safe to examine xmin without getting
 * buffer's content lock, since xmin never changes in an existing
 * tuple.)
  But I don't see that causing any problems.
* ri_trigger.c did do a TransactionIdIsCurrentTransactionId() on a raw
  xmin value, the consequences are minor though.

The rewrite to introduce HeapTupleHeaderGet[Raw]Xmin() indeed somewhat
increases the patchsize - but I think it's enough increase in
expressiveness to be well worth the cost. Indeed it led me to find at
least one issue (with minor consequences).

I think once we have this we should start opportunistically try to
freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
the page is already dirty.

Patch 01 is a rebased version of Robert's patch without further changes,
02 contains my suggested modifications. 

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From ac6c9a15836414208c49393d05d2d7d29e8fae4a Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Thu, 21 Nov 2013 12:57:20 +0100
Subject: [PATCH 1/2] freeze-forensically-v1

Robert Haas
---
 src/backend/access/heap/heapam.c  | 50 +++
 src/backend/access/heap/rewriteheap.c |  1 +
 src/backend/catalog/index.c   |  1 +
 src/backend/commands/analyze.c|  1 +
 src/backend/commands/cluster.c|  1 +
 src/backend/commands/sequence.c   |  4 +--
 src/backend/commands/typecmds.c   |  3 ++-
 src/backend/commands/vacuumlazy.c |  9 +--
 src/backend/optimizer/util/plancat.c  |  1 +
 src/backend/storage/lmgr/predicate.c  |  7 -
 src/backend/utils/adt/ri_triggers.c   |  6 +++--
 src/backend/utils/cache/relcache.c|  8 --
 src/backend/utils/time/combocid.c |  8 +++---
 src/backend/utils/time/tqual.c| 31 +++---
 src/include/access/htup_details.h | 35 +++-
 15 files changed, 108 insertions(+), 58 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a21f31b..eb56230 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2221,13 +2221,9 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup-t_data-t_infomask = ~(HEAP_XACT_MASK);
 	tup-t_data-t_infomask2 = ~(HEAP2_XACT_MASK);
 	tup-t_data-t_infomask |= HEAP_XMAX_INVALID;
+	HeapTupleHeaderSetXmin(tup-t_data, xid);
 	if (options  HEAP_INSERT_FROZEN)
-	{
-		tup-t_data-t_infomask |= HEAP_XMIN_COMMITTED;
-		HeapTupleHeaderSetXmin(tup-t_data, FrozenTransactionId);
-	}
-	else
-		HeapTupleHeaderSetXmin(tup-t_data, xid);
+		HeapTupleHeaderSetXminFrozen(tup-t_data);
 	HeapTupleHeaderSetCmin(tup-t_data, cid);
 	HeapTupleHeaderSetXmax(tup-t_data, 0);		/* for cleanliness */
 	tup-t_tableOid = RelationGetRelid(relation);
@@ -5120,19 +5116,15 @@ heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 	bool		changed = false;
 	TransactionId xid;
 
-	xid = HeapTupleHeaderGetXmin(tuple);
-	if (TransactionIdIsNormal(xid) 
-		TransactionIdPrecedes(xid, cutoff_xid))
+	if (!HeapTupleHeaderXminFrozen(tuple))
 	{
-		HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);
-
-		/*
-		 * Might as well fix the hint bits too; usually XMIN_COMMITTED will
-		 * already be set here, but there's a small chance not.
-		 */
-		Assert(!(tuple-t_infomask  HEAP_XMIN_INVALID));
-		tuple-t_infomask |= HEAP_XMIN_COMMITTED;
-		changed = true;
+		xid = HeapTupleHeaderGetXmin(tuple);
+		if (TransactionIdIsNormal(xid) 
+			TransactionIdPrecedes(xid, cutoff_xid))
+		{
+			HeapTupleHeaderSetXminFrozen(tuple);
+			changed = true;
+		}
 	}
 
 	/*
@@ -5185,8 +5177,7 @@ heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 			 * Might as well fix the hint bits too; usually XMIN_COMMITTED
 			 * will already be set here, but there's a small chance not.
 			 */
-			Assert(!(tuple-t_infomask  HEAP_XMIN_INVALID));
-			tuple-t_infomask |= HEAP_XMIN_COMMITTED;
+			HeapTupleHeaderSetXminCommitted(tuple);
 			

Re: [HACKERS] preserving forensic information when we freeze

2013-11-21 Thread Robert Haas
On Thu, Nov 21, 2013 at 9:47 AM, Andres Freund and...@2ndquadrant.com wrote:
 As promised I'm currently updating the patch. Some questions arose
 during that:

 * Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId?
   It seems quite possible that people think they've delt with frozen
   xmin entirely after checking, but they still might get
   FrozenTransactionId back in a pg_upgraded cluster.

The reason I originally wrote the patch the way I did, rather than the
way that you prefer, is that it minimizes the number of places where
we might perform extra tests that are known not to be needed in
context.  These code paths are hot.  If you do this sort of thing,
then after macro expansion we may end up with a lot of things like:
(flags  FROZEN) || (rawxid == 2) ? 2 : rawxid.  I want to avoid that.
 That macros is intended, specifically, to be a test for flag bits,
and I think it should do precisely that.  If that's not what you want,
then don't use that macro.

 * Existing htup_details boolean checks contain an 'Is', but
   HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid,
   HeapTupleHeaderXminFrozen don't contain any verb. Not sure.

We could say XminIsComitted, XminIsInvalid, XminIsFrozen, etc.  I
don't particularly care for it, but I can see the argument for it.

 * EvalPlanQualFetch() uses priorXmax like logic to find updated versions
   of tuples. I am not 100% sure there aren't cases where that's
   problematic even with the current code, but I think it's better not to
   use the raw xid there - priorXmax can never be FrozenTransactionId, so
   comparing it to the GetXmin() seems better.
   It also has the following wrong comment:
  *   (Should be safe to examine xmin without getting
  * buffer's content lock, since xmin never changes in an existing
  * tuple.)

Hmm.  The new tuple can't be frozen unless it's all-visible, and it
can't be all-visible if our scan can still see its predecessor.  That
would imply that if it's frozen, it must be an unrelated tuple.  I
think.

   But I don't see that causing any problems.
 * ri_trigger.c did do a TransactionIdIsCurrentTransactionId() on a raw
   xmin value, the consequences are minor though.

 The rewrite to introduce HeapTupleHeaderGet[Raw]Xmin() indeed somewhat
 increases the patchsize - but I think it's enough increase in
 expressiveness to be well worth the cost. Indeed it led me to find at
 least one issue (with minor consequences).

 I think once we have this we should start opportunistically try to
 freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
 the page is already dirty.

Separate patch, but yeah, something like that.  If we have to mark the
page all-visible, we might as well freeze it while we're there.  We
should think about how it interacts with Heikki's freeze-without-write
patch though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-07-08 Thread Josh Berkus
On 07/03/2013 11:59 AM, Robert Haas wrote:
 Yeah.  I think the system columns that we have today are pretty much
 crap.  I wonder if we couldn't throw them out and replace them with
 some kind of functions that you can pass a row to.  That would allow
 us to expose a lot more detail without adding a bajillion hidden
 columns, and for a bonus we'd save substantially on catalog bloat.

Where are we on this patch?  Should I mark it Returned and you'll work
on it for September?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-07-08 Thread Robert Haas
On Jul 8, 2013, at 1:34 PM, Josh Berkus j...@agliodbs.com wrote:
 On 07/03/2013 11:59 AM, Robert Haas wrote:
 Yeah.  I think the system columns that we have today are pretty much
 crap.  I wonder if we couldn't throw them out and replace them with
 some kind of functions that you can pass a row to.  That would allow
 us to expose a lot more detail without adding a bajillion hidden
 columns, and for a bonus we'd save substantially on catalog bloat.
 
 Where are we on this patch?  Should I mark it Returned and you'll work
 on it for September?

Sure.

...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-07-03 Thread Andres Freund
On 2013-07-02 16:29:56 -0400, Robert Haas wrote:
 On Mon, Jun 24, 2013 at 8:12 AM, Andres Freund and...@2ndquadrant.com wrote:
  Agreed. And it also improves the status quo when debugging. I wish this
  would have been the representation since the beginning.
 
  Some remarks:
  * I don't really like that HeapTupleHeaderXminFrozen() now is frequently
performed without checking for FrozenTransactionId. I think the places
where that's done are currently safe, but it seems likely that we
introduce bugs due to people writing similar code.
I think replacing *GetXmin by a wrapper that returns
FrozenTransactionId if the hint bit tell us so would be a good
idea. Then add *GetRawXmin for the rest (probably mostly display).
Seems like it would make the patch actually smaller.
 
 I did think about this approach, but it seemed like it would add
 cycles in a significant number of places.  For example, consider the
 HeapTupleSatisfies() routines, which are perhaps PostgreSQL's finest
 example of a place where you DON'T want to add any cycles.  All of the
 checks on xmin are conditional on HeapTupleHeaderXminCommitted()
 having been found already to be false.  That implies that the frozen
 bits aren't set either, so if HeapTupleHeaderGetXmin() were to recheck
 the bits it would be a waste.  As I got to the end of the patch I was
 a little dismayed by the number of places that did need adjustment to
 check both things, but there are still plenty of important places that
 don't.

Well, nothing would prevent using the HeapTupleHeaderGetRawXmin() in
those places. Exactly the number of callsites is what makes me think
that somebody will get this wrong in the future.

  * The PLs imo should set fn_xmin to FrozenTransactionId if the hint bit
tell us the tuple is frozen.
 
 Why?  I thought about that, but it seemed to me that the purpose of
 that caching was to avoid confusing two functions whose pg_proc tuples
 ended up at the same TID.
 [good reasoning]

Man. I hate this hack. I wonder what happens if somebody does a VACUUM
FULL of pg_class and there are a bunch of functions created in the same
transaction...

  * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
store that. We might looking at a chain which partially was done in
9.4. Not sure if that's a realistic scenario, but I'd rather be safe.
 
 IIUC, you're talking about the scenario where we have an update chain
 X - Y, where X is dead but not actually removed and Y is
 (forensically) frozen.   We're examining tuple Y and trying to
 determine whether X has been entered in rs_unresolved_tups.  If, as I
 think you're proposing, we consider the xmin of Y to be
 FrozenTransactionId, we will definitely not find it - because the way
 it got into the table in the first place was based on the value
 returned by HeapTupleHeaderGetUpdateXid().  And that value is certain
 not to be FrozenTransactionId, because we never set the xmax of a
 tuple to FrozenTransactionId.

I am thinking of something slightly different. rewrite_heap_dead_tuple()
now removes tuples/xids from the unresolved table that could be from a
different xid epoch since it unconditionally does a HASH_REMOVE if it
finds an entry doing a lookup using the *preserved* xid. Earlier that
was harmless since for frozen tuples it only ever used
FrozenTransactionId which obviously cannot be part of a valid chain and
couldn't even get entered into unresolved_tups.

I am not sure at all if that actually can be harmful but there isn't any
reason we would need to do the delete so I wouldn't. There can be
complex enough situation where later parts of a ctid chain are dead and
earlier ones are recently dead and such that I would rather be cautious.

 There's no possibility of getting confused here; if X is still around
 at all, it's xmax is of the same generation as Y's xmin.  Otherwise,
 we've had an undetected XID wraparound.

Another issue I thought about is what we will return for SELECT xmin
FROM blarg; Some people use that in their applications (IIRC
skytools/pqg/londiste does so) and they might get confused if we
suddently return xids from the future. On the other hand, not returning
the original xid would be a shame as well...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-07-03 Thread Robert Haas
On Wed, Jul 3, 2013 at 1:07 PM, Andres Freund and...@2ndquadrant.com wrote:
 Well, nothing would prevent using the HeapTupleHeaderGetRawXmin() in
 those places. Exactly the number of callsites is what makes me think
 that somebody will get this wrong in the future.

Well, I guess I could go rework the whole patch that way.  It's a fair
request, but I kinda doubt it's going to make the patch smaller.

  * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
store that. We might looking at a chain which partially was done in
9.4. Not sure if that's a realistic scenario, but I'd rather be safe.

 IIUC, you're talking about the scenario where we have an update chain
 X - Y, where X is dead but not actually removed and Y is
 (forensically) frozen.   We're examining tuple Y and trying to
 determine whether X has been entered in rs_unresolved_tups.  If, as I
 think you're proposing, we consider the xmin of Y to be
 FrozenTransactionId, we will definitely not find it - because the way
 it got into the table in the first place was based on the value
 returned by HeapTupleHeaderGetUpdateXid().  And that value is certain
 not to be FrozenTransactionId, because we never set the xmax of a
 tuple to FrozenTransactionId.

 I am thinking of something slightly different. rewrite_heap_dead_tuple()
 now removes tuples/xids from the unresolved table that could be from a
 different xid epoch since it unconditionally does a HASH_REMOVE if it
 finds an entry doing a lookup using the *preserved* xid. Earlier that
 was harmless since for frozen tuples it only ever used
 FrozenTransactionId which obviously cannot be part of a valid chain and
 couldn't even get entered into unresolved_tups.

 I am not sure at all if that actually can be harmful but there isn't any
 reason we would need to do the delete so I wouldn't. There can be
 complex enough situation where later parts of a ctid chain are dead and
 earlier ones are recently dead and such that I would rather be cautious.

OK, I think I see your point, and I think you're right.

 There's no possibility of getting confused here; if X is still around
 at all, it's xmax is of the same generation as Y's xmin.  Otherwise,
 we've had an undetected XID wraparound.

 Another issue I thought about is what we will return for SELECT xmin
 FROM blarg; Some people use that in their applications (IIRC
 skytools/pqg/londiste does so) and they might get confused if we
 suddently return xids from the future. On the other hand, not returning
 the original xid would be a shame as well...

Yeah.  I think the system columns that we have today are pretty much
crap.  I wonder if we couldn't throw them out and replace them with
some kind of functions that you can pass a row to.  That would allow
us to expose a lot more detail without adding a bajillion hidden
columns, and for a bonus we'd save substantially on catalog bloat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-07-03 Thread Marko Kreen
On Wed, Jul 3, 2013 at 8:07 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-07-02 16:29:56 -0400, Robert Haas wrote:
 There's no possibility of getting confused here; if X is still around
 at all, it's xmax is of the same generation as Y's xmin.  Otherwise,
 we've had an undetected XID wraparound.

 Another issue I thought about is what we will return for SELECT xmin
 FROM blarg; Some people use that in their applications (IIRC
 skytools/pqg/londiste does so) and they might get confused if we
 suddently return xids from the future. On the other hand, not returning
 the original xid would be a shame as well...

Skytools/pqg/londiste use only txid_* APIs, they don't look at
4-byte xids on table rows.

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-07-02 Thread Robert Haas
On Mon, Jun 24, 2013 at 8:12 AM, Andres Freund and...@2ndquadrant.com wrote:
 Agreed. And it also improves the status quo when debugging. I wish this
 would have been the representation since the beginning.

 Some remarks:
 * I don't really like that HeapTupleHeaderXminFrozen() now is frequently
   performed without checking for FrozenTransactionId. I think the places
   where that's done are currently safe, but it seems likely that we
   introduce bugs due to people writing similar code.
   I think replacing *GetXmin by a wrapper that returns
   FrozenTransactionId if the hint bit tell us so would be a good
   idea. Then add *GetRawXmin for the rest (probably mostly display).
   Seems like it would make the patch actually smaller.

I did think about this approach, but it seemed like it would add
cycles in a significant number of places.  For example, consider the
HeapTupleSatisfies() routines, which are perhaps PostgreSQL's finest
example of a place where you DON'T want to add any cycles.  All of the
checks on xmin are conditional on HeapTupleHeaderXminCommitted()
having been found already to be false.  That implies that the frozen
bits aren't set either, so if HeapTupleHeaderGetXmin() were to recheck
the bits it would be a waste.  As I got to the end of the patch I was
a little dismayed by the number of places that did need adjustment to
check both things, but there are still plenty of important places that
don't.

 * The PLs imo should set fn_xmin to FrozenTransactionId if the hint bit
   tell us the tuple is frozen.

Why?  I thought about that, but it seemed to me that the purpose of
that caching was to avoid confusing two functions whose pg_proc tuples
ended up at the same TID.  So there's a failure mechanism there: the
tuple can get vacuumed away and replaced with a new tuple which then
gets frozen, and everything (bogusly) matches.  If the actual
XID-prior-to-freezing has to match, ISTM that the chances of a false
match are far lower.  You have to get the new tuple at the same TID
slot *and* the XID counter has to wrap back around to the
exactly-right XID to get a false match on XID, within the lifetime of
a single backend.  That's not bloody likely.  Remember, the whole
point of the patch is to start freezing things sooner, so the scenario
where both the original and replacement tuples are frozen is going to
become more common.

We also don't particularly *want* the freezing of a pg_proc tuple to
force recompilations in every backend.

 * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
   store that. We might looking at a chain which partially was done in
   9.4. Not sure if that's a realistic scenario, but I'd rather be safe.

IIUC, you're talking about the scenario where we have an update chain
X - Y, where X is dead but not actually removed and Y is
(forensically) frozen.   We're examining tuple Y and trying to
determine whether X has been entered in rs_unresolved_tups.  If, as I
think you're proposing, we consider the xmin of Y to be
FrozenTransactionId, we will definitely not find it - because the way
it got into the table in the first place was based on the value
returned by HeapTupleHeaderGetUpdateXid().  And that value is certain
not to be FrozenTransactionId, because we never set the xmax of a
tuple to FrozenTransactionId.

There's no possibility of getting confused here; if X is still around
at all, it's xmax is of the same generation as Y's xmin.  Otherwise,
we've had an undetected XID wraparound.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-06-24 Thread Andres Freund
On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
 As a general statement, I view this work as something that is likely
 needed no matter which one of the remove freezing approaches that
 have been proposed we choose to adopt.  It does not fix anything in
 and of itself, but it (hopefully) removes an objection to the entire
 line of inquiry.

Agreed. And it also improves the status quo when debugging. I wish this
would have been the representation since the beginning.

Some remarks:
* I don't really like that HeapTupleHeaderXminFrozen() now is frequently
  performed without checking for FrozenTransactionId. I think the places
  where that's done are currently safe, but it seems likely that we
  introduce bugs due to people writing similar code.
  I think replacing *GetXmin by a wrapper that returns
  FrozenTransactionId if the hint bit tell us so would be a good
  idea. Then add *GetRawXmin for the rest (probably mostly display).
  Seems like it would make the patch actually smaller.

* The PLs imo should set fn_xmin to FrozenTransactionId if the hint bit
  tell us the tuple is frozen.

* I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
  store that. We might looking at a chain which partially was done in
  9.4. Not sure if that's a realistic scenario, but I'd rather be safe.

Otherwise I don't see much that needs to be done before this can get
committed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-05-29 Thread Robert Haas
On Tue, May 28, 2013 at 10:08 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-05-28 21:26:49 -0400, Robert Haas wrote:
 On Tue, May 28, 2013 at 8:00 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I only suggested MOVED_IN/OFF because I didn't remember
  HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
  combination could have been produced in the past in a way I couldn't
  find so far, I am all for it. I don't see a problem with breaking
  backward compatibility on that level and I don't think there's any way
  to get there without some level of low level compat break.

 I'm not sure why you call it a compatibility break.  It's true that an
 old PostgreSQL can't read new heaps, but we've never guaranteed that
 direction anyway, and every time we add or reinterpret any infomask
 bits, that's true.  For example, the fklocks patch did tat.  What's
 important is that a new PostgreSQL will still be able to read old
 heaps; that is, pg_upgrade compatibility is preserved.

 Oh, not the on-disk format. But as you said, code that looks at xmin is
 going to need to change. fklocks broke code that relied on
 HeapTupleHeaderGetXmax(), this would break code that looks at
 xmin. Removing/renaming *GetXmin similarly sounds like a good idea to
 make the breakage explicit.

I thought about that, but there are enough callers that don't care
that it seemed like the wrong choice to me.

  I am all for adding a comment why this is safe though. We thought about
  it for some while before we were convinced...

 I'm fine with that, but the logic is present in multiple places, and I
 did not want to comment them all identically.  If there's a central
 place in which a comment would be appropriate, let's add it there; or
 IOW, what do you suggest in detail?

 That's a good point. Generally lots of this is underdocumented/commented
 and can only learned by spending a year or so in the postgres code. I'd
 say rename README.HOT to README and add a section there and reference it
 from those two places? It already has a mostly general explanation of
 concurrent index builds. Don't have a better idea.

Anyone else have a suggestion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-05-29 Thread Peter Eisentraut
On 5/28/13 8:00 PM, Andres Freund wrote:
 - Various procedural languages use the combination of TID and XMIN to
  determine whether a function needs to be recompiled.

 Hm. As previously said, I am less than convinced of those adhoc
 mechanisms and I think this should get properly integrated into the
 normal cache invalidation mechanisms.

Yes please.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-05-29 Thread Alvaro Herrera
Robert Haas escribió:
 On Tue, May 28, 2013 at 10:08 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-05-28 21:26:49 -0400, Robert Haas wrote:

   I am all for adding a comment why this is safe though. We thought about
   it for some while before we were convinced...
 
  I'm fine with that, but the logic is present in multiple places, and I
  did not want to comment them all identically.  If there's a central
  place in which a comment would be appropriate, let's add it there; or
  IOW, what do you suggest in detail?
 
  That's a good point. Generally lots of this is underdocumented/commented
  and can only learned by spending a year or so in the postgres code. I'd
  say rename README.HOT to README and add a section there and reference it
  from those two places? It already has a mostly general explanation of
  concurrent index builds. Don't have a better idea.
 
 Anyone else have a suggestion?

I support the idea of using README files.  I don't have an opinion on
whether it's best to have a single file for everything (i.e. rename
README.HOT and add to it) or just explain this in README.freeze or
something like that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Robert Haas
Various people, including at least Heikki, Andres, and myself, have
proposed various schemes for avoiding freezing that amount to doing it
sooner, when we're already writing WAL anyway, or at least when the
buffer is already dirty anyway, or at least while the buffer is
already in shared_buffers anyway.  Various people, including at least
Tom and Andres, have raised the point that this would lose
possibly-useful forensic information that they have in the past found
to be of tangible value in previous debugging of databases that have
somehow gotten messed up.  I don't know who originally proposed it,
but I've had many conversations about how we could address this
concern: instead of replacing xmin when we freeze, just set an
infomask bit that means xmin is frozen and leave the old, literal
xmin in place.  FrozenTransactionId would still exist and still be
understood, of course, but new freezing operations wouldn't use it.

I have attempted to implement this.  Trouble is, we're out of infomask
bits.  Using an infomask2 bit might work but we don't have many of
them left either, so it's worth casting about a bit for a better
solution.   Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
this purpose, but I think we're better off trying to reclaim those
bits in a future release.  Exactly how to do that is a topic for
another email, but I believe it's very possible.  What I'd like to
propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
indicate that xmin is frozen.  This bit pattern isn't used for
anything else, so there's no confusion possible with existing data
already on disk, but it's necessary to audit all code that checks
HEAP_XMIN_INVALID to make sure it doesn't get confused.  I've done
this, and there's little enough of it that it seems pretty easy to
handle.

A somewhat larger problem is that this requires auditing every place
that looks at a tuple xmin and deciding whether any changes are needed
to handle the possibility that the tuple may be frozen even though
xmin != FrozenTransactionId.  This is a somewhat more significant
change, but it doesn't seem to be too bad.  But there are a couple of
cases that are tricky enough that they seem worth expounding upon:

- When we follow HOT chains, we determine where the HOT chain ends by
matching the xmax of each tuple with the xmin of the next tuple.  If
they don't match, we conclude that the HOT chain has ended.  I
initially thought this logic might be buggy even as things stand today
if the latest tuple in the chain is frozen, but as Andres pointed out
to me, that's not so.  If the newest tuple in the chain is all-visible
(the earliest point at which we can theoretically freeze it), all
earlier tuples are dead altogether, and heap_page_prune() is always
called after locking the buffer and before freezing, so any tuple we
freeze must be the first in its HOT chain.  For the same reason, this
logic doesn't need any adjustment for the new freezing system: it's
never looking at anything old enough to be frozen in the first place.

- Various procedural languages use the combination of TID and XMIN to
determine whether a function needs to be recompiled.  Although the
possibility of this doing the wrong thing seems quite remote, it's not
obvious to me why it's theoretically correct even as things stand
today.  Suppose that previously-frozen tuple is vacuumed away and
another tuple is placed at the same TID and then frozen.  Then, we
check whether the cache is still valid and, behold, it is.  This would
actually get better with this patch, since it wouldn't be enough
merely for the old and new tuples to both be frozen; they'd have to
have had the same XID prior to freezing.  I think that could only
happen if a backend sticks around for at least 2^32 transactions, but
I don't know what would prevent it in that case.

- heap_get_latest_tid() appears broken even without this patch.  It's
only used on user-specified TIDs, either in a TID scan, or due to the
use of currtid_byreloid() and currtid_byrelname().  It attempts find
the latest version of the tuple referenced by the given TID by
following the CTID links.  Like HOT, it uses XMAX/XMIN matching to
detect when the chain is broken.  However, unlike HOT, update chains
can in general span multiple blocks.  It is not now nor has it ever
been safe to assume that the next tuple in the chain can't be frozen
before the previous one is vacuumed away.  Andres came up with the
best example: suppose the tuple to be frozen physically precedes its
predecessor; then, an in-progress vacuum might reach the to-be-frozen
tuple before it reaches (and removes) the previous row version.  In
newer releases, the same problem could be caused by vacuum's
occasional page-skipping behavior.  As with the previous point, the
don't actually change xmin when we freeze approach actually makes it
harder for a chain to get broken when it shouldn't, but I suspect
it's just moving us from one set of extremely-obscure failure cases to

Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Josh Berkus
On 05/28/2013 06:21 AM, Robert Haas wrote:
 As a general statement, I view this work as something that is likely
 needed no matter which one of the remove freezing approaches that
 have been proposed we choose to adopt.  It does not fix anything in
 and of itself, but it (hopefully) removes an objection to the entire
 line of inquiry.

Agreed.  I have some ideas on how to reduce the impact of freezing as
well (of course), and the description of your approach certainly seems
to benefit them, especially as it removes the whole forensic
information objection.

One question though: if we're not removing the xmin, how do we know the
maximum xid to which we can prune clog?  I can imagine several ways
given your approach.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Andres Freund
On 2013-05-28 09:39:13 -0700, Josh Berkus wrote:
 On 05/28/2013 06:21 AM, Robert Haas wrote:
  As a general statement, I view this work as something that is likely
  needed no matter which one of the remove freezing approaches that
  have been proposed we choose to adopt.  It does not fix anything in
  and of itself, but it (hopefully) removes an objection to the entire
  line of inquiry.
 
 Agreed.  I have some ideas on how to reduce the impact of freezing as
 well (of course), and the description of your approach certainly seems
 to benefit them, especially as it removes the whole forensic
 information objection.
 
 One question though: if we're not removing the xmin, how do we know the
 maximum xid to which we can prune clog?  I can imagine several ways
 given your approach.

Simply don't count xids which are frozen. Currently we ignore an xid
because its a special value, after this because the tuple has a certain
hint bit (combination) set.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 7:27 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-05-28 09:39:13 -0700, Josh Berkus wrote:
 On 05/28/2013 06:21 AM, Robert Haas wrote:
  As a general statement, I view this work as something that is likely
  needed no matter which one of the remove freezing approaches that
  have been proposed we choose to adopt.  It does not fix anything in
  and of itself, but it (hopefully) removes an objection to the entire
  line of inquiry.

 Agreed.  I have some ideas on how to reduce the impact of freezing as
 well (of course), and the description of your approach certainly seems
 to benefit them, especially as it removes the whole forensic
 information objection.

 One question though: if we're not removing the xmin, how do we know the
 maximum xid to which we can prune clog?  I can imagine several ways
 given your approach.

 Simply don't count xids which are frozen. Currently we ignore an xid
 because its a special value, after this because the tuple has a certain
 hint bit (combination) set.

Right, what he said.  Calculating the XID before which we no longer
need CLOG is just a matter of looking at all the tuples that we don't
know to be frozen and taking the oldest XID from among those.  This
patch changes the definition of frozen but that's a pretty minor
detail of the CLOG-truncation calculation.  So, in essence, this patch
doesn't really make much difference in that area either way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Andres Freund
On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
 I have attempted to implement this.  Trouble is, we're out of infomask
 bits.  Using an infomask2 bit might work but we don't have many of
 them left either, so it's worth casting about a bit for a better
 solution.   Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
 this purpose, but I think we're better off trying to reclaim those
 bits in a future release.  Exactly how to do that is a topic for
 another email, but I believe it's very possible.  What I'd like to
 propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
 indicate that xmin is frozen.  This bit pattern isn't used for
 anything else, so there's no confusion possible with existing data
 already on disk, but it's necessary to audit all code that checks
 HEAP_XMIN_INVALID to make sure it doesn't get confused.  I've done
 this, and there's little enough of it that it seems pretty easy to
 handle.

I only suggested MOVED_IN/OFF because I didn't remember
HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
combination could have been produced in the past in a way I couldn't
find so far, I am all for it. I don't see a problem with breaking
backward compatibility on that level and I don't think there's any way
to get there without some level of low level compat break.

 - When we follow HOT chains, we determine where the HOT chain ends by
 matching the xmax of each tuple with the xmin of the next tuple.  If
 they don't match, we conclude that the HOT chain has ended.  I
 initially thought this logic might be buggy even as things stand today
 if the latest tuple in the chain is frozen, but as Andres pointed out
 to me, that's not so.  If the newest tuple in the chain is all-visible
 (the earliest point at which we can theoretically freeze it), all
 earlier tuples are dead altogether, and heap_page_prune() is always
 called after locking the buffer and before freezing, so any tuple we
 freeze must be the first in its HOT chain.  For the same reason, this
 logic doesn't need any adjustment for the new freezing system: it's
 never looking at anything old enough to be frozen in the first place.

I am all for adding a comment why this is safe though. We thought about
it for some while before we were convinced...

 - Various procedural languages use the combination of TID and XMIN to
 determine whether a function needs to be recompiled.  Although the
 possibility of this doing the wrong thing seems quite remote, it's not
 obvious to me why it's theoretically correct even as things stand
 today.  Suppose that previously-frozen tuple is vacuumed away and
 another tuple is placed at the same TID and then frozen.  Then, we
 check whether the cache is still valid and, behold, it is.  This would
 actually get better with this patch, since it wouldn't be enough
 merely for the old and new tuples to both be frozen; they'd have to
 have had the same XID prior to freezing.  I think that could only
 happen if a backend sticks around for at least 2^32 transactions, but
 I don't know what would prevent it in that case.

Hm. As previously said, I am less than convinced of those adhoc
mechanisms and I think this should get properly integrated into the
normal cache invalidation mechanisms.
But: I think this is safe since we compare the stored/cached xmin/tid
with one gotten from the SearchSysCache just before which will point to
the correct location as of the last AcceptInvalidationMessages(). I
can't think of a way were this would allow the case you describe.

 - heap_get_latest_tid() appears broken even without this patch.  It's
 only used on user-specified TIDs, either in a TID scan, or due to the
 use of currtid_byreloid() and currtid_byrelname().  It attempts find
 the latest version of the tuple referenced by the given TID by
 following the CTID links.  Like HOT, it uses XMAX/XMIN matching to
 detect when the chain is broken.  However, unlike HOT, update chains
 can in general span multiple blocks.  It is not now nor has it ever
 been safe to assume that the next tuple in the chain can't be frozen
 before the previous one is vacuumed away.  Andres came up with the
 best example: suppose the tuple to be frozen physically precedes its
 predecessor; then, an in-progress vacuum might reach the to-be-frozen
 tuple before it reaches (and removes) the previous row version.  In
 newer releases, the same problem could be caused by vacuum's
 occasional page-skipping behavior.  As with the previous point, the
 don't actually change xmin when we freeze approach actually makes it
 harder for a chain to get broken when it shouldn't, but I suspect
 it's just moving us from one set of extremely-obscure failure cases to
 another.

I don't think this is especially problematic though. If you do a tidscan
starting from a tid that is so old that it can be removed: you're doing
it wrong. The tid could have been reused for something else anyway. I
think the ctid chaining is only meaningful if the tuple 

Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Robert Haas
On Tue, May 28, 2013 at 8:00 PM, Andres Freund and...@2ndquadrant.com wrote:
 I only suggested MOVED_IN/OFF because I didn't remember
 HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
 combination could have been produced in the past in a way I couldn't
 find so far, I am all for it. I don't see a problem with breaking
 backward compatibility on that level and I don't think there's any way
 to get there without some level of low level compat break.

I'm not sure why you call it a compatibility break.  It's true that an
old PostgreSQL can't read new heaps, but we've never guaranteed that
direction anyway, and every time we add or reinterpret any infomask
bits, that's true.  For example, the fklocks patch did tat.  What's
important is that a new PostgreSQL will still be able to read old
heaps; that is, pg_upgrade compatibility is preserved.

 I am all for adding a comment why this is safe though. We thought about
 it for some while before we were convinced...

I'm fine with that, but the logic is present in multiple places, and I
did not want to comment them all identically.  If there's a central
place in which a comment would be appropriate, let's add it there; or
IOW, what do you suggest in detail?

 Hm. As previously said, I am less than convinced of those adhoc
 mechanisms and I think this should get properly integrated into the
 normal cache invalidation mechanisms.
 But: I think this is safe since we compare the stored/cached xmin/tid
 with one gotten from the SearchSysCache just before which will point to
 the correct location as of the last AcceptInvalidationMessages(). I
 can't think of a way were this would allow the case you describe.

I don't understand why it can't.  AcceptInvalidationMessages()
guarantees that we're looking at the latest version of the catalog,
but it doesn't say anything about whether the latest catalog state
happens to look like any earlier catalog state.

 I don't think this is especially problematic though. If you do a tidscan
 starting from a tid that is so old that it can be removed: you're doing
 it wrong. The tid could have been reused for something else anyway. I
 think the ctid chaining is only meaningful if the tuple got updated very
 recently, i.e. you hold a snapshot that prevents the removal of the
 root tuple's snapshot.

That logic seems sound to me.

 Nice work!

Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] preserving forensic information when we freeze

2013-05-28 Thread Andres Freund
On 2013-05-28 21:26:49 -0400, Robert Haas wrote:
 On Tue, May 28, 2013 at 8:00 PM, Andres Freund and...@2ndquadrant.com wrote:
  I only suggested MOVED_IN/OFF because I didn't remember
  HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
  combination could have been produced in the past in a way I couldn't
  find so far, I am all for it. I don't see a problem with breaking
  backward compatibility on that level and I don't think there's any way
  to get there without some level of low level compat break.

 I'm not sure why you call it a compatibility break.  It's true that an
 old PostgreSQL can't read new heaps, but we've never guaranteed that
 direction anyway, and every time we add or reinterpret any infomask
 bits, that's true.  For example, the fklocks patch did tat.  What's
 important is that a new PostgreSQL will still be able to read old
 heaps; that is, pg_upgrade compatibility is preserved.

Oh, not the on-disk format. But as you said, code that looks at xmin is
going to need to change. fklocks broke code that relied on
HeapTupleHeaderGetXmax(), this would break code that looks at
xmin. Removing/renaming *GetXmin similarly sounds like a good idea to
make the breakage explicit.

  I am all for adding a comment why this is safe though. We thought about
  it for some while before we were convinced...

 I'm fine with that, but the logic is present in multiple places, and I
 did not want to comment them all identically.  If there's a central
 place in which a comment would be appropriate, let's add it there; or
 IOW, what do you suggest in detail?

That's a good point. Generally lots of this is underdocumented/commented
and can only learned by spending a year or so in the postgres code. I'd
say rename README.HOT to README and add a section there and reference it
from those two places? It already has a mostly general explanation of
concurrent index builds. Don't have a better idea.

  Hm. As previously said, I am less than convinced of those adhoc
  mechanisms and I think this should get properly integrated into the
  normal cache invalidation mechanisms.
  But: I think this is safe since we compare the stored/cached xmin/tid
  with one gotten from the SearchSysCache just before which will point to
  the correct location as of the last AcceptInvalidationMessages(). I
  can't think of a way were this would allow the case you describe.

 I don't understand why it can't.  AcceptInvalidationMessages()
 guarantees that we're looking at the latest version of the catalog,
 but it doesn't say anything about whether the latest catalog state
 happens to look like any earlier catalog state.

Well, AcceptInvalidationMessages() will get a new version of the tuple
with a new tid/xmin combo. So what would need to happen is the function
being updated (to a new location), then the old version needs to get
removed, then the new one updated again, reusing to the old
location. Allthewhile either both versions need to have been frozen or
we need to have wrapped around to the same xid. All that without the
function being executed inbetween which would have invalidated the old
state and stored a new xmin/tid.
But you're right, nothing except chance prevents that from happening,
not sure what I thought of earlier.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers