Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-20 Thread Craig Ringer
On 19 August 2016 at 21:10, Peter Eisentraut wrote:

> On 8/18/16 9:20 PM, Craig Ringer wrote:
> > On 19 August 2016 at 02:35, Jim Nasby wrote:
> > > I think we need to either add real types for handling XID/epoch/TXID
> > > or finally create uint types. It's *way* too easy to screw things up
> > > the way they are today.
> >
> > Hm. Large scope increase there. Especially introducing unsigned types.
> > There's a reason that hasn't been done already - it's not just copying a
> > whole pile of code, it's also defining all the signed/unsigned
> > interactions and conversions carefully.
>
> https://github.com/petere/pguint ;-)
>
> > I'm not against adding a 'bigxid' or 'epoch_xid' or something,
> > internally a uint64. It wouldn't need all the opclasses, casts, function
> > overloads, etc that uint8 would.
>
> That sounds much better.


Yeah, but not something I expect to be able to do in the near future :S .
As it is, this is considerably off what I really need to be working on. And
we'll still want a way to get the short 32-bit xid with detection of epoch
wrap, which is what this patch adds.

I posted an updated version of this patch on
https://www.postgresql.org/message-id/camsr+yhqiwnei0dactbos40t+v5s_+dst3pyv_8v2wnvh+x...@mail.gmail.com
since it's stacked on top of txid_status(bigint) now. The patch there is
just a trivial rename of this one.

I don't expect to get to adding a 'bigxid' commit and converting
views/functions in this release cycle. Still too much else to do.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-19 Thread Peter Eisentraut
On 8/18/16 9:20 PM, Craig Ringer wrote:
> On 19 August 2016 at 02:35, Jim Nasby wrote:
> > I think we need to either add real types for handling XID/epoch/TXID
> > or finally create uint types. It's *way* too easy to screw things up
> > the way they are today.
> 
> Hm. Large scope increase there. Especially introducing unsigned types.
> There's a reason that hasn't been done already - it's not just copying a
> whole pile of code, it's also defining all the signed/unsigned
> interactions and conversions carefully.

https://github.com/petere/pguint ;-)

> I'm not against adding a 'bigxid' or 'epoch_xid' or something,
> internally a uint64. It wouldn't need all the opclasses, casts, function
> overloads, etc that uint8 would.

That sounds much better.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-18 Thread Craig Ringer
On 19 August 2016 at 02:35, Jim Nasby wrote:

> On 8/18/16 5:46 AM, Amit Kapila wrote:
>
>> I think there is a value in exposing such a variant which takes bigint
>> and internally converts it to xid.  I am not sure the semantics for
>>
>
> I think that's a bad idea because you have the exact same problems we have
> now: bigint is signed, epoch is not.


Eh. A distant future problem IMO. txid_current will already start returning
negative values if the epoch crosses INT32_MAX.
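For illustration, here is a minimal C sketch of why that happens, assuming the txid is packed as (epoch << 32) | xid and then held in a signed 64-bit bigint. The helper name txid_as_bigint is made up for this example and is not a PostgreSQL function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack a 32-bit epoch and a 32-bit xid into the signed
 * 64-bit value that a bigint would hold.
 */
static int64_t txid_as_bigint(uint32_t epoch, uint32_t xid)
{
    uint64_t raw = ((uint64_t) epoch << 32) | xid;

    /*
     * Converting a value above INT64_MAX to int64_t is implementation-defined
     * in ISO C; on the usual two's-complement platforms it wraps, so the
     * result is negative once bit 63 (the top bit of the epoch) is set.
     */
    return (int64_t) raw;
}
```

With an epoch of INT32_MAX the result is still positive; one epoch later the top bit is set and the same packing reads as a negative bigint.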

>> the other proposal txid_recent() is equivalent to what we have for
>> txid_current().  One thing which is different is that txid_current()
>> allocates a new transaction if there is currently none.  If you
>>
>
> Right, and it would be nice to be able to tell if an XID has been assigned
> to your transaction or not; something you currently can't do.


It's trivial to expose GetTopTransactionIdIfAny(). Basically copy and
paste txid_current() to txid_current_ifassigned() and replace the
GetTopTransactionId() call with GetTopTransactionIdIfAny().

Or add a bool argument to txid_current() to not assign one. But I'd rather
a new function in this case, and it's so short that the duplication is no
concern.

>> plainly want to convert it to 32 bit xid, then maybe txid_convert or
>> something like that is more suitable.
>>
>
> I think we need to either add real types for handling XID/epoch/TXID or
> finally create uint types. It's *way* too easy to screw things up the way
> they are today.


Hm. Large scope increase there. Especially introducing unsigned types.
There's a reason that hasn't been done already - it's not just copying a
whole pile of code, it's also defining all the signed/unsigned interactions
and conversions carefully. People mix signed and unsigned types incorrectly
in C all the time, and often don't notice the problems. It also only gains
you an extra bit. Unsigned types would be nice when interacting with
outside systems that use them and Pg innards, but that's about all they're
good for IMO. For everything else you should be using numeric if you're
worried about fitting in a bigint.

I'm not against adding a 'bigxid' or 'epoch_xid' or something, internally a
uint64. It wouldn't need all the opclasses, casts, function overloads, etc
that uint8 would. It's likely to break code that expects txid_current() to
return a bigint, but since it looks like most of that code is already
silently broken I'm not too upset by that.

Separately to all that, though, we should still have a way to get the
32-bit xid from an xid with epoch that doesn't require the user to know its
internal structure and bitshift it, especially since they can't check the
epoch. Maybe call it txid_convert_ifrecent(bigint). IMO the "recent" part
is important because of the returns-null-if-xid-is-old behaviour. It's not
a straight conversion.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-18 Thread Jim Nasby

On 8/18/16 5:46 AM, Amit Kapila wrote:
> I think there is a value in exposing such a variant which takes bigint
> and internally converts it to xid.  I am not sure the semantics for

I think that's a bad idea because you have the exact same problems we
have now: bigint is signed, epoch is not.

> the other proposal txid_recent() is equivalent to what we have for
> txid_current().  One thing which is different is that txid_current()
> allocates a new transaction if there is currently none.  If you

Right, and it would be nice to be able to tell if an XID has been
assigned to your transaction or not; something you currently can't do.

> plainly want to convert it to 32 bit xid, then maybe txid_convert or
> something like that is more suitable.

I think we need to either add real types for handling XID/epoch/TXID or
finally create uint types. It's *way* too easy to screw things up the
way they are today.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-18 Thread Amit Kapila
On Tue, Aug 16, 2016 at 2:45 PM, Craig Ringer wrote:
> Hi all
>
> While implementing support for traceable transactions (finding out after the
> fact whether an xact committed or aborted), I've found that Pg is very
> inconsistent with what it considers a transaction ID from a user facing
> point of view, to the point where I think it's hard for users to write
> correct queries.
>
> txid_current() returns a 64-bit xid in which the higher 32 bits are the xid
> epoch. This provides users with wraparound protection and means they don't
> have to deal with the moving xid threshold.
>
> Many other functions accept and return 'xid', the 32-bit type that isn't
> wraparound protected. Presumably they assume you'll only use them with
> recent transaction IDs, but there are a couple of problems with this:
>
> * We can't ensure they're only used with recent XIDs and can't detect if
> they're passed a wrapped around xid
>
> * There's no good way to _get_ a 32-bit xid for the current xact since
> txid_current() returns a 64-bit bigint xid.
>
> (I have to admit that in the past I always blindly assumed that
> txid_current() returned bigint for historical reasons, because we don't have
> a uint32 type and the xid type didn't exist yet. So I'd do things like get
> the value of txid_current() and pass it to pg_xact_commit_timestamp() later
> on. This turns out to be wrong, it just happens to work until the epoch
> counter increments for the first time. Similarly, working around the seeming
> oversight of a missing bigint to xid cast with ::text::xid is wrong but will
> seem fine at first.)
>
> I'm surprised the 32-bit xid was ever exposed to the user, rather than a
> 64-bit epoch-extended xid.
>
> It's not clear to me how a user is supposed to correctly pass the result of
> txid_current() to anything like pg_xact_commit_timestamp(xid). They'd have
> to get the epoch from a new txid_current() call, split both into two 32-bit
> values, and do wraparound checking. Exceedingly unwieldy and hard to get
> right.
>
> Since I don't think we can get rid of the 32-bit xid, I think we need a
> function to get the 32-bit xid from a 64-bit epoch-and-xid with wraparound
> protection.
>
> Here's a patch for that, adding a function txid_recent(bigint) => xid that
> returns the low 32 bits of a 64-bit xid like that returned from txid_current
> if the xid isn't wrapped around. If it's past the wraparound threshold the
> function returns null, since most functions that take xid are strict and
> will in turn return null. The alternative, an ERROR, seems harder for users
> to handle without resorting to plpgsql. It does ERROR on XIDs in the future
> though, since there's no good reason to see those. The epoch is ignored for
> permanent XIDs.
>
> I don't like the name much, but haven't come up with a better one yet.
>
> Thoughts?
>
>
> IMO some functions that take 'xid' should be considered for a bigint
> variant:
>
>  age (as txid_age(bigint))
>  pg_xact_commit_timestamp
>

I think there is a value in exposing such a variant which takes bigint
and internally converts it to xid.  I am not sure the semantics for
the other proposal txid_recent() is equivalent to what we have for
txid_current().  One thing which is different is that txid_current()
allocates a new transaction if there is currently none.  If you
plainly want to convert it to 32 bit xid, then maybe txid_convert or
something like that is more suitable.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-18 Thread Craig Ringer
On 16 August 2016 at 21:44, Craig Ringer wrote:

> On 16 August 2016 at 20:58, Greg Stark wrote:
>
>> On Tue, Aug 16, 2016 at 10:15 AM, Craig Ringer wrote:
>> > I'm surprised the 32-bit xid was ever exposed to the user, rather than a
>> > 64-bit epoch-extended xid.
>>
>> Once upon a time we didn't have epoch counting at all.
>>
>
> Makes sense. I didn't dig back too far in history.
>
> Sounds like you're in favour of the 2nd part of the proposal (not covered
> by the current patch) then.
>
> I haven't yet done the validation required on the epoch logic btw, and I
> won't be too surprised if it's a bit off. I'm writing a fast xid burn
> function for use in testing now. I doubt it'll be fast enough to use in
> routine regression testing since all those clog pages will still take time.
> But we'll see.  I'd kind of like to be able to avoid all that - advance the
> xid counter and treat all the old xids as frozen. I don't know if this
> is practical within a normal backend though.
>
> Anyway, will follow up with more tests and - probably - a bugfix or three
> soon.
>


I've written a function to fast-forward the xid counter efficiently, so I
can reach xid wraparound in 5s or so. Probably not quite fast enough to be
desirable in the basic 'make check' but close. Coming soon.

In the process I noticed that even in the regression tests there are
mistakes with xid handling, like


where virtualtransaction = (
select virtualtransaction
from pg_locks
where transactionid = txid_current()::integer)

which breaks if txid_current() returns anything > INT32_MAX.

To do it right(ish) you have to


where virtualtransaction = (
select virtualtransaction
from pg_locks
where transactionid::text::bigint = (txid_current() & ((BIGINT '1' << 32) - 1))  )

... I think.
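To check the mask arithmetic outside the database, here is a small C sketch. It assumes the txid is laid out as (epoch << 32) | xid; both function names are invented for this example. The mask that keeps the low 32 bits is (1 << 32) - 1, whereas a mask of 1 << 32 on its own keeps only bit 32 (the lowest epoch bit).

```c
#include <stdint.h>

/* Low 32 bits of an epoch-extended txid, i.e. the 'xid' part. */
static uint32_t txid_low32(int64_t txid)
{
    return (uint32_t) ((uint64_t) txid & ((UINT64_C(1) << 32) - 1));
}

/*
 * For contrast: masking with only bit 32 set. The result is either 0 or
 * 2^32 and never contains the xid.
 */
static uint64_t txid_bit32_only(int64_t txid)
{
    return (uint64_t) txid & (UINT64_C(1) << 32);
}
```

For a txid with epoch 1 and xid 1234, txid_low32() gives 1234 while txid_bit32_only() gives 4294967296, so a comparison against pg_locks.transactionid only has a chance of matching with the first form.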

So yeah, we need a function to get the 'xid' component from an xid with
epoch and/or to fix up things that expose 'xid' so they expose bigint txids
instead. The patch at the start of this thread is the first step.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-16 Thread Craig Ringer
On 16 August 2016 at 20:58, Greg Stark wrote:

> On Tue, Aug 16, 2016 at 10:15 AM, Craig Ringer wrote:
> > I'm surprised the 32-bit xid was ever exposed to the user, rather than a
> > 64-bit epoch-extended xid.
>
> Once upon a time we didn't have epoch counting at all.
>

Makes sense. I didn't dig back too far in history.

Sounds like you're in favour of the 2nd part of the proposal (not covered
by the current patch) then.

I haven't yet done the validation required on the epoch logic btw, and I
won't be too surprised if it's a bit off. I'm writing a fast xid burn
function for use in testing now. I doubt it'll be fast enough to use in
routine regression testing since all those clog pages will still take time.
But we'll see.  I'd kind of like to be able to avoid all that - advance the
xid counter and treat all the old xids as frozen. I don't know if this
is practical within a normal backend though.

Anyway, will follow up with more tests and - probably - a bugfix or three
soon.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-16 Thread Greg Stark
On Tue, Aug 16, 2016 at 10:15 AM, Craig Ringer wrote:
> I'm surprised the 32-bit xid was ever exposed to the user, rather than a
> 64-bit epoch-extended xid.

Once upon a time we didn't have epoch counting at all.

I don't think it would be a bad idea to clean up everything to do with
xids so that everything user-facing is epoch-aware. Of course you
don't always have the epoch but if we're careful about where users can
see xids they should never see an xid from an old epoch. That could be
a problem for internal tools like pageinspect or xlogdump but
shouldn't be a problem for any real production api.

-- 
greg


[HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-16 Thread Craig Ringer
Hi all

While implementing support for traceable transactions (finding out after
the fact whether an xact committed or aborted), I've found that Pg is very
inconsistent with what it considers a transaction ID from a user facing
point of view, to the point where I think it's hard for users to write
correct queries.

txid_current() returns a 64-bit xid in which the higher 32 bits are the xid
epoch. This provides users with wraparound protection and means they don't
have to deal with the moving xid threshold.
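As a rough sketch of that layout in C: the epoch sits in the high 32 bits and the xid in the low 32 bits. TxidParts, txid_split and txid_join are hypothetical names for this example, not PostgreSQL APIs.

```c
#include <stdint.h>

/* Hypothetical pair type: the two halves of an epoch-extended txid. */
typedef struct
{
    uint32_t epoch;  /* high 32 bits: number of xid wraparounds so far */
    uint32_t xid;    /* low 32 bits: the plain 32-bit transaction ID */
} TxidParts;

/* Split a 64-bit epoch-extended txid into its epoch and xid halves. */
static TxidParts txid_split(uint64_t txid)
{
    TxidParts parts;

    parts.epoch = (uint32_t) (txid >> 32);
    parts.xid = (uint32_t) (txid & UINT64_C(0xFFFFFFFF));
    return parts;
}

/* Recombine the halves; the inverse of txid_split(). */
static uint64_t txid_join(TxidParts parts)
{
    return ((uint64_t) parts.epoch << 32) | parts.xid;
}
```

Two txids with the same low 32 bits but different epochs split to the same xid, which is exactly the information a bare 'xid' value loses.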

Many other functions accept and return 'xid', the 32-bit type that isn't
wraparound protected. Presumably they assume you'll only use them with
recent transaction IDs, but there are a couple of problems with this:

* We can't ensure they're only used with recent XIDs and can't detect if
they're passed a wrapped around xid

* There's no good way to _get_ a 32-bit xid for the current xact since
txid_current() returns a 64-bit bigint xid.

(I have to admit that in the past I always blindly assumed that
txid_current() returned bigint for historical reasons, because we don't
have a uint32 type and the xid type didn't exist yet. So I'd do things like
get the value of txid_current() and pass it to pg_xact_commit_timestamp()
later on. This turns out to be wrong, it just happens to work until the
epoch counter increments for the first time. Similarly, working around the
seeming oversight of a missing bigint to xid cast with ::text::xid is wrong
but will seem fine at first.)

I'm surprised the 32-bit xid was ever exposed to the user, rather than a
64-bit epoch-extended xid.

It's not clear to me how a user is supposed to correctly pass the result of
txid_current() to anything like pg_xact_commit_timestamp(xid). They'd have
to get the epoch from a new txid_current() call, split both into two 32-bit
values, and do wraparound checking. Exceedingly unwieldy and hard to get
right.

Since I don't think we can get rid of the 32-bit xid, I think we need a
function to get the 32-bit xid from a 64-bit epoch-and-xid with wraparound
protection.

Here's a patch for that, adding a function txid_recent(bigint) => xid that
returns the low 32 bits of a 64-bit xid like that returned from
txid_current if the xid isn't wrapped around. If it's past the wraparound
threshold the function returns null, since most functions that take xid are
strict and will in turn return null. The alternative, an ERROR, seems
harder for users to handle without resorting to plpgsql. It does ERROR on
XIDs in the future though, since there's no good reason to see those. The
epoch is ignored for permanent XIDs.
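The behaviour described above can be sketched roughly in standalone C. This is not the code from the patch. It assumes the caller passes in the epoch-extended next txid (the server would read that from shared state), that "wrapped around" means 2^31 or more transactions behind it, and that XIDs below 3 (invalid, bootstrap, frozen) are the special ones for which the epoch is ignored. The function name and the negative return codes are invented for this example; they stand in for SQL NULL and ERROR.

```c
#include <stdint.h>

#define FIRST_NORMAL_XID  3u    /* xids 0..2 are special in PostgreSQL */
#define TXID_TOO_OLD      (-1)  /* stands in for returning NULL */
#define TXID_IN_FUTURE    (-2)  /* stands in for raising an ERROR */

/*
 * Sketch only: return the low 32 bits of txid if it is recent relative to
 * next_txid, or one of the negative codes above otherwise.
 */
static int64_t txid_recent_sketch(uint64_t txid, uint64_t next_txid)
{
    uint32_t xid = (uint32_t) txid;

    /* Special XIDs: the epoch carries no meaning, pass them through. */
    if (xid < FIRST_NORMAL_XID)
        return (int64_t) xid;

    /* Not assigned yet, so the caller gave us something bogus. */
    if (txid >= next_txid)
        return TXID_IN_FUTURE;

    /* Assumed threshold: 2^31 or more behind means the xid has wrapped. */
    if (next_txid - txid >= (UINT64_C(1) << 31))
        return TXID_TOO_OLD;

    return (int64_t) xid;
}
```

Comparing the full 64-bit values keeps the check correct across an epoch boundary: a txid from the end of epoch 4 is still recent when the next txid is early in epoch 5, even though its 32-bit xid is numerically much larger.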

I don't like the name much, but haven't come up with a better one yet.

Thoughts?


IMO some functions that take 'xid' should be considered for a bigint
variant:

 age (as txid_age(bigint))
 pg_xact_commit_timestamp

[ select proname from pg_proc where 'xid'::regtype = ANY
(proargtypes::regtype[]) ; ]

and most or all the system views that expose xid should switch to bigint
for 10.0:

 pg_class.relfrozenxid
 pg_class.relminmxid
 pg_database.datfrozenxid
 pg_database.datminmxid
 pg_locks.transactionid
 pg_prepared_xacts.transaction
 pg_stat_activity.backend_xid
 pg_stat_activity.backend_xmin
 pg_stat_replication.backend_xmin
 pg_replication_slots.xmin
 pg_replication_slots.catalog_xmin

[ select attrelid::regclass || '.' || attname from pg_attribute  where
atttypid = 'xid'::regtype AND attnum >= 0; ]

... or if folks find using bigint too ugly, a new xid64 type. "bigxid"?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e8af137358b1d89fb334ad1d715c3f81c15ba5cf Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 16 Aug 2016 16:54:00 +0800
Subject: [PATCH] Add txid_recent(bigint) => xid

Provide a function to get the 32-bit xid from a bigint extended xid-with-epoch
as returned by txid_current() etc, for use in functions that expect an xid
argument.
---
 doc/src/sgml/func.sgml | 17 +++--
 src/backend/utils/adt/txid.c   | 56 ++
 src/include/catalog/pg_proc.h  |  2 ++
 src/test/regress/expected/txid.out | 70 ++
 src/test/regress/sql/txid.sql  | 40 ++
 5 files changed, 182 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 02b25f9..4ae621f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16956,6 +16956,11 @@ SELECT collation for ('foo' COLLATE "de_DE");
boolean
is transaction ID visible in snapshot? (do not use with subtransaction ids)
   
+  
+   txid_recent(bigint)
+   xid
+   return the 32-bit xid for a 64-bit transaction ID if it isn't wrapped around, otherwise return null
+  
  
 

@@ -16964,9 +16969,15 @@ SELECT collation for ('foo' COLLATE "de_DE");
 The internal transaction ID type (xid) is 32 bits wide and
 wraps around every