from:"Jeff Davis"

Re: [HACKERS] Reduce WAL logging of INSERT SELECT

2011-08-04 Thread Jeff Davis

On Thu, 2011-08-04 at 17:46 -0400, Bruce Momjian wrote:
> Right.  I brought up SELECT INTO because you could make the argument
> that INSERT ... SELECT is not a utility command like the other ones and
> therefore can't be done easily, but CREATE TABLE AS is internal SELECT
> INTO and implemented in execMain.c, which I think is where INSERT ...
> SELECT would also be implemented.

The above statement is a little confusing, so let me start from the
beginning:

How could we avoid WAL logging for INSERT ... SELECT?

The way we do it for CREATE TABLE AS is because nobody would even *see*
the table if our transaction doesn't commit. Therefore we don't need to
bother logging it. Same can be said for SELECT INTO.

INSERT ... SELECT is just an insert. It needs just as much logging as
inserting tuples any other way. For instance, it will potentially share
pages with other inserts, and better properly record all such page
modifications so that they return to a consistent state.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Jeff Davis

On Wed, 2011-08-03 at 13:07 -0400, Robert Haas wrote:
> A little OT here, but (as I think Simon said elsewhere) I think we
> really ought to be considering the table statistics when deciding
> whether or not to replan.  It seems to me that the overwhelmingly
> common case where this is going to come up is when (some subset of)
> the MCVs require a different plan than run-of-the-mill values.  It
> would be nice to somehow work that out.

That blurs the line a little bit. It sounds like this might be described
as "incremental planning", and perhaps that's a good way to think about
it.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Jeff Davis

On Wed, 2011-08-03 at 12:19 -0400, Tom Lane wrote:
> Of course we could address the worst cases by providing some mechanism
> to tell the plancache code "always use a generic plan for this query"
> or "always use a custom plan".  I'm not entirely thrilled with that,
> because it's effectively a planner hint and has got the same problems
> as all planner hints, namely that users are likely to get it wrong.

I'm not entirely convinced by that. It's fairly challenging for a human
to choose a good plan for a moderately complex SQL query, and its much
more likely that the plan will become a bad one over time. But, in many
cases, a developer knows if they simply don't care about planning time,
and are willing to always replan.

Also, we have a fairly reasonable model for planning SQL queries, but
I'm not sure that the model for determining whether to replan a SQL
query is quite as clear. Simon brought up some useful points along these
lines.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transient plans versus the SPI API

2011-08-04 Thread Jeff Davis

On Tue, 2011-08-02 at 16:47 -0400, Tom Lane wrote:
> The most straightforward way to reimplement things within spi.c would be
> to redefine SPI_prepare as just doing the parse-and-rewrite steps, with
> planning always postponed to SPI_execute.  In the case where you just
> prepare and then execute a SPIPlan, this would come out the same or
> better, since we'd still just do one planning cycle, but the planner could
> be given the actual parameter values to use.  However, if you SPI_prepare,
> SPI_saveplan, and then SPI_execute many times, you might come out behind.
> This is of course the same tradeoff we are going to impose at the SQL level
> anyway, but I wonder whether there needs to be a control knob available to
> C code to retain the old plan-once-and-always-use-that-plan approach.

Would there ultimately be a difference between the way SPI_prepare and
PQprepare work? It seems like the needs would be about the same, so I
think we should be consistent.

Also, I assume that SPI_execute and PQexecParams would always force a
custom plan, just like always, right?

A control knob sounds limited. For instance, what if the application
knows that some parameters will be constant over the time that the plan
is saved? It would be nice to be able to bind some parameters to come up
with a generic (but less generic) plan, and then execute it many times.
Right now that can only be done by inlining such constants in the SQL,
which is what we want to avoid.

I'm a little bothered by "prepare" sometimes planning and sometimes not
(and, by implication, "execute_plan" sometimes planning and sometimes
not). It seems cleaner to just separate the steps into parse+rewrite,
bind parameters, plan (with whatever parameters are present, giving a
more generic plan when some aren't specified), and execute (which would
require you to specify any parameters not bound yet). Maybe we don't
need to expose all of those steps (although maybe we do), but it would
be nice if the API we do offer resembles those steps.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] lazy vxid locks, v3

2011-08-04 Thread Jeff Davis

On Mon, 2011-08-01 at 12:12 -0400, Robert Haas wrote:
> I guess you could look at that way.  It just seemed like the obvious
> way to write the code: we do LockRefindAndRelease() only if we have a
> fast-path lock that someone else has pushed into the main table.

OK, looks good to me. Marked "ready for committer".

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] lazy vxid locks, v3

2011-08-01 Thread Jeff Davis

On Mon, 2011-08-01 at 08:12 -0400, Robert Haas wrote:
> > Is the "&& LocalTransactionIdIsValid(lxid)" a guard against calling
> > VirtualXactLockTableCleanup twice? Can that happen? Or is it just
> > defensive coding to avoid making an additional assumption?
> 
> lxid there is just a local variable storing the value that we
> extracted from fpLocalTransactionId while holding the lock.  I named
> it that way just as a mnemonic for the type of value that it was, not
> intending to imply that it was copied from MyProc->lxid.

I know, this is the other purpose of fpLocalTransactionId that I was
talking about. Is it just a guard against calling
VirtualXactLockTableCleanup twice?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SSI heap_insert and page-level predicate locks

2011-07-31 Thread Jeff Davis

On Wed, 2011-06-08 at 17:29 -0500, Kevin Grittner wrote:
> Heikki Linnakangas  wrote:
> > heap_insert() calls CheckForSerializableConflictIn(), which checks if
> > there is a predicate lock on the whole relation, or on the page we're
> > inserting to. It does not check for tuple-level locks, because there
> > can't be any locks on a tuple that didn't exist before.
> > AFAICS, the check for page lock is actually unnecessary. A page-level
> > lock on a heap only occurs when tuple-level locks are promoted. It is
> > just a coarser-grain representation of holding locks on all tuples on
> > the page, *that exist already*. It is not a "gap" lock like the index
> > locks are, it doesn't need to conflict with inserting new tuples on
> the 
> > page. In fact, if heap_insert chose to insert the tuple on some other
> > heap page, there would have been no conflict.
>  
> Absolutely correct.  Patch attached.

I like the change, but the comment is slightly confusing. Perhaps
something like:

"For a heap insert, we only need to check for table locks. Our new tuple
can't possibly conflict with existing tuple locks, and heap page locks
are only consolidated versions of tuple locks. The index insert will
check for any other predicate locks."

would be a little more clear? (the last sentence is optional, and I only
included it because the original comment mentioned indexes).

Same for heap_update().

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] lazy vxid locks, v3

2011-07-31 Thread Jeff Davis

On Wed, 2011-07-20 at 13:41 -0400, Robert Haas wrote:
> I took another look at v2 of my lazy vxid locks patch and realized
> that it was pretty flaky in a couple of different ways.  Here's a
> version that I think is a bit more robust, but considering the extent
> of the revisions, it probably needs another round of review from
> someone before I commit it.
> 
> Any review appreciated; I would prefer not to have to wait until
> October to get this committed, since there is quite a bit of follow-on
> work that I would like to do as well.  FWIW, the performance
> characteristics are basically identical to the previous versions,
> AFAICT.
> 

fpLocalTransactionId is redundant with the lxid, and the explanation is
that one that they have different locking semantics. That looks
reasonable, and it avoided the need for the careful ordering while
starting/ending a transaction that was present in v2.

However, it also looks like you're using it for another purpose:

In VirtualXactLockTableCleanup():
/*
 * If fpVXIDLock has been cleared without touching fpLocalTransactionId,
 * that means someone transferred the lock to the main lock table.
 */
if (!fastpath && LocalTransactionIdIsValid(lxid))

Is the "&& LocalTransactionIdIsValid(lxid)" a guard against calling
VirtualXactLockTableCleanup twice? Can that happen? Or is it just
defensive coding to avoid making an additional assumption?

Regards,
Jeff Davis

PS: In the recent sinval synch patch, you had a typo: "If we haven't
catch up completely". Other than that, it looked good.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] cheaper snapshots

2011-07-28 Thread Jeff Davis

On Thu, 2011-07-28 at 14:27 -0400, Robert Haas wrote:
> > Right, but if the visibility order were *defined* as the order in which
> > commit records appear in WAL, that problem neatly goes away.  It's only
> > because we have the implementation artifact that "set my xid to 0 in the
> > ProcArray" is decoupled from inserting the commit record that there's
> > any difference.
> 
> Hmm, interesting idea.  However, consider the scenario where some
> transactions are using synchronous_commit or synchronous replication,
> and others are not.  If a transaction that needs to wait (either just
> for WAL flush, or for WAL flush and synchronous replication) inserts
> its commit record, and then another transaction with
> synchronous_commit=off comes along and inserts its commit record, the
> second transaction will have to block until the first transaction is
> done waiting.  We can't make either transaction visible without making
> both visible, and we certainly can't acknowledge the second
> transaction to the client until we've made it visible.  I'm not going
> to say that's so horrible we shouldn't even consider it, but it
> doesn't seem great, either.

I'm trying to follow along here.

Wouldn't the same issue exist if one transaction is waiting for sync rep
(synchronous_commit=on), and another is waiting for just a WAL flush
(synchronous_commit=local)? I don't think that a synchronous_commit=off
is required.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Deferred partial/expression unique constraints

2011-07-25 Thread Jeff Davis

On Fri, 2011-07-22 at 23:35 +0300, Peter Eisentraut wrote:
> On ons, 2011-07-13 at 11:26 -0400, Tom Lane wrote:
> > Our standard reason for not implementing UNIQUE constraints on
> > expressions has been that then you would have a thing that claims to be
> > a UNIQUE constraint but isn't representable in the information_schema
> > views that are supposed to show UNIQUE constraints.  We avoid this
> > objection in the current design by shoving all that functionality into
> > EXCLUDE constraints, which are clearly outside the scope of the spec.
> 
> I have never heard that reason before, and I think it's a pretty poor
> one.  There are a lot of other things that are not representable in the
> information schema.

I think what Tom is saying is that the information_schema might appear
inconsistent to someone following the spec.

Can you give another example where we do something like that?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] range types and ip4r

2011-07-19 Thread Jeff Davis

On Tue, 2011-07-19 at 09:38 +0300, Peter Eisentraut wrote:
> Just wondering, will the planned range type functionality also be able
> to absorb the functionality of the ip4r type as a range of the ip4 type
> (http://pgfoundry.org/projects/ip4r)?  Maybe it's trivial, but since the
> ip types also have a kind of hierarchical structure, I figured I'd point
> it out in case you hadn't considered it.

Thanks for bringing that up.

It had briefly crossed my mind, but I didn't see any problem with it.
Does it use the hierarchical nature to manipulate the values at all, or
is it just a flat range?

If it's just a flat range it would be similar to int4range, I would
think.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Commitfest Status: Sudden Death Overtime

2011-07-18 Thread Jeff Davis

On Mon, 2011-07-18 at 15:59 -0400, Robert Haas wrote:
> On a pgbench run with 8
> clients on a 32-core machine, I see about a 2% speedup from that patch
> on pgbench -S, and it grows to 8% at 32 clients.  At 80 clients (but
> still just a 32-core box), the results bounce around more, but taking
> the median of three five-minute runs, it's an 11% improvement.  To me,
> that's enough to make it worth applying, especially considering that
> what is 11% on today's master is, in raw TPS, equivalent to maybe 30%
> of yesterday's master (prior to the fast relation lock patch being
> applied).  More, it seems pretty clear that this is the conceptually
> right thing to do, even if it's going to require some work elsewhere
> to file down all the rough edges thus exposed.  If someone objects to
> that, then OK, we should talk about that: but so far I don't think
> anyone has expressed strong opposition: in which case I'd like to fix
> it up and get it in.

Agreed. I certainly like the concept of the lazy vxid patch.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] lazy vxid locks, v2

2011-07-13 Thread Jeff Davis

On Tue, 2011-07-05 at 13:15 -0400, Robert Haas wrote:
> On Tue, Jul 5, 2011 at 1:13 PM, Robert Haas  wrote:
> > Here is an updated version of the "lazy vxid locks" patch [1], which
> > applies over the latest "reduce the overhead of frequent table
> > locks"[2] patch.
> >
> > [1] https://commitfest.postgresql.org/action/patch_view?id=585
> > [2] https://commitfest.postgresql.org/action/patch_view?id=572
> 
> And then I forgot the attachment.

The patch looks good, and I like the concept.

My only real comment is one that you already made: the
BackendIdGetProc() mechanism is not "awesome". However, that seems like
material for a separate patch, if at all.

Big disclaimer: I did not do any performance review, despite the fact
that this is a performance patch.

I see that there are some active performance concerns around this patch,
specifically that it may cause an increase in spinlock contention:

http://archives.postgresql.org/message-id/banlktikp4egbfw9xdx9bq_vk8dqa11w...@mail.gmail.com

Fortunately, there's a subsequent discussion that shows a lot of
promise:

http://archives.postgresql.org/pgsql-hackers/2011-07/msg00293.php

I'll mark this "waiting on author" pending the results of that
discussion.

I like the approach you're taking with this series of patches, so
perhaps we shouldn't set the bar so high that you have to remove all of
the bottlenecks before making any progress. Then again, maybe there's
not a huge cost to leaving these patches on the shelf until we're sure
that they lead somewhere.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [RRR] [HACKERS] Three patches which desperately need reviewers

2011-07-13 Thread Jeff Davis

On Thu, 2011-07-14 at 02:54 +0200, Florian Pflug wrote:
> On Jul14, 2011, at 02:42 , Josh Berkus wrote:
> > lazy vxid locks
> > https://commitfest.postgresql.org/action/patch_view?id=585
> 
> I can try to review that. It does seems to depend on
> the fastlock patch though, and that patch seems to be
> somewhat of a moving target. I'm thus not sure what the
> most reasonable approach is here. I could wait for the
> fastlock patch to be applied, but since the commitfest
> is drawing to a close that might not be the best course
> of action. How is such a situation handled usually?

Moving target? Hopefully not. I marked it "ready for committer" already.
I did give some feedback, but it's nothing that would get in the way of
reviewing the VXID patch.

So please do take a look at the VXID patch. Just apply it over the last
version of the fastlock patch.

I also plan to take a look, but the earliest will be tomorrow.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] reducing the overhead of frequent table locks, v4

2011-07-12 Thread Jeff Davis

On Tue, 2011-07-12 at 13:32 -0500, Robert Haas wrote:
> On Jul 12, 2011, at 12:02 PM, Jeff Davis  wrote:
> > Yeah, I think you're right here. It's probably not much of a practical
> > concern.
> > 
> > I was slightly bothered because it seemed a little unpredictable. But it
> > seems very minor, and if we wanted to fix it later I think we could.
> 
> Yes, I agree. I think there are a number of things we could possibly 
> fine-tune, but it's not clear to me just yet which ones are really problems 
> or what the right solutions are.  I think once the basic patch is in and 
> people start beating on it we'll get a better feeling for which parts can 
> benefit from further engineering.

OK, marking "ready for committer" assuming that you will take care of my
previous complaints (the biggest one is that holdsStrongLockCount should
be boolean).

Disclaimer: I have done no performance review at all, even though this
is a performance patch!

I like the patch and I like the approach. It seems like the potential
benefits are worth the extra complexity, which seems manageable and
mostly isolated to lock.c.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] reducing the overhead of frequent table locks, v4

2011-07-12 Thread Jeff Davis

On Tue, 2011-07-12 at 07:55 -0500, Robert Haas wrote:
> I haven't been that worried about overflow of the fast path table. If
> you are locking more than 16 relations at once, you probably have at
> least 5 tables in the query, maybe more - it depends in how many
> indexes you have, of course.  My assumption has been that at that
> point you're going to spend enough time planning and executing the
> query that the lock manager will no longer be a major bottleneck.  Of
> course, there might be cases where that isn't so.

Yeah, I think you're right here. It's probably not much of a practical
concern.

I was slightly bothered because it seemed a little unpredictable. But it
seems very minor, and if we wanted to fix it later I think we could.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] reducing the overhead of frequent table locks, v4

2011-07-11 Thread Jeff Davis

 * ... It's also possible that
 * we're acquiring a second or third lock type on a relation we have
 * already locked using the fast-path, but for now we don't worry about
 * that case either.
 */

How common is that case? There are only 16 entries in the fast path lock
table, so it seems like it would frequently fill up. So, if there are
common code paths that acquire different weak locks on the same
relation, then we might commonly miss a fast-path opportunity.

One path that acquires multiple weak locks is an INSERT INTO foo
SELECT ... FROM foo ...

Is that common enough to worry about?

Regards,
    Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] reducing the overhead of frequent table locks, v4

2011-07-10 Thread Jeff Davis

A few very minor things that I noticed:

1. You use pre-increment in "for" loops (e.g. FastPathGrantLock). The
rest of the code seems to use post-increment in "for" loops, so you
might as well stick to the convention in cases where the two have
identical meaning.

2. Typo in the README: "acquire the every" should be "acquire every".

3. Typo in comment in lock.c: "last because most" should be "last
because it's the most". (I just realized that's actually not your typo,
so fixing it is optional).

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] reducing the overhead of frequent table locks, v4

2011-07-10 Thread Jeff Davis

On Mon, 2011-06-27 at 10:13 -0400, Robert Haas wrote:
> I didn't get a lot of comments on my the previous version of my patch
> to accelerate table locks.
> 
> http://archives.postgresql.org/pgsql-hackers/2011-06/msg00953.php
> 
> Here's a new version anyway.  In this version, I have:

I am trying to figure out holdsStrongLockCount. It's declared as an
integer, but (unless cscope is failing me) is only ever set to 0 or 1.
It's never incremented or decremented. It looks like it's supposed to be
a boolean indicating that the lock should decrement something in
FastPathStrongLocks when released.

Furthermore, in AtPrepare_Locks(), the comment says:

/*
 * Arrange not to release any strong lock count held by this lock
 * entry.  We must retain the count until the prepared transaction
 * is committed or rolled back.
 */
locallock->holdsStrongLockCount = 0;

But doesn't seem to "arrange" much, as far as I can tell.

I think the 2PC code is still correct, because it infers from the
lockmode that the FastPathStrongLocks counter needs to be incremented.
However, why doesn't other code (RemoveLocalLock is the only reader)
make a similar inference?

Can we just get rid of holdsStrongLockCount completely?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Extra check in 9.0 exclusion constraint unintended consequences

2011-07-10 Thread Jeff Davis

On Sun, 2011-07-10 at 00:36 -0400, Alvaro Herrera wrote:
> Is this really a good idea?  I think the note should still be there in
> 9.1 and beyond (with the version applicability note of course)

I see your point, but it also seems strange to keep such a note
permanently. And it also seems minor enough that we don't want it to be
another thing to keep track of.

I don't really have a strong opinion here. People might hit in in 9.0,
but there's a workaround. And they won't hit it in 9.1+.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Extra check in 9.0 exclusion constraint unintended consequences

2011-07-09 Thread Jeff Davis

On Fri, 2011-07-08 at 22:51 -0400, Robert Haas wrote:
> I'm wondering if we might want to call this out with a  or
> similar...  especially if we're only going to put it into the 9.0
> docs.

Sure, sounds good.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-09 Thread Jeff Davis

On Fri, 2011-07-08 at 23:39 -0700, Darren Duncan wrote:
> What if you used the context of the calling code and resolve in favor of 
> whatever match is closest to it?  The problem is related to general-purpose 
> programming languages.
> 
> Basically start looking in the lexical context for an "x" and if you find one 
> use that; otherwise, assuming we're talking about referencing code that lives 
> in 
> the database such as a function, look at the innermost schema containing the 
> referencing code and see if it has a direct child named "x"; otherwise go up 
> one 
> level to a parent schema, and so on until you get to the top, and finding 
> none 
> by then say it doesn't exist.

This is an example of where data languages and normal programming
languages have a crucial difference.

With a data language, you have this problem:
 1. An application uses a query referencing 'y.z.foo' that resolves to
internal object with fully-qualified name 'x.y.z'.
 2. An administrator creates object 'y.z.foo'.

Now, the application breaks all of a sudden.

In a normal prgramming language, if the schema of the two "foo"s are
different, the compiler could probably catch the error. SQL really has
no hope of catching it though.

PostgreSQL has this problem now in a couple ways, but it's much easier
to grasp what you might be conflicting with. If you have multiple nested
levels to traverse and different queries using different levels of
qualification, it gets a little more messy and I think a mistake is more
likely.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-09 Thread Jeff Davis

On Fri, 2011-07-08 at 21:04 -0700, Darren Duncan wrote:
> > I think you should make more of an effort to understand how the system
> > works now, and why, before proposing radical redesigns.
> 
> Well yes, of course.  But that will take time and I think I already 
> understand 
> enough about it to make some useful contributions in the meantime.  How much 
> or 
> what I already know may not always come across well.  If this bothers people 
> then I can make more of an effort to reduce my input until I have more solid 
> things to back them up.

I don't think anyone expects you to understand all the internal APIs in
postgres before you make a proposal. But we do expect you to look
critically at your own proposals with the status quo (i.e. existing
code, users, and standards) in mind. And that probably means poking at
the code a little to see if you find stumbling blocks, and asking
questions to try to trace out the shape of the project.

I'm hoping that we can learn a lot from your work on Muldis D. In
particular, the type system might be the most fertile ground -- you've
clearly done some interesting things there, and I think we've felt some
pressure to improve the type system from a number of different
projects*.

Regards,
Jeff Davis

* That being said, PostgreSQL's type system is actually very good.
Consider the sophisticated type infrastructure (or at least plumbing
around the type system) required to make KNN-GiST work, for instance.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-08 Thread Jeff Davis

On Fri, 2011-07-08 at 12:34 -0700, Darren Duncan wrote:
> Yes, but that would just be in-memory or in temporary places external to 
> every 
> database.  On disk internal to a database there would just be the oid.  In 
> fact, 
> another aspect of the database model I defined is that each "database" is 
> entirely self-contained; while you can do cross-database queries, you don't 
> have 
> cross-database constraints, in the general case.

Yes, you can have a "local oid" and a "fully-qualified oid". It sounds
like it might take some effort (which is an understatement) to go
through the system and figure out which ones should be local and which
ones should be fully-qualified.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-08 Thread Jeff Davis

On Thu, 2011-07-07 at 23:21 -0700, Darren Duncan wrote:
> I think an even better way to support this is would be based on Postgres 
> having 
> support for directly using multiple databases within the same SQL session at 
> once, as if namespaces were another level deep, the first level being the 
> databases, the second level the schemas, and the third level the schema 
> objects.
> 
> Kind of like what the SQL standard defines its catalog/schema/object 
> namespaces.
> 
> This instead of needing to use federating or that contrib module to use 
> multiple 
> Pg databases of the same cluster at once.
> 
> Under this scenario, we make the property of a database being read-only or 
> read-write for the current SQL session associated with a database rather than 
> the whole SQL session.  A given transaction can read from any database but 
> can 
> only make changes to the ones not read-only.
> 
> Also, the proper way to do temporary tables would be to put them in another 
> database than the main one, where the whole other database has the property 
> of 
> being temporary.
> 
> Under this scenario, there would be separate system catalogs for each 
> database, 
> and so the ones for read-only databases are read-only, and the ones for other 
> databases aren't.
> 
> Then the system catalog itself fundamentally isn't more complicated, per 
> database, and anything extra to handle cross-database queries or whatever, if 
> anything, is a separate layer.  Code that only deals with a single database 
> at 
> once would be an optimized situation and perform no worse than it does now.

One challenge that jumps to mind here is that an Oid would need to
become a pair (catalog, oid). Even if the end result isn't much more
complex, getting there is not trivial.

> See also how SQLite works; this "mount" being analogous to their "attach".

I'm not sure SQLite is the best example. It has a radically different
architecture.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-07 Thread Jeff Davis

On Thu, 2011-07-07 at 20:56 -0700, Darren Duncan wrote:
> > When you create a temporary table, PostgreSQL needs to add rows in
> > pg_class, pg_attribute, and probably other system catalogs. So there are
> > writes, which aren't possible in a read-only transaction. Hence the
> > error. And no, there is no workaround.
> 
> That sounds like a deficiency to overcome.
> 
> It should be possible for those system catalogs to be virtual, defined like 
> union views over similar immutable tables for the read-only database plus 
> mutable in-memory ones for the temporary tables.

Ideally, yes, from a logical standpoint there are catalog entries that
are only interesting to one backend.

But that doesn't mean it's easy to do. Remember that catalog lookups
(even though most go through a cache) are a path that is important to
performance. Also, more complex catalog interpretations may introduce
some extra bootstrapping challenges.

> Are there any plans in the works to do this?

I don't think so. It sounds like some fairly major work for a
comparatively minor benefit.

Suggestions welcome, of course, to either make the work look more minor
or the benefits look more major ;)

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Extra check in 9.0 exclusion constraint unintended consequences

2011-07-07 Thread Jeff Davis

On Thu, 2011-07-07 at 12:36 -0400, Robert Haas wrote:
> I think it's probably too late to go fiddling with the behavior of 9.0
> at this point.  If we change the text of error messages, there is a
> chance that it might break applications; it would also require those
> messages to be re-translated, and I don't think the issue is really
> important enough to justify a change.

Good point on the error messages -- I didn't really think of that as a
big deal.

> I am happy to see us document
> it better, though, since it's pretty clear that there is more
> likelihood of hitting that error than we might have suspected at the
> outset.

Doc patch attached, but I'm not attached to the wording. Remember that
we only need to update the 9.0 docs, I don't think you want to apply
this to master (though I'm not sure how this kind of thing is normally
handled).

Regards,
Jeff Davis


excl-oper-doc.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-07-06 Thread Jeff Davis

On Wed, 2011-07-06 at 15:14 -0400, Tom Lane wrote:
> > I ran into problems with that before... I think with the I/O functions.
> > I don't think that's a problem here, but I thought I'd ask.
> 
> I think it'd probably be all right to do that.  The places where you
> might find shortcuts being taken are where functions are called directly
> by C code, such as I/O function calls --- but these constructors should
> only ever get invoked from SQL queries, no?

Perhaps index expressions/predicates as well (which are also fine). I
was more worried about some case that I hadn't thought of.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Old postgresql repository

2011-07-06 Thread Jeff Davis

On Wed, 2011-07-06 at 16:31 +0200, Magnus Hagander wrote:
> When we did the migration to git, we decided to leave the old
> postgresql git repository around "for a while", for people who had
> clones around it. This is the repository that was live updated from
> cvs while we were using cvs, and does *not* correspond to the current
> git repository when it comes to hashes and other details. It's
> available at http://git.postgresql.org/gitweb?p=postgresql-old.git;a=summary
> 
> I think it's time to drop this repository now. Anybody who had active
> clones on it should've moved to the new repository by now, I think.

That's OK with me.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] reducing the overhead of frequent table locks, v4

2011-07-06 Thread Jeff Davis

On Thu, 2011-06-30 at 19:25 -0400, Robert Haas wrote:
> I'm really hurting
> for is some code review.

I'm trying to get my head into this patch. I have a couple questions:

Does this happen to be based on some academic research? I don't
necessarily expect it to be; just thought I'd ask.

Here is my high-level understanding of the approach, please correct me
where I'm mistaken:

Right now, concurrent activity on the same object, even with weak locks,
causes contention because everything has to hit the same global lock
partition. Because we expect an actual conflict to be rare, this patch
kind of turns the burden upside down such that:
 (a) those taking weak locks need only acquire a lock on their own lock
in their own PGPROC, which means that it doesn't contend with anyone
else taking out a weak lock; and
 (b) taking out a strong lock requires a lot more work, because it needs
to look at every backend in the proc array to see if it has conflicting
locks.

Of course, those things both have some complexity, because the
operations need to be properly synchronized. You force a valid schedule
by using the memory synchronization guarantees provided by taking those
per-backend locks rather than a centralized lock, thus still avoiding
lock contention in the common (weak locks only) case.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-07-06 Thread Jeff Davis

On Wed, 2011-07-06 at 12:51 -0400, Robert Haas wrote:
> On Wed, Jul 6, 2011 at 12:22 PM, Jeff Davis  wrote:
> > To get into some more details: how exactly would this constructor be
> > generated on the fly? Clearly we want only one underlying C function
> > that accepts something like:
> >  range_internal(lower, upper, flags, Oid rangetype)
> > So how do we get the rangetype in there?
> 
> I think that the C function could call get_call_result_type() and get
> the return type OID back via the second argument.

I'm also a little unclear on the rules for when that might be set
properly or not.

I ran into problems with that before... I think with the I/O functions.
I don't think that's a problem here, but I thought I'd ask.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-07-06 Thread Jeff Davis

On Wed, 2011-07-06 at 09:10 -0400, Robert Haas wrote:
> > There's some slight ugliness around the NULL/infinity business, but I
> > think that I could be convinced. I'd like to avoid confusion between
> > NULL and infinity if possible.
> 
> I was thinking that if you passed 'i' for one of the bounds, it would
> ignore the supplied argument and substitute its special infinity
> value.  But you'd still need to supply some argument in that position,
> which could be NULL or anything else.  It doesn't really seem worth
> having additional constructor functions to handle the case where one
> or both arguments are infinite.

Right, that's what I assumed that you meant. I can't think of anything
better, either, because I like the fact that two arguments are there so
that you can visually see which sides are bounded/unbounded.

I suppose we could have constructors like:
  range(text, subtype)
and
  range(subtype, text)
where the text field is used to specify "infinity". But that has the
obvious problem "what if the subtype is text?". So, of course, we make a
special new pseudotype to represent infinity... ;)

But seriously, your idea is starting to look more appealing.

To get into some more details: how exactly would this constructor be
generated on the fly? Clearly we want only one underlying C function
that accepts something like:
  range_internal(lower, upper, flags, Oid rangetype)
So how do we get the rangetype in there? I suppose a default 4th
argument?

That would be kind of an interesting option, but what if someone
actually specified that 4th argument? We couldn't allow that.

Also, are default arguments always applied in all the contexts where
this function might be called?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-07-05 Thread Jeff Davis

On Tue, 2011-07-05 at 13:06 -0400, Robert Haas wrote:
> On Tue, Jul 5, 2011 at 12:54 PM, Jeff Davis  wrote:
> > It would be something like: range_co(1,8)::int8range
> >
> > (just so we're comparing apples to apples)
> >
> > The intermediate type proposal doesn't require that we move the "c" and
> > "o" into the parameter list.
> 
> Well, you have to specify the bounds somewhere...

That's true. In my example it's in the function name.

> OK, so let's pass the information on the bounds as a separate
> argument.  Like this:
> 
> int8range(1,8,'co')

That has a lot going for it, in the sense that it avoids dealing with
the type problems.

> Then you can instead pass 'o' for open or 'i' for infinity (passing
> NULL for the corresponding argument position in that case).  The third
> argument can be optional and default to 'cc'.

The fact that there can be a default for the third argument makes this
quite a lot more appealing than I had originally thought (although I
think 'co' is the generally-accepted default).

There's some slight ugliness around the NULL/infinity business, but I
think that I could be convinced. I'd like to avoid confusion between
NULL and infinity if possible.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-07-05 Thread Jeff Davis

On Tue, 2011-07-05 at 11:26 -0400, Robert Haas wrote:
> How about the idea of creating a family of four constructor functions
> for each new range type?  The functions would be named after the range
> type, with "_cc", "_co", "_oc", and "_oo" appended.  So, then, instead
> of writing:
> 
> RANGE(1,8,'c','o')::int8range

It would be something like: range_co(1,8)::int8range

(just so we're comparing apples to apples)

The intermediate type proposal doesn't require that we move the "c" and
"o" into the parameter list.

> int8range_co(1,8)
> 
> ...which is both more compact and less ugly, IMHO, and seems to
> circumvent all the type system problems as well.

I brought that up before:
http://archives.postgresql.org/pgsql-hackers/2011-06/msg02046.php

It certainly circumvents the polymorphic type problems, but the problem
is that it adds up to quite a few permutations. Not only are there
cc/co/oc/oo, but there are also variations for infinite bounds and empty
ranges. So I think we're talking 10+ functions per range type rather
than 4.

Also, if someone has an idea for another constructor, like the one you
mention above:
  range(1,8,'c','o')
then they have to create it for every range type, and they can't
anticipate new range types that someone might create. In other words,
the constructors wouldn't benefit from the polymorphism. However, if we
used an intermediate type, then they could create the above constructor
and it would work for any range type automatically.

I don't object to this idea, but we'll need to come up with a pretty
exhaustive list of possibly-useful constructors.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Extra check in 9.0 exclusion constraint unintended consequences

2011-07-05 Thread Jeff Davis

In the 9.0 version of exclusion constraints, we added an extra check to
ensure that, when searching for a conflict, a tuple at least found
itself as a conflict. This extra check is not present in 9.1+.

It was designed to help diagnose certain types of problems, and is fine
for most use cases. A value is equal to itself (and therefore conflicts
with itself), and a value overlaps with itself (and therefore conflicts
with itself), which were the primary use cases. We removed the extra
check in 9.1 because there are other operators for which that might not
be true, like <>, but the use case is a little more obscure.

However, values don't always overlap with themselves -- for instance the
empty period (which was an oversight by me). So, Abel Abraham Camarillo
Ojeda ran into a rather cryptic error message when he tried to do that:

ERROR:  failed to re-find tuple within index "t_period_excl"
HINT:  This may be because of a non-immutable index expression.

I don't think we need to necessarily remove the extra check in 9.0,
because the workaround is simple: add a WHERE clause to the constraint
eliminating empty periods. Perhaps we could improve the error message
and hint, and add a note in the documentation.

Thoughts?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-07-05 Thread Jeff Davis

On Tue, 2011-07-05 at 10:06 -0400, Robert Haas wrote:
> > But if it's actually better, we should do it. If an intermediate type
> > seems to be problematic, or if people think it's strange to require
> > casting, then I think this is reasonable.
> 
> I don't understand how the bespoke syntax avoids the need for a cast?

It doesn't, it just avoids the need for an intermediate type.

What I meant was that it might be strange to require a cast on the
result of a function call, because we don't really do that anywhere
else. Florian pointed out that it's common to require casting the
ARRAY[] constructor, so that has more of a precedent. I'm not really
sure how much that matters.

I'm OK with the intermediate type, but Florian seems skeptical of that
idea.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis

On Thu, 2011-06-30 at 12:28 +0200, Florian Pflug wrote:
> Well, arrays are containers, and we need two values to construct a range,

What about empty ranges? What about infinite ranges?

It seems quite a bit more awkward to shoehorn ranges into an array than
to use a real type (even if it's intermediate and otherwise useless).

> Hm, I guess. I'm sill no huge fan of RANGEINPUT, but if we prevent
> it from being used as a column type and from being used as an argument
> type, then I guess it's workable...
> 
> Btw, what happened to the idea of making RANGE(...) a special syntactic
> construct instead of a normal function call? Did we discard that for its
> intrusiveness, or were there other reasons?

It has not been discarded; as far as I'm concerned it's still on the
table. The main advantage is that it doesn't require an intermediate
type, and that requiring a cast (or some specification of the range
type) might be a little more natural. The downside is that, well, it's
new syntax, and there's a little inertia there.

But if it's actually better, we should do it. If an intermediate type
seems to be problematic, or if people think it's strange to require
casting, then I think this is reasonable.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis

On Thu, 2011-06-30 at 09:59 -0700, David E. Wheeler wrote:
> On Jun 30, 2011, at 9:34 AM, Jeff Davis wrote:
> 
> > Then how do you get a text range that doesn't correspond to the
> > LC_COLLATE setting?
> 
> You cast it.

My original solution was something like this, except involving domains.
With a sufficient amount of casting of all arguments to anything
involving a range type, it works, but it's a little too un-SQL-like.
There was at least one fairly strong objection to my approach, but if
you have some further thoughts along that line, I'm open to suggestion.

Also, what if the LC_COLLATE is C, and you want to cast it to en_US
collation?
  range('a','Z')
would be invalid in the C locale, and it would fail before you had a
chance to cast it.

> Cast where you need it explicit, and have a reasonable default when
> it's not cast.

I thought about that, too, but it's not ideal, either. That means that
something might start out as the only range type for a given subtype,
and doesn't need explicit casts. Then you define another range type over
that subtype, and all the original queries break because they are now
ambiguous.

I think the fundamental differences with range types that we're dealing
with are:
 1. multiple range types might reasonbly exist for a single subtype
 2. the order is a fundamental part of the type definition, not just an
extra argument useful for operations on the range type

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis

On Thu, 2011-06-30 at 09:58 -0700, David E. Wheeler wrote:
> On Jun 30, 2011, at 9:29 AM, Jeff Davis wrote:
> 
> > Right. In that respect, it's more like a record type: many possible
> > record types exist, but you only define the ones you want.
> 
> Well, okay. How is this same problem handled for RECORD types, then?

What problem, exactly? For a given list of subtypes, there is only one
valid record type.

Also, record is not a great example. The implementation uses at least
one pretty horrible hack.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Make the visibility map crash-safe.

2011-06-30 Thread Jeff Davis

On Thu, 2011-06-30 at 07:50 -0400, Robert Haas wrote:
> I compare the performance of commit
> 431ab0e82819b31fcd1e33ecb52c2cd3b4b41da7 (post-patch) with commit
> 431ab0e82819b31fcd1e33ecb52c2cd3b4b41da7 (pre-patch).

I believe that is a copy/paste error.

Regards,
    Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis

On Wed, 2011-06-29 at 12:34 -0400, Robert Haas wrote:
> But now that I'm thinking about this a little more, I'm worried about this 
> case:
> 
> CREATE TABLE foo AS RANGE('something'::funkytype, 'somethingelse'::funktype);
> DROP TYPE funkytype;
> 
> It seems to me that the first statement had better fail, or else the
> second one is going to create a hopeless mess (imagine that a new type
> comes along and gets the OID of funkytype).

Interesting point. I don't think it's a problem because pseudo-types
can't be used that way, so that provides us a mechanism to stop it. But
it means that we have to be a little more sure that such values can't
persist anywhere.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis

On Wed, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote:
> On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote:
> 
> > Because there might be more than one range type for a
> > base type. Say there are two range types over text, one
> > with collation 'de_DE' and one with collation 'en_US'.
> > What would the type of
> >  range('foo', 'f')
> > be?
> 
> The one that corresponds to the current LC_COLLATE setting.

Then how do you get a text range that doesn't correspond to the
LC_COLLATE setting? Does that mean you couldn't dump/reload from a
system with one collation and get the same values in a system with a
different collation? That would be very strange.

Or, what about other types that just happen to have multiple useful
total orders?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-30 Thread Jeff Davis

On Thu, 2011-06-30 at 09:11 +0200, Florian Pflug wrote:
> > How would the system catalogs be initialized under that theory: surely
> > you're not going to seed (nr. of types) * (nr. of collations) * (nr. of
> > opclasses) range types in initdb?
> 
> There's CREATE RANGE.

Right. In that respect, it's more like a record type: many possible
record types exist, but you only define the ones you want.

> By default, no range types would exists I believe.

I was planning to include _some_ by default. Probably not text ranges,
but integer and timestamp[tz] ranges. If nothing else, it makes it
easier to document.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Jeff Davis

On Wed, 2011-06-29 at 08:52 -0400, Robert Haas wrote:
> On Tue, Jun 28, 2011 at 11:02 PM, Jeff Davis  wrote:
> > It's still not out of the question, but I thought that the intermediate
> > type would be a less-intrusive alternative (and Robert seemed concerned
> > about how intrusive it was).
> 
> I'm no great fan of our existing type system, and I'm not opposed to
> trying to improve it.  However, I'm a bit wary of the theory that we
> can just tweak X, Y, or Z and then everything will go more smoothly
> for range types.  I fear that there will be knock-on consequences that
> we'll spend a lot of time either (a) arguing about or (b) fixing.

Agreed.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Jeff Davis

On Wed, 2011-06-29 at 13:35 +0200, Florian Pflug wrote:
> What I'm concerned about is how elegantly we'd be able to tie up all
> the loose ends. What'd be the result of
>   select range(1,2)
> for example? Or
>   create table (r rangeinput)
> for that matter.
> 
> I think we'd want to forbid both of these, and more or less every other
> use except
>   range(1,2)::
> but that seems to require special-casing RANGEINPUT in a lot of places.

We could make it a pseudo-type and make the IO functions generate
exceptions. That should prevent most mistakes and effectively hide it
from the user (sure, they could probably use it somewhere if they really
want to, but I wouldn't be worried about breaking backwards
compatibility with undocumented usage like that). There are plenty of
types that are hidden from users in one way or another -- trigger, void,
internal, fdw_handler, etc., so I don't see this as special-casing at
all.

That might make it slightly harder to document, but I think it can be
done. All we have to do is document the range constructors saying "you
must cast the result to a valid range type; trying to use the result of
these functions directly raises an exception". In fact, I think I'll
take back the "hard to document" claim from before: it will be pretty
easy to document, and if someone gets it wrong, we can throw a helpful
error and hint.

Robert didn't really seem to like the idea of throwing an error though
-- Robert, can you expand on your reasoning here?

I tend to lean toward throwing an error as well, but I don't really have
much of an opinion.

> If we don't restrict RANGEINPUT that way, I think we ought to provide
> at least a basic set of operators and functions for it - e.g.
> input, output, lower(), upper(), ...
> 
> *Pondering this*
> 
> But we can't do that easily, since RANGEINPUT would actually be a kind of
> VARIANT type (i.e. can hold values of arbitrary types). That's something
> that our type system doesn't really support. We do have RECORD, which is
> similar in a way, but its implementation is about as intrusive as it
> gets...

I don't want to go down the road of making this a fully supported type.
I don't see any use case for it at all, and I think it's a bad idea to
design something with no idea how people might want to use it.

> Is it? That's actually too bad, since I kinda like it. But anyway,
> if that's a concern it could also be
>   range_bounds(ARRAY[1,2]::int8range, '(]')

What type would the result of that be? What value?

> > * It still suffers similar problems as casting back and forth to text:
> > ANYARRAY is too general, doesn't really take advantage of the type
> > system, and not a great fit anyway.
> 
> I believe it alleviates the gravest problems of casting back and forth
> to text. It doesn't have quoting issues and it doesn't potentially lose
> information.

I think it still circumvents the type system to a degree. We're just
putting stuff in an array with no intention of really using it that way.

> In any case, I wouldn't expect this to *stay* the only way to construct
> a range forever. But I does have it's virtues for a first incarnation of
> range type, I believe, mostly because it's completely unintrusive and
> won't cause any backwards-compatbility headaches in the future

I'm not sure that your overloading of arrays is completely immune from
backwards-compatibility problems, should we decide to change it later.

But regardless, we have quite a lot of time to make a decision before
9.2 is released; so let's do it once and do it right.

> I fear that the intermediate type will turn out to be quite intrusive,
> at least if we try to handle all the corner cases and loose ends. And if
> we don't, I'm concerned that we're painting ourselves into a corner here...

Can you expand on some of the corner-cases and loose ends you're
concerned about? Does marking it as a pseudotype and making the IO
functions throw exceptions handle them?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-28 Thread Jeff Davis

On Tue, 2011-06-28 at 22:20 +0200, Florian Pflug wrote:
> Hm, so RANGEINPUT would actually be what was previously discussed as
> the "range as a pair of bounds" definition, as opposed to the
> "range as a set of values" definition. So essentially we'd add a
> second concept of what a "range" is to work around the range input
> troubles.
> 
> I believe if we go that route we should make RANGEINPUT a full-blown
> type, having "pair of bound" semantics. Adding a lobotomized version
> just for the sake of range input feels a bit like a kludge to me.

I think David Wheeler was trying to make a similar point, but I'm still
not convinced.

It's not a pair, because it can be made up of 0, 1, or 2 scalar values
(unless you count infinity as one of those values, in which case 0 or
2). And without ordering, it's not clear that those values are really
"bounds".

The type needs to:
 * represent two values, either of which might be a special infinite
value
 * represent the value "empty"
 * represent inclusivity/exclusivity of both values

and those things seem fairly specific to ranges, so I don't really see
what other use we'd have for such a type. But I'm open to suggestion.

I don't think that having an extra type around is so bad. It solves a
lot of problems, and doesn't seem like it would get in the way. And it's
only for the construction of ranges out of scalars, which seems like the
most natural place where a cast might be required (similar to casting an
unknown literal, which is fairly common).

> Alternatively, we could replace RANGEINPUT simply with ANYARRAY,
> and add functions ANYRANGE->ANYRANGE which allow specifying the
> bound operator (<, <= respectively >,>=) after construction.
> 
> So you'd write (using the functions-as-fields syntax I believe
> we support)
>   (ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]'
> and
>   ARRAY[NULL,2]::int8range for '[-inf,2]'

I think we can rule this one out:
 * The functions-as-fields syntax is all but deprecated (or should be)
 * That's hardly a readability improvement
 * It still suffers similar problems as casting back and forth to text:
ANYARRAY is too general, doesn't really take advantage of the type
system, and not a great fit anyway.

> All assuming that modifying the type system to support polymorphic
> type resolution based on the return type is out of the question... ;-)

It's still not out of the question, but I thought that the intermediate
type would be a less-intrusive alternative (and Robert seemed concerned
about how intrusive it was).

There also might be a little more effort educating users if we selected
the function based on the return type, because they might think that
casting the inputs explicitly would be enough to get it to pick the
right function. If it were a new syntax like RANGE[]::int8range, then I
think it would be easier to understand.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-28 Thread Jeff Davis

On Tue, 2011-06-28 at 09:30 -0700, David E. Wheeler wrote:
> On Jun 27, 2011, at 8:42 PM, Jeff Davis wrote:
> 
> > Do we think that this is a good way forward? The only thing I can think
> > of that's undesirable is that it's not normal to be required to cast the
> > result of a function, and might be slightly difficult to explain in the
> > documentation in a straightforward way
> 
> That's the part that bothers me.

Yeah, that bothered me, too. 

> I think that if it's not cast it should somehow be useful.

Let's see, what can one do with a range that has no ordering yet? ;)

Robert suggested that we don't need to throw an error, and I think I
agree. Just having a working output function solves most of the
documentation problem, because it makes it less abstract.

The only operators that we could really support are accessors, which
seems somewhat reasonable. However, I'd have some concerns even about
that, because if you do range(10,1), then what's the upper bound?

> Maybe default to a text range or something?

That sounds a little dangerous:
  select range('1','09')
would fail before it could be cast to int4range.

We could invent an UNKNOWNRANGE type or something. But I don't
particularly like that; it would start out working nicely when people
only had one textrange type, and then their old queries would start
failing when they added another range type based on text.

I think it's fine if the RANGEINPUT type isn't too useful by itself.
It's already a common requirement to cast unknown literals, and this
isn't too much different. It's only for constructors, so it still fits
pretty closely with that idea.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-28 Thread Jeff Davis

On Tue, 2011-06-28 at 10:58 -0400, Robert Haas wrote:
> On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis  wrote:
> > So, in effect, RANGEINPUT is a special type used only for range
> > constructors. If someone tried to output it, it would throw an
> > exception, and we'd even have enough information at that point to print
> > a nice error message with a hint.
> 
> I don't think I like the idea of throwing an error when you try to
> output it, but the rest seems reasonably sensible.

I thought it might add a little confusion if people thought they had a
range type but really had RANGEINPUT. For instance, if you do a "create
table as select range(1,2)" then the result might be slightly
unexpected.

But it's probably no more unexpected than "create table as select
'foo'". So, I suppose there's not much reason to throw an error. We can
just output it in the same format as a range type.

It's also much easier to explain something in the documentation that has
an output format, because at least it's tangible.

> > Actually, this is pretty much exactly Florian's idea (thanks again,
> > Florian), but at the time I didn't like it because "pair" didn't capture
> > everything that I wanted to capture, like infinite bounds, etc. But
> > there's no reason that it can't, and your point made me realize that --
> > you are effectively just using TEXT as the intermediate type (which
> > works, but has some undesirable characteristics).
> 
> What undesirable characteristics?

Well, for one, outputting something as text and then reading it back in
does not always produce the same value. For instance, for float, it only
does that if you have extra_float_digits set to some high-enough value.
I suppose I could save the GUC, set it, and set it back; but that seems
like unnecessary ugliness.

There's also the deparsing/reparsing cycle. That might not really matter
for performance, but it seems unnecessary.

And there's always the fallback that "we have types for a reason".
Wouldn't it be odd if you wrote a query like:
  select range(1,2) || 'foo'
and it succeeded? I'm sure that kind of thing can lead to some dangerous
situations.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-27 Thread Jeff Davis

On Mon, 2011-06-27 at 14:50 -0400, Robert Haas wrote:
> Couldn't we also do neither of these things?  I mean, presumably
> '[1,10]'::int8range had better work.

I think that if we combine this idea with Florian's "PAIR" suggestion
here:
http://archives.postgresql.org/message-id/ad4fc75d-db99-48ed-9082-52ee3a4d7...@phlo.org

then I think we have a solution.

If we add a type RANGEINPUT that is not a pseudotype, we can use that as
an intermediate type that is returned by range constructors. Then, we
add casts from RANGEINPUT to each range type. That would allow
  range(1,2)::int8range
to work without changing the type system around, because range() would
have the signature:
  range(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
and then the cast would change it into an int8range. But we only need
the one cast per range type, and we can also support all of the other
kinds of constructors like:
  range_cc(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
  range_linf_c(ANYELEMENT) -> RANGEINPUT
without additional hassle.

The RANGEINPUT type itself would hold similar information to actual
range types: the subtype OID (instead of the range type, because it's
not a range yet), optionally the two bounds (depending on the flags),
and the flags byte. The cast to a real range type would read the
subtype, and try to coerce the bounds to the subtype of the range you're
casting to, set the range type oid, leave the flags byte the same, and
it's done.

So, in effect, RANGEINPUT is a special type used only for range
constructors. If someone tried to output it, it would throw an
exception, and we'd even have enough information at that point to print
a nice error message with a hint.

Actually, this is pretty much exactly Florian's idea (thanks again,
Florian), but at the time I didn't like it because "pair" didn't capture
everything that I wanted to capture, like infinite bounds, etc. But
there's no reason that it can't, and your point made me realize that --
you are effectively just using TEXT as the intermediate type (which
works, but has some undesirable characteristics).

Do we think that this is a good way forward? The only thing I can think
of that's undesirable is that it's not normal to be required to cast the
result of a function, and might be slightly difficult to explain in the
documentation in a straightforward way.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-27 Thread Jeff Davis

On Sun, 2011-06-26 at 22:29 -0700, Darren Duncan wrote:
> Tom Lane wrote:
> > Darren Duncan  writes:
> >> I believe that the best general solution here is for every ordered base 
> >> type to 
> >> just have a single total order, which is always used with that type in any 
> >> generic order-sensitive operation, including any ranges defined over it, 
> >> or any 
> >> ORDER BY or any <,>,etc.
> > 
> > We've spent years and blood on making sure that Postgres could support
> > multiple orderings for any datatype; and there are plenty of natural
> > examples for the usefulness of that.  So I'm not at all impressed by
> > any line of reasoning that starts out by baldly throwing that away.
> 
> I'm not saying that you can't use multiple orderings with a data type.  I'm 
> just 
> saying that the type only has *at most* one (possibly none) *native* 
> ordering, 
> which is what is used when you do something ordered-sensitive with the type, 
> such as have a range.

So, are you saying that it would be impossible to have a range that uses
a different ordering? What about ORDER BY? What about BTrees?

And if those things can use different orders for the same type, then
what is the difference between what you are suggesting and a default
ordering for the type (which we already support)?

I suppose it's hard to tell what you mean by "native".

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-27 Thread Jeff Davis

On Mon, 2011-06-27 at 12:16 +0200, Florian Pflug wrote:
> I wouldn't take it that far. What I had in mind was to *only* support
> the case where the cast directly follows the function call, i.e. the case
>   f(...)::type

OK, so instead of writing:
range(lower(range(1,2)),upper(range(1,2)))::int8range

users would write:
range(lower(range(1,2)::int8range),upper(range(1,2)::int8range))::int8range

A little more verbose, but it seems like it wouldn't be a practical
problem in very many cases. Multiple levels of constructors seem like
they'd be fairly uncommon, and probably a case where a function should
be written anyway.

OK, I'll have to think about this a little more, but it seems like a
reasonable approach.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and length function

2011-06-27 Thread Jeff Davis

On Mon, 2011-06-27 at 12:25 +0200, Florian Pflug wrote:
> Does the current definition of length(range), i.e.
>   upper(range) - lower(range)
> deal correctly with open vs. closed ranges and unbounded ranges? I'm thinking
> that it probably doesn't - what would be the results of
>   length('[0,1]'::intrange) -- Should be 2
>   length('[0,1)'::intrange) -- Should be 1

I alluded to this problem in an earlier email.

I think this would need to be handled by the "canonical" function. If
the canonical function is specified to return values in [) or (] form,
then we'd get the behavior above.

However, it's a little strange, because for discrete ranges you probably
want cardinality, not length. I don't have a clear idea on exactly what
behavior users will expect in this case, which is a pretty good argument
to leave length() out.

>   length('[0,inf]'::intrange) -- Should be infinity, but ints can't
>  represent that, can't they?

That would throw an exception currently, for exactly the reason you
mention.

> If it cannot be easily made to support these cases, than I vote for
> removing it all together.

I now agree. I think you've brought up some good reasons for that. If
users write upper(r)-lower(r), then they know what the semantics will
be; or they can easily write their own length() function (perhaps
specific to a range type).

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and length function

2011-06-26 Thread Jeff Davis

On Mon, 2011-06-27 at 00:37 +0200, Florian Pflug wrote:
> I actually wouldn't expect there to be one. From what I gathered
> during the last discussion, the ideal behind range types is that they
> model sets of the form {x in T | a <= x < b} for arbitrary types
> T, with the only requirement being that T be ordered. To compute
> a length, you additionally need either an algebraic structure on
> T which defines an operation "minus", or some metric which defines
> distance(a,b). Both are *much* stronger concepts than simply being
> ordered. The problems you outline below seem to me to all root in
> this discrepancy.

I agree with you here. It does seem like supporting length() increases
the the complexity of range types quite a bit.

> Strings are a nice example of an ordered type on which no "intuitive"
> definition of either "s1 - s2" or "distance(s1,s2)" exists.

Another good point. There's no logical "length()" function at all for a
text range.

> > The length() function is obviously an
> > important function to provide.
> 
> 
> I'd say it isn't, but maybe I'm missing some use-case that you have
> in mind.

The reason I said that is because, if making only a single range type
for, say, timestamptz, I would make a length() function without even
thinking about it.

There are a few types of queries where that kind of thing is useful,
like billing based on the amount of time some resource is allocated to
you.

But I think you're right, it shouldn't be the responsibility of range
types. Perhaps I should leave length() as some inlinable SQL functions
like I mentioned, or perhaps I should remove them completely.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-26 Thread Jeff Davis

On Mon, 2011-06-27 at 00:56 +0200, Florian Pflug wrote:
> Well, there actually *is* some precedence for that kind of top-down
> (form a syntactic perspective) type inference. We *enforce* the cast
> in 
>   array[]::
> and actually for a very similar reason - without the case, there's no
> way of knowing which type of empty array was meant. I think we also

That's a good point.

Although, I'm not sure whether that's an argument that we can make the
type system work as-is, or if it means that we should add syntax like
ARRAY[].

> special-case
>   'literal'::
> to use the input function of type directly, instead of first creating
> a text value and later casting it to .

That is certainly true. Quoted strings never start out as text, they
start out as "unknown" and wait for the type inference to determine the
type. I'm not entirely sure whether a quoted string followed by a cast
is briefly unknown and then cast, or if it's directly interpreted using
the cast's type input function.

I don't know if that's a good example though because it's near the end
of the line and there's no function call in between the arguments and
the cast. It might get more complex with cases like:

  range(lower(range(1,2)),upper(range(1,2)))::int8range

but maybe that can be done more easily than I think?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and length function

2011-06-26 Thread Jeff Davis

On Sun, 2011-06-26 at 13:45 +0100, Greg Stark wrote:
> On Sun, Jun 26, 2011 at 8:18 AM, Jeff Davis  wrote:
> >  * it needs to know the result type of that function, which might not be
> > the subtype (for instance, for timestamp the difference type would be
> > interval)
> 
> What's the use case for the length() function? Is it for users to be
> able to display useful information about their ranges? Or is it for
> implementing things like GIST indexes?

Here I was talking about something for logical use, not GiST. It's
pretty common to want to know how long a range is.

> For the latter a length function that always returns a float might be
> more useful. Even if it isn't guaranteed to always be perfectly
> precise, that is if ranges of similar length sometimes returned
> identical values, at least it could be used for things like penalty().

I already have a function like that. It's actually a function that takes
the subtype and returns a float, and the GiST code does the subtraction.

But you're right, I could have a length function that always returns a
float instead, and that would do the job. Do you see an advantage?

If I had a length function that returned the subtype, I wouldn't need
that. Except for those pesky types like timestamp -- because then, even
if I had a length() function, I'd also need a total order on the
"interval" type.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types, constructors, and the type system

2011-06-26 Thread Jeff Davis

On Sun, 2011-06-26 at 00:57 -0700, Darren Duncan wrote:
> I believe that the best general solution here is for every ordered base type 
> to 
> just have a single total order, which is always used with that type in any 
> generic order-sensitive operation, including any ranges defined over it, or 
> any 
> ORDER BY or any <,>,etc.  The built-in generic text type would have exactly 1 
> system-defined collation that can't be changed, and it would be something 
> simple 
> and generic, such as simply sorting on the codepoint as integers.

Well, we're trying to support SQL, and SQL supports collations, so I
don't think we can just ignore that.

I also agree with Tom that it's not a good idea. My reasons are:

 * Practical considerations, such as having a bunch of cruft from
duplicated types all over the system. With sufficient changes to the
type system, maybe that could be overcome. Or perhaps domains could be
used to make that work for range types (sort of), but the result would
not be very consistent with the rest of the system.

 * It doesn't seem to be based in any mathematical argument. A type is a
set of values, and there's no reason it can't have several total orders;
or no total order at all. So it appears to just be piggybacking on the
type system infrastructure as a place to hold the metadata for a total
order.

 * Who's to say that a "compare" function is the only way to specify a
total order? There might be other interfaces that would support
something closer to a lexicographic sort. So, from a theoretical
standpoint, trying to attach a single notion of total order to a type
seems strange, because there might be multiple interfaces for specifying
even one total order.

 * It would require extra explicit type annotations. If you have 12 text
types, the only way to practically use any text type is to constantly
specify which more-specific text type it actually is (probably using
the :: operator). That is not necessarily a bad choice if starting a
language from scratch and forming the syntax in a way that it's
reasonable to do. But this is SQL, and lots of type annotations are
un-SQL-like.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Range Types and length function

2011-06-26 Thread Jeff Davis

Currently, there is no way to define a generic "length" function over
range types, which would give you the distance between the boundary
points.

It sounds simple, but the system actually needs quite a lot more
information to accomplish that:
 * a function that subtracts two values of the range's subtype
 * it needs to know the result type of that function, which might not be
the subtype (for instance, for timestamp the difference type would be
interval)
 * it needs to know the "zero" value of the subtype for empty ranges
 * it also needs to know how to canonicalize discrete ranges for
meaningful results -- what's the length of [10,10]? If you write a
difference "canonical" function should the result be different? I
suppose so.

Even if the system knows all of that, we might run into problems with
the type system, because if you have a generic function:
  f(anyrange) -> anyelement
how would it know whether "anyelement" should be the subtype (e.g. if
"f" is the function "upper") or the difference type (e.g. if "f" is the
function "length")?

My solution to all of this is somewhat simplistic, but the best idea I
have so far:

create function length(anyrange) returns anyelement
language sql as
$$
  select case when $1? then upper($1) - lower($1) else '0' end;
$$;

And then, for timestamp[tz] and date, just define specific functions for
those like:

create function length(tsrange) returns interval
language sql as
$$
  select case when $1? then upper($1) - lower($1) else '0 s' end;
$$;

In other words, special case the range types where the "difference type"
is not the same as the subtype, and rely on function overloading to sort
them out.

These work for the most part, but they have a few problems:

1. It assumes that "-" really means "minus" and is defined effectively
over the subtypes.

2. It assumes that '0' is valid input for the "zero" value of the
subtype.

3. If the difference type is not the same as the subtype, and you forget
to define the special-case function, then you are bound to get a cryptic
error.

I suppose the "right" way to solve these problems would be:

1. Force users to supply the "minus" function.

2. Force users to supply the "zero" value as a constant of the same type
as the minus function's return value.

3. Check to see if the minus function's return type is different from
the subtype. If so, automatically create a new entry in the catalog for
the "length" function.

I suppose it's not out of the question to do all of that work, but it
seems like a little much just to get the generic length() function.

I don't mind leaving it as-is, and I think it's a fairly reasonable
solution. But I thought I would re-open it for discussion in case
someone has a better idea. The length() function is obviously an
important function to provide.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Range Types, constructors, and the type system

2011-06-25 Thread Jeff Davis

Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.

For instance:
 [a, Z)
is a valid range in "en_US", but not in "C", so it makes sense to have
multiple ranges over the same subtype with different collations.

But what if you have a function (like a constructor), of the form:
  (anyelement, anyelement) -> anyrange
? To work with the type system, you need to be able to figure out the
return type from the arguments; which means to support functions like
this we need a mapping from the subtype to the range type.
Unfortunately, that restricts us to one range type per subtype (this
isn't a problem for ARRAYs, because there is only one useful array type
for a given element type).

This problem first came up a while ago:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php

My workaround was to use domains, but that's not a very clean solution
(you have to add a bunch of casts to make sure the right domain is
chosen). It became entirely unworkable with collations, because people
would be using different text collations a lot more frequently than,
say, a different ordering for timestamptz. Tom mentioned that here:

http://archives.postgresql.org/message-id/24831.1308579...@sss.pgh.pa.us

I think Florian proposed the most promising line of attack here:

http://archives.postgresql.org/message-id/ad4fc75d-db99-48ed-9082-52ee3a4d7...@phlo.org

by suggesting that functions of the form:
  (anyelement, [other non-anyrange arguments]) -> anyrange
might be expendable. After all, they are only useful for constructors as
far as we can tell. Other range functions will have an anyrange
parameter, and we can use the actual type of the argument to know the
range type (as well as the subtype).

Although it's very nice to be able to say:
  range(1,10)
and get an int4range out of it, it's not the only way, and it's not
without its problems anyway. For instance, to get an int8range you have
to do:
  range(1::int8, 10::int8)
or similar.

So, we could just make functions like:
  int4range(int4, int4)
  int8range(int8, int8)
  ...
when creating the range type, and it would actually be a usability
improvement.

There are at least a few constructors that would need to be made for
each rangetype: the constructor above, the singleton constructor,
constructors that have infinite bounds, the empty constructor, and all
of the permutations for inclusivity/exclusivity. That adds up to quite a
few catalog entries per range type.

We could reduce some of the permutations by using extra arguments
somehow, but that seems like it adds to the ugliness. This might also be
a time to revisit whether there is a better way to present all of these
constructors (rather than the _[co][co] suffixes to names, etc.).

Even if we're willing to put up with a bunch of catalog entries, it will
take a little creativity to figure out how to run the functions
generically from a fixed set of C functions.

Are there other thoughts or ideas about eliminating the need for generic
constructors like range()?

Another idea Florian suggested (in the same email) was the ability to
declare the return type of a function, and then use the declared type to
infer the argument types. That would be nice because you would just have
to do:
  range(1,10)::int8range
However, that's kind of backwards from how our type inference system
works now, and sounds like a big change.

Maybe we could consider a syntax addition for constructing range values?
That was kicked around briefly, but perhaps we should revisit the
possibilities there.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] heap_hot_search_buffer refactoring

2011-06-25 Thread Jeff Davis

On Fri, 2011-06-24 at 15:32 -0400, Robert Haas wrote:
> On Sun, Jun 19, 2011 at 2:16 PM, Robert Haas  wrote:
> > New patch attached, with that one-line change.
> 
> Jeff, are you planning to review this further?  Do you think it's OK to 
> commit?

1. Patch does not apply to master cleanly, and it's in unified format
(so I can't compare it against the old patch very easily). This review
is for the first patch, disregarding the "skip = !first_call" issue that
you already fixed. If you had other changes in the latest version,
please repost the patch.

2. Comment above heap_hot_search_buffer should be updated to document
that heapTuple is an out-parameter and document the behavior of
first_call

3. The logic around "skip" is slightly confusing to me. Here's my
description: if it's not an MVCC snapshot and it's not the first call,
then you don't actually want to fetch the tuple with the given tid or a
later one in the chain -- you want to fetch the _next_ tuple in the
chain or a later one in the chain. Some wording of that description in a
comment (either in the function's comment or near the use of "skip")
would help a lot. Also, if skip is true, then the tid _must_ be visible
according to the (non-MVCC) snapshot, correct? It might help if that was
apparent from the code/comments.

Other than that, it looks good.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] crash-safe visibility map, take five

2011-06-24 Thread Jeff Davis

On Thu, 2011-06-23 at 22:02 -0400, Robert Haas wrote:
> > 1. INSERT to a new page, marking it with LSN X
> > 2. WAL flushed to LSN Y (Y > X)
> > 2. Some persistent snapshot (that doesn't see the INSERT) is released,
> > and generates WAL recording that fact with LSN Z (Z > Y)
> > 3. Lazy VACUUM marks the newly all-visible page with PD_ALL_VISIBLE
> > 4. page is written out because LSN is still X
> > 5. crash

> I don't really think that's a separate set of assumptions - if we had
> some system whereby snapshots could survive a crash, then they'd have
> to be WAL-logged (because that's how we make things survive crashes).

In the above example, it is WAL-logged (with LSN Z).

> And anything that is WAL-logged must obey the WAL-before-data rule.
> We have a system already that ensures that when
> synchronous_commit=off, CLOG pages can't be flushed before the
> corresponding WAL record makes it to disk.

In this case, how do you prevent the PD_ALL_VISIBLE from making it to
disk if you never bumped the LSN when it was set? It seems like you just
don't have the information to do so, and it seems like the information
required would be variable in size.

> I guess the point you are driving at here is that a page can only go
> from being all-visible to not-all-visible by virtue of being modified.
>  There's no other piece of state (like a persistent snapshot) that can
> be lost as part of a crash that would make us need change our mind and
> decide that an all-visible XID is really not all-visible after all.
> (The reverse is not true: since snapshots are ephemeral, a crash will
> render every row either all-visible or dead.)  I guess I never thought
> about documenting that particular aspect of it because (to me) it
> seems fairly self-evident.  Maybe I'm wrong...

I didn't mean to make this conversation quite so hypothetical. My
primary points are:

1. Sometimes it makes sense to break the typical WAL conventions for
performance reasons. But when we do so, we have to be quite careful,
because things get complicated quickly.

2. PD_ALL_VISIBLE is a little bit more complex than other hint bits,
because the conditions under which it may be set are more complex
(having to do with both snapshots and cleanup actions). Other hint bits
are based only on transaction status: either the WAL for that
transaction completion got flushed (and is therefore permanent), and we
set the hint bit; or it didn't get flushed and we don't.

Just having this discussion has been good enough for me to get a better
idea what's going on, so if you think the comments are sufficient that's
OK with me.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] crash-safe visibility map, take five

2011-06-23 Thread Jeff Davis

On Thu, 2011-06-23 at 18:18 -0400, Robert Haas wrote:
> Lazy VACUUM is the only thing that makes a page all visible.  I don't
> understand the part about snapshots.

Lazy VACUUM is the only thing that _marks_ a page with PD_ALL_VISIBLE. 

After an INSERT to a new page, and after all snapshots are released, the
page becomes all-visible; and thus subject to being marked with
PD_ALL_VISIBLE by lazy vacuum without bumping the LSN. Note that there
is no cleanup action that takes place here, so nothing else will bump
the LSN either.

So, let's say that we hypothetically had persistent snapshots, then
you'd have the following problem:

1. INSERT to a new page, marking it with LSN X
2. WAL flushed to LSN Y (Y > X)
2. Some persistent snapshot (that doesn't see the INSERT) is released,
and generates WAL recording that fact with LSN Z (Z > Y)
3. Lazy VACUUM marks the newly all-visible page with PD_ALL_VISIBLE
4. page is written out because LSN is still X
5. crash

Now, the persistent snapshot is still present because LSN Z never made
it to disk; but the page is marked with PD_ALL_VISIBLE.

Sure, if these hypothetical persistent snapshots were transactional, and
if synchronous_commit is on, then LSN Z would be flushed before step 3;
but that's another set of assumptions. That's why I left it simple and
said that the assumption was "snapshots are released if there's a
crash".

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] crash-safe visibility map, take five

2011-06-22 Thread Jeff Davis

On Wed, 2011-06-22 at 21:53 -0400, Robert Haas wrote:
> On Wed, Jun 22, 2011 at 8:53 PM, Jeff Davis  wrote:
> > On Thu, 2011-06-16 at 23:17 -0400, Noah Misch wrote:
> >> 2. In the words of a comment added by the patch:
> >>  * The critical integrity requirement here is that we must never end up 
> >> with
> >>  * a situation where the visibility map bit is set, and the page-level
> >>  * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
> >>  * page modification would fail to clear the visibility map bit.
> >> It does this by WAL-logging the operation of setting a vm bit.  This also 
> >> has
> >> the benefit of getting vm bits set correctly on standbys.
> >
> > In the same function, there is also the comment:
> >
> > "We don't bump the LSN of the heap page when setting the visibility
> > map bit, because that would generate an unworkable volume of
> > full-page writes.  This exposes us to torn page hazards, but since
> > we're not inspecting the existing page contents in any way, we
> > don't care."
> >
> > It would be nice to have a comment explaining why that is safe with
> > respect to the WAL-before-data rule. Obviously torn pages aren't much of
> > a problem, because it's a single bit and completely idempotent.
> 
> That's exactly what I was trying to explain in the comment you cite.
> Feel free to propose a specific change...

Well, I was a little unsure, but here's my attempt:

The potential problems are:
1. Torn pages -- not a problem because it's a single bit and idempotent.
2. PD_ALL_VISIBLE bit makes it to disk before a WAL record representing
an action that makes the page all-visible. Depending on what action
makes a page all-visible:
  a. An old snapshot is released -- not a problem, because if there is a
crash all snapshots are released.
  b. Cleanup action on the page -- not a problem, because that will
create a WAL record and update the page's LSN before setting the
PD_ALL_VISIBLE.

First of all, is that correct? Second, are there other cases to
consider?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] crash-safe visibility map, take five

2011-06-22 Thread Jeff Davis

On Thu, 2011-06-16 at 23:17 -0400, Noah Misch wrote:
> 2. In the words of a comment added by the patch:
>  * The critical integrity requirement here is that we must never end up with
>  * a situation where the visibility map bit is set, and the page-level
>  * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
>  * page modification would fail to clear the visibility map bit.
> It does this by WAL-logging the operation of setting a vm bit.  This also has
> the benefit of getting vm bits set correctly on standbys.

In the same function, there is also the comment:

"We don't bump the LSN of the heap page when setting the visibility
map bit, because that would generate an unworkable volume of
full-page writes.  This exposes us to torn page hazards, but since
we're not inspecting the existing page contents in any way, we
don't care."

It would be nice to have a comment explaining why that is safe with
respect to the WAL-before-data rule. Obviously torn pages aren't much of
a problem, because it's a single bit and completely idempotent.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-21 Thread Jeff Davis

On Mon, 2011-06-20 at 13:43 -0400, Tom Lane wrote:
> The other viable alternative seems to be to require those two properties
> (btree opclass and collation) to be part of a specific range type
> definition.  The complaint about that seemed to be that we couldn't
> infer an ANYRANGE type given only ANYELEMENT, but could we alleviate
> that by identifying one range type as the default for the base type,
> and then using that one in cases where we have no ANYRANGE input?

Yes, that sounds similar to Florian's suggestion, and I think there may
be a solution down this path. However, if we're going to have range
types with non-default orderings, then we need a way to construct them.

I suggested that, if constructors are the primary problem case, then
just generate non-polymorphic constructors at range type definition
time, named after the range type name. I'll look into that approach.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-21 Thread Jeff Davis

On Mon, 2011-06-20 at 12:54 -0700, Darren Duncan wrote:
> That DOMAIN-based solution ostensibly sounds like a good one then, under the 
> circumstances.

It's not bad from a theoretical standpoint, but it does require some
extra type annotation, which is not really the "SQL way".

>   What I *don't* want to see is for things like ranges to have 
> their own collations and the like.

I'm not 100% sure what you mean here. If you mean that you don't want
range types to pay attention to COLLATE clauses, etc., then I agree. I
would also agree if you mean that range values should not carry the
collation with them.

However, it looks like we might try to make the opclass/collation pair a
property of the range type definition. That seems nice, because it
allows us to keep the nice properties of ranges as well as the type
inference and polymorphism for everything except the constructors.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-20 Thread Jeff Davis

On Mon, 2011-06-20 at 16:01 +0200, Florian Pflug wrote:
> Hm, I'm starting to wonder if there isn't a way around that. It seems that
> this restriction comes from the desire to allow functions with the
> polymorphic signature
>   (ANYELEMENT, ANYELEMENT) -> ANYRANGE.
> 
> The only such function I can currently come up with is the generic
> range constructor. Is having that worth the restriction to one
> range type per base type?

Good point.

Having constructors is obviously important, but perhaps they don't have
to be generic. We could generate catalog entries for each constructor
for each range type, and name them after the range type itself. So,
instead of:
  range(1, 10)
you'd write:
  int4range(1,10)

That actually might be better anyway, because relying on the polymorphic
version is not perfect now anyway. For instance, if you want an
int8range using the generic range() constructor, you need a cast.

We'd still need to get the polymorphic type system to work the way we
want in this case. I'll look into that.

> Another option might be to extend polymorphic argument matching
> to allow functions with the signature
>   () -> 
> but to require the concrete output type to be specified with a cast
> at the call site. For the generic range constructor, you'd then
> have to write
>   RANGE(lower, upper)::range_type

Interesting idea. 

> A third approach might be to first define a PAIR type and then
> define ranges on top of that. Since PAIR types wouldn't include
> a comparison operators, the restriction to one PAIR type per
> base type wouldn't matter. Instead of a generic RANGE constructor
> you'd then use the generic PAIR constructor and cast the resulting
> PAIR to whatever range you desire, i.e. write
>   PAIR(lower, upper)::range_type.

Another interesting idea. A little awkward though, and doesn't offer
much opportunity to specify inclusivity/exclusivity of the bounds.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-19 Thread Jeff Davis

On Sun, 2011-06-19 at 21:29 +0200, Florian Pflug wrote:
> If I'm not mistaken about this, that would imply that we also cannot
> have two range types with the same base type, the same opclass,
> but different collations. Which seems rather unfortunate... In fact,
> if that's true, maybe restricing range types to the database collation
> would be best...

Yes, we cannot have two range types with the same base type. That is a
consequence of the polymorphic type system, which needs to be able to
determine the range type given the base type.

A workaround is to use domains. That is effective, but awkward. For
instance, given:
  CREATE DOMAIN textdomain AS text;
  CREATE TYPE textdomainrange AS RANGE (subtype=textdomain);
then:
  '[a,z)'::textdomainrange @> 'b'::textdomain
would work, but:
  '[a,z)'::textdomainrange @> 'b'
would not, which would be annoying.

I don't see a way around this. It's not a collation problem, but a
general "multiple range types with the same subtype" problem.

I don't think there's much benefit in restricting it to the DB
collation. If someone really needs a different collation (or opclass,
for that matter), it might as well be allowed, even if you have to do
extra type annotations.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-19 Thread Jeff Davis

On Sun, 2011-06-19 at 12:24 +0200, Martijn van Oosterhout wrote:
> Collation checking is generally done by the planner. I don't see why
> the input function should check, the result of an input function is by
> definition DEFAULT. It's up to the 'in' operator to check.
> 
> Note that the whole idea of collation is not really supposed to be
> assigned to object for storage.  How that can be resolved I'm not sure.

I think if we just say that it's a property of the range type
definition, then that's OK. It's similar to specifying a non-default
btree opclass for the range type -- it just changes which total order
the range type adheres to.

If you meant that the collation shouldn't be stored along with the value
itself, then I agree.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] heap_hot_search_buffer refactoring

2011-06-19 Thread Jeff Davis

On Sun, 2011-06-19 at 10:50 -0700, Jeff Davis wrote:
> On Mon, 2011-06-06 at 14:03 -0400, Robert Haas wrote:
> > The attached patch refactors heap_hot_search_buffer() so that
> > index_getnext() can use it, and modifies index_getnext() to do so.
> 
> Attached is a version of the patch that applies cleanly to master.

In heap_hot_search_buffer:

  +   /* If this is not the first call, previous call returned
 a (live!) tuple */
  if (all_dead)
  -   *all_dead = true;
  +   *all_dead = !first_call;

I think that's a typo: it should be:

  +   *all_dead = first_call;

Right?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] heap_hot_search_buffer refactoring

2011-06-19 Thread Jeff Davis

On Mon, 2011-06-06 at 14:03 -0400, Robert Haas wrote:
> The attached patch refactors heap_hot_search_buffer() so that
> index_getnext() can use it, and modifies index_getnext() to do so.

Attached is a version of the patch that applies cleanly to master.

Regards,
    Jeff Davis


heap-hot-search-buffer.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-18 Thread Jeff Davis

On Sat, 2011-06-18 at 22:19 +0200, Florian Pflug wrote:
> Yes, that seems necessary for consistency. That leaves the question
> of what to do if someone tries to modify a textrange's collation with
> a COLLATE clause. For example,
> 
> For example, whats the result of
>   'Ä' in '[A,Z']::textrange_german COLLATE 'C'
> where 'Ä' is a german Umlaut-A which sorts after 'A' but before 'B'
> in locale 'de_DE' but sorts after 'Z' in locale 'C'. (I'm assuming
> that textrange_german was defined with collation 'de_DE').
> 
> With the set-based definition of ranges, the only sensible thing
> is to simply ignore the COLLATE clause I think.

I think rejecting it makes more sense, so a range would not be a
collatable type; it just happens to use collations of the subtype
internally.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-18 Thread Jeff Davis

On Fri, 2011-06-10 at 00:26 +0200, Florian Pflug wrote:
> Maybe that check should just be removed? If one views the range
> '[L, U)' as a concise way of expressing "L <= x AND x < U" for some
> x, then allowing the case L > U seems quite natural. There won't
> be any such x of course, but the range is still valid, just empty.

[ Please excuse the late reply, I was on vacation. ]

That's an interesting perspective, but I don't think it's a good idea. 

Up to this point, I've considered a range value to be a set of
contiguous values, and the endpoints just happen to be a way to
represent that set. If changing the collation changes a set of positive
cardinality into an empty set, clearly it's a different value.

We don't want the COLLATE clause to change the value, because things
that do change the value (like a typecast) should offer the opportunity
to call a function so that you can verify that it's valid or change it
to some canonical form.

So, I believe that you are proposing to change the concept of a range
value from "a contiguous set of values" to "a pair of bounds". There are
numerous implications, one of which is that I don't think that we can
maintain the equality of all empty ranges. Consider these expressions,
where x is a non-empty range with collation "A", but is empty in
collation "B" (and "*" means "range intersection"):

  (x COLLATE "B") COLLATE "A"
  ((x COLLATE "B") * '(-Inf, Inf)') COLLATE "A"
  ('-'::textrange * '(-Inf, Inf)') COLLATE "A"

All of those expressions should be equal (according to global
substitutibility, as Darren mentioned). But they can't be, because the
last expression is always an empty range, whereas the first one is not
(because merely changing the collation back and forth offers no
opportunity to even notice that you have an empty range at one point).
So, I believe that we'd be stuck with non-equal empty ranges, as well as
many other possibly non-intuitive implications.

So, I lean strongly toward the interpretation that a range is a
contiguous set of values, and changing the collation should not change
the value. Things that do change the value (like a typecast) should
offer the opportunity to handle cases like this with a function call,
but changing collation does not.

This leaves making the collation a part of the range type itself (as
Robert suggested).

Comments?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-08 Thread Jeff Davis

On Tue, 2011-06-07 at 10:20 -0700, Jeff Davis wrote:
> > BTW, Jeff, have you worked out the implications of collations for
> > textual range types?
> 
> Well, "it seems to work" is about as far as I've gotten.
> 
> As far as the implications, I'll need to do a little more research and
> thinking. But I don't immediately see anything too worrisome.

I take that back :(

It looks like the type input function may be a problem, because it
doesn't look like it knows what the collation is yet. In other words,
PG_GET_COLLATION() is zero for the type input function.

But I need to do a comparison to find out if the range is valid or not.
For instance:
  '[a, Z)'::textrange
is valid in "en_US" but not "C".

The range constructor:
  range('a', 'Z')
is fine though.

Not sure what to do here.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Jeff Davis

On Mon, 2011-06-06 at 14:42 -0700, Darren Duncan wrote:
> On this note, here's a *big* thing that needs discussion ...

[ refering to the concept of "discrete" versus "continuous" ranges ]

Yes, there has been much discussion on this topic already.

The solution right now is that they both behave like continuous ranges
for most operations. But each time a value is produced, a discrete range
has a "canonicalize" function that aligns it to the proper boundaries
and chooses a convention from [], [), (], (). For discrete ranges that's
only a convention, because multiple representations are equal in value,
but that's not so for continuous ranges.

Another approach would be to offer "next" and "prev" functions instead
of "canonical", or a "plus(thetype, integer)" and "minus(thetype,
integer)".

> Can Pg be changed to support "." in operator names as long as they don't just 
> appear by themselves?  What would this break to do so?

Someone else would have to comment on that. My feeling is that it might
create problems with qualified names, and also with PG's "arg.function"
call syntax.

> >>foo in 1..10

> I believe it is quite reasonable to treat ranges like sets, in an abstract 
> sense, and so using set membership syntax like "in" is valid.

OK, I think I agree with this now. I'll think about it some more.

> I also see these as considerably less important and useful in practice than 
> the 
> continuous intervals.

[ multiranges ]

Agreed. I've left those alone for now, because it's a separate concept.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-07 Thread Jeff Davis

On Tue, 2011-06-07 at 11:15 -0400, Tom Lane wrote:
> Merlin Moncure  writes:
> > right. hm -- can you have multiple range type definitions for a
> > particular type?
> 
> In principle, sure, if the type has multiple useful sort orderings.

Right. Additionally, you might want to use different "canonical"
functions for the same subtype.

> I don't immediately see any core types for which we'd bother.

Agreed.

> BTW, Jeff, have you worked out the implications of collations for
> textual range types?

Well, "it seems to work" is about as far as I've gotten.

As far as the implications, I'll need to do a little more research and
thinking. But I don't immediately see anything too worrisome.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Mon, 2011-06-06 at 16:45 +, Christopher Browne wrote:
> How to slice it apart into an appropriate admixture of core and
> extensions is a good question, though it seems pretty likely that
> having an extension for each data type that is to be mixed into a
> range is a reasonable way to go.

...

> Per-type extensions offers a pretty natural partitioning of the code
> for each type, which seems pretty good.

Ideally, most range types can be created with a simple:

CREATE TYPE foorange AS RANGE (subtype=foo);

There might be a few subtype-specific functions, like the canonical
function, but overall it should be a small amount of code per range.
However, I'd say just bundle a bunch of rangetypes together in one
extension. There's not really much cost -- if you are using one range
type, you'll use a few more.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote:
> we can define a step
> 
> FOREACH x IN RANGE . BY 

That wouldn't need any of the range infrastructure at all -- it would be
purely syntactic, right?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Mon, 2011-06-06 at 14:42 +0200, Dimitri Fontaine wrote:
> I think the way things are going to be organised now is that we will
> have core-blessed extensions:  don't mix the mechanism and the policy.

I like that idea.

> > non-issue if we had a good type interface system (that works on
> > polymorphic types) -- we could just have a built-in "range" interface,
> > and the range extension could add "&&" as the range interface's overlaps
> > operator for the type ANYRANGE.
> 
> That means that this is, IMHO, the right approach.  Have core support
> that enables user defined RANGE types with indexing and planner support,
> etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

If we take the minimal approach, the index support would be the first to
be moved to an extension. In order to have index support in core, we
need quite a few functions and a significant amount of code.

Minimal would be:
  * CREATE TYPE ... AS RANGE
  * ANYRANGE
  * The IO functions
  * Possibly the constructors and accessors ( range(),
range_oc(), lower(), upper(), etc.)

Regarding the type interfaces, the only thing that really worries me
there is that my future work will depend on them existing, and I haven't
really thought through the details. For instance, it just occurred to me
recently that it would need to support polymorphic types, which might be
a little bit more complicated than a simple lookup.

I suppose it's easier to put a few functions in core later if we get
stuck than to rip them out later.

> And the useful stuff you need to have to benefit from that core support
> would be an extension.  It could be a core maintained extension, and it
> could even get installed by default, so that all the users would need to
> do is 'CREATE EXTENSION timeranges;', for example.

Sounds good to me. However, would the extension be available in
pg_regress? If not, I will need to include those constructors/accessors
to be able to test anything.

> I think the consensus is to instead add a new chapter (maybe between
> current chapters 9. Functions and Operators and 10. Type Conversion) and
> host “core extensions” docs there.  The source code organisation is
> controversial because technically not necessary.  We have to keep the
> work Greg did to keep those contribs shipped by default.  Oh, and that
> is on the 9.1 Open Items, right?

OK, so there are still a few things to be decided around documentation
and tests. Both of those things can take a significant amount of time to
rework, so I think I'll leave it alone until we have more of a
consensus.

We still have time before 9.2 to break some of the code out into an
extension when we do have the doc/test issues resolved.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:
> Jeff Davis wrote:
> > I'd like to take another look at Range Types and whether part of it
> > should be an extension. Some of these issues relate to extensions in
> > general, not just range types.
> > 
> > First of all, what are the advantages to being in core?
> 
> I believe that ranges aka intervals are widely useful generic types, next 
> after 
> relations/tuples/arrays, and they *should* be supported in core, same as 
> arrays are.

I think we all agree that ranges are important. I am not suggesting that
we sacrifice on the semantics to make it an extension; I'm just trying
to see if involving extensions for some of the approximately 5000 lines
would be a good idea.

> Now assuming that a range/interval value is generally defined in terms of a 
> pair 
> of endpoints of some ordered type (that is, a type for which ORDER BY or RANK 
> or 
> {<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value 
> is 
> capable of distinguishing open and closed intervals.

Right, it already does that explicitly. I'd appreciate your input on
some of the previous discussion though.

> Also, if Postgres has some concept of type-generic special values -Inf and 
> +Inf 
> (which always sort before or after any other value in the type system), those 
> can be used as endpoints to indicate that the interval is unbounded.

I already introduced +/- infinity to range types. They are not generic
outside of ranges, however -- therefore you can't select the upper bound
of an upper-infinite range.

> Unless you have some other syntax in mind, I suggest lifting the range 
> literal 
> syntax from Perl 6, where ".." is an infix operator building a range between 
> its 
> arguments, and a "^" on either side means that side is open, I think; so 
> there 
> are 4 variants: {..,^..,..^,^..^}.

Oh, interesting syntax. That might make a good operator version of a
constructor. Unfortunately, "." is not valid in an operator name in PG.
Maybe I can use tilde or dash?

> Any operation that wants to deal with a range somehow, such as the BETWEEN 
> syntax, could instead use a range/interval; for example, both of:
> 
>foo in 1..10

I don't know if it's reasonable to introduce syntax like "in" here.
Maybe we could just still use "between" and it would recognize that the
RHS is a range?

> The LIMIT clause could take a range to specify take and skip count at once.

Interesting idea.

> Array slicing can be done using foo[first..last] or such.

I like that, but we already have foo[3:7], so it might be better not to
introduce redundancy. Too bad I can't use ":" as an operator.

> A random number generator that takes endpoints can take a range argument.

Sounds useful because it would make it more explicit whether the
endpoints are possible results.

> An array or relation of these range can represent ranges with holes, and the 
> general results of range union operations.

Right, that's been brought up before as well. In particular, Scott
Bailey has done some thinking/writing on this topic.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Types and extensions

2011-06-06 Thread Jeff Davis

On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote:
> 2011/6/6 Darren Duncan :
> > Jeff Davis wrote:
> >>
> >> I'd like to take another look at Range Types and whether part of it
> >> should be an extension. Some of these issues relate to extensions in
> >> general, not just range types.
> >>
> >> First of all, what are the advantages to being in core?
> 
> it should be supported by FOREACH statement in PL/pgSQL

Oh, good idea. It would only work for discrete ranges though.

However, I would need to somehow reintroduce the concept of "next",
which has some hazards to it (as Tom pointed out, we don't want someone
to define the "next" for a float to be "+1.0"). I'll have to think about
this.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assert failure when rechecking an exclusion constraint

2011-06-05 Thread Jeff Davis

On Sun, 2011-06-05 at 15:09 -0400, Tom Lane wrote:
> so once we've set the index as the currentlyReindexedIndex, there's
> no need for it still to be in pendingReindexedIndexes.

OK. The second version of the patch looks good to me.

Regards,
    Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Range Types and extensions

2011-06-05 Thread Jeff Davis

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

1. ANYRANGE + CREATE TYPE ... AS RANGE
--
This is the most compelling, in my opinion. People can define new range
functions and new range types independently and each one gets the
benefit of the other automatically. Without this, there will be an
explosion of functions and a bunch of inconsistencies like functions
that support most range types but not all (merely because the function
author didn't know that the type existed).

In the several talks that I've given, a common question is related to
"multiranges" (ranges with holes). These get a little complex, and I
don't have a complete answer. However, multiranges can be approximated
with ordered arrays of non-overlapping, non-adjacent ranges. If someone
wants to take it upon themselves to develop a set of operators here,
that would be great -- but without ANYRANGE the operators would be
unmanageable.

2. Documentation and Tests
--
Let's say we take a minimalist view, and only have ANYRANGE and CREATE
TYPE ... AS RANGE in core; and leave the rest as an extension.

What exactly would the documentation say? I think it would be even more
hypothetical and abstract than the documentation for Exclusion
Constraints. So, there is a certain documentation advantage to having at
least enough functionality to allow someone to try out the feature.

And the tests for such a minimalist feature would be a significant
challenge -- what do we do there? Get pg_regress to load the extension
from PGXN?


3. Quality
--
PostgreSQL has a great reputation for quality, and for good reason. But
extensions don't follow the same quality-control standards; and even if
some do, there is no visible stamp of approval. So, to ask someone to
use an extension means that they have to evaluate the quality for
themselves, which is a pretty high barrier.

Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve
many of the other issues, quality control is one of the biggest ones
remaining. I still get questions about when the temporal type will be
"in core", and I think this is why.

I don't think this is a good excuse to put it in core though. We need to
solve this problem, and the best way to start is by getting
well-reviewed, high-quality extensions out there.


4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc.
-
There are a few aspects of range types that aren't in the first patch,
but are fairly obvious follow-up additions. These will require some
knowledge about ranges in the backend, like finding the "overlaps"
operator for a range. The current patch provides this knowledge by
providing a built-in overlaps operator for ANYRANGE. This would be a
non-issue if we had a good type interface system (that works on
polymorphic types) -- we could just have a built-in "range" interface,
and the range extension could add "&&" as the range interface's overlaps
operator for the type ANYRANGE.

=

So, where on this spectrum should range types fall? I think the most
minimalist would be to only support #1 (and the necessary type IO
functions); and leave all other functions, operators, and opclasses to
an extension. That has a lot of appeal, but I don't think we can ignore
the challenges above.

On the other hand, trying to make it a complete feature in core has
challenges as well. For instance, even with Range Types, Exclusion
Constraints aren't practical out-of-the-box unless we also have
BTree-GiST in core. So there's a snowball effect.

There might also be some middle ground, where its like the minimalist
approach, but with a few very basic constructors and accessors. That
would at least make it easier to test, but then to be actually useful
(with index support, operators, fancy functions, etc.) you'd need the
extension.

Thoughts?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Assert failure when rechecking an exclusion constraint

2011-06-05 Thread Jeff Davis

On Sun, 2011-06-05 at 14:17 -0400, Tom Lane wrote:
> Attached are two versions of a patch to fix this.  The second one
> modifies the code that tracks what's "pending" as per the above thought.
> I'm not entirely sure which one I like better ... any comments?

I think I'm missing something simple: if it's not in the pending list,
what prevents systable_beginscan() from using it?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] storing TZ along timestamps

2011-06-02 Thread Jeff Davis

On Thu, 2011-06-02 at 20:28 -0400, Robert Haas wrote:
> But that doesn't seem like enough, because if someone adds '1 day',
> knowing the offset isn't sufficient to figure out the answer.  You
> have to know where the DST boundary is.

Good point, I guess the timezone itself needs to be stored. That's a
little unfortunate, because timezones are somewhat of a moving target
(which I think was Tom's point).

That means that we'd need an entire history (and future?) of timezone
definitions, and apply the timezone definition as of the associated
timestamp to get the offset. Or, should we apply the timezone definition
as of the "real" time the value was entered?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] storing TZ along timestamps

2011-06-02 Thread Jeff Davis

On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote:
> One of our customers is interested in being able to store original
> timezone along with a certain timestamp.

Another thing to consider is that this will eliminate any useful total
order.

You could define an arbitrary total order, of course, just to allow
BTrees for equality searches. However, I don't think you should define
">" (and other non-equality comparator operators) according to that
total order -- they should be more hidden like "~>~". ">" should not
exist as an operator over this type at all.

I also do not like the idea of having "=" mean "equivalent after
timezone adjustment". If we're making a distinction between "2000-01-01
10:00:00 +03" and "2000-01-01 9:00:00 +02", then "=" should not obscure
that distinction.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] storing TZ along timestamps

2011-06-02 Thread Jeff Davis

On Thu, 2011-06-02 at 18:46 +, Christopher Browne wrote:
> > 1. How would the time-zone be defined in this composite? Offset from GMT?
> > Timezone (well, link thereto) with all DST rules intact? Would "extract"
> > need to be modified to include the ability to grab the timezone?
> 
> That doesn't seem appropriate, because timezones are not always
> represented by strict offsets from GMT.  Some frequently-used
> timezones represent variable offsets.  ("EDT/EST", I'm looking at
> you!)

In conjunction with a specific timestamp, a timezone does strictly map
to a single offset.

That is, unless it's a timestamp in the future, and someone decides to
adjust a timezone before the timestamp actually occurs. But that's a
problem with the current timestamptz implementation anyway...

> > Since this isn't going to alter my current beloved timestamptz and I don't
> > have a use-case I leave the decisions on the above to others. But in my
> > imagined use-cases I still see the originating zone as a separate piece of
> > information better handled as a different column - for example sorting by
> > timestamp plus priority or selecting everything for a specific time zone.

I have a similar inclination. ">" seems like the fundamental operation
you'd want to perform on any timestamp (perhaps more so than equality),
and that's not well-defined if there is no total order (but several
meaningful partial orders).

However, I do see some nice benefits, too. The main one is that you can
easily get either local time or GMT out of it. So you can answer queries
such as "which of these activities occurred outside of normal business
hours" as well as "which of these events happened first". It would take
a little care to use properly, however.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] storing TZ along timestamps

2011-06-01 Thread Jeff Davis

On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote:
> Hi,
> 
> One of our customers is interested in being able to store original
> timezone along with a certain timestamp.

I assume that you're talking about a new data type, not augmenting the
current types, correct?

Regards,
    Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tackling full page writes

2011-05-24 Thread Jeff Davis

On Tue, 2011-05-24 at 16:34 -0400, Robert Haas wrote:
> As I think about it a bit more, we'd
> need to XLOG not only the parts of the page we actually modifying, but
> any that the WAL record would need to be correct on replay.

I don't understand that statement. Can you clarify?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] What was the exact version of PostgreSQL where the column name length changed from 31 to 63 characters?

2011-04-26 Thread Jeff Davis

On Tue, 2011-04-26 at 18:35 +, Dann Corbit wrote:
> I need to know so that I can handle cases like:
> 
> Create table foolongcols(
>   nevermindthefurthermorejustpleadinselfdefense char(5), 
>   nevermindthefurthermorejustpleadguilty char(5)
> );
> 
> I assume that other object names (table name, function name, etc.) are 
> similarly affected.  Is that correct?

It was changed in this commit:

http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=46bb23ac016714065711cf2a780e080c7310d66e

which was first released in 7.3.0, as far as I can tell.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch for pg_upgrade to turn off autovacuum

2011-04-22 Thread Jeff Davis

On Fri, 2011-04-22 at 17:34 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian  writes:
> > > I thought some more about this and I don't want autovacuum to run on the
> > > old server.  This is because pg_dumpall --binary-upgrade --schema-only
> > > grabs the datfrozenxid for all the databases at the start, then connects
> > > to each database to gets the relfrozenxids.  I don't want to risk any
> > > advancement of either of those during the pg_dumpall run.
> > 
> > Why?  It doesn't really matter --- if you grab a value that is older
> > than the latest, it's still valid.  As Robert said, you're
> > over-engineering this, and thereby introducing potential failure modes,
> > for no gain.
> 
> Uh, I am kind of paranoid about pg_upgrade because it is trying to do
> something Postgres was never designed to do.  I am a little worried that
> we would be assuming that pg_dumpall always does the datfrozenxid first
> and if we ever did it last we would have relfrozenxids before the
> datfrozenxid.  I am worried if we don't prevent autovacuum on the old
> server that pg_upgrade will be more fragile to changes in other parts of
> the system.

If we back-patch the "-b" to 8.3, then we can always use it on both the
old and new systems. Upgrading to the latest patch-level on both old and
new should be a prerequisite for pg_upgrade anyway.

That would turn the catalog check from a special case (use "-b"
sometimes, other times don't; which could cause fragility and bugs),
into just another sanity check with an easy workaround ("your postgres
doesn't support '-b', upgrade to the latest patch-level before
upgrading").

One of the things I like about the design of pg_upgrade is that it
doesn't seem to have a lot of special cases for different version
combinations.

What do you think?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch for pg_upgrade to turn off autovacuum

2011-04-21 Thread Jeff Davis

On Thu, 2011-04-21 at 18:22 -0400, Bruce Momjian wrote:
> I can also control the
> behavior based on the catalog version number, which seems the most
> logical.

It seems like we want a simple "use -b if available; else don't". Is
that right?

If so, switching based on the version seems reasonable. However, can you
get the information directly from the bianry, rather than trying to
infer it from the catalog version?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade bug found!

2011-04-08 Thread Jeff Davis

On Fri, 2011-04-08 at 15:03 -0400, Bruce Momjian wrote:
> A fix will be included in upcoming Postgres releases 8.4.8 and 9.0.4.
> These releases will remove the need for the above script by correctly
> updating all TOAST tables in the migrated databases.

You might want to clarify that the fix may be required if you ever used
pg_upgrade before. Using the new version of pg_upgrade/dump when you
still have a bad relfrozenxid doesn't help.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade bug found!

2011-04-08 Thread Jeff Davis

On Fri, 2011-04-08 at 13:35 -0400, Noah Misch wrote:
> > 1. Make relfrozenxid go backward to the right value. There is currently
> > no mechanism to do this without compiling C code into the server,
> > because (a) VACUUM FREEZE will never move the relfrozenxid backward; and
> > (b) there is no way to find the oldest xid in a table with a normal
> > snapshot.
> 
> Couldn't you set relfrozenxid and datfrozenxid to txid_current() - 11
> (the highest possible vacuum_freeze_min_age, plus some slop), then run "SET
> vacuum_freeze_table_age = 0; VACUUM tbl" on all tables for which you did this?
> There's no need to set relfrozenxid back to a particular "right" value. 

That's a good point that we don't need relfrozenxid to really be the
right value; we just need it to be less than or equal to the right
value. I don't think you need to mess around with
vacuum_freeze_table_age though -- that looks like it's taken care of in
the logic for deciding when to do a full table vacuum.

This has the additional merit that transaction IDs are not needlessly
removed; therefore leaving some forensic information if there are
further problems.

> 
> Suppose that your next xid at pg_upgrade time was 500M, and it's now 505M.  If
> you're using the default vacuum_freeze_min_age = 50M, "SET
> vacuum_freeze_table_age = 0; VACUUM tbl" will only freeze tuples covering 5M
> transaction ids.

If the pg_upgrade time was at txid 500M, then the relfrozenxid of the
toast table will be about 500M. That means you need to get rid of all
xids less than about 500M (unless you already fixed relfrozenxid,
perhaps using the process you mention above).

So if you only freeze tuples less than about 455M (505M - 50M), then
that is wrong.

The only difference really is that you don't really need to freeze those
last 5M transactions since the upgrade happened.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade bug found!

2011-04-08 Thread Jeff Davis

On Fri, 2011-04-08 at 07:08 -0400, Noah Misch wrote:
> > Right, VACUUM FREEZE.  I now see I don't need to set
> > vacuum_freeze_table_age if I use the FREEZE keyword, e.g. gram.y has:
> > 
> > if (n->options & VACOPT_FREEZE)
> > n->freeze_min_age = n->freeze_table_age = 0;
> 
> True; it just performs more work than strictly necessary.  We don't actually
> need earlier-than-usual freezing.  We need only ensure that the relfrozenxid
> will guide future VACUUMs to do that freezing early enough.  However, I'm not
> sure how to do that without directly updating relfrozenxid, so it's probably
> just as well to cause some extra work and stick to the standard interface.

If there are tuples in a toast table containing xids that are older than
the toast table's relfrozenxid, then there are only two options:

1. Make relfrozenxid go backward to the right value. There is currently
no mechanism to do this without compiling C code into the server,
because (a) VACUUM FREEZE will never move the relfrozenxid backward; and
(b) there is no way to find the oldest xid in a table with a normal
snapshot.

2. Get rid of those xids older than relfrozenxid (i.e. VACUUM FREEZE). 

I don't know what you mean about VACUUM FREEZE doing extra work. I
suppose you could set the vacuum_freeze_min_age to be exactly the right
value such that it freezes everything before the existing (and wrong)
relfrozenxid, but in practice I think it would be the same amount of
work.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade bug found!

2011-04-08 Thread Jeff Davis

On Thu, 2011-04-07 at 22:21 -0400, Bruce Momjian wrote:
> One concern I have is that existing heap tables are protecting clog
> files, but once those are frozen, the system might remove clog files not
> realizing it has to freeze the heap tables too.

I don't understand. Can you elaborate?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade bug found!

2011-04-07 Thread Jeff Davis

On Thu, 2011-04-07 at 20:14 -0400, Bruce Momjian wrote:
> So I think we have four possible approaches to correct databases:
> 
>   1) SELECT * to set the hint bits
>   2) VACUUM to set the hint bits
>   3) VACUUM FREEZE to remove the old xids
>   4) some complicated function
> 
> I don't like #4, and I think I can script #2 and #3 in psql by using COPY
> to create a VACUUM script and then run it with \i.  #1 is easy in a DO
> block with PL/pgSQL.

The only one that sounds very reasonable to me is #3. If there are any
xids older than the relfrozenxid, we need to get rid of them. If there
is some reason that doesn't work, I suppose we can consider the
alternatives. But I don't like the hint-bit-setting approach much.

What if the xmax is really a transaction that got an exclusive lock on
the tuple, rather than actually deleting it? Are you sure that a SELECT
(or even a normal VACUUM) would get rid of that xid, or might something
still try to look it up in the clog later?

Not only that, but hint-bit-setting is not WAL-logged, so you'd really
have to do a checkpoint afterward.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade bug found!

2011-04-07 Thread Jeff Davis

On Thu, 2011-04-07 at 12:38 -0700, Jeff Davis wrote: 
> > Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> > the toast tables work?
> 
> VACUUM FREEZE will never set the relfrozenxid backward. If it was never
> preserved to begin with, I assume that the existing value could be
> arbitrarily before or after, so it might not be updated.

Now that I understand the problem a little better, I think VACUUM FREEZE
might work, after all.

Originally, I thought that the toast table's relfrozenxid could be some
arbitrarily wrong value. But actually, the CREATE TABLE is issued after
the xid of the new cluster has already been advanced to the xid of the
old cluster, so it should be a "somewhat reasonable" value.

That means that VACUUM FREEZE of the toast table, if there are no
concurrent transactions, will freeze all of the tuples; and the
newFrozenXid should always be seen as newer than the existing (and
wrong) relfrozenxid. Then, it will set relfrozenxid to newFrozenXid and
everything should be fine. Right?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade bug found!

2011-04-07 Thread Jeff Davis

On Thu, 2011-04-07 at 17:06 -0400, Bruce Momjian wrote:
> I want to avoid anything that requires a compile because they are hard
> for many sites to install so TransactionIdPrecedes() is out.  We will
> need to do this in PL/pgSQL probably.

PL/pgSQL can't see dead rows, so that would not be correct. It's
guaranteed to be the same value you see from the heap or newer; because
if it's not visible in the heap, it's not going to be visible in the
toast table.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade bug found!

2011-04-07 Thread Jeff Davis

On Thu, 2011-04-07 at 15:46 -0400, Bruce Momjian wrote:
> OK, so the only other idea I have is to write some pretty complicated
> query function that does a sequential scan of each toast table and pulls
> the earliest xmin/xmax from the tables and use that to set the
> relfrozenxid (pretty complicated because it has to deal with the freeze
> horizon and wraparound).

That sounds like the correct way to fix the situation, although it's a
little more work to install another function just for this one-time
purpose. TransactionIdPrecedes() should already account for wraparound,
so I don't think that it will be too complicated (make sure to read
every tuple though, not just the ones currently visible).

Stepping back a second to make sure I understand the problem: the only
problem is that relfrozenxid on the toast table after an upgrade is
wrong. Correct?

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_upgrade bug found!

2011-04-07 Thread Jeff Davis

On Thu, 2011-04-07 at 12:16 -0400, Bruce Momjian wrote:
> Bruce Momjian wrote:
> > OK, thanks to RhodiumToad on IRC, I was able to determine the cause of
> > the two reported pg_upgrade problems he saw via IRC.  It seems toast
> > tables have xids and pg_dump is not preserving the toast relfrozenxids
> > as it should.  Heap tables have preserved relfrozenxids, but if you
> > update a heap row but don't change the toast value, and the old heap row
> > is later removed, the toast table can have an older relfrozenxids than
> > the heap table.
> > 
> > The fix for this is to have pg_dump preserve toast relfrozenxids, which
> > can be easily added and backpatched.  We might want to push a 9.0.4 for
> > this.  Second, we need to find a way for people to detect and fix
> > existing systems that have this problem, perhaps looming when the
> > pg_class relfrozenxid passes the toast relfrozenxid, and thirdly, we
> > need to figure out how to get this information to users.  Perhaps the
> > communication comes through the 9.0.4 release announcement.
> 
> I am not sure how to interpret the lack of replies to this email. 
> Either it is confidence, shock, or we told you so.  ;-)
> 
> Anyway, the attached patch fixes the problem.  The fix is for pg_dump's
> binary upgrade mode.  This would need to be backpatched back to 8.4
> because pg_migrator needs this too.
> 
> I have added a personal regression test to show which
> pg_class.relfrozenxid values are not preserved, and with this patch the
> only ones not preserved are toast tables used by system tables, which
> are not copied from the old cluster (FirstNormalObjectId = 16384).  I am
> attaching that old/new pg_class.relfrozenxid diff as well.
> 
> Any idea how to correct existing systems?  Would VACUUM FREEZE of just
> the toast tables work?

VACUUM FREEZE will never set the relfrozenxid backward. If it was never
preserved to begin with, I assume that the existing value could be
arbitrarily before or after, so it might not be updated.

I think that after you VACUUM FREEZE the toast table, then the real
oldest frozen xid (as opposed to the bad value in relfrozenxid for the
toast table) would have to be the same or newer than that of the heap.
Right? That means you could safely copy the heap's relfrozenxid to the
relfrozenxid of its toast table.

> I perhaps could create a short DO block that
> would vacuum freeze just toast tables;  it would have to be run in every
> database.

Well, that won't work, because VACUUM can't be executed in a transaction
block or function.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

< 2 3 4 5 6 7 8 9 10 11 >

601 - 700 of 1617 matches

Mail list logo