[HACKERS] Getting consistent snapshot in multiple backends, for parallel pg_dump

2009-11-07 Thread Heikki Linnakangas
Me & Simon got talking about the difficulty of doing parallel pg_dump,
where when you open multiple connections you must somehow ensure that
all the connections use the same snapshot so that you get a consistent
dump. We came up with a pretty simple way to do that:

1. Open N+1 connections to the server
2. In one of them, grab ProcArrayLock in shared mode
3. In all other connections, begin a (serializable) transaction.
4. Release ProcArrayLock.

Because we're holding the ProcArrayLock across steps 2-4, all the
connections that begin a transaction at step 3 will get the same
snapshot. That's exactly what we want for doing a parallel pg_dump.

A difficulty with that is that we need some way to hold an lwlock until
the client tells us to release it. You can't hold an lwlock over command
boundaries. But that's surely solvable, e.g. by sleeping in the backend
with the lock held until signaled by another backend, with a timeout to
make sure we don't block indefinitely if the client crashes or something.
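
As a rough illustration of steps 1-4 and the timeout idea, here is a toy
Python model. It is not PostgreSQL code: ToyServer, hold_lock and
parallel_snapshots are made-up names, and a plain mutex stands in for
ProcArrayLock, so the shared/exclusive distinction is only approximated
(commits take the mutex, snapshot readers do not). It shows that a commit
arriving while the lock is held cannot slip in between the workers'
snapshots.

```python
import threading


class ToyServer:
    """Toy model, not PostgreSQL code: commits need the lock, snapshots only read."""

    def __init__(self):
        self.proc_array_lock = threading.Lock()  # stand-in for ProcArrayLock
        self.committed = set()

    def commit(self, xid):
        # A commit changes the set of finished transactions, so it blocks
        # for as long as the coordinating connection holds the lock.
        with self.proc_array_lock:
            self.committed.add(xid)

    def snapshot(self):
        # A shared-mode holder does not block other shared-mode readers,
        # so the model lets snapshots read without taking the mutex.
        return frozenset(self.committed)


def hold_lock(server, held, release, timeout):
    """Steps 2 and 4: hold the lock until signaled or until the timeout expires."""
    with server.proc_array_lock:
        held.set()
        release.wait(timeout)


def parallel_snapshots(server, n_workers, concurrent_xid, timeout=5.0):
    held, release = threading.Event(), threading.Event()
    holder = threading.Thread(target=hold_lock, args=(server, held, release, timeout))
    holder.start()
    held.wait()  # step 2 has happened: the lock is held
    # A concurrent commit shows up now; it blocks until step 4.
    writer = threading.Thread(target=server.commit, args=(concurrent_xid,))
    writer.start()
    snapshots = [server.snapshot() for _ in range(n_workers)]  # step 3
    release.set()  # step 4: tell the holder to let go
    holder.join()
    writer.join()
    return snapshots


server = ToyServer()
server.commit(1)
snaps = parallel_snapshots(server, n_workers=3, concurrent_xid=2)
print(snaps, server.snapshot())
```

The timeout passed to release.wait() plays the role of the safety net
described above: if the coordinating client never signals, the holder
gives up the lock on its own.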

I'm not planning to do anything with this at the moment, but wanted to
get the idea out there and archived. It would be nice to see someone
implement parallel pg_dump similar to parallel pg_restore using this.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Getting consistent snapshot in multiple backends, for parallel pg_dump

2009-11-07 Thread Simon Riggs
On Sat, 2009-11-07 at 11:36 +0100, Heikki Linnakangas wrote:
> Me & Simon got talking about the difficulty of doing parallel pg_dump,
> where when you open multiple connections you must somehow ensure that
> all the connections use the same snapshot so that you get a consistent
> dump. We came up with a pretty simple way to do that:
>
> 1. Open N+1 connections to the server
> 2. In one of them, grab ProcArrayLock in shared mode
> 3. In all other connections, begin a (serializable) transaction.
> 4. Release ProcArrayLock.
>
> Because we're holding the ProcArrayLock across steps 2-4, all the
> connections that begin a transaction at step 3 will get the same
> snapshot. That's exactly what we want for doing a parallel pg_dump.
>
> A difficulty with that is that we need some way to hold an lwlock until
> the client tells us to release it. You can't hold an lwlock over command
> boundaries. But that's surely solvable, e.g. by sleeping in the backend
> with the lock held until signaled by another backend, with a timeout to
> make sure we don't block indefinitely if the client crashes or something.

How about this:

* In parent session, run 
SELECT synchronize_snapshots('master_name',N);
synchronize_snapshots grabs ProcArrayLock and sets a ref count to N,
then waits until ref count is 0 before releasing ProcArrayLock. No need
to wait across a command.

* In N child sessions, begin serializable xact then run
SELECT snapshot_taken('master_name');
which decrements the ref count.

We protect the ref count using a spinlock. The ref count is given a name,
so that we can tell apart concurrent requests for synchronize_snapshots().
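
The named ref count can be sketched as a toy Python model. This is only
an illustration of the counting logic, not a PostgreSQL API: the
SnapshotSync class and the run helper are hypothetical, a
threading.Condition stands in for the spinlock plus the wait, and taking
and releasing ProcArrayLock is reduced to a comment. The timeout
parameters and the child's wait for the name to be registered are
assumptions added so the sketch cannot hang.

```python
import threading


class SnapshotSync:
    """Toy model of the named ref count; hypothetical, not a PostgreSQL API."""

    def __init__(self):
        self._cond = threading.Condition()  # plays the role of the spinlock
        self._counts = {}                   # name -> children still expected

    def synchronize_snapshots(self, name, n, timeout=5.0):
        # Real code would take ProcArrayLock here and release it on return.
        with self._cond:
            self._counts[name] = n
            self._cond.notify_all()
            done = self._cond.wait_for(lambda: self._counts[name] == 0, timeout)
            del self._counts[name]
        return done

    def snapshot_taken(self, name, timeout=5.0):
        with self._cond:
            # Wait for the parent to register the name before decrementing.
            if not self._cond.wait_for(lambda: name in self._counts, timeout):
                return False
            self._counts[name] -= 1
            self._cond.notify_all()
        return True


def run(sync, name, n_children):
    """Start one parent and n_children children, and collect their return values."""
    result = {}
    parent = threading.Thread(
        target=lambda: result.update(parent=sync.synchronize_snapshots(name, n_children)))
    children = [
        threading.Thread(target=lambda i=i: result.update({i: sync.snapshot_taken(name)}))
        for i in range(n_children)
    ]
    for t in [parent, *children]:
        t.start()
    for t in [parent, *children]:
        t.join()
    return result


outcome = run(SnapshotSync(), "master_name", 3)
print(outcome)
```

Because the counts are keyed by name, two parents using different names
do not disturb each other's counts. The sketch assumes exactly N
children report per name.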

-- 
 Simon Riggs   www.2ndQuadrant.com


[HACKERS] Uninitialized Data in WAL records generated in heap_(insert|update|delete)

2009-11-07 Thread Andres Freund
Hi,

While checking some other code I used valgrind and noticed, as I had before,
that XLogInsert showed accesses to uninitialized data.
After some searching and playing around I found the source of that:
heap_insert uses a struct xl_heap_insert, which in turn has an xl_heaptid
member, which is padded.
COMP_CRC32 will read most of xl_heap_insert (excluding its trailing padding),
so it also reads the padding bytes of xl_heaptid and thus generates valgrind
warnings...
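
To make the padding effect concrete, here is a small Python/ctypes
sketch. The Target and Record structures are hypothetical stand-ins with
made-up field names, not the real xl_heaptid and xl_heap_insert
definitions, and zlib.crc32 is used in place of COMP_CRC32. The sketch
assumes the usual 4-byte alignment for 32-bit integers. It builds the
same record twice on top of differently pre-filled memory and checksums
everything up to the last field.

```python
import ctypes
import zlib


class Target(ctypes.Structure):
    # Hypothetical layout: 18 bytes of fields, rounded up to 20 for alignment.
    _fields_ = [
        ("spc", ctypes.c_uint32),
        ("db", ctypes.c_uint32),
        ("rel", ctypes.c_uint32),
        ("blk_hi", ctypes.c_uint16),
        ("blk_lo", ctypes.c_uint16),
        ("offset_no", ctypes.c_uint16),
    ]


class Record(ctypes.Structure):
    _fields_ = [("target", Target), ("flag", ctypes.c_bool)]


# Bytes the checksum covers: everything up to and including the last field,
# which leaves out the trailing padding of Record but not that of Target.
CRC_LEN = Record.flag.offset + ctypes.sizeof(ctypes.c_bool)


def record_crc(fill):
    """Build a record on memory pre-filled with `fill` and checksum it."""
    buf = bytearray([fill]) * ctypes.sizeof(Record)  # simulates stale memory
    rec = Record.from_buffer(buf)
    rec.target.spc, rec.target.db, rec.target.rel = 1663, 1, 16384
    rec.target.blk_hi, rec.target.blk_lo, rec.target.offset_no = 0, 7, 3
    rec.flag = True
    return zlib.crc32(bytes(buf[:CRC_LEN]))


padding = Record.flag.offset - 18  # padding bytes sitting before `flag`
crcs_differ = record_crc(0x00) != record_crc(0xAA)
print(padding, crcs_differ)
```

Every named field is identical in both records, so a differing checksum
can only come from the padding bytes. That is the read valgrind
complains about, since C leaves those bytes uninitialized unless the
struct is zeroed first.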

Questions:
* I don't actually see any real danger in that - correct?
* valgrind is quite useful for investigating some issues; would a patch
conditionally zeroing or annotating those structs have any chance?

Andres


Re: [HACKERS] operator exclusion constraints

2009-11-07 Thread Jeff Davis
On Fri, 2009-11-06 at 21:23 -0500, Tom Lane wrote:
> Or maybe forget about it and go to EXCLUDE or EXCLUDING?

I left it as EXCLUSION for now. EXCLUDING USING ... and EXCLUSIVE
USING ... both sound a little awkward to me. Either could be improved
by moving the USING clause around, but that just creates more grammar
headaches.

EXCLUDE probably flows most nicely with the optional USING clause or
without. My only complaint was that it's a transitive verb, so it seems
to impart more meaning than it actually can. I doubt anyone would
actually be more confused in practice, though. If a couple of people
agree, I'll change it to EXCLUDE.

Regards,
Jeff Davis


Re: [HACKERS] operator exclusion constraints

2009-11-07 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
> EXCLUDE probably flows most nicely with the optional USING clause or
> without. My only complaint was that it's a transitive verb, so it seems
> to impart more meaning than it actually can. I doubt anyone would
> actually be more confused in practice, though. If a couple of people
> agree, I'll change it to EXCLUDE.

EXCLUDE sounds good to me.

regards, tom lane


Re: [HACKERS] operator exclusion constraints

2009-11-07 Thread Robert Haas
On Sat, Nov 7, 2009 at 1:56 PM, Jeff Davis pg...@j-davis.com wrote:
> On Fri, 2009-11-06 at 21:23 -0500, Tom Lane wrote:
>> Or maybe forget about it and go to EXCLUDE or EXCLUDING?
>
> I left it as EXCLUSION for now. EXCLUDING USING ... and EXCLUSIVE
> USING ... both sound a little awkward to me. Either could be improved
> by moving the USING clause around, but that just creates more grammar
> headaches.
>
> EXCLUDE probably flows most nicely with the optional USING clause or
> without. My only complaint was that it's a transitive verb, so it seems
> to impart more meaning than it actually can. I doubt anyone would
> actually be more confused in practice, though. If a couple of people
> agree, I'll change it to EXCLUDE.

Personally, I think that this is all rather a matter of opinion, and
of course bikeshedding.  CHECK is a verb, which might suggest that
EXCLUDE is the best choice, and it has a nice declarative sound to it.
But the other example is FOREIGN KEY, which is not a verb at all,
which seems to me to more closely parallel EXCLUSION or perhaps
EXCLUDING.  I think I like EXCLUSIVE the least of the four, but at the
end of the day, I don't think we can really go far wrong.

I also don't think there's anything wrong with EXCLUDING USING, nor
anything more wrong with EXCLUSIVE USING than there is with EXCLUSIVE
alone.  Nor do I think there's any problem with EXCLUDE being
transitive because, of course, we're going to follow it with a
description of what we want to exclude, which may be thought of as its
direct object.  Once again, I don't think we can go far wrong.

Honestly, I'd probably be in favor of breaking the virtual tie in
favor of whichever word is already a keyword, rather than trying to
decide on (IMHO extremely tenuous) grammatical grounds.  But I can't
get worked up about that one way or the other either.

...Robert


Re: [HACKERS] operator exclusion constraints

2009-11-07 Thread David E. Wheeler

On Nov 7, 2009, at 11:08 AM, Tom Lane wrote:

>> EXCLUDE probably flows most nicely with the optional USING clause or
>> without. My only complaint was that it's a transitive verb, so it seems
>> to impart more meaning than it actually can. I doubt anyone would
>> actually be more confused in practice, though. If a couple of people
>> agree, I'll change it to EXCLUDE.
>
> EXCLUDE sounds good to me.

+1

David


Re: [HACKERS] operator exclusion constraints

2009-11-07 Thread Jeff Davis
On Sat, 2009-11-07 at 14:11 -0500, Robert Haas wrote:
> Honestly, I'd probably be in favor of breaking the virtual tie in
> favor of whichever word is already a keyword

The ones that are already keywords are EXCLUSIVE and EXCLUDING, which
are also the least desirable, so that rule doesn't work as a
tie-breaker.

I think that EXCLUSION and EXCLUDE are the options still in the running
here.

Regards,
Jeff Davis



Re: [HACKERS] Specific names for plpgsql variable-resolution control options?

2009-11-07 Thread Sergio A. Kessler
hi tom, sorry for the out-of-the-blue email (I'm not on the list)...

On Nov 6, 2009, at 12:21 PM, Tom Lane wrote:

> I believe we had consensus that plpgsql should offer the following three
> behaviors when a name in a SQL query could refer to either a plpgsql
> variable or a column from a table of the query:
>   * prefer the plpgsql variable (plpgsql's historical behavior)
>   * prefer the table column (Oracle-compatible)
>   * throw error for the ambiguity (to become the factory default)
> and that we wanted a way for users to select one of these behaviors at the
> per-function level, plus provide a SUSET GUC to determine the default
> behavior when there is not a specification in the function text.
>
> What we did not have was any concrete suggestions for the name or
> values of the GUC, nor for the exact per-function syntax beyond the
> thought that it could look something like the existing '#option dump'
> modifier.
>
> The code is now there and ready to go, so I need a decision on these
> user-visible names in order to proceed.  Anyone have ideas?

if this becomes configurable somehow,
how would I know that my code works as expected when I distribute my code?

one option is to put
foo_variable_conflict = error
throughout the code, which can be thousands of lines; that is not
nice just to be sure my code works as expected no matter what...
(setting a general GUC can interfere with other code, which presumes
different things)

and moreover, it is a burden for postgresql, which would have to keep
supporting 'foo_variable_conflict' for the foreseeable future...

IMO, postgres should stick with one option (+1 for error) and be done
with this, just one simple rule to rule them all...
and with this, there is no need to band-aid the code just in case...

regards,
/sergio


Re: [HACKERS] operator exclusion constraints

2009-11-07 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
> On Sat, 2009-11-07 at 14:11 -0500, Robert Haas wrote:
>> Honestly, I'd probably be in favor of breaking the virtual tie in
>> favor of whichever word is already a keyword
>
> The ones that are already keywords are EXCLUSIVE and EXCLUDING, which
> are also the least desirable, so that rule doesn't work as a
> tie-breaker.

I think it doesn't really matter now that we've succeeded in making the
keyword unreserved.

regards, tom lane


Re: [HACKERS] Specific names for plpgsql variable-resolution control options?

2009-11-07 Thread Tom Lane
Sergio A. Kessler sergiokess...@gmail.com writes:
> On Nov 6, 2009, at 12:21 PM, Tom Lane wrote:
>> I believe we had consensus that plpgsql should offer the following three
>> behaviors when a name in a SQL query could refer to either a plpgsql
>> variable or a column from a table of the query:
>> * prefer the plpgsql variable (plpgsql's historical behavior)
>> * prefer the table column (Oracle-compatible)
>> * throw error for the ambiguity (to become the factory default)
>> and that we wanted a way for users to select one of these behaviors at the
>> per-function level, plus provide a SUSET GUC to determine the default
>> behavior when there is not a specification in the function text.
>
> if this becomes configurable somehow,
> how would I know that my code works as expected when I distribute my code?

If you're sufficiently worried about that, you can put the
about-to-be-selected option syntax at the start of every function.
Bear in mind though that there are many many ways for unexpected
environmental settings to break functions (search_path being one
of the more obvious ones); I'm not sure this one is any worse than
the rest.  Especially not if you test under the default 'raise error
on conflict' setting.  I think the other two values will mainly be
useful for legacy code of one persuasion or the other.
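
For readers who want the three behaviors spelled out, here is a toy
Python resolver. It is only a model of the rule being discussed, not
plpgsql's implementation, and the mode names ("prefer_variable",
"prefer_column", "error") are placeholders invented for this sketch,
since the real GUC values had not been chosen at this point in the
thread.

```python
def resolve(name, variables, columns, mode="error"):
    """Return the value `name` refers to; the mode names are hypothetical."""
    in_var, in_col = name in variables, name in columns
    if in_var and in_col:
        if mode == "prefer_variable":   # plpgsql's historical behavior
            return variables[name]
        if mode == "prefer_column":     # Oracle-compatible behavior
            return columns[name]
        raise LookupError(f"reference to {name!r} is ambiguous")
    if in_var:
        return variables[name]
    if in_col:
        return columns[name]
    raise LookupError(f"{name!r} is not defined")


variables = {"id": 1, "total": 10}
columns = {"id": 2, "price": 5}

historical = resolve("id", variables, columns, mode="prefer_variable")
oracle_like = resolve("id", variables, columns, mode="prefer_column")
unambiguous = resolve("price", variables, columns)  # no conflict, any mode works
try:
    resolve("id", variables, columns)  # default mode raises on the conflict
    ambiguity_raised = False
except LookupError:
    ambiguity_raised = True
print(historical, oracle_like, unambiguous, ambiguity_raised)
```

Code that never triggers the error branch resolves the same way under
all three modes, which is the point of testing under the
raise-on-conflict setting.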

regards, tom lane
