[HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings "page xxx is uninitialized" --- fixing

2010-12-30 Thread Mark Kirkwood

Well, it is none of the things I considered.

The problem seems to be due to use of --delete in the base backup 
rsync (see diff attached).  In fact I can now reproduce the 
uninitialized pages using the bare bones method:


primary:
$ grep archive_command postgresql.conf
 archive_command = 'rsync %p standby:/var/lib/postgresql/archive'
$ pgbench -c 4 -t 20 bench
(wait for approx 1 transactions)

standby:
$ psql -h primary -c "SELECT pg_start_backup('backup');"
$ rsync --exclude pg_xlog/\* --exclude postmaster.pid --delete 
--exclude=backup_label \

primary:/var/lib/postgresql/8.3/main/* \
/var/lib/postgresql/8.3/main
$ psql -h primary -c "SELECT pg_stop_backup();"

$ grep restore_command recovery.conf
restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -t 
/tmp/trigger.5432 /var/lib/postgresql/archive %f %p %r'

$ /etc/init.d/postgresql-8.3 start
(wait for approx 14 transactions)
$ touch /tmp/trigger.5432

Removing the offending

--delete --exclude=backup_label

options from the base backup step makes everything work properly again.

I'd be interested to know if the other folks getting these warnings were 
using unusual rsync options either during backup or for archiving.


regards

Mark

On 30/12/10 13:32, Mark Kirkwood wrote:


I'm frankly puzzled about what Pitrtools is doing that is different - 
I only noticed it using rsync compression (-z) and doing rsync backups 
via pulling from the standby rather than pushing from the primary (I'm 
in the process of trying these variations out in the bare bones case). 
Just as I'm writing this I see Pitrtools rsyncs pg_xlog - I wonder if 
there are timing issues which mean that recovery might use some 
(corrupted) logs from there before the (clean) archived ones arrive 
(will check).




*** cmd_standby.orig	Tue Dec 28 21:10:31 2010
--- cmd_standby	Thu Dec 30 05:20:04 2010
***************
*** 175,181 ****
  
    if debug == "on":
       ssh_flags = "-vvv -o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
!      rsync_flags = "-avzl --delete --stats --exclude=backup_label"
       pg_standby_flags = "-s5 -w0 -d -c"
       if pgversion == '8.2':
         pg_standby_args = "%%f %%p -k%s" % (float(numarchives))
--- 175,181 ----
  
    if debug == "on":
       ssh_flags = "-vvv -o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
!      rsync_flags = "-a"
       pg_standby_flags = "-s5 -w0 -d -c"
       if pgversion == '8.2':
         pg_standby_args = "%%f %%p -k%s" % (float(numarchives))
***************
*** 184,190 ****
  
    else:
       ssh_flags = "-o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
!      rsync_flags = "-azl --delete --exclude=backup_label"
       pg_standby_flags = "-s5 -w0 -c"
       if pgversion == '8.2':
         pg_standby_args = "%%f %%p -k%s" % (float(numarchives))
--- 184,190 ----
  
    else:
       ssh_flags = "-o ConnectTimeout=%s -o StrictHostKeyChecking=no" % (str(ssh_timeout))
!      rsync_flags = "-azl"
       pg_standby_flags = "-s5 -w0 -c"
       if pgversion == '8.2':
         pg_standby_args = "%%f %%p -k%s" % (float(numarchives))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Magnus Hagander
On Wed, Dec 29, 2010 at 22:30, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Magnus Hagander mag...@hagander.net writes:
 Would people be interested in putting pg_streamrecv
 (http://github.com/mhagander/pg_streamrecv) in bin/ or contrib/ for
 9.1? I think it would make sense to do so.

 +1 for having that in core, only available for the roles WITH
 REPLICATION I suppose?

Yes.

Well, anybody who wants can run it, but they need those permissions on
the server to make it work. pg_streamrecv is entirely a client app.


 I think that the base backup feature is more important than simply streaming
 chunks of the WAL (SR already does this). Talking about the base backup over
 libpq, it is something we should implement to satisfy the people asking for
 an easy replication setup.

 Yes, definitely. But that also needs server side support.

 Yeah, but it's already in core for 9.1, we have pg_read_binary_file()
 there. We could propose a contrib module for previous version
 implementing the function in C, that should be pretty easy to code.

Oh. I didn't actually think about that one. So yeah, we could use that
- making it easy to code. However, I wonder how much less efficient it
would be than being able to stream the base backup. It's going to be a
*lot* more roundtrips across the network, and we're also going to
open/close the files a lot more.

Also, I haven't tested it, but a quick look at the code makes me
wonder how it will actually work with tablespaces - it seems to only
allow files under PGDATA? That could of course be changed..


  The only reason I didn't do that for pg_basebackup is that I wanted a
  self-contained python script, so that offering a public git repo is
  all I needed as far as distributing the tool goes.

Right, there's an advantage with that when it comes to being able to
work on old versions.


 Yeah, the WIP patch Heikki posted is similar, except it uses tar
 format and is implemented natively in the backend with no need for
 pl/pythonu to be installed.

 As of HEAD the dependency on pl/whatever is easily removed.

 The included C tool would need to have a parallel option from the get-go
 if at all possible, but if you look at the pg_basebackup prototype, it
 would be good to drop the wrong pg_xlog support in there and rely on a
 proper archiving setup on the master.

 Do you want to work on internal archive and restore commands over libpq
 in the same effort too?  I think this tool should be either a one time
 client or a daemon with support for:

Definitely a one-time client. If you want it to be a daemon, you write
a small wrapper that makes it one :)


  - running a base backup when receiving a signal
  - continuous WAL streaming from a master

Yes.

  - accepting standby connections and streaming to them

I see that as a separate tool, I think. But still a useful one, sure.

  - one-time libpq streaming of a WAL file, either way

Hmm. That might be interesting, yes.


 Maybe we don't need to daemonize the tool from the get-go, but if you're
 going parallel for the base-backup case you're almost there, aren't you?
 Also having internal commands for archive and restore commands that rely
 on this daemon running would be great too.

I don't want anything *relying* on this tool. I want to keep the
current way where you can choose whatever you prefer - I just want us
to ship a good default tool.


 I'd offer more help if it wasn't for finishing the extension patches,

:-) Yeah, focus on that, please - don't want to get it stalled.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Magnus Hagander
On Wed, Dec 29, 2010 at 20:19, Gurjeet Singh singh.gurj...@gmail.com wrote:
 On Wed, Dec 29, 2010 at 1:42 PM, Robert Haas robertmh...@gmail.com wrote:

 On Dec 29, 2010, at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Is it really stable enough for bin/?  My impression of the state of
  affairs is that there is nothing whatsoever about replication that
  is really stable yet.

 Well, that's not stopping us from shipping a core feature called
 replication.  I'll defer to others on how mature pg_streamrecv is, but if
 it's no worse than replication in general I think putting it in bin/ is the
 right thing to do.

 As the README says, it is not self-contained (through no fault of its own), and
 one should typically set archive_command to guarantee zero WAL loss.

Yes. Though you can combine it fine with wal_keep_segments if you
think that's safe - but archive_command is push and this tool is pull,
so if your backup server goes down for a while, pg_streamrecv will get
a gap and fail. Whereas if you configure an archive_command, it will
queue up the log on the master if it stops working, up to the point of
shutting it down because of out-of-disk. Which you *want*, if you want
to be really sure about the backups.


 <quote>
 TODO: Document some ways of setting up an archive_command that works well
 together with pg_streamrecv.
 </quote>

     I think implementing just that TODO might make it a candidate.

Well, yes, that's obviously a requirement.

     I have neither used it nor read the code, but if it works as advertised
 then it is definitely a +1 from me; no preference of bin/ or contrib/, since
 the community will have to maintain it anyway.

It's not that much code, but some more eyes on it would always be good!


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Magnus Hagander
On Wed, Dec 29, 2010 at 19:42, Robert Haas robertmh...@gmail.com wrote:
 On Dec 29, 2010, at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Is it really stable enough for bin/?  My impression of the state of
 affairs is that there is nothing whatsoever about replication that
 is really stable yet.

 Well, that's not stopping us from shipping a core feature called 
 replication.  I'll defer to others on how mature pg_streamrecv is, but if 
 it's no worse than replication in general I think putting it in bin/ is the 
 right thing to do.

It has had fewer eyes on it, which puts it worse off than general
replication. OTOH, it's a lot simpler code, which puts it better off.

Either way, as long as it gets those eyes before release if we put it
in, it shouldn't be worse off than general replication.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2010-12-30 Thread Marko Tiikkaja

On 2010-12-30 9:02 AM +0200, Greg Smith wrote:

Marko Tiikkaja wrote:

I have no idea why it worked in the past, but the patch was never
designed to work for UPSERT.  This has been discussed in the past and
some people thought that that's not a huge deal.


It takes an excessively large lock when doing UPSERT, which means its
performance under a heavy concurrent load can't be good.  The idea is
that if the syntax and general implementation issues can get sorted out,
fixing the locking can be a separate performance improvement to be
implemented later.  Using MERGE for UPSERT is the #1 use case for this
feature by a gigantic margin.  If that doesn't do what's expected, the
whole implementation doesn't provide the community anything really worth
talking about.  That's why I keep hammering on this particular area in
all my testing.


I'm confused.  Are you saying that the patch is supposed to lock the 
table against concurrent INSERT/UPDATE/DELETE/MERGE?  Because I don't 
see it in the patch, and the symptoms you're having are a clear 
indication of the fact that it's not happening.  I also seem to recall 
that people thought locking the table would be excessive.



Regards,
Marko Tiikkaja



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Magnus Hagander
On Wed, Dec 29, 2010 at 20:12, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Magnus Hagander's message of Wed Dec 29 11:40:34 -0300 2010:
 On Wed, Dec 29, 2010 at 15:05, Gurjeet Singh singh.gurj...@gmail.com wrote:

  Any specific reason NOREPLICATION_P and REPLICATION_P use the _P suffix?

 Um, I just copied it off a similar entry elsewhere. I saw no comment
 about what _P actually means, and I can't say I know. I know very
 little about the bison files :-)

 Some lexer keywords have a _P suffix because otherwise they'd collide
 with some symbol in Windows header files or something like that.  It's
 old stuff, but I think you, Magnus, were around at that time.

Heh. That doesn't mean I *remember* it :-)

But yes, I see in commit 12c942383296bd626131241c012c2ab81b081738 the
comment convert some keywords.c symbols to KEYWORD_P to prevent
conflict.

Based on that, I should probably change it back, right? I just tried a
patch for it and it compiles and checks just fine with the _P parts
removed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Aidan Van Dyk
On Thu, Dec 30, 2010 at 6:41 AM, Magnus Hagander mag...@hagander.net wrote:

 As the README says, it is not self-contained (through no fault of its own), and
 one should typically set archive_command to guarantee zero WAL loss.

 Yes. Though you can combine it fine with wal_keep_segments if you
 think that's safe - but archive_command is push and this tool is pull,
 so if your backup server goes down for a while, pg_streamrecv will get
 a gap and fail. Whereas if you configure an archive_command, it will
 queue up the log on the master if it stops working, up to the point of
 shutting it down because of out-of-disk. Which you *want*, if you want
 to be really sure about the backups.

I was thinking I'd like to use pg_streamrecv to make my archive, and
the archive script on the master would just verify the archive has
that complete segment.

This gets you an archive synced as it's made (as long as streamrecv
is running), and my verifyarchive command would make sure that if
for some reason the backup archive went down, the wal segments
would be blocked on the master until it's up again and current.

a.



-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



[HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Joachim Wieland
The snapshot synchronization discussion from the parallel pg_dump
patch somehow ended without a clear way to go forward.

Let me sum up what has been brought up and propose a short- and
longterm solution.

Summary:

Passing snapshot sync information can be done either:

a) by returning complete snapshot information from the backend to the
client so that the client can pass it along to a different backend
b) or by returning only a unique identifier to the client and storing
the actual snapshot data somewhere on the server side

Advantage of a: no memory is used in the backend and no memory needs
to get cleaned up, it is also theoretically possible that we could
forward that data to a hot standby server and do e.g. a dump partially
on the master server and partially on the hot standby server or among
several hot standby servers.
Disadvantage of a: The snapshot must be validated to make sure that
its information is still current, it might be difficult to cover all
cases of this validation. A client can not only access exactly a
published snapshot, but just about any snapshot that fits and passes
the validation checks (this is more a disadvantage than an advantage
because it allows to see a database state that never existed in
reality).

Advantage of b: No validation necessary, as soon as the transaction
that publishes the snapshot loses that snapshot, it will also revoke
the snapshot information (either by removing a temp file or deleting
it from shared memory)
Disadvantage of b: It doesn't allow a snapshot to be installed on a
different server. It requires a serializable open transaction to hold
the snapshot.

What I am proposing now is the following:

We return snapshot information as a chunk of data to the client. At
the same time however, we set a checksum in shared memory to protect
against modification of the snapshot. A publishing backend can revoke
its snapshot by deleting the checksum and a backend that is asked to
install a snapshot can verify that the snapshot is correct and current
by calculating the checksum and comparing it with the one in shared
memory.

This only costs us a few bytes for the checksum times max_connections in
shared memory, and apart from resetting the checksum it does not have
cleanup or verification issues. Note that we are also free to change
the internal format of the chunk of data we return whenever we like,
so we are free to enhance this feature in the future, transparently to
the client.
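A minimal Python sketch of the scheme above (purely illustrative: a plain dict
stands in for the per-backend checksum slots in shared memory, SHA-256 is an
arbitrary checksum choice, and the snapshot is treated as an opaque byte string):

```python
import hashlib

# Stand-in for the shared-memory checksum slots (one per backend,
# i.e. max_connections entries in the real proposal).
shared_checksums = {}

def publish_snapshot(backend_id, snapshot_bytes):
    """Record the snapshot's checksum and hand the chunk to the client."""
    shared_checksums[backend_id] = hashlib.sha256(snapshot_bytes).hexdigest()
    return snapshot_bytes  # opaque chunk returned to the client

def revoke_snapshot(backend_id):
    """Publishing backend loses the snapshot: delete the checksum."""
    shared_checksums.pop(backend_id, None)

def install_snapshot(snapshot_bytes):
    """A backend asked to adopt a snapshot recomputes the checksum and
    accepts it only if some publishing backend still advertises it."""
    digest = hashlib.sha256(snapshot_bytes).hexdigest()
    return digest in shared_checksums.values()
```

Both a modified snapshot chunk and a revoked snapshot then fail the
install check, without any semantic validation of the snapshot contents.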


Thoughts?


Joachim



Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Magnus Hagander
On Thu, Dec 30, 2010 at 13:30, Aidan Van Dyk ai...@highrise.ca wrote:
 On Thu, Dec 30, 2010 at 6:41 AM, Magnus Hagander mag...@hagander.net wrote:

 As the README says, it is not self-contained (through no fault of its own), and
 one should typically set archive_command to guarantee zero WAL loss.

 Yes. Though you can combine it fine with wal_keep_segments if you
 think that's safe - but archive_command is push and this tool is pull,
 so if your backup server goes down for a while, pg_streamrecv will get
 a gap and fail. Whereas if you configure an archive_command, it will
 queue up the log on the master if it stops working, up to the point of
 shutting it down because of out-of-disk. Which you *want*, if you want
 to be really sure about the backups.

 I was thinking I'd like to use pg_streamrecv to make my archive, and
 the archive script on the master would just verify the archive has
 that complete segment.

 This gets you an archive synced as it's made (as long as streamrecv
 is running), and my verifyarchive command would make sure that if
 for some reason the backup archive went down, the wal segments
 would be blocked on the master until it's up again and current.

That's exactly the method I was envisioning, and in fact what I am
using in a couple of cases - just haven't documented it properly :)

Since pg_streamrecv only moves a segment into the correct archive
location once it's complete, the archive_command only needs to check
whether the file *exists* - if it does, it has been transferred; if not,
the command returns an error so the WAL segments don't get cleaned out.
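A sketch of that existence check, relying on pg_streamrecv's behaviour of
only renaming completed segments into the archive directory (the function
name and archive path are illustrative, not part of any shipped tool):

```python
import os

def segment_archived(archive_dir, segment_name):
    """archive_command check: treat the segment as safely archived only
    if pg_streamrecv has already moved the completed file into place.
    Returning False maps to a non-zero exit status, which makes the
    master keep the segment until the archive catches up."""
    return os.path.exists(os.path.join(archive_dir, segment_name))
```

Wrapped in a tiny script, this could be wired up as something like
archive_command = 'check_archived.py %f' (script name hypothetical),
where a non-zero exit keeps the segment queued on the master.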

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings "page xxx is uninitialized" --- fixing

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 3:55 AM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
 Well, it is none of the things I considered.

 The problem seems to be due to use of --delete in the base backup rsync
 (see diff attached).  In fact I can now reproduce the uninitialized pages
 using the bare bones method:

Any time a relation is extended, we end up with a page of all zeros at
the end until the updated page is written out, which often doesn't
happen until the next checkpoint.  So it doesn't seem too mysterious
that you could end up with all zeroes pages on the standby initially,
but WAL replay ought to fix that.  I suppose the reason it isn't is
because you've excluded the backup label, so recovery will begin from
the wrong place.  Unless I'm missing something, that seems like a
really bad idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Old git repo

2010-12-30 Thread Magnus Hagander
Hi!

Are we ready to drop the old git mirror? The one that's still around
(as postgresql-old.git) from before we migrated the main repository to
git, and thus has the old hashes around.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Alvaro Herrera
Excerpts from Magnus Hagander's message of Thu Dec 30 08:57:09 -0300 2010:
 On Wed, Dec 29, 2010 at 20:12, Alvaro Herrera
 alvhe...@commandprompt.com wrote:

  Some lexer keywords have a _P suffix because otherwise they'd collide
  with some symbol in Windows header files or something like that.  It's
  old stuff, but I think you, Magnus, were around at that time.
 
 Heh. That doesn't mean I *remember* it :-)

:-)

 But yes, I see in commit 12c942383296bd626131241c012c2ab81b081738 the
 comment "convert some keywords.c symbols to KEYWORD_P to prevent
 conflict".

Wow, what a mess of a patch ... nowadays this would be like 10 commits
(or so I hope) ... hey, did Bruce sabotage the qnx4 port surreptitiously?

 Based on that, I should probably change it back, right? I just tried a
 patch for it and it compiles and checks just fine with the _P parts
 removed.

Hmm, I wouldn't bother really.  It's not that important anyway, IMHO.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Function for dealing with xlog data

2010-12-30 Thread Magnus Hagander
On Tue, Dec 28, 2010 at 16:30, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Magnus Hagander's message of Tue Dec 28 10:46:31 -0300 2010:
 Well, yeah, that was obvious ;) The question is, how much do we prefer
 the more elegant method? ;)

 If we go the new type route, do we need it to have an implicit cast to
 text, for backwards compatibility?

 I'd argue not.  Probably all existing uses are just selecting the
 function value.  What comes back to the client will just be the text
 form anyway.

That's certainly the only thing I've seen.


 I'm of the opinion that a new type isn't worth the work, myself,
 but it would mostly be up to whoever was doing the work.

Fair enough - at least enough people have said it won't be rejected
because it's done as a function rather than a datatype - so that seems
like the easiest way to proceed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Old git repo

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 8:31 AM, Magnus Hagander mag...@hagander.net wrote:
 Are we ready to drop the old git mirror? The one that's still around
 (as postgresql-old.git) from before we migrated the main repository to
 git, and thus has the old hashes around.

I see no reason to drop that ever, or at least not any time soon.
What is it costing us?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Old git repo

2010-12-30 Thread Magnus Hagander
On Thu, Dec 30, 2010 at 15:28, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Dec 30, 2010 at 8:31 AM, Magnus Hagander mag...@hagander.net wrote:
 Are we ready to drop the old git mirror? The one that's still around
 (as postgresql-old.git) from before we migrated the main repository to
 git, and thus has the old hashes around.

 I see no reason to drop that ever, or at least not any time soon.
 What is it costing us?

Some disk space, so almost nothing. And the potential that people grab
it by mistake - it adds a bit to confusion.

Looking at it from the other side, what's the use-case for keeping it?
If you want to diff against it or something like that, you can just
do that against your local clone (that you already had - if you
didn't, you shouldn't be using it at all)...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2010-12-30 Thread Andrew Dunstan



On 12/30/2010 02:02 AM, Greg Smith wrote:

Marko Tiikkaja wrote:
I have no idea why it worked in the past, but the patch was never 
designed to work for UPSERT.  This has been discussed in the past and 
some people thought that that's not a huge deal.


It takes an excessively large lock when doing UPSERT, which means its 
performance under a heavy concurrent load can't be good.  The idea is 
that if the syntax and general implementation issues can get sorted 
out, fixing the locking can be a separate performance improvement to 
be implemented later.  Using MERGE for UPSERT is the #1 use case for 
this feature by a gigantic margin.  If that doesn't do what's 
expected, the whole implementation doesn't provide the community 
anything really worth talking about.  That's why I keep hammering on 
this particular area in all my testing.


One of the reflexive "I can't switch to PostgreSQL easily" stopping 
points for MySQL users is "I can't convert my ON DUPLICATE KEY UPDATE 
code."  Every other use for MERGE is a helpful side-effect of adding 
the implementation in my mind, but not the primary driver of why this 
is important.  My hints in this direction before didn't get adopted, 
so I'm saying it outright now:  this patch must have an UPSERT 
implementation in its regression tests.  And the first thing I'm going 
to do every time a new rev comes in is try and break it with the 
pgbench test I attached.  If Boxuan can start doing that as part of 
his own testing, I think development here might start moving forward 
faster.  I don't care so much about the rate at which concurrent 
UPSERT-style MERGE happens, so long as it doesn't crash.  But that's 
where this has been stuck at for a while now.


I strongly agree. It *is* a huge deal.

cheers

andrew



Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Alvaro Herrera
Excerpts from Joachim Wieland's message of Thu Dec 30 09:31:47 -0300 2010:

 Advantage of b: No validation necessary, as soon as the transaction
 that publishes the snapshot loses that snapshot, it will also revoke
 the snapshot information (either by removing a temp file or deleting
 it from shared memory)
 Disadvantage of b: It doesn't allow a snapshot to be installed on a
 different server. It requires a serializable open transaction to hold
 the snapshot.

Why does it require a serializable transaction?  You could simply
register the snapshot in any transaction.  (Of course, the net effect
would be pretty similar to a serializable transaction).

 We return snapshot information as a chunk of data to the client. At
 the same time however, we set a checksum in shared memory to protect
 against modification of the snapshot. A publishing backend can revoke
 its snapshot by deleting the checksum and a backend that is asked to
 install a snapshot can verify that the snapshot is correct and current
 by calculating the checksum and comparing it with the one in shared
 memory.
 
 This only costs us a few bytes for the checksum * max_connection in
 shared memory and apart from resetting the checksum it does not have
 cleanup and verification issues.

So one registered snapshot per transaction?  Sounds a reasonable
limitation (I doubt there's a use case for more than that, anyway).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] estimating # of distinct values

2010-12-30 Thread Florian Pflug
On Dec27, 2010, at 23:49 , Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:
 
 With respect to (b), I think I'd need to see a much more detailed
 design for how you intend to make this work.  Off the top of my
 head there seems to be some pretty serious feasibility problems.
 
 I had one random thought on that -- it seemed like a large concern
 was that there would need to be at least an occasional scan of the
 entire table to rebuild the distinct value information.

I believe we could actually avoid that.

First, the paper "An Optimal Algorithm for the Distinct Elements Problem"
actually contains an algorithm which *does* handle deletes - it's called
the L_0 estimate there.

Second, as Tomas pointed out, the stream-based estimator is essentially a
simplified version of a bloom filter. It starts out with a field of
N zero bits, and sets K of them to 1 for each value v in the stream.
Which bits are set to 1 depends on some hash function(s) H_i(v). It's
then easy to compute how many 1-bits you'd expect to find in the bit
field after seeing D distinct values, and by reversing that you can
estimate D from the number of 1-bits in the bit field.

To avoid having to rescan large tables, instead of storing one such
bit field, we'd store one per B pages of data. We'd then only need
to scan a range of B pages around every updated or deleted tuple,
and could afterwards compute a new global estimate of D by combining
the individual bit fields with bitwise OR.

Since the need to regularly VACUUM tables hit by updates or deletes
won't go away any time soon, we could piggy-back the bit field
rebuilding onto VACUUM to avoid a second scan.

A good value for B would probably be around
1000 * (size of bit field / page size). If the bit field needs ~100kB,
that'd make B ~= 12000 pages ~= 100MB.
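A toy Python sketch of the per-range bit-field idea, with K=1 hash function
(i.e. linear counting); combining two ranges' fields with bitwise OR yields
the field for their union. All parameters here (m=4096, SHA-1 as the hash)
are illustrative, not a proposed on-disk format:

```python
import hashlib
import math

def bitfield(values, m=4096):
    """One bit field per range of pages: set one bit per value
    (K=1 hash function for simplicity); duplicates hit the same bit."""
    bits = [0] * m
    for v in values:
        h = int(hashlib.sha1(repr(v).encode()).hexdigest(), 16) % m
        bits[h] = 1
    return bits

def estimate_distinct(bits):
    """Invert the expected fill: after D distinct values the zero-bit
    fraction tends to exp(-D/m), so D is approximately -m * ln(zeros/m)."""
    m = len(bits)
    zeros = bits.count(0)
    return float("inf") if zeros == 0 else -m * math.log(zeros / m)

def combine(a, b):
    """The bit field for the union of two page ranges is the bitwise OR
    of their individual bit fields."""
    return [x | y for x, y in zip(a, b)]
```

With m=4096 bits, the estimate for ~1000 distinct values is typically
within a couple of percent, which matches the error bounds quoted for
linear counting at this load factor.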

best regards,
Florian Pflug




Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Florian Pflug
On Dec30, 2010, at 13:31 , Joachim Wieland wrote:
 We return snapshot information as a chunk of data to the client. At
 the same time however, we set a checksum in shared memory to protect
 against modification of the snapshot. A publishing backend can revoke
 its snapshot by deleting the checksum and a backend that is asked to
 install a snapshot can verify that the snapshot is correct and current
 by calculating the checksum and comparing it with the one in shared
 memory.

We'd still have to stream these checksums to the standbys though,
or would they be exempt from the checksum checks?

I still wonder whether these checks are worth the complexity. I
believe we'd only allow snapshot modifications for read-only queries
anyway, so what point is there in preventing clients from setting
broken snapshots?
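The publish/revoke/install flow quoted above can be modelled in miniature (a hypothetical sketch: the dict standing in for shared memory, the keying by backend id, and the checksum choice are all assumptions, not the proposed implementation):

```python
import hashlib
import hmac

shared_memory = {}   # stands in for the shared-memory slot: backend id -> checksum

def publish_snapshot(backend_id, snapshot_bytes):
    """The publishing backend hands the snapshot to its client and records
    a checksum of it in shared memory."""
    shared_memory[backend_id] = hashlib.sha256(snapshot_bytes).hexdigest()
    return snapshot_bytes

def revoke_snapshot(backend_id):
    """Revoking simply deletes the checksum, invalidating the snapshot."""
    shared_memory.pop(backend_id, None)

def install_snapshot(backend_id, snapshot_bytes):
    """A backend asked to install a snapshot recomputes the checksum and
    compares it with the one in shared memory."""
    expected = shared_memory.get(backend_id)
    actual = hashlib.sha256(snapshot_bytes).hexdigest()
    if expected is None or not hmac.compare_digest(expected, actual):
        raise ValueError("snapshot is stale, revoked, or was modified")

snap = publish_snapshot(42, b"xmin=100 xmax=200 xip=[150,160]")
install_snapshot(42, snap)                 # succeeds: checksum matches
revoke_snapshot(42)
try:
    install_snapshot(42, snap)             # fails: checksum was deleted
except ValueError as err:
    print(err)
```

The open question above is exactly whether this verification step buys anything once snapshot installation is restricted to read-only use.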

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Old git repo

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 9:30 AM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Dec 30, 2010 at 15:28, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Dec 30, 2010 at 8:31 AM, Magnus Hagander mag...@hagander.net wrote:
 Are we ready to drop the old git mirror? The one that's still around
 (as postgresql-old.git) from before we migrated the main repository to
 git, and thus has the old hashes around.

 I see no reason to drop that ever, or at least not any time soon.
 What is it costing us?

 Some disk space, so almost nothing. And the potential that people grab
 it by mistake - it adds a bit to confusion.

Well, if it's clearly labeled "old" I don't think it should confuse
anyone much.  You could even tack one more commit on there adding a
README file with a big ol' warning.

 Looking at it from the other side, what's the use-case for keeping it?
 If you want to diff against it or something like that, you can just
 do that against your local clone (that you already had - if you
 didn't, you shouldn't be using it at all)...

I realize it's not as official as the CVS repository was, but I
still think we ought to hold onto it for a year or two.  Maybe no one
will ever look at it again, but I'm not prepared to bet on that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Peter Eisentraut
On Thu, 2010-12-23 at 17:29 -0500, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  On 12/23/10 2:21 PM, Tom Lane wrote:
  Well, that's one laudable goal here, but secure by default is another
  one that ought to be taken into consideration.
 
  I don't see how *not* granting the superuser replication permissions
  makes things more secure.  The superuser can grant replication
  permissions to itself, so why is suspending them by default beneficial?
   I'm not following your logic here.
 
 Well, the reverse of that is just as true: if we ship it without
 replication permissions on the postgres user, people can change that if
 they'd rather not create a separate role for replication.  But I think
 we should encourage people to NOT do it that way.  Setting it up that
 way by default hardly encourages use of a more secure arrangement.

I think this argument is a bit inconsistent in the extreme.  You might
as well argue that a superuser shouldn't have any permissions by
default, to discourage users from using it.  They can always grant
permissions back to it.  I don't see why this particular one is so
different.

If we go down this road, we'll end up with a mess of permissions that a
superuser has and doesn't have.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Peter Eisentraut
On Wed, 2010-12-29 at 11:09 +0100, Magnus Hagander wrote:
 I've applied this version (with some minor typo-fixes).

This page is now somewhat invalidated:

http://developer.postgresql.org/pgdocs/postgres/role-attributes.html

First, it doesn't mention the replication privilege, and second it
continues to claim that superuser status bypasses all permission checks.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 9:54 AM, Peter Eisentraut pete...@gmx.net wrote:
 On Thu, 2010-12-23 at 17:29 -0500, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
  On 12/23/10 2:21 PM, Tom Lane wrote:
  Well, that's one laudable goal here, but secure by default is another
  one that ought to be taken into consideration.

  I don't see how *not* granting the superuser replication permissions
  makes things more secure.  The superuser can grant replication
  permissions to itself, so why is suspending them by default beneficial?
   I'm not following your logic here.

 Well, the reverse of that is just as true: if we ship it without
 replication permissions on the postgres user, people can change that if
 they'd rather not create a separate role for replication.  But I think
 we should encourage people to NOT do it that way.  Setting it up that
 way by default hardly encourages use of a more secure arrangement.

 I think this argument is a bit inconsistent in the extreme.  You might
 as well argue that a superuser shouldn't have any permissions by
 default, to discourage users from using it.  They can always grant
 permissions back to it.  I don't see why this particular one is so
 different.

 If we go down this road, we'll end up with a mess of permissions that a
 superuser has and doesn't have.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SLRU API tweak

2010-12-30 Thread Alvaro Herrera
Excerpts from Kevin Grittner's message of Wed Dec 29 20:46:55 -0300 2010:
 Attached is a small patch to avoid putting an opaque structure into
 the slru.h file and using it in an external function call where
 external callers must always specify NULL.

Thanks, committed.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Tom Lane
I had an epiphany about this topic, or actually two of them.

1. Whether or not you think there's a significant performance reason
to support hash right joins, there's a functionality reason.  The
infrastructure for right join could just as easily do full joins.
And AFAICS, a hash full join would only require one hashable join
clause --- the other FULL JOIN ON conditions could be anything at all.
This is unlike the situation for merge join, where all the JOIN ON
conditions have to be mergeable or it doesn't work right.  So we could
greatly reduce the scope of the dreaded "FULL JOIN is only supported
with merge-joinable join conditions" error.  (Well, okay, it's not
*that* dreaded, but people complain about it occasionally.)

2. The obvious way to implement this would involve adding an extra bool
field to struct HashJoinTupleData.  The difficulty with that, and the
reason I'd been resistant to the whole idea, is that it'd eat up a full
word per hashtable entry because of alignment considerations.  (On
64-bit machines it'd be free because of alignment considerations, but
that's cold comfort when 32-bit machines are the ones pressed for
address space.)  But we only need one bit, so what about commandeering
an infomask bit in the tuple itself?  For the initial implementation
I'd be inclined to take one of the free bits in t_infomask2.  We could
actually get away with overlaying the flag bit with one of the tuple
visibility bits, since it will only be used in tuples that are in the
in-memory hash table, which don't need visibility info anymore.  But
that seems like a kluge that could wait until we really need the flag
space.
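Both points can be illustrated with a toy hash full join that tracks a per-entry matched flag, the analogue of the proposed infomask bit (everything here is an illustrative model, not the executor code):

```python
from collections import defaultdict

def hash_full_join(inner_rows, outer_rows, key):
    """FULL JOIN via a hash table built on the inner side.  Each entry carries
    a matched flag so a final pass can emit inner rows that never found a
    partner; only the key equality needs to be hashable."""
    table = defaultdict(list)
    for row in inner_rows:
        table[row[key]].append([row, False])        # [tuple, matched-bit]

    results = []
    for row in outer_rows:
        entries = table.get(row[key])
        if entries:
            for entry in entries:
                results.append((row, entry[0]))
                entry[1] = True                     # set the matched bit
        else:
            results.append((row, None))             # outer row, no partner

    for entries in table.values():                  # second pass over the table
        for row, matched in entries:
            if not matched:
                results.append((None, row))         # inner row, no partner
    return results

inner = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
outer = [{"id": 2, "w": "x"}, {"id": 3, "w": "y"}]
for pair in hash_full_join(inner, outer, "id"):
    print(pair)
```

The space question in point 2 is where that per-entry flag lives: a separate bool per hash entry costs a padded word, whereas a spare tuple infomask bit costs nothing.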

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Wed, Dec 29, 2010 at 5:14 PM, David Fetter da...@fetter.org wrote:
 On Wed, Dec 29, 2010 at 04:53:47PM -0500, Robert Haas wrote:
 On Wed, Dec 29, 2010 at 4:09 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  On 29.12.2010 06:54, Robert Haas wrote:
 
   With the patch:
 
  rhaas=# cluster v;
  ERROR:  views do not support CLUSTER
 
  "do not support" sounds like a missing feature, rather than a nonsensical
  command. How about something like "CLUSTER cannot be used on views"?

 In the latest version of this patch, I created four translatable
 strings per object type:

 "blats do not support %s" (where %s is an SQL command)
 "blats do not support constraints"
 "blats do not support rules"
 "blats do not support triggers"

 It's reasonable enough to write "CLUSTER cannot be used on views", but
 "constraints cannot be used on views" seems more awkward to me.
 Or do we think that's OK?

 That particular one looks good insofar as it describes reality.  With
 predicate locks, etc., it may become untrue later, though :)

After further thought, I think it makes sense to change this around a
bit and create a family of functions that can be invoked like this:

void check_relation_for_FEATURE_support(Relation rel);

...where FEATURE is constraint, trigger, rule, index, etc.  The
function will be defined to throw an error if the relation isn't of a
type that can support the named feature.  The error message will be of
the form:

constraints can only be used on tables
triggers can be used only on tables and views
etc.

This avoids the need to define a separate error message for each
unsupported relkind, and I think it's just as informative as, e.g.,
"constraints cannot be used on" whatever object type you tried to
invoke it on.  We can adopt the same language for commands, e.g.:
"CLUSTER can only be used on tables."
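A hypothetical model of such a function family (the function name, the feature table, and the message wording here are illustrative, not the actual patch; relkind letters follow the pg_class convention):

```python
RELKIND_NAMES = {"r": "table", "v": "view", "S": "sequence", "i": "index"}

# which relkinds each feature supports (an assumed mapping for illustration)
SUPPORTED = {
    "constraints": {"r"},
    "triggers": {"r", "v"},
    "rules": {"r", "v"},
}

def check_relation_for_feature_support(feature, relkind):
    """Raise with a single per-feature message when the relkind doesn't
    support the feature, instead of one message per unsupported relkind."""
    allowed = SUPPORTED[feature]
    if relkind not in allowed:
        names = " and ".join(RELKIND_NAMES[k] + "s" for k in sorted(allowed))
        raise ValueError(f"{feature} can only be used on {names}")

check_relation_for_feature_support("triggers", "r")      # OK: tables have triggers
try:
    check_relation_for_feature_support("constraints", "v")
except ValueError as e:
    print(e)   # constraints can only be used on tables
```

The translation cost is then one string per feature rather than one per feature-relkind pair.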

Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I had an epiphany about this topic, or actually two of them.

 1. Whether or not you think there's a significant performance reason
 to support hash right joins, there's a functionality reason.  The
 infrastructure for right join could just as easily do full joins.
 And AFAICS, a hash full join would only require one hashable join
 clause --- the other FULL JOIN ON conditions could be anything at all.
 This is unlike the situation for merge join, where all the JOIN ON
 conditions have to be mergeable or it doesn't work right.  So we could
 greatly reduce the scope of the dreaded "FULL JOIN is only supported
 with merge-joinable join conditions" error.  (Well, okay, it's not
 *that* dreaded, but people complain about it occasionally.)

Yeah, that would be neat.  It might be a lot faster in some cases, too.

 2. The obvious way to implement this would involve adding an extra bool
 field to struct HashJoinTupleData.  The difficulty with that, and the
 reason I'd been resistant to the whole idea, is that it'd eat up a full
 word per hashtable entry because of alignment considerations.  (On
 64-bit machines it'd be free because of alignment considerations, but
 that's cold comfort when 32-bit machines are the ones pressed for
 address space.)  But we only need one bit, so what about commandeering
 an infomask bit in the tuple itself?  For the initial implementation
 I'd be inclined to take one of the free bits in t_infomask2.  We could
 actually get away with overlaying the flag bit with one of the tuple
 visibility bits, since it will only be used in tuples that are in the
 in-memory hash table, which don't need visibility info anymore.  But
 that seems like a kluge that could wait until we really need the flag
 space.

I think that's a reasonable approach, although I might be inclined to
do the overlay sooner rather than later if it doesn't add too much
complexity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming replication as a separate permissions

2010-12-30 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 But yes, I see in commit 12c942383296bd626131241c012c2ab81b081738 the
 comment convert some keywords.c symbols to KEYWORD_P to prevent
 conflict.

 Based on that, I should probably change it back, right? I just tried a
 patch for it and it compiles and checks just fine with the _P parts
 removed.

I'd leave it be, it's fine as-is.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Thu Dec 30 12:47:42 -0300 2010:

 After further thought, I think it makes sense to change this around a
 bit and create a family of functions that can be invoked like this:
 
 void check_relation_for_FEATURE_support(Relation rel);
 
 ...where FEATURE is constraint, trigger, rule, index, etc.  The
 function will be defined to throw an error if the relation isn't of a
 type that can support the named feature.  The error message will be of
 the form:
 
 constraints can only be used on tables
 triggers can be used only on tables and views
 etc.

So this will create a combinatorial explosion of strings to translate?
I liked the other idea because the number of translatable strings was
kept within reasonable bounds.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Old git repo

2010-12-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Dec 30, 2010 at 9:30 AM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Dec 30, 2010 at 15:28, Robert Haas robertmh...@gmail.com wrote:
 I see no reason to drop that ever, or at least not any time soon.
 What is it costing us?

 Some disk space, so almost nothing. And the potential that people grab
 it by mistake - it adds a bit to confusion.

 I realize it's not as official as the CVS repository was, but I
 still think we ought to hold onto it for a year or two.  Maybe no one
 will ever look at it again, but I'm not prepared to bet on that.

I'm with Magnus on this: the risk of confusion seems to greatly
outweigh any possible benefit from keeping it.  There is no reason for
anyone to use that old repo unless they are still working with a local
clone of it, and even if they do have a local clone, such a clone is
self-sufficient.  And more to the point, it seems quite unlikely that
anyone is still working with such a clone rather than having rebased
by now.

We should wait a week or so to see if anyone does pipe up and say they
still use that repo; but in the absence of such feedback, it should go.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 11:00 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of Thu Dec 30 12:47:42 -0300 2010:

 After further thought, I think it makes sense to change this around a
 bit and create a family of functions that can be invoked like this:

 void check_relation_for_FEATURE_support(Relation rel);

 ...where FEATURE is constraint, trigger, rule, index, etc.  The
 function will be defined to throw an error if the relation isn't of a
 type that can support the named feature.  The error message will be of
 the form:

 constraints can only be used on tables
 triggers can be used only on tables and views
 etc.

 So this will create a combinatorial explosion of strings to translate?
 I liked the other idea because the number of translatable strings was
 kept within reasonable bounds.

No, quite the opposite.  With the other approach, you needed:

constraints cannot be used on views
constraints cannot be used on composite types
constraints cannot be used on TOAST tables
constraints cannot be used on indexes
constraints cannot be used on foreign tables

With this, you just need:

constraints can only be used on tables

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Old git repo

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 11:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Dec 30, 2010 at 9:30 AM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Dec 30, 2010 at 15:28, Robert Haas robertmh...@gmail.com wrote:
 I see no reason to drop that ever, or at least not any time soon.
 What is it costing us?

 Some disk space, so almost nothing. And the potential that people grab
 it by mistake - it adds a bit to confusion.

 I realize it's not as official as the CVS repository was, but I
 still think we ought to hold onto it for a year or two.  Maybe no one
 will ever look at it again, but I'm not prepared to bet on that.

 I'm with Magnus on this: the risk of confusion seems to greatly
 outweigh any possible benefit from keeping it.  There is no reason for
 anyone to use that old repo unless they are still working with a local
 clone of it, and even if they do have a local clone, such a clone is
 self-sufficient.  And more to the point, it seems quite unlikely that
 anyone is still working with such a clone rather than having rebased
 by now.

 We should wait a week or so to see if anyone does pipe up and say they
 still use that repo; but in the absence of such feedback, it should go.

Well, I still have at least one repo against the old repository, which
is why I mentioned it.  Maybe there's nothing valuable in there and
maybe I don't need the origin anyway, but I haven't bothered to check
it over carefully yet because, well, there's no rush to clean up my
old repositories, and there is a rush to finish 9.1 development real
soon now.  I can, of course, carve out time to deal with it, but I
think that it's a poor use of time and that the risk of confusion that
you and Magnus are postulating is mostly hypothetical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 ... But we only need one bit, so what about commandeering
 an infomask bit in the tuple itself?  For the initial implementation
 I'd be inclined to take one of the free bits in t_infomask2.  We could
 actually get away with overlaying the flag bit with one of the tuple
 visibility bits, since it will only be used in tuples that are in the
 in-memory hash table, which don't need visibility info anymore.  But
 that seems like a kluge that could wait until we really need the flag
 space.

 I think that's a reasonable approach, although I might be inclined to
 do the overlay sooner rather than later if it doesn't add too much
 complexity.

Well, there's no coding complexity involved; it's just a question of
which bit we define the macro as.  Any complexity is conceptual.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 After further thought, I think it makes sense to change this around a
 bit and create a family of functions that can be invoked like this:
 void check_relation_for_FEATURE_support(Relation rel);

That seems like a reasonable idea, but ...

 ... The error message will be of the form:

 constraints can only be used on tables
 triggers can be used only on tables and views
 etc.

 This avoids the need to define a separate error message for each
 unsupported relkind, and I think it's just as informative as, e.g.,
 "constraints cannot be used on" whatever object type you tried to
 invoke it on.  We can adopt the same language for commands, e.g.:
 "CLUSTER can only be used on tables."

ISTM there are four things we might potentially want to state in the
error message: the feature/operation you tried to apply, the name of the
object you tried to apply it to, the type of that object, and the set of
object types that the feature/operation will actually work for.  Our
current wording ("foo" is not a table or view) covers the second and
fourth of these, though the fourth is stated rather awkwardly.  Your
proposal above covers the first and fourth.  I'm not happy about leaving
out the object name, because there are going to be cases where people
get this type of error out of a long sequence or nest of operations and
it's not clear what it's talking about.  It'd probably be okay to leave
out the actual object type as long as you include its name, though.

One possibility is to break it down like this:

ERROR: "foo" is a sequence
DETAIL: Triggers can only be used on tables and views.

This could still be emitted by a function such as you suggest, and
indeed that would be the most practical way from both a consistency
and code-size standpoint.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: RIGHT/FULL OUTER hash joins (was Re: [HACKERS] small table left outer join big table)

2010-12-30 Thread Jie Li
On Thu, Dec 30, 2010 at 11:50 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I had an epiphany about this topic, or actually two of them.
 
  1. Whether or not you think there's a significant performance reason
  to support hash right joins, there's a functionality reason.  The
  infrastructure for right join could just as easily do full joins.
  And AFAICS, a hash full join would only require one hashable join
  clause --- the other FULL JOIN ON conditions could be anything at all.
  This is unlike the situation for merge join, where all the JOIN ON
  conditions have to be mergeable or it doesn't work right.  So we could
  greatly reduce the scope of the dreaded "FULL JOIN is only supported
  with merge-joinable join conditions" error.  (Well, okay, it's not
  *that* dreaded, but people complain about it occasionally.)

 Yeah, that would be neat.  It might be a lot faster in some cases, too.


Yeah, PostgreSQL should have this great feature.

Actually Oracle 10g already has the right hash join,
http://dbcrusade.blogspot.com/2008/01/oracle-hash-join-right-outer.html

 And Oracle 11g has the full hash join.
http://www.dba-oracle.com/oracle11g/oracle_11g_full_hash_join.htm

Haven't checked whether other DBMS have this feature.

Thanks,
Li Jie


[HACKERS] pl/python do not delete function arguments

2010-12-30 Thread Jan Urbański
(continuing the flurry of patches)

Here's a patch that stops PL/Python from removing the function's
arguments from its globals dict after calling it. It's
an incremental patch on top of the plpython-refactor patch sent in
http://archives.postgresql.org/message-id/4d135170.3080...@wulczer.org.

Git branch for this patch:
https://github.com/wulczer/postgres/tree/dont-remove-arguments

Apart from being useless, as the whole dict is unreffed and thus freed
in PLy_procedure_delete, removing args actively breaks things for
recursive invocation of the same function. The recursive callee after
returning will remove the args from globals, and subsequent access to
the arguments in the caller will cause a NameError (see new regression
test in patch).
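The failure mode is easy to reproduce outside PostgreSQL with an ordinary shared dict (a toy model of the old behavior, loosely mimicking what PLy_function_delete_args did, not the actual PL/Python code):

```python
# All invocations of one PL/Python function share a single globals dict;
# the pre-patch code deleted the arguments from it on every return.
func_globals = {}

def call_with_arg_cleanup(fn, n):
    """Run one invocation: install the argument, call the body, then delete
    the argument again on the way out."""
    func_globals["n"] = n
    try:
        return fn()
    finally:
        func_globals.pop("n", None)

def factorial_body():
    n = func_globals["n"]
    if n <= 1:
        return 1
    sub = call_with_arg_cleanup(factorial_body, n - 1)
    # Back in the caller: the recursive callee just deleted "n" from the
    # shared dict, so this lookup fails (a NameError in real PL/Python).
    return func_globals["n"] * sub

try:
    call_with_arg_cleanup(factorial_body, 3)
except KeyError:
    print("argument was clobbered by the recursive call")
```

Dropping the cleanup entirely, as the patch does, is safe because the whole dict is freed when the procedure is deleted.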

Cheers,
Jan
diff --git a/src/pl/plpython/expected/plpython_spi.out b/src/pl/plpython/expected/plpython_spi.out
index 7f4ae5c..cb11f60 100644
*** a/src/pl/plpython/expected/plpython_spi.out
--- b/src/pl/plpython/expected/plpython_spi.out
*** CONTEXT:  PL/Python function result_nro
*** 133,135 
--- 133,163 
   2
  (1 row)
  
+ --
+ -- check recursion with same argument does not clobber globals
+ --
+ CREATE FUNCTION recursion_test(n integer) RETURNS integer
+ AS $$
+ if n in (0, 1):
+     return 1
+ 
+ return n * plpy.execute("select recursion_test(%d) as result" % (n - 1))[0]["result"]
+ $$ LANGUAGE plpythonu;
+ SELECT recursion_test(5);
+  recursion_test 
+ 
+ 120
+ (1 row)
+ 
+ SELECT recursion_test(4);
+  recursion_test 
+ 
+  24
+ (1 row)
+ 
+ SELECT recursion_test(1);
+  recursion_test 
+ 
+   1
+ (1 row)
+ 
diff --git a/src/pl/plpython/plpython.c b/src/pl/plpython/plpython.c
index 67eb0f3..1827fc9 100644
*** a/src/pl/plpython/plpython.c
--- b/src/pl/plpython/plpython.c
*** static Datum PLy_function_handler(Functi
*** 307,313 
  static HeapTuple PLy_trigger_handler(FunctionCallInfo fcinfo, PLyProcedure *);
  
  static PyObject *PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *);
- static void PLy_function_delete_args(PLyProcedure *);
  static PyObject *PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *,
  	   HeapTuple *);
  static HeapTuple PLy_modify_tuple(PLyProcedure *, PyObject *,
--- 307,312 
*** PLy_function_handler(FunctionCallInfo fc
*** 988,1001 
  			 */
  			plargs = PLy_function_build_args(fcinfo, proc);
  			plrv = PLy_procedure_call(proc, "args", plargs);
- 			if (!proc->is_setof)
- 			{
- /*
-  * SETOF function parameters will be deleted when last row is
-  * returned
-  */
- PLy_function_delete_args(proc);
- 			}
  			Assert(plrv != NULL);
  		}
  
--- 987,992 
*** PLy_function_handler(FunctionCallInfo fc
*** 1053,1060 
  Py_XDECREF(plargs);
  Py_XDECREF(plrv);
  
- PLy_function_delete_args(proc);
- 
  if (has_error)
  	ereport(ERROR,
  			(errcode(ERRCODE_DATA_EXCEPTION),
--- 1044,1049 
*** PLy_function_build_args(FunctionCallInfo
*** 1267,1287 
  	return args;
  }
  
- 
- static void
- PLy_function_delete_args(PLyProcedure *proc)
- {
- 	int			i;
- 
- 	if (!proc->argnames)
- 		return;
- 
- 	for (i = 0; i < proc->nargs; i++)
- 		if (proc->argnames[i])
- 			PyDict_DelItemString(proc->globals, proc->argnames[i]);
- }
- 
- 
  /* Decide if a cached PLyProcedure struct is still valid */
  static bool
  PLy_procedure_valid(PLyProcedure *proc, HeapTuple procTup)
--- 1256,1261 
diff --git a/src/pl/plpython/sql/plpython_spi.sql b/src/pl/plpython/sql/plpython_spi.sql
index 7f8f6a3..3b65f95 100644
*** a/src/pl/plpython/sql/plpython_spi.sql
--- b/src/pl/plpython/sql/plpython_spi.sql
*** else:
*** 105,107 
--- 105,123 
  $$ LANGUAGE plpythonu;
  
  SELECT result_nrows_test();
+ 
+ 
+ --
+ -- check recursion with same argument does not clobber globals
+ --
+ CREATE FUNCTION recursion_test(n integer) RETURNS integer
+ AS $$
+ if n in (0, 1):
+     return 1
+ 
+ return n * plpy.execute("select recursion_test(%d) as result" % (n - 1))[0]["result"]
+ $$ LANGUAGE plpythonu;
+ 
+ SELECT recursion_test(5);
+ SELECT recursion_test(4);
+ SELECT recursion_test(1);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Thu Dec 30 13:49:20 -0300 2010:

 One possibility is to break it down like this:
 
 ERROR: "foo" is a sequence
 DETAIL: Triggers can only be used on tables and views.
 
 This could still be emitted by a function such as you suggest, and
 indeed that would be the most practical way from both a consistency
 and code-size standpoint.

This seems good to me.  There will only be as many messages as relkinds
we have, plus as many as features there are.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 One possibility is to break it down like this:

        ERROR: "foo" is a sequence
        DETAIL: Triggers can only be used on tables and views.

 This could still be emitted by a function such as you suggest, and
 indeed that would be the most practical way from both a consistency
 and code-size standpoint.

Great idea.  I should have thought of that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Stefan Kaltenbrunner

On 12/30/2010 06:26 PM, Simon Riggs wrote:


I've mulled over the design for sync rep for awhile now, and have come
up with a feature set that includes the final detailed feedback from
Fujii Masao, Aidan Van Dyk, Josh Berkus and others.

The design also draws from MySQL concepts to make the two interfaces as
similar and as simple as possible. It should be noted that the design
presented here has many features that the MySQL design does not.

I am currently finishing up my patch to offer these features, so it's
time to begin final discussions.

As an interim step, I enclose a PDF version of relevant excerpts from
the doc patch. The patch will follow on a later post in the near future.

I would like to separate discussions on user interface from that of
internal design, to make it easier for more people to get involved.
Please read the following and post your comments. Thank you.


it would help if this would just be a simple text-only description of 
the design that people can actually comment on inline. I don't think 
sending technical design proposals as a PDF (which seems to be written
in doc-style as well) is a good idea to encourage discussion on -hackers :(



Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Problems with autovacuum and vacuum

2010-12-30 Thread JotaComm
Hello,

Last week I had a serious problem with my PostgreSQL database. My autovacuum
is OFF, but in September it started to prevent the transaction wraparoud;
however last week the following message appeared continuously in my log:

WARNING: database "production" must be vacuumed within 4827083 transactions
HINT: To avoid a database shutdown, execute a full-database VACUUM in
"production".

This message appeared for five to six hours; after that, the message
disappeared from log. Any idea about what could have happened?

Every day the vacuum is executed on some tables; and on Sundays it's
executed on all tables. But since autovacuum has been running since September,
and it runs for a long time, the manual vacuum was blocked because autovacuum
was already processing the same table. How should I proceed in this case?

The table where the autovacuum is running and where the vacuum was blocked
has billions of rows.

I'm using PostgreSQL 8.3.8.

The vacuum parameter configuration is:

vacuum_cost_limit = 200
vacuum_cost_delay = 0
vacuum_freeze_min_age = 1
autovacuum = off
autovacuum_freeze_max_age = 2

Regards,

João Paulo

-- 
JotaComm
http://jotacomm.wordpress.com


Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE

2010-12-30 Thread Jim Nasby
On Dec 29, 2010, at 10:14 PM, Robert Haas wrote:
 +1 for trying to optimize these cases (but maybe after we optimize the
 varchar -> text and varchar(less) -> varchar(more) cases to skip the
 scan altogether).

+1 on getting the obvious cases of varchar and numeric done first; we run into 
those a lot at work and would be willing to sponsor work on a patch that makes 
those operations as fast as just adding a new column.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:

 it would help if this would just be a simple text-only description of 
 the design that people can actually comment on inline. I don't think 
 sending technical design proposals as a pdf (which seems to be written 
 in doc-style as well) is a good idea to encourage discussion on -hackers :(

25.2.6. Synchronous Replication
Streaming replication is by default asynchronous. Transactions on the
primary server write commit records to WAL, yet do not know whether or
when a standby has received and processed those changes. So with
asynchronous replication, if the primary crashes, transactions committed
on the primary might not have been received by any standby. As a result,
failover from primary to standby could cause data loss because
transaction completions are absent, relative to the primary. The amount
of data loss is proportional to the replication delay at the time of
failover. 

Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication. 

When synchronous replication is requested, the commit of a write
transaction will wait until confirmation that the commit record has been
transferred successfully to at least one standby server. Waiting for
confirmation increases the user's confidence that the changes will not
be lost in the event of server crashes but it also necessarily increases
the response time for the requesting transaction. The minimum wait time
is the roundtrip time from primary to standby. 

Read only transactions and transaction rollbacks need not wait for
replies from standby servers. Subtransaction commits do not wait for
responses from standby servers, only final top-level commits. Long
running actions such as data loading or index building do not wait until
the very final commit message. 


25.2.6.1. Basic Configuration
Synchronous replication must be enabled on both the primary and at least
one standby server. If synchronous replication is disabled on the
master, or enabled on the primary but not enabled on any slaves, the
primary will use asynchronous replication by default. 

We use a single parameter to enable synchronous replication, set in
postgresql.conf on both primary and standby servers: 

synchronous_replication = off (default) | on

On the primary, synchronous_replication can be set for particular users
or databases, or dynamically by applications programs. 

If more than one standby server specifies synchronous_replication, then
whichever standby replies first will release waiting commits. 

Turning this setting off for a standby allows the administrator to
exclude certain standby servers from releasing waiting transactions.
This is useful if not all standby servers are designated as potential
future primary servers. On the standby, this parameter only takes effect
at server start. 
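For illustration, a minimal configuration for the setup described above might
look like this (a sketch based only on the parameter named in this proposal;
the final syntax could differ):

```
# postgresql.conf, on both the primary and the standby
synchronous_replication = on

# On the primary, per-user or per-database settings would use the usual
# GUC mechanisms, e.g. (hypothetical):
#   ALTER ROLE critical_app SET synchronous_replication = on;
#   SET synchronous_replication = on;   -- per session/transaction
```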


25.2.6.2. Planning for Performance
Synchronous replication usually requires carefully planned and placed
standby servers to ensure applications perform acceptably. Waiting
doesn't utilise system resources, but transaction locks continue to be
held until the transfer is confirmed. As a result, incautious use of
synchronous replication will reduce performance for database
applications because of increased response times and higher contention. 

PostgreSQL allows the application developer to specify the durability
level required via replication. This can be specified for the system
overall, though it can also be specified for specific users or
connections, or even individual transactions. 

For example, an application workload might consist of: 10% of changes
are important customer details, while 90% of changes are less important
data that the business can more easily survive if it is lost, such as
chat messages between users. 

With synchronous replication options specified at the application level
(on the master) we can offer sync rep for the most important changes,
without slowing down the bulk of the total workload. Application level
options are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications. This feature
is unique to PostgreSQL. 


25.2.6.3. Planning for High Availability
The easiest and safest method of gaining High Availability using
synchronous replication is to configure at least two standby servers. To
understand why, we need to examine what can happen when you lose all
standby servers. 

Commits made when synchronous_replication is set will wait until at
least one standby responds. The response may never occur if the last, or
only, standby should crash or the network drops. What should we do in
that situation? 

Sitting and waiting will typically cause operational problems because it
is an effective outage of the 

Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 12:32 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Dec 30, 2010 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 One possibility is to break it down like this:

        ERROR: foo is a sequence
        DETAIL: Triggers can only be used on tables and views.

 This could still be emitted by a function such as you suggest, and
 indeed that would be the most practical way from both a consistency
 and code-size standpoint.

 Great idea.  I should have thought of that.

On further reflection, this can still turn into a laundry list in certain cases.

DETAIL: You can only comment on columns of tables, views, and composite types.

seems less helpful than:

DETAIL: Comments on relations with system-generated column names are
not supported.

I think that for rules, triggers, constraints, and anything that only
works on a single relkind, we can't do much better than to list the
specific object types.  But where there's some sort of guiding
principle involved I think we'd do well to articulate it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] estimating # of distinct values

2010-12-30 Thread Tomas Vondra
On 30.12.2010 15:43, Florian Pflug wrote:
 On Dec27, 2010, at 23:49 , Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:

 With respect to (b), I think I'd need to see a much more detailed
 design for how you intend to make this work.  Off the top of my
 head there seems to be some pretty serious feasibility problems.

 I had one random thought on that -- it seemed like a large concern
 was that there would need to be at least an occasional scan of the
 entire table to rebuild the distinct value information.
 
 I believe we could actually avoid that.
 
 First, the paper "An Optimal Algorithm for the Distinct Elements Problem"
 actually contains an algorithm which *does* handle deletes - it's called
 "L_0 estimate" there.

Hmmm, that's interesting. I know there's a part about L_0 estimation,
but that's about estimating the Hamming norm of a vector - so I ignored
it, as I thought we couldn't use it to estimate the number of distinct
values. But if it really handles deletions and if we can use it, then
it's really interesting.

 Second, as Tomas pointed out, the stream-based estimator is essentially a
 simplified version of a bloom filter. It starts out with a field of
 N zero bits, and sets K of them to 1 for each value v in the stream.
 Which bits are set to 1 depends on some hash function(s) H_i(v). It's
 then easy to compute how many 1-bits you'd expect to find in the bit
 field after seeing D distinct values, and by reversing that you can
 estimate D from the number of 1-bits in the bit field.

No, I haven't said the stream-based estimators are simplified versions
of a Bloom filter. I said the approach is very similar - all the
algorithms use bitmaps and hash functions, but the algorithms (Bloom
filter vs. probabilistic counting and adaptive sampling) are very different.

The Bloom filter is much more straightforward. The other algorithms are
much more sophisticated, which allows them to use less space.
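As a concrete illustration of the quoted bit-field idea, here is a rough
sketch using a single hash function (K = 1), which is essentially "linear
counting"; the names and parameters below are illustrative and do not come
from either mail:

```python
import hashlib
import math
import random

N_BITS = 1 << 16  # size of the bit field (N); illustrative choice

def add_value(bits, value):
    # Set one bit per value (K = 1 hash function).
    h = int(hashlib.sha256(str(value).encode()).hexdigest(), 16)
    bits[h % len(bits)] = 1

def estimate_distinct(bits):
    # After D distinct inserts, the expected fraction of zero bits is
    # roughly exp(-D/N); invert that to estimate D from the 1-bit count.
    n = len(bits)
    zeros = bits.count(0)
    if zeros == 0:
        return float('inf')  # bit field saturated: N was chosen too small
    return -n * math.log(zeros / n)

random.seed(1)
bits = [0] * N_BITS
values = [random.randrange(5000) for _ in range(50000)]
for v in values:
    add_value(bits, v)

true_d = len(set(values))
est = estimate_distinct(bits)
print(true_d, round(est))
```

Duplicates hash to the same bits, so repeated values do not inflate the
estimate; and, as discussed in this thread, a plain bit field like this
cannot handle deletes.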

 To avoid having to rescan large tables, instead of storing one such
 bit field, we'd store one per B pages of data. We'd then only need
 to scan a range of B pages around every updated or deleted tuple,
 and could afterwards compute a new global estimate of D by combining
 the individual bit fields with bitwise and.

I don't think this could help.

1) This works just with the Bloom filters, not with the other
   algorithms (you can't combine the segments using bitmap OR).

2) With heavily modified tables the updates are usually 'spread'
   through the whole table, so you'll have to rebuild all the
   segments anyway.

 Since the need to regularly VACUUM tables hit by updates or deletes
 won't go away any time soon, we could piggy-back the bit field
 rebuilding onto VACUUM to avoid a second scan.

Well, I guess it's a bit more complicated. First of all, there's a local
VACUUM when doing HOT updates. Second, you need to handle inserts too
(what if the table just grows?).

But I'm not a VACUUM expert, so maybe I'm wrong and this is the right
place to handle rebuilds of distinct stats.

I was thinking about something else - we could 'attach' the rebuild to
an actual seq scan if the amount of changes since the last rebuild
reaches some threshold. Only if the amount of changes reaches a higher
threshold would we rebuild the stats on our own.

Something like

IF (# of updates * deletes > 5%) THEN wait for seq scan
IF (# of updates * deletes > 10%) THEN rebuild the stats
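The threshold logic above might look like the following sketch; the 5%/10%
figures come from the mail, while the function shape and names are made up,
reading the condition as the fraction of the table modified since the last
rebuild:

```python
def stats_rebuild_action(n_changes, n_rows):
    """Decide how to refresh distinct-value statistics, treating the
    thresholds as fractions of rows modified since the last rebuild."""
    changed_fraction = n_changes / n_rows
    if changed_fraction > 0.10:
        return 'rebuild'    # too stale: rebuild the stats on our own
    if changed_fraction > 0.05:
        return 'piggyback'  # moderately stale: wait for the next seq scan
    return 'none'           # fresh enough: do nothing

print(stats_rebuild_action(3000, 100000))   # 3% changed
print(stats_rebuild_action(7000, 100000))   # 7% changed
print(stats_rebuild_action(15000, 100000))  # 15% changed
```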

I've found a nice ppt describing how Oracle does this:

   http://www.oraclegeek.net/downloads/One_Pass_Distinct_Sampling.ppt

and there's a PDF version:

   http://www.oraclegeek.net/downloads/OnePassDistinctSampling.pdf

According to this, Oracle is using the probabilistic counting approach
(see slide 26). And once they find out there were too many changes in the
table, they rebuild the whole thing.

I'm not saying we should do exactly the same, just that this might be a
good direction.

regards
Tomas



Re: [HACKERS] C++ keywords in headers

2010-12-30 Thread Chris Browne
pete...@gmx.net (Peter Eisentraut) writes:

 On Mon, 2010-12-27 at 12:33 -0500, Andrew Dunstan wrote:
 On a more general point, it would be useful to have some
 infrastructure for running quality checks like this and publishing
 the results. We should be way beyond the point where we rely on
 individuals doing this sort of stuff.

 I had a Hudson service set up for things like this, but the hosting
 was unreliable and then the thing faded away.  I could try to revive
 it.

Careful, Oracle has been trying to claim proprietary ownership of
that...
  http://hudson-labs.org/content/whos-driving-thing
-- 
``God decided to take the  devil to court and settle their differences
once and for all.  When Satan heard of this, he grinned and said, And
just where do you think you're going to find a lawyer?''



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Treat
On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:

  it would help if this would just be a simple text-only description of
  the design that people can actually comment on inline. I don't think
  sending technical design proposals as a pdf (which seems to be written
  in doc-style as well) is a good idea to encourage discussion on -hackers
 :(

 25.2.6. Synchronous Replication
 Streaming replication is by default asynchronous. Transactions on the
 primary server write commit records to WAL, yet do not know whether or
 when a standby has received and processed those changes. So with
 asynchronous replication, if the primary crashes, transactions committed
 on the primary might not have been received by any standby. As a result,
 failover from primary to standby could cause data loss because
 transaction completions are absent, relative to the primary. The amount
 of data loss is proportional to the replication delay at the time of
 failover.

 Synchronous replication offers the ability to guarantee that all changes
 made by a transaction have been transferred to at least one remote
 standby server. This is an extension to the standard level of durability
 offered by a transaction commit. This is referred to as semi-synchronous
 replication.

 When synchronous replication is requested, the commit of a write
 transaction will wait until confirmation that the commit record has been
 transferred successfully to at least one standby server. Waiting for
 confirmation increases the user's confidence that the changes will not
 be lost in the event of server crashes but it also necessarily increases
 the response time for the requesting transaction. The minimum wait time
 is the roundtrip time from primary to standby.

 Read only transactions and transaction rollbacks need not wait for
 replies from standby servers. Subtransaction commits do not wait for
 responses from standby servers, only final top-level commits. Long
 running actions such as data loading or index building do not wait until
 the very final commit message.


 25.2.6.1. Basic Configuration
 Synchronous replication must be enabled on both the primary and at least
 one standby server. If synchronous replication is disabled on the
 master, or enabled on the primary but not enabled on any slaves, the
 primary will use asynchronous replication by default.

 We use a single parameter to enable synchronous replication, set in
 postgresql.conf on both primary and standby servers:

 synchronous_replication = off (default) | on

 On the primary, synchronous_replication can be set for particular users
 or databases, or dynamically by applications programs.


This seems like a potential issue, where I start a server with this off, and
then I start turning it on for specific transactions; it isn't exactly clear
what happens, since there may or may not be a running synchronous rep slave
available.  (I love the idea though)


 If more than one standby server specifies synchronous_replication, then
 whichever standby replies first will release waiting commits.


I don't want you to think I am setting an expectation, but I'm curious about
the possibility of requiring more than 1 server to reply?


 Turning this setting off for a standby allows the administrator to
 exclude certain standby servers from releasing waiting transactions.
 This is useful if not all standby servers are designated as potential
 future primary servers. On the standby, this parameter only takes effect
 at server start.


 25.2.6.2. Planning for Performance
 Synchronous replication usually requires carefully planned and placed
 standby servers to ensure applications perform acceptably. Waiting
 doesn't utilise system resources, but transaction locks continue to be
 held until the transfer is confirmed. As a result, incautious use of
 synchronous replication will reduce performance for database
 applications because of increased response times and higher contention.

 PostgreSQL allows the application developer to specify the durability
 level required via replication. This can be specified for the system
 overall, though it can also be specified for specific users or
 connections, or even individual transactions.

 For example, an application workload might consist of: 10% of changes
 are important customer details, while 90% of changes are less important
 data that the business can more easily survive if it is lost, such as
 chat messages between users.

 With synchronous replication options specified at the application level
 (on the master) we can offer sync rep for the most important changes,
 without slowing down the bulk of the total workload. Application level
 options are an important and practical tool for allowing the benefits of
 synchronous replication for high performance applications. This feature
 is unique to PostgreSQL.


 25.2.6.3. Planning for High Availability
 The easiest and safest 

Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Marti Raudsepp
Most of your doc uses the terms "primary" and "standby", but a few
instances of "master" and "slave" have slipped in. I think it's better
to stick to consistent terminology.

On Thu, Dec 30, 2010 at 21:04, Simon Riggs si...@2ndquadrant.com wrote:
 With synchronous replication options specified at the application level
 (on the master) we can offer sync rep for the most important changes,
 without slowing down the bulk of the total workload. Application level
 options are an important and practical tool for allowing the benefits of
 synchronous replication for high performance applications. This feature
 is unique to PostgreSQL.

I think a comment about the head-of-line blocking nature of
streaming replication is in order. If you execute massive writes in
async mode and then run a transaction in sync mode, its commit will be
delayed until all the async transactions before it have been applied
on the slave.

 synchronous_replication_timeout (boolean)

Doesn't look like a boolean to me :)

Regards,
Marti



Re: [HACKERS] estimating # of distinct values

2010-12-30 Thread Alvaro Herrera
Excerpts from Tomas Vondra's message of Thu Dec 30 16:38:03 -0300 2010:

  Since the need to regularly VACUUM tables hit by updates or deletes
  won't go away any time soon, we could piggy-back the bit field
  rebuilding onto VACUUM to avoid a second scan.
 
 Well, I guess it's a bit more complicated. First of all, there's a local
 VACUUM when doing HOT updates. Second, you need to handle inserts too
 (what if the table just grows?).
 
 But I'm not a VACUUM expert, so maybe I'm wrong and this is the right
 place to handle rebuilds of distinct stats.

I was thinking that we could have two different ANALYZE modes, one
"full" and one "incremental"; autovacuum could be modified to use one or
the other depending on how many changes there are (of course, the user
could request one or the other, too; not sure what should be the default
behavior).  So the incremental one wouldn't worry about deletes, only
inserts, and could be called very frequently.  The other one would
trigger a full table scan (or nearly so) to produce a better estimate in
the face of many deletions.

I haven't followed this discussion closely so I'm not sure that this
would be workable.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_dump --split patch

2010-12-30 Thread Robert Treat
On Thu, Dec 30, 2010 at 2:13 AM, Joel Jacobson j...@gluefinance.com wrote:

 2010/12/29 Dimitri Fontaine dimi...@2ndquadrant.fr

 Please have a look at getddl:

  https://github.com/dimitri/getddl


 Nice! Looks like a nifty tool.
 When I tried it, "./getddl.py -f -F /crypt/funcs -d glue", I got the error
 "No such file or directory: 'sql/schemas.sql'".

 While the task of splitting objects into separate files could be solved by
 an external wrapper tool like yours around pg_dump,
 I argue it makes more sense to put the (minimal required) logic into
 pg_dump, for a number of reasons, most importantly because it's simpler
 and less complex, thus less error-prone.

 My patch is only a few lines of code and doesn't add any logic to pg_dump;
 it merely reroutes the fwrite() calls based on the TOC entries.

 Just the fact that you and others had to create your own tools to do the
 splitting shows the feature is important, and I think it should be included
 in the normal pg_dump tool.


As someone whose own version of getddl helped inspire Dimitri to create
his own version, I've both enjoyed reading this thread and seeing this wheel
reinvented yet again, and wholeheartedly +1 the idea of building this
directly into pg_dump. (The only thing better would be to make everything
sql-callable, but that's a problem for another day.)


Robert Treat
http://www.xzilla.net


Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
  If more than one standby server specifies synchronous_replication,
 then
  whichever standby replies first will release waiting commits.

 I don't want you to think I am setting an expectation, but I'm curious
 about the possibility of requiring more than 1 server to reply?

I was initially interested in this myself, but after a long discussion
on quorum commit it was decided to go with first past post.

That is easier to manage, requires one less parameter, performs better
and doesn't really add that much additional confidence.

It was also discussed that we would have a plugin API, but I'm less sure
about that now. Perhaps we can add that option in the future, but its
not high on my list of things for this release.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
  We use a single parameter to enable synchronous replication, set in
  postgresql.conf on both primary and standby servers:
 
  synchronous_replication = off (default) | on
 
  On the primary, synchronous_replication can be set for particular
 users
  or databases, or dynamically by applications programs.
 
 
 This seems like a potential issue, where I start a server with this
 off, and then I start turning it on for specific transactions; it
 isn't exactly clear what happens, since there may or may not be a
 running synchronous rep slave available.  (I love the idea though)

Not really an issue. Even if there was a standby there a moment ago, the
standby can go away at any time. So we must cope gracefully with what
happens if you do this. By default, the parameters specify that in the
case you mention we will just use async replication (no wait!).
Options exist to change that, since some people want to wait until the
sysadmin adds a standby.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Stefan Kaltenbrunner

On 12/30/2010 08:04 PM, Simon Riggs wrote:

On Thu, 2010-12-30 at 18:42 +0100, Stefan Kaltenbrunner wrote:


it would help if this would just be a simple text-only description of
the design that people can actually comment on inline. I don't think
sending technical design proposals as a pdf (which seems to be written
in doc-style as well) is a good idea to encourage discussion on -hackers :(


25.2.6. Synchronous Replication
Streaming replication is by default asynchronous. Transactions on the
primary server write commit records to WAL, yet do not know whether or
when a standby has received and processed those changes. So with
asynchronous replication, if the primary crashes, transactions committed
on the primary might not have been received by any standby. As a result,
failover from primary to standby could cause data loss because
transaction completions are absent, relative to the primary. The amount
of data loss is proportional to the replication delay at the time of
failover.

Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication.

When synchronous replication is requested, the commit of a write
transaction will wait until confirmation that the commit record has been
transferred successfully to at least one standby server. Waiting for
confirmation increases the user's confidence that the changes will not
be lost in the event of server crashes but it also necessarily increases
the response time for the requesting transaction. The minimum wait time
is the roundtrip time from primary to standby.


hmm - this is one of the main problems I see with the proposed "master is 
sometimes aware of the standby" (as in the feedback mode) concept this 
proposal has. If it waits for only one of the standbys, there is some 
issue with the terminology. As a DBA I would expect the master to only 
return if ALL of the declared sync replication nodes replied ok.





Read only transactions and transaction rollbacks need not wait for
replies from standby servers. Subtransaction commits do not wait for
responses from standby servers, only final top-level commits. Long
running actions such as data loading or index building do not wait until
the very final commit message.


25.2.6.1. Basic Configuration
Synchronous replication must be enabled on both the primary and at least
one standby server. If synchronous replication is disabled on the
master, or enabled on the primary but not enabled on any slaves, the
primary will use asynchronous replication by default.

We use a single parameter to enable synchronous replication, set in
postgresql.conf on both primary and standby servers:


this reads as if you can only set it there



synchronous_replication = off (default) | on

On the primary, synchronous_replication can be set for particular users
or databases, or dynamically by applications programs.


this says otherwise



If more than one standby server specifies synchronous_replication, then
whichever standby replies first will release waiting commits.


see above for why I think this violates the configuration promise - if I 
say this is a sync standby I better expect it to be...




Turning this setting off for a standby allows the administrator to
exclude certain standby servers from releasing waiting transactions.
This is useful if not all standby servers are designated as potential
future primary servers. On the standby, this parameter only takes effect
at server start.


25.2.6.2. Planning for Performance
Synchronous replication usually requires carefully planned and placed
standby servers to ensure applications perform acceptably. Waiting
doesn't utilise system resources, but transaction locks continue to be
held until the transfer is confirmed. As a result, incautious use of
synchronous replication will reduce performance for database
applications because of increased response times and higher contention.

PostgreSQL allows the application developer to specify the durability
level required via replication. This can be specified for the system
overall, though it can also be specified for specific users or
connections, or even individual transactions.

For example, an application workload might consist of: 10% of changes
are important customer details, while 90% of changes are less important
data that the business can more easily survive if it is lost, such as
chat messages between users.

With synchronous replication options specified at the application level
(on the master) we can offer sync rep for the most important changes,
without slowing down the bulk of the total workload. Application level
options are an important and practical tool for allowing the benefits of
synchronous replication for high performance applications. This feature
is unique to PostgreSQL.


that seems to be 

Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Aidan Van Dyk
On Thu, Dec 30, 2010 at 3:07 PM, Robert Treat r...@xzilla.net wrote:

 If primary crashes while commits are waiting for acknowledgement, those
 transactions will be marked fully committed if the primary database
 recovers, no matter how allow_standalone_primary is set.

 This seems backwards; if you are waiting for acknowledgement, wouldn't the
 normal assumption be that the transactions *didn't* make it to any standby,
 and should be rolled back?

This is the standard 2-phase commit problem.  The primary server *has*
committed it, its fsync has returned, and the only thing keeping it
from returning the commit to the client is that it's waiting on a
synchronous ack from a slave.

You've got 2 options:
1) initiate fsync on the slave first
   - In this case, the slave is farther ahead than the primary, and if
the primary fails, you're *forced* to have a failover.  The standby is
ahead of the primary, so the primary recovering can cause divergence.
And you'll likely have to do a base-backup style sync to get a new
primary/standby setup.
2) initiate fsync on the primary first
   - In this case, the slave is always slightly behind.  If your
primary falls over, you don't give commit messages to the clients, but
if it recovers, it might have committed data, and slaves will still be
able to catch up.

The thing is that currently, even without replication, #2 can happen.
If your db falls over before it gets the commit packet stuffed out the
network, you're in the same boat.  The data might be committed, even
though you didn't get the commit packet, and when your DB recovers,
it's got the committed data that you never knew was committed.
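The window in option 2 can be sketched as a toy timeline (illustrative only;
the state names and crash points below are not PostgreSQL internals):

```python
def commit_option2(crash_point=None):
    """Simulate option 2: the primary fsyncs its own commit record first,
    then waits for the standby's ack before replying to the client."""
    state = {'primary_durable': False, 'standby_acked': False,
             'client_got_commit': False}
    state['primary_durable'] = True       # step 1: local WAL fsync
    if crash_point == 'before_ack':
        return state                      # crash while waiting on standby
    state['standby_acked'] = True         # step 2: standby confirms receipt
    if crash_point == 'before_reply':
        return state                      # crash before replying to client
    state['client_got_commit'] = True     # step 3: commit packet sent
    return state

# The key window: the data is durable on the primary even though the
# client never received a commit acknowledgement.
s = commit_option2('before_ack')
print(s['primary_durable'], s['client_got_commit'])
```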

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Treat
On Thu, Dec 30, 2010 at 3:36 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
   If more than one standby server specifies synchronous_replication,
  then
   whichever standby replies first will release waiting commits.

  I don't want you to think I am setting an expectation, but I'm curious
  about the possibility of requiring more than 1 server to reply?

 I was initially interested in this myself, but after a long discussion
 on quorum commit it was decided to go with first past post.

 That is easier to manage, requires one less parameter, performs better
 and doesn't really add that much additional confidence.


Yes, I think with a single master, you are probably right (been
dealing with more than my fair share of multi-master based nosql
solutions lately).

Still, one thing that has me concerned is that in the case of two
slaves, you don't know which one is the more up-to-date one if you
need to failover. It'd be nice if you could just guarantee they both
are, but in lieu of that, I guess whatever decision tree is being
used, it needs to look at current xlog location of any potential
failover targets.

 It was also discussed that we would have a plugin API, but I'm less sure
 about that now. Perhaps we can add that option in the future, but it's
 not high on my list of things for this release.


Agreed.

Robert Treat
http://www.xzilla.net



Re: [HACKERS] Old git repo

2010-12-30 Thread Jeff Davis
On Thu, 2010-12-30 at 11:02 -0500, Tom Lane wrote:
 I'm with Magnus on this: the risk of confusion seems to greatly
 outweigh any possible benefit from keeping it.  There is no reason for
 anyone to use that old repo unless they are still working with a local
 clone of it, and even if they do have a local clone, such a clone is
 self-sufficient.

The reason I originally asked for it to be kept around was not because
it's hard to rebase, but because there might be references to SHA1s from
that repo floating around.

I don't think these would be very common, nor critical, but I know I 
wrote a few emails that included things like "look at this commit".
Personally, I have little use for the old repo (if it was anything 
important, I wouldn't have relied on the unofficial repo). But we should 
probably give a little bit of warning for folks that might want to
rebase or translate some old notes.

Regards,
Jeff Davis




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 15:07 -0500, Robert Treat wrote:
  When allow_standalone_primary is set, a user will stop waiting once
 the
  replication_timeout has been reached for their specific session.
 Users
  are not waiting for a specific standby to reply, they are waiting
 for a
  reply from any standby, so the unavailability of any one standby is
 not
  significant to a user. It is possible for user sessions to hit
 timeout
  even though standbys are communicating normally. In that case, the
  setting of replication_timeout is probably too low.
 
 
 will a notice or warning be thrown in these cases? I'm thinking
 something
 like the checkpoint timeout warning, but could be something else; it
 just
 seems to me you need some way to know you're timing out.

We can do that, yes.

  The standby sends regular status messages to the primary. If no
 status
  messages have been received for replication_timeout the primary
 server
  will assume the connection is dead and terminate it. This happens
  whatever the setting of allow_standalone_primary.
 
 
 Does the standby attempt to reconnect in these scenarios?

Yes it would, but the reason we terminated the connection was that it
wasn't talking any more, so it is probably dead.

  If primary crashes while commits are waiting for acknowledgement,
 those
  transactions will be marked fully committed if the primary database
  recovers, no matter how allow_standalone_primary is set.
 
 
 This seems backwards; if you are waiting for acknowledgement, wouldn't
 the
 normal assumption be that the transactions *didn't* make it to any
 standby,
 and should be rolled back ?

Well, we can't roll it back. We have already written the commit record
to WAL.

  There is no way
  to be certain that all standbys have received all outstanding WAL
 data
  at time of the crash of the primary. Some transactions may not show
 as
  committed on the standby, even though they show as committed on the
  primary. The guarantee we offer is that the application will not
 receive
  explicit acknowledgement of the successful commit of a transaction
 until
  the WAL data is known to be safely received by the standby. Hence
 this
  mechanism is technically semi synchronous rather than fully
  synchronous replication. Note that replication will still not be fully
  synchronous even if we wait for all standby servers, though this
 would
  reduce availability, as described previously.
 
 
 I think we ought to have an example of the best configuration for
 cannot
 afford to lose any data scenarios, where we would prefer an overall
 service
 interruption over the chance of having the primary / secondary out of
 synch.

I say use two or more standbys more than once...

 
 
 somewhat concerned that we seem to need to use double negatives to
 describe
 whats going on here. it makes me think we ought to rename this to
 require_synchronous_standby or similar.

Don't see why we can't use double negatives. ;-)

The parameter is named directly from Fujii Masao's suggestion.

  18.5.6. Standby Servers
  These settings control the behavior of a standby server that is to
  receive replication data.
 

...

 i was expecting this section to mention the synchronous_replication
 (bool)
 somewhere, to control if the standby will participate synchronously or
 asynch; granted it's the same config as listed in 18.5.5 right? Just
 that
 the heading of that section specifically targets the primary.

OK, good idea.

 HTH, looks pretty good at first glance. 

Thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] pg_streamrecv for 9.1?

2010-12-30 Thread Stefan Kaltenbrunner

On 12/29/2010 07:42 PM, Robert Haas wrote:

On Dec 29, 2010, at 1:01 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

Is it really stable enough for bin/?  My impression of the state of
affairs is that there is nothing whatsoever about replication that
is really stable yet.


Well, that's not stopping us from shipping a core feature called "replication". 
 I'll defer to others on how mature pg_streamrecv is, but if it's no worse than 
replication in general I think putting it in bin/ is the right thing to do.


well I have not looked at how good pg_streamrecv really is, but we 
desperately need to fix the basic usability issues in our current 
replication implementation, and pg_streamrecv seems to be a useful tool 
to help with some of them.
All the people I talked to about SR were surprised at how 
complex and fragile the initial setup procedure is - it comes down to the 
lack of a simple and reliable tool to do a base backup over libpq, and 
a simple way to have that tool tell the master "keep the WAL 
segments I need for starting the standby". I do realize we need to keep 
the ability to do the base backup out-of-line, but for 99% of the users it 
is too complex, scary and failure-prone (I know nobody who got the 
procedure right the first time - which is a strong hint that we need to 
work on that).




Stefan



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:

 Still, one thing that has me concerned is that in the case of two
 slaves, you don't know which one is the more up-to-date one if you
 need to failover. It'd be nice if you could just guarantee they both
 are...

Regrettably, nobody can know that, without checking.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] small table left outer join big table

2010-12-30 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I can't get all *that* excited about complicating hash joins as
 proposed.  The query is still fundamentally going to be slow because
 you won't get out of having to seqscan the large table.  The only way
 to make it really fast is to not read all of the large table, and
 nestloop-with-inner-indexscan is the only plan type with a hope of
 doing that.

That sounds somewhat like Loose Indexscan as described in the following
wiki page, right?

  http://wiki.postgresql.org/wiki/Loose_indexscan

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Stefan Kaltenbrunner

On 12/30/2010 10:01 PM, Simon Riggs wrote:

On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:


Still, one thing that has me concerned is that in the case of two
slaves, you don't know which one is the more up-to-date one if you
need to failover. It'd be nice if you could just guarantee they both
are...


Regrettably, nobody can know that, without checking.


how exactly would you check? - this seems like something that needs to 
be doable from the SQL and the CLI level, and also very well documented 
(which I cannot see in your proposal).




Stefan




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 22:11 +0200, Marti Raudsepp wrote:

 I think a comment about the head-of-line blocking nature of
 streaming replication is in order. If you execute massive writes in
 async mode and then run a transaction in sync mode, its commit will be
 delayed until all the async transactions before it have been applied
 on the slave.

Not really sure I understand what you want me to add there. The case you
mention is identical whether we use the word async or sync where you
mention in async mode.

Replication doesn't wait until a sync commit is requested; it is
continuously active.

Sync rep's only addition is the reply messages.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 21:42 +0100, Stefan Kaltenbrunner wrote:
 
  Synchronous replication offers the ability to guarantee that all changes
  made by a transaction have been transferred to at least one remote
  standby server. This is an extension to the standard level of durability
  offered by a transaction commit. This is referred to as semi-synchronous
  replication.
 
  When synchronous replication is requested, the commit of a write
  transaction will wait until confirmation that the commit record has been
  transferred successfully to at least one standby server. Waiting for
  confirmation increases the user's confidence that the changes will not
  be lost in the event of server crashes but it also necessarily increases
  the response time for the requesting transaction. The minimum wait time
  is the roundtrip time from primary to standby.
 
 hmm this is one of the main problems I see with the proposed "master is 
 sometimes aware of the standby" (as in the feedback mode) concept this 
 proposal has. If it waits for only one of the standbys there is some 
 issue with the terminology. As a DBA I would expect the master to only 
 return if ALL of the sync replication declared nodes replied ok.

Well, as a DBA, I expect it to work with just one. That's how MySQL and
Oracle work, at least. If ALL standbys must reply, it takes longer, makes the
code harder, and how do you determine what "all" is robustly, etc. Plus it's
been discussed already.

 What I'm really missing with that proposal is how people expect that 
 solution to be managed - 

What aspect do you wish to monitor? I'm happy to consider your
suggestions.

 given there is only sometimes a feedback 
 channel into the master you can't do the monitoring.

Not sure what you mean. Please explain more.

 Even if you could (which we really need!) there is nothing in the 
 proposal yet that will help to determine what the most recent standby 
 (in the case of more than 1 sync standby) might be.

Functions to determine that already exist.

  - but it would require a real standby 
 registration or at least standby management possibility on the master 
 not a halfway done one - so do we really need hot_standby_feedback as 
 part of the inital sync-rep patch?

It is a Hot Standby feature, but so tightly integrated with this code
that it isn't possible for me to submit as two separate patches.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs si...@2ndquadrant.com wrote:
 We use a single parameter to enable synchronous replication, set in
 postgresql.conf on both primary and standby servers:

 synchronous_replication = off (default) | on

 On the primary, synchronous_replication can be set for particular users
 or databases, or dynamically by applications programs.

 If more than one standby server specifies synchronous_replication, then
 whichever standby replies first will release waiting commits.

 Turning this setting off for a standby allows the administrator to
 exclude certain standby servers from releasing waiting transactions.
 This is useful if not all standby servers are designated as potential
 future primary servers. On the standby, this parameter only takes effect
 at server start.

I think it's a bad idea to use the same parameter to mean different
things on the master and standby.  You proposed this kind of double
meaning for the hot_standby parameter (possibly back when it was
called standby_connections, or something like that) and we (rightly, I
think) did not adopt that, instead ending up with wal_level to control
the master's behavior and hot_standby to control the slave's behavior.

 synchronous_replication (boolean)
        Specifies whether transaction commit will wait for WAL records
        to be replicated before the command returns a success
        indication to the client.

The word "replicated" here could be taken to mean different things,
most obviously:

- slave has received the WAL
- slave has fsync'd the WAL
- slave has applied the WAL
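The three readings correspond to successively stronger durability guarantees. A toy ordering (names are illustrative only, not actual GUC values or PostgreSQL code):

```python
# Sketch: the three possible meanings of "replicated", ordered by the
# strength of the durability guarantee each gives the committing client.
from enum import IntEnum

class ReplyLevel(IntEnum):
    RECEIVED = 1   # slave has received the WAL into memory
    FSYNCED = 2    # slave has fsync'd the WAL to disk
    APPLIED = 3    # slave has replayed the WAL (visible to queries there)

def satisfies(ack: ReplyLevel, required: ReplyLevel) -> bool:
    """A standby reply at level `ack` can release a commit waiting for
    `required` only if it is at least as strong."""
    return ack >= required

assert satisfies(ReplyLevel.FSYNCED, ReplyLevel.RECEIVED)
assert not satisfies(ReplyLevel.RECEIVED, ReplyLevel.APPLIED)
```

The point of the ordering is simply that a spec saying "replicated" without picking one of these levels leaves the commit guarantee ambiguous.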

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 22:08 +0100, Stefan Kaltenbrunner wrote:
 On 12/30/2010 10:01 PM, Simon Riggs wrote:
  On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:
 
  Still, one thing that has me concerned is that in the case of two
  slaves, you don't know which one is the more up-to-date one if you
  need to failover. It'd be nice if you could just guarantee they both
  are...
 
  Regrettably, nobody can know that, without checking.
 
 how exactly would you check? - this seems like something that needs to 
 be doable from the SQL and the CLI level, and also very well documented 
 (which I cannot see in your proposal).

This is a proposal for sync rep, not multi-node failover. I'm definitely
not going to widen the scope of this project.

Functions already exist to check the thing you're asking.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Heikki Linnakangas

On 30.12.2010 16:49, Florian Pflug wrote:

On Dec30, 2010, at 13:31 , Joachim Wieland wrote:

We return snapshot information as a chunk of data to the client. At
the same time however, we set a checksum in shared memory to protect
against modification of the snapshot. A publishing backend can revoke
its snapshot by deleting the checksum and a backend that is asked to
install a snapshot can verify that the snapshot is correct and current
by calculating the checksum and comparing it with the one in shared
memory.
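The publish/verify handshake quoted above can be sketched roughly as follows. The function names are hypothetical, a dict stands in for shared memory, and SHA-1 stands in for whichever checksum the patch actually uses:

```python
# Illustrative sketch of the proposed scheme: the snapshot itself travels
# through the client as an opaque blob; only a checksum lives in shared
# memory, letting the publisher revoke it and importers verify it.
import hashlib

shared_memory = {}  # stands in for the per-backend checksum slot

def publish_snapshot(backend_id, snapshot_bytes):
    """Record a checksum in 'shared memory'; hand the blob to the client."""
    shared_memory[backend_id] = hashlib.sha1(snapshot_bytes).hexdigest()
    return snapshot_bytes

def revoke_snapshot(backend_id):
    """Publisher withdraws its snapshot by deleting the checksum."""
    shared_memory.pop(backend_id, None)

def install_snapshot(backend_id, snapshot_bytes):
    """Importer recomputes the checksum and compares before installing."""
    expected = shared_memory.get(backend_id)
    if expected is None:
        raise ValueError("snapshot revoked or unknown")
    if hashlib.sha1(snapshot_bytes).hexdigest() != expected:
        raise ValueError("snapshot was modified in transit")
    return True

blob = publish_snapshot(42, b"xmin=100 xmax=110 xip=103,105")
assert install_snapshot(42, blob)                 # unmodified: accepted
try:
    install_snapshot(42, b"xmin=100 xmax=110 xip=103")
except ValueError:
    pass                                          # tampered: rejected
revoke_snapshot(42)
try:
    install_snapshot(42, blob)
except ValueError:
    pass                                          # revoked: rejected
```

Florian's question below is whether this verification machinery is worth having at all, given that only read-only importers are envisaged.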


We'd still have to stream these checksums to the standbys though,
or would they be exempt from the checksum checks?

I still wonder whether these checks are worth the complexity. I
believe we'd only allow snapshot modifications for read-only queries
anyway, so what point is there in preventing clients from setting
broken snapshots?


Hmm, our definition of "read-only" is a bit fuzzy. While a transaction 
doesn't modify the database itself, it could still send NOTIFYs or call 
a PL function to do all sorts of things outside the database. Imagine 
that you're paranoid about data integrity, and have a security definer 
function that runs cross checks on the data. If it finds any 
anomalies, it wakes up the operator or forces shutdown or similar.


Now a malicious user could set a snapshot that passes the basic validity 
checks, i.e. xmin >= GlobalXmin, but contains a combination of still 
in-progress transactions that never existed in reality. If he then calls the 
paranoia-function, it would see an inconsistent state of committed 
tuples and get upset.


Maybe that's a bit far-fetched, but it's not entirely clear that 
running with an inconsistent snapshot is harmless.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 3:42 PM, Stefan Kaltenbrunner
ste...@kaltenbrunner.cc wrote:
 synchronous replication for high performance applications. This feature
 is unique to PostgreSQL.

 that seems to be a bit too much marketing for a reference level document

+1.

 It also does not address the more general (not sync rep specific) problem of
 how to deal with max_keep_segments which is a wart and I was hoping we could
 get rid of in 9.1 - but it would require a real standby registration or at
 least standby management possibility on the master not a halfway done one -
 so do we really need hot_standby_feedback as part of the inital sync-rep
 patch?

And this is really the key point on which previous discussions of sync
rep stalled.  Simon is clearly of the opinion that any system where
the slaves have individual identities (aka standby registration)
is a bad idea, but the only justification he's offered for that
position is the assertion that it doesn't allow any added
functionality.  As you point out, and as has been pointed out before,
this is not true, but unless Simon has changed his position since the
last time we discussed this, he will not only refuse to include any
kind of standby identifier in any of his proposals, but will also
argue against including any such code even if it is written by someone
else.  I don't understand why, but that's how it is.

Synchronous replication would probably be done and committed by now if
it weren't for this issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing

2010-12-30 Thread Heikki Linnakangas

On 30.12.2010 10:55, Mark Kirkwood wrote:

Removing the offending

--delete --exclude=backup_label

options from the base backup step makes everything work properly again.


I don't see why --delete would make any difference, but you shouldn't 
exclude backup_label from the base backup. The backup label file is an 
important part of the online backup; the backup cannot be recovered safely 
without it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing

2010-12-30 Thread Mark Kirkwood

On 31/12/10 11:01, Heikki Linnakangas wrote:

On 30.12.2010 10:55, Mark Kirkwood wrote:

Removing the offending

--delete --exclude=backup_label

options from the base backup step makes everything work properly again.


I don't see why --delete would make any difference, but you shouldn't 
exclude backup_label from the base backup. The backup label file is an 
important part of the online backup; the backup cannot be recovered safely 
without it.




Yes, you (and Robert) are entirely correct, I was confused in my 
understanding of --delete --exclude=backup_label and thought it 
meant "exclude the backup label from the delete". Yeah, the --delete is 
harmless; it is the exclusion of backup_label that is causing the problem.


Note to all current Pitrtools users, this impacts you! We need to get a 
corrected version out soon I would think.


I note that this "uninitialized pages" issue with standbys has cropped up from 
time to time - I wonder if in most/all the cases folk were using Pitrtools?


regards

Mark





Re: [HACKERS] Re: Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing

2010-12-30 Thread Mark Kirkwood

On 31/12/10 11:11, Mark Kirkwood wrote:


Yes, you (and Robert) are entirely correct, I was confused in my 
understanding of --delete --exclude=backup_label and thought it 
meant "exclude the backup label from the delete". Yeah, the --delete 
is harmless; it is the exclusion of backup_label that is causing the problem.


Note to all current Pitrtools users, this impacts you! We need to get 
a corrected version out soon I would think.




Also (not surprisingly) I can confirm that data corruption is possible:

1/ Perform approx 14 transactions against the primary
2/ Cancel Pgbench
3/ Issue SELECT pg_switch_xlog() on primary
4/ Bring up standby after checking it has applied last log

The resulting primary and standby should be identical, but:

primary:

bench=# SELECT count(*) FROM branches;
 count
---
   100

bench=# SELECT count(*) FROM accounts;
  count
--
 1000

standby:

bench=# SELECT count(*) FROM branches;
 count
---
   132

bench=# SELECT count(*) FROM accounts;
  count
-
 9998269

The other counts are the same. We have lost some accounts records, but 
have gained duplicates in branches:


bench=# REINDEX TABLE branches;
ERROR:  could not create unique index "branches_pkey"
DETAIL:  Table contains duplicated values.

regards

Mark




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Jim Nasby
On Dec 30, 2010, at 3:27 PM, Robert Haas wrote:
 synchronous_replication (boolean)
Specifies whether transaction commit will wait for WAL records
to be replicated before the command returns a success
indication to the client.
 
 The word replicated here could be taken to mean different things,
 most obviously:
 
 - slave has received the WAL
 - slave has fsync'd the WAL
 - slave has applied the WAL

I think that comment is valid for the entire set of docs, actually. The 
document goes out of its way to avoid simple phrases like "replicated", but 
doesn't spell out exactly what is happening, i.e.:

Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication.

Reading that, I'm left with the sense that this isn't a simple matter of "Oh, 
the data has been replicated to the slave before commit returns", but nothing 
does a good job of clearly explaining what the distinction is and what it 
means. This section:

The guarantee we offer is that the application will not receive
explicit acknowledgement of the successful commit of a transaction until
the WAL data is known to be safely received by the standby. Hence this
mechanism is technically semi synchronous rather than fully
synchronous replication.

does provide some enlightenment, but it's at the end of the section. I think it 
would be best if there was a section right at the beginning that talked about 
the data quality issue of sync replication and how we're avoiding it with our 
semi-sync solution.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 16:27 -0500, Robert Haas wrote:

 I think it's a bad idea to use the same parameter to mean different
 things on the master and standby.  

Obviously if you phrase it like that, nobody would disagree. I would say
I have used the same parameter on both sides in a balanced way to
simplify the configuration, which had been an important factor in the
debate.

"You need to set parameter X on both primary and standby" seems simple
and clear. It certainly works OK for MySQL.

It's no bother to change, whichever way we decide and I'm happy to do
so.

My previous patch had two parameters:

primary: synchronous_replication = ...
standby: synchronous_replication_service = on | off

Which do people prefer?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 16:47 -0500, Robert Haas wrote:
 On Thu, Dec 30, 2010 at 3:42 PM, Stefan Kaltenbrunner
 ste...@kaltenbrunner.cc wrote:
  synchronous replication for high performance applications. This feature
  is unique to PostgreSQL.
 
  that seems to be a bit too much marketing for a reference level document
 
 +1.

I've removed the "This feature is unique to PostgreSQL" sentence, which I agree
belongs in a press release, not docs. The explanation of a use case that
would benefit from the feature seems valid and I've left that in.

PostgreSQL docs are more technical and precise than any other DBMS, even
DB2. Having read everybody else's docs, I'm inclined to say it would be
easier to explain if I left out the details, as they do. You won't find
a detailed explanation of commit guarantees in MySQL docs, for example.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE

2010-12-30 Thread Noah Misch
On Thu, Dec 30, 2010 at 12:57:45AM -0500, Robert Haas wrote:
 On Thu, Dec 30, 2010 at 12:24 AM, Noah Misch n...@leadboat.com wrote:
  On Wed, Dec 29, 2010 at 11:14:37PM -0500, Robert Haas wrote:
  I think for any pair of types (T1, T2) we should first determine
  whether we can skip the scan altogether. ?If yes, we're done. ?If no,
  then we should have a way of determining whether a verify-only scan is
  guaranteed to be sufficient (in your terminology, the verification
  scan is guaranteed to return either positive or error, not negative).
  If yes, then we do a verification scan. ?If no, we do a rewrite.
 
  How would we answer the second question in general?
 
 I am not sure - I guess we'd need to design some sort of mechanism for that.

Okay, here goes.  Given a Var varexpr representing the column we're changing
and an expression tree expr we need to answer two questions (argument lists
simplified -- assume the same RTEs in all cases):

always-noop: Will datumIsEqual(ExecEvalExpr(varexpr), ExecEvalExpr(expr))
return true or yield an error for all possible tuples?
never-error: Will ExecEvalExpr(expr) never throw an error?

Currently we're only interested in the second question when the first is also
true; I'm not sure if there's something fundamental there, or just an artifact
of current needs.  To support answering these questions, extend the CREATE CAST
changes from my earlier proposal, modifying the exemptor signature to return an
int, a bitmask containing one bit for each of these two questions.  Call the
function in find_typmod_coercion_function.  If its return value answers yes to
both questions, return COERCION_PATH_NONE, resulting in omission of the length
coercion node.  For other verdicts, generate the FuncExpr as normal and insert
the verdict in a new FuncExpr field funcexempt.  (That need not increase the
size of FuncExpr, if that's a concern.)

ATPrepAlterColumnType, having generated its transformation expression, will call
a new function that recursively walks the tree to answer the two questions.  The
walker will apply these rules:

1. For a Var with the varno/varattno in question, intrinsically yes to both.
2. A RelabelType node inherits the answers of its sole argument.
3. A CoerceToDomain node inherits the always-noop answer of its sole argument.
When GetDomainConstraints() == NIL, it also inherits the never-error answer.
Otherwise, never-error becomes no.
4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt field
and the answers from its first argument.
5. Any other node answers no to both questions.

If the transformation expression root has yes to both questions, we're done
with no scan.  If only always-noop is true, we do a verification scan only.
Otherwise, we optimize nothing and do a rewrite.
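A rough model of that walker, with rules 1-5 applied bottom-up. This is illustrative Python, not the actual C node structures; `funcexempt` uses bit 0 for always-noop and bit 1 for never-error, as in the proposal:

```python
# Sketch of the proposed transformation-expression walker. Each node
# yields (always_noop, never_error) per Noah's rules 1-5.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                         # 'Var', 'RelabelType', 'CoerceToDomain', 'FuncExpr', ...
    args: List["Node"] = field(default_factory=list)
    varattno: int = 0                 # for Var
    funcexempt: int = 0               # for FuncExpr: bit 0 = always-noop, bit 1 = never-error
    domain_constraints: bool = False  # for CoerceToDomain: GetDomainConstraints() != NIL

def walk(node, target_attno):
    if node.kind == "Var":
        ok = node.varattno == target_attno
        return ok, ok                                        # rule 1 (else rule 5)
    if node.kind == "RelabelType":
        return walk(node.args[0], target_attno)              # rule 2
    if node.kind == "CoerceToDomain":
        noop, noerr = walk(node.args[0], target_attno)
        return noop, (noerr and not node.domain_constraints) # rule 3
    if node.kind == "FuncExpr":
        noop, noerr = walk(node.args[0], target_attno)
        return (noop and bool(node.funcexempt & 1),
                noerr and bool(node.funcexempt & 2))         # rule 4
    return False, False                                      # rule 5

# E.g. a widening length coercion whose exemptor answers yes to both:
expr = Node("FuncExpr", [Node("Var", varattno=1)], funcexempt=0b11)
assert walk(expr, 1) == (True, True)    # (yes, yes): skip the scan entirely
# A narrowing coercion that may error on over-length values:
expr2 = Node("FuncExpr", [Node("Var", varattno=1)], funcexempt=0b01)
assert walk(expr2, 1) == (True, False)  # always-noop only: verification scan
```

The root's pair then maps directly onto the three outcomes in the last paragraph: (yes, yes) means no scan, (yes, no) means verify-only, anything else means rewrite.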

Thoughts?

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 18:47 -0600, Jim Nasby wrote:
 On Dec 30, 2010, at 3:27 PM, Robert Haas wrote:
  synchronous_replication (boolean)
 Specifies whether transaction commit will wait for WAL records
 to be replicated before the command returns a success
 indication to the client.
  
  The word "replicated" here could be taken to mean different things,
  most obviously:
  
  - slave has received the WAL
  - slave has fsync'd the WAL
  - slave has applied the WAL
 
 I think that comment is valid for the entire set of docs, actually. The 
 document goes out of its way to avoid simple phrases like "replicated", but 
 doesn't spell out exactly what is happening, ie:
 
 Synchronous replication offers the ability to guarantee that all changes
 made by a transaction have been transferred to at least one remote
 standby server. This is an extension to the standard level of durability
 offered by a transaction commit. This is referred to as semi-synchronous
 replication.
 
 Reading that, I'm left with the sense that this isn't a simple matter of "Oh, 
 the data has been replicated to the slave before commit returns", but nothing 
 does a good job of clearly explaining what the distinction is and what it 
 means. This section:
 
 The guarantee we offer is that the application will not receive
 explicit acknowledgement of the successful commit of a transaction until
 the WAL data is known to be safely received by the standby. Hence this
 mechanism is technically semi synchronous rather than fully
 synchronous replication.
 
 does provide some enlightenment, but it's at the end of the section. I think 
 it would be best if there was a section right at the beginning that talked 
 about the data quality issue of sync replication and how we're avoiding it 
 with our semi-sync solution.

I'm happy to change the docs. It's the first draft...

If that's the only problem you've got, then I'm feeling good.

Any problems with the user interface itself?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Simon Riggs
On Thu, 2010-12-30 at 16:47 -0500, Robert Haas wrote:

  It also does not address the more general (not sync-rep-specific) problem of
  how to deal with max_keep_segments, which is a wart I was hoping we could
  get rid of in 9.1 - but that would require real standby registration, or at
  least a standby management possibility on the master, not a halfway-done one -
  so do we really need hot_standby_feedback as part of the initial sync-rep
  patch?
 
 And this is really the key point on which previous discussions of sync
 rep stalled.  Simon is clearly of the opinion that any system where
 the slaves have individual identities (aka standby registration)
 is a bad idea, but the only justification he's offered for that
 position is the assertion that it doesn't allow any added
 functionality.  As you point out, and as has been pointed out before,
 this is not true, but unless Simon has changed his position since the
 last time we discussed this, he will not only refuse to include any
 kind of standby identifier in any of his proposals, but will also
 argue against including any such code even if it is written by someone
 else.  I don't understand why, but that's how it is.
 
 Synchronous replication would probably be done and committed by now if
 it weren't for this issue.

I'm not very clear what your response has to do with Stefan's comments.

My general perspective is that MySQL released a simple design a year
ahead of us, which should be to our collective shame. I will be working
towards delivering something useful in this release.

Standby registration is complicated and not necessary. If anybody needs
to justify anything, it is the people that claim it is somehow
essential. If you want increased complexity and features, you can have
it, one day, but don't prevent everybody else from benefiting from
simplicity, now. What we do need is performance, otherwise the feature
is mostly unusable for production systems, without splitting your
application into pieces.

I would rather concentrate on a minimal set of functionality that we can
all agree on. To show that, I have gone out of my way to include
features specified by others, including exact names and behaviours of
parameters.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On further reflection, this can still turn into a laundry list in certain 
 cases.

 DETAIL: You can only comment on columns of tables, views, and composite types.

 seems less helpful than:

 DETAIL: Comments on relations with system-generated column names are
 not supported.

 I think that for rules, triggers, constraints, and anything that only
 works on a single relkind, we can't do much better than to list the
 specific object types.  But where there's some sort of guiding
 principle involved I think we'd do well to articulate it.

I'm unconvinced, because the guiding principle is likely to be an
implementation detail that won't actually mean much to users.  Your
example above is a case in point --- I do *not* think the average
user will see that as an improvement.

regards, tom lane



Re: [HACKERS] estimating # of distinct values

2010-12-30 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 I was thinking that we could have two different ANALYZE modes, one
 full and one incremental; autovacuum could be modified to use one or
 the other depending on how many changes there are (of course, the user
 could request one or the other, too; not sure what should be the default
 behavior).

How is an incremental ANALYZE going to work at all?  It has no way to
find out the recent changes in the table, for *either* inserts or
deletes.  Unless you want to seqscan the whole table looking for tuples
with xmin later than something-or-other ... which more or less defeats
the purpose.

regards, tom lane



Re: [HACKERS] Old git repo

2010-12-30 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 Personally, my utility for the old repo is not much (if it was anything
 important, I wouldn't have relied on the unofficial repo). But we should
 probably give a little bit of warning for folks that might want to
 rebase or translate some old notes.

Well, I guess the question is how much warning.  I suggested O(1 week)
but Robert seems to want O(1 year).  As long as there's some agreed
deadline, I'm not very picky about what it is.

regards, tom lane




Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 8:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On further reflection, this can still turn into a laundry list in certain 
 cases.

 DETAIL: You can only comment on columns of tables, views, and composite 
 types.

 seems less helpful than:

 DETAIL: Comments on relations with system-generated column names are
 not supported.

 I think that for rules, triggers, constraints, and anything that only
 works on a single relkind, we can't do much better than to list the
 specific object types.  But where there's some sort of guiding
 principle involved I think we'd do well to articulate it.

 I'm unconvinced, because the guiding principle is likely to be an
 implementation detail that won't actually mean much to users.  Your
 example above is a case in point --- I do *not* think the average
 user will see that as an improvement.

I think this thread has worked itself around to where it's entirely
pointless.  My original complaint was about error messages like this:

%s is not a table, view, composite type, or index

which, once we have foreign tables, needs to be changed to read:

%s is not a table, view, composite type, index, or foreign table

I think that message is the epitome of worthless, and several other
people agreed.  After various proposals of greater and lesser merit,
we've somehow worked around to the suggestion that this should be
reworded to:

ERROR: %s is a sequence
DETAIL: Only attributes of tables, views, composite types, indexes, or
foreign tables can be renamed.

While that may be a marginal improvement in clarity, it does
absolutely nothing to address my original complaint, which is that
adding a relkind forces trivial revisions of messages all over the
system, some of which are already excessively long-winded.  This
message also does nothing to help the user understand WHY we don't
allow renaming the attributes of his sequence or TOAST table, whereas
the proposed revision does.

The absolute worst offenders are messages of the form:

blah is not supported on X, Y, Z, or T.

which now have to be revised to read:

blah is not supported on X, Y, Z, T, or W.

This problem could be avoided by writing:

blah is supported on A and B

Or:

blah is supported only for relation types which quack

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I think this thread has worked itself around to where it's entirely
 pointless.

I understand your frustration, but it's not clear to me that there *is*
any simple solution to this problem.  Fundamentally, adding new relkinds
to the system is always going to require running around and looking at a
lot of code to see what's affected; and that goes for the error messages
too.  I put no stock at all in the idea that writing a guiding
principle in the error messages will avoid anything, because as often
as not, adding a fundamentally new relkind is going to involve some
tweaking of what those principles are.

 ... This message also does nothing to help the user understand WHY we don't
 allow renaming the attributes of his sequence or TOAST table, whereas
 the proposed revision does.

I remain unconvinced that the average user cares, or will be able to
extrapolate the message to understand what's supported or not, even
if he does care about the reason for the restriction.

regards, tom lane



Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Joachim Wieland
On Thu, Dec 30, 2010 at 9:40 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Disadvantage of b: It doesn't allow a snapshot to be installed on a
 different server. It requires a serializable open transaction to hold
 the snapshot.

 Why does it require a serializable transaction?  You could simply
 register the snapshot in any transaction.  (Of course, the net effect
 would be pretty similar to a serializable transaction).

I am not assuming that the publishing transaction blocks until its
snapshot is picked up. A read-committed transaction would get a new
snapshot for every subsequent query, so the published snapshot is no
longer represented by an actual backend until it is picked up by
one. Since nobody is holding off xmin/GlobalXmin, eventually vacuum
would remove tuples that the published-but-not-yet-picked-up snapshot
should still be able to see, no?

Joachim



Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Joachim Wieland
On Thu, Dec 30, 2010 at 9:49 AM, Florian Pflug f...@phlo.org wrote:
 On Dec30, 2010, at 13:31 , Joachim Wieland wrote:
 We return snapshot information as a chunk of data to the client. At
 the same time however, we set a checksum in shared memory to protect
 against modification of the snapshot. A publishing backend can revoke
 its snapshot by deleting the checksum and a backend that is asked to
 install a snapshot can verify that the snapshot is correct and current
 by calculating the checksum and comparing it with the one in shared
 memory.

 We'd still have to stream these checksums to the standbys though,
 or would they be exempt from the checksum checks?

I am not talking about having synchronized snapshots among standby
servers at all.

I am only proposing a client API that will work for this future idea as well.


 I still wonder whether these checks are worth the complexity. I
 believe we'd only allow snapshot modifications for read-only queries
 anyway, so what point is there in preventing clients from setting
 broken snapshots?

What's the use case for it? As long as nobody comes up with a
reasonable use case for it, let's aim for the robust version.
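
The publish/verify scheme quoted above can be modeled in a few lines (a toy
Python sketch; the dict standing in for shared memory and the SHA-256 choice
are illustrative assumptions, not anything a patch specifies):

```python
import hashlib

# Stand-in for the per-backend checksum slots in shared memory.
shared_memory = {}

def publish_snapshot(backend_id, snapshot_bytes):
    # Hand the snapshot data to the client, keeping only its checksum
    # server-side so the snapshot itself need not be stored.
    shared_memory[backend_id] = hashlib.sha256(snapshot_bytes).hexdigest()
    return snapshot_bytes

def revoke_snapshot(backend_id):
    # The publishing backend withdraws its snapshot by deleting the checksum.
    shared_memory.pop(backend_id, None)

def install_snapshot(backend_id, snapshot_bytes):
    # A backend asked to adopt a snapshot recomputes the checksum and
    # compares it against shared memory; mismatch means stale or tampered.
    digest = hashlib.sha256(snapshot_bytes).hexdigest()
    if shared_memory.get(backend_id) != digest:
        raise ValueError("snapshot is stale or was modified")
    return True
```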


Joachim



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Joshua Tolley
On Thu, Dec 30, 2010 at 03:24:09PM -0500, Aidan Van Dyk wrote:
 On Thu, Dec 30, 2010 at 3:07 PM, Robert Treat r...@xzilla.net wrote:
 
  If primary crashes while commits are waiting for acknowledgement, those
  transactions will be marked fully committed if the primary database
  recovers, no matter how allow_standalone_primary is set.
 
  This seems backwards; if you are waiting for acknowledgement, wouldn't the
  normal assumption be that the transactions *didn't* make it to any standby,
  and should be rolled back?
 
 This is the standard 2-phase commit problem.  The primary server *has*
 committed it, its fsync has returned, and the only thing keeping it
 from returning the commit to the client is that it's waiting on a
 synchronous ack from a slave.

snip

 2) initiate fsync on the primary first
 - In this case, the slave is always slightly behind.  If your
 primary falls over, you don't give commit messages to the clients, but
 if it recovers, it might have committed data, and slaves will still be
 able to catch up.
 
 The thing is that currently, even without replication, #2 can happen.

For what little it's worth, I vote for this option, because it's a problem
that can already happen (as opposed to adding an entirely new type of problem
to the mix).

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 8:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I'm not very clear what your response has to do with Stefan's comments.

 My general perspective is that MySQL released a simple design a year
 ahead of us, which should be to our collective shame. I will be working
 towards delivering something useful in this release.

I don't feel ashamed of our feature set and I am not out to beat MySQL
or anyone else, just to deliver the best product that we can.  Our
community has different interests than the MySQL community and that is
fine.  Still, I don't disagree that we should be aiming at feature
parity.

*reads MySQL documentation*

I see now that you've tried to design this feature in a way that is
similar to MySQL's offering, which does have some value.  But it
appears to me that the documentation you've written here is
substantially similar to the MySQL 5.5 reference documentation.  That
could get us into a world of legal trouble - that documentation is not
even open source, let alone BSD.

http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html

 I would rather concentrate on a minimal set of functionality that we can
 all agree on.

Me too; and perhaps your proposal is it.  But I think it's a shame we
didn't put more work into standby registration when we had time to get
that done.  It might not be necessary, but it would have delivered
some nice functionality that we are now not going to have for 9.1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 9:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think this thread has worked itself around to where it's entirely
 pointless.

 I understand your frustration, but it's not clear to me that there *is*
 any simple solution to this problem.  Fundamentally, adding new relkinds
 to the system is always going to require running around and looking at a
 lot of code to see what's affected; and that goes for the error messages
 too.  I put no stock at all in the idea that writing a guiding
 principle in the error messages will avoid anything, because as often
 as not, adding a fundamentally new relkind is going to involve some
 tweaking of what those principles are.

I think that's true in some cases but not all.  The system-generated
attribute names thing actually applies in several cases, and I think
it's pretty cut-and-dried.  When you get into something like which
kinds of relations support triggers, that's a lot more arbitrary.

 ... This message also does nothing to help the user understand WHY we don't
 allow renaming the attributes of his sequence or TOAST table, whereas
 the proposed revision does.

 I remain unconvinced that the average user cares, or will be able to
 extrapolate the message to understand what's supported or not, even
 if he does care about the reason for the restriction.

I'm convinced, but that only makes one of us.  I think for now what I
had better do is try to get this SQL/MED patch finished up by
soldiering through this mess rather than trying to fix it.  I think
it's going to be kind of ugly, but if we haven't got another plan then
we're just going to have to live with the ugliness.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Problems with autovacuum and vacuum

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 12:56 PM, JotaComm jota.c...@gmail.com wrote:
 Last week I had a serious problem with my PostgreSQL database. My autovacuum
 is OFF, but in September it started running to prevent transaction wraparound;
 however, last week the following message appeared continuously in my log:

 WARNING: database "production" must be vacuumed within 4827083 transactions
 HINT: To avoid a database shutdown, execute a full-database VACUUM in
 "production".

 This message appeared for five to six hours; after that, the message
 disappeared from log. Any idea about what could have happened?

I'm thinking that autovacuum kicked into gear to prevent transaction
wraparound.  Once it did enough work to stave off disaster, the
warning messages stopped appearing in the log.

 Every day a vacuum is executed on some tables, and on Sundays it's
 executed on all tables. But since autovacuum has been running since September,
 and it runs for a long time, the manual vacuum was blocked because autovacuum
 had been running on the same table. How should I proceed in this case?

I guess the obvious thing to do would be to turn on autovacuum and
forget about manual vacuums.
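
For what it's worth, the remaining wraparound headroom can be watched with a
standard catalog query (age() and these catalogs exist in 8.3; the exact
warning threshold depends on configuration):

```sql
-- How close is each database to the wraparound limit?
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- And the oldest tables within the current database:
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;
```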

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE

2010-12-30 Thread Robert Haas
On Thu, Dec 30, 2010 at 8:35 PM, Noah Misch n...@leadboat.com wrote:
 4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt
 field and the answers from its first argument.

Why its first argument?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Avoiding rewrite in ALTER TABLE ALTER TYPE

2010-12-30 Thread Noah Misch
On Fri, Dec 31, 2010 at 12:34:50AM -0500, Robert Haas wrote:
 On Thu, Dec 30, 2010 at 8:35 PM, Noah Misch n...@leadboat.com wrote:
  4. A FuncExpr node has answers given by the bitwise-AND of its funcexempt
  field and the answers from its first argument.
 
 Why its first argument?

funcexempt would only be nonzero for FuncExpr of length coercion casts.  Those
have the subject datum as a first argument, typmod as second, and is-explicit
boolean as third.  The other arguments are effectively already validated.

That brings up a point -- the exemptor function also needs an is-explicit
argument, as that affects the decision for some types.
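
To make the shape concrete, here is a hypothetical exemptor for a
varchar-style length coercion (Python purely for illustration; real typmods
encode the length with a header offset, which this sketch ignores, and the
bit values are assumptions):

```python
# Bitmask answering the two questions from the proposal.
ALWAYS_NOOP = 0x1
NEVER_ERROR = 0x2

def varchar_exemptor(old_typmod, new_typmod, is_explicit):
    # Widening the limit, or removing it (typmod < 0), can neither change
    # the value nor raise an error.
    if new_typmod < 0 or (old_typmod >= 0 and new_typmod >= old_typmod):
        return ALWAYS_NOOP | NEVER_ERROR
    # An explicit narrowing cast truncates rather than erroring: not a
    # no-op, but it can never raise.
    if is_explicit:
        return NEVER_ERROR
    # Implicit narrowing may raise "value too long".
    return 0
```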



Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Hannu Krosing

On 30.12.2010 22:27, Robert Haas wrote:

 On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs si...@2ndquadrant.com wrote:

  synchronous_replication (boolean)
  Specifies whether transaction commit will wait for WAL records
  to be replicated before the command returns a success
  indication to the client.

 The word "replicated" here could be taken to mean different things,
 most obviously:

 - slave has received the WAL
 - slave has fsync'd the WAL
 - slave has applied the WAL
Perhaps the level of replication guarantee should be decided on the
slave side, by having a configuration parameter there

report_as_replicated = received|written_to_disk|fsynced|applied

since different types of hosts may have wildly different guarantees and
performance parameters for these. One could envision a WAL-archive type
standby which is there for data persistence only and will never apply
WAL.

Of course we could put a bitmap in the status update messages from the
slave and have some quorum options on the master for when the data is
considered in sync, say "need 5 received or (1 applied and 1 fsynced)",
but I am pretty sure that trying to get anywhere with this before
applying the basic sync rep patch would push back sync rep to at least
9.2 if not 9.5.
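
That quorum idea can be sketched as follows (illustrative Python; the level
names follow the report_as_replicated values above, and the rule encoding is
my own assumption):

```python
# Ordered weakest-to-strongest: a standby at a later level implicitly
# satisfies all earlier ones.
LEVELS = ["received", "written_to_disk", "fsynced", "applied"]

def at_least(reports, level, n):
    # Count standbys whose reported level is at least `level`.
    rank = LEVELS.index(level)
    return sum(1 for r in reports if LEVELS.index(r) >= rank) >= n

def quorum_satisfied(reports):
    # Example rule: "need 5 received or (1 applied and 1 fsynced)".
    return at_least(reports, "received", 5) or (
        at_least(reports, "applied", 1) and at_least(reports, "fsynced", 1))
```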


-
Hannu Krosing




Re: [HACKERS] Sync Rep Design

2010-12-30 Thread Hannu Krosing

On 31.12.2010 6:02, Robert Haas wrote:

 On Thu, Dec 30, 2010 at 8:57 PM, Simon Riggs si...@2ndquadrant.com wrote:

  I'm not very clear what your response has to do with Stefan's comments.

  My general perspective is that MySQL released a simple design a year
  ahead of us, which should be to our collective shame. I will be working
  towards delivering something useful in this release.

 I don't feel ashamed of our feature set and I am not out to beat MySQL
 or anyone else, just to deliver the best product that we can.
The key word here is "deliver".  The aim is to deliver sync rep, not
just specify it, leaving out controversial details. The registration
part has been left out for a reason - while the registration itself is
easy, deciding all the interactions with already-running replication is
not. Doing just the minimal support for sync rep (needing an
acknowledgement from at least one standby) and leaving the management
of standbys to the user enables us to get to actual working code
instead of a pie-in-the-sky wishlist.


 Our community has different interests than the MySQL community and that
 is fine.  Still, I don't disagree that we should be aiming at feature
 parity.

 *reads MySQL documentation*

 I see now that you've tried to design this feature in a way that is
 similar to MySQL's offering, which does have some value.  But it
 appears to me that the documentation you've written here is
 substantially similar to the MySQL 5.5 reference documentation.  That
 could get us into a world of legal trouble - that documentation is not
 even open source, let alone BSD.

 http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html
Maybe we should get someone who has not read the MySQL docs to re-write
a spec in a clean-room fashion, by just inspecting the code and asking
Simon et al.


  I would rather concentrate on a minimal set of functionality that we can
  all agree on.

 Me too; and perhaps your proposal is it.  But I think it's a shame we
 didn't put more work into standby registration when we had time to get
 that done.

When you need _just_ the registration, then make a table and two
functions, pg_standby_register(name) and pg_standby_unregister(name).
For a little more added functionality, add a third one,
pg_standby_last_seen(name), to update a last-seen timestamp, and a
script that polls all standbys and calls it.
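
A minimal sketch of such a registry (hypothetical names, illustrative SQL
only, not a proposed patch):

```sql
CREATE TABLE standby_registry (
    name      text PRIMARY KEY,
    last_seen timestamptz
);

CREATE FUNCTION pg_standby_register(n text) RETURNS void AS $$
    INSERT INTO standby_registry(name) VALUES (n);
$$ LANGUAGE sql;

CREATE FUNCTION pg_standby_unregister(n text) RETURNS void AS $$
    DELETE FROM standby_registry WHERE name = n;
$$ LANGUAGE sql;

CREATE FUNCTION pg_standby_last_seen(n text) RETURNS void AS $$
    UPDATE standby_registry SET last_seen = now() WHERE name = n;
$$ LANGUAGE sql;
```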

 It might not be necessary, but it would have delivered
 some nice functionality that we are now not going to have for 9.1.
There are tons of nice functionality we are not going to have for 9.1;
let's just not let this cause even more nice functionality to be left
out!

-
Hannu Krosing

