Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Simon Riggs
On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
 On Fri, Feb 18, 2011 at 7:06 PM, Simon Riggs si...@2ndquadrant.com wrote:
  The patch is a very light touch on a few areas of code, plus a chunk of
  specific code, all on the master side. Pretty straightforward, really.
  I'm sure problems will be found; it's not long since I completed this.
  Thanks to Daniel Farina for your help with patch assembly.
 
 This looks like it's in far better shape than the previous version.
 Thanks to you and Dan for working on it.
 
 As you have this coded, enabling synchronous_replication forces all
 transactions to commit synchronously.  This means that, when
 synchronous_replication=on, you get synchronous_commit=on behavior
 even if you have synchronous_commit=off.  I'd prefer to see us go the
 other way, and say that you can only get synchronous replication when
 you're also getting synchronous commit.

First, we should be clear that you are referring to the fact that the
request
  synchronous_commit = off
  synchronous_replication = on
makes no sense in the way the replication system is currently designed,
even though making it work is a wish-list item for 9.2+.

Since that combination makes no sense, we need to decide how we will
react when it is requested. I think failing safe is the best way: make
the default the safe behaviour and allow performance options in
non-default paths.

Hence the above combination of options is taken in the patch to be the
same as
  synchronous_commit = on
  synchronous_replication = on

If you want performance, you can still get it with 
  synchronous_commit = off
  synchronous_replication = off
which does have meaning.
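In postgresql.conf terms the three cases are (a sketch; parameter names
as in this patch version, and the meaningless request is simply treated
as the safe case):

  # safe default: wait for local flush and for the sync standby
  synchronous_commit = on
  synchronous_replication = on

  # performance: wait for neither
  synchronous_commit = off
  synchronous_replication = off

  # requested but meaningless today: behaves like the safe case
  # synchronous_commit = off
  # synchronous_replication = on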

So the way the patch is coded makes the most sense for a safety feature.
I think it also needs to be documented.

Must fly now...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Simon Riggs
On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
 Simon Riggs wrote:
  Most parameters are set on the primary. Set
  
primary: synchronous_standby_names = 'node1, node2, node3'
  
  which means that whichever of those standbys connect first will
  become the main synchronous standby. Servers arriving later will
  be potential standbys (standby standbys doesn't sound right...).
  synchronous_standby_names can change at reload.
  
  Currently, the standby_name is the application_name parameter
  set in the primary_conninfo.
  
  When we set this for a client, or in postgresql.conf
  
primary: synchronous_replication = on
  
  then we will wait at commit until the synchronous standby has
  reached the WAL location of our commit point.
  
  If the current synchronous standby dies then one of the other standbys
  will take over. (I think it would be a great idea to make the
  list a priority order, but I haven't had time to code that).
  
  If none of the standbys are available, then we don't wait at all
  if allow_standalone_primary is set.
  allow_standalone_primary can change at reload.
 
 Are we going to allow a command to be run when these things happen, like
 archive_command does, or is that something for 9.2?

Not really sure which events you're speaking of, sorry. As I write, I
don't see a place or a reason to run a command, but that might change
with a longer explanation of what you're thinking. I wouldn't rule out
adding something like that in this release, but I'm not around for the
next week to add it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Simon Riggs
On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:

 On the other hand, I see no particular
 harm in leaving the option in there either, though I definitely think
 the default should be changed to -1.

The default setting should be to *not* freeze up if you lose the
standby. That behaviour unexpectedly leads to an effective server-down
situation, rather than two minutes of slow running. This is supposed to
be High Availability software, so it will be bad for us if we allow that.

We've discussed this many times already. The people who wish to wait
forever have their wait-forever option, but let's not force it on
everyone.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Bruce Momjian
Simon Riggs wrote:
 On Fri, 2011-02-18 at 21:39 -0500, Bruce Momjian wrote:
  Simon Riggs wrote:
   Most parameters are set on the primary. Set
   
 primary: synchronous_standby_names = 'node1, node2, node3'
   
   which means that whichever of those standbys connect first will
   become the main synchronous standby. Servers arriving later will
   be potential standbys (standby standbys doesn't sound right...).
   synchronous_standby_names can change at reload.
   
   Currently, the standby_name is the application_name parameter
   set in the primary_conninfo.
   
   When we set this for a client, or in postgresql.conf
   
 primary: synchronous_replication = on
   
   then we will wait at commit until the synchronous standby has
   reached the WAL location of our commit point.
   
   If the current synchronous standby dies then one of the other standbys
   will take over. (I think it would be a great idea to make the
   list a priority order, but I haven't had time to code that).
   
   If none of the standbys are available, then we don't wait at all
   if allow_standalone_primary is set.
   allow_standalone_primary can change at reload.
  
  Are we going to allow a command to be run when these things happen, like
  archive_command does, or is that something for 9.2?
 
 Not really sure which events you're speaking of, sorry. As I write, I
 don't see a place or a reason to run a command, but that might change
 with a longer explanation of what you're thinking. I wouldn't rule out
 adding something like that in this release, but I'm not around for the
 next week to add it.

Sorry.  I was thinking of allowing a command to alert an administrator
when we switch standby machines, or when we can't do synchronous standby
any longer.  I assume we put a message in the logs, and the admin could
have a script that monitors that, but I thought a pager-like command
would be helpful.  Then again, I don't think we do that in any other
case, so maybe it is overkill.  These are all optional ideas.

Also, I like that you are using application name here.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pl/python invalidate functions with composite arguments

2011-02-19 Thread Peter Eisentraut
On ons, 2011-02-09 at 10:09 +0100, Jan Urbański wrote:
 On 27/01/11 22:42, Jan Urbański wrote:
  On 23/12/10 14:50, Jan Urbański wrote:
  Here's a patch implementing properly invalidating functions that have
  composite type arguments after the type changes, as mentioned in
  http://archives.postgresql.org/pgsql-hackers/2010-12/msg01991.php. It's
  an incremental patch on top of the plpython-refactor patch sent eariler.
  
  Updated to master.
 
 Again.

Committed.




Re: [HACKERS] pg_basebackup and wal streaming

2011-02-19 Thread Magnus Hagander
On Fri, Feb 18, 2011 at 20:33, Bruce Momjian br...@momjian.us wrote:
 Magnus Hagander wrote:
 Better late than never (or?), here's the final cleanup of
 pg_streamrecv for moving into the main distribution, per discussion
 back in late Dec or early Jan. It also includes the "stream logs in
 parallel to backup" part that was not completed for pg_basebackup.

 Other than that, the only changes to pg_basebackup are the moving of a
 couple of functions into streamutil.c to make them usable from both,
 and the progress format output fix Fujii-san mentioned recently.

 Should be complete except for Win32 support (needs thread/fork thing
 for the  background streaming feature. Shouldn't be too hard, and I
 guess that falls on me anyway..) and the reference documentation.

 And with no feedback to my question here
 (http://archives.postgresql.org/pgsql-hackers/2011-02/msg00805.php), I
 went with the duplicate the macros that are needed to avoid loading
 postgres.h path.

 Yes, I realize that this is far too late in the CF process really, but
 I wanted to post it anyway... If it's too late to be acceptable it
 should be possible to maintain this outside the main repository until
 9.2, since it only changes frontend binaries. So I'm not actually
 going to put it on the CF page unless someone else says that's a good
 idea, to at least share the blame from Robert ;)

 Well, if you are going to stand behind it, the CF is not a requirement
 and you can apply it.

I am, but I'm posting it here because I'd appreciate some review
before it goes in, to protect our users from my bugs :-)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Debian readline/libedit breakage

2011-02-19 Thread Martijn van Oosterhout
On Fri, Feb 18, 2011 at 02:35:42PM -0500, Bruce Momjian wrote:
  /* Get the OpenSSL structure associated with a connection. Returns NULL for
   * unencrypted connections or if any other TLS library is in use. */
  extern void *PQgetssl(PGconn *conn);
  
  We are under no compulsion to emulate OpenSSL if we switch to another
  library.  The design intent is that we'd provide a separate function
  (PQgetnss?) and callers that know how to use that library would call
  that function.  If they don't, it's not our problem.
 
 Who uses this?  ODBC?

There are a few users that I can find, anyway. psql is one; it uses this
to get information about the connection. pgAdmin does it too, for
similar reasons I guess.

Adding a separate function for each SSL library seems odd. It would
mean that psql would need the headers for every possible SSL library,
because it (in theory) doesn't know which library might be used at
runtime.

ODBC uses it as well. It really uses it for communication. As far as
Google Code Search can tell, it's the only one that does.

But if the intention is to do it by adding new functions, we can do
that and let the ODBC guys sort it out (ERROR: using incompatible SSL
connection).

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




Re: [HACKERS] Debian readline/libedit breakage

2011-02-19 Thread Martijn van Oosterhout
On Sat, Feb 19, 2011 at 07:42:20PM +0100, Martijn van Oosterhout wrote:
 ODBC uses it as well. It really uses it for communication. As far as
 Google Code Search can tell, it's the only one that does.
 
 But if the intention is to do it by adding new functions, we can do
 that and let the ODBC guys sort it out (ERROR: using incompatible SSL
 connection).

Actually, it would just break. There is currently no way to distinguish
between no-SSL and not-OpenSSL, so ODBC would use PQgetssl() to
determine the SSL status, get NULL and assume there is no SSL active,
and subsequently break when trying to communicate via the sockets.

Frankly I think that this solution doesn't meet the usual high
standards of this project...

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




Re: [HACKERS] Debian readline/libedit breakage

2011-02-19 Thread Andrew Dunstan



On 02/19/2011 01:42 PM, Martijn van Oosterhout wrote:
 On Fri, Feb 18, 2011 at 02:35:42PM -0500, Bruce Momjian wrote:
   /* Get the OpenSSL structure associated with a connection. Returns NULL for
    * unencrypted connections or if any other TLS library is in use. */
   extern void *PQgetssl(PGconn *conn);

   We are under no compulsion to emulate OpenSSL if we switch to another
   library.  The design intent is that we'd provide a separate function
   (PQgetnss?) and callers that know how to use that library would call
   that function.  If they don't, it's not our problem.

  Who uses this?  ODBC?

 There are a few users that I can find, anyway. psql is one; it uses
 this to get information about the connection. pgAdmin does it too, for
 similar reasons I guess.

 Adding a separate function for each SSL library seems odd. It would
 mean that psql would need the headers for every possible SSL library,
 because it (in theory) doesn't know which library might be used at
 runtime.

I don't see why. If you plug in a libpq that was compiled against, say,
NSS under a psql that's expecting OpenSSL you'll get a null back instead
of a pointer to an SSL object, but then that would be a silly thing to do.

 ODBC uses it as well. It really uses it for communication. As far as
 Google Code Search can tell, it's the only one that does.

 But if the intention is to do it by adding new functions, we can do
 that and let the ODBC guys sort it out (ERROR: using incompatible SSL
 connection).

Yeah. It might make sense to have a library function that tells the
client what SSL library is being used.

cheers

andrew



Re: [HACKERS] Debian readline/libedit breakage

2011-02-19 Thread Martijn van Oosterhout
On Fri, Feb 18, 2011 at 10:42:20AM -0500, Andrew Dunstan wrote:
 Could we provide an abstraction layer over whatever SSL library is in  
 use with things like read/write/poll? Maybe that's what you had in mind  
 for the passthrough mode.

The suggested interface was as follows. It basically exposes the
read/write interface that libpq itself uses. Whether it's enough for all
uses I don't know, but it was extensible.

From the patch:

+ /* Get data about current TLS connection */
+ extern PGresult *PQgettlsinfo(PGconn *conn);
+ 
  /* Tell libpq whether it needs to initialize OpenSSL */
  extern void PQinitSSL(int do_init);
  
+ /* Tell libpq we're taking over the connection. After this, no normal
+  * queries may be sent anymore. When finished you may close the connection */
+ typedef PostgresPollingStatusType (*pq_read_func)(PGconn *conn, void *buf, int *len);
+ typedef PostgresPollingStatusType (*pq_write_func)(PGconn *conn, const void *buf, int *len);
+ typedef int (*pq_pending_func)(PGconn *conn);
+ 
+ typedef struct {
+   int len;   /* Length of this structure, so users may determine if the
+                 info they require is there. For backward compatibility,
+                 new members can only be added to the end. */
+   pq_read_func read;
+   pq_write_func write;
+   pq_pending_func pending;
+ 
+ /*  char *ssllibname;   Need not yet demonstrated. */
+ /*  void *sslptr; */
+ } PQpassthrough;
+ 
+ /* The pointer returned in state must be freed with PQfreemem() */
+ extern int PQsetPassthrough(PGconn *conn, PQpassthrough **state );
+ 

-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




Re: [HACKERS] Debian readline/libedit breakage

2011-02-19 Thread Peter Eisentraut
On lör, 2011-02-19 at 13:55 -0500, Andrew Dunstan wrote:
 If you plug in a libpq that was compiled against, say, 
 NSS under a psql that's expecting OpenSSL you'll get a null back
 instead of a pointer to an SSL object, but then that would be a silly
 thing to do.

Not so silly if you consider partial upgrades.




Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Josh Berkus
On 2/18/11 4:06 PM, Simon Riggs wrote:
 v17 implements what I believe to be the final set of features for sync
 rep. This one I'm actually fairly happy with. It can be enjoyed best at
 DEBUG3.

Yes, but what wine do I serve with it?  ;-)


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Snapshot synchronization, again...

2011-02-19 Thread Peter Eisentraut
On fre, 2011-02-18 at 16:57 -0300, Alvaro Herrera wrote:
 2. is md5 the most appropriate digest for this?  If you need a
 cryptographically secure hash, do we need something stronger?  If not,
 why not just use hash_any?

MD5 is probably more appropriate than hash_any, because the latter is
optimized for speed and collision avoidance and doesn't have a
guaranteed external format.  The only consideration against MD5 might be
that it would make us look quite lame.  We should probably provide
builtin SHA1 and SHA2 functions for this and other reasons.





[HACKERS] Fix for fuzzystrmatch

2011-02-19 Thread Alexander Korotkov
Hackers,

I found two issues in the fuzzystrmatch contrib module.
1) Incorrect s_data shift in the levenshtein calculation with a
threshold and multibyte characters: the i index was used instead of
start_column.
2) Missing dependency of fuzzystrmatch.o on levenshtein.c.

Patch is attached.

--
With best regards,
Alexander Korotkov.
*** a/contrib/fuzzystrmatch/Makefile
--- b/contrib/fuzzystrmatch/Makefile
***
*** 16,18  top_builddir = ../..
--- 16,21 
  include $(top_builddir)/src/Makefile.global
  include $(top_srcdir)/contrib/contrib-global.mk
  endif
+ 
+ fuzzystrmatch.o: fuzzystrmatch.c levenshtein.c
+ 
*** a/contrib/fuzzystrmatch/levenshtein.c
--- b/contrib/fuzzystrmatch/levenshtein.c
***
*** 377,383  levenshtein_internal(text *s, text *t,
  prev[start_column] = max_d + 1;
  curr[start_column] = max_d + 1;
  if (start_column != 0)
! 	s_data += s_char_len != NULL ? s_char_len[i - 1] : 1;
  start_column++;
  			}
  
--- 377,383 
  prev[start_column] = max_d + 1;
  curr[start_column] = max_d + 1;
  if (start_column != 0)
! 	s_data += s_char_len != NULL ? s_char_len[start_column - 1] : 1;
  start_column++;
  			}
  




Re: [HACKERS] Proposal: collect frequency statistics for arrays

2011-02-19 Thread Alexander Korotkov
Thanks for the feedback on my proposal.
OK, I'll write it as a separate function. After that I'm going to look
at whether there is a way to unite them without a kluge. If I can't find
such a way, I'll propose a patch with a separate function.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] SSI bug?

2011-02-19 Thread Kevin Grittner
 Heikki Linnakangas  wrote:
 On 14.02.2011 20:10, Kevin Grittner wrote:
 
 Promotion of the lock granularity on the prior tuple is where we
 have problems. If the two tuple versions are in separate pages
 then the second UPDATE could miss the conflict. My first thought
 was to fix that by requiring promotion of a predicate lock on a
 tuple to jump straight to the relation level if nextVersionOfRow
 is set for the lock target and it points to a tuple in a different
 page. But that doesn't cover a situation where we have a heap
 tuple predicate lock which gets promoted to page granularity
 before the tuple is updated. To handle that we would need to say
 that an UPDATE to a tuple on a page which is predicate locked by
 the transaction would need to be promoted to relation granularity
 if the new version of the tuple wasn't on the same page as the old
 version.

 Yeah, promoting the original lock on the UPDATE was my first
 thought too.

 Another idea is to duplicate the original predicate lock on the
 first update, so that the original reader holds a lock on both row
 versions. I think that would ultimately be simpler as we wouldn't
 need the next-prior chains anymore.

 For example, suppose that transaction X is holding a predicate lock
 on tuple A. Transaction Y updates tuple A, creating a new tuple B.
 Transaction Y sees that X holds a lock on tuple A (or the page
 containing A), so it acquires a new predicate lock on tuple B on
 behalf of X.

 If the updater aborts, the lock on the new tuple needs to be
 cleaned up, so that it doesn't get confused with later tuple that's
 stored in the same physical location. We could store the xmin of
 the tuple in the predicate lock to check for that. Whenever you
 check for conflict, if the xmin of the lock doesn't match the xmin
 on the tuple, you know that the lock belonged to an old dead tuple
 stored in the same location, and can be simply removed as the tuple
 doesn't exist anymore.

 That said, the above is about eliminating false negatives from
 some corner cases which escaped notice until now. I don't think
 the changes described above will do anything to prevent the
 problems reported by YAMAMOTO Takashi.

 Agreed, it's a separate issue. Although if we change the way we
 handle the read-update-update problem, the other issue might go
 away too.

 Unless I'm missing something, it sounds like tuple IDs are being
 changed or reused while predicate locks are held on the tuples.
 That's probably not going to be overwhelmingly hard to fix if we
 can identify how that can happen. I tried to cover HOT issues, but
 it seems likely I missed something.

 Storing the xmin of the original tuple would probably help with
 that too. But it would be nice to understand and be able to
 reproduce the issue first.
 
I haven't been able to produce a test case yet, but I'm getting clear
enough about the issue that I think I can see my way to a good fix.
Even if I have a fix first, I'll continue to try to create a test to
show the pre-fix failure (and post-fix success), to include in the
regression suite.  This is the sort of thing which is hard enough to
hit that a regression could slip in because of some change and make
it out in a release unnoticed if we aren't specifically testing for
it.
 
It's pretty clear that the issue is this -- we are cleaning up the
predicate locks for a transaction when the transaction which read the
data becomes irrelevant; but a heap tuple predicate locked by a
transaction may be updated or deleted and become eligible for cleanup
before the transaction becomes irrelevant.  This can lead to cycles in
the target links if a tuple ID is reused before the transaction is
cleaned up.
 
Since we may want to detect the rw-conflicts with the reading
transaction based on later versions of a row after the tuple it read
has been updated and the old tuple removed and its ID reused,
Heikki's suggestion of predicate lock duplication in these cases goes
beyond being a simple way to avoid the particular symptoms reported;
it is actually necessary for correct behavior.  One adjustment I'm
looking at for it is that I think the tuple xmin can go in the lock
target structure rather than the lock structure, because if the tuple
ID has already been reused, it's pretty clear that there are no
transactions which can now update the old tuple which had that ID, so
any predicate locks attached to a target with the old xmin can be
released and the target reused with the new xmin.
 
This does, however, raise the question of what happens if the tuple
lock has been promoted to a page lock and a tuple on that page is
non-HOT updated.  (Clearly, it's not an issue if the locks on a heap
relation for a transaction have been promoted all the way to the
relation granularity, but it's equally clear we don't want to jump to
that granularity when we can avoid it.)
 
It seems to me that in addition to what Heikki suggested, we need to
create a predicate lock on the new tuple or its (new) page 

Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Marti Raudsepp
On Sat, Feb 19, 2011 at 15:23, Bruce Momjian br...@momjian.us wrote:
 Sorry.  I was thinking of allowing a command to alert an administrator
 when we switch standby machines, or if we can't do synchronous standby
 any longer.  I assume we put a message in the logs, and the admin could
 have a script that monitors that, but I thought a pager-like command
 would be helpful

For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
that could be used?

Regards,
Marti



Re: [HACKERS] Snapshot synchronization, again...

2011-02-19 Thread Joachim Wieland
On Sat, Feb 19, 2011 at 9:17 PM, Peter Eisentraut pete...@gmx.net wrote:
 The only consideration against MD5 might be
 that it would make us look quite lame.  We should probably provide
 builtin SHA1 and SHA2 functions for this and other reasons.

In this particular case, however, the checksum is never exposed to the
user but stays within shared memory. So nobody would notice that we are
still using lame old md5sum :-)


Joachim



Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Bruce Momjian
Marti Raudsepp wrote:
 On Sat, Feb 19, 2011 at 15:23, Bruce Momjian br...@momjian.us wrote:
  Sorry.  I was thinking of allowing a command to alert an administrator
  when we switch standby machines, or if we can't do synchronous standby
  any longer.  I assume we put a message in the logs, and the admin could
  have a script that monitors that, but I thought a pager-like command
  would be helpful
 
 For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
 that could be used?

Well, those are going to require work to notify someone externally, like
send an email alert.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Jaime Casanova
On Sat, Feb 19, 2011 at 6:05 PM, Bruce Momjian br...@momjian.us wrote:
 Marti Raudsepp wrote:
 On Sat, Feb 19, 2011 at 15:23, Bruce Momjian br...@momjian.us wrote:
  Sorry.  I was thinking of allowing a command to alert an administrator
  when we switch standby machines, or if we can't do synchronous standby
  any longer.  I assume we put a message in the logs, and the admin could
  have a script that monitors that, but I thought a pager-like command
  would be helpful

 For asynchronous notifications we already have LISTEN/NOTIFY. Perhaps
 that could be used?

 Well, those are going to require work to notify someone externally, like
 send an email alert.


Although, that seems like a job for a monitoring tool.

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: PostgreSQL support and training



Re: [HACKERS] UTF16 surrogate pairs in UTF8 encoding

2011-02-19 Thread Bruce Momjian
Marko Kreen wrote:
 On 9/8/10, Tom Lane t...@sss.pgh.pa.us wrote:
  Marko Kreen mark...@gmail.com writes:
Although it does seem unnecessary.
 
 
  The reason I asked for this to be spelled out is that ordinarily,
   a backslash escape \nnn is a very low-level thing that will insert
   exactly what you say.  To me it's quite unexpected that the system
   would editorialize on that to the extent of replacing two UTF16
   surrogate characters by a single code point.  That's necessary for
   correctness because our underlying storage is UTF8, but it's not
   obvious that it will happen.  (As a counterexample, if our underlying
   storage were UTF16, then very different things would need to happen
   for the exact same SQL input.)
 
   I think a lot of people will have this same question when reading
   this para, which is why I asked for an explanation there.
 
 Ok, but I still don't like the whens.  How about:
 
 -6-digit form technically makes this unnecessary.  (When surrogate
-pairs are used when the server encoding is <literal>UTF8</literal>, they
 -are first combined into a single code point that is then encoded
 -in UTF-8.)
 +6-digit form technically makes this unnecessary.  (Surrogate
 +pairs are not stored directly, but combined into a single
 +code point that is then encoded in UTF-8.)

Applied, thanks.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] FDW API: don't like the EXPLAIN mechanism

2011-02-19 Thread Tom Lane
I've been poking at the FDW stuff and file_fdw, and I find myself
dissatisfied with the way that the EXPLAIN support is designed,
namely that we have to compute a string at plan time to be displayed
by EXPLAIN.  There are a couple of problems with that:

1. The explainInfo string is useless overhead during a normal query.

2. There is no way to vary the display depending on EXPLAIN options such
as VERBOSE and COSTS OFF.

It seems fairly obvious to me that there might be scope for showing
more info in VERBOSE mode.  But even more to the point, the current
regression test's example output:

EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv;
 Foreign Scan on public.agg_csv
   Output: a, b
   Foreign Plan: file=@abs_srcdir@/data/agg.csv, size=46

is extremely bogus.  COSTS OFF ought to suppress highly-volatile details
like the file size.  To stick with this design, we'd have to have a
convention that explainInfo never shows any more data than would be
appropriate in COSTS OFF mode.

And then there's

3. There is no way to report actual inside-the-FDW execution stats in
EXPLAIN ANALYZE mode.

So this seems very far short of satisfactory.  I think we should drop
FdwPlan.explainInfo and instead define an additional callback function
that is called by EXPLAIN to produce the extra data for EXPLAIN to
display.  This function could have access to the EXPLAIN options as well
as (in ANALYZE mode) the final state of the execution node, so it could
tailor its output appropriately.


BTW, another thing that strikes me as poorly done in the file_fdw code
is that it gathers up all the options of the foreign table, server, and
wrapper at plan time, and stores those in the plan, and uses that
information at runtime.  What happens if the options change underneath
a prepared plan?

regards, tom lane



Re: [HACKERS] Update PostgreSQL shared memory usage table for 9.0?

2011-02-19 Thread Bruce Momjian
Bruce Momjian wrote:
 Can someone update the PostgreSQL shared memory usage table for 9.0?
 
   http://www.postgresql.org/docs/9.0/static/kernel-resources.html#SYSVIPC
 
 Right now it says "Approximate shared memory bytes required (as of
 8.3)".

This documentation still says "as of 8.3".  If the values are unchanged, can I
just remove the version mention?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Snapshot synchronization, again...

2011-02-19 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On fre, 2011-02-18 at 16:57 -0300, Alvaro Herrera wrote:
 2. is md5 the most appropriate digest for this?  If you need a
 cryptographically secure hash, do we need something stronger?  If not,
 why not just use hash_any?

 MD5 is probably more appropriate than hash_any, because the latter is
 optimized for speed and collision avoidance and doesn't have a
 guaranteed external format.  The only consideration against MD5 might be
 that it would make us look quite lame.

Only to people who don't understand whether crypto strength is actually
important in a given use-case.

However ... IIRC, hash_any gives different results on bigendian and
littleendian machines.  I'm not sure if a predictable cross-platform
result is important for this use?  If you're hashing data containing
native integers, this is a problem anyway.
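The endianness point can be illustrated outside PostgreSQL (a Python sketch; the integer value is illustrative, not real snapshot data):

```python
import hashlib
import struct

xmin = 1234567  # an illustrative "native integer" one might hash

# Hashing the raw in-memory bytes of an integer gives different digests
# on little- vs big-endian machines, because the byte images differ:
little = hashlib.md5(struct.pack("<i", xmin)).hexdigest()
big = hashlib.md5(struct.pack(">i", xmin)).hexdigest()
assert little != big

# Hashing a text representation -- as the snapshot patch does -- is
# byte-order independent by construction: every platform hashes the
# same ASCII bytes.
canonical = hashlib.md5(str(xmin).encode("ascii")).hexdigest()
```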

regards, tom lane



Re: [HACKERS] Snapshot synchronization, again...

2011-02-19 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Sat Feb 19 21:26:42 -0300 2011:

 However ... IIRC, hash_any gives different results on bigendian and
 littleendian machines.  I'm not sure if a predictable cross-platform
 result is important for this use?  If you're hashing data containing
 native integers, this is a problem anyway.

The current feature only lets you synchronize snapshots within a cluster
-- you can't ship them to another machine.  I don't think dependency on
endianness is a problem here.  (But no, the data doesn't contain native
integers -- it's the snapshot's text representation.)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] work_mem / maintenance_work_mem maximums

2011-02-19 Thread Bruce Momjian
Stephen Frost wrote:
-- Start of PGP signed section.
 Greetings,
 
   After watching a database import go abysmally slow on a pretty beefy
   box with tons of RAM, I got annoyed and went to hunt down why in the
   world PG wasn't using but a bit of memory.  Turns out to be a well
   known and long-standing issue:
 
   http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg101139.html
 
   Now, we could start by fixing guc.c to correctly have the max value
   for these be MaxAllocSize/1024, for starters, then at least our users
   would know when they set a higher value it's not going to be used.
   That, in my mind, is a pretty clear bug fix.  Of course, that doesn't
   help us poor data-warehousing bastards with 64G+ machines.
 
   Sooo..  I don't know much about what the limit is or why it's there,
   but based on the comments, I'm wondering if we could just move the
   limit to a more 'sane' place than the-function-we-use-to-allocate.  If
   we need a hard limit due to TOAST, let's put it there, but I'm hopeful
   we could work out a way to get rid of this limit in repalloc so that
   sorts and the like (e.g., index creation) can use whatever memory the
   user has decided they should be able to.

Is this a TODO?  Can we easily fix the tuplesort.c code?
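For reference, the ceiling under discussion works out as follows (arithmetic sketch; MaxAllocSize is PostgreSQL's per-palloc-chunk limit from memutils.h, and work_mem is expressed in kilobytes):

```python
# MaxAllocSize is defined as 0x3fffffff, i.e. one byte short of 1 GB.
# Since work_mem is set in KB, the largest setting that a single
# palloc'd allocation can actually honor is:
MAX_ALLOC_SIZE = 0x3FFFFFFF
max_work_mem_kb = MAX_ALLOC_SIZE // 1024

print(max_work_mem_kb)  # 1048575 -- just under 1 GB, regardless of RAM
```

Any work_mem above that is silently wasted on operations bounded by a single allocation, which is the guc.c fix Stephen proposes as a first step.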

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] work_mem / maintenance_work_mem maximums

2011-02-19 Thread Josh Berkus

 Is this a TODO?  Can we easily fix the tuplesort.c code?

Easily, no.  But that's not a reason for it to not be a TODO.

I, too, would like to be able to make use of 32GB of work_mem effectively.


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Robert Haas
On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs si...@2ndquadrant.com wrote:
 First, we should be clear to explain that you are referring to the fact
 that the request
  synchronous_commit = off
  synchronous_replication = on
 makes no sense in the way the replication system is currently designed,
 even though it is a wish-list item to make it work in 9.2+

What exactly do you mean by "make it work"?  We can either (1) wait
for the local commit and the remote commit (synchronous_commit=on,
synchronous_replication=on), (2) wait for the local commit only
(synchronous_commit=on, synchronous_replication=off), or (3) wait for
neither (synchronous_commit=off, synchronous_replication=off).
There's no fourth possible behavior, AFAICS.

The question is whether synchronous_commit=off,
synchronous_replication=on should behave like (1) or (3).  You have it
as (1); I'm arguing it should be (3).  I realize it's an arguable
point; I'm just arguing for what makes the most sense to me.
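In postgresql.conf terms, the three behaviors map to these settings (a sketch using the GUC names from the patch under review; the released 9.1 later folded this control into synchronous_commit, so these exact knobs are specific to the v17 patch):

```
# (1) wait for the local commit and the remote commit
synchronous_commit      = on
synchronous_replication = on

# (2) wait for the local commit only
synchronous_commit      = on
synchronous_replication = off

# (3) wait for neither
synchronous_commit      = off
synchronous_replication = off

# The disputed combination -- the patch treats it as (1),
# Robert argues it should act as (3):
# synchronous_commit      = off
# synchronous_replication = on
```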

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] FDW API: don't like the EXPLAIN mechanism

2011-02-19 Thread Tom Lane
I wrote:
 ... I think we should drop
 FdwPlan.explainInfo and instead define an additional callback function
 that is called by EXPLAIN to produce the extra data for EXPLAIN to
 display.  This function could have access to the EXPLAIN options as well
 as (in ANALYZE mode) the final state of the execution node, so it could
 tailor its output appropriately.

My initial idea was to define the new callback function with signature

char *
ExplainForeignScan(ForeignScanState *node,
                   ExplainState *es);

and stick to pretty much the functionality implemented in the submitted
patch, ie, you could expect to get output looking like this:

Foreign Scan on public.agg_csv
  FDW Info: file=@abs_srcdir@/data/agg.csv, size=46

However, it occurs to me that as long as we're passing the function the
ExplainState, it has what it needs to add arbitrary EXPLAIN result
fields.  Although it could do this the hard way, we could make it a lot
easier by exporting the ExplainPropertyType functions.  Then it'd be
possible to produce output like

Foreign Scan on public.agg_csv
  Foreign File: @abs_srcdir@/data/agg.csv
  Foreign File Size: 46

which seems a whole lot nicer than the current design.  In this scheme
the callback function would just be declared to return void.
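The difference between one opaque "FDW Info" string and labeled properties can be sketched generically (Python; `explain_property` here just mimics what an exported ExplainPropertyText-style helper would let the FDW do -- it is not the real C API):

```python
# Old style: the FDW pre-renders a single opaque string.
fdw_info = "file=/data/agg.csv, size=46"

# Proposed style: the callback emits individual label/value properties
# through helpers analogous to PostgreSQL's ExplainProperty* functions,
# letting EXPLAIN itself handle formatting (text, JSON, XML, ...).
def explain_property(label, value, out):
    out.append("%s: %s" % (label, value))

out = []
explain_property("Foreign File", "/data/agg.csv", out)
explain_property("Foreign File Size", 46, out)

print("\n".join(out))
# Foreign File: /data/agg.csv
# Foreign File Size: 46
```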

Anybody have an objection to doing it like that?

regards, tom lane



Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Robert Haas
On Sat, Feb 19, 2011 at 3:35 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, 2011-02-18 at 20:45 -0500, Robert Haas wrote:
 On the other hand, I see no particular
 harm in leaving the option in there either, though I definitely think
 the default should be changed to -1.

 The default setting should be to *not* freeze up if you lose the
 standby. That behaviour unexpectedly leads to an effective server down
 situation, rather than 2 minutes of slow running.

My understanding is that the parameter will wait on every commit, not
just once.  There's no mechanism to do anything else.  But I did some
testing this evening and actually it appears to not work at all.  I
hit walreceiver with a SIGSTOP and the commit never completes, even
after the two minute timeout.  Also, when I restarted walreceiver
after a long time, I got a server crash.

DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
DEBUG:  released 0 procs up to 0/3014690
DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
DEBUG:  released 2 procs up to 0/3027BC8
WARNING:  could not locate ourselves on wait queue
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: DEBUG:
shmem_exit(-1): 0 callbacks to make
DEBUG:  proc_exit(-1): 0 callbacks to make
FATAL:  could not receive data from WAL stream: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Failed.
! LOG:  record with zero length at 0/3027BC8
DEBUG:  CommitTransaction
DEBUG:  name: unnamed; blockState:   STARTED; state: INPROGR,
xid/subid/cid: 0/1/0, nestlvl: 1, children:
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION 0/300
LOG:  streaming replication successfully connected to primary
DEBUG:  standby standby is a potential synchronous standby
DEBUG:  write 0/0 flush 0/0 apply 0/3027BC8
DEBUG:  released 0 procs up to 0/0
DEBUG:  standby standby has now caught up with primary
DEBUG:  write 0/3027C18 flush 0/0 apply 0/3027BC8
DEBUG:  standby standby is now the synchronous replication standby
DEBUG:  released 0 procs up to 0/0
DEBUG:  write 0/3027C18 flush 0/3027C18 apply 0/3027BC8
DEBUG:  released 0 procs up to 0/3027C18
DEBUG:  write 0/3027C18 flush 0/3027C18 apply 0/3027C18
DEBUG:  released 0 procs up to 0/3027C18

(lots more copies of those last two messages)

I believe the problem is that the definition of IsOnSyncRepQueue is
bogus, so that the loop in SyncRepWaitOnQueue always takes the first
branch.

It was a little confusing to me setting this up that setting only
synchronous_replication did nothing; I had to also set
synchronous_standby_names.  We might need a cross-check there.  I
believe the docs for synchronous_replication also need some updating;
this part appears to be out of date:

+between primary and standby. The commit wait will last until the
+first reply from any standby. Multiple standby servers allow
+increased availability and possibly increase performance as well.

The words on the primary in the next sentence may not be necessary
any more either, as I believe this parameter now has no effect
anywhere else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] FDW API: don't like the EXPLAIN mechanism

2011-02-19 Thread Robert Haas
On Sat, Feb 19, 2011 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Anybody have an objection to doing it like that?

I don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Documentation, window functions

2011-02-19 Thread Bruce Momjian
Robert Haas wrote:
 On Thu, Sep 23, 2010 at 11:34 PM, Dennis Björklund d...@zigo.dhs.org wrote:
  On Wed, Sep 22, 2010 at 6:03 AM, Dennis Björklund d...@zigo.dhs.org wrote:
  But I confess that I'm sort of murky on how ORDER affects the window
  frame, or how to rephrase this more sensibly.
 
  The rows included in the calculation of the window function are by default
 
  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
 
  where CURRENT ROW includes all the rows that are equal to the current row
  according to the ordering.  So if you ORDER BY name, then all the rows up
  to your name, plus all rows with the same name, are included, but no
  later rows.
 
  If you don't have any ordering, then all rows are equal and all rows are
  included in the computation. That's why your example behaved like it did.
 
  At least that's my understanding of how these things work. I've not used
  window functions very much myself.
 
  This is fairly difficult stuff and probably doesn't belong in a tutorial,
  but the current wording suggests that you can add any ordering and it
  won't affect the result.  That is also bad, since it might teach people
  the wrong thing.
 
 Hmm... it is true that average will produce the same results on any
 ordering of the same set of input values, though.  Perhaps the word
 partition encompasses that, though then again maybe not.
 
 I'd be happy to fix this if I understand what to fix it to.

I clarified the window function ORDER BY wording to avoid mentioning
avg().  Applied patch attached.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 11859b4..218988e 100644
*** a/doc/src/sgml/advanced.sgml
--- b/doc/src/sgml/advanced.sgml
*** SELECT depname, empno, salary, avg(salar
*** 383,392 
 /para
  
 para
! Although functionavg/ will produce the same result no matter
! what order it processes the partition's rows in, this is not true of all
! window functions.  When needed, you can control that order using
! literalORDER BY/ within literalOVER/.  Here is an example:
  
  programlisting
  SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;
--- 383,392 
 /para
  
 para
! You can also control the order in which rows are processed by
! window functions using literalORDER BY/ within literalOVER/.
! (The window literalORDER BY/ does not even have to match the
! order in which the rows are output.)  Here is an example:
  
  programlisting
  SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;



Re: [HACKERS] review: FDW API

2011-02-19 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 11.02.2011 22:50, Heikki Linnakangas wrote:
 I spent some more time reviewing this, and working on the PostgreSQL FDW
 in tandem. Here's an updated API patch, with a bunch of cosmetic
 changes, and a bug fix for FOR SHARE/UPDATE.

 Another version, rebased against master branch and with a bunch of small 
 cosmetic fixes.

I've applied this after a moderate amount of editorialization.

The question of avoiding extra relkind lookups is still open.  We talked
about adding relkind to RangeTblEntry, but I wonder whether adding a new
RTEKind would be a better idea.  Haven't researched it yet.

I have a hacked-up version of contrib/file_fdw that I've been using to
test it with.  That needs some more cleanup before committing, but I
think it should not take too long.  The one thing that is kind of
annoying is that the regression tests need generated files (to insert
absolute paths) and it seems like the PGXS infrastructure doesn't know
how to clean up the generated files afterwards.  Anybody have any
thoughts about fixing that?

regards, tom lane



Re: [HACKERS] Sync Rep v17

2011-02-19 Thread Jaime Casanova
On Sat, Feb 19, 2011 at 10:52 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs si...@2ndquadrant.com wrote:
 First, we should be clear to explain that you are referring to the fact
 that the request
  synchronous_commit = off
  synchronous_replication = on
 makes no sense in the way the replication system is currently designed,
 even though it is a wish-list item to make it work in 9.2+

 What exactly do you mean by make it work?  We can either (1) wait
 for the local commit and the remote commit (synchronous_commit=on,
 synchronous_replication=on), (2) wait for the local commit only
 (synchronous_commit=on, synchronous_replication=off), or (3) wait for
 neither (synchronous_commit=off, synchronous_replication=off).
 There's no fourth possible behavior, AFAICS.

 The question is whether synchronous_commit=off,
 synchronous_replication=on should behave like (1) or (3); AFAICS
 there's no fourth possible behavior.  You have it as #1; I'm arguing
 it should be #3.  I realize it's an arguable point; I'm just arguing
 for what makes most sense to me.


IMHO, we should stick to the safest option.

Considering that synchronous_replication = on means that we *want*
durability, and synchronous_commit = off means we don't *care* about
durability, the real question here is: in the presence of
synchronous_replication = on (which means we want durability), are we
allowed to assume we can lose data?

One way to manage that is simply to disallow the combination with an
error; maybe that is a bit strict, but then we don't have to make
assumptions.  The other option is to stay safe, which means keeping
durability: if you want to risk some data, you should disable
synchronous_replication as well as synchronous_commit... maybe also
sending a message to the log when the conflicting combination is
detected.

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
