Re: [HACKERS] issue with smlar extension and gist index creation (9.2Beta1)

2012-06-06 Thread mark


 -Original Message-
 From: Tom Lane [mailto:t...@sss.pgh.pa.us]
 Sent: Tuesday, June 05, 2012 9:22 PM
 To: mark
 Cc: 'pgsql-hackers'
 Subject: Re: [HACKERS] issue with smlar extension and gist index
 creation (9.2Beta1)
 
 mark dvlh...@gmail.com writes:
  I am playing around with 9.2Beta1 and the smlar extension that was
  presented at pgcon. Looks like a lot of great work has gone into both
  - so thanks to everyone for all the great work.
  I did run into an issue while creating a GIST index using the
 _text_sml_ops.
  I am getting ERROR: failed to re-find parent for block 25399.
 
 Try beta2 ... this sounds suspiciously like the bug Heikki fixed last
 week:
 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d1996
 ed5e8bfaf1314e7817015668029c07d3b9b
 
   regards, tom lane


Thanks, I will give that a shot and see if I still have the same issue. 

..: Mark




Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof

2012-06-06 Thread Magnus Hagander
On Wed, Jun 6, 2012 at 4:38 AM, Daniel Farina dan...@heroku.com wrote:
 On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Daniel Farina dan...@heroku.com writes:
 If that is the case, is there a convention we can use to separate the
 parts of the connection string (in both representations) into the
 parts sent to the server and the part that the client needs?  We
 already abuse this a little bit because URI syntax (in general, not
 just our rendition of it) leaves little room for extension for
 parameters on the client side.  Consider ?sslmode=require.

 In both representations, the net effect of a typo would be that
 instead of magically reading some properties on the client side,
 they'd be sent to the server.  How often is this going to be so wrong
 that one cannot send a response from the server indicating to the user
 their error?  On casual inspection it doesn't seem like prohibitively
 often, but I haven't mulled over that for very long.

I think that's an excellent example of this being a bad idea. If you
mis-spell sslmode=require, that should absolutely result in an error
on the client side. Otherwise, you might end up sending your password
(or other details that are not as sensitive, but still sensitive) over
an unencrypted connection. If you wait for the error from the server,
it's too late.
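
For illustration: with keyword/value conninfo strings, libpq already
rejects an unknown keyword on the client side before anything is sent.
A minimal sketch (the host name is made up):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* "sslmoed" is a deliberate typo: PQconnectdb fails immediately
     * with an "invalid connection option" message, without contacting
     * any server at all. */
    PGconn *conn = PQconnectdb("host=db.example.com sslmoed=require");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "%s", PQerrorMessage(conn));
    PQfinish(conn);
    return 0;
}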

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] ExecStoreTuple going into infinite loop

2012-06-06 Thread Atri Sharma
Hi all,

I am trying to build and store multiple tuples. The code is:

ExecClearTuple(slot);

/* The code for fetching the data from which the tuple will be formed... */
for (; x < y; x++)
{
    tuple = BuildTupleFromCStrings(
                TupleDescGetAttInMetadata(node->ss.ss_currentRelation->rd_att),
                values);
    ExecStoreTuple(tuple, slot, InvalidBuffer, false);
}

return (slot);


When I include the ExecClearTuple(slot), the result only includes
the last tuple that was built. If I do not include
ExecClearTuple(slot), the code goes into an infinite loop.

Please help.

Atri
-- 
Regards,

Atri
l'apprenant



Re: [HACKERS] Interrupting long external library calls

2012-06-06 Thread Sandro Santilli
FYI, I finally committed the code installing a signal handler in PostGIS,
using the pqsignal function:

  https://trac.osgeo.org/postgis/changeset/9850

It is currently only used to interrupt GEOS calls, but the idea is that
it could eventually also be used to interrupt other library calls that
provide for such an interruption request. GEOS itself only supports it
in the 3.4 branch.
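
For the archives, a minimal sketch of the shape of the mechanism (the
handler and init-hook names here are mine, and GEOS_requestInterrupt
stands in for whatever the entry point is finally called):

#include <signal.h>
#include "postgres.h"
#include "libpq/pqsignal.h"    /* pqsignal(); header location varies by version */

/* Sketch only: on SIGINT, ask the library to abandon its current
 * long-running computation.  GEOS checks the request flag inside its
 * long-running loops. */
static void
handle_interrupt(SIGNAL_ARGS)
{
    GEOS_requestInterrupt();
}

void
pgis_module_init(void)         /* hypothetical module init hook */
{
    pqsignal(SIGINT, handle_interrupt);
}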

In order to test it you'll need to define POSTGIS_ENABLE_GEOS_INTERRUPTIBILITY
at the top of postgis/postgis_module.c - the macro is off by default due
to concerns about possible consequences we may be unaware of.

Your comments will be very useful to reduce (or confirm) the concerns.
If we can get this straightened out with confidence, there may be the
possibility of toggling the default to on before 2.0.1 ...

Thanks in advance.

PS: Tom, I still don't know what you mean by the SIGINFO case.

--strk;

On Mon, May 28, 2012 at 08:48:21AM +0200, Sandro Santilli wrote:
 On Fri, May 25, 2012 at 12:34:54PM -0400, Tom Lane wrote:
  Sandro Santilli s...@keybit.net writes:
   I ended up providing an explicit mechanism to request interruption of
   whatever the library is doing, and experimented (successfully so far)
   requesting the interruption from a SIGINT handler.
  
   Do you see any major drawback in doing so ?
  
  This seems a bit fragile.  It might work all right in Postgres, where
  we tend to set up signal handlers just once at process start, but ISTM
  other systems might assume they can change their signal handlers at
  any time.  The handler itself looks less than portable anyway ---
  what about the SIGINFO case?
 
 Indeed setting the handler from within the library itself was a temporary
 implementation to see how effective it would be. The idea is to
 move the registration of the handler out of the library and into the
 user (PostGIS in this case). The library itself would only expose
 GEOS_requestInterrupt/GEOS_cancelInterrupt API calls.
 
 I'm guessing PostGIS should use the more portable pqsignal functions?
 
 What should I know about SIGINFO?
 
  I assume that the geos::util::Interrupt::request() call sets a flag
  somewhere that's going to be periodically checked in long-running
  loops. 
 
 Yes, this is what will happen.
 
  Would it be possible for the periodic checks to include a
  provision for a callback into Postgres-specific glue code, wherein
  you could test the same flags CHECK_FOR_INTERRUPTS does?  A similar
  approach might then be usable in other contexts, and it seems safer
  to me than messing with a host environment's signal handling.
 
 Would it be enough for the signal handler (installed by PostGIS) 
 to check those flags and call the GEOS_requestInterrupt function
 when appropriate?
 
 --strk; 
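
A sketch of what that glue code might look like, for concreteness: the
registration function below is hypothetical, but InterruptPending,
QueryCancelPending and ProcDiePending are the flags CHECK_FOR_INTERRUPTS
tests.

#include "postgres.h"
#include "miscadmin.h"      /* InterruptPending, QueryCancelPending, ... */

/* Sketch only: a callback GEOS would invoke from its periodic checks;
 * it relays PostgreSQL's pending-interrupt state to the library. */
static void
geos_periodic_check(void)
{
    if (InterruptPending && (QueryCancelPending || ProcDiePending))
        GEOS_requestInterrupt();
}

/* GEOS_registerPeriodicCallback is hypothetical; the real GEOS 3.4
 * registration API may be named differently. */
void
pgis_setup_interrupts(void)
{
    GEOS_registerPeriodicCallback(geos_periodic_check);
}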



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-06 Thread Sergey Koposov

Hi,

On Mon, 4 Jun 2012, Ants Aasma wrote:

On Mon, Jun 4, 2012 at 7:44 PM, Merlin Moncure mmonc...@gmail.com wrote:
I tried to keep it simple at first to find an answer to the question
if it's even worth trying before expending large effort on it. If
anyone with a multisocket machine would chip in, I'd love to know how
this patch handles on larger machines. I think the most interesting
workloads are read-only loads with heavy buffer thrashing but inside OS
memory. Select-only pgbench with 32MB shared_buffers was within error
margin, although slightly faster on my machine (Intel i2500K). The
workload that I used to demonstrate gain was an aggregated index scan
to minimise other overheads.


I've quickly tested your lockfree-getbuffer.patch with
the test case you provided, and I barely see any improvement (2% at max):

https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U
tested with 24 cores (48 HT cores, Xeon E7-4807),
although the tps vs. number of threads curve looks weird.

Cheers,
S

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/


Re: [HACKERS] ExecStoreTuple going into infinite loop

2012-06-06 Thread Merlin Moncure
On Wed, Jun 6, 2012 at 5:36 AM, Atri Sharma atri.j...@gmail.com wrote:
 Hi all,

 I am trying to build and store multiple tuples. The code is:

 ExecClearTuple(slot);

 /* The code for fetching the data from which the tuple will be formed... */
 for (; x < y; x++)
 {
     tuple = BuildTupleFromCStrings(
                 TupleDescGetAttInMetadata(node->ss.ss_currentRelation->rd_att),
                 values);
     ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 }

 return (slot);


 When I include the ExecClearTuple(slot), the result only includes
 the last tuple that was built. If I do not include
 ExecClearTuple(slot), the code goes into an infinite loop.

 Please help.

I've answered off-list.  The context here is FDW iteration. Atri was
trying to return multiple rows inside a single iteration, which is not
how the function is designed to work.  (Also, this would be a more
appropriate question for -general.)
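
For the archives, a sketch of the intended shape: the iterate callback
returns one row per call, and an empty slot ends the scan.  The state
struct and its fields here are made up for illustration:

/* Sketch of a 9.2-era FDW IterateForeignScan callback.  MyFdwState,
 * attinmeta, next_row, num_rows and rows are hypothetical bookkeeping. */
static TupleTableSlot *
myIterateForeignScan(ForeignScanState *node)
{
    TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
    MyFdwState *fstate = (MyFdwState *) node->fdw_state;

    ExecClearTuple(slot);
    if (fstate->next_row < fstate->num_rows)
    {
        HeapTuple   tuple = BuildTupleFromCStrings(fstate->attinmeta,
                                        fstate->rows[fstate->next_row++]);

        /* Store exactly one tuple; the executor calls us again for more. */
        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
    }
    return slot;            /* empty slot signals end of scan */
}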

merlin



[HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Honza Horak

Hi,

before I ask the main question, just a little background for one issue 
we're currently having in Fedora 17:


PrivateTmp is a systemd feature that gives services a private /tmp 
directory, which in turn means that such services aren't able to access 
the system's /tmp directory. It's been enabled by some services already, 
including Apache, while PostgreSQL still uses the system's /tmp directory, 
where its unix socket is located. Naturally, this results in a state where 
Apache and other services with PrivateTmp enabled are not able to 
communicate with PostgreSQL over the socket.


Since we don't want to simply move the socket, for compatibility reasons, 
I'm going to prepare a draft patch to allow PostgreSQL to use a second 
unix socket at the same time. A question I'd like to ask now is: do we 
need a new configuration variable for this, or is it enough to have the 
location hard-coded? What are your opinions?


Regards,
Honza



Re: [HACKERS] ExecStoreTuple going into infinite loop

2012-06-06 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 Hi all,
 I am trying to build and store multiple tuples. The code is:

 ExecClearTuple(slot);

 /* The code for fetching the data from which the tuple will be formed... */
 for (; x < y; x++)
 {
     tuple = BuildTupleFromCStrings(
                 TupleDescGetAttInMetadata(node->ss.ss_currentRelation->rd_att),
                 values);
     ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 }

 return (slot);

 When I include the ExecClearTuple(slot), the result only includes
 the last tuple that was built.

I am not sure why you find this surprising.  A tuple slot can only hold
one tuple.

If you're trying to build a set-returning function, you need a
tuplestore not a tuple slot.  Or else restructure the code to return
each tuple as it's built.  Look at existing SRFs for examples.
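
For reference, a minimal sketch of the tuplestore approach, reusing the
names from the fragment above (using work_mem as the spill threshold is
the conventional choice):

/* Sketch only: accumulate every built tuple in a tuplestore, which can
 * hold any number of rows and spills to disk beyond work_mem kilobytes. */
Tuplestorestate *tstore = tuplestore_begin_heap(true, false, work_mem);
AttInMetadata *attinmeta =
    TupleDescGetAttInMetadata(node->ss.ss_currentRelation->rd_att);

for (; x < y; x++)
{
    HeapTuple   tuple = BuildTupleFromCStrings(attinmeta, values);

    tuplestore_puttuple(tstore, tuple);   /* the store keeps its own copy */
    heap_freetuple(tuple);
}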

regards, tom lane



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Florian Pflug
On Jun6, 2012, at 15:50 , Honza Horak wrote:
 before I ask the main question, just a little background for one issue we're 
 currently having in Fedora 17:
 
 PrivateTmp is a systemd feature that gives services a private /tmp 
 directory, which in turn means that such services aren't able to access 
 the system's /tmp directory. It's been enabled by some services already, 
 including Apache, while PostgreSQL still uses the system's /tmp directory, 
 where its unix socket is located. Naturally, this results in a state where 
 Apache and other services with PrivateTmp enabled are not able to 
 communicate with PostgreSQL over the socket.

Couldn't you simply tell postgres to put its socket in, say, /var/run, and 
create a symlink to that socket in the global /tmp directory?

 Since we don't want to simply move the socket, for compatibility reasons, 
 I'm going to prepare a draft patch to allow PostgreSQL to use a second 
 unix socket at the same time. A question I'd like to ask now is: do we 
 need a new configuration variable for this, or is it enough to have the 
 location hard-coded? What are your opinions?

If we're going to have this at all, we should go all the way and support an 
arbitrary number of sockets. But then, is there any advantage in providing this 
feature natively compared to simply creating symlinks?

best regards,
Florian Pflug




Re: [HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 Couldn't you simply tell postgres to put its socket in, say, /var/run, and 
 create a symlink to that socket in the global /tmp directory?

FYI, this proposal emerged out of a discussion between Honza and
myself.  Use a symlink was my first idea too, but on reflection
it seems like it will take less new code to support two sockets.
We already support multiple TCP sockets, so multiple Unix sockets
shouldn't be that much extra trouble.

The reasons a symlink doesn't seem attractive are:

1. The code to create/delete it has to be in the postmaster.  If we
tried to make the Fedora-specific startup script manage it, we would
first have to teach that script how to know which port number the
postmaster will select, which means parsing config files.  Ugh.

2. What if two postmasters try to create a symlink in the same place?
Or we're just trying to decide if the previous creator crashed without
removing it?  So we need a lockfile beside it.  So at this point we are
building a whole bunch of new infrastructure to create symlinks, whereas
we can probably just call the same subroutine twice if we go with the
two-socket design.
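
To make the "same subroutine twice" point concrete, the per-socket work
is roughly the following (a stripped-down illustration, not postmaster
code; the real version also has to manage the lockfile and permissions):

#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Sketch: create one listening Unix-domain socket at `path`.  The
 * postmaster would call this once per configured socket directory and
 * add each returned fd to its select() set. */
static int
listen_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int         fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);               /* real code consults the lockfile first */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}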

 If we're going to have this at all, we should go all the way and
 support an arbitrary number of sockets.

Well, that's what I wanted to discuss before Honza starts coding.
It's not obvious that there are any use-cases for more than two.
It's also not clear whether there is any value in supporting run-time
rather than build-time configuration of the socket locations.  The
Fedora use-case has no need of that, but if people can point to other
cases where it would be sensible, we can write the patch that way.

You might think we should design this exactly like the TCP-socket
multiple-listen-addresses case, ie just have a config variable
containing a list of directory names.  The sticking point there is
that the directories aren't really interchangeable.  In particular,
there is still going to be one directory that is the one hard-wired
into libpq.  So whereas multiple TCP sockets really are pretty much
interchangeable, I think in the Unix-socket case we are going to have
to think of it as being a primary socket and one or more alternate
sockets.  Is there a reason to have more than one alternate, and if
so what is the use-case?

(BTW, we would probably just adopt the Debian solution if we were
sure there were no non-libpq clients out there; but we aren't.)

regards, tom lane



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Andres Freund
On Wednesday, June 06, 2012 04:38:42 PM Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:

  If we're going to have this at all, we should go all the way and
  support an arbitrary number of sockets.
 
 Well, that's what I wanted to discuss before Honza starts coding.
 It's not obvious that there are any use-cases for more than two.
 It's also not clear whether there is any value in supporting run-time
 rather than build-time configuration of the socket locations.  The
 Fedora use-case has no need of that, but if people can point to other
 cases where it would be sensible, we can write the patch that way.
I had the need to make pg available from multiple chroots via unix sockets. 
The same might come up more frequently with the availability of filesystem 
namespaces...

 You might think we should design this exactly like the TCP-socket
 multiple-listen-addresses case, ie just have a config variable
 containing a list of directory names.  The sticking point there is
 that the directories aren't really interchangeable.  In particular,
 there is still going to be one directory that is the one hard-wired
 into libpq.
I wonder if the whole issue doesn't require libpq to also try multiple 
hardcoded socket locations.

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Wed Jun 06 10:38:42 -0400 2012:
 Florian Pflug f...@phlo.org writes:
  Couldn't you simply tell postgres to put its socket in, say, /var/run, and 
  create a symlink to that socket in the global /tmp directory?
 
 FYI, this proposal emerged out of a discussion between Honza and
 myself.  "Use a symlink" was my first idea too, but on reflection
 it seems like it will take less new code to support two sockets.
 We already support multiple TCP sockets, so multiple Unix sockets
 shouldn't be that much extra trouble.
 
 The reasons a symlink doesn't seem attractive are:
 
 1. The code to create/delete it has to be in the postmaster.  If we
 tried to make the Fedora-specific startup script manage it, we would
 first have to teach that script how to know which port number the
 postmaster will select, which means parsing config files.  Ugh.

Well, you could use
postmaster -C port

The other reason seems compelling enough, though ... particularly,
handling a lockfile sounds messy; if it's a symlink and it's created by
the script, then it would need a separate lockfile, and filling its data
wouldn't be exactly trivial.

 (BTW, we would probably just adopt the Debian solution if we were
 sure there were no non-libpq clients out there; but we aren't.)

Maybe this is a good time to make the /var/run socket location (Debian's
choice) the primary one, and /tmp the alternate.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Honza Horak

On 06/06/2012 04:50 PM, Andres Freund wrote:
 On Wednesday, June 06, 2012 04:38:42 PM Tom Lane wrote:
  Florian Pflug f...@phlo.org writes:
   If we're going to have this at all, we should go all the way and
   support an arbitrary number of sockets.

  Well, that's what I wanted to discuss before Honza starts coding.
  It's not obvious that there are any use-cases for more than two.
  It's also not clear whether there is any value in supporting run-time
  rather than build-time configuration of the socket locations.  The
  Fedora use-case has no need of that, but if people can point to other
  cases where it would be sensible, we can write the patch that way.

 I had the need to make pg available from multiple chroots via unix sockets.
 The same might come up more frequently with the availability of filesystem
 namespaces...


It seems you were not alone with such a need:
http://archives.postgresql.org/pgsql-novice/2006-09/msg00172.php

Honza




Re: [HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of Wed Jun 06 10:38:42 -0400 2012:
 (BTW, we would probably just adopt the Debian solution if we were
 sure there were no non-libpq clients out there; but we aren't.)

 Maybe this is a good time to make the /var/run socket location (Debian's
 choice) the primary one, and /tmp be the alternate.

I'm not really in favor of making /var/run be the out-of-the-box
default, because it would discriminate against personal/testing
installations (ie, you couldn't set it up without root privileges).
It's a reasonable choice for distro-provided packages, but not so
much for one-off builds --- think about the buildfarm if nothing else.

Having said that, if we made it easier to configure things that way than
by patching the source, I bet Martin Pitt isn't going to object.

regards, tom lane



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Tom Lane
Honza Horak hho...@redhat.com writes:
 On 06/06/2012 04:50 PM, Andres Freund wrote:
 On Wednesday, June 06, 2012 04:38:42 PM Tom Lane wrote:
  Florian Pflug f...@phlo.org writes:
 If we're going to have this at all, we should go all the way and
 support an arbitrary number of sockets.

 Well, that's what I wanted to discuss before Honza starts coding.
 It's not obvious that there are any use-cases for more than two.
 It's also not clear whether there is any value in supporting run-time
 rather than build-time configuration of the socket locations.  The
 Fedora use-case has no need of that, but if people can point to other
 cases where it would be sensible, we can write the patch that way.

 I had the need to make pg available from multiple chroots via unix sockets.
 The same might come up more frequently with the availability of filesystem
 namespaces...

 It seems you were not alone with such a need:
 http://archives.postgresql.org/pgsql-novice/2006-09/msg00172.php

I had forgotten that conversation, but it does seem like there is
interest in this type of configuration.  Can anybody confirm that
dropping a socket into a chroot or jail would actually work, ie
make it possible to connect from inside the chroot to a postmaster
running outside?  If that's real and not just wishful thinking,
it seems like enough of an argument to justify supporting N sockets.

regards, tom lane



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Roger Leigh
On Wed, Jun 06, 2012 at 11:32:45AM -0400, Tom Lane wrote:
 I had forgotten that conversation, but it does seem like there is
 interest in this type of configuration.  Can anybody confirm that
 dropping a socket into a chroot or jail would actually work, ie
 make it possible to connect from inside the chroot to a postmaster
 running outside?  If that's real and not just wishful thinking,
 it seems like enough of an argument to justify supporting N sockets.

We need to deal with exactly this sort of issue with schroot, where
we may want to provide programs in the chroot with access to
facilities outside the chroot.  We generally just bind mount in the
minimal set of stuff needed.  This might mean binding just the socket,
or it could be /var/run/postgresql.  We do this for the X11 socket
for our desktop configuration profile to permit X11 programs to run
in a chroot, though we currently bind mount all of /tmp rather than
just the socket, since we want that as well in any case.

(http://people.debian.org/~rleigh/schroot.pdf)

While not exactly what was proposed (multiple sockets), this allows
one to re-use a single socket without the daemon requiring any
special support for it.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linuxhttp://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-GPG Public Key  F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800



Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof

2012-06-06 Thread Daniel Farina
On Wed, Jun 6, 2012 at 1:09 AM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Jun 6, 2012 at 4:38 AM, Daniel Farina dan...@heroku.com wrote:
 On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Daniel Farina dan...@heroku.com writes:
 If that is the case, is there a convention we can use to separate the
 parts of the connection string (in both representations) into the
 parts sent to the server and the part that the client needs?  We
 already abuse this a little bit because URI syntax (in general, not
 just our rendition of it) leaves little room for extension for
 parameters on the client side.  Consider ?sslmode=require.

 In both representations, the net effect of a typo would be that
 instead of magically reading some properties on the client side,
 they'd be sent to the server.  How often is this going to be so wrong
 that one cannot send a response from the server indicating to the user
 their error?  On casual inspection it doesn't seem like prohibitively
 often, but I haven't mulled over that for very long.

 I think that's an excellent example of this being a bad idea. If you
 mis-spell sslmode=require, that should absolutely result in an error
 on the client side. Otherwise, you might end up sending your password
 (or other details that are not as sensitive, but still sensitive) over
 an unencrypted connection. If you wait for the error from the server,
 it's too late.

That is an excellent point.  Is there enough time in the day to gripe
about how sslmode=require is not the default?

Well, this seems pretty obviated by the prefix-naming convention, but
it's an iron-clad example of how the older idea was a bad one.

-- 
fdr



Re: [HACKERS] page is not marked all-visible warning in regression tests

2012-06-06 Thread Andres Freund
On Tuesday, June 05, 2012 04:18:44 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Tuesday, June 05, 2012 03:32:08 PM Tom Lane wrote:
  I got this last night in a perfectly standard build of HEAD:
  + WARNING:  page is not marked all-visible but visibility map bit is set
  in relation pg_db_role_setting page 0 --
  
  I have seen that twice just yesterday. Couldn't reproduce it so far.
  Workload was (pretty exactly):
  
  initdb
  postgres -c fsync=off
  pgbench -i -s 100
  CREATE TABLE data(id serial primary key, data int);
  ALTER SEQUENCE data_id_seq INCREMENT 2;
  VACUUM FREEZE;
  normal shutdown
  postgres -c fsync=on
  pgbench -c 20 -j 20 -T 100
  WARNING: ... pg_depend ...
  WARNING: ... can't remember ...
 
 Hmm ... from memory, what I did was
 
 configure/build/install from a fresh pull
 initdb
 start postmaster, fsync off
 make installcheck
 stop postmaster
 apply Hanada-san's json patch, replace postgres executable
 start postmaster, fsync off
 make installcheck
 
 and it was the second of these runs that failed.  Could we be missing
 flushing some blocks out to disk at shutdown?  Maybe fsync off is a
 contributing factor?
On a cursory look, it might just be a race condition in 
vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the 
warning to be visible, all_visible_according_to_vm is determined before we 
loop over all blocks. At the point where one specific heap block is actually 
read and locked that knowledge might be completely outdated by any concurrent 
backend. Am I missing something?

I have to say the whole visibilitymap correctness and crash-safety seems to be 
quite underdocumented, especially as it seems to be somewhat intricate (to 
me). E.g. there is no note on why visibilitymap_test doesn't need locking. (I 
guess the theory is that a 1-byte read will always be consistent. But how does 
that ensure other backends see an up-to-date value?)

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Ability to listen on two unix sockets

2012-06-06 Thread Robert Haas
On Wed, Jun 6, 2012 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, that's what I wanted to discuss before Honza starts coding.
 It's not obvious that there are any use-cases for more than two.
 It's also not clear whether there is any value in supporting run-time
 rather than build-time configuration of the socket locations.  The
 Fedora use-case has no need of that, but if people can point to other
 cases where it would be sensible, we can write the patch that way.

 You might think we should design this exactly like the TCP-socket
 multiple-listen-addresses case, ie just have a config variable
 containing a list of directory names.  The sticking point there is
 that the directories aren't really interchangeable.  In particular,
 there is still going to be one directory that is the one hard-wired
 into libpq.  So whereas multiple TCP sockets really are pretty much
 interchangeable, I think in the Unix-socket case we are going to have
 to think of it as being a primary socket and one or more alternate
 sockets.  Is there a reason to have more than one alternate, and if
 so what is the use-case?

 (BTW, we would probably just adopt the Debian solution if we were
 sure there were no non-libpq clients out there; but we aren't.)

I recently had an urge to make it possible for the postmaster to
listen on multiple ports and even went so far as to code up a patch to
allow that.  It still applies, with offsets, so I'll attach it here.
So I guess I'm +1 on the idea of allowing N UNIX sockets rather than
limiting it to N=2, and really I'd like to do one better and allow
listening on multiple TCP ports as well.  Since the PID file contains
the port number, multiple TCP sockets stop being interchangeable as
soon as you allow multiple ports, but that's not very difficult to
handle.  Now, you might ask whether this has any real-world value, and
obviously I'm going to say yes or I wouldn't be proposing it.  The
reason for wanting multiple UNIX sockets is because those sockets
might be in different places that are not all equally accessible to
everyone, because of things like chroot.  But of course the same thing
is possible in the network space using iptables and similar tools.
For example, you might want to have users connect to application A
using port 5432, and to application B using port 15432.  Now you can
use network monitoring tools to see how much data each application is
sending and receiving, without needing deep packet inspection.  You
can firewall those ports differently to provide access to different
groups of users.  And you can even decide, if the database gets
overloaded, to cut off access to one of those ports, so that the
application causing the problem becomes inaccessible but the rest of
the database ceases being overloaded and you can still operate.  Of
course, you could also do that by changing pg_hba.conf, but for some
people it might be more convenient (or feel more bullet-proof) to do
it using network management tools.  There are probably other use
cases, as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


additional-sockets.patch
Description: Binary data



Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-06-06 Thread Fujii Masao
On Tue, Jun 5, 2012 at 11:42 PM, Magnus Hagander mag...@hagander.net wrote:
 Works for me. We still need a (reworked) patch, though, right? We just
 move where the conversion between seconds and milliseconds happens?

Attached is the updated version of the patch.

 I definitely don't think we need subsecond granularity in the user
 facing number. Even a second is pretty short.

Yep.

 (We do need to retain the ability to set it to 0 = off of course).

Yep, a value of zero disables the status updates, and the patch adds
that explanation into the document of pg_basebackup and pg_receivexlog.

Regards,

-- 
Fujii Masao


timeout_handling_v4.patch
Description: Binary data



Re: [HACKERS] page is not marked all-visible warning in regression tests

2012-06-06 Thread Robert Haas
On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund and...@2ndquadrant.com wrote:
 On a cursory look, it might just be a race condition in
 vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the
 warning to be visible, all_visible_according_to_vm is determined before we
 loop over all blocks. At the point where one specific heap block is actually
 read and locked that knowledge might be completely outdated by any concurrent
 backend. Am I missing something?

No, I think you're right.  I think that warning is bogus.  I added it
in place of some older warning which no longer made sense, but I think
this one doesn't make sense either.

 I have to say the whole visibilitymap correctness and crash-safety seems to be
 quite underdocumented, especially as it seems to be somewhat intricate (to
 me). E.g. there is no note on why visibilitymap_test doesn't need locking. (I
 guess the theory is that a 1-byte read will always be consistent. But how does
 that ensure other backends see an up-to-date value?)

It's definitely intricate, and it's very possible that we should have
some more documentation.  I am not sure exactly what and where, but
feel free to suggest something.

visibilitymap_test() does have a comment saying that:

/*
 * We don't need to lock the page, as we're only looking at a
 * single bit.
 */

But that's a bit unsatisfying, because, as you say, it doesn't address
the question of memory-ordering issues.  I think that there's no
situation in which it causes a problem to see the visibility map bit
as unset when in reality it has just recently been set by some other
back-end.  It would be bad if someone did something like:

if (visibilitymap_test(...))
visibilitymap_clear();

...because then memory-ordering issues could cause us to accidentally
fail to clear the bit.   No one should be doing that, though; the
relevant locking and conditional logic is built directly into
visibilitymap_clear().

On the flip side, if someone sees the visibility map bit as set when
it's actually just been cleared, that could cause a problem - most
seriously, index-only scans could return wrong answers.  For that to
happen, someone would have to insert a heap tuple onto a previously
all-visible page, clearing the visibility map bit, and then insert an
index tuple; concurrently, some other backend would need to see the
index tuple but not the fact that the visibility map bit had been
cleared.  I don't think that can happen: after inserting the heap
tuple, the inserting backend would release buffer content lock, which
acts as a full memory barrier; before reading the index tuple, the
index-only-scanning backend would have to take the content lock on the
index buffer, which also acts as a full memory barrier.  So the
inserter can't do the writes out of order, and the index-only-scanner
can't do the reads out of order, so I think it's safe, but we
probably do need to explain that somewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_receivexlog and feedback message

2012-06-06 Thread Fujii Masao
On Tue, Jun 5, 2012 at 11:44 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander mag...@hagander.net wrote:
 Right now, pg_receivexlog sets:
                        replymsg->write = InvalidXLogRecPtr;
                        replymsg->flush = InvalidXLogRecPtr;
                        replymsg->apply = InvalidXLogRecPtr;

 when it sends its status updates.

 I'm thinking it should set replymsg->write = blockpos instead.

 Why? That way you can see in pg_stat_replication what has actually
 been received by pg_receivexlog - not just what we last sent. This can
 be useful in combination with an archive_command that can block WAL
 recycling until it has been saved to the standby. And it would be
 useful as a general monitoring thing as well.

 I think the original reason was that it shouldn't interfere with
 synchronous replication - but it does take away a fairly useful
 usecase...

 I think that not only replymsg->write but also ->flush should be set to
 blockpos in pg_receivexlog. Which allows pg_receivexlog to behave
 as synchronous standby, so we can write WAL to both local and remote
 synchronously. I believe there are some use cases for synchronous
 pg_receivexlog.

 pg_receivexlog doesn't currently fsync() after every write. It only
 fsyncs complete files. So we'd need to set ->flush only at the end
 of a segment, right?

Yes.

Currently the status update is sent for each status interval. In sync
replication, a transaction has to wait for a while even after pg_receivexlog
has written or flushed the WAL data.

So should we add a new option which specifies whether pg_receivexlog
sends the status packet back as soon as it writes or flushes the WAL
data, like the walreceiver does?
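
In code form, the proposal amounts to something like this in
pg_receivexlog's feedback path (lastFlushPosition is a hypothetical name
for whatever would track the last completed-segment fsync):

/* Sketch of the proposed feedback change, not committed code. */
replymsg->write = blockpos;             /* what we have written locally */
replymsg->flush = lastFlushPosition;    /* advances only at segment ends,
                                         * since only whole files are
                                         * fsync()ed today */
replymsg->apply = InvalidXLogRecPtr;    /* pg_receivexlog never applies */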

Regards,

-- 
Fujii Masao



[HACKERS] creating objects in pg_catalog

2012-06-06 Thread Robert Haas
Right now, you can't directly create a relation (table, index,
composite type) in the pg_catalog schema, but you can create a
non-relation (function, domain, etc.) in the pg_catalog schema.
Furthermore, you can create a table in some other schema and then move
it into the pg_catalog schema using ALTER TABLE .. SET SCHEMA.  After
you do that, you can't move it back out again, nor can you drop it; or
at least not without setting allow_system_table_mods.

This all seems pretty wonky and inconsistent to me.  It strikes me
that we ought to either (1) allow users to place SQL objects in
pg_catalog or (2) not.  Having a weird special case that disallows it
only for relations, but then lets you do it anyway via the back door,
seems pretty pointless.

Tabula rasa, I'd argue for getting tough on this, and error out on any
attempt to get a user-created SQL object into pg_catalog by any means,
unless allow_system_table_mods is set.  However, considering that we
have two extensions whose extension install scripts do this --
adminpack and sepgsql -- and one of those (adminpack) is extremely
widely used, that seems like it might be asking for trouble.  So maybe
we should just go the opposite direction and just remove the rather
toothless prohibition that currently exists.  Or, as a middle way, we
could tighten up the prohibition, but also provide a GUC other than
allow_system_table_mods that can be changed via SET LOCAL by extension
scripts that need to do this.  allow_system_table_mods requires a
server restart and is purposefully undocumented, so it's not a good
thing to rely on for prohibitions that people might need to work
around on a production system.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] \conninfo and SSL

2012-06-06 Thread Robert Haas
On Sun, Jun 3, 2012 at 5:30 AM, Alastair Turner b...@ctrlf5.co.za wrote:
 A one-line change adds the SSL info on its own line like

 --
 You are connected to database scratch as user scratch on host
 127.0.0.1 at port 5432.
 SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
 --

 Does this need a more integrated presentation, and therefore a broader
 change to make it translatable?

+1 for doing it that way.
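
For reference, the one-liner boils down to something close to psql's
existing connection-banner logic; a sketch under the 9.2-era
OpenSSL-only assumption:

#include <stdio.h>
#include <libpq-fe.h>
#include <openssl/ssl.h>

/* Sketch: print the extra \conninfo line only when the connection
 * actually uses SSL, mirroring the banner printed at connect time. */
static void
print_ssl_line(PGconn *conn)
{
    SSL *ssl = (SSL *) PQgetssl(conn);

    if (ssl)
        printf("SSL connection (cipher: %s, bits: %d)\n",
               SSL_get_cipher(ssl),
               SSL_get_cipher_bits(ssl, NULL));
}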

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_receivexlog and feedback message

2012-06-06 Thread Robert Haas
On Tue, Jun 5, 2012 at 10:44 AM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander mag...@hagander.net wrote:
 Right now, pg_receivexlog sets:
                        replymsg->write = InvalidXLogRecPtr;
                        replymsg->flush = InvalidXLogRecPtr;
                        replymsg->apply = InvalidXLogRecPtr;

 when it sends its status updates.

 I'm thinking it should set replymsg->write = blockpos instead.

 Why? That way you can see in pg_stat_replication what has actually
 been received by pg_receivexlog - not just what we last sent. This can
 be useful in combination with an archive_command that can block WAL
 recycling until it has been saved to the standby. And it would be
 useful as a general monitoring thing as well.

 I think the original reason was that it shouldn't interfere with
 synchronous replication - but it does take away a fairly useful
 usecase...

 I think that not only replymsg->write but also ->flush should be set to
 blockpos in pg_receivexlog. Which allows pg_receivexlog to behave
 as synchronous standby, so we can write WAL to both local and remote
 synchronously. I believe there are some use cases for synchronous
 pg_receivexlog.

 pg_receivexlog doesn't currently fsync() after every write. It only
 fsyncs complete files. So we'd need to set ->flush only at the end
 of a segment, right?

If you want to be able to use it as a synchronous standby, that's not
going to work very well.  You could end up with pg_receivexlog waiting
for the end of the segment before it flushes; meanwhile, all the
clients are sitting there waiting for the flush to happen before they
do anything that could generate more WAL to fill the segment.

Unless you have a solution to that problem, I'd recommend setting
write (which should work with the new remote_write mode for sync rep)
but not setting flush.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-06 Thread Ants Aasma
On Wed, Jun 6, 2012 at 2:27 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 I've quickly tested your lockfree-getbuffer.patch with the test case
 you provided, and I barely see any improvement (2% at max)
 https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U
 tested with 24 cores (48 HT cores, Xeon E7-4807),
 although the tps vs. number of threads curve looks weird.

Was this the range scan on the test table? (sorry about the error in
the query, the x should really be id) In that case the results look
really suspicious. My machine (4 cores, no ht, @ 4GHz, newer arch)
peaked at 90tps with the stated configuration. Even when upping the
shared_buffers and enabling indexonlyscan I didn't see more than about
540tps per thread. The test is designed to exercise buffer eviction,
doing about 9800 buffer reads per transaction with 32MB of buffers.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] page is not marked all-visible warning in regression tests

2012-06-06 Thread Andres Freund
On Wednesday, June 06, 2012 08:19:15 PM Robert Haas wrote:
 On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund and...@2ndquadrant.com 
wrote:
  On a cursory look, it might just be a race condition in
  vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for
  the warning to be visible, all_visible_according_to_vm is determined
  before we loop over all blocks. At the point where one specific heap
  block is actually read and locked, that knowledge might be completely
  outdated by any concurrent backend. Am I missing something?
 
 No, I think you're right.  I think that warning is bogus.  I added it
 in place of some older warning which no longer made sense, but I think
 this one doesn't make sense either.
Agreed.

It might be interesting to recheck the visibility and warn if it's wrong. That 
should be infrequent enough to be bearable, and it does check for an actually 
dangerous case in a new code path.

  I have to say the whole visibilitymap correctness and crash-safety seems
  to be quite underdocumented, especially as it seems to be somewhat
  intricate (to me). E.g. there is no note on why visibilitymap_test
  doesn't need locking. (I guess the theory is that a 1-byte read will
  always be consistent. But how does that ensure other backends see an
  up-to-date value?)
 
 It's definitely intricate, and it's very possible that we should have
 some more documentation.  I am not sure exactly what and where, but
 feel free to suggest something.
I think some addition to the LOCKING part of visibilitymap.c's header 
explaining some of what you wrote in your email might be a good start.
I would also suggest explicitly mentioning that it's ok for the 
visibilitymap and the page to disagree about visibility if it's the 
visibilitymap that thinks the page contains invisible data, but not the 
other way round (I think that can currently happen).

visibilitymap_test() should explain that its results can be outdated if you're 
not holding a buffer lock.

 visibilitymap_test() does have a comment saying that:
 
 /*
  * We don't need to lock the page, as we're only looking at a
  * single bit.
  */
Oh. I conveniently skipped that comment in my brain ;)

 But that's a bit unsatisfying, because, as you say, it doesn't address
 the question of memory-ordering issues.  I think that there's no
 situation in which it causes a problem to see the visibility map bit
 as unset when in reality it has just recently been set by some other
 back-end.  It would be bad if someone did something like:
 
 if (visibilitymap_test(...))
 visibilitymap_clear();
 
 ...because then memory-ordering issues could cause us to accidentally
 fail to clear the bit.   No one should be doing that, though; the
 relevant locking and conditional logic is built directly into
 visibilitymap_clear().
Then _test should document that... I don't think it's impossible that we will 
grow more uses of the visibilitymap logic.

 On the flip side, if someone sees the visibility map bit as set when
 it's actually just been cleared, that could cause a problem - most
 seriously, index-only scans could return wrong answers.  For that to
 happen, someone would have to insert a heap tuple onto a previously
 all-visible page, clearing the visibility map bit, and then insert an
 index tuple; concurrently, some other backend would need to see the
 index tuple but not the fact that the visibility map bit had been
 cleared.  I don't think that can happen: after inserting the heap
 tuple, the inserting backend would release buffer content lock, which
 acts as a full memory barrier; before reading the index tuple, the
 index-only-scanning backend would have to take the content lock on the
 index buffer, which also acts as a full memory barrier.  So the
 inserter can't do the writes out of order, and the index-only-scanner
 can't do the reads out of order, so I think it's safe but we
 probably do need to explain that somewhere.
Hm. For a short while I thought there would be an issue with heap_delete and 
IOS because the deleting transaction can commit without any barriers happening 
on the IOS side. But that only seems to be possible with non-MVCC snapshots, 
which are currently not allowed with index-only scans.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] 9.3: load path to mitigate load penalty for checksums

2012-06-06 Thread Robert Haas
On Mon, Jun 4, 2012 at 9:26 PM, Jeff Davis pg...@j-davis.com wrote:
 Thoughts?

Simon already proposed a way of doing this that doesn't require
explicit user action, which seems preferable to a method that does
require explicit user action, even though it's a little harder to
implement.  His idea was to store the XID of the process creating the
table in the pg_class row, which I think is *probably* better than
your idea of having a process that waits and then flips the flag.
There are some finicky details though - see previous thread for
discussion of some of the issues.

It would be very nice to have a method that detects whether or not
there is only one open snapshot in a particular backend.  Any time
that condition is met, tuples written into a table created or
truncated in the current transaction can be written with
HEAP_XMIN_COMMITTED already set.  That is not as nice as being able to
set HEAP_XMIN_COMMITTED *and* PD_ALL_VISIBLE *and* the visibility map,
but it would still be a big improvement over the status quo.  I would
like to see us get that part done and committed and then worry about
writing the tuples with PD_ALL_VISIBLE set as a separate project.  In
many cases it would also be nice to write the tuples pre-frozen, so I
think we should look for a design that will support that.
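
Sketched out, the test at tuple-insertion time might look like this
(HaveOnlyOneOpenSnapshot() is hypothetical and stands for the detection
method described above; rd_createSubid is the existing marker for a
relation created in the current transaction):

/* Sketch only: pre-set the commit hint bit when no other snapshot can
 * possibly see this table yet. */
if (relation->rd_createSubid != InvalidSubTransactionId &&
    HaveOnlyOneOpenSnapshot())
    tuple->t_data->t_infomask |= HEAP_XMIN_COMMITTED;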

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Avoiding adjacent checkpoint records

2012-06-06 Thread Tom Lane
In commit 18fb9d8d21a28caddb72c7ffbdd7b96d52ff9724, Simon modified the
rule for when to skip checkpoints on the grounds that not enough
activity has happened since the last one.  However, that commit left the
comment block about it in a nonsensical state:

* If this isn't a shutdown or forced checkpoint, and we have not switched
* to the next WAL file since the start of the last checkpoint, skip the
* checkpoint.  The idea here is to avoid inserting duplicate checkpoints
* when the system is idle. That wastes log space, and more importantly it
* exposes us to possible loss of both current and previous checkpoint
* records if the machine crashes just as we're writing the update.
* (Perhaps it'd make even more sense to checkpoint only when the previous
* checkpoint record is in a different xlog page?)

The new code entirely fails to prevent writing adjacent checkpoint
records, because what it checks is the distance from the previous
checkpoint's REDO pointer, not the previous checkpoint record itself.
So the concern raised in the last two sentences of the comment isn't
being addressed at all: if we corrupt the current page of WAL while
trying to write the new checkpoint record, we risk losing the previous
checkpoint record too.  Should the system then crash, there is enough
logic to back up to the second previous checkpoint record and roll
forward from there --- but since we've lost the last checkpoint and up
to one page's worth of preceding WAL records, there is no guarantee that
we'll manage to reach a database state that is consistent with data
already flushed out to disk during the last checkpoint.

I started to make a quick patch to add an additional check on the
location of the previous checkpoint record, so that we'd skip a new
checkpoint unless we'd moved to a new page of WAL.  However, if we
really want to take this risk seriously, ISTM that allowing adjacent
checkpoint records is bad all the time, not only for non-forced
checkpoints.
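
The quick patch would amount to an extra test along these lines
(illustrative pseudocode: XLogRecPagePtr() is a hypothetical helper that
truncates an LSN to its page boundary):

/* Skip a non-forced checkpoint unless the insert position has left the
 * WAL page holding the previous checkpoint record. */
if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FORCE)) == 0 &&
    XLogRecPagePtr(curInsert) == XLogRecPagePtr(ControlFile->checkPoint))
{
    LWLockRelease(CheckpointLock);
    return;         /* previous checkpoint record still on this page */
}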

What I'm now thinking is that a more appropriate way to address that
risk is to force a skip to a new page (not segment) of WAL after we
write a checkpoint record.  This won't waste much WAL space in view
of the new rule to avoid checkpoints more than once per segment on
average.

On the other hand, you could argue that this concern is entirely
hypothetical, and we're already basically assuming that once a WAL
record has been flushed to disk it's safe there even if we're still
writing more stuff into the same page.  If we don't want to assume
that, then any XLogFlush would have to include skip-to-new-page,
and that's not going to be cheap.

Thoughts?

regards, tom lane



Re: [HACKERS] WalSndWakeup() and synchronous_commit=off

2012-06-06 Thread Andres Freund
On Tuesday, May 29, 2012 08:42:43 PM Andres Freund wrote:
 Hi,
 
 On Monday, May 28, 2012 07:11:53 PM Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   Does anybody have a better idea than to either call WalSndWakeup() at
   essentially the wrong places or calling it inside a critical section?
   
   Tom, what danger do you see from calling it in a critical section?
  
  My concern was basically that it might throw an error.  Looking at the
  current implementation of SetLatch, it seems that's not possible, but
  I wonder whether we want to lock ourselves into that assumption.
 
 The assumption is already made at several other places I think.
 XLogSetAsyncXactLSN does a SetLatch and is called from critical sections;
 several signal handlers call it without any attention to the context.
 
 Requiring it to be called outside would make its usage considerably less
 convenient, and I don't really see what could change that would require it
 to throw non-panic errors.
 
  Still, if the alternatives are worse, maybe that's the best answer.
  If we do that, though, let's add comments to WalSndWakeup and SetLatch
  mentioning that they mustn't throw error.
 
 Patch attached.
I would like to invite some more review (+commit...) here ;). Imo this is an 
annoying bug which should be fixed before the next point release or beta/rc comes 
out...

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Avoiding adjacent checkpoint records

2012-06-06 Thread Robert Haas
On Wed, Jun 6, 2012 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In commit 18fb9d8d21a28caddb72c7ffbdd7b96d52ff9724, Simon modified the
 rule for when to skip checkpoints on the grounds that not enough
 activity has happened since the last one.  However, that commit left the
 comment block about it in a nonsensical state:

    * If this isn't a shutdown or forced checkpoint, and we have not switched
    * to the next WAL file since the start of the last checkpoint, skip the
    * checkpoint.  The idea here is to avoid inserting duplicate checkpoints
    * when the system is idle. That wastes log space, and more importantly it
    * exposes us to possible loss of both current and previous checkpoint
    * records if the machine crashes just as we're writing the update.
    * (Perhaps it'd make even more sense to checkpoint only when the previous
    * checkpoint record is in a different xlog page?)

IIRC, the inspiration for the change was that we were getting a
never-ending series of checkpoints even when nothing was happening at
all:

http://archives.postgresql.org/pgsql-hackers/2011-10/msg00207.php

I felt (and still feel) that this was misguided.  I understand why
people don't want a completely idle system to checkpoint; but I can't
recall a single complaint about a checkpoint on a system with low but not
zero activity.  Checkpoints are pretty cheap when there isn't much
data to flush.  The flip side is that I know of real customers who
would have suffered real data loss had this code been present in the
server version they were using.  Checkpoints are the *only* mechanism
by which SLRU pages get flushed to disk on a mostly-idle system.  That
means if something happens to your pg_xlog directory, and you haven't
had a checkpoint, you're screwed.  Letting data sit in memory for
hours, days, weeks, or months because we haven't filled up a WAL
segment is just terrible.  The first user who loses a transaction that
was committed a month ago after running pg_resetxlog is going to hit
the ceiling, and I don't blame them.

It wouldn't be so bad if we had background writing for SLRU pages,
because then you could figure that the OS would eventually have a
chance to write the page out... but we don't.  It'll just sit there in
shared memory, dirty, forever.  CLOG data in particular is FAR too
precious to take that kind of chance with.

I don't think there's much sense in doing push-ups to avoid having the
current and previous checkpoint records on the same XLOG page.  If
the system is so nearly idle that you get two checkpoint records in
the same 8k block, and that block gets corrupted, it is extremely
likely that you can run pg_resetxlog and be OK.  If not, that means
there were more XLOG records after the corrupted page, and you're not
going to be able to replay those anyway, whether the checkpoint
records are in the same 8k block or not.  So I'm not seeing how your
proposal is buying us any additional measure of safety that we don't
already have.  Of course, if we had a way to skip over the corrupted
portion of WAL and pick up replaying records after that, that would be
very useful (even though you'd have to view the resulting database
with extreme suspicion), but without that I don't see that finding the
previous checkpoint record is doing much for us.  Either way, you've
potentially lost changes that were covered by WAL records emitted
after the most recent checkpoint.  The only thing we can really do is
make sure that there aren't likely to be too many more unreplayed
records after the last checkpoint segment, which goes back to my
previous complaint.

As a side point, another reason not to make the checkpoint record
consume the rest of the page is that, for scalability reasons, we want
to minimize the amount of calculation that has to be done while
holding WALInsertLock, and have as much of the computation as possible
get done before acquiring it.  XLogInsert() is already way more
complicated than anything anyone ought to be doing while holding a
heavily-contended LWLock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-06 Thread Sergey Koposov

On Wed, 6 Jun 2012, Ants Aasma wrote:


On Wed, Jun 6, 2012 at 2:27 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:

I've quickly tested your lockfree-getbuffer.patch patch with the test case
you provided and I barely see any improvement (2% at max)
https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U
tested with 24 core (48 ht cores, Xeon E7- 4807).
Although the tps vs number of threads looks weird


Was this the range scan on the test table? (sorry about the error in
the query, the x should really be id) In that case the results look
really suspicious.


Yes, partially my fault: without much thought I put value instead of 
x in the script. Now, after replacing it with id, the tps numbers are 
much smaller.


Here is the tps vs nthreads; I tested up to 10 threads on my 24-CPU 
system (I disabled HT, though):

https://docs.google.com/open?id=0B7koR68V2nM1Nk9OcWNJOTRrYVE

Your patch clearly improves the situation (the peak tps is ~ 10% higher), 
but the general picture is the same: flattening of tps vs nthreads.


Cheers,
S

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \conninfo and SSL

2012-06-06 Thread Alvaro Herrera

Excerpts from Robert Haas's message of Wed Jun 06 14:45:46 -0400 2012:
 On Sun, Jun 3, 2012 at 5:30 AM, Alastair Turner b...@ctrlf5.co.za wrote:
  A one-line change adds the SSL info on its own line like
 
  --
  You are connected to database scratch as user scratch on host
  127.0.0.1 at port 5432.
  SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
  --
 
  Does this need a more integrated presentation, and therefore a broader
  change to make it translatable?
 
 +1 for doing it that way.

Yeah, printSSLInfo already outputs translated stuff so this should be
OK.  Merging both messages into a single translatable unit would be
pretty cumbersome, for no practical gain.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and feedback message

2012-06-06 Thread Magnus Hagander
On Wed, Jun 6, 2012 at 8:26 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 5, 2012 at 11:44 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander mag...@hagander.net wrote:
 Right now, pg_receivexlog sets:
                        replymsg->write = InvalidXLogRecPtr;
                        replymsg->flush = InvalidXLogRecPtr;
                        replymsg->apply = InvalidXLogRecPtr;

 when it sends its status updates.

 I'm thinking it should set replymsg->write = blockpos instead.

 Why? That way you can see in pg_stat_replication what has actually
 been received by pg_receivexlog - not just what we last sent. This can
 be useful in combination with an archive_command that can block WAL
 recycling until it has been saved to the standby. And it would be
 useful as a general monitoring thing as well.

 I think the original reason was that it shouldn't interfere with
 synchronous replication - but it does take away a fairly useful
 use case...

 I think that not only replymsg->write but also ->flush should be set to
 blockpos in pg_receivexlog. That would allow pg_receivexlog to behave
 as a synchronous standby, so we can write WAL to both local and remote
 storage synchronously. I believe there are some use cases for synchronous
 pg_receivexlog.

 pg_receivexlog doesn't currently fsync() after every write. It only
 fsyncs complete files. So we'd need to set ->flush only at the end
 of a segment, right?

 Yes.

 Currently the status update is sent for each status interval. In sync
 replication, a transaction has to wait for a while even after pg_receivexlog
 has written or flushed the WAL data.

 So we should add a new option which specifies whether pg_receivexlog
 sends the status packet back as soon as it writes or flushes the WAL
 data, like the walreceiver does?

That might be useful, but I think that's 9.3 material at this point.

But I think we can get setting the write location in as a bugfix.
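
Concretely, the fix would amount to something like this in the status
reply that pg_receivexlog builds (lastFlushedPos being a hypothetical
variable tracking the end of the last fully fsync'd segment):

replymsg->write = blockpos;           /* everything received so far */
replymsg->flush = lastFlushedPos;     /* whole segments only, per above */
replymsg->apply = InvalidXLogRecPtr;  /* pg_receivexlog never applies WAL */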

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof

2012-06-06 Thread Magnus Hagander
On Wed, Jun 6, 2012 at 6:58 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Jun 6, 2012 at 1:09 AM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Jun 6, 2012 at 4:38 AM, Daniel Farina dan...@heroku.com wrote:
 On Tue, Jun 5, 2012 at 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Daniel Farina dan...@heroku.com writes:
 If that is the case, is there a convention we can use to separate the
 parts of the connection string (in both representations) into the
 parts sent to the server and the part that the client needs?  We
 already abuse this a little bit because URI syntax (in general, not
 just our rendition of it) leaves little room for extension for
 parameters on the client side.  Consider ?sslmode=require.

 In both representations, the net effect of a typo would be that
 instead of magically reading some properties on the client side,
 they'd be sent to the server.  How often is this going to be so wrong
 that one cannot send a response from the server indicating to the user
 their error?  On casual inspection it doesn't seem like prohibitively
 often, but I haven't mulled over that for very long.

 I think that's an excellent example of this being a bad idea. If you
 mis-spell sslmode=require, that should absolutely result in an error
 on the client side. Otherwise, you might end up sending your password
 (or other details that are not as sensitive, but still sensitive) over
 an unencrypted connection. If you wait for the error from the server,
 it's too late.

 That is an excellent point.  Is there enough time in the day to gripe
 about how sslmode=require is not the default?

Well, you'll get me first in line to agree that the current default is stupid.

But I'm not sure sslmode=require is a proper default either. Because
then the connection will fail completely against the vast majority of
servers, which simply don't have SSL support.


 Well, this seems pretty obviated by the prefix-naming convention, but
 it's an ironclad example of how the older idea was a bad one.

Yeah, a prefix-based solution would fix this, since we can keep throwing errors.

However, not throwing errors on the URL syntax should be considered a
bug, I think.
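
As a minimal client-side sketch of the behavior being argued for, assuming
the fix is applied: the misspelled sslmod parameter below should make the
connection fail immediately on the client, before anything reaches the
server.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* "sslmod" is a deliberate typo for "sslmode" */
    PGconn     *conn = PQconnectdb("postgresql://localhost/test?sslmod=require");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}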

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btree or not btree? That is the question

2012-06-06 Thread Greg Sabino Mullane
On Mon, Jun 04, 2012 at 02:09:44PM -0400, Tom Lane wrote:
 Greg Sabino Mullane g...@endpoint.com writes:
  We have an 8.3.18 system (yes, the same one from the previous 
  thread, finally upgraded!) that gave us this error yesterday:
 
  ERROR:  index pg_class_oid_index is not a btree
 
 That means you got bogus data while reading the metapage.
 I'm beginning to wonder about the hardware on this server ...

Thanks for the reply. Me too. This is on a cluster[1], so we 
do have the luxury of testing the hardware on each box and 
shuffling things around. I'm also thinking we need to get away 
from the near-constant updating of pg_class, just as a good 
measure.

[1] Bruce, yet another usage! :)

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8




Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof

2012-06-06 Thread Robert Haas
On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander mag...@hagander.net wrote:
 However, not throwing errors on the URL syntax should be considered a
 bug, I think.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-06 Thread Merlin Moncure
On Wed, Jun 6, 2012 at 2:53 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Wed, 6 Jun 2012, Ants Aasma wrote:

 On Wed, Jun 6, 2012 at 2:27 PM, Sergey Koposov kopo...@ast.cam.ac.uk
 wrote:

 I've quickly tested your lockfree-getbuffer.patch patch with the test
 case
 you provided and I barely see any improvement (2% at max)
 https://docs.google.com/open?id=0B7koR68V2nM1QVBxWGpZdW4wd0U
 tested with 24 core (48 ht cores, Xeon E7- 4807).
 Although the tps vs number of threads looks weird


 Was this the range scan on the test table? (sorry about the error in
 the query, the x should really be id) In that case the results look
 really suspicious.


 Yes, partially my fault: without much thought I put value
 instead of x in the script. Now, after replacing it with id, the tps
 numbers are much smaller.

 Here is the tps vs nthreads I did test up to 10 threads on my 24 cpu system
 (I disabled HT though):
 https://docs.google.com/open?id=0B7koR68V2nM1Nk9OcWNJOTRrYVE

 Your patch clearly improves the situation (the peak tps is ~ 10% higher), but
 the general picture is the same: flattening of tps vs nthreads.

I think this is the expected result.  In the single user case the
spinlock never spins and only has to execute the CPU-locking cache
instructions once.  Can we see results @ 24 threads?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoiding adjacent checkpoint records

2012-06-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 6, 2012 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In commit 18fb9d8d21a28caddb72c7ffbdd7b96d52ff9724, Simon modified the
 rule for when to skip checkpoints on the grounds that not enough
 activity has happened since the last one.

 IIRC, the inspiration for the change was that we were getting a
 never-ending series of checkpoints even when nothing was happening at
 all:
 http://archives.postgresql.org/pgsql-hackers/2011-10/msg00207.php

Right.

 I felt (and still feel) that this was misguided.

Looking at it again, I'm inclined to agree.  The behavior was entirely
correct up until somebody decided to emit a continuing stream of
XLOG_RUNNING_XACTS WAL records even when the system is idle.  Why did
we not fix it by fixing that?

 I don't think there's much sense in doing push-ups to avoid having the
 current and previous checkpoint records on the same XLOG page.

Perhaps not.  I only got exercised about it after noting that the commit
hadn't updated the comment about it to match what the code is doing.
If we end up reverting that commit and doing something else to fix the
useless-checkpoint problem, I'm happy to let the subject rest, at least
until we get some evidence of a real problem in the area.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoiding adjacent checkpoint records

2012-06-06 Thread Robert Haas
On Wed, Jun 6, 2012 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I felt (and still feel) that this was misguided.

 Looking at it again, I'm inclined to agree.  The behavior was entirely
 correct up until somebody decided to emit a continuing stream of
 XLOG_RUNNING_XACTS WAL records even when the system is idle.  Why did
 we not fix it by fixing that?

That's exactly what I think we should have done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating objects in pg_catalog

2012-06-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Right now, you can't directly create a relation (table, index,
 composite type) in the pg_catalog schema, but you can create a
 non-relation (function, domain, etc.) in the pg_catalog schema.

Surely this is true only for superusers.  Superusers can do whatever
they want anyway, no?

 Tabula rasa, I'd argue for getting tough on this, and error out on any
 attempt to get a user-created SQL object into pg_catalog by any means,
 unless allow_system_table_mods is set.

allow_system_table_mods is mainly intended to prevent people from
altering the schemas of system catalogs, since it's more than likely
that the backend C code will fail (nastily) to cope with such changes.
I don't think it follows that we should prevent superusers from doing
things that are perfectly safe, like adding new functions in pg_catalog.

It's very likely that the specific restrictions enforced by
allow_system_table_mods could stand a fresh look, but I don't agree
with the idea of radically changing its charter.  Nor do I think we
need to put training wheels on superusers for any changes that aren't
demonstrably likely to result in unrecoverable database corruption.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-06-06 Thread Sergey Koposov

On Wed, 6 Jun 2012, Merlin Moncure wrote:

I think this is the expected result.  In the single user case the
spinklock never spins and only has to make the cpu-locking cache
instructions once.  can we see results @24 threads?


Here 
https://docs.google.com/open?id=0B7koR68V2nM1NDJHLUhNSS0zbUk


*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoiding adjacent checkpoint records

2012-06-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 6, 2012 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I felt (and still feel) that this was misguided.

 Looking at it again, I'm inclined to agree.  The behavior was entirely
 correct up until somebody decided to emit a continuing stream of
 XLOG_RUNNING_XACTS WAL records even when the system is idle.  Why did
 we not fix it by fixing that?

 That's exactly what I think we should have done.

Actually, it looks like there is an extremely simple way to handle this,
which is to move the call of LogStandbySnapshot (which generates the WAL
record in question) to before the checkpoint's REDO pointer is set, but
after we have decided that we need a checkpoint.  That will result in
later iterations not thinking that some work had happened while the
checkpoint is in progress.  It looks like we would need an extra
release/reacquire of WALInsertLock to avoid holding that lock while
doing LogStandbySnapshot, but that seems relatively negligible in
comparison to the total cost of a checkpoint.

There might be some still-better way to manage all this, but this one
seems safe enough to consider as a post-beta patch.  So I recommend
we revert the change in the when-to-skip-checkpoint test in favor of
reordering these operations.
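
In outline, the reordered sequence would look something like this (the
helper names are placeholders, not the actual routines):

/* ... decide whether a checkpoint is needed at all; may return early ... */

LogStandbySnapshot();               /* XLOG_RUNNING_XACTS, before REDO */

LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
checkPoint.redo = GetCurrentInsertPosition();   /* hypothetical helper;
                                                 * REDO now follows the
                                                 * snapshot record */
LWLockRelease(WALInsertLock);

/* ... flush buffers, then insert the checkpoint record itself ... */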

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating objects in pg_catalog

2012-06-06 Thread Robert Haas
On Wed, Jun 6, 2012 at 4:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Right now, you can't directly create a relation (table, index,
 composite type) in the pg_catalog schema, but you can create a
 non-relation (function, domain, etc.) in the pg_catalog schema.

 Surely this is true only for superusers.  Superusers can do whatever
 they want anyway, no?

No.

rhaas=# create table pg_catalog.tom (a int);
ERROR:  permission denied to create pg_catalog.tom
DETAIL:  System catalog modifications are currently disallowed.
rhaas=# create table tom (a int);
CREATE TABLE
rhaas=# alter table tom set schema pg_catalog;
ALTER TABLE
rhaas=# create domain pg_catalog.lane as int;
CREATE DOMAIN

The offending error check is in heap_create(), and based on what
you're saying here it seems like we should just rip it out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating objects in pg_catalog

2012-06-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 rhaas=# create table pg_catalog.tom (a int);
 ERROR:  permission denied to create pg_catalog.tom

 The offending error check is in heap_create(), and based on what
 you're saying here it seems like we should just rip it out.

Hmm.  Yeah, it seems like the regular permissions tests on the schemas
in question should be enough to keep Joe User from making tables there,
and I do not see a reason why the backend would care if there are
non-catalog tables lying about in pg_catalog.

Checking the commit history, it seems this was originally a test to
prevent people from creating tables named pg_xxx:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f5a10e722c052006886b678995695001958a#patch3

which may or may not have been critical once upon a time, but surely is
not any more.

So no objection to removing that particular test.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inconsistency in libpq connection parameters, and extension thereof

2012-06-06 Thread Daniel Farina
On Wed, Jun 6, 2012 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 6, 2012 at 4:08 PM, Magnus Hagander mag...@hagander.net wrote:
 However, not throwing errors on the URL syntax should be considered a
 bug, I think.

 +1.

+1

Here's a patch that just makes the thing an error.  Of course we could
revert it if it makes the URI feature otherwise unusable...but I don't
see a huge and terrible blocker ATM.  A major question mark for me is any
extra stuff in JDBC URLs.

Nevertheless, here it is.

-- 
fdr


libpq-error-on-unknown-parameters-in-URIs-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and feedback message

2012-06-06 Thread Fujii Masao
On Thu, Jun 7, 2012 at 5:05 AM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Jun 6, 2012 at 8:26 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 5, 2012 at 11:44 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Jun 5, 2012 at 4:42 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jun 5, 2012 at 9:53 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 Right now, pg_receivexlog sets:
                         replymsg->write = InvalidXLogRecPtr;
                         replymsg->flush = InvalidXLogRecPtr;
                         replymsg->apply = InvalidXLogRecPtr;

 when it sends its status updates.

 I'm thinking it should set replymsg->write = blockpos instead.

 Why? That way you can see in pg_stat_replication what has actually
 been received by pg_receivexlog - not just what we last sent. This can
 be useful in combination with an archive_command that can block WAL
 recycling until it has been saved to the standby. And it would be
 useful as a general monitoring thing as well.

 I think the original reason was that it shouldn't interfere with
 synchronous replication - but it does take away a fairly useful
 use case...

 I think that not only replymsg->write but also ->flush should be set to
 blockpos in pg_receivexlog. That would allow pg_receivexlog to behave
 as a synchronous standby, so we can write WAL to both local and remote
 storage synchronously. I believe there are some use cases for synchronous
 pg_receivexlog.

 pg_receivexlog doesn't currently fsync() after every write. It only
 fsyncs complete files. So we'd need to set ->flush only at the end
 of a segment, right?

 Yes.

 Currently the status update is sent for each status interval. In sync
 replication, a transaction has to wait for a while even after pg_receivexlog
 has written or flushed the WAL data.

 So we should add a new option which specifies whether pg_receivexlog
 sends the status packet back as soon as it writes or flushes the WAL
 data, like the walreceiver does?

 That might be useful, but I think that's 9.3 material at this point.

Fair enough. That's a new feature rather than a bugfix.

 But I think we can get setting the write location in as a bugfix.

Also set the flush location? Sending the flush location back seems
helpful when using pg_receivexlog for WAL archiving purposes. By
seeing the flush location we can ensure that a WAL file has been archived
durably (IOW, that it has been flushed in the remote archive area).

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoiding adjacent checkpoint records

2012-06-06 Thread Tom Lane
I wrote:
 Actually, it looks like there is an extremely simple way to handle this,
 which is to move the call of LogStandbySnapshot (which generates the WAL
 record in question) to before the checkpoint's REDO pointer is set, but
 after we have decided that we need a checkpoint.

On further contemplation, there is a downside to that idea, which
probably explains why the code was written as it was: if we place the
XLOG_RUNNING_XACTS WAL record emitted during a checkpoint before rather
than after the checkpoint's REDO point, then a hot standby slave
starting up from that checkpoint won't process the XLOG_RUNNING_XACTS
record.  That means its KnownAssignedXids machinery won't be fully
operational until the master starts another checkpoint, which might be
awhile.  So this could result in undesirable delay in hot standby mode
becoming active.

I am not sure how significant this really is though.  Comments?

If we don't like that, I can think of a couple of other ways to get there,
but they have their own downsides:

* Instead of trying to detect after-the-fact whether any concurrent
WAL activity happened during the last checkpoint, we could detect it
during the checkpoint and then keep the info in a static variable in
the checkpointer process until next time.  However, I don't see any
bulletproof way to do this without adding at least one or two lines
of code within XLogInsert, which I'm sure Robert will complain about.

* We could expand checkpoint records to contain two different REDO
pointers, one to be used by hot standby slaves and one for normal
crash recovery.  (The LogStandbySnapshot records would appear between
these two points; we'd still be moving them up to the start of the
checkpoint sequence.)  This is a relatively clean solution but would
force pg_upgrade between beta2 and beta3, so that's not so nice.

* Combining the two ideas, we could take the nominal REDO pointer,
run LogStandbySnapshot, make a fresh note of where the insert point
is (real REDO point, which is what we publish in shared memory for
the bufmgr to compare LSNs to), complete the checkpoint, and write
the checkpoint record using the nominal REDO pointer so that that's
where any crash or HS slave starts from.  But save the real REDO
pointer in checkpointer static state, and in the next checkpoint use
that rather than the nominal pointer to decide if anything's happened
that would force a new checkpoint.  I think this dodges both of the
above complaints, but it feels pretty baroque.

Thoughts, other ideas?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3: load path to mitigate load penalty for checksums

2012-06-06 Thread Noah Misch
On Wed, Jun 06, 2012 at 03:08:05PM -0400, Robert Haas wrote:
 On Mon, Jun 4, 2012 at 9:26 PM, Jeff Davis pg...@j-davis.com wrote:
  Thoughts?
 
 Simon already proposed a way of doing this that doesn't require
 explicit user action, which seems preferable to a method that does
 require explicit user action, even though it's a little harder to
 implement.  His idea was to store the XID of the process creating the
 table in the pg_class row, which I think is *probably* better than
 your idea of having a process that waits and then flips the flag.
 There are some finicky details though - see previous thread for
 discussion of some of the issues.

I think both improvements have a place.  Loads to a new heap should take as
many automatic shortcuts as practical.  Loads to an existing heap also matter,
and Jeff's proposal helps those.

 It would be very nice to have a method that detects whether or not
 there is only one open snapshot in a particular backend.  Any time
 that condition is met, tuples written into a table created or
 truncated in the current transaction can be written with
 HEAP_XMIN_COMMITTED already set.

Perhaps criteria like those in CheckTableNotInUse() would suffice?
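
For illustration, the shape of such a test might be as follows; every name
here is hypothetical, merely echoing the spirit of CheckTableNotInUse():

static bool
safe_to_preset_xmin_committed(Relation rel)
{
    /* table must have been created or truncated in this transaction */
    if (rel->rd_createSubid == InvalidSubTransactionId)
        return false;

    /* and no older snapshot may be live in this backend */
    return backend_has_single_snapshot();
}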

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3: load path to mitigate load penalty for checksums

2012-06-06 Thread Jeff Davis
On Wed, 2012-06-06 at 15:08 -0400, Robert Haas wrote:
 On Mon, Jun 4, 2012 at 9:26 PM, Jeff Davis pg...@j-davis.com wrote:
  Thoughts?
 
 Simon already proposed a way of doing this that doesn't require
 explicit user action, which seems preferable to a method that does
 require explicit user action, even though it's a little harder to
 implement.  His idea was to store the XID of the process creating the
 table in the pg_class row, which I think is *probably* better than
 your idea of having a process that waits and then flips the flag.
 There are some finicky details though - see previous thread for
 discussion of some of the issues.

My goals include:

* The ability to load into existing tables with existing data
* The ability to load concurrently

My understanding was that the proposal to which you're referring can't
do those things, which seem like major limitations. Did I miss
something?

 In
 many cases it would also be nice to write the tuples pre-frozen, so I
 think we should look for a design that will support that.

You're right, that would be nice.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time for pgindent run?

2012-06-06 Thread Robert Haas
On Tue, Jun 5, 2012 at 10:25 AM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jun 05, 2012 at 10:21:14AM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Is everyone ready for me to run pgindent?  We are nearing the first
  commit-fest (June 15) and will have to branch the git tree soon.

 Also, we should do the pgindent run well before the commitfest, so that
 authors of pending patches have time to rebase their patches in case
 pgindent changes the code they are patching ...

 Ah, good point.  That will affect commit-fest patches.  We could run it
 only on the 9.3 branch, but that makes double-patching very hard.

 Is everyone good for a pgindent run this week?

The sooner the better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3: load path to mitigate load penalty for checksums

2012-06-06 Thread Noah Misch
On Mon, Jun 04, 2012 at 06:26:04PM -0700, Jeff Davis wrote:
 I propose a special user-initiated loading mode at the table
 granularity. During this time, readers must ignore PD_ALL_VISIBLE,
 HEAP_XMIN_COMMITTED, and the visibility map entirely. However, writers
 may set all of those bits before the writing transaction commits,
 obviating the need to rewrite (and WAL) the data again later. Ideally,
 there would be no work for VACUUM to do after the data load (unless a
 transaction aborted).

Note that, currently, only VACUUM sets PD_ALL_VISIBLE and visibility map bits.
Would you make something else, like heap_multi_insert(), able to do so?

 Goals:
 =
 
   * Table granularity (doesn't affect other tables at all)
   * Allows concurrent loaders
   * Allows loading into existing tables with existing data
   * Online (allow reads to proceed, even if degraded)

+1

 Obviously, readers and writers would need a mechanism to honor those
 flags, but I haven't dug into the details yet (additional routines in
 tqual.c?).

Avoiding measurable overhead in tuple visibility checks when the feature is
inactive may well prove to be a key implementation challenge.

 FINALIZE LOAD would first move from state 2 to state 3 by acquiring a
 ShareUpdateExclusiveLock on the table and setting optimistichints = false.
 
 Then, it would move from state 3 to state 0 by first waiting for all
 transactions that currently hold a lock on the table, to ensure they see
 the optimistichints=false flag.

This is certainly necessary, but ...

 Then, it would remember the current xid
 as max_loader_xid, and wait until the global xmin is greater than
 max_loader_xid. This should ensure that all snapshots regard all loading
 transactions as complete.

... this might not be.  Each backend could decide, based on its own xmin,
whether to ignore PD_ALL_VISIBLE in a given table.  In other words, your
ignorehints flag could be an xmin set to InvalidTransactionId during stages 1
and 2 and to the earliest safe xmin during stages 0 and 3.
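
A minimal sketch of that per-backend test, where rd_loadxmin is a
hypothetical relcache field mirroring the proposed pg_class value
(InvalidTransactionId during stages 1 and 2, the earliest safe xmin during
stages 0 and 3):

static inline bool
must_ignore_hints(Relation rel, Snapshot snapshot)
{
    TransactionId safeXmin = rel->rd_loadxmin;  /* hypothetical field */

    if (!TransactionIdIsValid(safeXmin))
        return true;            /* stages 1-2: load still in progress */

    /* stages 0/3: hints become trustworthy once our xmin passes it */
    return TransactionIdPrecedes(snapshot->xmin, safeXmin);
}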

  * INITIATE and FINALIZE probably need to use PreventTransactionChain()
 and multiple transactions, to avoid holding the ShareUpdateExclusiveLock
 for too long. Also, we want to keep people from using it in the same
 transaction as the loading xact, because they might not realize that
 they would get a concurrency of 1 that way (because of the
 ShareUpdateExclusiveLock).

Yes.  You need to commit the transaction modifying pg_class so other backends
can observe the change, at which point you can gather the list to wait on.

Consider splitting the INITIATE UI into two interfaces, one that transitions
from state 0 to state 1 and another that expects state 1 and blocks until we
reach state 2.  You then have no need for PreventTransactionChain(), and the
interfaces could even be normal functions.  It's less clear how reasonably you
could do this for the FINALIZE step, given its implicit VACUUM.  It could be
achieved by having the user do the VACUUM and making the new interface merely
throw an error if a VACUUM is still needed.  The trivial usage pattern might
look like this:

SELECT pg_initiate_load('bigtbl');
SELECT pg_wait_load('bigtbl'); -- not a great name
COPY bigtbl FROM STDIN;
SELECT pg_stop_load('bigtbl');
VACUUM bigtbl;
SELECT pg_finalize_load('bigtbl');

It's definitely less elegant, alas.  Perhaps offer the interface you've
proposed and have it do the above under the hood.  That way, users with
complex needs have the flexibility of the lower-level interfaces while those
who can tolerate PreventTransactionChain() have simplicity.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Early hint bit setting

2012-06-06 Thread Jim Nasby

On 5/30/12 4:42 PM, Ants Aasma wrote:

I was thinking about what is the earliest time at which we could set hint
bits. This would be just after the commit has been made visible.


Except that's only true when there are no other transactions running. That's 
been one of the big sticking points about trying to proactively set hint bits; 
in a real system you're not going to gain very much unless you wait a while 
before setting them.

An interesting option might be to keep the first XID that dirtied a page and loop 
through all pages in the background looking for pages where first_dirty_xid is 
older than the oldest running XID. Those pages would have hint bits that could be 
set. While scanning the page you would want to set first_dirty_xid to the oldest 
XID that could not be hinted.
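
A rough sketch of that background pass, with invented names throughout
(first_dirty_xid is the proposed per-buffer field; neither helper exists
today):

TransactionId oldestRunning = GetOldestRunningXid();    /* hypothetical */
int         buf;

for (buf = 0; buf < NBuffers; buf++)
{
    /* Every transaction that dirtied this page has ended, so its hint
     * bits can be set without consulting the procarray again. */
    if (TransactionIdPrecedes(first_dirty_xid[buf], oldestRunning))
        first_dirty_xid[buf] = SetHintsReturningUnhintableXid(buf);
}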

This is a modification of the idea to set hint bits when a page is on its way 
out of the buffer; the advantage here is that it would also handle pages that 
are too hot to leave the buffer.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How could we make it simple to access the log as a table?

2012-06-06 Thread Jim Nasby

On 5/28/12 2:55 PM, Robert Haas wrote:

As far as CSV goes, I think the biggest deficiency is that there's a
mismatch between the way that log files are typically named (e.g. one
per day, or one per hour) and the way that a CSV foreign table is
created (you've got to point it at one particular file).  Maybe we
could have a CSV reader that understands PostgreSQL-format CSV logs,
but you point it at a directory, rather than a single file, and it
reads all the CSV files in the directory.  And maybe it could also be
smart enough that if you've got a WHERE clause that filters by date, it
uses that to skip any files that can be proven irrelevant.  So the
user can just turn on CSV logging, point the FDW at the log directory,
and away they go.


The idea is a really good one, except for one thing: this pattern is 
undoubtedly not unique to PG CSV logs.

ISTM it would be extremely useful to have a generic ability to define a 
filename pattern and have all files matching that pattern be pulled in via FDW; 
possibly by treating each file as a partition, possibly complete with exclusion 
constraints.

ISTM that what's really needed here are a bunch of separate improvements to our 
logging and/or FDW, each of which will add usefulness.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Could we replace SysV semaphores with latches?

2012-06-06 Thread Tom Lane
There has been regular griping in this list about our dependence on SysV
shared memory, but not so much about SysV semaphores, even though the
latter cause their fair share of issues; as seen for example in
buildfarm member spoonbill's recent string of failures:

creating template1 database in 
/home/pgbuild/pgbuildfarm/HEAD/pgsql.25563/src/test/regress/./tmp_check/data/base/1
 ... FATAL:  could not create semaphores: No space left on device
DETAIL:  Failed system call was semget(1, 17, 03600).
HINT:  This error does *not* mean that you have run out of disk space.  It 
occurs when either the system limit for the maximum number of semaphore sets 
(SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be 
exceeded.  You need to raise the respective kernel parameter.  Alternatively, 
reduce PostgreSQL's consumption of semaphores by reducing its max_connections 
parameter.
The PostgreSQL documentation contains more information about 
configuring your system for PostgreSQL.
child process exited with exit code 1

It strikes me that we have recently put together an independent but just
about equivalent waiting mechanism in the form of latches.  And not only
that, but there's already a latch for each process.  Could we replace
our usages of SysV semaphores with WaitLatch on the procLatch?  Unlike
the situation with shared memory where we need some secondary features
(mumble shm_nattch mumble), I think we aren't really using anything
interesting about SysV semaphores except for the raw ability to wait for
somebody to signal us.
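
The substitution would follow the standard latch idiom: where a backend
today blocks on its semaphore via PGSemaphoreLock, it could instead loop
like this, with wakeup_condition_met() standing in for whatever state the
waker sets before signalling:

for (;;)
{
    ResetLatch(&MyProc->procLatch);

    if (wakeup_condition_met())         /* hypothetical */
        break;

    WaitLatch(&MyProc->procLatch, WL_LATCH_SET, -1L);
}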

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers