Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Dean Rasheed
On 25 June 2011 01:59, Alvaro Herrera  wrote:
> Excerpts from Robert Haas's message of vie jun 24 19:01:49 -0400 2011:
>> I would tend to think of the not-null-ness that is required by the
>> primary constraint as a separate constraint, not an intrinsic part of
>> the primary key.  IOW, if you drop the primary key constraint, IMV,
>> that should never cause the column to begin allowing nulls.
>

Really? I would expect the reverse, namely that the not-nullness is
part of the PK constraint and dropping the PK *would* then start
allowing NULLs.

I thought that this was one of the reasons for doing this work in the
first place. See http://wiki.postgresql.org/wiki/Todo#CREATE and the
bug reports linked from there.


> Yeah, that is actually what happens.  (I had never noticed this, but it seems
> obvious in hindsight.)
>
> alvherre=# create table foo (a int primary key);
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index «foo_pkey»
> for table «foo»
> CREATE TABLE
> alvherre=# alter table foo drop constraint foo_pkey;
> ALTER TABLE
> alvherre=# \d foo
>        Table «public.foo»
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  a      | integer | not null
>

Yeah, that's one of the bugs linked from the TODO item, that I hoped
this patch would fix.
[http://archives.postgresql.org/pgsql-hackers/2009-04/msg00062.php]


> What this says is that this patch needs to be creating pg_constraint
> entries for all PRIMARY KEY columns too, which answers my question above
> quite nicely.
>

If by that you mean that you'd end up with 2 pg_constraint entries for
the PK, then that seems wrong to me. I think the not-nullness should
be part of the PK.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 7:47 PM, Bruce Momjian  wrote:
> Peter Eisentraut wrote:
>> On fre, 2011-06-24 at 16:34 -0400, Bruce Momjian wrote:
>> > > > It also creates two new environment variables,
>> > > > OLDPGPORT and NEWPGPORT, to control the port values because we don't
>> > > > want to default to PGPORT anymore.
>> > >
>> > > I would prefer that all PostgreSQL-related environment variables start
>> > > with "PG".
>> >
>> > OK, attached.  I was also using environment variables for PGDATA and
>> > PGBIN so I renamed those too to begin with 'PG'.
>>
>> I'm wondering why pg_upgrade needs environment variables at all.  It's a
>> one-shot operation.  Environment variables are typically used to share
>> default settings across programs.  I don't see how that applies here.
>
> They were there in the original EnterpriseDB code, and in some cases
> like PGPORT, we _used_ those environment variables.  Also, the
> command-line can get pretty long so we actually illustrate environment
> variable use in its --help:
>
>        For example:
>          pg_upgrade -d oldCluster/data -D newCluster/data -b oldCluster/bin 
> -B newCluster/bin
>        or
>          $ export OLDDATADIR=oldCluster/data
>          $ export NEWDATADIR=newCluster/data
>          $ export OLDBINDIR=oldCluster/bin
>          $ export NEWBINDIR=newCluster/bin
>          $ pg_upgrade
>
> You want the environment variable support removed?

I don't.  Its production usefulness is questionable, but it's quite
handy for testing IMO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 7:51 PM, Greg Smith  wrote:
> On 06/24/2011 03:28 PM, Christopher Browne wrote:
>> I expect that the correlation between commit and [various parties] is
>> something that will need to take place outside git.
>
> Agreed on everything except the "Author" information that is already being
> placed into each commit.  The right data is already going into there, all it
> would take is some small amount of tagging to make it easier to extract
> programmatically.

Yeah, I think we should seriously consider doing something about that.

>> The existing CommitFest data goes quite a long ways towards capturing
>> interesting information (with the likely exception of sponsorship);
>> what it's missing, at this point, is a capture of what commit or
>> commits wound up drawing the proposed patch into the official code
>> base.
>
> The main problem with driving this from the CommitFest app is that not every
> feature ends up in there.  Committers who commit their own work are one
> source of those.  Commits for bug fixes that end up being notable enough to
> go into the release notes are another.

Yep.

> I agree it would be nice if every entry marked as "Committed" in the CF app
> included a final link to the message ID of the commit closing it.  But since
> I don't ever see that being the complete data set, I find it hard to justify
> enforcing that work.  And the ability to operate programmatically on the
> output from "git log" is a slightly easier path to walk down than extracting
> the same from the CF app, since you avoid one pre-processing step:  extracting
> the right entries in the database to get a list of commit IDs.

Yep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-24 Thread Peter Geoghegan
Attached is a patch that addresses Fujii's third and most recent set of concerns.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4952d22..bfe6bcd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10158,7 +10158,7 @@ retry:
 	/*
 	 * Wait for more WAL to arrive, or timeout to be reached
 	 */
-	WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+	WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 	ResetLatch(&XLogCtl->recoveryWakeupLatch);
 }
 else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..6d2e3a1 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,6 +93,7 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
 
@@ -188,22 +189,25 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
+ *
+ * Note that there is no guarantee that callers will have all wake-up conditions
+ * returned, but we will report at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns same bit mask and makes same guarantees as WaitLatch.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +215,15 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
+
+	Assert(wakeEvents != 0);
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +231,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +242,31 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
 		hifd = selfpipe_readfd;
-		if (sock != PGINVALID_SOCKET && forRead)
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
+
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +274,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +290,33 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	(errcode_for_socket_access(),
 	 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE) && FD_ISSET(sock, &input_mask))
+			{
+result |= WL_SOCK

Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera
Excerpts from Robert Haas's message of vie jun 24 19:01:49 -0400 2011:
> On Fri, Jun 24, 2011 at 6:39 PM, Alvaro Herrera
>  wrote:
> > So remind me ... did we discuss PRIMARY KEY constraints?  Are they
> > supposed to show up as inherited not null rows in the child?  Obviously,
> > they do not show up as PKs in the child, but they *are* not null so my
> > guess is that they need to be inherited as not null as well.  (Right
> > now, unpatched head of course emits the column as attnotnull).
> >
> > In this case, the inherited name (assuming that the child declaration
> > does not explicitly state a constraint name) should be the same as the
> > PK, correct?
> 
> I would tend to think of the not-null-ness that is required by the
> primary constraint as a separate constraint, not an intrinsic part of
> the primary key.  IOW, if you drop the primary key constraint, IMV,
> that should never cause the column to begin allowing nulls.

Yeah, that is actually what happens.  (I had never noticed this, but it seems
obvious in hindsight.)

alvherre=# create table foo (a int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index «foo_pkey» for
table «foo»
CREATE TABLE
alvherre=# alter table foo drop constraint foo_pkey;
ALTER TABLE
alvherre=# \d foo
      Table «public.foo»
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null

What this says is that this patch needs to be creating pg_constraint
entries for all PRIMARY KEY columns too, which answers my question above
quite nicely.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Bruce Momjian
Greg Smith wrote:
> I tend not to think in terms of solutions that involve web applications 
> because I never build them, but this seems like a useful approach to 
> consider.  Given that the list of tags is pretty static, I could see a 
> table with a line for each commit, and a series of check boxes in 
> columns for each tag next to it.  Then you just click on each of the 
> tags that apply to that line.
> 
> Once that was done, increasing the amount of smarts that go into 
> pre-populating which boxes are already filled in could then happen, with 
> "docs only" being the easiest one to spot.  A really smart 
> implementation here might even eventually make a good guess for "bug 
> fix" too, based on whether a bunch of similar commits happened to other 
> branches around the same time.  Everyone is getting better lately at 
> noting the original SHA1 when fixing a mistake too, so being able to 
> identify "repair" seems likely when that's observed.
> 
> This approach would pull the work from being at commit time, but it 
> would still be easier to do incrementally and to distribute the work 
> around than what's feasible right now.  Release note prep takes critical 
> project contributors a non-trivial amount of time, this would let anyone 
> who felt like tagging things for an hour help with the early stages of 
> that.  And it would provide a functional source for the metadata I've 
> been searching for too, to drive all this derived data about sponsors etc.

We could have the items put into release note categories and have a
button that marks incompatibilities.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 03:21 PM, Robert Haas wrote:

If I were attacking this problem, I'd be inclined to make a web
application that lists all the commits in a format roughly similar to
the git API, and then lets you tag each commit with tags from some
list (feature, bug-fix, revert, repair-of-previous-commit, etc.).
Some of the tagging (e.g. docs-only) could probably even be done
automatically.  Then somebody could go through once a month and update
all the tags.  I'd be more willing to volunteer to do that than I
would be to try to get the right metadata tag in every commit...
   


I tend not to think in terms of solutions that involve web applications 
because I never build them, but this seems like a useful approach to 
consider.  Given that the list of tags is pretty static, I could see a 
table with a line for each commit, and a series of check boxes in 
columns for each tag next to it.  Then you just click on each of the 
tags that apply to that line.


Once that was done, increasing the amount of smarts that go into 
pre-populating which boxes are already filled in could then happen, with 
"docs only" being the easiest one to spot.  A really smart 
implementation here might even eventually make a good guess for "bug 
fix" too, based on whether a bunch of similar commits happened to other 
branches around the same time.  Everyone is getting better lately at 
noting the original SHA1 when fixing a mistake too, so being able to 
identify "repair" seems likely when that's observed.


This approach would pull the work from being at commit time, but it 
would still be easier to do incrementally and to distribute the work 
around than what's feasible right now.  Release note prep takes critical 
project contributors a non-trivial amount of time, this would let anyone 
who felt like tagging things for an hour help with the early stages of 
that.  And it would provide a functional source for the metadata I've 
been searching for too, to drive all this derived data about sponsors etc.


Disclaimer:  as a person who does none of this work currently, my 
suggestions are strictly aimed to inspire those who do in a direction 
that might make things easier for them.  I can get the sponsor stuff
I've volunteered to work on finished regardless.  I just noticed what 
seems like it could be a good optimization over here while I was working 
on that.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] News on Clang

2011-06-24 Thread Peter Geoghegan
On 25 June 2011 00:27, Peter Eisentraut  wrote:
> On fre, 2011-06-24 at 18:02 +0100, Peter Geoghegan wrote:
>> I'm very encouraged by this - Clang is snapping at the heels of GCC
>> here. I'd really like to see Clang as a better supported compiler,
>
> We have a build farm member for Clang:
> http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=smew&br=HEAD
>
> That doesn't capture the compile time issues that you described, but
> several other issues that caused the builds to fail altogether have been
> worked out recently, and I'd consider clang 2.9+ to be fully supported
> as of PG 9.1.

I did see the D_GNU_SOURCE issue that you described a while back with
2.8. My bleeding edge Fedora 15 system's yum repository only has 2.8,
for some reason.

I'm glad that you feel we're ready to officially support Clang -
should this be in the 9.1 release notes?

Anyway, since the problem has been narrowed to a specific compiler
flag, and we have a good test case, I'm optimistic that the bug can be
fixed quickly.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 03:28 PM, Christopher Browne wrote:

I expect that the correlation between commit and [various parties] is
something that will need to take place outside git.
   


Agreed on everything except the "Author" information that is already 
being placed into each commit.  The right data is already going into 
there, all it would take is some small amount of tagging to make it 
easier to extract programmatically.



The existing CommitFest data goes quite a long ways towards capturing
interesting information (with the likely exception of sponsorship);
what it's missing, at this point, is a capture of what commit or
commits wound up drawing the proposed patch into the official code
base.


The main problem with driving this from the CommitFest app is that not 
every feature ends up in there.  Committers who commit their own work 
are one source of those.  Commits for bug fixes that end up being 
notable enough to go into the release notes are another.


I agree it would be nice if every entry marked as "Committed" in the CF 
app included a final link to the message ID of the commit closing it.  
But since I don't ever see that being the complete data set, I find it 
hard to justify enforcing that work.  And the ability to operate 
programmatically on the output from "git log" is a slightly easier path
to walk down than extracting the same from the CF app, since you avoid one
pre-processing step:  extracting the right entries in the database to 
get a list of commit IDs.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Bruce Momjian
Greg Smith wrote:
> On 06/24/2011 04:52 PM, Bruce Momjian wrote:
> > That tagging is basically what I do on my first pass through the release
> > notes.  For the gory details:
> >
> > http://momjian.us/main/blogs/pgblog/2009.html#March_25_2009
> >
> 
> Excellent summary of the process I was trying to suggest might be 
> improved; the two most relevant bits:
> 
> 3   remove insignificant items   2.7k   1 day
> 4   research and reword items    1k     5 days
> 
> 
> Some sort of tagging to identify feature changes should drive down the 
> time spent on filtering insignificant items.  And the person doing the 
> commit already has the context you are acquiring later as "research" 
> here.  Would suggesting they try to write a short description at commit 
> time drive it and the "reword" phase time down significantly?  Can't say 
> for sure, but I wanted to throw the idea out for 
> consideration--particularly since solving it well ends up making some of 
> this other derived data people would like to see a lot easier to 
> generate too.

Most of those five days is spent tracking down commits where I can't figure
out the user-visible change, or whether there is one, and wording things to
be in a consistent voice.  Git does allow me to look those up much
faster than CVS.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] FWD: fastlock+lazyvzid patch performance

2011-06-24 Thread karavelov
- Quote from Robert Haas (robertmh...@gmail.com), 25.06.2011 at 00:16 -

> On Fri, Jun 24, 2011 at 3:31 PM,   wrote:
>> clients   beta2   +fastlock   +lazyvzid   local socket
>>       8   76064       92430       92198         106734
>>      16   64254       90788       90698         105097
>>      32   56629       88189       88269         101202
>>      64   51124       84354       84639          96362
>>     128   45455       79361       79724          90625
>>     256   40370       71904       72737          82434
> 
> I'm having trouble interpreting this table.
> 
> Column 1: # of clients
> Column 2: TPS using 9.1beta2 unpatched
> Column 3: TPS using 9.1beta2 + fastlock patch
> Column 4: TPS using 9.1beta2 + fastlock patch + vxid patch
> Column 5: ???

9.1beta2 + fastlock patch + vxid patch, with pgbench run over a Unix domain
socket; the other tests use a local TCP connection.

> At any rate, that is a big improvement on a system with only 8 cores.
> I would have thought you would have needed ~16 cores to get that much
> speedup.  I wonder if the -M prepared makes a difference ... I wasn't
> using that option.
> 

Yes, it does make some difference.
Using unpatched beta2 with 8 clients and the simple protocol I get 57059 tps.
With all patches and the simple protocol I get 60707 tps, so the difference
between patched and stock is not so big. I suppose the system gets CPU-bound
on parsing and planning every submitted request. With -M extended I
get even slower results.

Luben

--
"Perhaps, there is no greater love than that of a
 revolutionary couple where each of the two lovers is
 ready to abandon the other at any moment if revolution
 demands it."
 Zizek

Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Bruce Momjian
Peter Eisentraut wrote:
> On fre, 2011-06-24 at 16:34 -0400, Bruce Momjian wrote:
> > > > It also creates two new environment variables,
> > > > OLDPGPORT and NEWPGPORT, to control the port values because we don't
> > > > want to default to PGPORT anymore.
> > > 
> > > I would prefer that all PostgreSQL-related environment variables start
> > > with "PG".
> > 
> > OK, attached.  I was also using environment variables for PGDATA and
> > PGBIN so I renamed those too to begin with 'PG'.
> 
> I'm wondering why pg_upgrade needs environment variables at all.  It's a
> one-shot operation.  Environment variables are typically used to share
> default settings across programs.  I don't see how that applies here.

They were there in the original EnterpriseDB code, and in some cases
like PGPORT, we _used_ those environment variables.  Also, the
command-line can get pretty long so we actually illustrate environment
variable use in its --help:

For example:
  pg_upgrade -d oldCluster/data -D newCluster/data -b oldCluster/bin -B 
newCluster/bin
or
  $ export OLDDATADIR=oldCluster/data
  $ export NEWDATADIR=newCluster/data
  $ export OLDBINDIR=oldCluster/bin
  $ export NEWBINDIR=newCluster/bin
  $ pg_upgrade

You want the environment variable support removed?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 04:52 PM, Bruce Momjian wrote:

That tagging is basically what I do on my first pass through the release
notes.  For the gory details:

http://momjian.us/main/blogs/pgblog/2009.html#March_25_2009
   


Excellent summary of the process I was trying to suggest might be 
improved; the two most relevant bits:


3   remove insignificant items   2.7k   1 day
4   research and reword items    1k     5 days


Some sort of tagging to identify feature changes should drive down the 
time spent on filtering insignificant items.  And the person doing the 
commit already has the context you are acquiring later as "research" 
here.  Would suggesting they try to write a short description at commit 
time drive it and the "reword" phase time down significantly?  Can't say 
for sure, but I wanted to throw the idea out for 
consideration--particularly since solving it well ends up making some of 
this other derived data people would like to see a lot easier to 
generate too.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] News on Clang

2011-06-24 Thread Peter Eisentraut
On fre, 2011-06-24 at 18:02 +0100, Peter Geoghegan wrote:
> I'm very encouraged by this - Clang is snapping at the heels of GCC
> here. I'd really like to see Clang as a better supported compiler,

We have a build farm member for Clang:
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=smew&br=HEAD

That doesn't capture the compile time issues that you described, but
several other issues that caused the builds to fail altogether have been
worked out recently, and I'd consider clang 2.9+ to be fully supported
as of PG 9.1.





Re: [HACKERS] Another issue with invalid XML values

2011-06-24 Thread Noah Misch
On Fri, Jun 24, 2011 at 04:15:35PM +0200, Florian Pflug wrote:
> On Jun24, 2011, at 13:24 , Noah Misch wrote:
> > On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:

> > There's also the " parser error :" difference; would that also be easy to
> > re-add?  (I'm assuming it's not a constant but depends on the 
> > xmlErrorDomain.)
> 
> It's the string representation of the error domain and error level. 
> Unfortunately,
> the logic which builds that string representation is contained in the static
> function xmlReportError() deep within libxml, and that function doesn't seem
> to be called at all for errors reported via a structured error handler :-(
> 
> So re-adding that would mean duplicating that code within our xml.c, which
> seems undesirable...

Yes, that seems like sufficient trouble to not undertake merely for the sake of
a cosmetic error message match.

> >> ! typedef enum
> >> ! {
> >> !	PG_XML_STRICTNESS_NONE		/* No error handling */,
> >> !	PG_XML_STRICTNESS_LEGACY	/* Ignore notices/warnings/errors unless
> >> !					 * function result indicates error condition */,
> >> !	PG_XML_STRICTNESS_WELLFORMED	/* Ignore non-parser notices/warnings/errors */,
> >> !	PG_XML_STRICTNESS_ALL		/* Report all notices/warnings/errors */,
> >> ! } PgXmlStrictnessType;
> > 
> > We don't generally prefix backend names with Pg/PG.  Also, I think the word
> > "Type" in "PgXmlStrictnessType" is redundant.  How about just XmlStrictness?
> 
> I agree with removing the "Type" suffix. I'm  not so sure about the "Pg"
> prefix, though. I've added that after actually hitting a conflict between
> one of this enum's constants and some constant defined by libxml. Admittedly,
> that was *before* I renamed the type to "PgXmlStrictnessType", so removing
> the "Pg" prefix now would probably work. But it seems a bit close for
> comfort - libxml defines a ton of constants named XML_something.
> 
> Still, if you feel that the "Pg" prefix looks too weird and believe that it's
> better to wait until there's an actual clash before renaming things, I won't
> object further. Just wanted to bring the issue to your attention...

I do think it looks weird, but I agree that the potential for conflict is
notably higher than normal.  I'm fine with having the prefix.

Thanks,
nm



Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Peter Eisentraut
On fre, 2011-06-24 at 16:34 -0400, Bruce Momjian wrote:
> > > It also creates two new environment variables,
> > > OLDPGPORT and NEWPGPORT, to control the port values because we don't
> > > want to default to PGPORT anymore.
> > 
> > I would prefer that all PostgreSQL-related environment variables start
> > with "PG".
> 
> OK, attached.  I was also using environment variables for PGDATA and
> PGBIN so I renamed those too to begin with 'PG'.

I'm wondering why pg_upgrade needs environment variables at all.  It's a
one-shot operation.  Environment variables are typically used to share
default settings across programs.  I don't see how that applies here.




Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread David Christensen
On Jun 24, 2011, at 2:28 PM, Christopher Browne wrote:
> On Fri, Jun 24, 2011 at 6:47 PM, Greg Smith  wrote:
>> On 06/24/2011 01:42 PM, Robert Haas wrote:
>>> I am not inclined to try to track sponsors in the commit message at
>>> all.
>> 
>> I was not suggesting that information be part of the commit.  We've worked
>> out a reasonable initial process for the people working on sponsored
>> features to record that information completely outside of the commit or
>> release notes data.  It turns out though that process would be easier to
>> drive if it were easier to derive a feature->{commit,author} list
>> though--and that would spit out for free with the rest of this.  Improving
>> the ability to do sponsor tracking is more of a helpful side-effect of
>> something that's useful for other reasons rather than a direct goal.
> 
> In taking a peek at the documentation and comments out on the interweb
about "git commit --amend," I don't think that it's going to be particularly
> successful if we try to capture all this stuff in the commit message
> and metadata, because it's tough to guarantee that *all* this data
> would be correctly captured at commit time, and you can't revise it
> after the commit gets pushed upstream.

Perhaps `git notes` could be something used to annotate these:

http://www.kernel.org/pub/software/scm/git/docs/git-notes.html
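Since notes are stored outside the commit objects themselves, metadata could be attached or revised after a commit has been pushed. A minimal sketch (the "release-note:" tag name here is only an illustration, not an agreed convention):

```shell
# Create a throwaway repository with one commit, then annotate it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Add frobnicator support"
# Notes live under refs/notes/, so this does not rewrite history:
git -c user.name=demo -c user.email=demo@example.com \
    notes add -m "release-note: feature" HEAD
git notes show HEAD    # prints: release-note: feature
```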

Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com







Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 6:39 PM, Alvaro Herrera
 wrote:
> So remind me ... did we discuss PRIMARY KEY constraints?  Are they
> supposed to show up as inherited not null rows in the child?  Obviously,
> they do not show up as PKs in the child, but they *are* not null so my
> guess is that they need to be inherited as not null as well.  (Right
> now, unpatched head of course emits the column as attnotnull).
>
> In this case, the inherited name (assuming that the child declaration
> does not explicitly state a constraint name) should be the same as the
> PK, correct?

I would tend to think of the not-null-ness that is required by the
primary constraint as a separate constraint, not an intrinsic part of
the primary key.  IOW, if you drop the primary key constraint, IMV,
that should never cause the column to begin allowing nulls.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera

So remind me ... did we discuss PRIMARY KEY constraints?  Are they
supposed to show up as inherited not null rows in the child?  Obviously,
they do not show up as PKs in the child, but they *are* not null so my
guess is that they need to be inherited as not null as well.  (Right
now, unpatched head of course emits the column as attnotnull).

In this case, the inherited name (assuming that the child declaration
does not explicitly state a constraint name) should be the same as the
PK, correct?

It is unclear to me that primary keys shouldn't be inherited by default.
But I guess that's a separate discussion.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] Small 9.1 documentation fix (SSPI auth)

2011-06-24 Thread Christian Ullrich
When Magnus fixed and applied my SSPI-via-GSS patch in January, we 
forgot to fix the documentation. Suggested patch attached; should I 
also put that four-liner into any CFs?


--
Christian
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
new file mode 100644
index 575eb3b..e7c7db4
*** a/doc/src/sgml/client-auth.sgml
--- b/doc/src/sgml/client-auth.sgml
*** omicron bryanh  
*** 989,996 
  negotiate mode, which will use
  Kerberos when possible and automatically
  fall back to NTLM in other cases.
! SSPI authentication only works when both
! server and client are running Windows.
 
  
 
--- 989,998 
  negotiate mode, which will use
  Kerberos when possible and automatically
  fall back to NTLM in other cases.
! In addition to clients running on Windows,
! SSPI authentication is also supported by
! libpq-based clients on other platforms, 
! through the use of GSSAPI.
 
  
 



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 5:28 PM, Simon Riggs  wrote:
> I've explained all of the above points to you already and you're wrong.

We're going to have to agree to disagree on that point.

> Do you advocate that all ALTER TABLE operations use
> AccessExclusiveLock, or just the operations I added? If you see
> problems here, then you must also be willing to increase the lock
> strength of things such as INHERITS, and back patch them to where the
> same problems exist. You'll wriggle out of that, I'm sure. There are,
> regrettably, many bugs here and they can't be fixed in the simple
> manner you propose.

I think there is quite a lot of difference between realizing that we
can't fix every problem, and deciding to put out a release that adds a
whole lot more of them that we have no plans to fix.

> It's not me you block Robert, I'm not actually a user and I will sleep
> well whatever happens, happy that I tried to resolve this. Users watch
> and remember.

If you are proposing that I should worry about a posse of angry
PostgreSQL users hunting me down (or abandoning the product) because I
agreed with Tom Lane on the necessity of reverting one of your
patches, then I'm willing to take that chance.  For one thing, there's
a pretty good chance they'll go after Tom first.  For two things,
there's at least an outside chance I might be rescued by an
alternative posse who supports our tradition of putting out high
quality releases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Simon Riggs
On Fri, Jun 24, 2011 at 10:08 PM, Robert Haas  wrote:
> On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs  wrote:
>> On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas  wrote:
>>> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs  wrote:
 Test case please. I don't understand the problem you're describing.
>>>
>>> S1: select * from foo;
>>> S2: begin;
>>> S2: alter table foo alter column a set storage plain;
>>> S1: select * from foo;
>>> 
>>
>> Er... yes, that's what locks do. Where is the bug?
>
> I didn't say it was a bug.  I said that the additional locking you
> added acted to remove much of the benefit of reducing the lock
> strength in the first place.  If a brief exclusive lock (such as would
> be taken by ALTER TABLE SET STORAGE in 9.0 and prior releases) wasn't
> acceptable, a brief exclusive lock on a different lock tag that blocks
> most of the same operations isn't going to be any better.  You might
> still see some improvement in the cases where some complicated
> operation like scanning or rewriting the table can be performed
> without holding the relation definition lock, and then only get the
> relation definition lock afterward.  But for something like the above
> example (or the ALTER TABLE SET (n_distinct) feature you mentioned in
> your previous email) the benefit is going to be darned close to zero.
>
>> This patch has locking, but it's the most reduced form of locking that
>> is available for a non-invasive patch for 9.1
>
> I don't like the extra lock manager traffic this adds to operations
> that aren't even performing DDL, I'm not convinced that it is safe,
> the additional locking negates a significant portion of the benefit of
> the original patch, and Tom's made a fairly convincing argument that
> even with this pg_dump will still become significantly less reliable
> than heretofore even if we did apply it.  In my view, what we need to
> do is revert to using AccessExclusiveLock until some of these other
> details are worked out.  I wasn't entirely convinced that that was
> necessary when Tom first posted this email, but having thought through
> the issues and looked over your patch, it's now my position that that
> is the best way forward.  The fact that your proposed patch appears
> not to be thinking very clearly about when locks need to be taken and
> released only adds to my feeling that we are in for a bad time if we
> go down this route.
>
> In short: -1 from me.

I've explained all of the above points to you already and you're wrong.

I don't think you understand this patch at all, nor even the original feature.

Locking the table is completely different from locking the definition
of a table.

Do you advocate that all ALTER TABLE operations use
AccessExclusiveLock, or just the operations I added? If you see
problems here, then you must also be willing to increase the lock
strength of things such as INHERITS, and back patch them to where the
same problems exist. You'll wriggle out of that, I'm sure. There are,
regrettably, many bugs here and they can't be fixed in the simple
manner you propose.

It's not me you block Robert, I'm not actually a user and I will sleep
well whatever happens, happy that I tried to resolve this. Users watch
and remember.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] FWD: fastlock+lazyvzid patch performance

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 3:31 PM,   wrote:
> clients beta2 +fastlock +lazyvzid local socket
> 8 76064 92430 92198 106734
> 16 64254 90788 90698 105097
> 32 56629 88189 88269 101202
> 64 51124 84354 84639 96362
> 128 45455 79361 79724 90625
> 256 40370 71904 72737 82434

I'm having trouble interpreting this table.

Column 1: # of clients
Column 2: TPS using 9.1beta2 unpatched
Column 3: TPS using 9.1beta2 + fastlock patch
Column 4: TPS using 9.1beta2 + fastlock patch + vxid patch
Column 5: ???

At any rate, that is a big improvement on a system with only 8 cores.
I would have thought you would have needed ~16 cores to get that much
speedup.  I wonder if the -M prepared makes a difference ... I wasn't
using that option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq SSL with non-blocking sockets

2011-06-24 Thread Steve Singer

On 11-06-15 03:20 PM, Martin Pihlak wrote:


Yes, that sounds like a good idea -- especially considering that COPY
is not the only operation that can cause SSL_write retries.


This is of course still "work in progress", needs cleaning up and
definitely more testing. But at this point before going any further,
I'd really appreciate a quick review from resident libpq gurus.



Martin,

I'm not a libpq guru but I've given your patch a look over.

The idea of storing the original buffer in new members of the connection 
structure for later re-use seems like a good way to approach this.


A few things I noticed (that you might be aware of since you mentioned 
it needs cleanup)


- The patch doesn't compile with non-SSL builds; the debug code at the 
bottom of pqSendSome isn't in an #ifdef.


- I don't think you're handling the return code properly.  Consider this case:

pqSendSome(some data)
  sslRetryBuf = some Data
  return 1
pqSendSome(more data)
  it sends all of 'some data'
  returns 0

I think 1 should be returned because all of 'more data' still needs to 
be sent.  I think returning a 0 will break PQsetnonblocking if you call 
it when there is data in both sslRetryBuf and outputBuffer.
We might even want to try sending the data in outputBuffer after we've 
sent all the data sitting in sslRetryBuf.



If you close the connection with an outstanding sslRetryBuf, you need to 
free it.



Other than running your test program I didn't do any testing with this 
patch.


Steve


regards,
Martin







Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs  wrote:
> On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas  wrote:
>> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs  wrote:
>>> Test case please. I don't understand the problem you're describing.
>>
>> S1: select * from foo;
>> S2: begin;
>> S2: alter table foo alter column a set storage plain;
>> S1: select * from foo;
>> 
>
> Er... yes, that's what locks do. Where is the bug?

I didn't say it was a bug.  I said that the additional locking you
added acted to remove much of the benefit of reducing the lock
strength in the first place.  If a brief exclusive lock (such as would
be taken by ALTER TABLE SET STORAGE in 9.0 and prior releases) wasn't
acceptable, a brief exclusive lock on a different lock tag that blocks
most of the same operations isn't going to be any better.  You might
still see some improvement in the cases where some complicated
operation like scanning or rewriting the table can be performed
without holding the relation definition lock, and then only get the
relation definition lock afterward.  But for something like the above
example (or the ALTER TABLE SET (n_distinct) feature you mentioned in
your previous email) the benefit is going to be darned close to zero.

> This patch has locking, but its the most reduced form of locking that
> is available for a non invasive patch for 9.1

I don't like the extra lock manager traffic this adds to operations
that aren't even performing DDL, I'm not convinced that it is safe,
the additional locking negates a significant portion of the benefit of
the original patch, and Tom's made a fairly convincing argument that
even with this pg_dump will still become significantly less reliable
than heretofore even if we did apply it.  In my view, what we need to
do is revert to using AccessExclusiveLock until some of these other
details are worked out.  I wasn't entirely convinced that that was
necessary when Tom first posted this email, but having thought through
the issues and looked over your patch, it's now my position that that
is the best way forward.  The fact that your proposed patch appears
not to be thinking very clearly about when locks need to be taken and
released only adds to my feeling that we are in for a bad time if we
go down this route.

In short: -1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Bruce Momjian
Robert Haas wrote:
> On Fri, Jun 24, 2011 at 2:47 PM, Greg Smith  wrote:
> > On 06/24/2011 01:42 PM, Robert Haas wrote:
> >> I am disinclined to add a "feature"
> >> annotation.  I think it is unlikely that will end up being any more
> >> useful than just extracting either the whole commit message or its
> >> first line.
> >
> > I don't see any good way to extract the list of commits relevant to the
> > release notes without something like it.  Right now, you can't just mine
> > every commit into the release notes without getting more noise than signal.
> > Something that tags the ones that are adding new features or other notable
> > updates, as opposed to bug fixes, doc updates, etc., would allow that
> > separation.
> 
> Oh, I see.  There's definitely some fuzziness about which commits make
> it into the release notes right now and which do not - sometimes
> things get missed, or sometimes Bruce omits something I would have
> included or includes something I would have omitted.  OTOH, it's not
> clear that making every committer do that on every commit is going to
> be better than having one person go through the log and decide
> everything all at once.
> 
> If I were attacking this problem, I'd be inclined to make a web
> application that lists all the commits in a format roughly similar to
> the git API, and then lets you tag each commit with tags from some
> list (feature, bug-fix, revert, repair-of-previous-commit, etc.).
> Some of the tagging (e.g. docs-only) could probably even be done
> automatically.  Then somebody could go through once a month and update
> all the tags.  I'd be more more willing to volunteer to do that than I
> would be to trying to get the right metadata tag in every commit...

That tagging is basically what I do on my first pass through the release
notes.  For the gory details:

http://momjian.us/main/blogs/pgblog/2009.html#March_25_2009

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Bruce Momjian
Peter Eisentraut wrote:
> On tor, 2011-06-23 at 21:39 -0400, Bruce Momjian wrote:
> > I have created the following patch which uses 25432 as the default port
> > number for pg_upgrade.
> 
> I don't think we should just steal a port from the reserved range.
> Picking a random port from the private/dynamic range seems more
> appropriate.

Oh, I didn't know about that.  I will use 50432 instead.

> > It also creates two new environment variables,
> > OLDPGPORT and NEWPGPORT, to control the port values because we don't
> > want to default to PGPORT anymore.
> 
> I would prefer that all PostgreSQL-related environment variables start
> with "PG".

OK, attached.  I was also using environment variables for PGDATA and
PGBIN so I renamed those too to begin with 'PG'.

Patch attached.
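For reference, with the renamed variables a non-interactive invocation would look something like the following sketch (the paths and ports are placeholders, not defaults shipped by the patch):

```shell
# Hypothetical locations of the two clusters; adjust to your layout.
export PGDATAOLD=/srv/pgsql/9.0/data
export PGDATANEW=/srv/pgsql/9.1/data
export PGBINOLD=/srv/pgsql/9.0/bin
export PGBINNEW=/srv/pgsql/9.1/bin
# Ports for the temporary servers pg_upgrade starts; PGPORT is ignored.
export PGPORTOLD=50432
export PGPORTNEW=50433
# pg_upgrade    # then run pg_upgrade with no -b/-B/-d/-D/-p/-P flags
```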

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 1ee2aca..5c5ce72
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** output_check_banner(bool *live_check)
*** 29,34 
--- 29,37 
  	if (user_opts.check && is_server_running(old_cluster.pgdata))
  	{
  		*live_check = true;
+ 		if (old_cluster.port == DEF_PGUPORT)
+ 			pg_log(PG_FATAL, "When checking a live old server, "
+    "you must specify the old server's port number.\n");
  		if (old_cluster.port == new_cluster.port)
  			pg_log(PG_FATAL, "When checking a live server, "
     "the old and new port numbers must be different.\n");
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 4401a81..d29aad0
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*** parseCommandLine(int argc, char *argv[])
*** 58,65 
  	os_info.progname = get_progname(argv[0]);
  
  	/* Process libpq env. variables; load values here for usage() output */
! 	old_cluster.port = getenv("PGPORT") ? atoi(getenv("PGPORT")) : DEF_PGPORT;
! 	new_cluster.port = getenv("PGPORT") ? atoi(getenv("PGPORT")) : DEF_PGPORT;
  
  	os_user_effective_id = get_user_info(&os_info.user);
  	/* we override just the database user name;  we got the OS id above */
--- 58,65 
  	os_info.progname = get_progname(argv[0]);
  
  	/* Process libpq env. variables; load values here for usage() output */
! 	old_cluster.port = getenv("PGPORTOLD") ? atoi(getenv("PGPORTOLD")) : DEF_PGUPORT;
! 	new_cluster.port = getenv("PGPORTNEW") ? atoi(getenv("PGPORTNEW")) : DEF_PGUPORT;
  
  	os_user_effective_id = get_user_info(&os_info.user);
  	/* we override just the database user name;  we got the OS id above */
*** parseCommandLine(int argc, char *argv[])
*** 203,215 
  	}
  
  	/* Get values from env if not already set */
! 	check_required_directory(&old_cluster.bindir, "OLDBINDIR", "-b",
  			"old cluster binaries reside");
! 	check_required_directory(&new_cluster.bindir, "NEWBINDIR", "-B",
  			"new cluster binaries reside");
! 	check_required_directory(&old_cluster.pgdata, "OLDDATADIR", "-d",
  			"old cluster data resides");
! 	check_required_directory(&new_cluster.pgdata, "NEWDATADIR", "-D",
  			"new cluster data resides");
  }
  
--- 203,215 
  	}
  
  	/* Get values from env if not already set */
! 	check_required_directory(&old_cluster.bindir, "PGBINOLD", "-b",
  			"old cluster binaries reside");
! 	check_required_directory(&new_cluster.bindir, "PGBINNEW", "-B",
  			"new cluster binaries reside");
! 	check_required_directory(&old_cluster.pgdata, "PGDATAOLD", "-d",
  			"old cluster data resides");
! 	check_required_directory(&new_cluster.pgdata, "PGDATANEW", "-D",
  			"new cluster data resides");
  }
  
*** For example:\n\
*** 254,270 
  or\n"), old_cluster.port, new_cluster.port, os_info.user);
  #ifndef WIN32
  	printf(_("\
!   $ export OLDDATADIR=oldCluster/data\n\
!   $ export NEWDATADIR=newCluster/data\n\
!   $ export OLDBINDIR=oldCluster/bin\n\
!   $ export NEWBINDIR=newCluster/bin\n\
$ pg_upgrade\n"));
  #else
  	printf(_("\
!   C:\\> set OLDDATADIR=oldCluster/data\n\
!   C:\\> set NEWDATADIR=newCluster/data\n\
!   C:\\> set OLDBINDIR=oldCluster/bin\n\
!   C:\\> set NEWBINDIR=newCluster/bin\n\
C:\\> pg_upgrade\n"));
  #endif
  	printf(_("\nReport bugs to .\n"));
--- 254,270 
  or\n"), old_cluster.port, new_cluster.port, os_info.user);
  #ifndef WIN32
  	printf(_("\
!   $ export PGDATAOLD=oldCluster/data\n\
!   $ export PGDATANEW=newCluster/data\n\
!   $ export PGBINOLD=oldCluster/bin\n\
!   $ export PGBINNEW=newCluster/bin\n\
$ pg_upgrade\n"));
  #else
  	printf(_("\
!   C:\\> set PGDATAOLD=oldCluster/data\n\
!   C:\\> set PGDATANEW=newCluster/data\n\
!   C:\\> set PGBINOLD=oldCluster/bin\n\
!   C:\\> set PGBINNEW=newCluster/bin\n\
C:\\> pg_upgrade\n"));
  #endif
  	printf(_("\nReport bugs to .\n"));
diff --git a/contrib/

Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Simon Riggs
On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas  wrote:
> On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs  wrote:
>> Test case please. I don't understand the problem you're describing.
>
> S1: select * from foo;
> S2: begin;
> S2: alter table foo alter column a set storage plain;
> S1: select * from foo;
> 

Er... yes, that's what locks do. Where is the bug?

We have these choices of behaviour
1. It doesn't error and doesn't block - not possible for 9.1, probably
not for 9.2 either
2. It doesn't block, but may throw an error sometimes - the reported bug
3. It blocks in some cases for short periods where people do repeated
DDL, but never throws errors - this patch
4. Full scale locking - human sacrifice, cats and dogs, living
together, mass hysteria

If you want to avoid the blocking, then don't hold open the transaction.

Do this

S1: select * from foo
S2: alter table   run in its own transaction
S1: select * from foo

Doesn't block, no errors. Which is exactly what most people do on
their production servers. The ALTER TABLE statements we're talking
about are not schema changes. They don't need to be coordinated with
other DDL.

This patch has locking, but it's the most reduced form of locking that
is available for a non-invasive patch for 9.1

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 1:08 PM, Kevin Grittner
 wrote:
>> What I think we should do is replace the existing paragraph with
>> something along these lines:
>>
>> The pg_locks view displays data from both
>> the regular lock manager and the predicate lock manager, which are
>> separate systems.  When this view is accessed, the internal data
>> structures of each lock manager are momentarily locked, and copies
>> are made for the view to display.  Each lock manager will
>> therefore produce a consistent set of results, but as we do not
>> lock both lock managers simultaneously, it is possible for locks
>> to be taken or released after we interrogate the regular lock
>> manager and before we interrogate the predicate lock manager.
>> Each lock manager is only locked for the minimum possible time so
>> as to reduce the performance impact of querying this view, but
>> there could nevertheless be some impact on database performance if
>> it is frequently accessed.
>
> I agree that it's probably better to document it than to increase
> the time that both systems are locked.  If we get complaints about
> it, we can always revisit the issue in a later release.
>
> The above wording looks fine to me.

OK, committed that change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs  wrote:
> Test case please. I don't understand the problem you're describing.

S1: select * from foo;
S2: begin;
S2: alter table foo alter column a set storage plain;
S1: select * from foo;


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] FWD: fastlock+lazyvzid patch performance

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 3:31 PM,   wrote:
> I post these results because they somehow contradict previous results
> posted on the list. In my case the patches not only improve peak
> performance but also improve performance under load: without the patches,
> performance with 256 clients is 53% of the peak obtained with 8 clients;
> with the patches, performance with 256 clients is 79% of the peak with 8
> clients.

I think this is strongly related to core count.  The spinlock
contention problems don't become really bad until you get up above 32
CPUs... at least from what I can tell so far.

So I'm not surprised it was just a straight win on your machine... but
thanks for verifying.  It's helpful to have more data points.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] FWD: fastlock+lazyvzid patch performance

2011-06-24 Thread karavelov

Hello,

I have seen the discussions about fastlock patch and lazy-vxid  performance 
degradation, so I decided to test it myself.

The setup:
- hardware
Supermicro blade 
6xSAS @15k on LSI RAID:
 1 disk for system + pg_xlog 
 4 disk RAID 10 for data 
 1 disk for spare 
2 x Xeon E5405 @2GHz (no HT), 8 cores total
8G RAM

- software 
Debian Sid, linux-2.6.39.1
Postgresql 9.1 beta2, compiled by debian sources
incrementally applied the fastlock v3 and lazy-vxid v1 patches. I had to 
resolve a conflict in src/backend/storage/lmgr/proc.c manually.
Configuration: increased shared_mem to 2G, max_connections to 500

- pgbench
initialized the dataset with scaling factor 100
example command invocation: ./pgbench -h 127.0.0.1 -n -S -T 30 -c 8 -j 8 -M 
prepared pgtest

Results:

clients  beta2   +fastlock  +lazyvzid  local socket
8        76064   92430      92198      106734
16       64254   90788      90698      105097
32       56629   88189      88269      101202
64       51124   84354      84639      96362
128      45455   79361      79724      90625
256      40370   71904      72737      82434

All runs were executed on a warm cache; I made some runs for 300s with the 
same results (tps).
I have also done some runs with -M simple, with an identical distribution across clients.

I post these results because they somehow contradict previous results 
posted on the list. In my case the patches not only improve peak 
performance but also improve performance under load: without the patches, 
performance with 256 clients is 53% of the peak obtained with 8 clients; 
with the patches, performance with 256 clients is 79% of the peak with 8 
clients.
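For what it's worth, those two percentages can be reproduced from the table above (taking the +lazyvzid column as the fully patched case):

```shell
# beta2: 256-client TPS relative to the 8-client peak
awk 'BEGIN { printf "%.0f%%\n", 100 * 40370 / 76064 }'   # prints: 53%
# fastlock + lazyvzid: the same ratio with both patches applied
awk 'BEGIN { printf "%.0f%%\n", 100 * 72737 / 92198 }'   # prints: 79%
```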

Best regards
Luben Karavelov

P.S. Excuse me for starting a new thread - I am new on the list.



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Simon Riggs
On Thu, Jun 23, 2011 at 8:59 PM, Robert Haas  wrote:
> On Wed, Jun 22, 2011 at 11:26 AM, Simon Riggs  wrote:
>> For people that still think there is still risk, I've added a
>> parameter called "relaxed_ddl_locking" which defaults to "off". That
>> way people can use this feature in an informed way without exposing us
>> to support risks from its use.
>
> I can't get excited about adding a GUC that says "you can turn this
> off if you want but don't blame us if it breaks".  That just doesn't
> seem like good software engineering to me.

Nobody is suggesting we fix the pre-existing bug - we're just going to
leave it. That doesn't sound like good software engineering either,
but I'm not going to complain because I understand.

We're in a bind here and I'm trying to suggest practical ways out, as
well as cater for the levels of risk people are willing to accept. I
don't think we need that parameter at all, but if you do then I'm
willing to tolerate it.


>> I think we should be clear that this adds *one* lock/unlock for each
>> object for each session, ONCE only after each transaction that
>> executed a DDL statement against an object. So with 200 sessions, a
>> typical ALTER TABLE statement will cause 400 locks. The current lock
>> manager maxes out above 50,000 locks per second, so any comparative
>> analysis will show this is a minor blip on even a heavy workload.
>
> I think that using locks to address this problem is the wrong
> approach.  I think the right fix is to use something other than
> SnapshotNow to do the system catalog scans.

I agree with that, but it's a much bigger job than you think and I
really doubt that you will do it for 9.2, definitely not for 9.1.

There are so many call points that I would not personally attempt
such a critical and widely invasive fix.

So I'd put that down as a Blue Sky will-fix-one-day thing.


> However, if we were going
> to use locking, then we'd need to ensure that:
>
> (1) No transaction which has made changes to a system catalog may
> commit while some other transaction is in the middle of scanning that
> catalog.
> (2) No transaction which has made changes to a set of system catalogs
> may commit while some other transaction is in the midst of fetching
> interrelated data from a subset of those catalogs.
>
> It's important to note, I think, that the problem here doesn't occur
> at the time that the table rows are updated, because those rows aren't
> visible yet.  The problem happens at commit time.  So unless I'm
> missing something, ALTER TABLE would only need to acquire the relation
> definition lock after it has finished all of its other work.  In fact,
> it doesn't even really need to acquire it then, either.  It could just
> queue up a list of relation definition locks to acquire immediately
> before commit, which would be advantageous if the transaction went on
> to abort, since in that case we'd skip the work of acquiring the locks
> at all.

That would work if the only thing ALTER TABLE does is write. There are
various places where we read catalogs and all of those catalogs need
to be relationally integrated. So the only safe way, without lots of
incredibly detailed analysis (which you would then find fault with), is
to lock at the top and hold the lock throughout most of the
processing. That's the safe thing to do, at least.

I do already make use of the point you raise in the patch: it unlocks
before and then relocks after the FK constraint check, so that the
relation definition lock is never held for long periods in any code
path.

> In fact, without doing that, this patch would undo much of the point
> of the original ALTER TABLE lock strength reduction:
>
> begin;
> alter table foo alter column a set storage plain;
> 
> select * from foo;
>
> In master, the SELECT completes without blocking.  With this patch
> applied, the SELECT blocks, just as it did in 9.0, because it can't
> rebuild the relcache entry without getting the relation definition
> lock, which the alter table will hold until commit.

It's not the same at all. Yes, they are both locks; what counts is the
frequency and duration of locking.

With AccessExclusiveLock we prevent all SELECTs from accessing the
table while we queue for the lock, which can be blocked behind a
long-running query. So the total outage to the application for one minor
change can be hours - unpredictably. (Which is why I never wrote the
ALTER TABLE set ndistinct support myself, even though I suggested it,
because it's mostly unusable.)

With the reduced locking level AND this patch the *rebuild* can be
blocked behind the DDL. But the DDL itself is not blocked in the same
way, so the total outage is much reduced and the whole set of actions
goes through fairly quickly.

If you only submit one DDL operation then the rebuild has nothing to
block against, so the whole thing goes through smoothly.


> In fact, the same
> thing happens even if foo has been accessed previously in the same
> session.  If there is alre

Re: [HACKERS] heap_hot_search_buffer refactoring

2011-06-24 Thread Robert Haas
On Sun, Jun 19, 2011 at 2:16 PM, Robert Haas  wrote:
> New patch attached, with that one-line change.

Jeff, are you planning to review this further?  Do you think it's OK to commit?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Christopher Browne
On Fri, Jun 24, 2011 at 6:47 PM, Greg Smith  wrote:
> On 06/24/2011 01:42 PM, Robert Haas wrote:
>> I am not inclined to try to track sponsors in the commit message at
>> all.
>
> I was not suggesting that information be part of the commit.  We've worked
> out a reasonable initial process for the people working on sponsored
> features to record that information completely outside of the commit or
> release notes data.  It turns out, though, that process would be easier to
> drive if it were easier to derive a feature->{commit,author}
> list--and that would spit out for free with the rest of this.  Improving
> the ability to do sponsor tracking is more of a helpful side-effect of
> something that's useful for other reasons rather than a direct goal.

In taking a peek at the documentation and comments out on the interweb
about "git commit --amend," I don't think it's going to be particularly
successful if we try to capture all this stuff in the commit message
and metadata, because it's tough to guarantee that *all* this data
would be correctly captured at commit time, and you can't revise it
after the commit gets pushed upstream.

That applies to anything we might want to track, whether "Author:" (at
the 'likely to be useful' and 'could quite readily get it correct the
first time' end of the spectrum) or "Sponsor:" (which sits at the
dubious end of more than one spectrum).

I expect that the correlation between commit and [various parties] is
something that will need to take place outside git.

The existing CommitFest data goes quite a long ways towards capturing
interesting information (with the likely exception of sponsorship);
what it's missing, at this point, is a capture of what commit or
commits wound up drawing the proposed patch into the official code
base.  That suggests a pretty natural extension, and this is something
I really would like to see.  I'd love to head to the CommitFest page,
and have a list of URLs pointing to the commits that implemented a
particular change.

Given that particular correspondence (e.g. - commitfest request ->
list of commits), that would help associate authors, reviewers, and
such to each commit that was related to CommitFest work.  That doesn't
cover everything, but it's a plenty good start.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 3:01 PM, Heikki Linnakangas
 wrote:
> On 24.06.2011 21:24, Robert Haas wrote:
>>
>> Out of curiosity (and because there is no comment or Assert here), how
>> can you be so sure of the input alignment?
>
> The input TRGM to makesign() is a varlena, so it must be at least 4-byte
> aligned. If it was not for some reason, the existing VARSIZE invocation
> (within GETARR()) would already fail on platforms that are strict about
> alignment.

Hmm, OK.  Might be worth adding a comment, anyway...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 2:47 PM, Greg Smith  wrote:
> On 06/24/2011 01:42 PM, Robert Haas wrote:
>> I am disinclined to add a "feature"
>> annotation.  I think it is unlikely that will end up being any more
>> useful than just extracting either the whole commit message or its
>> first line.
>
> I don't see any good way to extract the list of commits relevant to the
> release notes without something like it.  Right now, you can't just mine
> every commit into the release notes without getting more noise than signal.
>  Something that tags the ones that are adding new features or other notable
> updates, as opposed to bug fixes, doc updates, etc., would allow that
> separation.

Oh, I see.  There's definitely some fuzziness about which commits make
it into the release notes right now and which do not - sometimes
things get missed, or sometimes Bruce omits something I would have
included or includes something I would have omitted.  OTOH, it's not
clear that making every committer do that on every commit is going to
be better than having one person go through the log and decide
everything all at once.

If I were attacking this problem, I'd be inclined to make a web
application that lists all the commits in a format roughly similar to
the git API, and then lets you tag each commit with tags from some
list (feature, bug-fix, revert, repair-of-previous-commit, etc.).
Some of the tagging (e.g. docs-only) could probably even be done
automatically.  Then somebody could go through once a month and update
all the tags.  I'd be more more willing to volunteer to do that than I
would be to trying to get the right metadata tag in every commit...

>> I am not inclined to try to track sponsors in the commit message at
>> all.
>
> I was not suggesting that information be part of the commit.  We've worked
> out a reasonable initial process for the people working on sponsored
> features to record that information completely outside of the commit or
> release notes data.  It turns out, though, that process would be easier to
> drive if it were easier to derive a feature->{commit,author}
> list--and that would spit out for free with the rest of this.  Improving
> the ability to do sponsor tracking is more of a helpful side-effect of
> something that's useful for other reasons rather than a direct goal.

OK, that makes sense.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

2011-06-24 Thread Heikki Linnakangas

On 24.06.2011 21:24, Robert Haas wrote:

Out of curiosity (and because there is no comment or Assert here), how
can you be so sure of the input alignment?


The input TRGM to makesign() is a varlena, so it must be at least 4-byte 
aligned. If it was not for some reason, the existing VARSIZE invocation 
(within GETARR()) would already fail on platforms that are strict about 
alignment.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] News on Clang

2011-06-24 Thread Jeroen Vermeulen

On 2011-06-25 00:02, Peter Geoghegan wrote:


At a large presentation that I and other PG community members were
present at during FOSDEM, Postgres was specifically cited as an
example of a medium-sized C program that had considerably improved
compile times on Clang. While I was obviously unable to reproduce the
very impressive compile-time numbers claimed (at -O0), I still think
that Clang has a lot of promise. Here are the slides from that
presentation:

http://www.scribd.com/doc/48921683/LLVM-Clang-Advancing-Compiler-Technology


I notice that the slide about compilation speed on postgres compares 
only front-end speeds between gcc and clang, not the speeds for 
optimization and code generation.  That may explain why the difference 
is more pronounced on the slide than it is for a real build.


By the way, I was amazed to see such a young compiler build libpqxx with 
no other problems than a few justified warnings or errors that gcc 
hadn't issued.  And that's C++, which is a lot harder than C!  The 
output was also far more helpful than gcc's.  IIRC I found clang just 
slightly faster than gcc on a full configure/build/test.



Jeroen



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 01:42 PM, Robert Haas wrote:

I am disinclined to add a "feature"
annotation.  I think it is unlikely that will end up being any more
useful than just extracting either the whole commit message or its
first line.
   


I don't see any good way to extract the list of commits relevant to the 
release notes without something like it.  Right now, you can't just mine 
every commit into the release notes without getting more noise than 
signal.  Something that tags the ones that are adding new features or 
other notable updates, as opposed to bug fixes, doc updates, etc., would 
allow that separation.



I am not inclined to try to track sponsors in the commit message at
all.


I was not suggesting that information be part of the commit.  We've 
worked out a reasonable initial process for the people working on 
sponsored features to record that information completely outside of the 
commit or release notes data.  It turns out, though, that process would be 
easier to drive if it were easier to derive a feature->{commit,author} 
list--and that would spit out for free with the rest of this.  
Improving the ability to do sponsor tracking is more of a helpful 
side-effect of something that's useful for other reasons rather than a 
direct goal.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 12:51 PM, Heikki Linnakangas
 wrote:
> On 24.06.2011 11:40, Heikki Linnakangas wrote:
>>
>> On 21.06.2011 13:08, Alexander Korotkov wrote:
>>>
>>> I believe it's due to relatively expensive penalty
>>> method in that opclass.
>>
>> Hmm, I wonder if it could be optimized. I did a quick test, creating a
>> gist_trgm_ops index on a list of English words from
>> /usr/share/dict/words. oprofile shows that with the patch, 60% of the
>> CPU time is spent in the makesign() function.
>
> I couldn't resist looking into this, and came up with the attached patch. I
> tested this with:
>
> CREATE TABLE words (word text);
> COPY words FROM '/usr/share/dict/words';
> CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);
>
> And then ran "REINDEX INDEX i_words" a few times with and without the patch.
> Without the patch, reindex takes about 4.7 seconds. With the patch, 3.7
> seconds. That's a worthwhile gain on its own, but becomes even more
> important with Alexander's fast GiST build patch, which calls the penalty
> function more.
>
> I used the attached showsign-debug.patch to verify that the patched makesign
> function produces the same results as the existing code. I haven't tested
> the big-endian code, however.

Out of curiosity (and because there is no comment or Assert here), how
can you be so sure of the input alignment?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] crash-safe visibility map, take five

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 1:43 PM, Jeff Davis  wrote:
>> And anything that is WAL-logged must obey the WAL-before-data rule.
>> We have a system already that ensures that when
>> synchronous_commit=off, CLOG pages can't be flushed before the
>> corresponding WAL record makes it to disk.
>
> In this case, how do you prevent the PD_ALL_VISIBLE from making it to
> disk if you never bumped the LSN when it was set? It seems like you just
> don't have the information to do so, and it seems like the information
> required would be variable in size.

Well, I think that would be a problem for the hypothetical implementer
of the persistent snapshot feature.  :-)

More seriously, Heikki and I previously discussed creating some
systematic method for suppressing FPIs when they are not needed,
perhaps by using a bit in the page header to indicate whether an FPI
has been generated since the last checkpoint.  I think it would be
nice to have such a system, but since we don't have a clear agreement
either that it's a good idea or what we'd do after that, I'm not
inclined to invest time in it.  To really get any benefit out of a
change in that area, we'd probably need to (a) remove the LSN
interlocks that prevent changes from being replayed if the LSN of the
page has already advanced beyond the record LSN and (b) change at
least some of XLOG_HEAP_{INSERT,UPDATE,DELETE} to be idempotent.  But
if we went in that direction then that might help to regularize some
of this and make it a bit less ad-hoc.

> I didn't mean to make this conversation quite so hypothetical. My
> primary points are:
>
> 1. Sometimes it makes sense to break the typical WAL conventions for
> performance reasons. But when we do so, we have to be quite careful,
> because things get complicated quickly.

Yes.

> 2. PD_ALL_VISIBLE is a little bit more complex than other hint bits,
> because the conditions under which it may be set are more complex
> (having to do with both snapshots and cleanup actions). Other hint bits
> are based only on transaction status: either the WAL for that
> transaction completion got flushed (and is therefore permanent), and we
> set the hint bit; or it didn't get flushed and we don't.

I think the term "hint bits" really shouldn't be applied to anything
other than HEAP_{XMIN,XMAX}_{COMMITTED,INVALID}.  Otherwise, we get
into confusing territory pretty quickly.  Our algorithm for
opportunistically killing index entries pointing to dead tuples is not
WAL-logged, but it involves more than a single bit.  OTOH, clearing of
the PD_ALL_VISIBLE bit has always been WAL-logged, so lumping that in
with HEAP_XMIN_COMMITTED is pretty misleading.

> Just having this discussion has been good enough for me to get a better
> idea what's going on, so if you think the comments are sufficient that's
> OK with me.

I'm not 100% certain they are, but let's wait and see if anyone else
wants to weigh in...  please do understand I'm not trying to be a pain
in the neck.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] crash-safe visibility map, take five

2011-06-24 Thread Jeff Davis
On Thu, 2011-06-23 at 22:02 -0400, Robert Haas wrote:
> > 1. INSERT to a new page, marking it with LSN X
> > 2. WAL flushed to LSN Y (Y > X)
> > 3. Some persistent snapshot (that doesn't see the INSERT) is released,
> > and generates WAL recording that fact with LSN Z (Z > Y)
> > 4. Lazy VACUUM marks the newly all-visible page with PD_ALL_VISIBLE
> > 5. page is written out because LSN is still X
> > 6. crash

> I don't really think that's a separate set of assumptions - if we had
> some system whereby snapshots could survive a crash, then they'd have
> to be WAL-logged (because that's how we make things survive crashes).

In the above example, it is WAL-logged (with LSN Z).

> And anything that is WAL-logged must obey the WAL-before-data rule.
> We have a system already that ensures that when
> synchronous_commit=off, CLOG pages can't be flushed before the
> corresponding WAL record makes it to disk.

In this case, how do you prevent the PD_ALL_VISIBLE from making it to
disk if you never bumped the LSN when it was set? It seems like you just
don't have the information to do so, and it seems like the information
required would be variable in size.

> I guess the point you are driving at here is that a page can only go
> from being all-visible to not-all-visible by virtue of being modified.
>  There's no other piece of state (like a persistent snapshot) that can
> be lost as part of a crash that would make us need change our mind and
> decide that an all-visible XID is really not all-visible after all.
> (The reverse is not true: since snapshots are ephemeral, a crash will
> render every row either all-visible or dead.)  I guess I never thought
> about documenting that particular aspect of it because (to me) it
> seems fairly self-evident.  Maybe I'm wrong...

I didn't mean to make this conversation quite so hypothetical. My
primary points are:

1. Sometimes it makes sense to break the typical WAL conventions for
performance reasons. But when we do so, we have to be quite careful,
because things get complicated quickly.

2. PD_ALL_VISIBLE is a little bit more complex than other hint bits,
because the conditions under which it may be set are more complex
(having to do with both snapshots and cleanup actions). Other hint bits
are based only on transaction status: either the WAL for that
transaction completion got flushed (and is therefore permanent), and we
set the hint bit; or it didn't get flushed and we don't.

Just having this discussion has been good enough for me to get a better
idea what's going on, so if you think the comments are sufficient that's
OK with me.

Regards,
Jeff Davis




Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 1:15 PM, Greg Smith  wrote:
> There's been a steady flow of messages on pgsql-advocacy since last month
> (threads "Crediting sponsors in release notes?" and "Crediting reviewers &
> bug-reporters in the release notes") talking about who should receive
> credit, and how, for their work on PostgreSQL.  That discussion seems to me
> to be heading in one inevitable direction:  it's not going to be possible to make
> everyone happy unless there's a way to track all of these things for each
> feature added to PostgreSQL:

I don't read -advocacy regularly, but reviewing the archives, it
doesn't seem to me that we've reached any conclusion about whether
including this information in the release notes is a good idea in the
first place.  It seems to me that one name for each feature is about
at the limit of what we can reasonably do without cluttering the
release notes to the point of unreadability.  I am OK with it the way
it is, but if we must change it I would argue that we ought to have
less credit there, not more.  Which is not to say we shouldn't have
credit.  I think crediting sponsors and reviewers and bug reporters is
a good idea.  But I think the web site is the place to do that, not
the release notes.

As for annotating the commit messages, I think something like:

Reporter: Sam Jones
Author: Beverly Smith
Author: Jim Davids
Reviewer: Fred Block
Reviewer: Pauline Andrews

...would be a useful convention.  I am disinclined to add a "feature"
annotation.  I think it is unlikely that will end up being any more
useful than just extracting either the whole commit message or its
first line.

I am not inclined to try to track sponsors in the commit message at
all.  Suppose Jeff Davis submits a patch, Stephen Frost reviews it,
and I commit it.  Besides the three human beings involved,
potentially, you've got three employers who might be considered
sponsors, plus any customers of those employers who might have paid
said employer money to justify the time spent on that patch.  On a big
patch, you could easily have ten companies involved in different
roles, some of whom may have made a far larger real contribution to
the development of the feature than others, and I am 100% opposed to
making it the committer's job to include all that in the commit
message.   Also, unlike individuals (whose names can usually be read
off the thread in a few seconds), it is not necessarily obvious who
the corporate participants are (or which ones even WANT to be
credited).  It is almost certain that the committer will sometimes get
it wrong, and the commit log is a terrible place to record information
that might need to be changed after the fact.

It seems to me that, at least for sponsorship information, it would be
far better to have a separate database that pulls in the commit logs
and then gets annotated by the people who care.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Christopher Browne
On Fri, Jun 24, 2011 at 5:15 PM, Greg Smith  wrote:
> -All of these other ways to analyze the contributors would be much easier
> to maintain.  A little "Author:" decoration to that section of each commit
> would probably be welcome too.

I think you're quite right, that "mining" the commit logs for these
sorts of information is very much the right answer.

Initially, the choice has been to not use the Author tag in Git; I
think that came as part of the overall intent that, at least, in the
beginning, the workflow of the PostgreSQL project shouldn't diverge
terribly much from how it had been with CVS.

It makes sense to have some debate about additional features to
consider using to capture useful workflow.  Sadly, I think that would
have been a very useful debate to have held at the Devs meeting in
Ottawa, when a lot of the relevant people were in a single room; it's
a bit harder now.

In any case, having some tooling to rummage through commits to
generate proposals for release notes seems like a fine idea.  Even
without policy changes (e.g. - to start using Author:, and possibly
other explicit metadata), it would surely be possible to propose
release note contents that tries to use what it finds.

For instance, if the tool captured all email addresses that it finds
in a commit message, and stows them in a convenient spot, that makes
it easier for the human to review the addresses and classify which
might indicate authorship.  Maybe a step ahead.
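As a sketch of that capture step (hypothetical code, with invented
addresses for illustration), harvesting the email-like tokens from a
commit message for later human classification might look like:

```python
import re

# Loose, deliberately non-RFC-5322 pattern: we only want candidate
# addresses for a human to review, so false positives are acceptable.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

def candidate_addresses(message):
    """Collect email-like tokens in first-seen order, without duplicates."""
    seen = []
    for addr in EMAIL_RE.findall(message):
        if addr not in seen:
            seen.append(addr)
    return seen

msg = ("Patch by alice@example.org, reviewed by bob@example.com "
       "and alice@example.org.")
print(candidate_addresses(msg))  # ['alice@example.org', 'bob@example.com']
```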
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera
Excerpts from Bernd Helmle's message of vie jun 24 12:56:26 -0400 2011:
> 
> --On 24. Juni 2011 12:06:17 -0400 Robert Haas  wrote:
> 
> > Uh, really?  pg_upgrade uses SQL commands to recreate the contents of
> > the system catalogs, so if these entries aren't getting created
> > automatically, that sounds like a bug in the patch...
> 
> AFAIR, i've tested it and it worked as expected.

Okay, thanks for the confirmation.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith
There's been a steady flow of messages on pgsql-advocacy since last 
month (threads "Crediting sponsors in release notes?" and "Crediting 
reviewers & bug-reporters in the release notes") talking about who 
should receive credit, and how, for their work on PostgreSQL.  That 
discussion seems to me to be heading in one inevitable direction:  it's not going to 
be possible to make everyone happy unless there's a way to track all of 
these things for each feature added to PostgreSQL:


-Short description for the release notes
-Feature author(s)
-Reviewers and bug reporters
-Sponsors
-Main git commit adding the feature

Now, this is clearly the job for a tool, because the idea that any 
person capable of doing this work will actually do it is 
laughable--everyone qualified is too busy.  It strikes me, however, that 
the current production of the release notes is itself a time-consuming 
and error-prone process that could also be improved by automation.  I 
had an idea for pushing both of these forward at once.


Committers here are pretty good at writing terse but clear summaries of 
new features when they are added.  These are generally distilled further 
for the release notes.  It strikes me that a little decoration of commit 
messages might go a long way toward saving time in a few areas here.  
I'll pick a simple example of mine to demonstrate: I wrote a small 
optimization to commit_delay, committed at 
http://archives.postgresql.org/message-id/e1pqp72-0001us...@gemulon.postgresql.org


This made its way into the release notes like this:

  Improve performance of commit_siblings (Greg Smith)
  This allows the use of commit_siblings with less overhead.

What if the commit message had been decorated like this?

  Feature:  Improve performance of commit_siblings

  Optimize commit_siblings in two ways to improve group commit.
  First, avoid scanning the whole ProcArray once we know there...

With that simple addition, two things become possible:

-Generating a first draft of the release notes for a new version could 
turn into a script that parses the git commit logs, which has got to save 
somebody a whole lot of time each release.
-All of these other ways to analyze the contributors would be much 
easier to maintain.  A little "Author:" decoration to that section of 
each commit would probably be welcome too.
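For what it's worth, the extraction step could be tiny. Here's a rough
sketch (hypothetical code, not anything that exists in the tree) that
pulls "Feature:" lines out of raw commit-message text, as would come
from `git log --format=%B`:

```python
import re

# Sketch of the release-notes extraction step: find lines using the
# proposed "Feature:" decoration in a blob of commit messages.  The tag
# is only the convention suggested above; git itself knows nothing
# about it.
FEATURE_RE = re.compile(r'^\s*Feature:\s*(.+?)\s*$', re.MULTILINE)

def extract_features(log_text):
    """Return the feature summaries found in the given commit messages."""
    return [m.group(1) for m in FEATURE_RE.finditer(log_text)]

sample = """\
Feature:  Improve performance of commit_siblings

Optimize commit_siblings in two ways to improve group commit.
"""

print(extract_features(sample))  # ['Improve performance of commit_siblings']
```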


I'm sure someone is going to reply to this suggesting some git metadata 
is the right way to handle this, but that seems like overkill to me.  I 
think there's enough committer time gained in faster release note 
generation for this decoration to pay back its overhead, which is 
important to me--I'd want a change here to net close to zero for 
committers.  And the fact that it would also allow deriving all this 
other data makes it easier to drive the goals rising out of advocacy 
forward too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] News on Clang

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 1:02 PM, Peter Geoghegan  wrote:
> [research]

Nice stuff.  Thanks for the update.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Kevin Grittner
Robert Haas  wrote:
> Kevin Grittner  wrote:
 
>> It remains true that the heavyweight locking structures are
>> momentarily locked to capture a consistent view of those, and it
>> is also true that the predicate locking structures are
>> momentarily locked to get a consistent view of that data.  Both
>> are not covered by some over-arching lock, but...
> 
> ...but they could be, if we were so inclined.
 
Yes.
 
> What we actually do is acquire all of the lock manager locks,
> snapshot the state, release those locks, then get the predicate
> lock manager locks and SerializableXactHashLock, snapshot the
> state over there, release those locks, and then dump everything
> out.  Since we don't hold both sets of locks at once, it's possible
> that "select * from pg_locks" could see a state that never really existed.
> 
> For example, it's possible that, after doing GetLockStatusData()
> but before doing GetPredicateLockStatusData(), some backend might
> commit a transaction, releasing a heavyweight lock, begin a new
> transaction, and acquire a predicate lock.  Now, the backend
> looking at the lock table is going to see both of those locks held
> at the same time even though in reality that never happened.
 
That's a fair point.
 
> What I think we should do is replace the existing paragraph with
> something along these lines:
> 
> The pg_locks view displays data from both
> the regular lock manager and the predicate lock manager, which are
> separate systems.  When this view is accessed, the internal data
> structures of each lock manager are momentarily locked, and copies
> are made for the view to display.  Each lock manager will
> therefore produce a consistent set of results, but as we do not
> lock both lock managers simultaneously, it is possible for locks
> to be taken or released after we interrogate the regular lock
> manager and before we interrogate the predicate lock manager. 
> Each lock manager is only locked for the minimum possible time so
> as to reduce the performance impact of querying this view, but
> there could nevertheless be some impact on database performance if
> it is frequently accessed.
 
I agree that it's probably better to document it than to increase
the time that both systems are locked.  If we get complaints about
it, we can always revisit the issue in a later release.
 
The above wording looks fine to me.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] News on Clang

2011-06-24 Thread Peter Geoghegan
A couple of months back, Greg Stark reported mixed results on getting
Clang/LLVM to build Postgres. It could be done, but there were some
bugs, including one that caused certain grammar translation units (TUs)
to take far too long to compile. Two bug reports were filed. He said
that the long-compile-time problem was much improved by using SVN tip
as it existed at the time: it brought the compile time of preproc.c,
the file of particular concern then, down from over a minute to just 15
seconds.

Last night, I had a go at getting Postgres to build using my own build
of Clang SVN tip (2.9). I was successful, but I too found certain
grammar TUs to be very slow to compile. Total build times were 3m23s
for Clang versus GCC 4.6's 2m1s. I filed a (perhaps duplicate) bug
report with a preprocessed gram.c as a testcase. Here is the compile
time I saw for that file:

[peter@peter postgresql]$ time /home/peter/build/Release/bin/clang -O2
-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wformat-security
-fno-strict-aliasing -fwrapv -Wno-error -I. -I. -I
/home/peter/postgresql/src/include -D_GNU_SOURCE   -c -o gram.o
/home/peter/postgresql/src/backend/parser/gram.c
In file included from gram.y:12939:
scan.c:16246:23: warning: unused variable 'yyg' [-Wunused-variable]
struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; /* This var ...
  ^
1 warning generated.

real    1m5.780s
user    1m4.915s
sys     0m0.086s

The compile time is astronomically high, considering it takes about 2
seconds to compile gram.c on the same machine using GCC 4.6 with the
same flags.

This problem is reportedly a front-end issue. Observe what happens
when I omit the -Wall flag:

[peter@peter postgresql]$ time /home/peter/build/Release/bin/clang -O2
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv
-Wno-error -I. -I. -I /home/peter/postgresql/src/include -D_GNU_SOURCE
  -c -o gram.o /home/peter/postgresql/src/backend/parser/gram.c

real    0m2.153s
user    0m2.073s
sys     0m0.065s
[peter@peter postgresql]$

This is very close to the time for GCC. The problem has been further
isolated to the flag -Wuninitialized.

I hacked our configure file to omit -Wall from $CFLAGS. -Wall merely
asks GCC/Clang to show all reasonable warnings; it doesn't affect the
generated binaries. I then measured the total build time for Postgres
using Clang SVN tip. Here's what "time make" outputs:

* SNIP *
real    2m7.102s
user    1m49.015s
sys     0m10.635s

I'm very encouraged by this - Clang is snapping at the heels of GCC
here. I'd really like to see Clang as a better supported compiler,
because the whole LLVM infrastructure seems to have a lot to offer -
potentially improved compile times, better warnings/errors, a good
static analysis tool, a nice debugger, and a nice modular architecture
for integration with third party components. It is still a bit
immature, but it has the momentum.

At a large FOSDEM presentation that I and other PG community members
attended, Postgres was specifically cited as an example of a
medium-sized C program whose compile times improved considerably under
Clang. While I was admittedly unable to reproduce the very impressive
compile-time numbers claimed there (at -O0), I still think that Clang
has a lot of promise. Here are the slides from that presentation:

http://www.scribd.com/doc/48921683/LLVM-Clang-Advancing-Compiler-Technology

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Bernd Helmle



--On 24. Juni 2011 12:06:17 -0400 Robert Haas  wrote:

> Uh, really?  pg_upgrade uses SQL commands to recreate the contents of
> the system catalogs, so if these entries aren't getting created
> automatically, that sounds like a bug in the patch...

AFAIR, I've tested it and it worked as expected.

--
Thanks

Bernd



Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

2011-06-24 Thread Heikki Linnakangas

On 24.06.2011 11:40, Heikki Linnakangas wrote:
> On 21.06.2011 13:08, Alexander Korotkov wrote:
>> I believe it's due to relatively expensive penalty
>> method in that opclass.
>
> Hmm, I wonder if it could be optimized. I did a quick test, creating a
> gist_trgm_ops index on a list of English words from
> /usr/share/dict/words. oprofile shows that with the patch, 60% of the
> CPU time is spent in the makesign() function.


I couldn't resist looking into this, and came up with the attached 
patch. I tested this with:


CREATE TABLE words (word text);
COPY words FROM '/usr/share/dict/words';
CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);

And then ran "REINDEX INDEX i_words" a few times with and without the 
patch. Without the patch, reindex takes about 4.7 seconds. With the 
patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes 
even more important with Alexander's fast GiST build patch, which calls 
the penalty function more.


I used the attached showsign-debug.patch to verify that the patched 
makesign function produces the same results as the existing code. I 
haven't tested the big-endian code, however.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index b328a09..1f3d3e3 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -84,17 +84,88 @@ gtrgm_out(PG_FUNCTION_ARGS)
 static void
 makesign(BITVECP sign, TRGM *a)
 {
-	int4		k,
-				len = ARRNELEM(a);
+	int4		len = ARRNELEM(a);
 	trgm	   *ptr = GETARR(a);
-	int4		tmp = 0;
+	char	   *p;
+	char	   *endptr;
+	uint32		w1,
+				w2,
+				w3;
+	uint32		trg1,
+				trg2,
+				trg3,
+				trg4;
+	uint32	   *p32;
 
 	MemSet((void *) sign, 0, sizeof(BITVEC));
 	SETBIT(sign, SIGLENBIT);	/* set last unused bit */
-	for (k = 0; k < len; k++)
+
+	if (len == 0)
+		return;
+
+	endptr = (char *) (ptr + len);
+	/*
+	 * We have to extract each trigram into a uint32 and calculate the HASH.
+	 * This would be a lot easier if each trigram were aligned on a 4-byte
+	 * boundary, but they're not. The simple way would be to copy each
+	 * trigram byte by byte, but that is quite slow, and this function is a
+	 * hotspot in penalty calculation.
+	 *
+	 * The first trigram in the array doesn't begin at a 4-byte boundary
+	 * (the flags byte comes first), but the next one does. So we fetch the
+	 * first trigram as a special case, and after that the following four
+	 * trigrams fall onto 4-byte words like this:
+	 *
+	 *  w1   w2   w3
+	 * AAAB BBCC CDDD
+	 *
+	 * As long as there are at least four trigrams left to process, we
+	 * fetch the next three words and extract the trigrams from them with
+	 * bit operations.
+	 */
+	p32 = (uint32 *) (((char *) ptr) - 1);
+
+	/* Fetch and extract the initial word */
+	w1 = *(p32++);
+#ifdef WORDS_BIGENDIAN
+	trg1 = w1 << 8;
+#else
+	trg1 = w1 >> 8;
+#endif
+	HASH(sign, trg1);
+
+	while((char *) p32 < endptr - 12)
 	{
-		CPTRGM(((char *) &tmp), ptr + k);
-		HASH(sign, tmp);
+		w1 = *(p32++);
+		w2 = *(p32++);
+		w3 = *(p32++);
+
+#ifdef WORDS_BIGENDIAN
+		trg1 = w1 & 0xFFFFFF00;
+		trg2 = (w1 << 24) | ((w2 & 0xFFFF0000) >> 8);
+		trg3 = ((w2 & 0x0000FFFF) << 16) | ((w3 & 0xFF000000) >> 16);
+		trg4 = w3 << 8;
+#else
+		trg1 = w1 & 0x00FFFFFF;
+		trg2 = (w1 >> 24) | ((w2 & 0x0000FFFF) << 8);
+		trg3 = ((w2 & 0xFFFF0000) >> 16) | ((w3 & 0x000000FF) << 16);
+		trg4 = w3 >> 8;
+#endif
+
+		HASH(sign, trg1);
+		HASH(sign, trg2);
+		HASH(sign, trg3);
+		HASH(sign, trg4);
+	}
+
+	/* Handle remaining 1-3 trigrams the slow way */
+	p = (char *) p32;
+	while (p < endptr)
+	{
+		CPTRGM(((char *) &trg1), p);
+		HASH(sign, trg1);
+		p += 3;
 	}
 }
 
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index b328a09..b5be800 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -44,6 +44,9 @@ Datum		gtrgm_penalty(PG_FUNCTION_ARGS);
 PG_FUNCTION_INFO_V1(gtrgm_picksplit);
 Datum		gtrgm_picksplit(PG_FUNCTION_ARGS);
 
+PG_FUNCTION_INFO_V1(gtrgm_showsign);
+Datum		gtrgm_showsign(PG_FUNCTION_ARGS);
+
 #define GETENTRY(vec,pos) ((TRGM *) DatumGetPointer((vec)->vector[(pos)].key))
 
 /* Number of one-bits in an unsigned byte */
@@ -98,6 +101,32 @@ makesign(BITVECP sign, TRGM *a)
 	}
 }
 
+static char *
+printsign(BITVECP sign)
+{
+	static char c[200];
+	char *p = c;
+	int i;
+	for(i=0; i < SIGLEN;i++)
+	{
+		p += snprintf(p, 3, "%02x", (unsigned int) (((unsigned char *) sign)[i]));
+	}
+	return c;
+}
+
+Datum
+gtrgm_showsign(PG_FUNCTION_ARGS)
+{
+	text	   *in = PG_GETARG_TEXT_P(0);
+	BITVEC		sign;
+	TRGM	   *trg;
+
+	trg = generate_trgm(VARDATA(in), VARSIZE(in) - VARHDRSZ);
+	makesign(sign, trg);
+
+	PG_RETURN_TEXT_P(cstring_to_text(printsign(sign)));
+}
+
 Datum
 gtrgm_compress(PG_FUNCTION_ARGS)
 {


Re: [HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 12:27 PM, Kevin Grittner
 wrote:
> Robert Haas  wrote:
>> While taking a look at the existing documentation for pg_locks I
>> came across the following paragraph:
>>
>>   
>>    When the pg_locks view is accessed,
>>    the internal lock manager data structures are momentarily
>>    locked, and a copy is made for the view to display.  This
>>    ensures that the view produces a consistent set of results,
>>    while not blocking normal lock manager operations longer than
>>    necessary. Nonetheless there could be some impact on database
>>    performance if this view is frequently accessed.
>>   
>>
>> AFAICS, this is no longer quite true.  The view of the data from
>> the main lock manager will be self-consistent, and the view of the
>> data from the predicate lock manager will be self-consistent, but
>> they need not be consistent with each other.  I don't think that's
>> a problem we need to fix, but we probably ought to make the
>> documentation match the behavior.
>
> It remains true that the heavyweight locking structures are
> momentarily locked to capture a consistent view of those, and it is
> also true that the predicate locking structures are momentarily
> locked to get a consistent view of that data.  Both are not covered
> by some over-arching lock, but...

...but they could be, if we were so inclined.  We could acquire all of
the lock manager partition locks, all of the predicate lock manager
partition locks, and the lock on SerializableXactHashLock - then dump
all the lock state from both tables - then release all the locks.
What we actually do is acquire all of the lock manager locks, snapshot
the state, release those locks, then get the predicate lock manager
locks and SerializableXactHashLock, snapshot the state over there,
release those locks, and then dump everything out.  Since we don't do
that, it's possible that "select * from pg_locks" could see a state
that never really existed.

For example, it's possible that, after doing GetLockStatusData() but
before doing GetPredicateLockStatusData(), some backend might commit a
transaction, releasing a heavyweight lock, begin a new transaction,
and acquire a predicate lock.  Now, the backend looking at the lock
table is going to see both of those locks held at the same time even
though in reality that never happened.  The existing documentation for
pg_locks indicates that such anomalies are possible because it's based
on the (heretofore correct) idea that we're going to grab every single
relevant lwlock at the same time before taking a snapshot of the
system state.

What I think we should do is replace the existing paragraph with
something along these lines:

The pg_locks view displays data from both the
regular lock manager and the predicate lock manager, which are
separate systems.  When this view is accessed, the internal data
structures of each lock manager are momentarily locked, and copies are
made for the view to display.  Each lock manager will therefore
produce a consistent set of results, but as we do not lock both lock
managers simultaneously, it is possible for locks to be taken or
released after we interrogate the regular lock manager and before we
interrogate the predicate lock manager.  Each lock manager is only
locked for the minimum possible time so as to reduce the performance
impact of querying this view, but there could nevertheless be some
impact on database performance if it is frequently accessed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Kevin Grittner
Robert Haas  wrote:
> While taking a look at the existing documentation for pg_locks I
> came across the following paragraph:
> 
>   
>When the pg_locks view is accessed,
>the internal lock manager data structures are momentarily
>locked, and a copy is made for the view to display.  This
>ensures that the view produces a consistent set of results,
>while not blocking normal lock manager operations longer than
>necessary. Nonetheless there could be some impact on database
>performance if this view is frequently accessed.
>   
> 
> AFAICS, this is no longer quite true.  The view of the data from
> the main lock manager will be self-consistent, and the view of the
> data from the predicate lock manager will be self-consistent, but
> they need not be consistent with each other.  I don't think that's
> a problem we need to fix, but we probably ought to make the
> documentation match the behavior.
 
It remains true that the heavyweight locking structures are
momentarily locked to capture a consistent view of those, and it is
also true that the predicate locking structures are momentarily
locked to get a consistent view of that data.  Both are not covered
by some over-arching lock, but that's true at *all* times --
SIReadLock entries can disappear mid-transaction (for READ ONLY
transactions) and can persist past transaction completion (if there
are overlapping transactions with which the completed transaction
can still form a dangerous structure).  So there is never a very
close tie between the timing of the appearance or disappearance for
SIRead locks versus any heavyweight locks.
 
That is covered to some degree in the section on serializable
transactions:
 
http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-SERIALIZABLE
 
Any thoughts on what else the docs need to be more clear?
 
-Kevin



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera
Excerpts from Robert Haas's message of vie jun 24 12:06:17 -0400 2011:
> On Fri, Jun 24, 2011 at 11:17 AM, Alvaro Herrera
>  wrote:
> > Another thing I just noticed is that if you pg_upgrade from a previous
> > release, all the NOT NULL pg_constraint rows are going to be missing.
> 
> Uh, really?  pg_upgrade uses SQL commands to recreate the contents of
> the system catalogs, so if these entries aren't getting created
> automatically, that sounds like a bug in the patch...

Ah -- we should be fine then.  I haven't tested that yet (actually I
haven't finished tinkering with the code).

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Robert Haas
While taking a look at the existing documentation for pg_locks I came
across the following paragraph:

  
   When the pg_locks view is accessed, the
   internal lock manager data structures are momentarily locked, and
   a copy is made for the view to display.  This ensures that the
   view produces a consistent set of results, while not blocking
   normal lock manager operations longer than necessary.  Nonetheless
   there could be some impact on database performance if this view is
   frequently accessed.
  

AFAICS, this is no longer quite true.  The view of the data from the
main lock manager will be self-consistent, and the view of the data
from the predicate lock manager will be self-consistent, but they need
not be consistent with each other.  I don't think that's a problem we
need to fix, but we probably ought to make the documentation match the
behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 11:17 AM, Alvaro Herrera
 wrote:
> Another thing I just noticed is that if you pg_upgrade from a previous
> release, all the NOT NULL pg_constraint rows are going to be missing.

Uh, really?  pg_upgrade uses SQL commands to recreate the contents of
the system catalogs, so if these entries aren't getting created
automatically, that sounds like a bug in the patch...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread HuangQi
On 24 June 2011 23:21, Tom Lane  wrote:

> Shigeru Hanada  writes:
> > (2011/06/24 15:35), HuangQi wrote:
> >> I'm trying to debug a modification for the query planner, but I found
> >> that the data structure of my planned query seems incorrect. I was
> >> trying to print out the data structure with the "p" command in gdb,
> >> which is quite inconvenient and takes time. Is there any built-in
> >> function in postgres to print out the node data structures, or any
> >> other plan-related data structures? Thanks.
>
> > I think nodeToString() would help.
>
> For interactive use in gdb, I generally do
>
>    call pprint(..node pointer..)
>
> which prints to the postmaster log.
>
>    regards, tom lane
>


Thanks, Tom, this pprint() call works very nicely.
-- 
Best Regards
Huang Qi Victor


Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread Tom Lane
Shigeru Hanada  writes:
> (2011/06/24 15:35), HuangQi wrote:
>> I'm trying to debug a modification for the query planner, but I found
>> that the data structure of my planned query seems incorrect. I was
>> trying to print out the data structure with the "p" command in gdb,
>> which is quite inconvenient and takes time. Is there any built-in
>> function in postgres to print out the node data structures, or any
>> other plan-related data structures? Thanks.

> I think nodeToString() would help.

For interactive use in gdb, I generally do

call pprint(..node pointer..)

which prints to the postmaster log.

regards, tom lane



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera

Another thing I just noticed is that if you pg_upgrade from a previous
release, all the NOT NULL pg_constraint rows are going to be missing.
I'm not sure what the best way to deal with this is -- should we just
provide a script for the user to run on each database?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [Pgbuildfarm-members] CREATE FUNCTION hang on test machine polecat on HEAD

2011-06-24 Thread Robert Creager

Got another one (no env since the last changes).  I'll try and run the "kept" 
install from when I killed 22853 (had to use -9) to see if it reproduces the 
problem with the same binaries.  Is there interest in me removing the perl in 
/opt/local/bin/perl?  I can install ccache elsewhere and rename that directory.

  502   310     1   0   0:00.09 ??  0:00.14 /Library/PostgreSQL/8.3/bin/postgres -D /Library/PostgreSQL/8.3/data
  502   313   310   0   0:00.36 ??  0:00.51 postgres: logger process
  502   315   310   0   0:01.10 ??  0:02.43 postgres: writer process
  502   316   310   0   0:01.03 ??  0:01.62 postgres: wal writer process
  502   317   310   0   0:00.28 ??  0:00.40 postgres: autovacuum launcher process
  502   318   310   0   0:00.29 ??  0:00.33 postgres: stats collector process
  501 22813     1   0   0:00.29 ??  0:00.38 /Volumes/High Usage/usr/local/src/build-farm-4.4/builds/HEAD/inst/bin/postgres -D data-C
  501 22815 22813   0   0:00.57 ??  0:01.31 postgres: writer process
  501 22816 22813   0   0:00.53 ??  0:00.85 postgres: wal writer process
  501 22817 22813   0   0:00.28 ??  0:00.65 postgres: autovacuum launcher process
  501 22818 22813   0   0:01.19 ??  0:01.47 postgres: stats collector process
  501 22853 22813   0  78:13.79 ??  89:26.32 postgres: Robert pl_regression [local] CREATE FUNCTION

Robert:/usr/local/src/build-farm-4.4/builds/HEAD
% gdb inst/bin/postgres 22853
GNU gdb 6.3.50-20050815 (Apple version gdb-1518) (Sat Feb 12 02:52:12 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared 
libraries .. done
/Volumes/High Usage/usr/local/src/build-farm-4.4/builds/HEAD/22853: No such 
file or directory

Attaching to program: `/Volumes/High 
Usage/usr/local/src/build-farm-4.4/builds/HEAD/inst/bin/postgres', process 
22853.
Reading symbols for shared libraries .+. done
0x000100a505e4 in Perl_get_hash_seed ()
(gdb) bt
#0  0x000100a505e4 in Perl_get_hash_seed ()
#1  0x000100a69b94 in perl_parse ()
#2  0x0001007c0680 in plperl_init_interp () at plperl.c:781
#3  0x0001007c117a in _PG_init () at plperl.c:443
#4  0x000100304396 in internal_load_library (libname=0x10100d540 
"/Volumes/High 
Usage/usr/local/src/build-farm-4.4/builds/HEAD/inst/lib/postgresql/plperl.so") 
at dfmgr.c:284
#5  0x000100304ce5 in load_external_function (filename=, funcname=0x10100d508 "plperl_validator", 
signalNotFound=1 '\001', filehandle=0x7fff5fbfd3b8) at dfmgr.c:113
#6  0x000100307200 in fmgr_info_C_lang [inlined] () at /Volumes/High 
Usage/usr/local/src/build-farm-4.4/builds/HEAD/pgsql.4091/src/backend/utils/fmgr/fmgr.c:349
#7  0x000100307200 in fmgr_info_cxt_security (functionId=41362, 
finfo=0x7fff5fbfd410, mcxt=, ignore_security=) at fmgr.c:280
#8  0x0001003083f0 in OidFunctionCall1Coll (functionId=, collation=0, arg1=41430) at fmgr.c:1585
#9  0x00010009f58d in ProcedureCreate (procedureName=0x1010064d0 
"perl_elog", procNamespace=2200, replace=1 '\001', returnsSet=0 '\0', 
returnType=2278, languageObjectId=41363, languageValidator=41362, 
prosrc=0x101006958 "\n\n  my $msg = shift;\n  elog(NOT

Re: [HACKERS] pg_upgrade version check improvements and small fixes

2011-06-24 Thread Dan McGee
On Wed, Jun 22, 2011 at 8:54 PM, Bruce Momjian  wrote:
>> 0003 is what I really wanted to solve, which was my failure with
>> pg_upgrade. The call to pg_ctl didn't succeed because the binaries
>> didn't match the data directory, thus resulting in this:
>>
>> The error had nothing to do with "trust" at all; it was simply that I
>> tried to use 9.0 binaries with an 8.4 data directory. My patch checks
>> for this and ensures that the binaries in the bindir are the correct
>> version, just as the datadir has to be the correct version.
>
> I had not thought about testing if the bin and data directory were the
> same major version, but you are right that it generates an odd error if
> they are not.
>
> I changed your sscanf format string because the version can be just
> "9.2dev" and there is no need for the minor version.
No problem; you're going to know way more about this than I do, and I
was just going off what I saw in my two installed versions and a quick
look over the pg_ctl code to make sure that string was never
translated.

On a side note, I don't think I ever explained to everyone else why we
parse the version from pg_ctl rather than pg_config: it avoids
introducing any additional binary requirements. Tom Lane noted in an
earlier cleanup series that pg_config is not really needed at all in
the old cluster, so I wanted to keep it that way, but pg_ctl has
always been required.

>   I saw no reason
> to test if the binary version matches the pg_upgrade version because we
> already test the cluster version, and we test the cluster version is the
> same as the binary version.
Duh. That makes sense. Thanks for getting to these so quickly!

To partially toot my own horn, but also to show where a user like me
encountered this: after some packaging hacking, anyone running Arch
Linux should be able to do a pg_upgrade from here on out by installing
the postgresql-old-upgrade package
(http://www.archlinux.org/packages/extra/x86_64/postgresql-old-upgrade/)
and possibly consulting the Arch wiki.

-Dan



Re: [HACKERS] silent_mode and LINUX_OOM_ADJ

2011-06-24 Thread Alvaro Herrera
Excerpts from Heikki Linnakangas's message of vie jun 24 07:01:57 -0400 2011:
> While reviewing Peter Geoghegan's postmaster death patch, I noticed that 
> if you turn on silent_mode, the LINUX_OOM_ADJ code in fork_process() 
> runs when postmaster forks itself into background. That re-enables the 
> OOM killer in postmaster, if you've disabled it in the startup script by 
> adjusting /proc/self/oom_adj. That seems like a bug, albeit a pretty 
> minor one.
> 
> This may be a dumb question, but what is the purpose of silent_mode? 
> Can't you just use nohup?

I think silent_mode is an artifact from when our daemon handling in
general was a lot more primitive (I bet there wasn't even pg_ctl then).
Maybe we could discuss removing it altogether.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Another issue with invalid XML values

2011-06-24 Thread Florian Pflug
On Jun24, 2011, at 13:24 , Noah Misch wrote:
> On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:
>> Updated patch attached.
> 
> This builds and passes all tests on both test systems I used previously
> (CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
> 2.6.31.dfsg-2ubuntu1.6).  Tested behaviors look good, too.
> 
>> On Jun20, 2011, at 22:45 , Noah Misch wrote:
>>> On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:
>>>> On Jun20, 2011, at 19:57 , Noah Misch wrote:
>>> Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
>>> available and xmlGenericErrorContext otherwise, I would just add an Autoconf
>>> check to identify which one to use.
>> 
>> It seems that before xmlStructuredErrorContext was introduced, the structured
>> and the generic error handler shared an error context, so doing what you
>> suggested looks sensible.
>> 
>> I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain
>> I did this the right way. I basically copied a similar check (for
>> setlongjump, AFAIR) and adapted it to check for xmlStructuredErrorContext
>> instead.
> 
> Turned out fine.  Note that, more often than not, committers ask for patches
> not to contain the diff of the generated "configure" itself.  (I personally
> don't mind it when the diff portion is small and generated with the correct
> version of Autoconf, as in this patch.)

Ah, OK. I didn't know what the project policy was there, so I figured I'd
include it, since the configure script is tracked by git too. Now that I
know, I'll remove it from the final patch.

>>>> I believe that the compatibility risk is pretty low here, and removing
>>>> the meaningless prefix makes the error message a bit nicer to read.
>>>> But if you are concerned that there may be applications out there that
>>>> parse the error text, we could of course add it back. I must say that
>>>> I don't really know what our policy regarding error message stability is...
>>> 
>>> If the libxml2 API makes it a major pain to preserve exact messages, I
>>> wouldn't worry.
>> 
>> It'd only require us to prepend "Entity: " to the message string, so
>> there's no pain there. The question is rather whether we want to continue
>> having a pure noise word in front of every libxml error message for
>> the sake of compatibility.
> 
> There's also the " parser error :" difference; would that also be easy to
> re-add?  (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

It's the string representation of the error domain and error level.
Unfortunately, the logic which builds that string representation is
contained in the static function xmlReportError() deep within libxml,
and that function doesn't seem to be called at all for errors reported
via a structured error handler :-(

So re-adding that would mean duplicating that code within our xml.c,
which seems undesirable...

>> I vote "Nay", on the grounds that I estimate the actual breakage from
>> such a change to be approximately zero. Plus the fact that libxml
>> error messages aren't completely stable either. I had to suppress the
>> DETAIL output for one of the regression tests to make them work for
>> both 2.6.23 and 2.7.8; libxml chooses to quote an invalid namespace
>> URI in it's error message in one of these versions but not the other...
> 
> I'm not particularly worried about actual breakage; I'm just putting on the
> "one change per patch"-lawyer hat.  All other things being equal, I'd prefer
> one patch that changes the mechanism for marshalling errors while minimally
> changing the format of existing errors, then another patch that proposes new
> formatting.  (Granted, the formatting-only patch would probably founder.)  But
> it's a judgement call, and this change is in a grey area.  I'm fine with
> leaving it up to the committer.

I agree with that principle, in principle. In the light of my paragraph above
about how we'd need to duplicate libxml code to keep the error messages the
same, however, I'll leave it up to the committer as you suggested.

> A few minor code comments:
> 
>> *** a/src/backend/utils/adt/xml.c
>> --- b/src/backend/utils/adt/xml.c
> 
>> + static bool xml_error_initialized = false;
> 
> Since xml_error_initialized is only used for asserts and now redundant with
> asserts about xml_strictness == PG_XML_STRICTNESS_NONE, consider removing it.

You're absolutely right. Will remove.

>> ! /*
>> !  * pg_xml_done --- restore libxml state after pg_xml_init().
>> !  *
>> !  * This must be called if pg_xml_init() was called with a
>> !  * purpose other than PG_XML_STRICTNESS_NONE. Resets libxml's global state
>> !  * (i.e. the structured error handler) to the original state.
> 
> The first sentence is obsolete.

Right again. Will remove.

>> *** a/src/include/utils/xml.h
>> --- b/src/include/utils/xml.h
>> *** typedef enum
>> *** 68,74 
>>  XML_STANDALONE_OMITTED
>>  }   XmlStandaloneType;
>> 
>> ! exter

Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 9:04 AM, Peter Eisentraut  wrote:
>> It also creates two new environment variables,
>> OLDPGPORT and NEWPGPORT, to control the port values because we don't
>> want to default to PGPORT anymore.
>
> I would prefer that all PostgreSQL-related environment variables start
> with "PG".

+1

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Peter Eisentraut
On Thu, 2011-06-23 at 21:39 -0400, Bruce Momjian wrote:
> I have created the following patch which uses 25432 as the default port
> number for pg_upgrade.

I don't think we should just steal a port from the reserved range.
Picking a random port from the private/dynamic range (49152-65535) seems
more appropriate.

> It also creates two new environment variables,
> OLDPGPORT and NEWPGPORT, to control the port values because we don't
> want to default to PGPORT anymore.

I would prefer that all PostgreSQL-related environment variables start
with "PG".





Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread Shigeru Hanada
(2011/06/24 19:14), HuangQi wrote:
> 2011/6/24 Shigeru Hanada
> 
>> (2011/06/24 15:35), HuangQi wrote:
>> ex)
>> elog(DEBUG1, "%s", nodeToString(plan));
> 
> Hi,
> I don't know why, but when I am debugging the query evaluation, the elog
> function does not output anything to the shell.

What kind of tool do you use to execute the query to be evaluated?

If you are using an interactive tool such as psql, please check the setting
of client_min_messages.  Otherwise, please check the settings of
log_destination, logging_collector and log_min_messages to ensure that
elog() prints debugging information into your server log file, or to the
stderr of the terminal which was used to start the PostgreSQL server.
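As a concrete illustration, the server-side settings might look like this in postgresql.conf (a sketch; the exact values depend on your installation):

```ini
# postgresql.conf -- make DEBUG1 elog() output visible in the server log
log_min_messages = debug1     # include DEBUG1 and more severe messages
log_destination = 'stderr'    # write log output to stderr
logging_collector = on        # capture stderr into log files
```

Alternatively, for an interactive psql session, `SET client_min_messages TO debug1;` sends the messages to the client instead of (or in addition to) the server log.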

Regards,
-- 
Shigeru Hanada



Re: [HACKERS] Online base backup from the hot-standby

2011-06-24 Thread Steve Singer
On 11-06-24 12:41 AM, Jun Ishiduka wrote:
>
> The logic that does not use pg_stop_backup() would be difficult,
> because pg_stop_backup() is used to identify minRecoveryPoint.
>

Considering everything that has been discussed on this thread so far, do
you still think your patch is the best way to accomplish base backups from
standby servers?  If not, what changes do you think should be made?


Steve

> 
> Jun Ishizuka
> NTT Software Corporation
> TEL:045-317-7018
> E-Mail: ishizuka@po.ntts.co.jp
> 




Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-24 Thread Peter Geoghegan
On 24 June 2011 12:30, Fujii Masao  wrote:
> +#ifndef WIN32
> +       /*
> +        * Initialise mechanism that allows waiting latch clients
> +        * to wake on postmaster death, to finish their
> +        * remaining business
> +        */
> +       InitPostmasterDeathWatchHandle();
> +#endif
>
> Calling this function before creating TopMemoryContext looks unsafe. What if
> the function calls ereport(FATAL)?
>
> That ereport() can be called before postgresql.conf is read, i.e., before GUCs
> for error reporting are set. Is this OK? If not,
> InitPostmasterDeathWatchHandle()
> should be moved after SelectConfigFiles().

I see no reason to take the risk that it might do so at some point - I've moved it.

> +#ifndef WIN32
> +int postmaster_alive_fds[2];
> +#endif
>
> postmaster_alive_fds[] should be initialized to "{-1, -1}"?

Yes, they should. That works better.

I think that Heikki is currently taking another look at my work,
because he indicates in a new message to the list a short time ago
that while reviewing my patch, he realised that there may be an
independent problem with silent_mode. I will wait for his remarks
before producing another version of the patch that incorporates those
two small changes.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-24 Thread Fujii Masao
On Wed, Jun 22, 2011 at 9:11 PM, Peter Geoghegan  wrote:
>>                rc = WaitForMultipleObjects(numevents, events, FALSE,
>>                                            (timeout >= 0) ? (timeout / 1000) : INFINITE);
>> -               if (rc == WAIT_FAILED)
>> +               if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
>> +                        !PostmasterIsAlive(true))
>>
>> After WaitForMultipleObjects() detects the death of the postmaster,
>> WaitForSingleObject() is called in PostmasterIsAlive(). In this case, what
>> code does WaitForSingleObject() return? I wonder whether WaitForSingleObject()
>> returns a code other than WAIT_TIMEOUT and can really detect the death of
>> the postmaster.
>
> As noted up-thread, the fact that the archiver does wake and finish on
> Postmaster death can be clearly observed on Windows. I'm not sure why
> you wonder that, as this is fairly standard use of
> PostmasterIsAlive().

Because I was afraid that, if PostmasterHandle were an auto-reset event object,
its event state would be automatically reset just after WaitForMultipleObjects()
detected the postmaster death event. But your observation proved that my
concern was unfounded.

I have two more comments:

+#ifndef WIN32
+   /*
+    * Initialise mechanism that allows waiting latch clients
+    * to wake on postmaster death, to finish their
+    * remaining business
+    */
+   InitPostmasterDeathWatchHandle();
+#endif

Calling this function before creating TopMemoryContext looks unsafe. What if
the function calls ereport(FATAL)?

That ereport() can be called before postgresql.conf is read, i.e., before GUCs
for error reporting are set. Is this OK? If not,
InitPostmasterDeathWatchHandle()
should be moved after SelectConfigFiles().

+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif

postmaster_alive_fds[] should be initialized to "{-1, -1}"?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Another issue with invalid XML values

2011-06-24 Thread Noah Misch
Hi Florian,

On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:
> Updated patch attached.

This builds and passes all tests on both test systems I used previously
(CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
2.6.31.dfsg-2ubuntu1.6).  Tested behaviors look good, too.

> On Jun20, 2011, at 22:45 , Noah Misch wrote:
> > On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:
> >> On Jun20, 2011, at 19:57 , Noah Misch wrote:
> > Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
> > available and xmlGenericErrorContext otherwise, I would just add an Autoconf
> > check to identify which one to use.
> 
> It seems that before xmlStructuredErrorContext was introduced, the structured
> and the generic error handler shared an error context, so doing what you
> suggested looks sensible.
> 
> I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I did
> this the right way. I basically copied a similar check (for setlongjump AFAIR)
> and adapted it to check for xmlStructuredErrorContext instead.

Turned out fine.  Note that, more often than not, committers ask for patches
not to contain the diff of the generated "configure" itself.  (I personally
don't mind it when the diff portion is small and generated with the correct
version of Autoconf, as in this patch.)
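For reference, such a probe might look roughly like this in configure.in -- a sketch, not necessarily the exact check from the patch; it assumes CPPFLAGS already carries the libxml2 include path, and the cache-variable and macro names here are illustrative:

```m4
# Does libxml2 provide xmlStructuredErrorContext, or only xmlGenericErrorContext?
AC_CACHE_CHECK([for xmlStructuredErrorContext],
               [pgac_cv_libxml_structured_errctx],
[AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include <libxml/xmlerror.h>]],
                                 [[xmlStructuredErrorContext = (void *) 0;]])],
               [pgac_cv_libxml_structured_errctx=yes],
               [pgac_cv_libxml_structured_errctx=no])])
if test "$pgac_cv_libxml_structured_errctx" = yes; then
  AC_DEFINE([HAVE_XMLSTRUCTUREDERRORCONTEXT], 1,
            [Define to 1 if libxml has xmlStructuredErrorContext.])
fi
```

The C code in xml.c can then save/restore xmlStructuredErrorContext or xmlGenericErrorContext depending on whether HAVE_XMLSTRUCTUREDERRORCONTEXT is defined.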

> >>>>  ! XML_PURPOSE_LEGACY	/* Save error message only, don't set error flag */,
> >>> 
> >>> It's fine to set the flag for legacy users, considering they could just not
> >>> read it.  The important distinction seems to be that we don't emit warnings
> >>> or notices in this case.  Is that correct?  If so, the comment should reflect
> >>> that emphasis.  Then, consider updating the flag unconditionally.
> >> 
> >> It took me a while to remember why I did it that way, but I think I have now.
> >> 
> >> I initially wanted to add an Assert(!xml_error_occurred) to catch any
> >> missing XML_REPORT_ERROR() calls. I seem to have forgotten to do that,
> >> though...
> >> 
> >> So I guess I should either refrain from setting the flag as you suggested,
> >> or add such an Assert(). I think I very slightly prefer the latter; what
> >> do you think?
> > 
> > I do like the idea of that assert.  How about setting the flag anyway, but
> > making it "Assert(xml_purpose == XML_PURPOSE_LEGACY || !xml_error_occurred)"?
> 
> I added the Assert, but didn't make the setting of the error flag
> unconditional.
> 
> If I did that, XML_CHECK_AND_EREPORT() would stop working for the LEGACY
> case. Now, that case currently isn't exercised, but breaking it nevertheless
> seemed wrong to me.

Okay.

>  *** a/src/test/regress/expected/xml.out
>  --- b/src/test/regress/expected/xml.out
>  *** 8,14 ****
>    INSERT INTO xmltest VALUES (3, '<wrong');
>    ERROR:  invalid XML content
>    LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
>                                            ^
>  ! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
>  <wrong
>  ^
>    SELECT * FROM xmltest;
>  --- 8,14 ----
>    ERROR:  invalid XML content
>    LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
>                                            ^
>  ! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1
> >>> 
> >>> Any reason to change the error text this way?
> >> 
> >> The "Entity:" prefix is added by libxml only for messages reported
> >> to the generic error handler. It *doesn't* refer to entities as defined
> >> in XML, but is instead used in place of the file name if libxml
> >> doesn't have one at hand (because it's parsing from memory).
> >> 
> >> So that "Entity:" prefix really doesn't have any meaning whatsoever.
> > 
> > So xmlParserPrintFileContext() sends different content to the generic error
> > handler from what "natural" calls to the handler would send?
> 
> xmlParserPrintFileContext() only generates the "context" part of the error
> message. In the example above, these are the two lines
>   <wrong
>   ^
> These lines don't contain the "Entity:" prefix - neither with the patch
> attached nor without.

Ah, my mistake.

> >> I believe that the compatibility risk is pretty low here, and removing
> >> the meaningless prefix makes the error message a bit nicer to read.
> >> But if you are concerned that there may be applications out there that
> >> parse the error text, we could of course add it back. I must say that
> >> I don't really know what our policy regarding error message stability is...
> > 
> > If the libxml2 API makes it a major pain to preserve exact messages, I
> > wouldn't worry.
> 
> It'd only require us to prepend "Entity: " to the message string, so
> there's no pain there. The question is rather whether we want to continue
> having a pure noise word in front of every libxml error message for
> the sake of compatibility.

There's also the " parser error :" difference; would that also be easy to
re-add?  (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

[HACKERS] silent_mode and LINUX_OOM_ADJ

2011-06-24 Thread Heikki Linnakangas
While reviewing Peter Geoghegan's postmaster death patch, I noticed that
if you turn on silent_mode, the LINUX_OOM_ADJ code in fork_process()
runs when the postmaster forks itself into the background. That re-enables
the OOM killer in the postmaster, if you've disabled it in the startup
script by adjusting /proc/self/oom_adj. That seems like a bug, albeit a
pretty minor one.


This may be a dumb question, but what is the purpose of silent_mode? 
Can't you just use nohup?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread HuangQi
2011/6/24 Shigeru Hanada 

> (2011/06/24 15:35), HuangQi wrote:
> > Hi,
> > I'm trying to debug a modification for the query planner. But I found it
> > seems the data structure of my planned query is incorrect. I was trying to
> > print out the data structure by using the "p" command in gdb, which is quite
> > inconvenient and takes time. May I know whether there is any built-in
> > function in postgres to print out the node data structures, or any other
> > plan-related data structures? Thanks.
>
> I think nodeToString() would help.  This function converts node tree
> recursively into a string, and it's applicable for any Node-derived
> object, such as Expr, Var and Const.
>
> ex)
>elog(DEBUG1, "%s", nodeToString(plan));
>
> Regards,
> --
> Shigeru Hanada
>

Hi,
   I don't know why, but when I am debugging the query evaluation, the elog
function does not output anything to the shell.


-- 
Best Regards
Huang Qi Victor


Re: [HACKERS] WIP: Fast GiST index build

2011-06-24 Thread Alexander Korotkov
On Fri, Jun 24, 2011 at 12:40 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:

> On 21.06.2011 13:08, Alexander Korotkov wrote:
>
>> I've created section about testing in project wiki page:
>> http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011#Testing_results
>> Do you have any notes about table structure?
>>
>
> It would be nice to have links to the datasets and scripts used, so that
> others can reproduce the tests.
>
Done.


> It's surprising that the search time differs so much between the point_ops
> tests with uniformly random data with 100M and 10M rows. Just to be sure I'm
> reading it correctly: a small search time is good, right? You might want to
> spell that out explicitly.

Yes, you're reading this correctly. A detailed explanation was added to the
wiki page. It's surprising for me too; I need some more insight into the
causes of the index quality difference.

Now I found some large enough real-life datasets (thanks to Oleg Bartunov)
and I'm performing tests on them.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread Shigeru Hanada
(2011/06/24 15:35), HuangQi wrote:
> Hi,
> I'm trying to debug a modification for the query planner. But I found it
> seems the data structure of my planned query is incorrect. I was trying to
> print out the data structure by using the "p" command in gdb, which is quite
> inconvenient and takes time. May I know whether there is any built-in
> function in postgres to print out the node data structures, or any other
> plan-related data structures? Thanks.

I think nodeToString() would help.  This function converts node tree
recursively into a string, and it's applicable for any Node-derived
object, such as Expr, Var and Const.

ex)
elog(DEBUG1, "%s", nodeToString(plan));

Regards,
-- 
Shigeru Hanada



Re: [HACKERS] WIP: Fast GiST index build

2011-06-24 Thread Heikki Linnakangas

On 21.06.2011 13:08, Alexander Korotkov wrote:
> I've created section about testing in project wiki page:
> http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011#Testing_results
> Do you have any notes about table structure?


It would be nice to have links to the datasets and scripts used, so that 
others can reproduce the tests.


It's surprising that the search time differs so much between the 
point_ops tests with uniformly random data with 100M and 10M rows. Just 
to be sure I'm reading it correctly: a small search time is good, right? 
You might want to spell that out explicitly.



> As you can see I found that CPU usage might be much higher
> with gist_trgm_ops.


Yeah, that is a bit worrisome. 6 minutes without the patch and 18
minutes with it.



> I believe it's due to the relatively expensive penalty
> method in that opclass.


Hmm, I wonder if it could be optimized. I did a quick test, creating a 
gist_trgm_ops index on a list of English words from 
/usr/share/dict/words. oprofile shows that with the patch, 60% of the 
CPU time is spent in the makesign() function.



> But, probably index build can be still faster when
> index doesn't fit cache even for gist_trgm_ops.


Yep.


> Also with that opclass index
> quality is slightly worse but the difference is not dramatic.


5-10% difference should be acceptable.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
