Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Pavel Stehule
Hello

2012/9/2 Sergey Koposov kopo...@ast.cam.ac.uk:
 Hi,

 I'm experiencing the case when bitmap scan is ~ 70 times slower than index
 scan which seems to be caused by 1) very big table 2) some hash search logic
 (hash_search_with_hash_value )

 Here is the explain analyze of the query with bitmap scans allowed:

 wsdb= explain analyze select * from test as t, crts.data as d1
 where d1.objid=t.objid and d1.mjd=t.mjd limit 1;
   QUERY PLAN
 ---
  Limit  (cost=11514.04..115493165.44 rows=1 width=68) (actual
 time=27.512..66620.231 rows=1 loops=1)
-  Nested Loop  (cost=11514.04..1799585184.18 rows=155832 width=68)
 (actual time=27.511..66616.807 rows=1 loops=1)
  -  Seq Scan on test t  (cost=0.00..2678.40 rows=156240 width=28)
 (actual time=0.010..4.685 rows=11456 loops=1)
  -  Bitmap Heap Scan on data d1  (cost=11514.04..11518.05 rows=1
 width=40) (actual time=5.807..5.807 rows=1 loops=11456)
Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))
-  BitmapAnd  (cost=11514.04..11514.04 rows=1 width=0)
 (actual time=5.777..5.777 rows=0 loops=11456)
  -  Bitmap Index Scan on data_mjd_idx
 (cost=0.00..2501.40 rows=42872 width=0) (actual time=3.920..3.920 rows=22241
 loops=11456)
Index Cond: (mjd = t.mjd)
  -  Bitmap Index Scan on data_objid_idx
 (cost=0.00..8897.90 rows=415080 width=0) (actual time=0.025..0.025 rows=248
 loops=11456)
Index Cond: (objid = t.objid)
  Total runtime: 66622.026 ms
 (11 rows)

The statistics on the data_objid_idx index are absolutely out of date, so the
planner cannot find an optimal plan.

Regards

Pavel Stehule

 Here is the output when bitmap scans are disabled:
  QUERY PLAN
  Limit  (cost=0.00..329631941.65 rows=1 width=68) (actual
 time=0.082..906.876 rows=1 loops=1)
-  Nested Loop  (cost=0.00..4979486036.95 rows=151062 width=68) (actual
 time=0.081..905.683 rows=1 loops=1)
  Join Filter: (t.mjd = d1.mjd)
  -  Seq Scan on test t  (cost=0.00..2632.77 rows=151677 width=28)
 (actual time=0.009..1.679 rows=11456 loops=1)
  -  Index Scan using data_objid_idx on data d1
 (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050
 rows=248 loops=11456)
Index Cond: (objid = t.objid)
  Total runtime: 907.462 ms

 When the bitmap scans are enabled the prof of postgres shows
 47.10%  postmaster  postgres   [.] hash_search_with_hash_value
 |
 --- hash_search_with_hash_value

 11.06%  postmaster  postgres   [.] hash_seq_search
 |
 --- hash_seq_search

  6.95%  postmaster  postgres   [.] hash_any
 |
 --- hash_any

  5.17%  postmaster  postgres   [.] _bt_checkkeys
 |
 --- _bt_checkkeys

  4.07%  postmaster  postgres   [.] tbm_add_tuples
 |
 --- tbm_add_tuples

  3.41%  postmaster  postgres   [.] hash_search
 |
 --- hash_search


 And the last note is that the crts.data table which is being bitmap scanned
 is a 1.1Tb table with ~ 20e9 rows. My feeling is that the bitmap index scan
 code
 is somehow unprepared to combine two bitmaps for such a big table, and this
 leads to the terrible performance.

 Regards,
 Sergey

 PS Here are the schemas of the tables, just in case:
 wsdb= \d test
   Table koposov.test
  Column  |   Type   | Modifiers
 -+--+---
  mjd | double precision |
  fieldid | bigint   |
  intmag  | integer  |
  objid   | bigint   |

 wsdb= \d crts.data
Table crts.data
  Column |   Type   | Modifiers
 +--+---
  objid  | bigint   |
  mjd| double precision |
  mag| real |
  emag   | real |
  ra | double precision |
  dec| double precision |
 Indexes:
 data_mjd_idx btree (mjd) WITH (fillfactor=100)
 data_objid_idx btree (objid) WITH (fillfactor=100)
 data_q3c_ang2ipix_idx btree (q3c_ang2ipix(ra, dec)) WITH
 (fillfactor=100)

 PPS shared_buffers=10GB, work_mem=1GB
 All the tests shown here were done in a fully cached regime.

 PPS I can believe that what I'm seeing is a feature, not a bug of bitmap
 scans,
 and I can live with disabling them, but I still thought it's worth
 reporting.


 

Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Peter Geoghegan
On 2 September 2012 06:21, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 Hi,

 I'm experiencing the case when bitmap scan is ~ 70 times slower than index
 scan which seems to be caused by 1) very big table 2) some hash search logic
 (hash_search_with_hash_value )

 Here is the explain analyze of the query with bitmap scans allowed:

 wsdb= explain analyze select * from test as t, crts.data as d1
 where d1.objid=t.objid and d1.mjd=t.mjd limit 1;
   QUERY PLAN
 ---
  Limit  (cost=11514.04..115493165.44 rows=1 width=68) (actual
 time=27.512..66620.231 rows=1 loops=1)
-  Nested Loop  (cost=11514.04..1799585184.18 rows=155832 width=68)
 (actual time=27.511..66616.807 rows=1 loops=1)
  -  Seq Scan on test t  (cost=0.00..2678.40 rows=156240 width=28)
 (actual time=0.010..4.685 rows=11456 loops=1)
  -  Bitmap Heap Scan on data d1  (cost=11514.04..11518.05 rows=1
 width=40) (actual time=5.807..5.807 rows=1 loops=11456)
Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))
-  BitmapAnd  (cost=11514.04..11514.04 rows=1 width=0)
 (actual time=5.777..5.777 rows=0 loops=11456)
  -  Bitmap Index Scan on data_mjd_idx
 (cost=0.00..2501.40 rows=42872 width=0) (actual time=3.920..3.920 rows=22241
 loops=11456)
Index Cond: (mjd = t.mjd)
  -  Bitmap Index Scan on data_objid_idx
 (cost=0.00..8897.90 rows=415080 width=0) (actual time=0.025..0.025 rows=248
 loops=11456)
Index Cond: (objid = t.objid)
  Total runtime: 66622.026 ms
 (11 rows)

 Here is the output when bitmap scans are disabled:
  QUERY PLAN
  Limit  (cost=0.00..329631941.65 rows=1 width=68) (actual
 time=0.082..906.876 rows=1 loops=1)
-  Nested Loop  (cost=0.00..4979486036.95 rows=151062 width=68) (actual
 time=0.081..905.683 rows=1 loops=1)
  Join Filter: (t.mjd = d1.mjd)
  -  Seq Scan on test t  (cost=0.00..2632.77 rows=151677 width=28)
 (actual time=0.009..1.679 rows=11456 loops=1)
  -  Index Scan using data_objid_idx on data d1
 (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050
 rows=248 loops=11456)
Index Cond: (objid = t.objid)
  Total runtime: 907.462 ms

 When the bitmap scans are enabled the prof of postgres shows
 47.10%  postmaster  postgres   [.] hash_search_with_hash_value
 |
 --- hash_search_with_hash_value

 11.06%  postmaster  postgres   [.] hash_seq_search
 |
 --- hash_seq_search

  6.95%  postmaster  postgres   [.] hash_any
 |
 --- hash_any

  5.17%  postmaster  postgres   [.] _bt_checkkeys
 |
 --- _bt_checkkeys

  4.07%  postmaster  postgres   [.] tbm_add_tuples
 |
 --- tbm_add_tuples

  3.41%  postmaster  postgres   [.] hash_search
 |
 --- hash_search


 And the last note is that the crts.data table which is being bitmap scanned
 is a 1.1Tb table with ~ 20e9 rows. My feeling is that the bitmap index scan
 code
 is somehow unprepared to combine two bitmaps for such a big table, and this
 leads to the terrible performance.

I think that this kind of question is better suited to the
pgsql-performance list. Granted, it was presented as a bug report
(though they're generally sent to -bugs rather than -hackers), but I
don't think that this is a valid bug.

One obvious red-flag from your query plans is that there is a
misestimation of the row return count of a few orders of magnitude in
the Bitmap Index Scan node. Did you try performing an ANALYZE to
see if that helped? It may also be helpful to show pg_stats entries
for both the data.mjd and test.mjd columns. You may find, prior to
doing an ANALYZE, that there are no entries for one of those tables.
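
Peter's suggestion can be carried out directly in psql; a minimal sketch (the
table and column names are taken from the thread):

```sql
-- Refresh the planner statistics for both tables, then inspect what
-- the planner currently believes about the join columns.
ANALYZE crts.data;
ANALYZE test;

SELECT schemaname, tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE (schemaname = 'crts' AND tablename = 'data' AND attname IN ('objid', 'mjd'))
   OR (tablename = 'test' AND attname IN ('objid', 'mjd'));
```

An empty result for one of the tables would confirm it has never been analyzed.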

Turning off the enable_* planner options in production is generally
discouraged. Certainly, you'd be crazy to do that on a server-wide
basis.
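
If one does need to override the planner for a single query, the change can at
least be scoped to a transaction rather than made server-wide; a sketch using
the query from the thread:

```sql
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK, so the
-- override never leaks to other sessions or later queries.
BEGIN;
SET LOCAL enable_bitmapscan = off;
EXPLAIN ANALYZE
SELECT * FROM test AS t, crts.data AS d1
WHERE d1.objid = t.objid AND d1.mjd = t.mjd LIMIT 1;
COMMIT;
```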

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Bruce Momjian
On Sun, Sep  2, 2012 at 01:06:28AM -0400, Robert Haas wrote:
  For live check operation, you are checking a running server, so
  assuming the socket is in the current directory is not going to work.
  What the code does is to read the 5th line from the running server's
  postmaster.pid file, which has the socket directory in PG >= 9.1.  For
  pre-9.1, pg_upgrade uses the compiled-in defaults for socket directory.
  If the defaults are different between the two servers, the new binaries,
  e.g. pg_dump, will not work.  The fix is for the user to set pg_upgrade
  -O to match the old socket directory, and set PGHOST before running
  pg_upgrade.  I could not find a good way to generate a proper error
  message because we are blind to the socket directory in pre-9.1.
  Frankly, this is a problem if the old pre-9.1 server is running in a
  user-configured socket directory too, so a documentation addition seems
  right here.
 
  So, in summary, this patch moves the socket directory to the current
  directory for all but the live check operation, and handles different socket
  directories for old clusters >= 9.1.  I have added a documentation
  mention of how to make this work for pre-9.1 old servers.
 
 I don't think this is reducing the number of failure modes; it's just
 changing it from one set of obscure cases to a slightly different set
 of obscure cases.

Tom reported problems with having old/new with different default socket
locations.  This fixes that, and reduces the possibility of accidental
connections.  What problems does this add?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Bruce Momjian
On Sat, Sep  1, 2012 at 02:35:06PM -0400, Bruce Momjian wrote:
 On Sat, Sep  1, 2012 at 02:23:22PM -0400, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   + /*
   +  *  Report the Unix domain socket directory location to the 
   postmaster.
   +  */
  
  "Report" seems entirely the wrong verb there.

Fixed.

   + #define LISTEN_STR   " -c listen_addresses=''"
   + 
   + /* Have a sockdir to use? */
   + if (strlen(os_info.sockdir) != 0)
   + snprintf(socket_string, sizeof(socket_string) - strlen(LISTEN_STR),
   +  " -c %s='%s'",
   + (GET_MAJOR_VERSION(cluster->major_version) < 903) ?
   + "unix_socket_directory" : "unix_socket_directories",
   + os_info.sockdir);
   + 
   + /* prevent TCP/IP connections */
   + strcat(socket_string, LISTEN_STR);
  
  IMO this would be simpler and more readable if you got rid of the LISTEN_STR
  #define and just included -c listen_addresses='' in the snprintf format
  string.  The comment for the whole thing should be something like
  If we have a socket directory to use, command the postmaster to use it,
  and disable TCP/IP connections altogether.
 
 Well, you only want the unix_socket* if sockdir is defined, but you want
 LISTEN_STR unconditionally, even if there is no sockdir.  Not sure how
 that could cleanly be in a single snprintf.

I restructured the code to add the listen_addresses string first,
allowing the removal of the #define, as Tom suggested.  I also added
unix_socket_permissions=0700 to further restrict socket access.

Updated patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 94bce50..8b35d8f
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*** adjust_data_dir(ClusterInfo *cluster)
*** 376,378 
--- 376,431 
  
  	check_ok();
  }
+ 
+ 
+ #if HAVE_UNIX_SOCKETS
+ /*
+  * set_sockdir_and_pghost
+  *
+  * Set the socket directory to either the configured location (live check)
+  * or the current directory.
+  */
+ void
+ set_sockdir_and_pghost(bool live_check)
+ {
+ 	if (!live_check)
+ 	{
+ 		/* Use the current directory for the socket */
+ 		if (!getcwd(os_info.sockdir, MAXPGPATH))
+ 			pg_log(PG_FATAL, "cannot find current directory\n");
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 *	If we are doing a live check, we will use the old cluster's Unix
+ 		 *	domain socket directory so we can connect to the live server.
+ 		 */
+ 
+ 		/* sockdir added to the 5th line of postmaster.pid in PG 9.1 */
+ 		if (GET_MAJOR_VERSION(old_cluster.major_version) >= 901)
+ 		{
+ 			char		filename[MAXPGPATH];
+ 			FILE		*fp;
+ 			int			i;
+ 
+ 			snprintf(filename, sizeof(filename), "%s/postmaster.pid", old_cluster.pgdata);
+ 			if ((fp = fopen(filename, "r")) == NULL)
+ 				pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 
+ 			for (i = 0; i < 5; i++)
+ 				if (fgets(os_info.sockdir, sizeof(os_info.sockdir), fp) == NULL)
+ 					pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 			fclose(fp);
+ 		}
+ 		else
+ 			return;		/* Can't get live sockdir, so use the default. */
+ 
+ 		/* Remove trailing newline */
+ 		if (strchr(os_info.sockdir, '\n') != NULL)
+ 			*strchr(os_info.sockdir, '\n') = '\0';
+ 	}
+ 
+ 	/* For client communication */
+ 	pg_putenv("PGHOST", os_info.sockdir);
+ }
+ #endif
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index c47c8bb..8cad5c3
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** main(int argc, char **argv)
*** 88,93 
--- 88,97 
  	check_cluster_versions();
  	check_cluster_compatibility(live_check);
  
+ #if HAVE_UNIX_SOCKETS
+ 	set_sockdir_and_pghost(live_check);
+ #endif
+ 
  	check_old_cluster(live_check, sequence_script_file_name);
  
  
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index fa4c6c0..a712e21
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
*** typedef struct
*** 267,272 
--- 267,275 
  	const char *progname;		/* complete pathname for this program */
  	char	   *exec_path;		/* full path to my executable */
  	char	   *user;			/* username for clusters */
+ #if HAVE_UNIX_SOCKETS
+ 	char		sockdir[NAMEDATALEN];	/* directory for Unix Domain socket */
+ #endif
  	char	  **tablespaces;	/* tablespaces */
  	int			num_tablespaces;
  	char	  **libraries;		/* loadable libraries */
*** void print_maps(FileNameMap *maps, int n
*** 387,392 
--- 390,398 
  
  void		parseCommandLine(int argc, char *argv[]);
  void		

Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Sergey Koposov

On Sun, 2 Sep 2012, Pavel Stehule wrote:



The statistics on the data_objid_idx index are absolutely out of date, so the
planner cannot find an optimal plan.


That doesn't have anything to do with the problem, AFAIU.
First, the data table is static and was analysed.
Second, the query in question is a join, and AFAIK the estimate of the
number of rows is known to be incorrect in the case of column
correlation.
Third, according at least to my understanding, in the fully cached regime
a bitmap scan should not take two orders of magnitude more CPU time than
an index scan.


Sergey






*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/




Re: [HACKERS] [GENERAL] Multiple Slave Failover with PITR

2012-09-02 Thread Bruce Momjian

Do we ever want to document a way to connect slaves to a new master,
rather than recreating the slave?

---

On Tue, Mar 27, 2012 at 10:47:48AM -0700, Ken Brush wrote:
 Hello everyone,
 
 I notice that the documentation at:
 http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
 
 Doesn't contain steps in a Multiple Slave setup for re-establishing
 them after a slave has become the new master.
 
 Based on the documentation, here are the most fail-proof steps I came up with:
 
 1. Master dies :(
 2. Touch the trigger file on the most caught up slave.
 3. Slave is now the new master :)
 4. use pg_basebackup or other binary replication trick (rsync, tar
 over ssh, etc...) to bring the other slaves up to speed with the new
 master.
 5. start the other slaves pointing to the new master.
 
 But, that can take time (about 1-2 hours) with my medium sized DB
 (580GB currently).
 
 After testing a few different ideas that I gleaned from posts on the
 mail list, I came up with this alternative method:
 
 1. Master dies :(
 2. Touch the trigger file on the most caught up slave
 3. Slave is now the new master.
 4. On the other slaves do the following:
 5. Shutdown postgres on the slave
 6. Delete every file in /data/pgsql/data/pg_xlog
 7. Modify the recovery.conf file to point to the new master and
 include the line recovery_target_timeline='latest'
 8. Copy the history file from the new master to the slave (it's the
 most recent #.history file in the xlog directory)
 9. Startup postgres on the slave and watch it sync up to the new
 master (about 1-5 minutes usually)
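
Steps 6-8 above boil down to a recovery.conf of roughly this shape on each
remaining slave (the hostname and user are placeholders, not from the thread):

```ini
# recovery.conf on a remaining slave (PostgreSQL 9.x era)
standby_mode = 'on'
primary_conninfo = 'host=new-master port=5432 user=replicator'
# follow the timeline switch made when the other slave was promoted
recovery_target_timeline = 'latest'
```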
 
 My question is this. Is the alternative method adequate? I tested it a
 bit and couldn't find any problems with data loss or inconsistency.
 
 I still use the fail-proof method above to re-incorporate the old
 master as a new slave.
 
 Sincerely,
 -Ken
 
 -- 
 Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Sergey Koposov
Thanks for your comments. 
On Sun, 2 Sep 2012, Peter Geoghegan wrote:

On 2 September 2012 06:21, Sergey Koposov kopo...@ast.cam.ac.uk wrote:

I think that this kind of question is better suited to the
pgsql-performance list. Granted, it was presented as a bug report
(though they're generally sent to -bugs rather than -hackers), but I
don't think that this is a valid bug.


The reason I was inclined to think that it is a bug is that I
encountered a similar bug before with bitmap scans and very big
tables:

http://archives.postgresql.org/pgsql-hackers/2011-08/msg00958.php

Furthermore, two orders of magnitude more CPU time for bitmap scans
compared to the index scan didn't sound right to me (although obviously
I'm not in a position to claim that it's 100% a bug).


One obvious red-flag from your query plans is that there is a
misestimation of the row return count of a few orders of magnitude in
the Bitmap Index Scan node. Did you try performing an ANALYZE to
see if that helped? It may also be helpful to show pg_stats entries
for both the data.mjd and test.mjd columns. You may find, prior to
doing an ANALYZE, that there are no entries for one of those tables.


The main large table is static and was analyzed. The test table was as 
well. But as mentioned in another recent email, the query is a join, so 
column correlation is a problem.



Turning off the enable_* planner options in production is generally
discouraged. Certainly, you'd be crazy to do that on a server-wide
basis.


I'm using PG for data mining, data analysis purposes with very few clients 
connected and very large tables, so enable_* is used quite often to fix 
incorrect plans due to incorrect selectivities.


Regards,
Sergey

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Tom Lane
Sergey Koposov kopo...@ast.cam.ac.uk writes:
 On Sun, 2 Sep 2012, Peter Geoghegan wrote:
 One obvious red-flag from your query plans is that there is a
 misestimation of the row return count of a few orders of magnitude in
 the Bitmap Index Scan node. Did you trying performing an ANALYZE to
 see if that helped? It may also be helpful to show pg_stats entries
 for both the data.mjd and test.mjd columns. You may find, prior to
 doing an ANALYZE, that there is no entries for one of those tables.

 The main large table is static and was analyzed. The test table was as 
 well. But as mentioned in another recent email, the query is the join, so 
 column correlation is a problem.

The problem is definitely the misestimation here:

  -  Index Scan using data_objid_idx on data d1  (cost=0.00..26603.32 
rows=415080 width=40) (actual time=0.010..0.050 rows=248 loops=11456)
Index Cond: (objid = t.objid)

The planner thinks that indexscan will be 2000 times more expensive than
it really is (assuming that the cost per retrieved row is linear, which
it isn't entirely, but close enough for the moment).  Of course, it's
also thinking the bitmap component scan on the same index will be 2000
times more expensive than reality, but that has only perhaps a 4X impact
on the total cost of the bitmap scan, since the use of the other index
is what dominates there.  With a more accurate idea of this join
condition's selectivity, I'm pretty certain it would have gone for a
plain indexscan, or else a bitmap scan using only this index.

So if there's a bug here, it's in eqjoinsel().  Can you pinpoint
anything unusual about the stats of the objid columns?

regards, tom lane




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Phil Sorber
On Sun, Sep 2, 2012 at 1:13 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 -- Forwarded message --
 From: Pavel Stehule pavel.steh...@gmail.com
 Date: 2012/9/1
 Subject: PATCH: psql boolean display
 To: Phil Sorber p...@omniti.com


 Hello

 I am looking to your patch:

 I have only one note. I don't think using arbitrary text for the values
 true and false is a good idea. I don't see the point of special
 texts like foo, bar from your example.

 A stricter design is better - I wrote a simple modification - it is
 based on psql psets - and it adds a new configuration setting, boolstyle
 [char|word]. An advantage of this design is consistency and possible
 autocomplete support.

 Regards

 Pavel



 postgres= select true, false;
  bool │ bool
 ──┼──
  t│ f
 (1 row)

 postgres= \pset boolstyle word
 Bool output style is word.
 postgres= select true, false;
  bool │ bool
 ──┼───
  true │ false
 (1 row)


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


What you are proposing sounds like it would be better suited to a
server-side GUC. Based on the discussion in the thread, true/false is
the SQL standard and we are doing it incorrectly as t/f, but cannot
change for legacy reasons. We could even change the default but give
users a way to revert to the old behavior, similar to the bytea_output
GUC. Maybe add the GUC for 9.3 but not change the default behavior
until 10.0.

What my patch was intended to do was let the end user set boolean
output to any arbitrary values. While foo/bar is pretty useless, it
was meant to reinforce that it was capable of any arbitrary value. I
can think of a decent list of other output an end user might want,
such as:

true/false
yes/no
y/n
on/off
1/0
enabled/disabled

Plus the different capitalized forms.




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Tom Lane
Phil Sorber p...@omniti.com writes:
 What my patch was intended to do was let the end user set boolean
 output to any arbitrary values. While foo/bar is pretty useless, it
 was meant to reinforce that it was capable of any arbitrary value. I
 can think of a decent list of other output an end user might want,
 such as:

 true/false
 yes/no
 y/n
 on/off
 1/0
 enabled/disabled

 Plus the different capitalized forms.

I can readily see that people might want boolean columns displayed in
such ways in custom applications.  I'm less convinced that there is much
use for it in psql, though.  In the big scheme of things, psql is a
rather low-level tool, designed for DBAs and SQL programmers.  I'd get
quite upset if psql failed to tell me the truth about what was in a
table I was looking at --- and a feature like this comes pretty close
to not telling the truth, especially if it kicks in on a column I
wasn't expecting it to.

On the whole I think this sort of substitution belongs in a
user-written-application layer of software, not in any of the tools
we supply.

regards, tom lane




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Sergey Koposov

On Sun, 2 Sep 2012, Tom Lane wrote:


Sergey Koposov kopo...@ast.cam.ac.uk writes:
The problem is definitely the misestimation here:

 -  Index Scan using data_objid_idx on data d1  (cost=0.00..26603.32 
rows=415080 width=40) (actual time=0.010..0.050 rows=248 loops=11456)
   Index Cond: (objid = t.objid)

The planner thinks that indexscan will be 2000 times more expensive than
it really is (assuming that the cost per retrieved row is linear, which
it isn't entirely, but close enough for the moment).  Of course, it's
also thinking the bitmap component scan on the same index will be 2000
times more expensive than reality, but that has only perhaps a 4X impact
on the total cost of the bitmap scan, since the use of the other index
is what dominates there.  With a more accurate idea of this join
condition's selectivity, I'm pretty certain it would have gone for a
plain indexscan, or else a bitmap scan using only this index.
So if there's a bug here, it's in eqjoinsel().  Can you pinpoint
anything unusual about the stats of the objid columns?


Here are the pg_stats rows for two tables.

[The pg_stats rows followed here as a very wide table (schemaname, tablename,
attname, inherited, null_frac, avg_width, n_distinct, most_common_vals,
most_common_freqs, histogram_bounds); the column data is garbled beyond
recovery in the archive.]



Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Peter Geoghegan
On 2 September 2012 16:26, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 After looking at them, I think I understand the reason -- the
 n_distinct value for crts.data is terribly wrong. In reality it should
 be ~ 130 million. I already faced this problem at a certain point when doing
 group by objid and PG was exhausting all the memory, because of that wrong
 estimate. But I know that estimating n_distinct is a known hard problem.

 So probably that's the main reason for the problem...

That's why we support altering that value with an ALTER TABLE...ALTER
COLUMN DDL statement. You might at least consider increasing the
statistics target for the column first though, to see if that will
make the n_distinct value better accord with reality.
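
Both remedies Peter mentions are plain DDL; a sketch using the figures from
the thread (the 130 million value comes from Sergey's own estimate):

```sql
-- Option 1: raise the per-column sample size and re-analyze, hoping
-- the n_distinct estimate improves on its own.
ALTER TABLE crts.data ALTER COLUMN objid SET STATISTICS 1000;
ANALYZE crts.data;

-- Option 2: override the estimate outright with the known cardinality
-- (positive values are absolute; negative values mean a fraction of rows).
ALTER TABLE crts.data ALTER COLUMN objid SET (n_distinct = 130000000);
ANALYZE crts.data;  -- a subsequent ANALYZE is needed for the override to apply
```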

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Tom Lane
I wrote:
 Phil Sorber p...@omniti.com writes:
 What my patch was intended to do was let the end user set boolean
 output to any arbitrary values. While foo/bar is pretty useless, it
 was meant to reinforce that it was capable of any arbitrary value. I
 can think of a decent list of other output an end user might want,
 such as:

 true/false
 yes/no
 y/n
 on/off
 1/0
 enabled/disabled

 Plus the different capitalized forms.

 I can readily see that people might want boolean columns displayed in
 such ways in custom applications.  I'm less convinced that there is much
 use for it in psql, though.

BTW, another point that your list brings to mind is that somebody who
wants something like this would very possibly want different displays
for different columns.  The proposed feature, being one-size-fits-all
for boolean, couldn't handle that.

What would make a lot more sense in my mind would be to label columns
*in the database* to show how they ought to be displayed.

One conceivable method is to make a collection of domains over bool,
and drive the display off the particular domain used.  However we lack
some infrastructure that would be needed for this (in particular, I'm
pretty sure the PQftype data delivered to the client identifies the
underlying type and not the domain).

Another approach is to make a collection of enum types, in which case
you don't need any client-side support at all.  I experimented with
this method a bit, and it seems workable:

regression=# create type mybool as enum ('no', 'yes');
CREATE TYPE
regression=# create function bool(mybool) returns bool as
$$ select $1 = 'yes'::mybool $$ language sql immutable;
CREATE FUNCTION
regression=# create cast (mybool as bool) with function bool(mybool) as 
assignment;
CREATE CAST
regression=# create table mb(f1 mybool);
CREATE TABLE
regression=# insert into mb values('no'),('yes');
INSERT 0 2
regression=# select * from mb where f1;
 f1  
-
 yes
(1 row)

regression=# select * from mb where f1 = 'yes';
 f1  
-
 yes
(1 row)

I tried making the cast be implicit, but that caused problems with
ambiguous operators, so assignment seems to be the best you can do here.

A variant of this is to build casts in the other direction
(bool::mybool), declare columns in the database as regular bool,
and apply the casts in queries when you want columns displayed in a
particular way.  This might be the best solution if the desired
display is at all context-dependent.
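
For what it's worth, the other-direction variant might look like this
(an untested sketch, assuming the mybool enum type from the example above):

```sql
create function mybool(bool) returns mybool as
$$ select case when $1 then 'yes'::mybool
               when not $1 then 'no'::mybool end $$
language sql immutable;
create cast (bool as mybool) with function mybool(bool);

-- the column stays a plain bool; apply the cast only where display matters:
create table t(f1 bool);
insert into t values (true), (false);
select f1::mybool from t;
```

Since the cast is invoked explicitly in the query, it should not run into
the ambiguous-operator problems that the implicit cast did.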

regards, tom lane




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Sergey Koposov

On Sun, 2 Sep 2012, Peter Geoghegan wrote:


On 2 September 2012 16:26, Sergey Koposov kopo...@ast.cam.ac.uk wrote:

That's why we support altering that value with an ALTER TABLE...ALTER
COLUMN DDL statement. You might at least consider increasing the
statistics target for the column first though, to see if that will
make the n_distinct value better accord with reality.


Thanks, I had forgotten about that possibility.

After setting n_distinct to the correct value, the index scan became the
preferred option, and the row-count estimate for the index scan became
more realistic.


Sorry for the noise.
Sergey

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Pavel Stehule
2012/9/2 Phil Sorber p...@omniti.com:
 On Sun, Sep 2, 2012 at 1:13 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 -- Forwarded message --
 From: Pavel Stehule pavel.steh...@gmail.com
 Date: 2012/9/1
 Subject: PATCH: psql boolean display
 To: Phil Sorber p...@omniti.com


 Hello

 I am looking to your patch:

 I have only one note. I don't think using arbitrary text for the values
 true and false is a good idea. I don't see the point of special
 texts like foo, bar from your example.

 A stricter design is better - I wrote a simple modification - it is
 based on psql's psets, and it adds a new configuration setting, boolstyle
 [char|word]. An advantage of this design is consistency and possible
 autocomplete support.

 Regards

 Pavel



 postgres= select true, false;
  bool │ bool
 ──┼──
  t│ f
 (1 row)

 postgres= \pset boolstyle word
 Bool output style is word.
 postgres= select true, false;
  bool │ bool
 ──┼───
  true │ false
 (1 row)




 What you are proposing sounds like it would be better suited to a
 server side GUC. Based on the discussion in the thread that said
 true/false was the SQL standard and we were doing it incorrectly as
 t/f but could not change for legacy reasons. We could even change the
 default but give users a way to revert to the old behavior. Similar to
 the bytea_output GUC. Maybe add the GUC for 9.3 but not change the
 default behavior until 10.0.

 What my patch was intended to do was let the end user set boolean
 output to any arbitrary values. While foo/bar is pretty useless, it
 was meant to reinforce that it was capable of any arbitrary value. I
 can think of a decent list of other output an end user might want,
 such as:

 true/false
 yes/no
 y/n
 on/off
 1/0
 enabled/disabled

 Plus the different capitalized forms.

If you have these different requests, then you can use enums - or you
can use your own formatting function. There is a fairly strong
recommendation not to rely on implicit formatting driven by database
configuration; applications should use explicit formatting
everywhere. I don't think a GUC for the boolean datatype
is a good idea.

Using just the chars 't' and 'f' is unfortunate design that must be
kept for compatibility reasons. You usually don't need to solve it
yourself, because the transformation from chars to words can be done by
the application or the database driver - so I see this as a client
issue - a psql issue in this case. And I really don't see any point in
unlimited bool output in a simple tool like psql. It would be nice to fix
the issue with chars, because chars are not very readable, but we don't
need to supply enums.
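
For what it's worth, the "own formatting function" Pavel mentions can be
quite small (a sketch; the function, table, and column names are made up):

```sql
create function fmt_bool(b boolean, t text default 'yes', f text default 'no')
returns text language sql immutable as
$$ select case when b then t when not b then f end $$;

select fmt_bool(active), fmt_bool(active, 'on', 'off') from some_table;
```

The application then controls the exact spelling per query, with no
client- or server-side configuration involved.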

Regards

Pavel




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Pavel Stehule
2012/9/2 Tom Lane t...@sss.pgh.pa.us:
 I wrote:
 Phil Sorber p...@omniti.com writes:
 What my patch was intended to do was let the end user set boolean
 output to any arbitrary values. While foo/bar is pretty useless, it
 was meant to reinforce that it was capable of any arbitrary value. I
 can think of a decent list of other output an end user might want,
 such as:

 true/false
 yes/no
 y/n
 on/off
 1/0
 enabled/disabled

 Plus the different capitalized forms.

 I can readily see that people might want boolean columns displayed in
 such ways in custom applications.  I'm less convinced that there is much
 use for it in psql, though.

 BTW, another point that your list brings to mind is that somebody who
 wants something like this would very possibly want different displays
 for different columns.  The proposed feature, being one-size-fits-all
 for boolean, couldn't handle that.


I proposed just a cleaner, more conventional boolean output in
psql - nothing more. For anything else we can write formatting
functions, use CASE, or use enums.


 What would make a lot more sense in my mind would be to label columns
 *in the database* to show how they ought to be displayed.

 One conceivable method is to make a collection of domains over bool,
 and drive the display off the particular domain used.  However we lack
 some infrastructure that would be needed for this (in particular, I'm
 pretty sure the PQftype data delivered to the client identifies the
 underlying type and not the domain).

 Another approach is to make a collection of enum types, in which case
 you don't need any client-side support at all.  I experimented with
 this method a bit, and it seems workable:

 regression=# create type mybool as enum ('no', 'yes');
 CREATE TYPE
 regression=# create function bool(mybool) returns bool as
 $$ select $1 = 'yes'::mybool $$ language sql immutable;
 CREATE FUNCTION
 regression=# create cast (mybool as bool) with function bool(mybool) as 
 assignment;
 CREATE CAST
 regression=# create table mb(f1 mybool);
 CREATE TABLE
 regression=# insert into mb values('no'),('yes');
 INSERT 0 2
 regression=# select * from mb where f1;
  f1
 -
  yes
 (1 row)

 regression=# select * from mb where f1 = 'yes';
  f1
 -
  yes
 (1 row)

 I tried making the cast be implicit, but that caused problems with
 ambiguous operators, so assignment seems to be the best you can do here.

 A variant of this is to build casts in the other direction
 (bool::mybool), declare columns in the database as regular bool,
 and apply the casts in queries when you want columns displayed in a
 particular way.  This might be the best solution if the desired
 display is at all context-dependent.

When I worked on PSM I needed a way to simply specify the expected
datatype outside the SQL statement - something like enhancing
parametrised queries with a fourth parameter for the expected types.

Then somebody could set the expected type for a column simply, outside
the query, and the transformation could be fast. It could be used for
unsupported date formats and similar tasks.

Regards

Pavel


 regards, tom lane




Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Sun, Sep  2, 2012 at 01:06:28AM -0400, Robert Haas wrote:
 I don't think this is reducing the number of failure modes; it's just
 changing it from one set of obscure cases to a slightly different set
 of obscure cases.

 Tom reported problems with having old/new with different default socket
 locations.  This fixes that, and reduces the possibility of accidental
 connections.  What problems does this add?

I'm going to be needing some fix in this area in any case, though
whether it's exactly Bruce's current patch is not clear to me.  I found
out last night while making a test build of 9.2rc1 as a Fedora package
that pg_upgrade's regression test fails in the Fedora build environment,
if the postmaster has been patched so that its default socket directory
is /var/run/postgresql.  That happens because /var/run/postgresql
doesn't exist in the build environment (it is only going to exist once
the postgresql-server package is installed), so the postmaster fails
to start because it can't create a socket where it expects to.
I have a patch to pg_regress that instructs the temporary postmaster
to use /tmp as unix_socket_directory regardless of its built-in default,
so that make check works for the regular core and contrib regression
tests.  However, that doesn't affect pg_upgrade's regression test case.

It looks rather messy to persuade pg_upgrade to do things differently
for regression testing and normal use, not to mention that it would make
the test even less representative of normal use.  So I'm thinking that
we do need the pg_upgrade feature Bruce is suggesting of forcing the
socket directory to be the current directory.  What's more, if that's
not back-patched to 9.2, I'm going to have to carry it as a Fedora patch
anyway.

Alternatively, I can prevent make check from testing pg_upgrade
(which is what I did so I could carry on with package testing).
I'd just as soon not ship it like that, though.

regards, tom lane




[HACKERS] [bugfix] sepgsql missed a case of CREATE TABLE AS

2012-09-02 Thread Kohei KaiGai
The attached patch fixes a bug in the sepgsql module.
Could you apply it to both the v9.2 and master branches?

When the post_create hook is invoked, sepgsql's handler checks
whether the current invocation context requires permission
checks, according to the command tag saved by ProcessUtility_hook.

But it overlooked the T_CreateTableAs tag, so neither security
labeling nor permission checks were applied, just as if the routine
were operating on a toast relation.
This patch adds this command tag as a context in which permissions
are checked.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


sepgsql-fixbug-create-table-as.patch
Description: Binary data



[HACKERS] [bugfix] sepgsql didn't follow the latest core API changes

2012-09-02 Thread Kohei KaiGai
This patch fixes a few places where sepgsql didn't follow the latest
core API changes.

1) The prototype of ProcessUtility_hook was recently changed, but the
sepgsql side didn't follow this update, which broke the build.

2) sepgsql internally uses the GETSTRUCT() and HeapTupleGetOid() macros;
these were moved to htup_details.h, so it needs an additional #include
of access/htup_details.h.

3) sepgsql internally used a bool variable named abort.
I noticed that it conflicts with the ereport macro, because that expands
to ereport_domain, which contains an invocation of abort(). So I renamed
this variable to abort_on_violation.

#define ereport_domain(elevel, domain, rest)\
(errstart(elevel, __FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) ? \
 (errfinish rest) : (void) 0), \
((elevel) >= ERROR ? abort() : (void) 0)

This does not affect v9.2, so please apply it to the master branch only.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


sepgsql-fixbug-follow-core-apis.patch
Description: Binary data



[HACKERS] GPU and Database

2012-09-02 Thread Gaetano Mendola

May be someone of you is interested in this

ADBIS workshop on GPUs in Databases

http://gid2012.cs.put.poznan.pl/


Regards
Gaetano Mendola




Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Bruce Momjian
On Sun, Sep  2, 2012 at 01:13:52PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Sun, Sep  2, 2012 at 01:06:28AM -0400, Robert Haas wrote:
  I don't think this is reducing the number of failure modes; it's just
  changing it from one set of obscure cases to a slightly different set
  of obscure cases.
 
  Tom reported problems with having old/new with different default socket
  locations.  This fixes that, and reduces the possibility of accidental
  connections.  What problems does this add?
 
 I'm going to be needing some fix in this area in any case, though
 whether it's exactly Bruce's current patch is not clear to me.  I found
 out last night while making a test build of 9.2rc1 as a Fedora package
 that pg_upgrade's regression test fails in the Fedora build environment,
 if the postmaster has been patched so that its default socket directory
 is /var/run/postgresql.  That happens because /var/run/postgresql
 doesn't exist in the build environment (it is only going to exist once
 the postgresql-server package is installed), so the postmaster fails
 to start because it can't create a socket where it expects to.
 I have a patch to pg_regress that instructs the temporary postmaster
 to use /tmp as unix_socket_directory regardless of its built-in default,
 so that make check works for the regular core and contrib regression
 tests.  However, that doesn't affect pg_upgrade's regression test case.
 
 It looks rather messy to persuade pg_upgrade to do things differently
 for regression testing and normal use, not to mention that it would make
 the test even less representative of normal use.  So I'm thinking that
 we do need the pg_upgrade feature Bruce is suggesting of forcing the
 socket directory to be the current directory.  What's more, if that's
 not back-patched to 9.2, I'm going to have to carry it as a Fedora patch
 anyway.
 
 Alternatively, I can prevent make check from testing pg_upgrade
 (which is what I did so I could carry on with package testing).
 I'd just as soon not ship it like that, though.

Well, I don't know of any known problems with the patch.  On the other
hand, I don't know our policy in pushing patches into RC releases at the
request of packagers.

If you want to stand behind the patch, it might be OK.  I think that's
how we handle such requests --- someone has to put their neck out for
it.  Fortunately the patch is not very large so is easy to review.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Proof of concept: auto updatable views

2012-09-02 Thread Dean Rasheed
On 31 August 2012 07:59, Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 30 August 2012 20:05, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Aug 12, 2012 at 5:14 PM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
 None of this new code kicks in for non-security barrier views, so the
 kinds of plans I posted upthread remain unchanged in that case. But
 now a significant fraction of the patch is code added to handle
 security barrier views. Of course we could simply say that such views
 aren't updatable, but that seems like an annoying limitation if there
 is a feasible way round it.

 Maybe it'd be a good idea to split this into two patches: the first
 could implement the feature but exclude security_barrier views, and
 the second could lift that restriction.


 Yes, I think that makes sense.
 I should hopefully get some time to look at it over the weekend.


Here's an updated patch for the base feature (without support for
security barrier views) with updated docs and regression tests.

Regards,
Dean


auto-update-views.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [v9.3] Row-Level Security

2012-09-02 Thread Dean Rasheed
On 17 July 2012 05:02, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2012/7/17 Robert Haas robertmh...@gmail.com:
 On Sun, Jul 15, 2012 at 5:52 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The attached patch is a revised version of row-level security feature.
 ...
 According to the Robert's comment, I revised the place to inject
 applyRowLevelSecurity(). The reason why it needed to patch on
 adjust_appendrel_attrs_mutator() was, we handled expansion from
 regular relation to sub-query after expand_inherited_tables().
 In this revision, it was moved to the head of sub-query planner.


Hi,

I had a quick look at this and spotted a problem - certain types of
query are able to bypass the RLS quals. For example:

SELECT * FROM (SELECT * FROM foo) foo;

since the RLS policy doesn't descend into subqueries, and is applied
before they are pulled up into the main query. Similarly for views on
top of tables with RLS, and SRF functions that query a table with RLS
that get inlined.

Also queries using UNION ALL are vulnerable if they end up being
flattened, for example:

SELECT * FROM foo UNION ALL SELECT * FROM foo;


FWIW I recently developed some similar code as part of a patch to
implement automatically updatable views
(http://archives.postgresql.org/pgsql-hackers/2012-08/msg00303.php).
Some parts of that code may be useful, possibly for adding
UPDATE/DELETE support.

Regards,
Dean




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-09-02 Thread Jeff Janes
On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where tremendous number of LOCK
 TABLE are issued. I replace them with single LOCK TABLE with multiple
 tables. With 100k tables LOCK statements took 13 minutes in total, now
 it only takes 3 seconds. Comments?

 Was this applied?

 No, we fixed the server side instead.

 But only for 9.2, right?  So people running back branches are still screwed.

 Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
 behaviors involved here, not all of which are masked by what Tatsuo-san
 suggested.

All of the other ones that I know of were associated with pg_dump
itself, and since it is recommended to run the newer version of
pg_dump against the older version of the server, no back patching
would be necessary to get the benefits of those particular fixes.

 Six months or a year from now, we might have enough confidence in that
 batch of 9.2 fixes to back-port them en masse.  Don't want to do it
 today though.


What would be the recommendation for people trying to upgrade, but who
can't get their data out in a reasonable window?

Putting Tatsuo-san's change into a future pg_dump might be more
conservative than back-porting the server's Lock Table change to the
server version they are trying to get rid of.

Cheers,

Jeff




[HACKERS] Fwd: [PERFORM] Loose Index Scans by Planner?

2012-09-02 Thread Jeff Janes
I'd like to create a ToDo item for loose index scans or skip
scans, when the lead column has low cardinality and is not used in
the where clause.  This case can potentially be optimized by using the
index as if it were a collection of N partitioned indexes, where N
is the cardinality of the lead column.  Any objections?
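
Until the planner learns this, the usual manual workaround is a recursive
CTE that hops from one lead-column value to the next, touching only a few
index entries per distinct value (a sketch; the events/category names are
invented, and an index whose lead column is category is assumed):

```sql
WITH RECURSIVE cats AS (
    SELECT min(category) AS category FROM events
    UNION ALL
    SELECT (SELECT min(category) FROM events
            WHERE category > cats.category)
    FROM cats
    WHERE cats.category IS NOT NULL
)
SELECT category FROM cats WHERE category IS NOT NULL;
```

Each level of the recursion does one index probe, so enumerating N distinct
lead values costs roughly O(N log rows) instead of a full scan.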

I don't really have a detailed plan on how to do it.  I expect the
planner part would be harder than the execution part.

See [PERFORM] Loose Index Scans by Planner thread.

Thanks,

Jeff




Re: [HACKERS] PATCH: pgbench - random sampling of transaction written into log

2012-09-02 Thread Tomas Vondra

On 31.08.2012 00:01, Tomas Vondra wrote:

On 30 August 2012, 23:44, Robert Haas wrote:

 On Thu, Aug 30, 2012 at 3:48 PM, Tomas Vondra t...@fuzzy.cz wrote:
 That sounds like a pretty trivial patch. I've been thinking about yet
 another option - histograms (regular or with exponential bins).

 I thought about that, too, but I think high-outliers is a lot more
 useful.  At least for the kinds of things I worry about.

OK, let's fix the current patches first. We can add more options later.


So, here is a fixed version of the patch. I've made these changes:

1) The option is now '--sampling-rate' instead of '-R' and accepts 
float arguments. I've decided not to use 0.01 = 1% but use 1 = 1%, so it 
accepts values between 0 and 100. I find this more natural.


2) I've removed one of the two new variables, the remaining one is used 
both to check whether the sampling is enabled and keep the value.


3) I've decided not to enable -l whenever the --sampling-rate is 
used. Again, I find this a bit less magic behavior.


4) I've fixed some formatting issues - if you notice another one that I 
don't see, let me know.


Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 00cab73..578aeb3 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -130,6 +130,11 @@ int	foreign_keys = 0;
 int	unlogged_tables = 0;
 
 /*
+ * use log sampling (rate = 1 = 100%, 0 = don't use)
+ */
+double use_sample_rate = 0.0;
+
+/*
  * tablespace selection
  */
 char  *tablespace = NULL;
@@ -364,6 +369,8 @@ usage(void)
 	"  -f FILENAME  read transaction script from FILENAME\n"
 	"  -j NUM       number of threads (default: 1)\n"
 	"  -l           write transaction times to log file\n"
+	"  --sampling-rate NUM\n"
+	"               sampling rate of the log (e.g. 0.01 for 1%% sample)\n"
 	"  -M simple|extended|prepared\n"
 	"               protocol for submitting queries to server (default: simple)\n"
 	"  -n           do not run VACUUM before tests\n"
@@ -877,21 +884,26 @@ top:
instr_time  diff;
double  usec;
 
-   INSTR_TIME_SET_CURRENT(now);
-   diff = now;
-   INSTR_TIME_SUBTRACT(diff, st->txn_begin);
-   usec = (double) INSTR_TIME_GET_MICROSEC(diff);
+   /* either no sampling or is within the sample */
+   if ((use_sample_rate == 0.0) || (pg_erand48(thread->random_state) <= use_sample_rate))
+   {
+
+   INSTR_TIME_SET_CURRENT(now);
+   diff = now;
+   INSTR_TIME_SUBTRACT(diff, st->txn_begin);
+   usec = (double) INSTR_TIME_GET_MICROSEC(diff);
 
 #ifndef WIN32
-   /* This is more than we really ought to know about instr_time */
-   fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
-           st->id, st->cnt, usec, st->use_file,
-           (long) now.tv_sec, (long) now.tv_usec);
+   /* This is more than we really ought to know about instr_time */
+   fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
+           st->id, st->cnt, usec, st->use_file,
+           (long) now.tv_sec, (long) now.tv_usec);
 #else
-   /* On Windows, instr_time doesn't provide a timestamp anyway */
-   fprintf(logfile, "%d %d %.0f %d 0 0\n",
-           st->id, st->cnt, usec, st->use_file);
+   /* On Windows, instr_time doesn't provide a timestamp anyway */
+   fprintf(logfile, "%d %d %.0f %d 0 0\n",
+           st->id, st->cnt, usec, st->use_file);
 #endif
+   }
}
 
 if (commands[st->state]->type == SQL_COMMAND)
@@ -1918,6 +1930,7 @@ main(int argc, char **argv)
 	{"index-tablespace", required_argument, NULL, 3},
 	{"tablespace", required_argument, NULL, 2},
 	{"unlogged-tables", no_argument, &unlogged_tables, 1},
+	{"sampling-rate", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
 
@@ -2123,6 +2136,14 @@ main(int argc, char **argv)
case 3: /* index-tablespace */
index_tablespace = optarg;
break;
+   case 4:
+   use_sample_rate = atof(optarg)/100;
+   if (use_sample_rate <= 0.0 || 

Re: [HACKERS] PATCH: pgbench - aggregation of info written into log

2012-09-02 Thread Tomas Vondra

On 30.08.2012 18:02, Robert Haas wrote:

On Fri, Aug 24, 2012 at 5:25 PM, Tomas Vondra t...@fuzzy.cz wrote:

This patch is a bit less polished (and more complex) than the other
pgbench patch I sent a while back, and I'm not sure how to handle the
Windows branch. That needs to be fixed during the commit fest.


What's the problem with the Windows branch?

Could you un-cuddle your braces to follow the PostgreSQL style, omit
braces around single-statement blocks, and remove the spurious
whitespace changes?


Done, or at least I don't see other formatting issues. Let me know if I
missed something.



Instead of having both use_log_agg and naggseconds, I think you can
get by with just having a single variable that is zero if aggregation
is not in use and is otherwise the aggregation period.  On a related
note, you can't specify -A without an associated value, so there is no
point in documenting a default.  As with your other patch, I suggest
using a long option name like --latency-aggregate-interval and then
making the name of the variable in the code match the option name,
with s/-/_/g, for clarity.

Fixed. I've kept use_log_agg only and I've removed the default. Also
I've added one more check (that the total duration is a multiple of
the aggregation interval).

And just as with the sampling patch, I believe -l should not be
enabled automatically, but required explicitly. But if more people ask
to enable it whenever aggregation or sampling is used, I'm fine with that.

Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 00cab73..e044564 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -145,6 +145,7 @@ char   *index_tablespace = NULL;
 #define naccounts  10
 
 bool	use_log;		/* log transaction latencies to a file */
+int	use_log_agg;		/* log aggregates instead of individual transactions */
 bool	is_connect;		/* establish connection for each transaction */
 bool	is_latencies;		/* report per-command latencies */
 int	main_pid;		/* main process id used in log filename */
@@ -240,6 +241,18 @@ typedef struct
char   *argv[MAX_ARGS]; /* command word list */
 } Command;
 
+typedef struct
+{
+
+	long	start_time;	/* when does the interval start */
+	int	cnt;		/* number of transactions */
+	double	min_duration;	/* min/max durations */
+	double	max_duration;
+	double	sum;		/* sum(duration), sum(duration^2) - for estimates */
+	double	sum2;
+
+} AggVals;
+
 static Command **sql_files[MAX_FILES]; /* SQL script files */
 static int num_files;  /* number of script files */
 static int num_commands = 0;   /* total number of Command structs */
@@ -364,6 +377,8 @@ usage(void)
 	"  -f FILENAME  read transaction script from FILENAME\n"
 	"  -j NUM       number of threads (default: 1)\n"
 	"  -l           write transaction times to log file\n"
+	"  --aggregate-interval NUM\n"
+	"               aggregate data over NUM seconds\n"
 	"  -M simple|extended|prepared\n"
 	"               protocol for submitting queries to server (default: simple)\n"
 	"  -n           do not run VACUUM before tests\n"
@@ -817,9 +832,22 @@ clientDone(CState *st, bool ok)
return false;   /* always false */
 }
 
+static
+void agg_vals_init(AggVals * aggs, instr_time start)
+{
+	aggs->cnt = 0;
+	aggs->sum = 0;
+	aggs->sum2 = 0;
+
+	aggs->min_duration = 3600 * 100.0; /* one hour */
+	aggs->max_duration = 0;
+
+	aggs->start_time = INSTR_TIME_GET_DOUBLE(start);
+}
+
 /* return false iff client should be disconnected */
 static bool
-doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile)
+doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVals * agg)
 {
PGresult   *res;
Command   **commands;
@@ -881,17 +909,70 @@ top:
diff = now;
 INSTR_TIME_SUBTRACT(diff, st->txn_begin);
usec = (double) INSTR_TIME_GET_MICROSEC(diff);
-
+   
+   /* should we aggregate the results or not? */
+   if (use_log_agg)
+   {
+
+       /* are we still in the same interval? if yes, accumulate the
+        * values (print them otherwise) */
+       if (agg->start_time + use_log_agg >= INSTR_TIME_GET_DOUBLE(now))
+       {
+   
+

Re: [HACKERS] PATCH: pgbench - aggregation of info written into log

2012-09-02 Thread Jeff Janes
On Sun, Sep 2, 2012 at 3:46 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Fixed. I've kept use_log_agg only and I've removed the default. Also
 I've added one more check (that the total duration is a multiple of
 the aggregation interval).

Hi Tomas,

In the docs, -l is an option, not an application:
<application>-l</application>

Also, the docs still refer to <option>-A</option>, rather than to
<option>--aggregate-interval</option>, in a few places.

I think a section in the docs that points out that transactions run by
specifying multiple -f will be mingled together into an interval would
be a good idea, as that could easily be overlooked (as I did earlier).

The docs do not mention anywhere what the units for the latencies are.

I think the format of the logfile should somehow make it unmistakably
different from the regular -l logfile, for example by prefixing every
line with Aggregate: .   Or maybe there is some other solution, but
I think that having two log formats, both of which consist of nothing
but 6 integers, but yet mean very different things, is a recipe for
confusion.

Is it unavoidable that the interval be an integral number of seconds?
I found the units in the original code confusing, so maybe there is
some reason it needs to be like that that I don't understand yet.
I'll look into it some more.

Thanks,

Jeff




[HACKERS] pg_upgrade bugs

2012-09-02 Thread Andrew Dunstan


I have been wrestling for a couple of days trying to get pg_upgrade 
testing working on Windows, with a view to having it tested on the 
buildfarm. The test script has its own issues, which I'll deal with 
separately, but there are two issues in pg_upgrade's exec.c that make me 
suspect that if pg_upgrade has ever worked at all on Windows it is a 
matter of sheer luck. The attached patch fixes these. The first issue is 
a plain miscall to strlcpy(), where the length argument is wrong. The 
second is where exec_prog tries to open a log file after the system call 
returns. This will fail if the command was a 'pg_ctl start', as the 
running postmaster will have the log file open, so I have simply 
#ifdef'd it out for the Windows case, as the code does nothing except 
add a couple of line feeds to the log, missing which won't affect 
anything much.


Barring objection I will commit this before long.

cheers

andrew
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index c75d9db..c780b63 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -52,7 +52,7 @@ exec_prog(const char *log_file, const char *opt_log_file,
 
 	old_umask = umask(S_IRWXG | S_IRWXO);
 
-	written = strlcpy(cmd, SYSTEMQUOTE, strlen(SYSTEMQUOTE));
+	written = strlcpy(cmd, SYSTEMQUOTE, sizeof(cmd));
 	va_start(ap, fmt);
 	written += vsnprintf(cmd + written, MAXCMDLEN - written, fmt, ap);
 	va_end(ap);
@@ -95,10 +95,12 @@ exec_prog(const char *log_file, const char *opt_log_file,
    log_file);
 	}
 
+#ifndef WIN32
 	if ((log = fopen_priv(log_file, "a+")) == NULL)
 		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
 	fprintf(log, "\n\n");
 	fclose(log);
+#endif
 
 	return result == 0;
 }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade bugs

2012-09-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I have been wrestling for a couple of days trying to get pg_upgrade 
 testing working on Windows, with a view to having it tested on the 
 buildfarm. The test script has its own issues, which I'll deal with 
 separately, but there are two issues in pg_upgrade's exec.c that make me 
 suspect that if pg_upgrade has ever worked at all on Windows it is a 
 matter of sheer luck. The attached patch fixes these. The first issue is 
 a plain miscall to strlcpy(), where the length argument is wrong. The 
 second is where exec_prog tries to open a log file after the system call 
 returns. This will fail if the command was a 'pg_ctl start', as the 
 running postmaster will have the log file open, so I have simply 
 #ifdef'd it out for the Windows case, as the code does nothing except 
 add a couple of line feeds to the log, missing which won't affect 
 anything much.

The strlcpy bug seems to be recently introduced --- I don't see it in
9.2.  I think the other bit has not been there very long either,
though it *is* in 9.2 branch so you'd better back-patch that part.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Heikki Linnakangas

On 03.09.2012 03:23, Tom Lane wrote:

1. As you can see above, the feature is triggered by specifying the new
connection option standalone_datadir, whose value must be the location
of the data directory.  I also invented an option standalone_backend,
which can be set to specify which postgres executable to launch.  If the
latter isn't specified, libpq defaults to trying the installation PGBINDIR
that was selected by configure.  (I don't think it can use the relative
path tricks we use in pg_ctl and elsewhere, since there's no good reason
to assume that it's running in a Postgres-supplied program.)  I'm not
particularly wedded to these names or the syntax, but I think this is the
basic functionality we'd need.


Are there security issues with this? If a user can specify libpq 
connection options, he can now execute any file he wants by passing it 
as standalone_backend. Granted, you shouldn't allow an untrusted user to 
specify libpq connection options, because allowing to open a TCP 
connection to an arbitrary address can be a problem by itself, but it 
seems like this might make the situation much worse. contrib/dblink 
springs to mind..



3. The bulk of the changes have to do with the fact that we need to keep
track of two file descriptors not one.  This was a bit tedious, but fairly
straightforward --- though I was surprised to find that send() and recv()
don't work on pipes, at least not on Linux.  You have to use read() and
write() instead.


Would socketpair(2) be simpler?


7. I haven't tried to make pg_upgrade use this yet.


Hmm, pg_upgrade uses pg_dumpall rather than pg_dump, so it needs to 
connect once per database. That means launching the standalone backend 
multiple times. I guess that works, as long as pg_dumpall doesn't try to 
hold multiple connections open at any one time.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade bugs

2012-09-02 Thread Alvaro Herrera
Excerpts from Andrew Dunstan's message of dom sep 02 22:11:42 -0300 2012:
 
 I have been wrestling for a couple of days trying to get pg_upgrade 
 testing working on Windows, with a view to having it tested on the 
 buildfarm. The test script has its own issues, which I'll deal with 
 separately, but there are two issues in pg_upgrade's exec.c that make me 
 suspect that if pg_upgrade has ever worked at all on Windows it is a 
 matter of sheer luck. The attached patch fixes these. The first issue is 
 a plain miscall to strlcpy(), where the length argument is wrong. The 
 second is where exec_prog tries to open a log file after the system call 
 returns. This will fail if the command was a 'pg_ctl start', as the 
 running postmaster will have the log file open, so I have simply 
 #ifdef'd it out for the Windows case, as the code does nothing except 
 add a couple of line feeds to the log, missing which won't affect 
 anything much.

The first issue was clearly introduced by me in
088c065ce8e405fafbfa966937184ece9defcf20.  The other one is probably
also my fault.  Bruce has tested this on Windows previously and fixed
some previous bugs also introduced by me in the same area; so we know it
works.

Maybe, to reduce future backpatching pain, we could backpatch the change
to exec_prog() API now that you have fixed the implementation?

 Barring objection I will commit this before long.

Thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade test mods for Windows/Mingw

2012-09-02 Thread Andrew Dunstan
The attached patch is what I had to do to get pg_upgrade's make check 
to run on Windows under Mingw. Mostly the changes have to do with 
getting paths right between Windows and MSys, or calling generated .bat 
files instead of shell scripts.


cheers

andrew
diff --git a/contrib/pg_upgrade/test.sh b/contrib/pg_upgrade/test.sh
index 31e30af..fc2304a 100644
--- a/contrib/pg_upgrade/test.sh
+++ b/contrib/pg_upgrade/test.sh
@@ -43,13 +43,23 @@ if [ $1 = '--install' ]; then
 	export EXTRA_REGRESS_OPTS
 fi
 
-: ${oldbindir=$bindir}
-
 : ${oldsrc=../..}
-oldsrc=`cd $oldsrc && pwd`
-newsrc=`cd ../.. && pwd`
 
-PATH=$bindir:$PATH
+if [ `uname -a | sed 's/.* //'` = Msys ] ; then
+	cp $libdir/libpq.dll $bindir
+	osbindir=`cd $bindir && pwd`
+	bindir=`cd $bindir && pwd -W`
+	oldsrc=`cd $oldsrc && pwd -W`
+	newsrc=`cd ../.. && pwd -W`
+else
+	osbindir=$bindir
+	oldsrc=`cd $oldsrc && pwd`
+	newsrc=`cd ../.. && pwd`
+fi
+
+: ${oldbindir=$bindir}
+
+PATH=$osbindir:$PATH
 export PATH
 
 PGDATA=$temp_root/data
@@ -104,10 +114,19 @@ mv ${PGDATA} ${PGDATA}.old
 
 initdb
 
+if [ `uname -a | sed 's/.* //'` = Msys ] ; then 
+	PGDATA=`cd $PGDATA && pwd -W`
+fi
+
 pg_upgrade -d ${PGDATA}.old -D ${PGDATA} -b $oldbindir -B $bindir
 
 pg_ctl start -l $logdir/postmaster2.log -w
-sh ./analyze_new_cluster.sh
+
+if  [ `uname -a | sed 's/.* //'` = Msys ] ; then
+	cmd /c analyze_new_cluster.bat
+else
+	sh ./analyze_new_cluster.sh
+fi
 pg_dumpall >$temp_root/dump2.sql || pg_dumpall2_status=$?
 pg_ctl -m fast stop
 if [ -n $pg_dumpall2_status ]; then
@@ -115,7 +134,15 @@ if [ -n $pg_dumpall2_status ]; then
 	exit 1
 fi
 
-sh ./delete_old_cluster.sh
+if  [ `uname -a | sed 's/.* //'` = Msys ] ; then
+	sed -i -e 's,/,\\,g' -e 's,\\s\\q ,/s/q ,' delete_old_cluster.bat 2>/dev/null
+	chmod +w delete_old_cluster.bat
+	cmd /c delete_old_cluster.bat
+	dos2unix $temp_root/dump1.sql >/dev/null
+	dos2unix $temp_root/dump2.sql >/dev/null
+else
+	sh ./delete_old_cluster.sh
+fi
 
 if diff -q $temp_root/dump1.sql $temp_root/dump2.sql; then
 	echo PASSED

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Tom Lane
Heikki Linnakangas hlinn...@iki.fi writes:
 On 03.09.2012 03:23, Tom Lane wrote:
 1. As you can see above, the feature is triggered by specifying the new
 connection option standalone_datadir, whose value must be the location
 of the data directory.  I also invented an option standalone_backend,
 which can be set to specify which postgres executable to launch.

 Are there security issues with this? If a user can specify libpq 
 connection options, he can now execute any file he wants by passing it 
 as standalone_backend. Granted, you shouldn't allow an untrusted user to 
 specify libpq connection options, because allowing to open a TCP 
 connection to an arbitrary address can be a problem by itself, but it 
 seems like this might make the situation much worse. contrib/dblink 
 springs to mind..

Hmm, that's a good point.  Maybe we should only allow the executable
name to come from an environment variable?  Seems kinda klugy though.

 3. The bulk of the changes have to do with the fact that we need to keep
 track of two file descriptors not one.

 Would socketpair(2) be simpler?

Hm, yes, but is it portable enough?  It seems to be required by SUS v2,
so we're likely okay on the Unix side, but does Windows have this?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Updated patch attached.

[ looks at that for a bit... ]  Now I see why you were on about that:
the method you used here requires both clusters to have the same socket
directory.  Which is silly and unnecessary.  Revised patch attached.

regards, tom lane

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
index 0fec73e..efb080b 100644
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** issue_warnings(char *sequence_script_fil
*** 184,191 
  		{
  			prep_status(Adjusting sequences);
  			exec_prog(UTILITY_LOG_FILE, NULL, true,
! 	  "\"%s/psql\" " EXEC_PSQL_ARGS " --port %d --username \"%s\" -f \"%s\"",
! 	  new_cluster.bindir, new_cluster.port, os_info.user,
  	  sequence_script_file_name);
  			unlink(sequence_script_file_name);
  			check_ok();
--- 184,191 
  		{
  			prep_status(Adjusting sequences);
  			exec_prog(UTILITY_LOG_FILE, NULL, true,
! 	  "\"%s/psql\" " EXEC_PSQL_ARGS " %s -f \"%s\"",
! 	  new_cluster.bindir, cluster_conn_opts(new_cluster),
  	  sequence_script_file_name);
  			unlink(sequence_script_file_name);
  			check_ok();
diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
index cfc4017..b905ab0 100644
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
*** generate_old_dump(void)
*** 24,31 
  	 * restores the frozenid's for databases and relations.
  	 */
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/pg_dumpall\" --port %d --username \"%s\" --schema-only --binary-upgrade %s -f %s",
! 			  new_cluster.bindir, old_cluster.port, os_info.user,
  			  log_opts.verbose ? "--verbose" : "",
  			  ALL_DUMP_FILE);
  	check_ok();
--- 24,31 
  	 * restores the frozenid's for databases and relations.
  	 */
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/pg_dumpall\" %s --schema-only --binary-upgrade %s -f %s",
! 			  new_cluster.bindir, cluster_conn_opts(old_cluster),
  			  log_opts.verbose ? "--verbose" : "",
  			  ALL_DUMP_FILE);
  	check_ok();
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
index 94bce50..80f7d34 100644
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*** adjust_data_dir(ClusterInfo *cluster)
*** 376,378 
--- 376,439 
  
  	check_ok();
  }
+ 
+ 
+ /*
+  * get_sock_dir
+  *
+  * Identify the socket directory to use for this cluster.  If we're doing
+  * a live check (old cluster only), we need to find out where the postmaster
+  * is listening.  Otherwise, we're going to put the socket into the current
+  * directory.
+  */
+ void
+ get_sock_dir(ClusterInfo *cluster, bool live_check)
+ {
+ #if HAVE_UNIX_SOCKETS
+ 	if (!live_check)
+ 	{
+ 		/* Use the current directory for the socket */
+ 		cluster->sockdir = pg_malloc(MAXPGPATH);
+ 		if (!getcwd(cluster->sockdir, MAXPGPATH))
+ 			pg_log(PG_FATAL, "cannot find current directory\n");
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 *	If we are doing a live check, we will use the old cluster's Unix
+ 		 *	domain socket directory so we can connect to the live server.
+ 		 */
+ 
+ 		/* sockdir added to the 5th line of postmaster.pid in PG 9.1 */
+ 		if (GET_MAJOR_VERSION(cluster->major_version) >= 901)
+ 		{
+ 			char		filename[MAXPGPATH];
+ 			FILE		*fp;
+ 			int			i;
+ 
+ 			snprintf(filename, sizeof(filename), "%s/postmaster.pid",
+ 					 cluster->pgdata);
+ 			if ((fp = fopen(filename, "r")) == NULL)
+ 				pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 
+ 			cluster->sockdir = pg_malloc(MAXPGPATH);
+ 			for (i = 0; i < 5; i++)
+ 				if (fgets(cluster->sockdir, MAXPGPATH, fp) == NULL)
+ 					pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 
+ 			fclose(fp);
+ 
+ 			/* Remove trailing newline */
+ 			if (strchr(cluster->sockdir, '\n') != NULL)
+ 				*strchr(cluster->sockdir, '\n') = '\0';
+ 		}
+ 		else
+ 		{
+ 			/* Can't get live sockdir, so assume the default is OK. */
+ 			cluster->sockdir = NULL;
+ 		}
+ 	}
+ #else /* !HAVE_UNIX_SOCKETS */
+ 	cluster->sockdir = NULL;
+ #endif
+ }
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
index c47c8bb..ee3a151 100644
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** main(int argc, char **argv)
*** 88,93 
--- 88,96 
  	check_cluster_versions();
  	check_cluster_compatibility(live_check);
  
+ 	get_sock_dir(old_cluster, live_check);
+ 	get_sock_dir(new_cluster, false);
+ 
  	check_old_cluster(live_check, sequence_script_file_name);
  
  
*** prepare_new_cluster(void)
*** 211,218 
  	 */
  	prep_status(Analyzing all rows in the new cluster);
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/vacuumdb\" --port %d --username \"%s\" --all --analyze %s",
! 			  new_cluster.bindir, new_cluster.port, os_info.user,
  			  log_opts.verbose ? "--verbose" : "");
  	check_ok();
  
--- 214,221 
  	 */
  	prep_status(Analyzing all rows in the new cluster);
  	

Re: [HACKERS] pg_upgrade test mods for Windows/Mingw

2012-09-02 Thread Gurjeet Singh
On Sun, Sep 2, 2012 at 11:29 PM, Andrew Dunstan and...@dunslane.net wrote:

 The attached patch is what I had to do to get pg_upgrade's make check to
 run on Windows under Mingw. Mostly the changes have to do with getting
 paths right between Windows and MSys, or calling generated .bat files
 instead of shell scripts.


When reading shell script code like this

`uname -a | sed 's/.* //'` = Msys

and

sed -i -e 's,/,\\,g' -e 's,\\s\\q ,/s/q ,' delete_old_cluster.bat
2>/dev/null

I find it easier to understand and maintain if the comments also describe
what is the original string format that  this pattern-matching expects,
like:

# We expect `uname -a` output like:
#  Windows_NT4.0 Msys

and

# We expect lines of the format:
#   abc/xyz/def/
# and we convert them to
#  abc\xyz\def


BTW, would `uname -o` eliminate the need of pattern matching in the first
snippet? The Wikipedia [1] article suggests so.

[1] http://en.wikipedia.org/wiki/Uname

Best regards,
-- 
Gurjeet Singh


Re: [HACKERS] pg_upgrade bugs

2012-09-02 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Maybe, to reduce future backpatching pain, we could backpatch the change
 to exec_prog() API now that you have fixed the implementation?

I'm inclined to think this is a good idea, but keep in mind we're less
than four days from wrapping 9.2.0.  There's not a lot of margin for
error here.

At the same time, getting pg_upgrade to pass regression on Windows would
be a good thing, and ought to go a long way towards ameliorating worries
about this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Noah Misch
On Sun, Sep 02, 2012 at 11:34:42PM -0400, Tom Lane wrote:
 Heikki Linnakangas hlinn...@iki.fi writes:
  On 03.09.2012 03:23, Tom Lane wrote:
  1. As you can see above, the feature is triggered by specifying the new
  connection option standalone_datadir, whose value must be the location
  of the data directory.  I also invented an option standalone_backend,
  which can be set to specify which postgres executable to launch.
 
  Are there security issues with this? If a user can specify libpq 
  connection options, he can now execute any file he wants by passing it 
  as standalone_backend. Granted, you shouldn't allow an untrusted user to 
  specify libpq connection options, because allowing to open a TCP 
  connection to an arbitrary address can be a problem by itself, but it 
  seems like this might make the situation much worse. contrib/dblink 
  springs to mind..
 
 Hmm, that's a good point.  Maybe we should only allow the executable
 name to come from an environment variable?  Seems kinda klugy though.

I don't think it's libpq's job to enforce security policy this way.  In any
event, it has no reason to presume an environment variable is a safer source.

  3. The bulk of the changes have to do with the fact that we need to keep
  track of two file descriptors not one.
 
  Would socketpair(2) be simpler?
 
 Hm, yes, but is it portable enough?  It seems to be required by SUS v2,
 so we're likely okay on the Unix side, but does Windows have this?

Windows does not have socketpair(), nor a strict pipe() equivalent.  I expect
switching to socketpair() makes the Windows side trickier in some ways and
simpler in others.  +1 for exploring that direction first.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Sun, Sep 02, 2012 at 11:34:42PM -0400, Tom Lane wrote:
 Heikki Linnakangas hlinn...@iki.fi writes:
 Are there security issues with this? If a user can specify libpq 
 connection options, he can now execute any file he wants by passing it 
 as standalone_backend.

 Hmm, that's a good point.  Maybe we should only allow the executable
 name to come from an environment variable?  Seems kinda klugy though.

 I don't think it's libpq's job to enforce security policy this way.  In any
 event, it has no reason to presume an environment variable is a safer source.

One easy thing we could do that would at least narrow the risks is to
only allow the executable's *directory* to be specified, hardwiring the
executable file name to postgres (or postgres.exe I guess).

I'm inclined to think though that environment variables *are* more
secure in this context.  In the contrib/dblink example, such a
restriction would certainly help a lot.  In any case, we should at
least think of a way that an app using libpq can prevent this type
of attack short of parsing the conn string for itself.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade test mods for Windows/Mingw

2012-09-02 Thread Tom Lane
Gurjeet Singh singh.gurj...@gmail.com writes:
 [ this needs a comment: ]
 `uname -a | sed 's/.* //'` = Msys

Also, how about doing that just once and setting a variable to test
as needed later?  Multiple copied-and-pasted instances of line noise
like that are risky.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-09-02 Thread Robert Haas
On Sun, Sep 2, 2012 at 5:39 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where tremendous number of LOCK
 TABLE are issued. I replace them with single LOCK TABLE with multiple
 tables. With 100k tables LOCK statements took 13 minutes in total, now
 it only takes 3 seconds. Comments?

 Was this applied?

 No, we fixed the server side instead.

 But only for 9.2, right?  So people running back branches are still screwed.

 Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
 behaviors involved here, not all of which are masked by what Tatsuo-san
 suggested.

 All of the other ones that I know of were associated with pg_dump
 itself, and since it is recommended to run the newer version of
 pg_dump against the older version of the server, no back patching
 would be necessary to get the benefits of those particular fixes.

 Six months or a year from now, we might have enough confidence in that
 batch of 9.2 fixes to back-port them en masse.  Don't want to do it
 today though.


 What would be the recommendation for people trying to upgrade, but who
 can't get their data out in a reasonable window?

 Putting Tatsuo-san's change into a future pg_dump might be more
 conservative than back-porting the server's Lock Table change to the
 server version they are trying to get rid of.

What he said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Windows does not have socketpair(), nor a strict pipe() equivalent.  I expect
 switching to socketpair() makes the Windows side trickier in some ways and
 simpler in others.  +1 for exploring that direction first.

A bit of googling suggests that emulating socketpair() on Windows is not
that hard: basically you create an accepting socket and connect to it.
Ugly I guess but likely to be nicer than emulating the two-pipes trick
exactly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Noah Misch
On Mon, Sep 03, 2012 at 12:11:20AM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  I don't think it's libpq's job to enforce security policy this way.  In any
  event, it has no reason to presume an environment variable is a safer 
  source.
 
 One easy thing we could do that would at least narrow the risks is to
 only allow the executable's *directory* to be specified, hardwiring the
 executable file name to postgres (or postgres.exe I guess).

If the user has any interactive access to the machine, he could make a
/tmp/X/postgres symbolic link to the program of his choosing.

 I'm inclined to think though that environment variables *are* more
 secure in this context.  In the contrib/dblink example, such a
 restriction would certainly help a lot.

dblink_connect() should only let superusers specify standalone_datadir at all;
normal users have no business manipulating other data directories visible to
the backend running dblink.  For superusers, setting both standalone_datadir
and standalone_backend is fair game.

Trusting the environment over connection strings is a wrong policy for, say, a
setuid command-line program.  Suppose such a program passes libpq a fixed
connection string to access its embedded database.  The program will find
itself explicitly clearing this environment variable to regain safety.

 In any case, we should at
 least think of a way that an app using libpq can prevent this type
 of attack short of parsing the conn string for itself.

My recommendation to affected application authors is to pass the connection
string through PQconninfoParse().  Walk the array, validating or rejecting
arguments like host and standalone_datadir.  For maximum safety, the
application would need to reject any unanticipated parameters.  We might
soften that by adding a bool "critical" field to PQconninfoOption that
documents whether the option must be understood by a program validating a
connection.  Compare the idea of the PNG chunk header's ancillary bit.
Parameters like host and standalone_datadir would be critical, while
application_name and connect_timeout would be ancillary.  But this is a
tough line to draw rigorously.


I was pondering the flexibility from letting the application fork/exec and
supply the client-side descriptor number(s) to libpq.  Suppose it wants to
redirect the backend's stderr to a file.  A single-threaded application would
temporarily redirect its own stderr while calling PQconnectdb().  In a
multithreaded application, that introduces a race condition when other threads
concurrently write to the normal stderr.  By handling redirection in its own
fork/exec code, the application can achieve the outcome safely.  Perhaps
offering this can wait until the feature constitutes a more general
embedded-database offering, though.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers