Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Pavel Stehule
Hello

2012/9/2 Sergey Koposov kopo...@ast.cam.ac.uk:
 Hi,

 I'm experiencing the case when bitmap scan is ~ 70 times slower than index
 scan which seems to be caused by 1) very big table 2) some hash search logic
 (hash_search_with_hash_value )

 Here is the explain analyze of the query with bitmap scans allowed:

 wsdb= explain analyze select * from test as t, crts.data as d1
 where d1.objid=t.objid and d1.mjd=t.mjd limit 1;
   QUERY PLAN
 ---
  Limit  (cost=11514.04..115493165.44 rows=1 width=68) (actual
 time=27.512..66620.231 rows=1 loops=1)
-  Nested Loop  (cost=11514.04..1799585184.18 rows=155832 width=68)
 (actual time=27.511..66616.807 rows=1 loops=1)
  -  Seq Scan on test t  (cost=0.00..2678.40 rows=156240 width=28)
 (actual time=0.010..4.685 rows=11456 loops=1)
  -  Bitmap Heap Scan on data d1  (cost=11514.04..11518.05 rows=1
 width=40) (actual time=5.807..5.807 rows=1 loops=11456)
Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))
-  BitmapAnd  (cost=11514.04..11514.04 rows=1 width=0)
 (actual time=5.777..5.777 rows=0 loops=11456)
  -  Bitmap Index Scan on data_mjd_idx
 (cost=0.00..2501.40 rows=42872 width=0) (actual time=3.920..3.920 rows=22241
 loops=11456)
Index Cond: (mjd = t.mjd)
  -  Bitmap Index Scan on data_objid_idx
 (cost=0.00..8897.90 rows=415080 width=0) (actual time=0.025..0.025 rows=248
 loops=11456)
Index Cond: (objid = t.objid)
  Total runtime: 66622.026 ms
 (11 rows)

The statistics on the data_objid_idx index are absolutely out of date, so the
planner cannot find an optimal plan.

Regards

Pavel Stehule

 Here is the output when bitmap scans are disabled:
  QUERY PLAN
  Limit  (cost=0.00..329631941.65 rows=1 width=68) (actual
 time=0.082..906.876 rows=1 loops=1)
-  Nested Loop  (cost=0.00..4979486036.95 rows=151062 width=68) (actual
 time=0.081..905.683 rows=1 loops=1)
  Join Filter: (t.mjd = d1.mjd)
  -  Seq Scan on test t  (cost=0.00..2632.77 rows=151677 width=28)
 (actual time=0.009..1.679 rows=11456 loops=1)
  -  Index Scan using data_objid_idx on data d1
 (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050
 rows=248 loops=11456)
Index Cond: (objid = t.objid)
  Total runtime: 907.462 ms

 When the bitmap scans are enabled the prof of postgres shows
 47.10%  postmaster  postgres   [.] hash_search_with_hash_value
 |
 --- hash_search_with_hash_value

 11.06%  postmaster  postgres   [.] hash_seq_search
 |
 --- hash_seq_search

  6.95%  postmaster  postgres   [.] hash_any
 |
 --- hash_any

  5.17%  postmaster  postgres   [.] _bt_checkkeys
 |
 --- _bt_checkkeys

  4.07%  postmaster  postgres   [.] tbm_add_tuples
 |
 --- tbm_add_tuples

  3.41%  postmaster  postgres   [.] hash_search
 |
 --- hash_search


 And the last note is that the crts.data table which is being bitmap scanned
 is a 1.1Tb table with ~ 20e9 rows. My feeling is that the bitmap index scan
 code
 is somehow unprepared to combine two bitmaps for such a big table, and this
 leads to the terrible performance.

 Regards,
 Sergey

 PS Here are the schemas of the tables, just in case:
 wsdb= \d test
   Table koposov.test
  Column  |   Type   | Modifiers
 -+--+---
  mjd | double precision |
  fieldid | bigint   |
  intmag  | integer  |
  objid   | bigint   |

 wsdb= \d crts.data
Table crts.data
  Column |   Type   | Modifiers
 +--+---
  objid  | bigint   |
  mjd| double precision |
  mag| real |
  emag   | real |
  ra | double precision |
  dec| double precision |
 Indexes:
 data_mjd_idx btree (mjd) WITH (fillfactor=100)
 data_objid_idx btree (objid) WITH (fillfactor=100)
 data_q3c_ang2ipix_idx btree (q3c_ang2ipix(ra, dec)) WITH
 (fillfactor=100)

 PPS shared_buffers=10GB, work_mem=1GB
 All the tests shown here were done in a fully cached regime.

 PPS I can believe that what I'm seeing is a feature, not a bug of bitmap
 scans,
 and I can live with disabling them, but I still thought it's worth
 reporting.


 

Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Peter Geoghegan
On 2 September 2012 06:21, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 Hi,

 I'm experiencing the case when bitmap scan is ~ 70 times slower than index
 scan which seems to be caused by 1) very big table 2) some hash search logic
 (hash_search_with_hash_value )

 Here is the explain analyze of the query with bitmap scans allowed:

 wsdb= explain analyze select * from test as t, crts.data as d1
 where d1.objid=t.objid and d1.mjd=t.mjd limit 1;
   QUERY PLAN
 ---
  Limit  (cost=11514.04..115493165.44 rows=1 width=68) (actual
 time=27.512..66620.231 rows=1 loops=1)
-  Nested Loop  (cost=11514.04..1799585184.18 rows=155832 width=68)
 (actual time=27.511..66616.807 rows=1 loops=1)
  -  Seq Scan on test t  (cost=0.00..2678.40 rows=156240 width=28)
 (actual time=0.010..4.685 rows=11456 loops=1)
  -  Bitmap Heap Scan on data d1  (cost=11514.04..11518.05 rows=1
 width=40) (actual time=5.807..5.807 rows=1 loops=11456)
Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))
-  BitmapAnd  (cost=11514.04..11514.04 rows=1 width=0)
 (actual time=5.777..5.777 rows=0 loops=11456)
  -  Bitmap Index Scan on data_mjd_idx
 (cost=0.00..2501.40 rows=42872 width=0) (actual time=3.920..3.920 rows=22241
 loops=11456)
Index Cond: (mjd = t.mjd)
  -  Bitmap Index Scan on data_objid_idx
 (cost=0.00..8897.90 rows=415080 width=0) (actual time=0.025..0.025 rows=248
 loops=11456)
Index Cond: (objid = t.objid)
  Total runtime: 66622.026 ms
 (11 rows)

 Here is the output when bitmap scans are disabled:
  QUERY PLAN
  Limit  (cost=0.00..329631941.65 rows=1 width=68) (actual
 time=0.082..906.876 rows=1 loops=1)
-  Nested Loop  (cost=0.00..4979486036.95 rows=151062 width=68) (actual
 time=0.081..905.683 rows=1 loops=1)
  Join Filter: (t.mjd = d1.mjd)
  -  Seq Scan on test t  (cost=0.00..2632.77 rows=151677 width=28)
 (actual time=0.009..1.679 rows=11456 loops=1)
  -  Index Scan using data_objid_idx on data d1
 (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050
 rows=248 loops=11456)
Index Cond: (objid = t.objid)
  Total runtime: 907.462 ms

 When the bitmap scans are enabled the prof of postgres shows
 47.10%  postmaster  postgres   [.] hash_search_with_hash_value
 |
 --- hash_search_with_hash_value

 11.06%  postmaster  postgres   [.] hash_seq_search
 |
 --- hash_seq_search

  6.95%  postmaster  postgres   [.] hash_any
 |
 --- hash_any

  5.17%  postmaster  postgres   [.] _bt_checkkeys
 |
 --- _bt_checkkeys

  4.07%  postmaster  postgres   [.] tbm_add_tuples
 |
 --- tbm_add_tuples

  3.41%  postmaster  postgres   [.] hash_search
 |
 --- hash_search


 And the last note is that the crts.data table which is being bitmap scanned
 is a 1.1Tb table with ~ 20e9 rows. My feeling is that the bitmap index scan
 code
 is somehow unprepared to combine two bitmaps for such a big table, and this
 leads to the terrible performance.

I think that this kind of question is better suited to the
pgsql-performance list. Granted, it was presented as a bug report
(though they're generally sent to -bugs rather than -hackers), but I
don't think that this is a valid bug.

One obvious red-flag from your query plans is that there is a
misestimation of the row return count of a few orders of magnitude in
the Bitmap Index Scan node. Did you try performing an ANALYZE to
see if that helped? It may also be helpful to show pg_stats entries
for both the data.mjd and test.mjd columns. You may find, prior to
doing an ANALYZE, that there are no entries for one of those tables.
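
Peter's suggestion can be carried out directly in psql; a minimal sketch (the
table and column names are taken from the thread):

```sql
-- Refresh the planner statistics for both tables, then inspect what
-- the planner currently believes about the join columns.
ANALYZE crts.data;
ANALYZE test;

SELECT schemaname, tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE (schemaname = 'crts' AND tablename = 'data' AND attname IN ('objid', 'mjd'))
   OR (tablename = 'test' AND attname IN ('objid', 'mjd'));
```

An empty result for one of the tables would confirm it has never been analyzed.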

Turning off the enable_* planner options in production is generally
discouraged. Certainly, you'd be crazy to do that on a server-wide
basis.
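
If one does need to override the planner for a single query, the change can at
least be scoped to a transaction rather than made server-wide; a sketch using
the query from the thread:

```sql
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK, so the
-- override never leaks to other sessions or later queries.
BEGIN;
SET LOCAL enable_bitmapscan = off;
EXPLAIN ANALYZE
SELECT * FROM test AS t, crts.data AS d1
WHERE d1.objid = t.objid AND d1.mjd = t.mjd LIMIT 1;
COMMIT;
```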

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Bruce Momjian
On Sun, Sep  2, 2012 at 01:06:28AM -0400, Robert Haas wrote:
  For live check operation, you are checking a running server, so
  assuming the socket is in the current directory is not going to work.
  What the code does is to read the 5th line from the running server's
  postmaster.pid file, which has the socket directory in PG >= 9.1.  For
  pre-9.1, pg_upgrade uses the compiled-in defaults for socket directory.
  If the defaults are different between the two servers, the new binaries,
  e.g. pg_dump, will not work.  The fix is for the user to set pg_upgrade
  -O to match the old socket directory, and set PGHOST before running
  pg_upgrade.  I could not find a good way to generate a proper error
  message because we are blind to the socket directory in pre-9.1.
  Frankly, this is a problem if the old pre-9.1 server is running in a
  user-configured socket directory too, so a documentation addition seems
  right here.
 
  So, in summary, this patch moves the socket directory to the current
  directory for all but the live check operation, and handles different socket
  directories for old clusters >= 9.1.  I have added a documentation
  mention of how to make this work for pre-9.1 old servers.
 
 I don't think this is reducing the number of failure modes; it's just
 changing it from one set of obscure cases to a slightly different set
 of obscure cases.

Tom reported problems with having old/new with different default socket
locations.  This fixes that, and reduces the possibility of accidental
connections.  What problems does this add?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Bruce Momjian
On Sat, Sep  1, 2012 at 02:35:06PM -0400, Bruce Momjian wrote:
 On Sat, Sep  1, 2012 at 02:23:22PM -0400, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   + /*
   +  *  Report the Unix domain socket directory location to the 
   postmaster.
   +  */
  
  "Report" seems entirely the wrong verb there.

Fixed.

   + #define LISTEN_STR   " -c listen_addresses=''"
   + 
   + /* Have a sockdir to use? */
   + if (strlen(os_info.sockdir) != 0)
   + snprintf(socket_string, sizeof(socket_string) - strlen(LISTEN_STR),
   +  " -c %s='%s'",
   + (GET_MAJOR_VERSION(cluster->major_version) < 903) ?
   + "unix_socket_directory" : "unix_socket_directories",
   + os_info.sockdir);
   + 
   + /* prevent TCP/IP connections */
   + strcat(socket_string, LISTEN_STR);
  
  IMO this would be simpler and more readable if you got rid of the LISTEN_STR
  #define and just included -c listen_addresses='' in the snprintf format
  string.  The comment for the whole thing should be something like
  If we have a socket directory to use, command the postmaster to use it,
  and disable TCP/IP connections altogether.
 
 Well, you only want the unix_socket* if sockdir is defined, but you want
 LISTEN_STR unconditionally, even if there is no sockdir.  Not sure how
 that could cleanly be in a single snprintf.

I restructured the code to add the listen_addresses string first,
allowing the removal of the #define, as Tom suggested.  I also added
unix_socket_permissions=0700 to further restrict socket access.

Updated patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 94bce50..8b35d8f
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*** adjust_data_dir(ClusterInfo *cluster)
*** 376,378 
--- 376,431 
  
  	check_ok();
  }
+ 
+ 
+ #if HAVE_UNIX_SOCKETS
+ /*
+  * set_sockdir_and_pghost
+  *
+  * Set the socket directory to either the configured location (live check)
+  * or the current directory.
+  */
+ void
+ set_sockdir_and_pghost(bool live_check)
+ {
+ 	if (!live_check)
+ 	{
+ 		/* Use the current directory for the socket */
+ 		if (!getcwd(os_info.sockdir, MAXPGPATH))
+ 			pg_log(PG_FATAL, "cannot find current directory\n");
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 *	If we are doing a live check, we will use the old cluster's Unix
+ 		 *	domain socket directory so we can connect to the live server.
+ 		 */
+ 
+ 		/* sockdir added to the 5th line of postmaster.pid in PG 9.1 */
+ 		if (GET_MAJOR_VERSION(old_cluster.major_version) >= 901)
+ 		{
+ 			char		filename[MAXPGPATH];
+ 			FILE		*fp;
+ 			int			i;
+ 
+ 			snprintf(filename, sizeof(filename), "%s/postmaster.pid", old_cluster.pgdata);
+ 			if ((fp = fopen(filename, "r")) == NULL)
+ 				pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 
+ 			for (i = 0; i < 5; i++)
+ 				if (fgets(os_info.sockdir, sizeof(os_info.sockdir), fp) == NULL)
+ 					pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 			fclose(fp);
+ 		}
+ 		else
+ 			return;		/* Can't get live sockdir, so use the default. */
+ 
+ 		/* Remove trailing newline */
+ 		if (strchr(os_info.sockdir, '\n') != NULL)
+ 			*strchr(os_info.sockdir, '\n') = '\0';
+ 	}
+ 
+ 	/* For client communication */
+ 	pg_putenv("PGHOST", os_info.sockdir);
+ }
+ #endif
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index c47c8bb..8cad5c3
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** main(int argc, char **argv)
*** 88,93 
--- 88,97 
  	check_cluster_versions();
  	check_cluster_compatibility(live_check);
  
+ #if HAVE_UNIX_SOCKETS
+ 	set_sockdir_and_pghost(live_check);
+ #endif
+ 
  	check_old_cluster(live_check, sequence_script_file_name);
  
  
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index fa4c6c0..a712e21
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
*** typedef struct
*** 267,272 
--- 267,275 
  	const char *progname;		/* complete pathname for this program */
  	char	   *exec_path;		/* full path to my executable */
  	char	   *user;			/* username for clusters */
+ #if HAVE_UNIX_SOCKETS
+ 	char		sockdir[NAMEDATALEN];	/* directory for Unix Domain socket */
+ #endif
  	char	  **tablespaces;	/* tablespaces */
  	int			num_tablespaces;
  	char	  **libraries;		/* loadable libraries */
*** void print_maps(FileNameMap *maps, int n
*** 387,392 
--- 390,398 
  
  void		parseCommandLine(int argc, char *argv[]);
  void		

Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Sergey Koposov

On Sun, 2 Sep 2012, Pavel Stehule wrote:



The statistics on the data_objid_idx index are absolutely out of date, so the
planner cannot find an optimal plan.


That doesn't have anything to do with the problem, AFAIU.
First, the data table is static and was analysed.
Second, the query in question is a join, and AFAIK the estimate of the
number of rows is known to be incorrect in the case of column
correlation.
Third, according at least to my understanding, in the fully cached regime
a bitmap scan should not take two orders of magnitude more CPU time than
an index scan.


Sergey






*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/




Re: [HACKERS] [GENERAL] Multiple Slave Failover with PITR

2012-09-02 Thread Bruce Momjian

Do we ever want to document a way to connect slaves to a new master,
rather than recreating the slave?

---

On Tue, Mar 27, 2012 at 10:47:48AM -0700, Ken Brush wrote:
 Hello everyone,
 
 I notice that the documentation at:
 http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
 
 Doesn't contain steps in a Multiple Slave setup for re-establishing
 them after a slave has become the new master.
 
 Based on the documentation, here are the most fail-proof steps I came up with:
 
 1. Master dies :(
 2. Touch the trigger file on the most caught up slave.
 3. Slave is now the new master :)
 4. use pg_basebackup or other binary replication trick (rsync, tar
 over ssh, etc...) to bring the other slaves up to speed with the new
 master.
 5. start the other slaves pointing to the new master.
 
 But, that can take time (about 1-2 hours) with my medium sized DB
 (580GB currently).
 
 After testing a few different ideas that I gleaned from posts on the
 mail list, I came up with this alternative method:
 
 1. Master dies :(
 2. Touch the trigger file on the most caught up slave
 3. Slave is now the new master.
 4. On the other slaves do the following:
 5. Shutdown postgres on the slave
 6. Delete every file in /data/pgsql/data/pg_xlog
 7. Modify the recovery.conf file to point to the new master and
 include the line recovery_target_timeline='latest'
 8. Copy the history file from the new master to the slave (it's the
 most recent #.history file in the xlog directory)
 9. Startup postgres on the slave and watch it sync up to the new
 master (about 1-5 minutes usually)
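
Steps 6-8 above boil down to a recovery.conf of roughly this shape on each
remaining slave (the hostname and user are placeholders, not from the thread):

```ini
# recovery.conf on a remaining slave (PostgreSQL 9.x era)
standby_mode = 'on'
primary_conninfo = 'host=new-master port=5432 user=replicator'
# follow the timeline switch made when the other slave was promoted
recovery_target_timeline = 'latest'
```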
 
 My question is this. Is the alternative method adequate? I tested it a
 bit and couldn't find any problems with data loss or inconsistency.
 
 I still use the fail-proof method above to re-incorporate the old
 master as a new slave.
 
 Sincerely,
 -Ken
 
 -- 
 Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Sergey Koposov
Thanks for your comments. 
On Sun, 2 Sep 2012, Peter Geoghegan wrote:

On 2 September 2012 06:21, Sergey Koposov kopo...@ast.cam.ac.uk wrote:

I think that this kind of question is better suited to the
pgsql-performance list. Granted, it was presented as a bug report
(though they're generally sent to -bugs rather than -hackers), but I
don't think that this is a valid bug.


The reason I was inclined to think that it is a bug is that I
encountered a similar bug before with bitmap scans and very big
tables:

http://archives.postgresql.org/pgsql-hackers/2011-08/msg00958.php

Furthermore, two orders of magnitude more CPU time for bitmap scans
compared to the index scan didn't sound right to me (although obviously
I'm not in a position to claim that it's 100% a bug).


One obvious red-flag from your query plans is that there is a
misestimation of the row return count of a few orders of magnitude in
the Bitmap Index Scan node. Did you try performing an ANALYZE to
see if that helped? It may also be helpful to show pg_stats entries
for both the data.mjd and test.mjd columns. You may find, prior to
doing an ANALYZE, that there are no entries for one of those tables.


The main large table is static and was analyzed. The test table was as 
well. But as mentioned in another recent email, the query is a join, so 
column correlation is a problem.



Turning off the enable_* planner options in production is generally
discouraged. Certainly, you'd be crazy to do that on a server-wide
basis.


I'm using PG for data mining, data analysis purposes with very few clients 
connected and very large tables, so enable_* is used quite often to fix 
incorrect plans due to incorrect selectivities.


Regards,
Sergey

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Tom Lane
Sergey Koposov kopo...@ast.cam.ac.uk writes:
 On Sun, 2 Sep 2012, Peter Geoghegan wrote:
 One obvious red-flag from your query plans is that there is a
 misestimation of the row return count of a few orders of magnitude in
 the Bitmap Index Scan node. Did you trying performing an ANALYZE to
 see if that helped? It may also be helpful to show pg_stats entries
 for both the data.mjd and test.mjd columns. You may find, prior to
 doing an ANALYZE, that there is no entries for one of those tables.

 The main large table is static and was analyzed. The test table was as 
 well. But as mentioned in another recent email, the query is the join, so 
 column correlation is a problem.

The problem is definitely the misestimation here:

  -  Index Scan using data_objid_idx on data d1  (cost=0.00..26603.32 
rows=415080 width=40) (actual time=0.010..0.050 rows=248 loops=11456)
Index Cond: (objid = t.objid)

The planner thinks that indexscan will be 2000 times more expensive than
it really is (assuming that the cost per retrieved row is linear, which
it isn't entirely, but close enough for the moment).  Of course, it's
also thinking the bitmap component scan on the same index will be 2000
times more expensive than reality, but that has only perhaps a 4X impact
on the total cost of the bitmap scan, since the use of the other index
is what dominates there.  With a more accurate idea of this join
condition's selectivity, I'm pretty certain it would have gone for a
plain indexscan, or else a bitmap scan using only this index.

So if there's a bug here, it's in eqjoinsel().  Can you pinpoint
anything unusual about the stats of the objid columns?

regards, tom lane




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Phil Sorber
On Sun, Sep 2, 2012 at 1:13 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 -- Forwarded message --
 From: Pavel Stehule pavel.steh...@gmail.com
 Date: 2012/9/1
 Subject: PATCH: psql boolean display
 To: Phil Sorber p...@omniti.com


 Hello

 I am looking to your patch:

 I have only one note. I don't think using arbitrary text for the values
 true and false is a good idea. I don't see the point of special
 texts like foo, bar from your example.

 A stricter design is better - I wrote a simple modification - it is
 based on psql psets - and it adds a new configuration setting, boolstyle
 [char|word]. An advantage of this design is consistency and possible
 autocomplete support.

 Regards

 Pavel



 postgres= select true, false;
  bool │ bool
 ──┼──
  t│ f
 (1 row)

 postgres= \pset boolstyle word
 Bool output style is word.
 postgres= select true, false;
  bool │ bool
 ──┼───
  true │ false
 (1 row)


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


What you are proposing sounds like it would be better suited to a
server-side GUC. Based on the discussion in the thread, true/false is
the SQL standard and we are doing it incorrectly as t/f, but cannot
change for legacy reasons. We could even change the default but give
users a way to revert to the old behavior, similar to the bytea_output
GUC. Maybe add the GUC for 9.3 but not change the default behavior
until 10.0.

What my patch was intended to do was let the end user set boolean
output to any arbitrary values. While foo/bar is pretty useless, it
was meant to reinforce that it was capable of any arbitrary value. I
can think of a decent list of other output an end user might want,
such as:

true/false
yes/no
y/n
on/off
1/0
enabled/disabled

Plus the different capitalized forms.




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Tom Lane
Phil Sorber p...@omniti.com writes:
 What my patch was intended to do was let the end user set boolean
 output to any arbitrary values. While foo/bar is pretty useless, it
 was meant to reinforce that it was capable of any arbitrary value. I
 can think of a decent list of other output an end user might want,
 such as:

 true/false
 yes/no
 y/n
 on/off
 1/0
 enabled/disabled

 Plus the different capitalized forms.

I can readily see that people might want boolean columns displayed in
such ways in custom applications.  I'm less convinced that there is much
use for it in psql, though.  In the big scheme of things, psql is a
rather low-level tool, designed for DBAs and SQL programmers.  I'd get
quite upset if psql failed to tell me the truth about what was in a
table I was looking at --- and a feature like this comes pretty close
to not telling the truth, especially if it kicks in on a column I
wasn't expecting it to.

On the whole I think this sort of substitution belongs in a
user-written-application layer of software, not in any of the tools
we supply.

regards, tom lane




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Sergey Koposov

On Sun, 2 Sep 2012, Tom Lane wrote:


Sergey Koposov kopo...@ast.cam.ac.uk writes:
The problem is definitely the misestimation here:

 -  Index Scan using data_objid_idx on data d1  (cost=0.00..26603.32 
rows=415080 width=40) (actual time=0.010..0.050 rows=248 loops=11456)
   Index Cond: (objid = t.objid)

The planner thinks that indexscan will be 2000 times more expensive than
it really is (assuming that the cost per retrieved row is linear, which
it isn't entirely, but close enough for the moment).  Of course, it's
also thinking the bitmap component scan on the same index will be 2000
times more expensive than reality, but that has only perhaps a 4X impact
on the total cost of the bitmap scan, since the use of the other index
is what dominates there.  With a more accurate idea of this join
condition's selectivity, I'm pretty certain it would have gone for a
plain indexscan, or else a bitmap scan using only this index.
So if there's a bug here, it's in eqjoinsel().  Can you pinpoint
anything unusual about the stats of the objid columns?


Here are the pg_stats rows for two tables.

[The pg_stats rows followed here as a very wide table (schemaname, tablename,
attname, inherited, null_frac, avg_width, n_distinct, most_common_vals,
most_common_freqs, histogram_bounds); the column data is garbled beyond
recovery in the archive.]



Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Peter Geoghegan
On 2 September 2012 16:26, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 After looking at them, I think I understand the reason -- the
 n_distinct value for crts.data is terribly wrong. In reality it should
 be ~ 130 million. I already faced this problem at a certain point when doing
 group by objid and PG was exhausting all the memory, because of that wrong
 estimate. But I know that estimating n_distinct is a known hard problem.

 So probably that's the main reason for the problem...

That's why we support altering that value with an ALTER TABLE...ALTER
COLUMN DDL statement. You might at least consider increasing the
statistics target for the column first though, to see if that will
make the n_distinct value better accord with reality.
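
Both remedies Peter mentions are plain DDL; a sketch using the figures from
the thread (the 130 million value comes from Sergey's own estimate):

```sql
-- Option 1: raise the per-column sample size and re-analyze, hoping
-- the n_distinct estimate improves on its own.
ALTER TABLE crts.data ALTER COLUMN objid SET STATISTICS 1000;
ANALYZE crts.data;

-- Option 2: override the estimate outright with the known cardinality
-- (positive values are absolute; negative values mean a fraction of rows).
ALTER TABLE crts.data ALTER COLUMN objid SET (n_distinct = 130000000);
ANALYZE crts.data;  -- a subsequent ANALYZE is needed for the override to apply
```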

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Tom Lane
I wrote:
 Phil Sorber p...@omniti.com writes:
 What my patch was intended to do was let the end user set boolean
 output to any arbitrary values. While foo/bar is pretty useless, it
 was meant to reinforce that it was capable of any arbitrary value. I
 can think of a decent list of other output an end user might want,
 such as:

 true/false
 yes/no
 y/n
 on/off
 1/0
 enabled/disabled

 Plus the different capitalized forms.

 I can readily see that people might want boolean columns displayed in
 such ways in custom applications.  I'm less convinced that there is much
 use for it in psql, though.

BTW, another point that your list brings to mind is that somebody who
wants something like this would very possibly want different displays
for different columns.  The proposed feature, being one-size-fits-all
for boolean, couldn't handle that.

What would make a lot more sense in my mind would be to label columns
*in the database* to show how they ought to be displayed.

One conceivable method is to make a collection of domains over bool,
and drive the display off the particular domain used.  However we lack
some infrastructure that would be needed for this (in particular, I'm
pretty sure the PQftype data delivered to the client identifies the
underlying type and not the domain).

Another approach is to make a collection of enum types, in which case
you don't need any client-side support at all.  I experimented with
this method a bit, and it seems workable:

regression=# create type mybool as enum ('no', 'yes');
CREATE TYPE
regression=# create function bool(mybool) returns bool as
$$ select $1 = 'yes'::mybool $$ language sql immutable;
CREATE FUNCTION
regression=# create cast (mybool as bool) with function bool(mybool) as 
assignment;
CREATE CAST
regression=# create table mb(f1 mybool);
CREATE TABLE
regression=# insert into mb values('no'),('yes');
INSERT 0 2
regression=# select * from mb where f1;
 f1  
-
 yes
(1 row)

regression=# select * from mb where f1 = 'yes';
 f1  
-
 yes
(1 row)

I tried making the cast be implicit, but that caused problems with
ambiguous operators, so assignment seems to be the best you can do here.

A variant of this is to build casts in the other direction
(bool::mybool), declare columns in the database as regular bool,
and apply the casts in queries when you want columns displayed in a
particular way.  This might be the best solution if the desired
display is at all context-dependent.
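
For what it's worth, the other-direction variant might look like this
(an untested sketch, assuming the mybool enum type from the example above):

```sql
create function mybool(bool) returns mybool as
$$ select case when $1 then 'yes'::mybool
               when not $1 then 'no'::mybool end $$
language sql immutable;
create cast (bool as mybool) with function mybool(bool);

-- the column stays a plain bool; apply the cast only where display matters:
create table t(f1 bool);
insert into t values (true), (false);
select f1::mybool from t;
```

Since the cast is invoked explicitly in the query, it should not run into
the ambiguous-operator problems that the implicit cast did.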

regards, tom lane




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Sergey Koposov

On Sun, 2 Sep 2012, Peter Geoghegan wrote:


On 2 September 2012 16:26, Sergey Koposov kopo...@ast.cam.ac.uk wrote:

That's why we support altering that value with an ALTER TABLE...ALTER
COLUMN DDL statement. You might at least consider increasing the
statistics target for the column first though, to see if that will
make the n_distinct value better accord with reality.


Thanks, I had forgotten about that possibility.

After setting n_distinct to the correct value, the index scan became the
preferred option, and the row-count estimate for the index scan became
more realistic.


Sorry for the noise.
Sergey

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Pavel Stehule
2012/9/2 Phil Sorber p...@omniti.com:
 On Sun, Sep 2, 2012 at 1:13 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 -- Forwarded message --
 From: Pavel Stehule pavel.steh...@gmail.com
 Date: 2012/9/1
 Subject: PATCH: psql boolean display
 To: Phil Sorber p...@omniti.com


 Hello

 I am looking to your patch:

 I have only one note. I don't think using arbitrary text for the values
 true and false is a good idea. I don't see the point of special
 texts like foo, bar from your example.

 A stricter design is better - I wrote a simple modification - it is
 based on psql's psets, and it adds a new configuration setting, boolstyle
 [char|word]. An advantage of this design is consistency and possible
 autocomplete support.

 Regards

 Pavel



 postgres= select true, false;
  bool │ bool
 ──┼──
  t│ f
 (1 row)

 postgres= \pset boolstyle word
 Bool output style is word.
 postgres= select true, false;
  bool │ bool
 ──┼───
  true │ false
 (1 row)




 What you are proposing sounds like it would be better suited to a
 server side GUC. Based on the discussion in the thread that said
 true/false was the SQL standard and we were doing it incorrectly as
 t/f but could not change for legacy reasons. We could even change the
 default but give users a way to revert to the old behavior. Similar to
 the bytea_output GUC. Maybe add the GUC for 9.3 but not change the
 default behavior until 10.0.

 What my patch was intended to do was let the end user set boolean
 output to any arbitrary values. While foo/bar is pretty useless, it
 was meant to reinforce that it was capable of any arbitrary value. I
 can think of a decent list of other output an end user might want,
 such as:

 true/false
 yes/no
 y/n
 on/off
 1/0
 enabled/disabled

 Plus the different capitalized forms.

If you have these different requests, then you can use enums - or you
can use your own formatting function. There is a fairly strong
recommendation not to rely on implicit formatting driven by database
configuration; applications should use explicit formatting
everywhere. I don't think a GUC for the boolean datatype
is a good idea.

Using just the chars 't' and 'f' is unfortunate design that must be
kept for compatibility reasons. You usually don't need to solve it
yourself, because the transformation from chars to words can be done by
the application or the database driver - so I see this as a client
issue - a psql issue in this case. And I really don't see any point in
unlimited bool output in a simple tool like psql. It would be nice to fix
the issue with chars, because chars are not very readable, but we don't
need to supply enums.
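
For what it's worth, the "own formatting function" Pavel mentions can be
quite small (a sketch; the function, table, and column names are made up):

```sql
create function fmt_bool(b boolean, t text default 'yes', f text default 'no')
returns text language sql immutable as
$$ select case when b then t when not b then f end $$;

select fmt_bool(active), fmt_bool(active, 'on', 'off') from some_table;
```

The application then controls the exact spelling per query, with no
client- or server-side configuration involved.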

Regards

Pavel




Re: [HACKERS] Fwd: PATCH: psql boolean display

2012-09-02 Thread Pavel Stehule
2012/9/2 Tom Lane t...@sss.pgh.pa.us:
 I wrote:
 Phil Sorber p...@omniti.com writes:
 What my patch was intended to do was let the end user set boolean
 output to any arbitrary values. While foo/bar is pretty useless, it
 was meant to reinforce that it was capable of any arbitrary value. I
 can think of a decent list of other output an end user might want,
 such as:

 true/false
 yes/no
 y/n
 on/off
 1/0
 enabled/disabled

 Plus the different capitalized forms.

 I can readily see that people might want boolean columns displayed in
 such ways in custom applications.  I'm less convinced that there is much
 use for it in psql, though.

 BTW, another point that your list brings to mind is that somebody who
 wants something like this would very possibly want different displays
 for different columns.  The proposed feature, being one-size-fits-all
 for boolean, couldn't handle that.


I proposed just a cleaner, more conventional boolean output in
psql - nothing more. For anything else we can write formatting
functions, use CASE, or use enums.


 What would make a lot more sense in my mind would be to label columns
 *in the database* to show how they ought to be displayed.

 One conceivable method is to make a collection of domains over bool,
 and drive the display off the particular domain used.  However we lack
 some infrastructure that would be needed for this (in particular, I'm
 pretty sure the PQftype data delivered to the client identifies the
 underlying type and not the domain).

 Another approach is to make a collection of enum types, in which case
 you don't need any client-side support at all.  I experimented with
 this method a bit, and it seems workable:

 regression=# create type mybool as enum ('no', 'yes');
 CREATE TYPE
 regression=# create function bool(mybool) returns bool as
 $$ select $1 = 'yes'::mybool $$ language sql immutable;
 CREATE FUNCTION
 regression=# create cast (mybool as bool) with function bool(mybool) as 
 assignment;
 CREATE CAST
 regression=# create table mb(f1 mybool);
 CREATE TABLE
 regression=# insert into mb values('no'),('yes');
 INSERT 0 2
 regression=# select * from mb where f1;
  f1
 -
  yes
 (1 row)

 regression=# select * from mb where f1 = 'yes';
  f1
 -
  yes
 (1 row)

 I tried making the cast be implicit, but that caused problems with
 ambiguous operators, so assignment seems to be the best you can do here.

 A variant of this is to build casts in the other direction
 (bool::mybool), declare columns in the database as regular bool,
 and apply the casts in queries when you want columns displayed in a
 particular way.  This might be the best solution if the desired
 display is at all context-dependent.

When I worked on PSM I needed a way to simply specify the expected
datatype outside the SQL statement - something like enhancing
parametrised queries with a fourth parameter for the expected types.

Then somebody could set the expected type for a column simply, outside
the query, and the transformation could be fast. It could be used for
unsupported date formats and similar tasks.

Regards

Pavel


 regards, tom lane




Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Sun, Sep  2, 2012 at 01:06:28AM -0400, Robert Haas wrote:
 I don't think this is reducing the number of failure modes; it's just
 changing it from one set of obscure cases to a slightly different set
 of obscure cases.

 Tom reported problems with having old/new with different default socket
 locations.  This fixes that, and reduces the possibility of accidental
 connections.  What problems does this add?

I'm going to be needing some fix in this area in any case, though
whether it's exactly Bruce's current patch is not clear to me.  I found
out last night while making a test build of 9.2rc1 as a Fedora package
that pg_upgrade's regression test fails in the Fedora build environment,
if the postmaster has been patched so that its default socket directory
is /var/run/postgresql.  That happens because /var/run/postgresql
doesn't exist in the build environment (it is only going to exist once
the postgresql-server package is installed), so the postmaster fails
to start because it can't create a socket where it expects to.
I have a patch to pg_regress that instructs the temporary postmaster
to use /tmp as unix_socket_directory regardless of its built-in default,
so that make check works for the regular core and contrib regression
tests.  However, that doesn't affect pg_upgrade's regression test case.

It looks rather messy to persuade pg_upgrade to do things differently
for regression testing and normal use, not to mention that it would make
the test even less representative of normal use.  So I'm thinking that
we do need the pg_upgrade feature Bruce is suggesting of forcing the
socket directory to be the current directory.  What's more, if that's
not back-patched to 9.2, I'm going to have to carry it as a Fedora patch
anyway.

Alternatively, I can prevent make check from testing pg_upgrade
(which is what I did so I could carry on with package testing).
I'd just as soon not ship it like that, though.

regards, tom lane




[HACKERS] [bugfix] sepgsql missed a case of CREATE TABLE AS

2012-09-02 Thread Kohei KaiGai
The attached patch fixes a bug in the sepgsql module.
Could you apply it to both the v9.2 and master branches?

When the post_create hook is invoked, sepgsql's handler checks
whether the current invocation context requires permission
checks, according to the command tag saved by ProcessUtility_hook.

But it overlooked the T_CreateTableAs tag, so neither security
labeling nor permission checks were applied, just as if the routine
were operating on a toast relation.
This patch adds this command tag as a context in which permissions
are checked.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


sepgsql-fixbug-create-table-as.patch
Description: Binary data



[HACKERS] [bugfix] sepgsql didn't follow the latest core API changes

2012-09-02 Thread Kohei KaiGai
This patch fixes a few places where sepgsql didn't follow the latest
core API changes.

1) The prototype of ProcessUtility_hook was recently changed, but the
sepgsql side didn't follow this update, which broke the build.

2) sepgsql internally uses the GETSTRUCT() and HeapTupleGetOid() macros;
these were moved to htup_details.h, so it needs an additional #include
of access/htup_details.h.

3) sepgsql internally used a bool variable named abort.
I noticed that it conflicts with the ereport macro, because that expands
to ereport_domain, which contains an invocation of abort(). So I renamed
this variable to abort_on_violation.

#define ereport_domain(elevel, domain, rest)\
(errstart(elevel, __FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) ? \
 (errfinish rest) : (void) 0), \
((elevel) >= ERROR ? abort() : (void) 0)

This does not affect v9.2, so please apply it to the master branch only.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


sepgsql-fixbug-follow-core-apis.patch
Description: Binary data



[HACKERS] GPU and Database

2012-09-02 Thread Gaetano Mendola

May be someone of you is interested in this

ADBIS workshop on GPUs in Databases

http://gid2012.cs.put.poznan.pl/


Regards
Gaetano Mendola




Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Bruce Momjian
On Sun, Sep  2, 2012 at 01:13:52PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Sun, Sep  2, 2012 at 01:06:28AM -0400, Robert Haas wrote:
  I don't think this is reducing the number of failure modes; it's just
  changing it from one set of obscure cases to a slightly different set
  of obscure cases.
 
  Tom reported problems with having old/new with different default socket
  locations.  This fixes that, and reduces the possibility of accidental
  connections.  What problems does this add?
 
 I'm going to be needing some fix in this area in any case, though
 whether it's exactly Bruce's current patch is not clear to me.  I found
 out last night while making a test build of 9.2rc1 as a Fedora package
 that pg_upgrade's regression test fails in the Fedora build environment,
 if the postmaster has been patched so that its default socket directory
 is /var/run/postgresql.  That happens because /var/run/postgresql
 doesn't exist in the build environment (it is only going to exist once
 the postgresql-server package is installed), so the postmaster fails
 to start because it can't create a socket where it expects to.
 I have a patch to pg_regress that instructs the temporary postmaster
 to use /tmp as unix_socket_directory regardless of its built-in default,
 so that make check works for the regular core and contrib regression
 tests.  However, that doesn't affect pg_upgrade's regression test case.
 
 It looks rather messy to persuade pg_upgrade to do things differently
 for regression testing and normal use, not to mention that it would make
 the test even less representative of normal use.  So I'm thinking that
 we do need the pg_upgrade feature Bruce is suggesting of forcing the
 socket directory to be the current directory.  What's more, if that's
 not back-patched to 9.2, I'm going to have to carry it as a Fedora patch
 anyway.
 
 Alternatively, I can prevent make check from testing pg_upgrade
 (which is what I did so I could carry on with package testing).
 I'd just as soon not ship it like that, though.

Well, I don't know of any known problems with the patch.  On the other
hand, I don't know our policy in pushing patches into RC releases at the
request of packagers.

If you want to stand behind the patch, it might be OK.  I think that's
how we handle such requests --- someone has to put their neck out for
it.  Fortunately the patch is not very large so is easy to review.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Proof of concept: auto updatable views

2012-09-02 Thread Dean Rasheed
On 31 August 2012 07:59, Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 30 August 2012 20:05, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Aug 12, 2012 at 5:14 PM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
 None of this new code kicks in for non-security barrier views, so the
 kinds of plans I posted upthread remain unchanged in that case. But
 now a significant fraction of the patch is code added to handle
 security barrier views. Of course we could simply say that such views
 aren't updatable, but that seems like an annoying limitation if there
 is a feasible way round it.

 Maybe it'd be a good idea to split this into two patches: the first
 could implement the feature but exclude security_barrier views, and
 the second could lift that restriction.


 Yes, I think that makes sense.
 I should hopefully get some time to look at it over the weekend.


Here's an updated patch for the base feature (without support for
security barrier views) with updated docs and regression tests.

Regards,
Dean


auto-update-views.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [v9.3] Row-Level Security

2012-09-02 Thread Dean Rasheed
On 17 July 2012 05:02, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2012/7/17 Robert Haas robertmh...@gmail.com:
 On Sun, Jul 15, 2012 at 5:52 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The attached patch is a revised version of row-level security feature.
 ...
 According to the Robert's comment, I revised the place to inject
 applyRowLevelSecurity(). The reason why it needed to patch on
 adjust_appendrel_attrs_mutator() was, we handled expansion from
 regular relation to sub-query after expand_inherited_tables().
 In this revision, it was moved to the head of sub-query planner.


Hi,

I had a quick look at this and spotted a problem - certain types of
query are able to bypass the RLS quals. For example:

SELECT * FROM (SELECT * FROM foo) foo;

since the RLS policy doesn't descend into subqueries, and is applied
before they are pulled up into the main query. Similarly for views on
top of tables with RLS, and SRF functions that query a table with RLS
that get inlined.

Also queries using UNION ALL are vulnerable if they end up being
flattened, for example:

SELECT * FROM foo UNION ALL SELECT * FROM foo;


FWIW I recently developed some similar code as part of a patch to
implement automatically updatable views
(http://archives.postgresql.org/pgsql-hackers/2012-08/msg00303.php).
Some parts of that code may be useful, possibly for adding
UPDATE/DELETE support.

Regards,
Dean




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-09-02 Thread Jeff Janes
On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where tremendous number of LOCK
 TABLE are issued. I replace them with single LOCK TABLE with multiple
 tables. With 100k tables LOCK statements took 13 minutes in total, now
 it only takes 3 seconds. Comments?

 Was this applied?

 No, we fixed the server side instead.

 But only for 9.2, right?  So people running back branches are still screwed.

 Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
 behaviors involved here, not all of which are masked by what Tatsuo-san
 suggested.

All of the other ones that I know of were associated with pg_dump
itself, and since it is recommended to run the newer version of
pg_dump against the older version of the server, no back patching
would be necessary to get the benefits of those particular fixes.

 Six months or a year from now, we might have enough confidence in that
 batch of 9.2 fixes to back-port them en masse.  Don't want to do it
 today though.


What would be the recommendation for people trying to upgrade, but who
can't get their data out in a reasonable window?

Putting Tatsuo-san's change into a future pg_dump might be more
conservative than back-porting the server's Lock Table change to the
server version they are trying to get rid of.

Cheers,

Jeff




[HACKERS] Fwd: [PERFORM] Loose Index Scans by Planner?

2012-09-02 Thread Jeff Janes
I'd like to create a ToDo item for loose index scans or skip
scans, when the lead column has low cardinality and is not used in
the where clause.  This case can potentially be optimized by using the
index as if it were a collection of N partitioned indexes, where N
is the cardinality of the lead column.  Any objections?
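
Until the planner learns this, the usual manual workaround is a recursive
CTE that hops from one lead-column value to the next, touching only a few
index entries per distinct value (a sketch; the events/category names are
invented, and an index whose lead column is category is assumed):

```sql
WITH RECURSIVE cats AS (
    SELECT min(category) AS category FROM events
    UNION ALL
    SELECT (SELECT min(category) FROM events
            WHERE category > cats.category)
    FROM cats
    WHERE cats.category IS NOT NULL
)
SELECT category FROM cats WHERE category IS NOT NULL;
```

Each level of the recursion does one index probe, so enumerating N distinct
lead values costs roughly O(N log rows) instead of a full scan.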

I don't really have a detailed plan on how to do it.  I expect the
planner part would be harder than the execution part.

See [PERFORM] Loose Index Scans by Planner thread.

Thanks,

Jeff




Re: [HACKERS] PATCH: pgbench - random sampling of transaction written into log

2012-09-02 Thread Tomas Vondra

On 31.08.2012 00:01, Tomas Vondra wrote:

On 30 August 2012, 23:44, Robert Haas wrote:

 On Thu, Aug 30, 2012 at 3:48 PM, Tomas Vondra t...@fuzzy.cz wrote:
 That sounds like a pretty trivial patch. I've been thinking about yet
 another option - histograms (regular or with exponential bins).

 I thought about that, too, but I think high-outliers is a lot more
 useful.  At least for the kinds of things I worry about.

OK, let's fix the current patches first. We can add more options later.


So, here is a fixed version of the patch. I've made these changes:

1) The option is now '--sampling-rate' instead of '-R' and accepts 
float arguments. I've decided not to use 0.01 = 1% but use 1 = 1%, so it 
accepts values between 0 and 100. I find this more natural.


2) I've removed one of the two new variables, the remaining one is used 
both to check whether the sampling is enabled and keep the value.


3) I've decided not to enable -l whenever the --sampling-rate is 
used. Again, I find this a bit less magic behavior.


4) I've fixed some formatting issues - if you notice another one that I 
don't see, let me know.


Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 00cab73..578aeb3 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -130,6 +130,11 @@ int	foreign_keys = 0;
 int	unlogged_tables = 0;
 
 /*
+ * use log sampling (rate = 1 = 100%, 0 = don't use)
+ */
+double use_sample_rate = 0.0;
+
+/*
  * tablespace selection
  */
 char  *tablespace = NULL;
@@ -364,6 +369,8 @@ usage(void)
 	"  -f FILENAME  read transaction script from FILENAME\n"
 	"  -j NUM       number of threads (default: 1)\n"
 	"  -l           write transaction times to log file\n"
+	"  --sampling-rate NUM\n"
+	"               sampling rate of the log (e.g. 0.01 for 1%% sample)\n"
 	"  -M simple|extended|prepared\n"
 	"               protocol for submitting queries to server (default: simple)\n"
 	"  -n           do not run VACUUM before tests\n"
@@ -877,21 +884,26 @@ top:
instr_time  diff;
double  usec;
 
-   INSTR_TIME_SET_CURRENT(now);
-   diff = now;
-   INSTR_TIME_SUBTRACT(diff, st->txn_begin);
-   usec = (double) INSTR_TIME_GET_MICROSEC(diff);
+   /* either no sampling or is within the sample */
+   if ((use_sample_rate == 0.0) || (pg_erand48(thread->random_state) <= use_sample_rate))
+   {
+
+   INSTR_TIME_SET_CURRENT(now);
+   diff = now;
+   INSTR_TIME_SUBTRACT(diff, st->txn_begin);
+   usec = (double) INSTR_TIME_GET_MICROSEC(diff);
 
 #ifndef WIN32
-   /* This is more than we really ought to know about instr_time */
-   fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
-           st->id, st->cnt, usec, st->use_file,
-           (long) now.tv_sec, (long) now.tv_usec);
+   /* This is more than we really ought to know about instr_time */
+   fprintf(logfile, "%d %d %.0f %d %ld %ld\n",
+           st->id, st->cnt, usec, st->use_file,
+           (long) now.tv_sec, (long) now.tv_usec);
 #else
-   /* On Windows, instr_time doesn't provide a timestamp anyway */
-   fprintf(logfile, "%d %d %.0f %d 0 0\n",
-           st->id, st->cnt, usec, st->use_file);
+   /* On Windows, instr_time doesn't provide a timestamp anyway */
+   fprintf(logfile, "%d %d %.0f %d 0 0\n",
+           st->id, st->cnt, usec, st->use_file);
 #endif
+   }
}
 
 if (commands[st->state]->type == SQL_COMMAND)
@@ -1918,6 +1930,7 @@ main(int argc, char **argv)
 	{"index-tablespace", required_argument, NULL, 3},
 	{"tablespace", required_argument, NULL, 2},
 	{"unlogged-tables", no_argument, &unlogged_tables, 1},
+	{"sampling-rate", required_argument, NULL, 4},
{NULL, 0, NULL, 0}
};
 
@@ -2123,6 +2136,14 @@ main(int argc, char **argv)
case 3: /* index-tablespace */
index_tablespace = optarg;
break;
+   case 4:
+   use_sample_rate = atof(optarg)/100;
+   if (use_sample_rate <= 0.0 || 

Re: [HACKERS] PATCH: pgbench - aggregation of info written into log

2012-09-02 Thread Tomas Vondra

On 30.08.2012 18:02, Robert Haas wrote:

On Fri, Aug 24, 2012 at 5:25 PM, Tomas Vondra t...@fuzzy.cz wrote:

This patch is a bit less polished (and more complex) than the other
pgbench patch I sent a while back, and I'm not sure how to handle the
Windows branch. That needs to be fixed during the commit fest.


What's the problem with the Windows branch?

Could you un-cuddle your braces to follow the PostgreSQL style, omit
braces around single-statement blocks, and remove the spurious
whitespace changes?


Done, or at least I don't see other formatting issues. Let me know if I
missed something.



Instead of having both use_log_agg and naggseconds, I think you can
get by with just having a single variable that is zero if aggregation
is not in use and is otherwise the aggregation period.  On a related
note, you can't specify -A without an associated value, so there is no
point in documenting a default.  As with your other patch, I suggest
using a long option name like --latency-aggregate-interval and then
making the name of the variable in the code match the option name,
with s/-/_/g, for clarity.

Fixed. I've kept use_log_agg only and I've removed the default. Also
I've added one more check (that the total duration is a multiple of
the aggregation interval).

And just as with the sampling patch, I believe -l should not be
enabled automatically, but required explicitly. But if more people ask
to enable it whenever aggregation or sampling is used, I'm fine with that.

Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 00cab73..e044564 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -145,6 +145,7 @@ char   *index_tablespace = NULL;
 #define naccounts  10
 
 bool	use_log;		/* log transaction latencies to a file */
+int	use_log_agg;		/* log aggregates instead of individual transactions */
 bool	is_connect;		/* establish connection for each transaction */
 bool	is_latencies;		/* report per-command latencies */
 int	main_pid;		/* main process id used in log filename */
@@ -240,6 +241,18 @@ typedef struct
char   *argv[MAX_ARGS]; /* command word list */
 } Command;
 
+typedef struct
+{
+
+	long	start_time;	/* when does the interval start */
+	int	cnt;		/* number of transactions */
+	double	min_duration;	/* min/max durations */
+	double	max_duration;
+	double	sum;		/* sum(duration), sum(duration^2) - for estimates */
+	double	sum2;
+
+} AggVals;
+
 static Command **sql_files[MAX_FILES]; /* SQL script files */
 static int num_files;  /* number of script files */
 static int num_commands = 0;   /* total number of Command structs */
@@ -364,6 +377,8 @@ usage(void)
 	"  -f FILENAME  read transaction script from FILENAME\n"
 	"  -j NUM       number of threads (default: 1)\n"
 	"  -l           write transaction times to log file\n"
+	"  --aggregate-interval NUM\n"
+	"               aggregate data over NUM seconds\n"
 	"  -M simple|extended|prepared\n"
 	"               protocol for submitting queries to server (default: simple)\n"
 	"  -n           do not run VACUUM before tests\n"
@@ -817,9 +832,22 @@ clientDone(CState *st, bool ok)
return false;   /* always false */
 }
 
+static
+void agg_vals_init(AggVals * aggs, instr_time start)
+{
+	aggs->cnt = 0;
+	aggs->sum = 0;
+	aggs->sum2 = 0;
+
+	aggs->min_duration = 3600 * 100.0; /* one hour */
+	aggs->max_duration = 0;
+
+	aggs->start_time = INSTR_TIME_GET_DOUBLE(start);
+}
+
 /* return false iff client should be disconnected */
 static bool
-doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile)
+doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVals * agg)
 {
PGresult   *res;
Command   **commands;
@@ -881,17 +909,70 @@ top:
diff = now;
 INSTR_TIME_SUBTRACT(diff, st->txn_begin);
usec = (double) INSTR_TIME_GET_MICROSEC(diff);
-
+   
+   /* should we aggregate the results or not? */
+   if (use_log_agg)
+   {
+
+       /* are we still in the same interval? if yes, accumulate the
+        * values (print them otherwise) */
+       if (agg->start_time + use_log_agg >= INSTR_TIME_GET_DOUBLE(now))
+       {
+   
+

Re: [HACKERS] PATCH: pgbench - aggregation of info written into log

2012-09-02 Thread Jeff Janes
On Sun, Sep 2, 2012 at 3:46 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Fixed. I've kept use_log_agg only and I've removed the default. Also
 I've added one more check (that the total duration is a multiple of
 the aggregation interval).

Hi Tomas,

In the docs, -l is an option, not an application:
<application>-l</application>

Also, the docs still refer to <option>-A</option>, rather than to
<option>--aggregate-interval</option>, in a few places.

I think a section in the docs that points out that transactions run by
specifying multiple -f will be mingled together into an interval would
be a good idea, as that could easily be overlooked (as I did earlier).

The docs do not mention anywhere what the units for the latencies are.

I think the format of the logfile should somehow make it unmistakably
different from the regular -l logfile, for example by prefixing every
line with Aggregate: .   Or maybe there is some other solution, but
I think that having two log formats, both of which consist of nothing
but 6 integers, but yet mean very different things, is a recipe for
confusion.

Is it unavoidable that the interval be an integral number of seconds?
I found the units in the original code confusing, so maybe there is
some reason it needs to be like that that I don't understand yet.
I'll look into it some more.

Thanks,

Jeff




[HACKERS] pg_upgrade bugs

2012-09-02 Thread Andrew Dunstan


I have been wrestling for a couple of days trying to get pg_upgrade 
testing working on Windows, with a view to having it tested on the 
buildfarm. The test script has its own issues, which I'll deal with 
separately, but there are two issues in pg_upgrade's exec.c that make me 
suspect that if pg_upgrade has ever worked at all on Windows it is a 
matter of sheer luck. The attached patch fixes these. The first issue is 
a plain miscall to strlcpy(), where the length argument is wrong. The 
second is where exec_prog tries to open a log file after the system call 
returns. This will fail if the command was a 'pg_ctl start', as the 
running postmaster will have the log file open, so I have simply 
#ifdef'd it out for the Windows case, as the code does nothing except 
add a couple of line feeds to the log, missing which won't affect 
anything much.


Barring objection I will commit this before long.

cheers

andrew
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index c75d9db..c780b63 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -52,7 +52,7 @@ exec_prog(const char *log_file, const char *opt_log_file,
 
 	old_umask = umask(S_IRWXG | S_IRWXO);
 
-	written = strlcpy(cmd, SYSTEMQUOTE, strlen(SYSTEMQUOTE));
+	written = strlcpy(cmd, SYSTEMQUOTE, sizeof(cmd));
 	va_start(ap, fmt);
 	written += vsnprintf(cmd + written, MAXCMDLEN - written, fmt, ap);
 	va_end(ap);
@@ -95,10 +95,12 @@ exec_prog(const char *log_file, const char *opt_log_file,
    log_file);
 	}
 
+#ifndef WIN32
 	if ((log = fopen_priv(log_file, "a+")) == NULL)
 		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
 	fprintf(log, "\n\n");
 	fclose(log);
+#endif
 
 	return result == 0;
 }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade bugs

2012-09-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I have been wrestling for a couple of days trying to get pg_upgrade 
 testing working on Windows, with a view to having it tested on the 
 buildfarm. The test script has its own issues, which I'll deal with 
 separately, but there are two issues in pg_upgrade's exec.c that make me 
 suspect that if pg_upgrade has ever worked at all on Windows it is a 
 matter of sheer luck. The attached patch fixes these. The first issue is 
 a plain miscall to strlcpy(), where the length argument is wrong. The 
 second is where exec_prog tries to open a log file after the system call 
 returns. This will fail if the command was a 'pg_ctl start', as the 
 running postmaster will have the log file open, so I have simply 
 #ifdef'd it out for the Windows case, as the code does nothing except 
 add a couple of line feeds to the log, missing which won't affect 
 anything much.

The strlcpy bug seems to be recently introduced --- I don't see it in
9.2.  I think the other bit has not been there very long either,
though it *is* in 9.2 branch so you'd better back-patch that part.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Heikki Linnakangas

On 03.09.2012 03:23, Tom Lane wrote:

1. As you can see above, the feature is triggered by specifying the new
connection option standalone_datadir, whose value must be the location
of the data directory.  I also invented an option standalone_backend,
which can be set to specify which postgres executable to launch.  If the
latter isn't specified, libpq defaults to trying the installation PGBINDIR
that was selected by configure.  (I don't think it can use the relative
path tricks we use in pg_ctl and elsewhere, since there's no good reason
to assume that it's running in a Postgres-supplied program.)  I'm not
particularly wedded to these names or the syntax, but I think this is the
basic functionality we'd need.


Are there security issues with this? If a user can specify libpq 
connection options, he can now execute any file he wants by passing it 
as standalone_backend. Granted, you shouldn't allow an untrusted user to 
specify libpq connection options, because allowing to open a TCP 
connection to an arbitrary address can be a problem by itself, but it 
seems like this might make the situation much worse. contrib/dblink 
springs to mind..



3. The bulk of the changes have to do with the fact that we need to keep
track of two file descriptors not one.  This was a bit tedious, but fairly
straightforward --- though I was surprised to find that send() and recv()
don't work on pipes, at least not on Linux.  You have to use read() and
write() instead.


Would socketpair(2) be simpler?


7. I haven't tried to make pg_upgrade use this yet.


Hmm, pg_upgrade uses pg_dumpall rather than pg_dump, so it needs to 
connect once per database. That means launching the standalone backend 
multiple times. I guess that works, as long as pg_dumpall doesn't try to 
hold multiple connections open at any one time.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade bugs

2012-09-02 Thread Alvaro Herrera
Excerpts from Andrew Dunstan's message of dom sep 02 22:11:42 -0300 2012:
 
 I have been wrestling for a couple of days trying to get pg_upgrade 
 testing working on Windows, with a view to having it tested on the 
 buildfarm. The test script has its own issues, which I'll deal with 
 separately, but there are two issues in pg_upgrade's exec.c that make me 
 suspect that if pg_upgrade has ever worked at all on Windows it is a 
 matter of sheer luck. The attached patch fixes these. The first issue is 
 a plain miscall to strlcpy(), where the length argument is wrong. The 
 second is where exec_prog tries to open a log file after the system call 
 returns. This will fail if the command was a 'pg_ctl start', as the 
 running postmaster will have the log file open, so I have simply 
 #ifdef'd it out for the Windows case, as the code does nothing except 
 add a couple of line feeds to the log, missing which won't affect 
 anything much.

The first issue was clearly introduced by me in
088c065ce8e405fafbfa966937184ece9defcf20.  The other one is probably
also my fault.  Bruce has tested this on Windows previously and fixed
some previous bugs also introduced by me in the same area; so we know it
works.

Maybe, to reduce future backpatching pain, we could backpatch the change
to exec_prog() API now that you have fixed the implementation?

 Barring objection I will commit this before long.

Thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade test mods for Windows/Mingw

2012-09-02 Thread Andrew Dunstan
The attached patch is what I had to do to get pg_upgrade's make check 
to run on Windows under Mingw. Mostly the changes have to do with 
getting paths right between Windows and MSys, or calling generated .bat 
files instead of shell scripts.


cheers

andrew
diff --git a/contrib/pg_upgrade/test.sh b/contrib/pg_upgrade/test.sh
index 31e30af..fc2304a 100644
--- a/contrib/pg_upgrade/test.sh
+++ b/contrib/pg_upgrade/test.sh
@@ -43,13 +43,23 @@ if [ $1 = '--install' ]; then
 	export EXTRA_REGRESS_OPTS
 fi
 
-: ${oldbindir=$bindir}
-
 : ${oldsrc=../..}
-oldsrc=`cd $oldsrc && pwd`
-newsrc=`cd ../.. && pwd`
 
-PATH=$bindir:$PATH
+if [ `uname -a | sed 's/.* //'` = Msys ] ; then
+	cp $libdir/libpq.dll $bindir
+	osbindir=`cd $bindir && pwd`
+	bindir=`cd $bindir && pwd -W`
+	oldsrc=`cd $oldsrc && pwd -W`
+	newsrc=`cd ../.. && pwd -W`
+else
+	osbindir=$bindir
+	oldsrc=`cd $oldsrc && pwd`
+	newsrc=`cd ../.. && pwd`
+fi
+
+: ${oldbindir=$bindir}
+
+PATH=$osbindir:$PATH
 export PATH
 
 PGDATA=$temp_root/data
@@ -104,10 +114,19 @@ mv ${PGDATA} ${PGDATA}.old
 
 initdb
 
+if [ `uname -a | sed 's/.* //'` = Msys ] ; then 
+	PGDATA=`cd $PGDATA && pwd -W`
+fi
+
 pg_upgrade -d ${PGDATA}.old -D ${PGDATA} -b $oldbindir -B $bindir
 
 pg_ctl start -l $logdir/postmaster2.log -w
-sh ./analyze_new_cluster.sh
+
+if  [ `uname -a | sed 's/.* //'` = Msys ] ; then
+	cmd /c analyze_new_cluster.bat
+else
+	sh ./analyze_new_cluster.sh
+fi
 pg_dumpall >$temp_root/dump2.sql || pg_dumpall2_status=$?
 pg_ctl -m fast stop
 if [ -n $pg_dumpall2_status ]; then
@@ -115,7 +134,15 @@ if [ -n $pg_dumpall2_status ]; then
 	exit 1
 fi
 
-sh ./delete_old_cluster.sh
+if  [ `uname -a | sed 's/.* //'` = Msys ] ; then
+	sed -i -e 's,/,\\,g' -e 's,\\s\\q ,/s/q ,' delete_old_cluster.bat 2>/dev/null
+	chmod +w delete_old_cluster.bat
+	cmd /c delete_old_cluster.bat
+	dos2unix $temp_root/dump1.sql >/dev/null
+	dos2unix $temp_root/dump2.sql >/dev/null
+else
+	sh ./delete_old_cluster.sh
+fi
 
 if diff -q $temp_root/dump1.sql $temp_root/dump2.sql; then
 	echo PASSED

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Tom Lane
Heikki Linnakangas hlinn...@iki.fi writes:
 On 03.09.2012 03:23, Tom Lane wrote:
 1. As you can see above, the feature is triggered by specifying the new
 connection option standalone_datadir, whose value must be the location
 of the data directory.  I also invented an option standalone_backend,
 which can be set to specify which postgres executable to launch.

 Are there security issues with this? If a user can specify libpq 
 connection options, he can now execute any file he wants by passing it 
 as standalone_backend. Granted, you shouldn't allow an untrusted user to 
 specify libpq connection options, because allowing to open a TCP 
 connection to an arbitrary address can be a problem by itself, but it 
 seems like this might make the situation much worse. contrib/dblink 
 springs to mind..

Hmm, that's a good point.  Maybe we should only allow the executable
name to come from an environment variable?  Seems kinda klugy though.

 3. The bulk of the changes have to do with the fact that we need to keep
 track of two file descriptors not one.

 Would socketpair(2) be simpler?

Hm, yes, but is it portable enough?  It seems to be required by SUS v2,
so we're likely okay on the Unix side, but does Windows have this?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Yet another failure mode in pg_upgrade

2012-09-02 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Updated patch attached.

[ looks at that for a bit... ]  Now I see why you were on about that:
the method you used here requires both clusters to have the same socket
directory.  Which is silly and unnecessary.  Revised patch attached.

regards, tom lane

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
index 0fec73e..efb080b 100644
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** issue_warnings(char *sequence_script_fil
*** 184,191 
  		{
  			prep_status(Adjusting sequences);
  			exec_prog(UTILITY_LOG_FILE, NULL, true,
! 	  "\"%s/psql\" " EXEC_PSQL_ARGS " --port %d --username \"%s\" -f \"%s\"",
! 	  new_cluster.bindir, new_cluster.port, os_info.user,
  	  sequence_script_file_name);
  			unlink(sequence_script_file_name);
  			check_ok();
--- 184,191 
  		{
  			prep_status(Adjusting sequences);
  			exec_prog(UTILITY_LOG_FILE, NULL, true,
! 	  "\"%s/psql\" " EXEC_PSQL_ARGS " %s -f \"%s\"",
! 	  new_cluster.bindir, cluster_conn_opts(new_cluster),
  	  sequence_script_file_name);
  			unlink(sequence_script_file_name);
  			check_ok();
diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
index cfc4017..b905ab0 100644
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
*** generate_old_dump(void)
*** 24,31 
  	 * restores the frozenid's for databases and relations.
  	 */
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/pg_dumpall\" --port %d --username \"%s\" --schema-only --binary-upgrade %s -f %s",
! 			  new_cluster.bindir, old_cluster.port, os_info.user,
  			  log_opts.verbose ? "--verbose" : "",
  			  ALL_DUMP_FILE);
  	check_ok();
--- 24,31 
  	 * restores the frozenid's for databases and relations.
  	 */
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/pg_dumpall\" %s --schema-only --binary-upgrade %s -f %s",
! 			  new_cluster.bindir, cluster_conn_opts(old_cluster),
  			  log_opts.verbose ? "--verbose" : "",
  			  ALL_DUMP_FILE);
  	check_ok();
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
index 94bce50..80f7d34 100644
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*** adjust_data_dir(ClusterInfo *cluster)
*** 376,378 
--- 376,439 
  
  	check_ok();
  }
+ 
+ 
+ /*
+  * get_sock_dir
+  *
+  * Identify the socket directory to use for this cluster.  If we're doing
+  * a live check (old cluster only), we need to find out where the postmaster
+  * is listening.  Otherwise, we're going to put the socket into the current
+  * directory.
+  */
+ void
+ get_sock_dir(ClusterInfo *cluster, bool live_check)
+ {
+ #if HAVE_UNIX_SOCKETS
+ 	if (!live_check)
+ 	{
+ 		/* Use the current directory for the socket */
+ 		cluster->sockdir = pg_malloc(MAXPGPATH);
+ 		if (!getcwd(cluster->sockdir, MAXPGPATH))
+ 			pg_log(PG_FATAL, "cannot find current directory\n");
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 *	If we are doing a live check, we will use the old cluster's Unix
+ 		 *	domain socket directory so we can connect to the live server.
+ 		 */
+ 
+ 		/* sockdir added to the 5th line of postmaster.pid in PG 9.1 */
+ 		if (GET_MAJOR_VERSION(cluster->major_version) >= 901)
+ 		{
+ 			char		filename[MAXPGPATH];
+ 			FILE		*fp;
+ 			int			i;
+ 
+ 			snprintf(filename, sizeof(filename), "%s/postmaster.pid",
+ 					 cluster->pgdata);
+ 			if ((fp = fopen(filename, "r")) == NULL)
+ 				pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 
+ 			cluster->sockdir = pg_malloc(MAXPGPATH);
+ 			for (i = 0; i < 5; i++)
+ 				if (fgets(cluster->sockdir, MAXPGPATH, fp) == NULL)
+ 					pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 
+ 			fclose(fp);
+ 
+ 			/* Remove trailing newline */
+ 			if (strchr(cluster->sockdir, '\n') != NULL)
+ 				*strchr(cluster->sockdir, '\n') = '\0';
+ 		}
+ 		else
+ 		{
+ 			/* Can't get live sockdir, so assume the default is OK. */
+ 			cluster->sockdir = NULL;
+ 		}
+ 	}
+ #else /* !HAVE_UNIX_SOCKETS */
+ 	cluster->sockdir = NULL;
+ #endif
+ }
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
index c47c8bb..ee3a151 100644
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*** main(int argc, char **argv)
*** 88,93 
--- 88,96 
  	check_cluster_versions();
  	check_cluster_compatibility(live_check);
  
+ 	get_sock_dir(old_cluster, live_check);
+ 	get_sock_dir(new_cluster, false);
+ 
  	check_old_cluster(live_check, sequence_script_file_name);
  
  
*** prepare_new_cluster(void)
*** 211,218 
  	 */
  	prep_status(Analyzing all rows in the new cluster);
  	exec_prog(UTILITY_LOG_FILE, NULL, true,
! 			  "\"%s/vacuumdb\" --port %d --username \"%s\" --all --analyze %s",
! 			  new_cluster.bindir, new_cluster.port, os_info.user,
  			  log_opts.verbose ? "--verbose" : "");
  	check_ok();
  
--- 214,221 
  	 */
  	prep_status(Analyzing all rows in the new cluster);
  	

Re: [HACKERS] pg_upgrade test mods for Windows/Mingw

2012-09-02 Thread Gurjeet Singh
On Sun, Sep 2, 2012 at 11:29 PM, Andrew Dunstan and...@dunslane.net wrote:

 The attached patch is what I had to do to get pg_upgrade's make check to
 run on Windows under Mingw. Mostly the changes have to do with getting
 paths right between Windows and MSys, or calling generated .bat files
 instead of shell scripts.


When reading shell script code like this

`uname -a | sed 's/.* //'` = Msys

and

sed -i -e 's,/,\\,g' -e 's,\\s\\q ,/s/q ,' delete_old_cluster.bat
2>/dev/null

I find it easier to understand and maintain if the comments also describe
what is the original string format that  this pattern-matching expects,
like:

# We expect `uname -a` output like:
#  Windows_NT4.0 Msys

and

# We expect lines of the format:
#   abc/xyz/def/
# and we convert them to
#  abc\xyz\def


BTW, would `uname -o` eliminate the need of pattern matching in the first
snippet? The Wikipedia [1] article suggests so.

[1] http://en.wikipedia.org/wiki/Uname

Best regards,
-- 
Gurjeet Singh


Re: [HACKERS] pg_upgrade bugs

2012-09-02 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Maybe, to reduce future backpatching pain, we could backpatch the change
 to exec_prog() API now that you have fixed the implementation?

I'm inclined to think this is a good idea, but keep in mind we're less
than four days from wrapping 9.2.0.  There's not a lot of margin for
error here.

At the same time, getting pg_upgrade to pass regression on Windows would
be a good thing, and ought to go a long way towards ameliorating worries
about this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Noah Misch
On Sun, Sep 02, 2012 at 11:34:42PM -0400, Tom Lane wrote:
 Heikki Linnakangas hlinn...@iki.fi writes:
  On 03.09.2012 03:23, Tom Lane wrote:
  1. As you can see above, the feature is triggered by specifying the new
  connection option standalone_datadir, whose value must be the location
  of the data directory.  I also invented an option standalone_backend,
  which can be set to specify which postgres executable to launch.
 
  Are there security issues with this? If a user can specify libpq 
  connection options, he can now execute any file he wants by passing it 
  as standalone_backend. Granted, you shouldn't allow an untrusted user to 
  specify libpq connection options, because allowing to open a TCP 
  connection to an arbitrary address can be a problem by itself, but it 
  seems like this might make the situation much worse. contrib/dblink 
  springs to mind..
 
 Hmm, that's a good point.  Maybe we should only allow the executable
 name to come from an environment variable?  Seems kinda klugy though.

I don't think it's libpq's job to enforce security policy this way.  In any
event, it has no reason to presume an environment variable is a safer source.

  3. The bulk of the changes have to do with the fact that we need to keep
  track of two file descriptors not one.
 
  Would socketpair(2) be simpler?
 
 Hm, yes, but is it portable enough?  It seems to be required by SUS v2,
 so we're likely okay on the Unix side, but does Windows have this?

Windows does not have socketpair(), nor a strict pipe() equivalent.  I expect
switching to socketpair() makes the Windows side trickier in some ways and
simpler in others.  +1 for exploring that direction first.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Sun, Sep 02, 2012 at 11:34:42PM -0400, Tom Lane wrote:
 Heikki Linnakangas hlinn...@iki.fi writes:
 Are there security issues with this? If a user can specify libpq 
 connection options, he can now execute any file he wants by passing it 
 as standalone_backend.

 Hmm, that's a good point.  Maybe we should only allow the executable
 name to come from an environment variable?  Seems kinda klugy though.

 I don't think it's libpq's job to enforce security policy this way.  In any
 event, it has no reason to presume an environment variable is a safer source.

One easy thing we could do that would at least narrow the risks is to
only allow the executable's *directory* to be specified, hardwiring the
executable file name to postgres (or postgres.exe I guess).

I'm inclined to think though that environment variables *are* more
secure in this context.  In the contrib/dblink example, such a
restriction would certainly help a lot.  In any case, we should at
least think of a way that an app using libpq can prevent this type
of attack short of parsing the conn string for itself.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade test mods for Windows/Mingw

2012-09-02 Thread Tom Lane
Gurjeet Singh singh.gurj...@gmail.com writes:
 [ this needs a comment: ]
 `uname -a | sed 's/.* //'` = Msys

Also, how about doing that just once and setting a variable to test
as needed later?  Multiple copied-and-pasted instances of line noise
like that are risky.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-09-02 Thread Robert Haas
On Sun, Sep 2, 2012 at 5:39 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where tremendous number of LOCK
 TABLE are issued. I replace them with single LOCK TABLE with multiple
 tables. With 100k tables LOCK statements took 13 minutes in total, now
 it only takes 3 seconds. Comments?

 Was this applied?

 No, we fixed the server side instead.

 But only for 9.2, right?  So people running back branches are still screwed.

 Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
 behaviors involved here, not all of which are masked by what Tatsuo-san
 suggested.

 All of the other ones that I know of were associated with pg_dump
 itself, and since it is recommended to run the newer version of
 pg_dump against the older version of the server, no back patching
 would be necessary to get the benefits of those particular fixes.

 Six months or a year from now, we might have enough confidence in that
 batch of 9.2 fixes to back-port them en masse.  Don't want to do it
 today though.


 What would be the recommendation for people trying to upgrade, but who
 can't get their data out in a reasonable window?

 Putting Tatsuo-san's change into a future pg_dump might be more
 conservative than back-porting the server's Lock Table change to the
 server version they are trying to get rid of.

What he said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Windows does not have socketpair(), nor a strict pipe() equivalent.  I expect
 switching to socketpair() makes the Windows side trickier in some ways and
 simpler in others.  +1 for exploring that direction first.

A bit of googling suggests that emulating socketpair() on Windows is not
that hard: basically you create an accepting socket and connect to it.
Ugly I guess but likely to be nicer than emulating the two-pipes trick
exactly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol

2012-09-02 Thread Noah Misch
On Mon, Sep 03, 2012 at 12:11:20AM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  I don't think it's libpq's job to enforce security policy this way.  In any
  event, it has no reason to presume an environment variable is a safer 
  source.
 
 One easy thing we could do that would at least narrow the risks is to
 only allow the executable's *directory* to be specified, hardwiring the
 executable file name to postgres (or postgres.exe I guess).

If the user has any interactive access to the machine, he could make a
/tmp/X/postgres symbolic link to the program of his choosing.

 I'm inclined to think though that environment variables *are* more
 secure in this context.  In the contrib/dblink example, such a
 restriction would certainly help a lot.

dblink_connect() should only let superusers specify standalone_datadir at all;
normal users have no business manipulating other data directories visible to
the backend running dblink.  For superusers, setting both standalone_datadir
and standalone_backend is fair game.

Trusting the environment over connection strings is a wrong policy for, say, a
setuid command-line program.  Suppose such a program passes libpq a fixed
connection string to access its embedded database.  The program will find
itself explicitly clearing this environment variable to regain safety.

 In any case, we should at
 least think of a way that an app using libpq can prevent this type
 of attack short of parsing the conn string for itself.

My recommendation to affected application authors is to pass the connection
string through PQconninfoParse().  Walk the array, validating or rejecting
arguments like host and standalone_datadir.  For maximum safety, the
application would need to reject any unanticipated parameters.  We might
soften that by adding a bool "critical" field to PQconninfoOption that
documents whether the option must be understood by a program validating a
connection.  Compare the idea of the PNG chunk header's ancillary bit.
Parameters like host and standalone_datadir would be critical, while
application_name and connect_timeout would be ancillary.  But this is a
tough line to draw rigorously.


I was pondering the flexibility from letting the application fork/exec and
supply the client-side descriptor number(s) to libpq.  Suppose it wants to
redirect the backend's stderr to a file.  A single-threaded application would
temporarily redirect its own stderr while calling PQconnectdb().  In a
multithreaded application, that introduces a race condition when other threads
concurrently write to the normal stderr.  By handling redirection in its own
fork/exec code, the application can achieve the outcome safely.  Perhaps
offering this can wait until the feature constitutes a more general
embedded-database offering, though.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers