Re: [HACKERS] Enabling Checksums

2012-12-04 Thread Jeff Davis
On Mon, 2012-12-03 at 13:16 +, Simon Riggs wrote:
 On 3 December 2012 09:56, Simon Riggs si...@2ndquadrant.com wrote:
 
  I think the way forwards for this is...
 
  1. Break out the changes around inCommit flag, since that is just
  uncontroversial refactoring. I can do that. That reduces the noise
  level in the patch and makes it easier to understand the meaningful
  changes.
 
 Done.

Thank you.

One minor thing I noticed: it looks like nwaits is a useless variable.
Your original checksums patch used it to generate a warning, but now
that is gone. It's not throwing a compiler warning for some reason.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-04 Thread Andres Freund
Hi,

On 2012-12-04 12:30:43 +0530, Pavan Deolasee wrote:
 I was trying some simple queries on a Hot Standby with streaming
 replication.

 On standby, I do this:
 postgres=# begin transaction isolation level repeatable read;
 BEGIN
 postgres=# explain verbose select count(b) from test WHERE a  10;

 On master, I insert a bunch of tuples in the table and run VACUUM ANALYZE.
 postgres=# INSERT INTO test VALUES (generate_series(110001,12), 'foo',
 1);
 INSERT 0 1
 postgres=# VACUUM ANALYZE test;
 VACUUM

 After max_standby_streaming_delay, the standby starts cancelling the
 queries. I get an error like this on the standby:
 postgres=# explain verbose select count(b) from test WHERE a  10;
 FATAL:  terminating connection due to conflict with recovery
 DETAIL:  User query might have needed to see row versions that must be
 removed.
 HINT:  In a moment you should be able to reconnect to the database and
 repeat your command.
 server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 The connection to the server was lost. Attempting reset: Succeeded.

 So I've couple questions/concerns here

 1. Why throw a FATAL error here? A plain ERROR should be enough to
 abort the transaction. There are four places in ProcessInterrupts() where
 we throw these kinds of errors and three of them are FATAL.

The problem here is that we're in IDLE IN TRANSACTION in this case, which
currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
anything).

There are two problems making this non-trivial. For one, while we're in
IDLE IN TXN the client doesn't expect a response on a protocol level, so
we can't simply ereport() at that time.
For another, when we're in IDLE IN TXN we're potentially inside openssl,
so we can't jump out of there anyway because that would quite likely
corrupt the internal state of openssl.

I tried to fix this before (cf. "Idle in transaction cancellation" or
similar) but while I had some kind of fix for the first issue (I saved
the error and reported it later when the protocol state allowed it) I
missed the jumping-out-of-openssl bit. I think it's not that hard to
solve though. I remember having something preliminary but I never had
the time to finish it. If I remember correctly the trick was to set
openssl into non-blocking mode temporarily and return to the caller
inside be-secure.c:my_sock_read.
At that location ProcessInterrupts can run safely, error out silently,
and reraise the error once we're in the right protocol state.
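
A minimal sketch of that save-and-re-raise pattern, using the existing elog
facilities (CopyErrorData(), FlushErrorState(), ReThrowError()); where exactly
this would hook into postgres.c is an assumption for illustration, not code
from any posted patch:

/* Fragment only: where the conflict ERROR would otherwise fire while idle */
static ErrorData *pending_conflict_error = NULL;

PG_TRY();
{
    ProcessInterrupts();        /* may throw the recovery-conflict ERROR */
}
PG_CATCH();
{
    MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);

    /* Remember the error instead of sending it mid-protocol. */
    pending_conflict_error = CopyErrorData();
    MemoryContextSwitchTo(oldcxt);
    FlushErrorState();
}
PG_END_TRY();

/* ... later, once the protocol state allows an ErrorResponse ... */
if (pending_conflict_error != NULL)
{
    ErrorData  *edata = pending_conflict_error;

    pending_conflict_error = NULL;
    ReThrowError(edata);        /* aborts the transaction as a plain ERROR */
}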

 2836 else if (RecoveryConflictPending && RecoveryConflictRetryable)
 2837 {
 2838     pgstat_report_recovery_conflict(RecoveryConflictReason);
 2839     ereport(FATAL,
 2840             (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
 2841              errmsg("terminating connection due to conflict with recovery"),
 2842              errdetail_recovery_conflict()));
 2843 }
 2844 else if (RecoveryConflictPending)
 2845 {
 2846     /* Currently there is only one non-retryable recovery conflict */
 2847     Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
 2848     pgstat_report_recovery_conflict(RecoveryConflictReason);
 2849     ereport(FATAL,
 2850             (errcode(ERRCODE_DATABASE_DROPPED),
 2851              errmsg("terminating connection due to conflict with recovery"),
 2852              errdetail_recovery_conflict()));
 2853 }

 AFAICS the first of these should be ereport(ERROR). Otherwise irrespective
 of whether RecoveryConflictRetryable is true or false, we will always
 ereport(FATAL).

Which is fine, because we're below if (ProcDiePending). Note there's a
separate path for QueryCancelPending. We go on to kill connections once
the normal conflict handling has tried several times.

 2. For my test, the error message itself looks wrong because I did not
 actually remove any rows on the master. VACUUM probably marked a bunch of
 pages as all-visible and that should have triggered a conflict on the
 standby in order to support index-only scans. IMHO we should improve the
 error message to avoid any confusion. Or we can add a new ProcSignalReason
 to differentiate between a cancel due to cleanup vs. a visibilitymap_set()
 operation.

I think we desperately need to improve *all* of these messages with
significantly more detail (cause for cancellation, relation, current
xid, conflicting xid, current/last query).

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibilitymap_count() at the end of vacuum

2012-12-04 Thread Amit Kapila

On Tuesday, December 04, 2012 5:14 AM Pavan Deolasee wrote:
On Tue, Dec 4, 2012 at 3:16 AM, Robert Haas robertmh...@gmail.com wrote:

Also, if someone does have a 100GB relation, rereading 2MB of
visibility map pages at the end probably isn't a significant part of
the total cost.  Even if 99.9% of the relation is all-visible and we
skip reading those parts, the visibility map reads will still be only
about 2% of the total read activity, and most of the time you won't be
that lucky.

Hmm. I fully agree it's a very small percentage of the total cost. But I
still don't see why it should not be optimised, if possible. Of course, if
not recounting at the end will generate bad query plans most of the time for
most of the workloads or even a few workloads, then the minuscule cost will
pay off. But nobody has convincingly argued about that.

Even if the current way is the right way, we should probably just add a
comment there. I also noticed that we call vacuum_delay_point() after
testing every visibility map bit in lazy_scan_heap(), which looks overkill,
but visibilitymap_count() runs to the end without even a single call to
vacuum_delay_point(). Is that intended? Or should we at least check for
interrupts there?

I think calling vacuum_delay_point() after every visibility map bit test in
lazy_scan_heap() might not be necessary.
Shouldn't both places be consistent and call vacuum_delay_point() after
processing one VM page?
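
A rough sketch of that suggestion, for illustration only (this is not actual
visibilitymap.c code; process_vm_page(), all_visible_count and vm_nblocks are
made-up stand-ins), with the throttle point once per VM page:

BlockNumber mapBlock;

for (mapBlock = 0; mapBlock < vm_nblocks; mapBlock++)
{
    /* hypothetical helper: count the set bits on this VM page */
    process_vm_page(rel, mapBlock, &all_visible_count);

    /* one delay/interrupt check per VM page, consistent with lazy_scan_heap() */
    vacuum_delay_point();
}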

With Regards,
Amit Kapila.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-04 Thread Markus Wanner
On 12/03/2012 10:35 PM, Alvaro Herrera wrote:
 So here's version 8.  This fixes a couple of bugs and most notably
 creates a separate PGPROC list for bgworkers, so that they don't
 interfere with client-connected backends.

Nice, thanks. I'll try to review this again, shortly.

Regards

Markus Wanner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-12-04 Thread Jeff Davis
On Mon, 2012-12-03 at 09:56 +, Simon Riggs wrote:
 1. Break out the changes around inCommit flag, since that is just
 uncontroversial refactoring. I can do that. That reduces the noise
 level in the patch and makes it easier to understand the meaningful
 changes.

Done by you.

 2. Produce an SGML docs page that describes how this works, what the
 limitations and tradeoffs are. Reliability & the WAL could use an
 extra section2 header called "Checksums" (wal.sgml). This is essential
 for users AND reviewers to ensure everybody has understood this (heck,
 I can't remember everything about this either...)

Agreed. It looks like it would fit best under the Reliability section,
because it's not directly related to WAL. I'll write something up.

 3. I think we need an explicit test of this feature (as you describe
 above), rather than manual testing. corruptiontester?

I agree, but I'm not 100% sure how to proceed. I'll look at Kevin's
tests for SSI and see if I can do something similar, but suggestions are
welcome. A few days away, at the earliest.

 4. We need some general performance testing to show whether this is
 insane or not.

My understanding is that Greg Smith is already working on tests here, so
I will wait for his results.

 But this looks in good shape for commit otherwise.

Great!

For now, I rebased the patches against master, and did some very minor
cleanup.

Regards,
Jeff Davis



checksums-20121203.patch.gz
Description: GNU Zip compressed data


replace-tli-with-checksums-20121203.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] support for LDAP URLs

2012-12-04 Thread Boszormenyi Zoltan

On 2012-11-13 04:38, Peter Eisentraut wrote:

Here is a patch to support RFC 2255 LDAP URLs in pg_hba.conf.  So,
instead of, say

host ... ldap ldapserver=ldap.example.net ldapbasedn="dc=example, dc=net"
ldapsearchattribute=uid

you could write

host ... ldap ldapurl="ldap://ldap.example.net/dc=example,dc=net?uid?sub"

Apache and probably other software uses the same format, and it's easier
to have a common format for all such configuration instead of having to
translate the information provided by the LDAP admin into each
software's particular configuration spellings.

I'm using the OpenLDAP-provided URL parsing routine, which means this
wouldn't be supported on Windows.  But we already support different
authentication settings on different platforms, so this didn't seem such
a big problem.
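
For illustration, a standalone sketch of how the OpenLDAP URL parsing routine
mentioned above is typically used (this is just the library API, not the
pg_hba.conf parsing code itself); it should build with something like
cc ldapurl.c -lldap:

#include <stdio.h>
#include <ldap.h>

int
main(void)
{
    const char *url = "ldap://ldap.example.net/dc=example,dc=net?uid?sub";
    LDAPURLDesc *ludp;
    int         rc;

    rc = ldap_url_parse(url, &ludp);
    if (rc != LDAP_SUCCESS)
    {
        fprintf(stderr, "could not parse LDAP URL: %s\n", ldap_err2string(rc));
        return 1;
    }

    printf("host: %s\n", ludp->lud_host);   /* ldap.example.net */
    printf("base dn: %s\n", ludp->lud_dn);  /* dc=example,dc=net */
    printf("attribute: %s\n", ludp->lud_attrs ? ludp->lud_attrs[0] : "(none)");
    printf("scope: %d\n", ludp->lud_scope); /* LDAP_SCOPE_SUBTREE */

    ldap_free_urldesc(ludp);
    return 0;
}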


This patch was committed today but it fails to compile for non-ldap configs:

$ ./configure --prefix=$HOME/pg93dev --enable-debug --enable-cassert 
--enable-depend

make[3]: Entering directory 
`/home/zozo/crosscolumn/psql-c-relax/postgresql.1/src/backend/libpq'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -g -I../../../src/include -D_GNU_SOURCE   -c -o hba.o hba.c 
-MMD -MP -MF .deps/hba.Po

hba.c: In function ‘parse_hba_auth_opt’:
hba.c:1388:23: error: ‘LDAP_SCOPE_SUBTREE’ undeclared (first use in this 
function)
hba.c:1388:23: note: each undeclared identifier is reported only once for each function it 
appears in

hba.c:1451:3: error: unknown type name ‘LDAPURLDesc’
hba.c:1452:7: warning: unused variable ‘rc’ [-Wunused-variable]
hba.c:1451:16: warning: unused variable ‘urldata’ [-Wunused-variable]
make[3]: *** [hba.o] Error 1

The code could use some #ifdef USE_LDAP conditionals.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-04 Thread Simon Riggs
On 4 December 2012 01:34, Jeff Davis pg...@j-davis.com wrote:

 I assume that refers to the discussion here:

 http://archives.postgresql.org/pgsql-hackers/2012-02/msg01177.php

 But that was quite a while ago, so is there a more recent discussion
 that prompted this commit now?

Yes, this was discussed within the last month, thread "TRUNCATE SERIALIZABLE...".

The patch for that was already posted, so I committed it.

 I am a little confused about the case where setting HEAP_XMIN_COMMITTED
 when loading the table in the same transaction is wrong. There was some
 discussion about subtransactions, but those problems only seemed to
 appear when the CREATE and the INSERT/COPY happened in different
 subtransactions.

 So, I guess my question is, why the partial revert?

Well, first, because I realised that it wasn't wanted. I was
re-reading the threads trying to figure out a way to help the checksum
patch further.

Second, because I realised why it wasn't wanted. Setting
HEAP_XMIN_COMMITTED causes MVCC violations within the transaction
issuing the COPY. Accepting the MVCC violation requires explicit
consent from the user. If we have that, we may as well set everything we
can. So there's no middle ground: nothing, or all frozen. We could
change that, but it would require some complex tweaks to the tqual
routines, which I tried and was not happy with, since they would need
to get more complex. It's possible a route through that minefield
exists. I'm inclined to believe other approaches are more likely to
yield benefit.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] support for LDAP URLs

2012-12-04 Thread Andres Freund
On 2012-12-04 10:18:36 +0100, Boszormenyi Zoltan wrote:
 On 2012-11-13 04:38, Peter Eisentraut wrote:
 Here is a patch to support RFC 2255 LDAP URLs in pg_hba.conf.  So,
 instead of, say
 
 host ... ldap ldapserver=ldap.example.net ldapbasedn="dc=example, dc=net"
 ldapsearchattribute=uid
 
 you could write
 
 host ... ldap ldapurl="ldap://ldap.example.net/dc=example,dc=net?uid?sub"
 
 Apache and probably other software uses the same format, and it's easier
 to have a common format for all such configuration instead of having to
 translate the information provided by the LDAP admin into each
 software's particular configuration spellings.
 
 I'm using the OpenLDAP-provided URL parsing routine, which means this
 wouldn't be supported on Windows.  But we already support different
 authentication settings on different platforms, so this didn't seem such
 a big problem.

 This patch was committed today but it fails to compile for non-ldap configs:

 $ ./configure --prefix=$HOME/pg93dev --enable-debug --enable-cassert 
 --enable-depend

 make[3]: Entering directory
 `/home/zozo/crosscolumn/psql-c-relax/postgresql.1/src/backend/libpq'
 gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
 -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
 -g -I../../../src/include -D_GNU_SOURCE   -c -o hba.o hba.c -MMD -MP -MF
 .deps/hba.Po
 hba.c: In function ‘parse_hba_auth_opt’:
 hba.c:1388:23: error: ‘LDAP_SCOPE_SUBTREE’ undeclared (first use in this 
 function)
 hba.c:1388:23: note: each undeclared identifier is reported only once for
 each function it appears in
 hba.c:1451:3: error: unknown type name ‘LDAPURLDesc’
 hba.c:1452:7: warning: unused variable ‘rc’ [-Wunused-variable]
 hba.c:1451:16: warning: unused variable ‘urldata’ [-Wunused-variable]
 make[3]: *** [hba.o] Error 1

 The code could use some #ifdef USE_LDAP conditionals.

As I needed to base some stuff on a later commit (5ce108bf3) and I
didn't want to include a revert in the tree, here's what I applied
locally to fix this. Maybe somebody can apply something like that to get
the buildfarm green again?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 1222c05b6335297e19fd843d3971dab3f5f5a4a7 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 4 Dec 2012 11:49:38 +0100
Subject: [PATCH] fix build without ldap support after a2fec0a18e4d

---
 src/backend/libpq/hba.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 2bb661c..36ece55 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -1385,7 +1385,9 @@ parse_hba_line(List *line, int line_num)
 static bool
 parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline, int line_num)
 {
+#ifdef USE_LDAP
 	hbaline->ldapscope = LDAP_SCOPE_SUBTREE;
+#endif
 
 	if (strcmp(name, "map") == 0)
 	{
@@ -1448,12 +1450,12 @@ parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline, int line_num)
 	}
 	else if (strcmp(name, "ldapurl") == 0)
 	{
+#ifdef LDAP_API_FEATURE_X_OPENLDAP
 		LDAPURLDesc *urldata;
 		int rc;
 
 		REQUIRE_AUTH_OPTION(uaLDAP, "ldapurl", "ldap");
 
-#ifdef LDAP_API_FEATURE_X_OPENLDAP
 		rc = ldap_url_parse(val, &urldata);
 		if (rc != LDAP_SUCCESS)
 		{
-- 
1.7.10.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-04 Thread Pavan Deolasee
On Tue, Dec 4, 2012 at 1:44 PM, Andres Freund and...@2ndquadrant.comwrote:


 
  After max_standby_streaming_delay, the standby starts cancelling the
  queries. I get an error like this on the standby:
  postgres=# explain verbose select count(b) from test WHERE a  10;
  FATAL:  terminating connection due to conflict with recovery
  DETAIL:  User query might have needed to see row versions that must be
  removed.
  HINT:  In a moment you should be able to reconnect to the database and
  repeat your command.
  server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
  The connection to the server was lost. Attempting reset: Succeeded.
 
  So I've couple questions/concerns here
 
  1. Why throw a FATAL error here? A plain ERROR should be enough to
  abort the transaction. There are four places in ProcessInterrupts() where
  we throw these kinds of errors and three of them are FATAL.

 The problem here is that we're in IDLE IN TRANSACTION in this case, which
 currently cannot be cancelled (i.e. pg_cancel_backend() just won't do
 anything).

 There are two problems making this non-trivial. For one, while we're in
 IDLE IN TXN the client doesn't expect a response on a protocol level, so
 we can't simply ereport() at that time.
 For another, when we're in IDLE IN TXN we're potentially inside openssl,
 so we can't jump out of there anyway because that would quite likely
 corrupt the internal state of openssl.

 I tried to fix this before (cf. "Idle in transaction cancellation" or
 similar) but while I had some kind of fix for the first issue (I saved
 the error and reported it later when the protocol state allowed it) I
 missed the jumping-out-of-openssl bit. I think it's not that hard to
 solve though. I remember having something preliminary but I never had
 the time to finish it. If I remember correctly the trick was to set
 openssl into non-blocking mode temporarily and return to the caller
 inside be-secure.c:my_sock_read.


Thanks Andres. I also read the original thread and I now understand why we
are using FATAL here, at least until we have a better solution. Obviously
the connection reset is no good either because, as someone commented in the
original discussion, it made me think I was seeing a server crash while it
was not.




 
  AFAICS the first of these should be ereport(ERROR). Otherwise
 irrespective
  of whether RecoveryConflictRetryable is true or false, we will always
  ereport(FATAL).

 Which is fine, because we're below if (ProcDiePending). Note there's a
 separate path for QueryCancelPending. We go on to kill connections once
 the normal conflict handling has tried several times.


OK, understood. I now see that every path below if (ProcDiePending) will
call FATAL, albeit with different error codes. That explains the current
code.




 I think we desperately need to improve *all* of these messages with
 significantly more detail (cause for cancellation, relation, current
 xid, conflicting xid, current/last query).


I agree.

Thanks,
Pavan


-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] PQconninfo function for libpq

2012-12-04 Thread Magnus Hagander
On Mon, Dec 3, 2012 at 3:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Nov 30, 2012 at 1:02 AM, Magnus Hagander mag...@hagander.net wrote:
 In the interest of moving things along, I'll remove these for now at
 least, and commit the rest of the patch, so we can keep working on the
 basebackup part of things.

 I see you committed this - does this possibly provide infrastructure
 that could be used to lift the restriction imposed by the following
 commit?

 commit fe21fcaf8d91a71c15ff25276f9fa81e0cd1dba9
 Author: Bruce Momjian br...@momjian.us
 Date:   Wed Aug 15 19:04:52 2012 -0400

  In psql, if there is no connection object, e.g. due to a server crash,
 require all parameters for \c, rather than using the defaults, which
 might be wrong.

 Because personally, I find the new behavior no end of annoying.

It certainly sounds like it could do that. Though it might be useful
to actually add a connection function to libpq that takes such an
array of options directly - AFAICS, right now you'd have to
deconstruct the return value into a string, and then pass it into a
connection function that immediately parses it right back out as
conninfo options.
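
For reference, a hand-written sketch of that dance with today's libpq API:
flatten the PQconninfoOption array returned by PQconninfo() into keyword/value
arrays for PQconnectdbParams(). The function name reconnect_from_conninfo is
made up for the example, not a proposed libpq addition:

#include <stdlib.h>
#include <libpq-fe.h>

PGconn *
reconnect_from_conninfo(PGconn *old)
{
    PQconninfoOption *opts = PQconninfo(old);
    PQconninfoOption *o;
    const char **keys;
    const char **vals;
    PGconn     *conn;
    int         n = 0;

    if (opts == NULL)
        return NULL;

    for (o = opts; o->keyword != NULL; o++)
        if (o->val != NULL)
            n++;

    keys = malloc((n + 1) * sizeof(char *));
    vals = malloc((n + 1) * sizeof(char *));

    n = 0;
    for (o = opts; o->keyword != NULL; o++)
    {
        if (o->val == NULL)
            continue;
        keys[n] = o->keyword;
        vals[n] = o->val;
        n++;
    }
    keys[n] = NULL;
    vals[n] = NULL;

    /* PQconnectdbParams copies the strings, so freeing opts afterwards is fine */
    conn = PQconnectdbParams((const char *const *) keys,
                             (const char *const *) vals, 0);

    free(keys);
    free(vals);
    PQconninfoFree(opts);
    return conn;
}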

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: create extension default_full_version

2012-12-04 Thread Ibrar Ahmed
On Mon, Dec 3, 2012 at 11:05 PM, Dimitri Fontaine dimi...@2ndquadrant.frwrote:

 Hi,

 Thanks for your very good review!

 Ibrar Ahmed ibrar.ah...@gmail.com writes:
  I looked at the discussion for this patch and the patch itself. Here
  are my comments and observations about the patch.
  What I got from the discussion is that the patch tries to implement a
  mechanism to install an extension from a series of SQL scripts starting at a
  base/full version, e.g. if a user wants to create an extension at 1.1, the
  system should run the v1.0 script followed by the 1.0--1.1 script. In that
  case we need to know about the base or full version, which in the above
  case is v1.0. So the patch added a default_full_version option in the
  extension control file.

 Exactly, that was an idea from Robert and I implemented it quite
 quickly. Too quickly as we can see from your testing report.

  Here are my comments about the patch
 
  * Note: Patch does not apply cleanly on latest code base. You probably
  need to re-base the code

 Done. The thing is that meanwhile another solution to the main problem
 has been found: drop support for installing hstore 1.0. Attached patch
 fixes the problem by reinstalling hstore--1.0.sql and re-enabling this
 version, and removing the hstore--1.1.sql file now that it's enough to
 just have hstore--1.0--1.1.sql to install directly (and by default) the
 newer version.

 I think we will have to decide about taking only the mechanism or both
 the mechanism and the actual change for the hstore contrib.


  * This is a user visible change so documentation change is required here.

 Added coverage of the new parameter.

  * Also, You need to update the comment, because this code is now
  handling default_full_version as well.
 
    /*
     * Determine the (unpackaged) version to update from, if any, and then
     * figure out what sequence of update scripts we need to apply.
     */
    if ((d_old_version && d_old_version->arg) ||
        pcontrol->default_full_version)

 Done. I also fixed the bugs you reported here. Here's an edited version
 of the new (fixed) output:


 dim=# set client_min_messages to debug1;

 dim=# create extension hstore version '1.0';
 DEBUG:  execute_extension_script:
 '/Users/dim/pgsql/ddl/share/extension/hstore--1.0.sql'
 WARNING:  => is deprecated as an operator name
 CREATE EXTENSION

 dim=# create extension hstore version '1.1';
 DEBUG:  execute_extension_script:
 '/Users/dim/pgsql/ddl/share/extension/hstore--1.0.sql'
 WARNING:  => is deprecated as an operator name
 DEBUG:  execute_extension_script:
 '/Users/dim/pgsql/ddl/share/extension/hstore--1.0--1.1.sql'
 CREATE EXTENSION

 dim=# create extension hstore;
 DEBUG:  execute_extension_script:
 '/Users/dim/pgsql/ddl/share/extension/hstore--1.0.sql'
 WARNING:  => is deprecated as an operator name
 DETAIL:  This name may be disallowed altogether in future versions of
 PostgreSQL.
 DEBUG:  execute_extension_script:
 '/Users/dim/pgsql/ddl/share/extension/hstore--1.0--1.1.sql'
 CREATE EXTENSION

  postgres=# CREATE EXTENSION hstore version '1.3' from '1.0';
  WARNING:  /usr/local/pgsql/share/extension/hstore--1.0--1.1.sql
  WARNING:  /usr/local/pgsql/share/extension/hstore--1.1--1.2.sql
  WARNING:  /usr/local/pgsql/share/extension/hstore--1.2--1.3.sql
  CREATE EXTENSION

 I liked your idea of extending the reporting about what files are used,
 but of course we can't keep that at the WARNING level, so I made that
 logging DEBUG1 in the attached patch.

  postgres=# CREATE EXTENSION hstore version '1.3' from '1.0';

 Please try that case again, I believe it's fixed in the attached.


I am still getting the same error message.

Without default_full_version


postgres=# CREATE EXTENSION hstore version '1.3' from '1.1';
DEBUG:  execute_extension_script:
'/usr/local/pgsql/share/extension/hstore--1.1--1.2.sql'
DEBUG:  execute_extension_script:
'/usr/local/pgsql/share/extension/hstore--1.2--1.3.sql'
CREATE EXTENSION



With default_full_version = '1.1'

postgres=# CREATE EXTENSION hstore version '1.3' from '1.1';
DEBUG:  execute_extension_script:
'/usr/local/pgsql/share/extension/hstore--1.1.sql'
DEBUG:  execute_extension_script:
'/usr/local/pgsql/share/extension/hstore--1.1--1.3.sql'
*ERROR:  could not stat file
/usr/local/pgsql/share/extension/hstore--1.1--1.3.sql: No such file or
directory*


I think there is an issue in this area of the code

if (pcontrol->default_full_version)
{
    execute_extension_script(extensionOid, control,       <-- 1.1.sql
                             NULL, oldVersionName,
                             requiredSchemas,
                             schemaName, schemaOid);

    ApplyExtensionUpdates(extensionOid, pcontrol,          <-- 1.1--1.3.sql (wrong)
                          oldVersionName, updateVersions);


The first statement is executing 1.1.sql because oldVersionName = 1.1.
Keep in mind that versionName = 1.2 and the updateVersions list contains only
version name 1.3. So in case of default_full_version you are ignoring
*versionName*, which is *1.2*, that 

Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-04 Thread Kevin Grittner
Jan Wieck wrote:

 Thinking about it, I'm not really happy with removing the 
 autovacuum_truncate_lock_check GUC at all.
 
 Fact is that the deadlock detection code and the configuration
 parameter for it should IMHO have nothing to do with all this in
 the first place. A properly implemented application does not
 deadlock.

I don't agree. I believe that in some cases it is possible and
practicable to set access rules which would prevent deadlocks in
application access to a database. In other cases the convolutions
required in the code, the effort in educating dozens or hundreds of
programmers maintaining the code (and keeping the training current
during staff turnover), and the staff time required for compliance
far outweigh the benefit of an occasional transaction retry.
However, it is enough for your argument that there are cases where
it can be done.

 Someone running such a properly implemented application should be
 able to safely set deadlock_timeout to minutes without the
 slightest ill side effect, but with the benefit that the deadlock
 detection code itself does not add to the lock contention. The
 only reason one cannot do so today is because autovacuum's
 truncate phase could then freeze the application with an
 exclusive lock for that long.
 
 I believe the check interval needs to be decoupled from the 
 deadlock_timeout again.

OK

 This will leave us with 2 GUCs at least.

Hmm. What problems do you see with hard-coding reasonable values?
Adding two or three GUC settings for a patch with so little
user-visible impact seems weird. And it seems to me (and also
seemed to Robert) as though the specific values of the other two
settings really aren't that critical as long as they are anywhere
within a reasonable range. Configuring PostgreSQL can be
intimidating enough without adding knobs that really don't do
anything useful. Can you show a case where special values would be
helpful?

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PageIsAllVisible()'s trustworthiness in Hot Standby

2012-12-04 Thread Pavan Deolasee
I was looking at the following code in heapam.c:

 261 /*
 262  * If the all-visible flag indicates that all tuples on the page are
 263  * visible to everyone, we can skip the per-tuple visibility tests. But
 264  * not in hot standby mode. A tuple that's already visible to all
 265  * transactions in the master might still be invisible to a read-only
 266  * transaction in the standby.
 267  */
 268 all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;

Isn't the check for !snapshot->takenDuringRecovery redundant now in master,
or ever since we added crash-safety for the VM? In fact, this comment made
me think if we are really handling index-only scans correctly or not on the
Hot Standby. But apparently we are by forcing conflicting transactions to
abort before redoing VM bit set operation on the standby. The same
mechanism should protect us against the above case. Now I concede that the
entire magic around setting and clearing the page level all-visible bit and
the VM bit and our ability to keep them in sync is something I don't fully
understand, but I see that every operation that sets the page
level PD_ALL_VISIBLE flag also sets the VM bit while holding the buffer
lock and emits a WAL record. So AFAICS the conflict resolution logic will
take care of the above too.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] PageIsAllVisible()'s trustworthiness in Hot Standby

2012-12-04 Thread Andres Freund
On 2012-12-04 18:38:11 +0530, Pavan Deolasee wrote:
 I was looking at the following code in heapam.c:

  261 /*
  262  * If the all-visible flag indicates that all tuples on the page are
  263  * visible to everyone, we can skip the per-tuple visibility tests. But
  264  * not in hot standby mode. A tuple that's already visible to all
  265  * transactions in the master might still be invisible to a read-only
  266  * transaction in the standby.
  267  */
  268 all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;

 Isn't the check for !snapshot->takenDuringRecovery redundant now in master,
 or ever since we added crash-safety for the VM? In fact, this comment made
 me think if we are really handling index-only scans correctly or not on the
 Hot Standby. But apparently we are by forcing conflicting transactions to
 abort before redoing VM bit set operation on the standby. The same
 mechanism should protect us against the above case. Now I concede that the
 entire magic around setting and clearing the page level all-visible bit and
 the VM bit and our ability to keep them in sync is something I don't fully
 understand, but I see that every operation that sets the page
 level PD_ALL_VISIBLE flag also sets the VM bit while holding the buffer
 lock and emits a WAL record. So AFAICS the conflict resolution logic will
 take care of the above too.

Yes, that really seems strange. As you say, we're already relying on the
VM to be correct. I think it was simply missed that we can rely on this
since the conflict handling on all-visible was added in
3424bff90f40532527b9cf4f2ad9eaff750682f7.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby conflict resolution handling

2012-12-04 Thread Pavan Deolasee
On Tue, Dec 4, 2012 at 6:01 PM, Pavan Deolasee pavan.deola...@gmail.comwrote:



 Thanks Andres. I also read the original thread and I now understand why we
 are using FATAL here, at least until we have a better solution. Obviously
 the connection reset is no good either because, as someone commented in the
 original discussion, it made me think I was seeing a server crash while it
 was not.


How about the attached comment, to be added at the appropriate place? I've
extracted this mostly from Tom's explanation in the original thread.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


clarify-fatal-error-in-conflict-resolve.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in buildfarm client

2012-12-04 Thread Andrew Dunstan


On 12/04/2012 02:40 AM, Christian Ullrich wrote:

Hello all,

the extension modules (TestUpgrade etc.) in the buildfarm client do 
not use the make command override defined in the config file, 
instead hardcoding command lines using make. This fails where make 
is not GNU make.


Patch attached, which fixes the problem on jaguarundi.


Thanks, I will fix it, but this is not the right mailing list for 
buildfarm client bugs. You should be on the pgbuildfarm-members list. 
See http://pgfoundry.org/mailman/listinfo/pgbuildfarm-members


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in buildfarm client

2012-12-04 Thread Christian Ullrich

* Andrew Dunstan wrote:


On 12/04/2012 02:40 AM, Christian Ullrich wrote:



the extension modules (TestUpgrade etc.) in the buildfarm client do
not use the make command override defined in the config file,



Patch attached, which fixes the problem on jaguarundi.


Thanks, I will fix it, but this is not the right mailing list for
buildfarm client bugs. You should be on the pgbuildfarm-members list.


I'm sorry. I checked the archives, saw next to nothing in the last few 
months, and thought that maybe I should use a different list instead.


--
Christian


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PageIsAllVisible()'s trustworthiness in Hot Standby

2012-12-04 Thread Robert Haas
On Tue, Dec 4, 2012 at 8:08 AM, Pavan Deolasee pavan.deola...@gmail.com wrote:

 I was looking at the following code in heapam.c:

  261 /*
  262  * If the all-visible flag indicates that all tuples on the page are
  263  * visible to everyone, we can skip the per-tuple visibility tests. But
  264  * not in hot standby mode. A tuple that's already visible to all
  265  * transactions in the master might still be invisible to a read-only
  266  * transaction in the standby.
  267  */
  268 all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;

 Isn't the check for !snapshot->takenDuringRecovery redundant now in master,
 or ever since we added crash-safety for the VM? In fact, this comment made
 me think if we are really handling index-only scans correctly or not on the
 Hot Standby. But apparently we are by forcing conflicting transactions to
 abort before redoing VM bit set operation on the standby. The same mechanism
 should protect us against the above case. Now I concede that the entire
 magic around setting and clearing the page level all-visible bit and the VM
 bit and our ability to keep them in sync is something I don't fully
 understand, but I see that every operation that sets the page level
 PD_ALL_VISIBLE flag also sets the VM bit while holding the buffer lock and
 emits a WAL record. So AFAICS the conflict resolution logic will take care
 of the above too.

I wasn't sure whether that could be safely changed.  There's a subtle
distinction here: the PD_ALL_VISIBLE bit isn't the same as the
visibility map bit.  And, technically, the WAL record only fully
protects the setting of *the visibility map bit* not the
PD_ALL_VISIBLE page-level bit.  The purpose of that WAL logging is to
make sure that the page-level bit is never clear while the
visibility-map bit is set; it does not guarantee that the page-level
bit can never be set without issuing a WAL record.  So, for example,
it's perfectly possible that a crash on the master might leave the
page-level bit set while the VM bit is clear.  Now, if that page
somehow makes its way to the standby - via a base backup or a
full-page image - before the tuples it contains are all-visible
according to the standby's xmin horizon, we've got a problem.  Can
that happen?  It seems unlikely, but can we prove it's not possible?
Perhaps, but I wasn't sure.

Index-only scans are safe, because they're looking at the visibility
map itself, not the page-level bit, but the analysis is a little
murkier for sequential scans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in buildfarm client

2012-12-04 Thread Andrew Dunstan


On 12/04/2012 08:35 AM, Christian Ullrich wrote:

* Andrew Dunstan wrote:


On 12/04/2012 02:40 AM, Christian Ullrich wrote:



the extension modules (TestUpgrade etc.) in the buildfarm client do
not use the make command override defined in the config file,



Patch attached, which fixes the problem on jaguarundi.


Thanks, I will fix it, but this is not the right mailing list for
buildfarm client bugs. You should be on the pgbuildfarm-members list.


I'm sorry. I checked the archives, saw next to nothing in the last few 
months, and thought that maybe I should use a different list instead.


It's a very low-traffic list :-)

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PageIsAllVisible()'s trustworthiness in Hot Standby

2012-12-04 Thread Andres Freund
On 2012-12-04 08:38:48 -0500, Robert Haas wrote:
 On Tue, Dec 4, 2012 at 8:08 AM, Pavan Deolasee pavan.deola...@gmail.com 
 wrote:
 
  I was looking at the following code in heapam.c:
 
   261 /*
   262  * If the all-visible flag indicates that all tuples on the page are
   263  * visible to everyone, we can skip the per-tuple visibility tests. But
   264  * not in hot standby mode. A tuple that's already visible to all
   265  * transactions in the master might still be invisible to a read-only
   266  * transaction in the standby.
   267  */
   268 all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;
 
  Isn't the check for !snapshot->takenDuringRecovery redundant now in master,
  or ever since we added crash-safety for the VM? In fact, this comment made
  me think if we are really handling index-only scans correctly or not on the
  Hot Standby. But apparently we are by forcing conflicting transactions to
  abort before redoing VM bit set operation on the standby. The same mechanism
  should protect us against the above case. Now I concede that the entire
  magic around setting and clearing the page level all-visible bit and the VM
  bit and our ability to keep them in sync is something I don't fully
  understand, but I see that every operation that sets the page level
  PD_ALL_VISIBLE flag also sets the VM bit while holding the buffer lock and
  emits a WAL record. So AFAICS the conflict resolution logic will take care
  of the above too.

 I wasn't sure whether that could be safely changed.  There's a subtle
 distinction here: the PD_ALL_VISIBLE bit isn't the same as the
 visibility map bit.  And, technically, the WAL record only fully
 protects the setting of *the visibility map bit* not the
 PD_ALL_VISIBLE page-level bit.  The purpose of that WAL logging is to
 make sure that the page-level bit is never clear while the
 visibility-map bit is set; it does not guarantee that the page-level
 bit can never be set without issuing a WAL record.  So, for example,
 it's perfectly possible for a crash on the master might leave the
 page-level bit set while the VM bit is clear.  Now, if that page
 somehow makes its way to the standby - via a base backup or a
 full-page image - before the tuples it contains are all-visible
 according to the standby's xmin horizon, we've got a problem.  Can
 that happen?  It seems unlikely, but can we prove it's not possible?
 Perhaps, but I wasn't sure.

You're right, it currently seems to be possible, there's no LSN interlock
prohibiting this as far as I can see.

in lazy_scan_heap we do:

if (!PageIsAllVisible(page))
{
    PageSetAllVisible(page);
    MarkBufferDirty(buf);
    visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer,
                      visibility_cutoff_xid);
}

So buf can be written out independently from the xlog record that
visibilitymap_set emits. ISTM we should set the LSN of the heap page as
well as the one of the visibilitymap page. Not sure about the
performance impact of that.
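
A sketch of that idea, under the assumption that visibilitymap_set() were
changed to return the LSN of the WAL record it emits (it does not do so
today); the heap page would then obey the usual WAL-before-data rule for this
change too:

if (!PageIsAllVisible(page))
{
    XLogRecPtr  recptr;

    PageSetAllVisible(page);
    MarkBufferDirty(buf);
    recptr = visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer,
                               visibility_cutoff_xid);
    /* keep buf from being flushed before the WAL record that covers it */
    PageSetLSN(page, recptr);
}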

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PageIsAllVisible()'s trustworthiness in Hot Standby

2012-12-04 Thread Robert Haas
On Tue, Dec 4, 2012 at 9:00 AM, Andres Freund and...@2ndquadrant.com wrote:
 Youre right, it currently seems to be possible, there's no LSN interlock
 prohibiting this as far as I can see.

Yeah, there certainly isn't that.  Now you could perhaps make an
argument that no operation that can propagate a set bit from master to
standby can arrive until after the standby's xmin horizon is
sufficiently current, but that does feel a little fragile to me...
even if it's true now, new WAL record types might break it, for
example.

 in lazy_scan_heap we do:

 if (!PageIsAllVisible(page))
 {
     PageSetAllVisible(page);
     MarkBufferDirty(buf);
     visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer,
                       visibility_cutoff_xid);
 }

 So buf can be written out independently from the xlog record that
 visibilitymap_set emits. ISTM we should set the LSN of the heap page as
 well as the one of the visibilitymap page. Not sure about the
 performance impact of that.

It would result in a massive increase in WAL traffic when vacuuming an
insert-only table; that's why we didn't do it originally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tablespaces in the data directory

2012-12-04 Thread Robert Haas
On Mon, Dec 3, 2012 at 10:06 PM, Bruce Momjian br...@momjian.us wrote:
 FYI, someone put their new cluster inside an existing old tablespace,
 and when they ran the script to delete their old install, their new
 install was deleted too.  My answer was, "Don't do that."

Uh, wow.  I feel bad for that person, but it does seem like a bit of a
self-inflicted injury.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tablespaces in the data directory

2012-12-04 Thread Bruce Momjian
On Tue, Dec  4, 2012 at 09:37:46AM -0500, Robert Haas wrote:
 On Mon, Dec 3, 2012 at 10:06 PM, Bruce Momjian br...@momjian.us wrote:
  FYI, someone put their new cluster inside an existing old tablespace,
  and when they ran the script to delete their old install, their new
  install was deleted too.  My answer was, "Don't do that."
 
 Uh, wow.  I feel bad for that person, but it does seem like a bit of a
 self-inflicted injury.

They wanted pg_upgrade to guard against it, and I said that was
possible, but it would require pg_upgrade to know which files to remove,
and that would make pg_upgrade more fragile.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: create extension default_full_version

2012-12-04 Thread Dimitri Fontaine
Ibrar Ahmed ibrar.ah...@gmail.com writes:
 I am still getting the same error message.

With the attached patch (v2), it works well:

create extension hstore version '1.2' from 'unpackaged';
DEBUG:  execute_extension_script: 
'/Users/dim/pgsql/ddl/share/extension/hstore--unpackaged--1.0.sql'
DEBUG:  execute_extension_script: 
'/Users/dim/pgsql/ddl/share/extension/hstore--1.0--1.1.sql'
DEBUG:  execute_extension_script: 
'/Users/dim/pgsql/ddl/share/extension/hstore--1.1--1.2.sql'
CREATE EXTENSION

You have to remember that the spelling FROM 'unpackaged version' really
means that we have previously installed a loose version of the
extension (just \i hstore.sql) and want to apply the upgrade path from
there.

We can't have FROM meaning the same thing as default_full_version.

 With default_full_version = '1.1'
 
 postgres=# CREATE EXTENSION hstore version '1.3' from '1.1';
 DEBUG:  execute_extension_script:
 '/usr/local/pgsql/share/extension/hstore--1.1.sql'
 DEBUG:  execute_extension_script:
 '/usr/local/pgsql/share/extension/hstore--1.1--1.3.sql'
 *ERROR:  could not stat file
 /usr/local/pgsql/share/extension/hstore--1.1--1.3.sql: No such file or
 directory*

That's nonetheless a bug and is fixed now:

-   if (pcontrol->default_full_version)
+   if (pcontrol->default_full_version && !unpackaged)


See attached.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/contrib/hstore/Makefile
--- b/contrib/hstore/Makefile
***************
*** 5,11 ****
  OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
  	crc32.o
  
  EXTENSION = hstore
! DATA = hstore--1.1.sql hstore--1.0--1.1.sql hstore--unpackaged--1.0.sql
  
  REGRESS = hstore
  
--- 5,11 ----
  	crc32.o
  
  EXTENSION = hstore
! DATA = hstore--1.0.sql hstore--1.0--1.1.sql hstore--unpackaged--1.0.sql
  
  REGRESS = hstore
  
*** /dev/null
--- b/contrib/hstore/hstore--1.0.sql
***************
*** 0 ****
--- 1,530 ----
+ /* contrib/hstore/hstore--1.0.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "CREATE EXTENSION hstore" to load this file. \quit
+ 
+ CREATE TYPE hstore;
+ 
+ CREATE FUNCTION hstore_in(cstring)
+ RETURNS hstore
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION hstore_out(hstore)
+ RETURNS cstring
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION hstore_recv(internal)
+ RETURNS hstore
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION hstore_send(hstore)
+ RETURNS bytea
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE TYPE hstore (
+ INTERNALLENGTH = -1,
+ INPUT = hstore_in,
+ OUTPUT = hstore_out,
+ RECEIVE = hstore_recv,
+ SEND = hstore_send,
+ STORAGE = extended
+ );
+ 
+ CREATE FUNCTION hstore_version_diag(hstore)
+ RETURNS integer
+ AS 'MODULE_PATHNAME','hstore_version_diag'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION fetchval(hstore,text)
+ RETURNS text
+ AS 'MODULE_PATHNAME','hstore_fetchval'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE OPERATOR -> (
+ 	LEFTARG = hstore,
+ 	RIGHTARG = text,
+ 	PROCEDURE = fetchval
+ );
+ 
+ CREATE FUNCTION slice_array(hstore,text[])
+ RETURNS text[]
+ AS 'MODULE_PATHNAME','hstore_slice_to_array'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE OPERATOR -> (
+ 	LEFTARG = hstore,
+ 	RIGHTARG = text[],
+ 	PROCEDURE = slice_array
+ );
+ 
+ CREATE FUNCTION slice(hstore,text[])
+ RETURNS hstore
+ AS 'MODULE_PATHNAME','hstore_slice_to_hstore'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION isexists(hstore,text)
+ RETURNS bool
+ AS 'MODULE_PATHNAME','hstore_exists'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION exist(hstore,text)
+ RETURNS bool
+ AS 'MODULE_PATHNAME','hstore_exists'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE OPERATOR ? (
+ 	LEFTARG = hstore,
+ 	RIGHTARG = text,
+ 	PROCEDURE = exist,
+ 	RESTRICT = contsel,
+ 	JOIN = contjoinsel
+ );
+ 
+ CREATE FUNCTION exists_any(hstore,text[])
+ RETURNS bool
+ AS 'MODULE_PATHNAME','hstore_exists_any'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE OPERATOR ?| (
+ 	LEFTARG = hstore,
+ 	RIGHTARG = text[],
+ 	PROCEDURE = exists_any,
+ 	RESTRICT = contsel,
+ 	JOIN = contjoinsel
+ );
+ 
+ CREATE FUNCTION exists_all(hstore,text[])
+ RETURNS bool
+ AS 'MODULE_PATHNAME','hstore_exists_all'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE OPERATOR ?& (
+ 	LEFTARG = hstore,
+ 	RIGHTARG = text[],
+ 	PROCEDURE = exists_all,
+ 	RESTRICT = contsel,
+ 	JOIN = contjoinsel
+ );
+ 
+ CREATE FUNCTION isdefined(hstore,text)
+ RETURNS bool
+ AS 'MODULE_PATHNAME','hstore_defined'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION defined(hstore,text)
+ RETURNS bool
+ AS 'MODULE_PATHNAME','hstore_defined'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION delete(hstore,text)
+ RETURNS hstore
+ AS 'MODULE_PATHNAME','hstore_delete'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION 

Re: [HACKERS] Review: Extra Daemons / bgworker

2012-12-04 Thread Alvaro Herrera
Alvaro Herrera wrote:

 One notable thing is that I had to introduce this in the postmaster
 startup sequence:
 
 /*
  * process any libraries that should be preloaded at postmaster start
  */
 process_shared_preload_libraries();
 
 /*
  * If loadable modules have added background workers, MaxBackends needs to
  * be updated.  Do so now.
  */
 // RerunAssignHook("max_connections");
 if (GetNumShmemAttachedBgworkers() > 0)
     SetConfigOption("max_connections",
                     GetConfigOption("max_connections", false, false),
                     PGC_POSTMASTER, PGC_S_OVERRIDE);
 
 Note the intention here is to re-run the GUC assign hook for
 max_connections (hence the commented out hypothetical call to do so).
 Obviously, having to go through GetConfigOption and SetConfigOption is
 not a nice thing to do; we'll have to add some new entry point to guc.c
 for this to have a nicer interface.

After fooling with guc.c to create such a new entry point, I decided
that it looks too ugly.  guc.c is already complex enough with the API we
have today that I don't want to be seen creating a partially-duplicate
interface, even if it is going to result in simplified processing of
this one place.  If we ever get around to needing another place to
require rerunning a variable's assign_hook we can discuss it; for now it
doesn't seem worth it.

This is only called at postmaster start time, so it's not too
performance-sensitive, hence SetConfigOption( .., GetConfigOption(), ..)
seems acceptable from that POV.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PageIsAllVisible()'s trustworthiness in Hot Standby

2012-12-04 Thread Andres Freund
On 2012-12-04 09:33:28 -0500, Robert Haas wrote:
 On Tue, Dec 4, 2012 at 9:00 AM, Andres Freund and...@2ndquadrant.com wrote:
  You're right, it currently seems to be possible, there's no LSN interlock
  prohibiting this as far as I can see.

 Yeah, there certainly isn't that.  Now you could perhaps make an
 argument that no operation that can propagate a set bit from master to
 standby can arrive until after the standby's xmin horizon is
 sufficiently current, but that does feel a little fragile to me...
 even if it's true now, new WAL record types might break it, for
 example.

I wouldn't want to rely on it...

  in lazy_scan_heap we do:
 
  if (!PageIsAllVisible(page))
  {
      PageSetAllVisible(page);
      MarkBufferDirty(buf);
      visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer,
                        visibility_cutoff_xid);
  }
 
  So buf can be written out independently from the xlog record that
  visibilitymap_set emits. ISTM we should set the LSN of the heap page as
  well as the one of the visibilitymap page. Not sure about the
  performance impact of that.

 It would result in a massive increase in WAL traffic when vacuuming an
 insert-only table; that's why we didn't do it originally.

I wonder if we could solve that by having an in-memory-only LSN that
only interlocks the hint bit writes, but doesn't cause full page
writes...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Patch to fix libecpg.so for isinf missing

2012-12-04 Thread Michael Meskes
On Mon, Dec 03, 2012 at 01:12:48PM +0800, Jiang Guiqing wrote:
 isinf() is not build to libecpg.so if build and install postgresql
 by source on solaris9.
 (isinf() is not contained within solaris9 system.)
 ...
 I modify src/interfaces/ecpg/ecpglib/Makefile to resolve this problem.
 The diff file refer to the attachment ecpglib-Makefile.patch.

Thanks for finding and fixing the problem, patch committed to HEAD and all 9.* 
branches.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PageIsAllVisible()'s trustworthiness in Hot Standby

2012-12-04 Thread Robert Haas
On Tue, Dec 4, 2012 at 10:38 AM, Andres Freund and...@2ndquadrant.com wrote:
 I wonder if we could solve that by having an in-memory-only LSN that
 only interlocks the hint bit writes, but doesn't cause full page
 writes...

It's not really a hint bit, because if it fails to get set when the
visibility map bit gets set, you've got queries returning wrong
answers, because the next insert/update/delete on the heap page will
fail to clear the visibility-map bit.

But leaving that aside, I think that might work.  You'd essentially be
preventing the page from being written out of shared_buffers until the
WAL record has hit the disk, and it seems like that should be
sufficient.  Whether it's worth adding that much mechanism for this
problem, I'm less sure about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Switching timeline over streaming replication

2012-12-04 Thread Heikki Linnakangas
After some diversions to fix bugs and refactor existing code, I've 
committed a couple of small parts of this patch, which just add some 
sanity checks to notice incorrect PITR scenarios. Here's a new version 
of the main patch based on current HEAD.


- Heikki


streaming-tli-switch-8.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: create extension default_full_version

2012-12-04 Thread Ibrar Ahmed
On Tue, Dec 4, 2012 at 7:54 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

 Ibrar Ahmed ibrar.ah...@gmail.com writes:
  I am still getting the same error message.

 With the attached patch (v2), it works well:

 create extension hstore version '1.2' from 'unpackaged';
 DEBUG:  execute_extension_script:
 '/Users/dim/pgsql/ddl/share/extension/hstore--unpackaged--1.0.sql'
 DEBUG:  execute_extension_script:
 '/Users/dim/pgsql/ddl/share/extension/hstore--1.0--1.1.sql'
 DEBUG:  execute_extension_script:
 '/Users/dim/pgsql/ddl/share/extension/hstore--1.1--1.2.sql'
 CREATE EXTENSION

 You have to remember that the spelling FROM 'unpackaged version' really
 means that we have previously installed a loose version of the
 extension (just \i hstore.sql) and want to apply the upgrade path from
 there.

 We can't have FROM meaning the same thing as default_full_version.


I know.



  With default_full_version = '1.1'
  
  postgres=# CREATE EXTENSION hstore version '1.3' from '1.1';
  DEBUG:  execute_extension_script:
  '/usr/local/pgsql/share/extension/hstore--1.1.sql'
  DEBUG:  execute_extension_script:
  '/usr/local/pgsql/share/extension/hstore--1.1--1.3.sql'
  *ERROR:  could not stat file
  /usr/local/pgsql/share/extension/hstore--1.1--1.3.sql: No such file or
  directory*

 That's nonetheless a bug and is fixed now:

 -   if (pcontrol->default_full_version)
 +   if (pcontrol->default_full_version && !unpackaged)


 Thanks, I will look at this again in detail.



 See attached.

 Regards,
 --
 Dimitri Fontaine
 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




-- 
Ibrar Ahmed
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-04 Thread Jan Wieck

On 12/4/2012 8:06 AM, Kevin Grittner wrote:

Jan Wieck wrote:

I believe the check interval needs to be decoupled from the
deadlock_timeout again.


OK


This will leave us with 2 GUCs at least.


Hmm. What problems do you see with hard-coding reasonable values?


The question is what is reasonable?

Let's talk about the time to (re)acquire the lock first. In the cases 
where truncating a table can hurt we are dealing with many gigabytes. 
The original vacuumlazy scan of them can take hours if not days. During 
that scan the vacuum worker has probably spent many hours napping in the 
vacuum delay points. For me 50ms interval for 5 seconds would be 
reasonable for (re)acquiring that lock.


The reasoning behind it being that we need some sort of retry mechanism 
because if autovacuum just gave up the exclusive lock because someone 
needed access, it is more or less guaranteed that the immediate attempt 
to reacquire it will fail until that waiter has committed. But if it 
can't get a lock after 5 seconds, the system seems busy enough so that 
autovacuum should come back much later, when the launcher kicks it off 
again.


I don't care much about occupying that autovacuum worker for a few 
seconds. It just spent hours vacuuming that very table. How much harm 
will a couple more seconds do?


The check interval for the LockHasWaiters() call however depends very 
much on the response time constraints of the application. A 200ms 
interval for example would cause the truncate phase to hold onto the 
exclusive lock for 200ms at least. That means that a steady stream of 
short running transactions would see a 100ms blocking on average, 
200ms max. For many applications that is probably OK. If your response 
time constraint is <=50ms on 98% of transactions, you might want to have 
that knob though.


I admit I really have no idea what the most reasonable default for that 
value would be. Something between 50ms and deadlock_timeout/2 I guess.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PageIsAllVisible()'s trustworthiness in Hot Standby

2012-12-04 Thread Pavan Deolasee
On Tue, Dec 4, 2012 at 8:03 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Dec 4, 2012 at 9:00 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  Youre right, it currently seems to be possible, there's no LSN interlock
  prohibiting this as far as I can see.

 Yeah, there certainly isn't that.  Now you could perhaps make an
 argument that no operation that can propagate a set bit from master to
 standby can arrive until after the standby's xmin horizon is
 sufficiently current, but that does feel a little fragile to me...
 even if it's true now, new WAL record types might break it, for
 example.


Hmm. Yeah, I do not have guts to prove that either. I'll probably write up
a comment for your consideration to explain why we don't trust
PD_ALL_VISIBLE in Hot standby for seq scans, but still trust VM for
index-only scans.

Another piece of code that caught my attention is this (sorry for digging
up topics that probably have been discussed to death before and I might not
have paid attention to then):

 Whenever we clear PD_ALL_VISIBLE flag, we do that while holding the heap
page lock and also WAL log that action (in fact, piggy-back with
insert/update/delete). The only exception that I saw is in lazy_scan_heap()

 916 else if (PageIsAllVisible(page) && has_dead_tuples)
 917 {
 918 elog(WARNING, "page containing dead tuples is marked as
all-visible in relation \"%s\" page %u",
 919  relname, blkno);
 920 PageClearAllVisible(page);
 921 MarkBufferDirty(buf);
 922 visibilitymap_clear(onerel, blkno, vmbuffer);
 923 }

Obviously, this is not a case that we expect to occur. But if it does occur
for whatever reason, then even though we will correct the page flag on
master, this would never be populated to the standby. So the standby may
continue to have that tiny bit set on the page, at least until another
insert/update/delete happens on the page. Not sure if there is anything to
worry about, but looks a bit strange. I looked at the archive but can't see
any report of the warning, so may be we are safe after all.

Another side issue: The way we set visibility map bits in the VACUUM, we do
that in the first phase of the vacuum. Now the very fact that we have
selected the page for vacuuming, it will have one or more dead tuples. As
soon as we find a dead tuple in the page, we decide not to mark the page
all-visible and rightly so. But this page would not get marked all-visible
even if we remove all dead tuples in the second phase. Why don't we push
that work to the second phase or at least retry that again ? Of course, we
can't just do it that way without scanning the page again because new
tuples may get added or updated/deleted in the meanwhile. But that can be
addressed with some tricks - like tracking the LSN of every such page (which has
only dead and live tuples, but not recently dead or insert-in-progress or
delete-in-progress) and then comparing that LSN with the page LSN during
the second phase. If an update had happened in between, we will know that
and we won't deal with the visibility bits.

Another idea would be to track this intermediate state in the page header
itself, something like PD_ALL_VISIBLE_OR_DEAD. If an intermediate
update/delete/insert comes in, it will clear the bit. If we see the bit set
during the second phase of the vacuum, we will know that it contains only
dead and live tuples and the dead ones are being removed now. So we can
mark the page PD_ALL_VISIBLE.
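
To make the LSN trick concrete, a hypothetical second-phase check
(lsn_from_first_pass being the value remembered per page during the
first pass) might look like:

    if (PageGetLSN(page) == lsn_from_first_pass)
    {
        /* no insert/update/delete touched the page in between */
        PageSetAllVisible(page);
        MarkBufferDirty(buf);
        /* then set the VM bit, as lazy_scan_heap already does */
    }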

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] [PATCH] Patch to fix libecpg.so for isinf missing

2012-12-04 Thread Bruce Momjian
On Tue, Dec  4, 2012 at 04:45:16PM +0100, Michael Meskes wrote:
 On Mon, Dec 03, 2012 at 01:12:48PM +0800, Jiang Guiqing wrote:
  isinf() is not build to libecpg.so if build and install postgresql
  by source on solaris9.
  (isinf() is not contained within solaris9 system.)
  ...
  I modify src/interfaces/ecpg/ecpglib/Makefile to resolve this problem.
  The diff file refer to the attachment ecpglib-Makefile.patch.
 
 Thanks for finding and fixing the problem, patch committed to HEAD and all 
 9.* branches.

I have applied the attached patch to 9.0 to fix a merge conflict.  I
patterned it after the commits you did to the other branches.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/interfaces/ecpg/ecpglib/Makefile b/src/interfaces/ecpg/ecpglib/Makefile
new file mode 100644
index ced2fc4..3adf36d
*** a/src/interfaces/ecpg/ecpglib/Makefile
--- b/src/interfaces/ecpg/ecpglib/Makefile
*** include $(top_srcdir)/src/Makefile.shlib
*** 58,68 
  # necessarily use the same object files as the backend uses. Instead,
  # symlink the source files in here and build our own object file.
  
! <<<<<<< HEAD
! path.c pgstrcasecmp.c snprintf.c strlcpy.c thread.c: % : $(top_srcdir)/src/port/%
! =======
! path.c pgstrcasecmp.c snprintf.c strlcpy.c thread.c win32setlocale.c isinf.c: % : $(top_srcdir)/src/port/%
! >>>>>>> 6fc6ebf... Include isinf.o in libecpg if isinf() is not available on the system.
  	rm -f $@ && $(LN_S) $< .
  
  misc.o: misc.c $(top_builddir)/src/port/pg_config_paths.h
--- 58,64 
  # necessarily use the same object files as the backend uses. Instead,
  # symlink the source files in here and build our own object file.
  
! path.c pgstrcasecmp.c snprintf.c strlcpy.c thread.c isinf.c: % : $(top_srcdir)/src/port/%
  	rm -f $@ && $(LN_S) $< .
  
  misc.o: misc.c $(top_builddir)/src/port/pg_config_paths.h

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor optimizations in lazy_scan_heap

2012-12-04 Thread Robert Haas
On Mon, Dec 3, 2012 at 1:23 AM, Pavan Deolasee pavan.deola...@gmail.com wrote:
 I was looking at the code in lazy_scan_heap() and I realized there are
 couple of low-hanging optimizations that we can do there.

 1. The for-loop walks through each block of the relation. But if scan_all is
 set to false, which would be the case most often, we can jump over to the
 next not-all-visible block directly (after considering the
 SKIP_PAGES_THRESHOLD etc). I understand that the cost of looping with no-op
 may not be considerable, but it looks unnecessary. And it can matter when
 there are thousands and millions of consecutive all-visible blocks in a
 large table.

 2. We also do a visibilitymap_test() for each block. I think it will be more
 prudent to have a visibilitymap API, say visibilitymap_test_range(), which
 can take a range of blocks and return the first not-all-visible block from
 the range. Internally, the function can then test several blocks at a time.
 We can still do this without holding a lock on the VM buffer because when
 scan_all is false, we don't care much about the correctness of the
 visibility check anyway. Also, this function can later be optimized if we
 start saving some summary information about visibility maps, in which case
 we can more efficiently find first not-all-visible block.

 3. I also thought that the call to vacuum_delay_point() for every visibility
 check is not required and a simple CHECK_FOR_INTERRUPTS would be good
 enough. Later I realized that may be we need that because visibility map
 check can do an IO for the VM page. But if we do 2, then we can at least
 limit calling vacuum_delay_point() once for every VM page, instead of one
 per bit. I concede that the cost of calling vacuum_delay_point() may not be
 too high, but it again looks unnecessary and can be taken care by a slight
 re-factoring of the code.

 Comments ? Anyone thinks any/all of above is useful ?

I doubt that any of these things make enough difference to be worth
bothering with, but if you have benchmark results suggesting otherwise
I'm all ears.
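
For what it's worth, a naive form of the range-test API suggested in
point 2 could be as simple as the sketch below (the name and signature
are assumptions; the point of the proposal is that the internals could
later become smarter than a per-block loop):

    /* Return the first block in [start, end) whose VM bit is not set,
     * or 'end' if the whole range is all-visible. */
    static BlockNumber
    visibilitymap_test_range(Relation rel, BlockNumber start,
                             BlockNumber end, Buffer *vmbuf)
    {
        BlockNumber blk;

        for (blk = start; blk < end; blk++)
        {
            if (!visibilitymap_test(rel, blk, vmbuf))
                break;
        }
        return blk;
    }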

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Robert Haas
On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 Patch completely changes storage in posting lists and leaf pages of posting
 trees. It uses varbyte encoding for BlockNumber and OffsetNumber.
 BlockNumber are stored incremental in page. Additionally one bit of
 OffsetNumber is reserved for additional information NULL flag. To be able to
 find position in leaf data page quickly patch introduces small index in the
 end of page.

This sounds like it means that this would break pg_upgrade, about
which I'm not too keen.  Ideally, we'd like to have a situation where
new indexes have additional capabilities, but old indexes are still
usable for things that they could do before.  I am not sure whether
that's a realistic goal.
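
For readers who haven't seen the term, "varbyte" here is the usual
base-128 varint scheme; a minimal sketch of encoding one delta (not the
patch's actual on-page format):

    static int
    encode_varbyte(uint32 delta, unsigned char *out)
    {
        int     n = 0;

        while (delta >= 0x80)
        {
            out[n++] = (delta & 0x7F) | 0x80;   /* 7 payload bits + continuation */
            delta >>= 7;
        }
        out[n++] = (unsigned char) delta;       /* final byte, high bit clear */
        return n;
    }

Decoding just accumulates 7 bits per byte until it sees a byte with the
high bit clear.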

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for checking file parameters to psql before password prompt

2012-12-04 Thread Robert Haas
On Sun, Dec 2, 2012 at 6:37 AM, Alastair Turner b...@ctrlf5.co.za wrote:
 Patch for the changes discussed in
 http://archives.postgresql.org/pgsql-hackers/2010-10/msg00919.php
 attached (eventually ...)

 In summary: If the input file (-f) doesn't exist or the ouput or log
 files (-o and -l) can't be created psql exits before prompting for a
 password.

s/ouput/output/

Please add this patch here so we don't lose track of it:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] xlogreader v3/xlogdump v2

2012-12-04 Thread Andres Freund
Hi everyone,

At
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlogreader_v3
git://git.postgresql.org/git/users/andresfreund/postgres.git
you can find my attempt trying to bring the xlogreader from Heikki, as
modified by Alvaro, into a state where it has the capabilities to be
usable for BDR.

This is *preliminary* work, to see whether people roughly agree with the
API, there is some smoothing of edges left.

Changes I made:
* Add XLogFindNextRecord, to find the next valid xlog record >= a recptr
* Move the page validation handling into xlogreader
* Add support for reading pages which are only partially valid
* Add callback as a replacement for emode_for_corrupt_record

I don't like the last part, it seems ugly to me, but moving the full
error processing/formatting to a callback seems to involve more work. I
am willing to do that work, but would like some input first.

The xlogdump utility itself is in a mostly good state, some parts of
support infrastructure (ereport wrapper, realpathbackend,
timestamptz_to_str, pfree) need some work.

Any opinions?

I chose not to send the patch but rely on the git repository above, its
already somewhat large. If people prefer to see it fully on the list I
am happy to oblige.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Josh Berkus
On 12/4/12 9:34 AM, Robert Haas wrote:
 On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov
 aekorot...@gmail.com wrote:
 Patch completely changes storage in posting lists and leaf pages of posting
 trees. It uses varbyte encoding for BlockNumber and OffsetNumber.
 BlockNumber are stored incremental in page. Additionally one bit of
 OffsetNumber is reserved for additional information NULL flag. To be able to
 find position in leaf data page quickly patch introduces small index in the
 end of page.
 
 This sounds like it means that this would break pg_upgrade, about
 which I'm not too keen.  Ideally, we'd like to have a situation where
 new indexes have additional capabilities, but old indexes are still
 usable for things that they could do before.  I am not sure whether
 that's a realistic goal.

Is there a reason not to create this as a new type of index?  GIN2 or
whatever?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Andres Freund
On 2012-12-04 10:04:03 -0800, Josh Berkus wrote:
 On 12/4/12 9:34 AM, Robert Haas wrote:
  On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov
  aekorot...@gmail.com wrote:
  Patch completely changes storage in posting lists and leaf pages of posting
  trees. It uses varbyte encoding for BlockNumber and OffsetNumber.
  BlockNumber are stored incremental in page. Additionally one bit of
  OffsetNumber is reserved for additional information NULL flag. To be able 
  to
  find position in leaf data page quickly patch introduces small index in the
  end of page.
  
  This sounds like it means that this would break pg_upgrade, about
  which I'm not too keen.  Ideally, we'd like to have a situation where
  new indexes have additional capabilities, but old indexes are still
  usable for things that they could do before.  I am not sure whether
  that's a realistic goal.
 
 Is there a reason not to create this as a new type of index?  GIN2 or
 whatever?

Aren't the obvious maintenance problems enough?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] json accessors

2012-12-04 Thread Josh Berkus

 Yes, you are, rather. It might be possible to do something like:
 
 json_get(json, variadic text) => json

Given that I already do the equivalent in Python, this would suit me
well.  Not sure about other users ...

 as long as it doesn't involve any testing beyond field name  / array
 index equivalence.

I'm sure people will *ask* for more in the future, but you could do a
LOT with just an equivalence version.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Patch to fix libecpg.so for isinf missing

2012-12-04 Thread Michael Meskes
On Tue, Dec 04, 2012 at 12:31:28PM -0500, Bruce Momjian wrote:
 I have applied the attached patch to 9.0 to fix a merge conflict.  I
 patterned it after the commits you did to the other branches.

That is interesting. I fixed the same conflict and my git showed the commit,
but apparently that push didn't work correctly. Sorry.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-04 Thread Josh Berkus

 However, for a very large user group -- users with ORMs which do their
 own DDL migrations -- we could also use a way to log or WARN for table
 rewrites.  Since the ORMs are not gonna do an EXPLAIN.
 
 An ORM is also quite unlikely to pay attention to a warning, much less a
 postmaster log entry ... so your argument seems unfounded from here.

But the DevOps staff *is*.

There's the workflow:

1. developer writes migrations in Rails/Django/Hibernate
2. DevOps staff tests migrations on test machine.
3. DevOps staff looks at logs for rewrites.

The problem with an EXPLAIN path is that you're requiring the developers
to extract the DDL generated by Rails or whatever from the migration,
something to which the framework is not very friendly.  So we need a
path for identifying rewrites which doesn't require extracting DDL in
SQL form.

EXCEPT: I just realized, if we have explain ALTER then we can just log
explains, no?  In which case, problem solved.

Come to think of it, it would be good to warn about ACCESS EXCLUSIVE
locks as well.  That's another big booby trap for developers.

Note that writing stuff to the log is far from an ideal solution for
this user base.  I think they'd rather have ERROR on REWRITE.  It
would be good to have one of the Heroku folks speak up here ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-04 Thread Kevin Grittner
Jan Wieck wrote:

 [arguments for GUCs]

This is getting confusing. I thought I had already conceded the
case for autovacuum_truncate_lock_try, and you appeared to spend
most of your post arguing for it anyway. I think. It's a little
hard to tell. Perhaps the best thing is to present the issue to the
list and solicit more opinions on what to do. Please correct me if
I mis-state any of this.

The primary problem this patch is solving is that in some
workloads, autovacuum will repeatedly try to truncate the unused
pages at the end of a table, but will continually get canceled
after burning resources because another process wants to acquire a
lock on the table which conflicts with the one held by autovacuum.
This is handled by the deadlock checker, so another process must
block for the deadlock_timeout interval each time. All work done by
the truncate phase of autovacuum is lost on each interrupted
attempt. Statistical information is not updated, so another attempt
will trigger the next time autovacuum looks at whether to vacuum
the table.

It's obvious that this pattern not only fails to release
potentially large amounts of unused space back to the OS, but the
headbanging can continue to consume significant resources and for
an extended period, and the repeated blocking for deadlock_timeout
could cause latency problems.

The patch has the truncate work, which requires
AccessExclusiveLock, check at intervals for whether another process
is waiting on its lock. That interval is one of the timings we need
to determine, and for which a GUC was initially proposed. I think
that the check should be fast enough that doing it once every 20ms
as a hard-coded interval would be good enough. When it sees this
situation, it truncates the file for as far as it has managed to
get, releases its lock on the table, sleeps for an interval, and
then checks to see if the lock has become available again.

How long it should sleep between tries to reacquire the lock is
another possible GUC. Again, I'm inclined to think that this could
be hard-coded. Since autovacuum was knocked off-task after doing
some significant work, I'm inclined to make this interval a little
bigger, but I don't think it matters a whole lot. Anything between
20ms and 100ms seems sane. Maybe 50ms?

At any point that it is unable to acquire the lock, there is a
check for how long this autovacuum task has been starved for the
lock. Initially I was arguing for twice the deadlock_timeout on the
basis that this would probably be short enough not to leave the
autovacuum worker sidelined for too long, but long enough for the
attempt to get past a single deadlock between two other processes.
This is the setting Jan is least willing to concede.

If the autovacuum worker does abandon the attempt, it will keep
retrying, since we go out of our way to prevent the autovacuum
process from updating the statistics based on the incomplete
processing. This last interval is not how long it will attempt to
truncate, but how long it will keep one autovacuum worker making
unsuccessful attempts to acquire the lock before it is put to other
uses. Workers will keep coming back to this table until the
truncate phase is completed, just as it does without the patch; the
difference being that anytime it gets the lock, even briefly, it is
able to persist some progress.

So the question on the table is which of these three intervals
should be GUCs, and what values to use if they aren't.
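
To make the shape of that logic concrete, here is a rough sketch; the
LockHasWaiters-style check, elapsed_ms() and truncate_one_chunk() are
placeholders, not the patch's actual identifiers:

    while (!done)
    {
        if (!ConditionalLockRelation(rel, AccessExclusiveLock))
        {
            if (elapsed_ms() > 2 * DeadlockTimeout)  /* starved too long */
                return;              /* give up; a later worker retries */
            pg_usleep(50 * 1000L);   /* ~50ms before the next attempt */
            continue;
        }
        /* holding the lock: truncate in chunks, checking ~every 20ms */
        while (!done && !LockHasWaiters(rel, AccessExclusiveLock))
            done = truncate_one_chunk(rel);
        UnlockRelation(rel, AccessExclusiveLock);    /* progress is kept */
    }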

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum truncate exclusive lock round two

2012-12-04 Thread Jan Wieck

On 12/4/2012 1:51 PM, Kevin Grittner wrote:

Jan Wieck wrote:


[arguments for GUCs]


This is getting confusing. I thought I had already conceded the
case for autovacuum_truncate_lock_try, and you appeared to spend
most of your post arguing for it anyway. I think. It's a little
hard to tell. Perhaps the best thing is to present the issue to the
list and solicit more opinions on what to do. Please correct me if
I mis-state any of this.

The primary problem this patch is solving is that in some
workloads, autovacuum will repeatedly try to truncate the unused
pages at the end of a table, but will continually get canceled
after burning resources because another process wants to acquire a
lock on the table which conflicts with the one held by autovacuum.
This is handled by the deadlock checker, so another process must
block for the deadlock_timeout interval each time. All work done by
the truncate phase of autovacuum is lost on each interrupted
attempt. Statistical information is not updated, so another attempt
will trigger the next time autovacuum looks at whether to vacuum
the table.

It's obvious that this pattern not only fails to release
potentially large amounts of unused space back to the OS, but the
headbanging can continue to consume significant resources and for
an extended period, and the repeated blocking for deadlock_timeout
could cause latency problems.

The patch has the truncate work, which requires
AccessExclusiveLock, check at intervals for whether another process
is waiting on its lock. That interval is one of the timings we need
to determine, and for which a GUC was initially proposed. I think
that the check should be fast enough that doing it once every 20ms
as a hard-coded interval would be good enough. When it sees this
situation, it truncates the file for as far as it has managed to
get, releases its lock on the table, sleeps for an interval, and
then checks to see if the lock has become available again.

How long it should sleep between tries to reacquire the lock is
another possible GUC. Again, I'm inclined to think that this could
be hard-coded. Since autovacuum was knocked off-task after doing
some significant work, I'm inclined to make this interval a little
bigger, but I don't think it matters a whole lot. Anything between
20ms and 100ms seems sane. Maybe 50ms?

At any point that it is unable to acquire the lock, there is a
check for how long this autovacuum task has been starved for the
lock. Initially I was arguing for twice the deadlock_timeout on the
basis that this would probably be short enough not to leave the
autovacuum worker sidelined for too long, but long enough for the
attempt to get past a single deadlock between two other processes.
This is the setting Jan is least willing to concede.

If the autovacuum worker does abandon the attempt, it will keep
retrying, since we go out of our way to prevent the autovacuum
process from updating the statistics based on the incomplete
processing. This last interval is not how long it will attempt to
truncate, but how long it will keep one autovacuum worker making
unsuccessful attempts to acquire the lock before it is put to other
uses. Workers will keep coming back to this table until the
truncate phase is completed, just as it does without the patch; the
difference being that anytime it gets the lock, even briefly, it is
able to persist some progress.


That is all correct.



So the question on the table is which of these three intervals
should be GUCs, and what values to use if they aren't.


I could live with all the above defaults, but would like to see more 
comments on them.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Alexander Korotkov
Hi!

On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra t...@fuzzy.cz wrote:

 I've tried to apply the patch with the current HEAD, but I'm getting
 segfaults whenever VACUUM runs (either called directly or from autovac
 workers).

 The patch applied cleanly against 9b3ac49e and needed a minor fix when
 applied on HEAD (because of an assert added to ginRedoCreatePTree), but
 that shouldn't be a problem.


Thanks for testing! Patch is rebased with HEAD. The bug you reported was
fixed.

--
With best regards,
Alexander Korotkov.


ginaddinfo.2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for removng unused targets

2012-12-04 Thread Alexander Korotkov
On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
  Sorry for the delay.  I've reviewed the patch.  It was applied
  successfully, and it worked well for tests I did including the example
  you showed.  I think it's worth the work, but I'm not sure you go
  about it in the right way.  (I feel the patch decreases code
  readability more than it gives an advantage.)

 One thought here is that I don't particularly like adding a field like
 resorderbyonly to TargetEntry in the first place.  That makes this
 optimization the business of the parser, which it should not be; and
 furthermore makes it incumbent on the rewriter, as well as anything else
 that manipulates parsetrees, to maintain the flag correctly while
 rearranging queries.  It would be better if this were strictly the
 business of the planner.

 But having said that, I'm wondering (without having read the patch)
 why you need anything more than the existing resjunk field.


Actually, I don't know all the cases when resjunk flag is set. Is it
reliable to decide target to be used only for ORDER BY if it's resjunk
and neither system or used in grouping? If it's so or there are some other
cases which are easy to determine then I'll remove resorderbyonly flag.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Alexander Korotkov
On Tue, Dec 4, 2012 at 9:34 PM, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov
 aekorot...@gmail.com wrote:
  Patch completely changes storage in posting lists and leaf pages of
 posting
  trees. It uses varbyte encoding for BlockNumber and OffsetNumber.
  BlockNumber are stored incremental in page. Additionally one bit of
  OffsetNumber is reserved for additional information NULL flag. To be
 able to
  find position in leaf data page quickly patch introduces small index in
 the
  end of page.

 This sounds like it means that this would break pg_upgrade, about
 which I'm not too keen.  Ideally, we'd like to have a situation where
 new indexes have additional capabilities, but old indexes are still
 usable for things that they could do before.  I am not sure whether
 that's a realistic goal.


This means to have two versions of code which deals with posting trees and
lists. For me it seems unlikely we have resources for maintenance of this.

--
With best regards,
Alexander Korotkov.


[HACKERS] Review of Row Level Security

2012-12-04 Thread Simon Riggs
Patch looks good and also like it will/can be ready for 9.3. I'm happy
to put time into this as committer and/or reviewer and take further
responsibility for it, unless others wish to.

LIKES

* It's pretty simple to understand and use

* Check qual is stored in pre-parsed form. That is fast, and also
secure, since changing search_path of the user doesn't change the
security check at all. Nice.

* Performance overhead is low, integration with indexing is clear and
effective and it works with partitioning

* It works, apart from design holes listed below, easily plugged IMHO


DISLIKEs

* Who gets to see stats on the underlying table? Are the stats
protected by security? How does ANALYZE work?

* INSERT ignores the SECURITY clause, on the ground that this has no
meaning. So its possible to INSERT data you can't see. For example,
you can insert medical records for *another* patient, or insert new
top secret information. This also causes a security hole... since
inserted rows can violate defined constraints, letting you know that
other keys you can't see exist. Why don't we treat the SECURITY clause
as a CHECK constraint? That makes intuitive sense to me.

* UPDATE allows you to bypass the SECURITY clause, to produce new rows
that you can't see. (Not good). But you can't get them back again, cos
you can't see them.

* TRUNCATE works, and allows you to remove all rows of a table, even
ones you can't see to run a DELETE on. Er...

None of those things are cool, at all.

Oracle defaults to putting VPD on all event types: INSERT, UPDATE,
DELETE, SELECT. ISTM we should be doing the same, not just say we can
add an INSERT trigger if you want.

Adding a trigger just begs the question as to why we are bothering in
the first place, since this functionality could already be added by
INSERT, UPDATE or DELETE triggers, if they are a full replacement for
this feature. The only answer is ease of use

We can easily add syntax like this

[ROW SECURITY CHECK (  ) [ON [ ALL | INSERT, UPDATE, DELETE, SELECT [..,

with the default being ALL

* The design has nothing at all to do with SECURITY LABELs. Why did we
create them, I wonder? I understood we would have row-level label
security. Doesn't that require us to have a data type, such as
reglabel or similar enum? Seems strange. Oracle has two features:
Oracle Label Security and Row Level Security -

OTHER

* The docs should explain a little better how to optimize using RLS.
Specifically, the fact that indexable operators are marked leakproof
and thus can be optimized ahead of the rlsquals. The docs say rls
quals are guaranteed to be applied first which isn't true in all
cases.

* Why is pg_rowlevelsec in a separate catalog table?

* Internally, I think we should call this rowsecurity rather than
rowlevelsec - the level word is just noise, whereas the security
word benefits from being spelt out in full.

* psql \d support needed

* Docs need work, but thats OK.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-04 Thread Pavel Stehule
Hello

I fully agree with Asif's arguments

previous tupconvert implementation was really strict - so using
enhanced tupconvert has very minimal impact for old users - and I
checked same speed for plpgsql function - patched and unpatched pg.

tested

CREATE OR REPLACE FUNCTION public.foo(i integer)
 RETURNS SETOF record
 LANGUAGE plpgsql
AS $function$
declare r record;
begin
  r := (10.1,20.1); for i in 1..10 loop return next r; end loop; return;
end;
$function$

select sum(a) from generate_series(1,3000) g(i), lateral foo(i) as (a
numeric, b numeric);

More - everywhere where we use tupconvert internally, then we use
cached conversion map - so I am sure, so speed of ANALYZE cannot be
impacted by this patch

There are other two issue:

it allows writing new, different slow code - IO cast is about 5-10-20%
slower, and with this patch anybody has new possibilities for new bad
code. But it is not a problem of this patch. It is related to plpgsql
design and probably we should to move some conversions to outside
plpgsql to be possible reuse conversion map and enhance plpgsql. I
have a simple half solutions - plpgsql_check_function can detect this
situation in almost typical use cases and can raises a performance
warning. So this issue is not stopper for me - because it is not new
issue in plpgsql.

Second issue is more significant:

there are bug:

postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
foo(i) as (a numeric, b numeric);
   sum
--
 303000.0
(1 row)

postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
foo(i) as (a float, b numeric);
  sum
---
 7.33675699577682e-232
(1 row)

it produces wrong result

And with minimal change it kill session

create or replace function foo(i integer)
returns setof record as $$
declare r record;
begin
  r := (10,20); for i in 1..10 loop return next r; end loop; return;
end;
$$ language plpgsql;

postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
foo(i) as (a numeric, b numeric);
The connection to the server was lost. Attempting reset: Failed.

create or replace function fo(i integer)
returns record as $$
declare r record;
begin
  r := (10,20); return r;
end;
$$ language plpgsql;

postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
fo(i) as (a int, b numeric);
  sum
---
 3
(1 row)

postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
fo(i) as (a numeric, b numeric);
The connection to the server was lost. Attempting reset: Failed.

Regards

Pavel Stehule


2012/12/3 Asif Rehman asifr.reh...@gmail.com:
 Hi,

 Thanks for the review. Please see the updated patch.

 Hmm.  We're running an I/O cast during do_tup_convert() now, and look
 up the required functions for each tuple.  I think if we're going to
 support this operation at that level, we need to look up the necessary
 functions at convert_tuples_by_foo level, and then apply unconditionally
 if they've been set up.

 Done. TupleConversionMap struct should keep the array of functions oid's
 that
 needs to be applied. Though only for those cases where both attribute type
 id's
 do not match. This way no unnecessary casting will happen.


 Also, what are the implicancies for existing users of tupconvert?  Do we
 want to apply casting during ANALYZE for example?  AFAICS the patch
 shouldn't break any case that works today, but I don't see that there
 has been any analysis of this.

 I believe this part of the code should not impact existing users of
 tupconvert.
 As this part of code is relaxing a restriction upon which an error would
 have been
 generated. Though it could have a little impact on performance but should be
 only for
 cases where attribute type id's are different and are implicitly cast able.

 Besides existing users involve tupconvert in case of inheritance. And in
 that case
 an exact match is expected.

 Regards,
 --Asif


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for removng unused targets

2012-12-04 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 But having said that, I'm wondering (without having read the patch)
 why you need anything more than the existing resjunk field.

 Actually, I don't know all the cases when resjunk flag is set. Is it
 reliable to decide target to be used only for ORDER BY if it's resjunk
 and neither system or used in grouping? If it's so or there are some other
 cases which are easy to determine then I'll remove resorderbyonly flag.

resjunk means that the target is not supposed to be output by the query.
Since it's there at all, it's presumably referenced by ORDER BY or GROUP
BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

What you would need to do is verify that the target is resjunk and not
used in any clause besides ORDER BY.  I have not read your patch, but
I rather imagine that what you've got now is that the parser checks this
and sets the new flag for consumption far downstream.  Why not just make
the same check in the planner?

A more invasive, but possibly cleaner in the long run, approach is to
strip all resjunk targets from the query's tlist at the start of
planning and only put them back if needed.

BTW, when I looked at this a couple years ago, it seemed like the major
problem was that the planner assumes that all plans for the query should
emit the same tlist, and thus that tlist eval cost isn't a
distinguishing factor.  Breaking that assumption seemed to require
rather significant refactoring.  I never found the time to try to
actually do it.
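
A minimal sketch of that planner-side test, using only fields that
already exist on TargetEntry and SortGroupClause (whether this covers
every case, e.g. window clauses, would need checking):

    /* Is this junk tlist entry referenced by ORDER BY alone? */
    static bool
    junk_tle_is_orderby_only(Query *query, TargetEntry *tle)
    {
        ListCell   *lc;

        if (!tle->resjunk || tle->ressortgroupref == 0)
            return false;
        foreach(lc, query->groupClause)
            if (((SortGroupClause *) lfirst(lc))->tleSortGroupRef ==
                tle->ressortgroupref)
                return false;
        foreach(lc, query->distinctClause)
            if (((SortGroupClause *) lfirst(lc))->tleSortGroupRef ==
                tle->ressortgroupref)
                return false;
        /* windowClause entries would need the same treatment */
        return true;
    }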

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for removng unused targets

2012-12-04 Thread Alexander Korotkov
On Tue, Dec 4, 2012 at 11:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Alexander Korotkov aekorot...@gmail.com writes:
  On Mon, Dec 3, 2012 at 8:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  But having said that, I'm wondering (without having read the patch)
  why you need anything more than the existing resjunk field.

  Actually, I don't know all the cases when resjunk flag is set. Is it
  reliable to decide target to be used only for ORDER BY if it's
 resjunk
  and neither system or used in grouping? If it's so or there are some
 other
  cases which are easy to determine then I'll remove resorderbyonly flag.

 resjunk means that the target is not supposed to be output by the query.
 Since it's there at all, it's presumably referenced by ORDER BY or GROUP
 BY or DISTINCT ON, but the meaning of the flag doesn't depend on that.

 What you would need to do is verify that the target is resjunk and not
 used in any clause besides ORDER BY.  I have not read your patch, but
 I rather imagine that what you've got now is that the parser checks this
 and sets the new flag for consumption far downstream.  Why not just make
 the same check in the planner?

 A more invasive, but possibly cleaner in the long run, approach is to
 strip all resjunk targets from the query's tlist at the start of
 planning and only put them back if needed.

 BTW, when I looked at this a couple years ago, it seemed like the major
 problem was that the planner assumes that all plans for the query should
 emit the same tlist, and thus that tlist eval cost isn't a
 distinguishing factor.  Breaking that assumption seemed to require
 rather significant refactoring.  I never found the time to try to
 actually do it.


Maybe there is some way to not remove items from the tlist, but evade actual
calculation?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Alvaro Herrera
Alexander Korotkov wrote:
 On Tue, Dec 4, 2012 at 9:34 PM, Robert Haas robertmh...@gmail.com wrote:
 
  On Sun, Nov 18, 2012 at 4:54 PM, Alexander Korotkov
  aekorot...@gmail.com wrote:
   Patch completely changes storage in posting lists and leaf pages of
  posting
   trees. It uses varbyte encoding for BlockNumber and OffsetNumber.
   BlockNumber are stored incremental in page. Additionally one bit of
   OffsetNumber is reserved for additional information NULL flag. To be
  able to
   find position in leaf data page quickly patch introduces small index in
  the
   end of page.
 
  This sounds like it means that this would break pg_upgrade, about
  which I'm not too keen.  Ideally, we'd like to have a situation where
  new indexes have additional capabilities, but old indexes are still
  usable for things that they could do before.  I am not sure whether
  that's a realistic goal.
 
 This means to have two versions of code which deals with posting trees and
 lists. For me it seems unlikely we have resources for maintenance of this.

Witness how GIN has gone with unfixed bugs for months, even though
patches to fix them have been posted.  We don't have the manpower to
maintain even *one* such implementation, let alone two.

Maybe we can mark GIN indexes as invalid after pg_upgrade somehow, so
that they require reindex in the new cluster before they can be used for
queries or index updates.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)

2012-12-04 Thread Jeff Davis
On Tue, 2012-12-04 at 10:15 +0000, Simon Riggs wrote:
 On 4 December 2012 01:34, Jeff Davis pg...@j-davis.com wrote:
 
  I assume that refers to the discussion here:
 
  http://archives.postgresql.org/pgsql-hackers/2012-02/msg01177.php
 
  But that was quite a while ago, so is there a more recent discussion
  that prompted this commit now?
 
 Yes, this was discussed within the last month, thread TRUNCATE 
 SERIALIZABLE...
 
 The patch for that was already posted, so I committed it.

Thank you for pointing me toward that thread.

While experimenting with COPY FREEZE, I noticed something:

The simple case of BEGIN; CREATE TABLE ...; COPY ... WITH (FREEZE);
doesn't meet the pre-conditions. It only meets the conditions if
preceded by a TRUNCATE, which all of the tests do. I looked into it, and
I think the test:

  ... &&
  cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId()

should be:

... &&
 (cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId() ||
  cstate->rel->rd_createSubid == GetCurrentSubTransactionId())

(haven't tested). Another option to consider is that
rd_newRelfilenodeSubid could be set on newly-created tables as well as
newly-truncated tables.
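
For context, an untested sketch of how the relaxed test could read in
copy.c (only the rd_createSubid arm is new relative to what was
committed):

    if (cstate->rel->rd_newRelfilenodeSubid == GetCurrentSubTransactionId() ||
        cstate->rel->rd_createSubid == GetCurrentSubTransactionId())
        hi_options |= HEAP_INSERT_FROZEN;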

Also, I recommend a hint along with the NOTICE when the pre-conditions
aren't met to tell the user what they need to do. Perhaps something
like:

   Create or truncate the table in the same transaction as COPY
FREEZE.

The documentation could expand on that, clarifying that a TRUNCATE in
the same transaction (perhaps even ALTER?) allows a COPY FREEZE.

The code comment could be improved a little, while we're at it:

  Note that the stronger test of exactly which subtransaction created
it is crucial for correctness of this optimisation.

is a little vague, can you explain using the reasoning from the thread?
I would say something like:

  The transaction and subtransaction creating or truncating the table
must match that of the COPY FREEZE. Otherwise, it could mix tuples from
different transactions together, and it would be impossible to
distinguish them for the purposes of atomicity.

  I am a little confused about the case where setting HEAP_XMIN_COMMITTED
  when loading the table in the same transaction is wrong. There was some
  discussion about subtransactions, but those problems only seemed to
  appear when the CREATE and the INSERT/COPY happened in different
  subtransactions.
 
  So, I guess my question is, why the partial revert?
 
 Well, first, because I realised that it wasn't wanted. I was
 re-reading the threads trying to figure out a way to help the checksum
 patch further.
 
 Second, because I realised why it wasn't wanted. Setting
 HEAP_XMIN_COMMITTED causes MVCC violations within the transaction
 issuing the COPY.

After reading that thread, I still don't understand why it's unsafe to
set HEAP_XMIN_COMMITTED in those conditions. Even if it is, I would
think that a sufficiently narrow case -- such as CTAS outside of a
transaction block -- would be safe, along with some slightly broader
cases (like BEGIN; CREATE TABLE; INSERT/COPY).

But perhaps that requires more discussion. I was just curious if there
was a concrete problem that I was missing.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Bruce Momjian
On Tue, Dec  4, 2012 at 05:35:24PM -0300, Alvaro Herrera wrote:
  This means to have two versions of code which deals with posting trees and
  lists. For me it seems unlikely we have resources for maintenance of this.
 
 Witness how GIN has gone with unfixed bugs for months, even though
 patches to fix them have been posted.  We don't have the manpower to
 maintain even *one* such implementation, let alone two.
 
 Maybe we can mark GIN indexes as invalid after pg_upgrade somehow, so
 that they require reindex in the new cluster before they can be used for
 queries or index updates.

Yes, pg_upgrade has infrastructure to do that.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: Deparsing DDL command strings

2012-12-04 Thread Pavel Stehule
Hello

2012/11/27 Dimitri Fontaine dimi...@2ndquadrant.fr:
 Hi,

 Pavel Stehule pavel.steh...@gmail.com writes:
 ** missing doc

 I still would prefer to have an ack about the premises of this patch
 before spending time on writing docs for it, namely:

  - we want a normalized command string (schema, auto naming of some
objects such as constraints and indexes, exporting default values,
etc);

  - this normalisation can not happen as an extension given the current
source code organisation and the new ddl_rewrite module is a good fit
to solve that problem;

  - we offer a new event alias ddl_command_trace that will get
activated start of a DROP and end of an ALTER or a CREATE command so
that it's easy to hook at the right place when you want to trace
things;

  - it's now possible and easy to process GENERATED commands if you wish
to do so (explicit sequence and index operations for a create table
containing a serial primary key column).


I agree with these premisses. I am sure so normalised commands strings
are very nice feature, that can be helpful for generation clean audit
logs and messages. I see a one issue - testing a consistency between
commands and their deparsed forms.

Do you have some idea, how to do these tests - Maybe we can write
extension, that will try repeated parsing and deparsing - normalised
string should be same. I am not sure, if this test should be part of
regressions test, but we have to thinking about some tool for
checking.

we can take commands via hooks and we can check ORIGINAL CMD ==>
tree1 ==> DEPARSED STRING ==> tree2 && tree1 = tree2

Please, can others to write an agreement or any rejection now?

Does somebody know why this concept is not acceptable? Speak now, please.


 I think no complaints is encouraging though, so I will probably begin
 the docs effort soonish.

 ** CREATE FUNCTION is not supported

 I've added support for create and drop function in the attached patch.
 Not all commands are rewritten yet though, and not all of them even have
 support for the TG_OBJECTID, TG_OBJECTNAME and TG_SCHEMANAME variables.


I have no problem with defined list of fully unsupported statements.

 I think we could still apply that patch + docs and a limited set of
 commands, and continue filling the gaps till the end of this cycle. Once
 we agree on how to attack the problem at hand, do we really need support
 for ALTER OPERATOR FAMILY for an intermediate commit? You tell me.

 * some wrong in deparsing - doesn't support constraints

 I've been fixing that and reviewing ALTER TABLE rewriting, should be ok
 now. Note that we have to analyze the alter table work queue, not the
 parsetree, if we want to normalize automatic constraints names and such
 like.

 Regards,
 --
 Dimitri Fontaine
 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


I had a small problem with patching and compilation produce warnings too

pavel ~/src/postgresql/src $ cat log | grep warning
scan.c:16247:23: warning: unused variable ‘yyg’ [-Wunused-variable]
../../../src/include/utils/ddl_rewrite.h:24:8: warning: extra tokens
at end of #endif directive [enabled by default]
../../../../src/include/utils/ddl_rewrite.h:24:8: warning: extra
tokens at end of #endif directive [enabled by default]
ddl_rewrite.c:1719:9: warning: variable ‘def’ set but not used
[-Wunused-but-set-variable]
ddl_rewrite.c:1777:21: warning: ‘languageOid’ is used uninitialized in
this function [-Wuninitialized]

All 133 tests passed.

when I tested

postgres=# create table bubuaa(a int unique not null check (a  20));
NOTICE:  snitch: ddl_command_end CREATE TABLE CREATE TABLE
public.bubuaa (a pg_catalog.int4 UNIQUE NOT NULL CHECK ((a  20)),
CHECK ((a  20)));
NOTICE:  snitch: ddl_command_end CREATE INDEX CREATE UNIQUE INDEX ON
public.bubuaa USING btree (a);
CREATE TABLE

the constraint is there twice - it is probably the same one, but it is confusing

drop table generate command string

postgres=# drop table bubuaa;
NOTICE:  snitch: ddl_command_end DROP TABLE NULL
DROP TABLE

functions are supported :)

backslash statement \dy doesn't work


postgres=# \dy
ERROR:  column evtctxs does not exist
LINE 1: ... array_to_string(array(select x   from unnest(evtctxs) a...


I was little bit surprised so general event trigger without tag
predicate was not triggered for CREATE FUNCTION statement. Is it
documented somewhere?

Regards

Pavel


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: Deparsing DDL command strings

2012-12-04 Thread Dimitri Fontaine
Pavel Stehule pavel.steh...@gmail.com writes:
 I agree with these premisses. I am sure so normalised commands strings
 are very nice feature, that can be helpful for generation clean audit
 logs and messages. I see a one issue - testing a consistency between
 commands and their deparsed forms.

Thanks.

 Do you have some idea, how to do these tests - Maybe we can write
 extension, that will try repeated parsing and deparsing - normalised
 string should be same. I am not sure, if this test should be part of
 regressions test, but we have to thinking about some tool for
 checking.

 we can take commands via hooks and we can check ORIGINAL CMD ==>
 tree1 ==> DEPARSED STRING ==> tree2 && tree1 = tree2

We could have an event trigger that stores the statements in a table,
does some object description like \d or \df+, then drops them all and
replay all the statements from the table and describe the objects again.

I'm not sure how to check that the descriptions are the same from the
regression tests themselves, but once that's manually/externally sorted
out the current facilities will sure provide guards against any
regression.

 Please, can others to write an agreement or any rejection now?
 Does somebody know why this concept is not acceptable? Speak now, please.

+1 :)

 I've added support for create and drop function in the attached patch.
 Not all commands are rewritten yet though, and not all of them even have
 support for the TG_OBJECTID, TG_OBJECTNAME and TG_SCHEMANAME variables.

 I have no problem with defined list of fully unsupported statements.

Cool. The goal still is to support all functions, but I need approval
first. It's too much work to be wasted 2 months from now.

 I had a small problem with patching and compilation produce warnings too

I will look into that. I know I had to switch in between -O0 and -O2 to
be able to debug some really strange things that happened to have been
due to our build infrastructure not being able to cope with some header
changes. Days like that I really hate the C developer tooling.

 postgres=# create table bubuaa(a int unique not null check (a  20));
 NOTICE:  snitch: ddl_command_end CREATE TABLE CREATE TABLE
 public.bubuaa (a pg_catalog.int4 UNIQUE NOT NULL CHECK ((a  20)),
 CHECK ((a  20)));
 NOTICE:  snitch: ddl_command_end CREATE INDEX CREATE UNIQUE INDEX ON
 public.bubuaa USING btree (a);
 CREATE TABLE

 the constraint appears twice there - it is probably the same one, but it is confusing

That might be due to some refactoring I just did for the ALTER TABLE.
Sharing code means action at a distance sometimes. I need to begin
adding regression tests, but they will look a lot like the ones Robert
did insist on removing last time…

 postgres=# \dy
 ERROR:  column evtctxs does not exist
 LINE 1: ... array_to_string(array(select x   from unnest(evtctxs) a...

My guess is that you didn't initdb the version you tested. As usual I
didn't include the catalog bump in my patch, but initdb still is needed.

 I was a little bit surprised that a general event trigger without a tag
 predicate was not triggered for the CREATE FUNCTION statement. Is it
 documented somewhere?

Works for me (from memory). Sequels of the initdb problem?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why can't plpgsql return a row-expression?

2012-12-04 Thread Asif Rehman
Hi,

Here is the updated patch. I overlooked the loop, checking to free the
conversions map. Here are the results now.

postgres=# select sum(a) from generate_series(1,3000) g(i), lateral foo(i)
as (a numeric, b numeric);
   sum
--
 303000.0
(1 row)

postgres=# select sum(a) from generate_series(1,3000) g(i), lateral foo(i)
as (a float, b numeric);
   sum
--
 303000.00012

Regards,
--Asif


On Wed, Dec 5, 2012 at 12:40 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hello

 I fully agree with Asif's arguments

 previous tupconvert implementation was really strict - so using the
 enhanced tupconvert has very minimal impact for existing users - and I
 checked the same speed for a plpgsql function - patched and unpatched pg.

 tested

 CREATE OR REPLACE FUNCTION public.foo(i integer)
  RETURNS SETOF record
  LANGUAGE plpgsql
 AS $function$
 declare r record;
 begin
   r := (10.1,20.1); for i in 1..10 loop return next r; end loop; return;
 end;
 $function$

 select sum(a) from generate_series(1,3000) g(i), lateral foo(i) as (a
 numeric, b numeric);

 More - everywhere where we use tupconvert internally, then we use
 cached conversion map - so I am sure, so speed of ANALYZE cannot be
 impacted by this patch

 There are other two issue:

 it allows writing new, different slow code - IO cast is about 5-10-20%
 slower, and with this patch anybody has new possibilities for new bad
 code. But it is not a problem of this patch. It is related to plpgsql
 design and probably we should move some conversions outside plpgsql
 so the conversion map can be reused and plpgsql enhanced. I
 have a simple half solution - plpgsql_check_function can detect this
 situation in most typical use cases and can raise a performance
 warning. So this issue is not a stopper for me - because it is not a new
 issue in plpgsql.

 Second issue is more significant:

 there are bug:

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 foo(i) as (a numeric, b numeric);
sum
 --
  303000.0
 (1 row)

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 foo(i) as (a float, b numeric);
   sum
 ---
  7.33675699577682e-232
 (1 row)

 it produces a wrong result

 And with a minimal change it kills the session

 create or replace function foo(i integer)
 returns setof record as $$
 declare r record;
 begin
   r := (10,20); for i in 1..10 loop return next r; end loop; return;
 end;
 $$ language plpgsql;

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 foo(i) as (a numeric, b numeric);
 The connection to the server was lost. Attempting reset: Failed.

 create or replace function fo(i integer)
 returns record as $$
 declare r record;
 begin
   r := (10,20); return r;
 end;
 $$ language plpgsql;

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 fo(i) as (a int, b numeric);
   sum
 ---
  3
 (1 row)

 postgres=# select sum(a) from generate_series(1,3000) g(i), lateral
 fo(i) as (a numeric, b numeric);
 The connection to the server was lost. Attempting reset: Failed.

 Regards

 Pavel Stehule


 2012/12/3 Asif Rehman asifr.reh...@gmail.com:
  Hi,
 
  Thanks for the review. Please see the updated patch.
 
  Hmm.  We're running an I/O cast during do_tup_convert() now, and look
  up the required functions for each tuple.  I think if we're going to
  support this operation at that level, we need to look up the necessary
  functions at convert_tuples_by_foo level, and then apply unconditionally
  if they've been set up.
 
  Done. TupleConversionMap struct should keep the array of function oid's
  that need to be applied, though only for those cases where both attribute
  type id's do not match. This way no unnecessary casting will happen.
 
 
  Also, what are the implicancies for existing users of tupconvert?  Do we
  want to apply casting during ANALYZE for example?  AFAICS the patch
  shouldn't break any case that works today, but I don't see that there
  has been any analysis of this.
 
  I believe this part of the code should not impact existing users of
  tupconvert, as this part of the code is relaxing a restriction upon which
  an error would have been generated. Though it could have a little impact
  on performance, it should only be for cases where attribute type id's are
  different and are implicitly castable.

  Besides, existing users involve tupconvert in the case of inheritance, and
  in that case an exact match is expected.
 
  Regards,
  --Asif



return_rowtype.v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tablespaces in the data directory

2012-12-04 Thread Andreas Karlsson

On 2012-12-01 14:45, Magnus Hagander wrote:

Someone just reported a problem when they had created a new tablespace
inside the old data directory. I'm sure there can be other issues
caused by this as well, but this is mainly a confusing scenario for
people now.

As there isn't (as far as I know at least) any actual *point* in
creating a tablespace inside the main data directory, should we
perhaps disallow this in CREATE TABLESPACE? Or at least throw a
WARNING if one does it?


Does this apply when creating a tablespace in another tablespace too? If 
the problem is that pg_basebackup copies that data twice it sounds like 
it should.


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Tomas Vondra
On 4.12.2012 20:12, Alexander Korotkov wrote:
 Hi!
 
 On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
 I've tried to apply the patch with the current HEAD, but I'm getting
 segfaults whenever VACUUM runs (either called directly or from autovac
 workers).
 
 The patch applied cleanly against 9b3ac49e and needed a minor fix when
 applied on HEAD (because of an assert added to ginRedoCreatePTree), but
 that shouldn't be a problem.
 
 
 Thanks for testing! Patch is rebased with HEAD. The bug you reported was
 fixed.

Applies fine, but I get a segfault in dataPlaceToPage at gindatapage.c.
The whole backtrace is here: http://pastebin.com/YEPuWeuV

The messages written into PostgreSQL log are quite variable - usually it
looks like this:

2012-12-04 22:31:08 CET 31839 LOG:  database system was not properly
shut down; automatic recovery in progress
2012-12-04 22:31:08 CET 31839 LOG:  redo starts at 0/68A76E48
2012-12-04 22:31:08 CET 31839 LOG:  unexpected pageaddr 0/1BE64000 in
log segment 00010069, offset 15089664
2012-12-04 22:31:08 CET 31839 LOG:  redo done at 0/69E63638

but I've seen this message too

2012-12-04 22:20:29 CET 31709 LOG:  database system was not properly
shut down; automatic recovery in progress
2012-12-04 22:20:29 CET 31709 LOG:  redo starts at 0/AEAFAF8
2012-12-04 22:20:29 CET 31709 LOG:  record with zero length at 0/C7D5698
2012-12-04 22:20:29 CET 31709 LOG:  redo done at 0/C7D55E


I wasn't able to prepare a simple testcase to reproduce this, so I've
attached two files from my fun project where I noticed it. It's a
simple DB + a bit of Python for indexing mbox archives inside Pg.

- create.sql - a database structure with a bunch of GIN indexes on
   tsvector columns on messages table

- load.py - script for parsing mbox archives / loading them into the
messages table (warning: it's a bit messy)


Usage:

1) create the DB structure
$ createdb archives
$ psql archives < create.sql

2) fetch some archives (I consistently get SIGSEGV after first three)
$ wget
http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-01.gz
$ wget
http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-02.gz
$ wget
http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-03.gz

3) gunzip and load them using the python script
$ gunzip pgsql-hackers.*.gz
$ ./load.py --db archives pgsql-hackers.*

4) et voila - a SIGSEGV :-(


I suspect this might be related to the fact that the load.py script uses
savepoints quite heavily to handle UNIQUE_VIOLATION (duplicate messages).
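
(By that I mean roughly the following per-message pattern, issued inside one
long transaction; the table and column names here are only illustrative, the
real schema is in create.sql:)

BEGIN;

SAVEPOINT before_msg;
INSERT INTO messages (message_id, subject, body_plain)
  VALUES ('<some-id@example.com>', 'subject', 'body');
-- if the INSERT raises unique_violation (a duplicate message), the script does:
ROLLBACK TO SAVEPOINT before_msg;
-- otherwise:
RELEASE SAVEPOINT before_msg;

-- ... repeated for every parsed message ...
COMMIT;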


Tomas
#!/bin/env python

import argparse
import datetime
import getpass
import multiprocessing
import os
import psycopg2
import psycopg2.extras
import psycopg2.errorcodes
import quopri
import random
import re
import sys
import traceback
import UserDict

from multiprocessing import Process, JoinableQueue

class Message(dict):
	
	def __init__(self, message):
		self.message = message
		self.body= self.body(message)
		self.headers = self.headers(message)
		self.parts   = self.parts(message)
	
	def __getitem__(self, key):
		if self.headers.has_key(key.lower()):
			return self.headers[key.lower()]
		else:
			return None
	
	def __setitem__(self, key, value):
		self.headers.update({key.lower() : value})
	
	def __delitem__(self, key):
		if self.headers.has_key(key.lower()):
			del self.headers[key.lower()]
	
	def keys(self):
		return self.headers.keys()
	
	def get_body(self):
		return self.body

	def get_raw(self):
		return self.message
	
	def get_parts(self):
		return self.parts
	
	def get_headers(self):
		return self.headers

	def get_content_type(self):
		
		if self.headers.has_key('content-type'):
			return self.headers['content-type'].split(';')[0]
		else:
			return None
		
	def __repr__(self):
		return '%s %s' % (type(self).__name__, self.headers)
	
	def is_multipart(self):
		ctype = self.get_content_type()
		if ctype != None and re.match('multipart/.*', ctype):
			return True
		else:
			return False
	
	def part_boundary(self):
	
		if not self.is_multipart():
			return None
		else:
			r = re.match('.*boundary="?([^"]*)"?', self.headers['content-type'], re.IGNORECASE)
			if r:
return '--' + r.group(1)

	# FIXME this keeps only the last value - needs to keep a list
	def headers(self, message):
		
		lines = message.split("\n")
		
		key = ''
		value = ''
		
		headers = {};
		
		for l in lines:
			if l == '':
if key != '':
	headers.update({key.lower() : value})
break
			
			r = re.match('([a-zA-Z0-9-]*):\s*(.*)', l)
			if r:
if key != '':
	headers.update({key.lower() : value})

key = r.group(1)
value = r.group(2)
			else:
value += ' ' + l.strip()
			
		r = re.match('^From .*@.*\s+([a-zA-Z]*\s+[a-zA-Z]*\s+[0-9]+ [0-9]+:[0-9]+:[0-9]+\s+[0-9]{4})$', lines[0])
		if r:
			headers.update({'message-date' : r.group(1)})
		
		r = re.match('^From 

Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-04 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 However, for a very large user group -- users with ORMs which do their
 own DDL migrations -- we could also use a way to log or WARN for table
 rewrites.  Since the ORMs are not gonna do an EXPLAIN.

 An ORM is also quite unlikely to pay attention to a warning, much less a
 postmaster log entry ... so your argument seems unfounded from here.

 But the DevOps staff *is*.

Sure, and the DevOps staff would be using the EXPLAIN feature (if we had
it).  After which they could do little anyway except complain to the ORM
authors, who might or might not give a damn.  I don't see that there's
enough value-added from what you suggest to justify the development
time.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Maybe we can mark GIN indexes as invalid after pg_upgrade somehow, so
 that they require reindex in the new cluster before they can be used for
 queries or index updates.

Bumping the version number in the GIN metapage would be sufficient.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: store additional info in GIN index

2012-12-04 Thread Bruce Momjian
On Tue, Dec  4, 2012 at 05:35:27PM -0500, Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Maybe we can mark GIN indexes as invalid after pg_upgrade somehow, so
  that they require reindex in the new cluster before they can be used for
  queries or index updates.
 
 Bumping the version number in the GIN metapage would be sufficient.

And it is easy for pg_upgrade to report which indexes need rebuilding,
and it can create a script file to do the reindexing.
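
For instance, a query along these lines, run in each database of the new
cluster, would generate the commands such a script could contain (just a
sketch, not necessarily the exact form pg_upgrade would emit):

SELECT 'REINDEX INDEX ' || quote_ident(n.nspname) || '.' ||
       quote_ident(c.relname) || ';'
  FROM pg_class c
  JOIN pg_am am ON am.oid = c.relam
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'i'
   AND am.amname = 'gin';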

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2012-12-04 Thread Jeff Davis
On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote:
  3. I think we need an explicit test of this feature (as you describe
  above), rather than manual testing. corruptiontester?
 
 I agree, but I'm not 100% sure how to proceed. I'll look at Kevin's
 tests for SSI and see if I can do something similar, but suggestions are
 welcome. A few days away, at the earliest.

I looked into this. The SSI tests still use pg_regress to start/stop the
server, and make use of a lot more of the pg_regress framework.
pg_regress doesn't fit what I need to do, at all.

For me, each test involves a fresh initdb, followed by a small data load
(at least a few pages). Then, I shut down the server, inject the faults
under test, and start the server back up. Then, I count the table and
expect an error. Then I throw away the data directory. (I can shortcut
some of the initdb and load time by keeping a good copy of the table
throughout the whole set of tests and copying it back, but that's just a
detail.)

So, I could try to write a test framework in C that would be a candidate
to include with the main distribution and be run by the buildfarm, but
that would be a lot of work. Even then, I couldn't easily abstract away
these kinds of tests into text files, unless I invent a language that is
suitable for describing disk faults to inject.

Or, I could write up a test framework in ruby or python, using the
appropriate pg driver, and some not-so-portable shell commands to start
and stop the server. Then, I can publish that on this list, and that
would at least make it easier to test semi-manually and give greater
confidence in pre-commit revisions.

Suggestions?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-04 Thread Simon Riggs
On 4 December 2012 22:30, Tom Lane t...@sss.pgh.pa.us wrote:
 Josh Berkus j...@agliodbs.com writes:
 However, for a very large user group -- users with ORMs which do their
 own DDL migrations -- we could also use a way to log or WARN for table
 rewrites.  Since the ORMs are not gonna do an EXPLAIN.

 An ORM is also quite unlikely to pay attention to a warning, much less a
 postmaster log entry ... so your argument seems unfounded from here.

 But the DevOps staff *is*.

 Sure, and the DevOps staff would be using the EXPLAIN feature (if we had
 it).  After which they could do little anyway except complain to the ORM
 authors, who might or might not give a damn.  I don't see that there's
 enough value-added from what you suggest to justify the development
 time.

I've never had my POLA violated, so it's hard to comment on so many buzzphrases.

But normal people I've worked with do struggle with knowing for
certain what will happen when they run a DDL change script on a
production system, which is the heart of the problem. Other software
lags behind the services we provide, but if we never provide it they
definitely will never use it.

Eyeballs work well, but something automated would work even better.

Adding EXPLAIN in front of everything doesn't work at all because many
actions depend upon the results of earlier actions, so the test must
actually perform the action in order for the whole script to work.
Plus, if we have to edit a script to add EXPLAIN in front of
everything, you can't put it live without editing out the EXPLAINs,
which leaves you open to the failure caused by editing immediately
prior to go live.

So my NOREWRITE option fails those criteria and should be discarded.

On reflection, what is needed is a way to ask what just happened
when a script is tested. The best way to do that is a special stats
collection mode that can be enabled just for that run and then the
stats analyzed afterwards. Noah's contribution comes closest to what
we need, but it's not complete enough, yet.
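
(For the narrow case of table rewrites, a crude after-the-fact check is
already possible today by comparing relfilenodes before and after running
the script, e.g. with a throwaway table:)

CREATE TABLE t (c int);

SELECT pg_relation_filenode('t');           -- note the value
ALTER TABLE t ALTER COLUMN c TYPE bigint;   -- this form forces a rewrite
SELECT pg_relation_filenode('t');           -- a new value means the table was rewritten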

I'll have a play with this idea some more.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE ... NOREWRITE option

2012-12-04 Thread Josh Berkus

 Sure, and the DevOps staff would be using the EXPLAIN feature (if we had
 it).  After which they could do little anyway except complain to the ORM
 authors, who might or might not give a damn.  I don't see that there's
 enough value-added from what you suggest to justify the development
 time.

You're still thinking of a schema change as a SQL script.  ORM-based
applications usually do not run their schema changes as SQL scripts,
thus there's nothing to EXPLAIN.  Anything which assumes the presence of
a distinct, user-accessible SQL script is going to leave out a large
class of our users.

However, as I said, if we had the EXPLAIN ALTER, we could use
auto-explain to log the ALTER plans (finally, a good use for
auto-explain).  So that's a workable workaround. And EXPLAIN ALTER would
offer us more flexibility than any logging option, of course.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Tom Lane
I wrote:
 So apparently this is something we broke since Nov 18.  Don't know what
 yet --- any thoughts?

Further experimentation shows that reverting commit
ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work.  So there's
something wrong/incomplete about that fix.

This is a bit urgent since we now have to consider whether to withdraw
9.2.2 and issue a hasty 9.2.3.  Do we have a regression here since
9.2.1, and if so how bad is it?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tablespaces in the data directory

2012-12-04 Thread Noah Misch
On Mon, Dec 03, 2012 at 01:14:30PM -0500, Andrew Dunstan wrote:
 I think it would be reasonable for it to complain if it came across a  
 PG_VERSION file in an unexpected location.

That sounds like a reliable approach to detecting the hazard.  Pseudocode:

chdir(proposed_tablespace_path)
do {
    if (stat("PG_VERSION") == 0)
        ereport(WARNING, ...)
} while (chdir("..") == 0 && not yet at the filesystem root)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Andres Freund
On 2012-12-04 19:35:48 -0500, Tom Lane wrote:
 I wrote:
  So apparently this is something we broke since Nov 18.  Don't know what
  yet --- any thoughts?

 Further experimentation shows that reverting commit
 ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work.  So there's
 something wrong/incomplete about that fix.

ISTM that the code should check ControlFile->backupEndRequired, not just
check for an invalid backupEndPoint. I haven't looked into the specific
issue though.

 This is a bit urgent since we now have to consider whether to withdraw
 9.2.2 and issue a hasty 9.2.3.  Do we have a regression here since
 9.2.1, and if so how bad is it?

Not sure.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] PITR potentially broken in 9.2

2012-12-04 Thread Jeff Janes
On Tue, Dec 4, 2012 at 4:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 So apparently this is something we broke since Nov 18.  Don't know what
 yet --- any thoughts?

 Further experimentation shows that reverting commit
 ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work.  So there's
 something wrong/incomplete about that fix.

I can't independently vouch for the correctness of that fix, but I can
vouch that there is so far no evidence that it is incorrect.

It is re-revealing an undesirable (but safe, as far as we know)
behavior that is present in 9.1.x but which was temporarily hidden by
a corruption-risk bug in 9.2.0 and 9.2.1.


 This is a bit urgent since we now have to consider whether to withdraw
 9.2.2 and issue a hasty 9.2.3.  Do we have a regression here since
 9.2.1, and if so how bad is it?

I don't think this is urgent.  The error-message issue in 9.1.6 and
9.2.2 is merely annoying, while the early-opening one in 9.2.0 and
9.2.1 seems fundamentally unsafe.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-12-04 Thread Josh Kupershmidt
Sorry for the delay in following up here.

On Mon, Nov 26, 2012 at 8:30 PM, Karl O. Pinc k...@meme.com wrote:
 On 11/26/2012 08:45:08 PM, Josh Kupershmidt wrote:
 On Mon, Nov 26, 2012 at 3:42 PM, Robert Haas robertmh...@gmail.com
 wrote:
  On Mon, Nov 26, 2012 at 4:51 PM, Karl O. Pinc k...@meme.com wrote:
  P.S.  An outstanding question regards --truncate-tables
  is whether it should drop indexes before truncate
  and re-create them after restore.  Sounds like it should.
 
  Well, that would improve performance, but it also makes the behavior
  of object significantly different from what one might expect from the
  name.  One of the problems here is that there seem to be a number of
  slightly-different things that one might want to do, and it's not
  exactly clear what all of them are, or whether a reasonable number of
  options can cater to all of them.

 Another problem: attempting to drop a unique constraint or primary key
 (if we're counting these as indexes to be dropped and recreated, which
 they should be if the goal is reasonable restore performance) which is
 referenced by another table's foreign key will cause:
   ERROR:  cannot drop constraint xxx on table yyy
   because other objects depend on it

 and as discussed upthread, it would be impolite for pg_restore to
 presume it should monkey with dropping+recreating other tables'
 constraints to work around this problem, not to mention impossible
 when pg_restore is not connected to the target database.

 I'm thinking impossible because it's impossible to know
 what the existing FKs are without a db connection.  Impossible is
 a problem.  You may have another reason why it's impossible.

Yes, that's what I meant.

 Meanwhile it sounds like the --truncate-tables patch
 is looking less and less desirable.  I'm ready for
 rejection, but will soldier on in the interest of
 not wasting other people's work on this, if given
 direction to move forward.

Well, as far as I was able to tell, the use-case where this patch
worked without trouble was limited to restoring a table, or schema
with table(s), that:
 a.) has some view(s) dependent on it
 b.) has no other tables with FK references to it, so that we don't run into:
ERROR:  cannot truncate a table referenced in a foreign key constraint
 c.) is not so large that it takes forever for data to be restored
with indexes and constraints left intact
 d.) and whose admin does not want to use --clean plus a list-file
which limits pg_restore to the table and its views

I was initially hoping that the patch would be more useful for
restoring a table with FKs pointing to it, but it seems the only
reliable way to do this kind of selective restore with pg_restore is
with --clean and editing the list-file. Editing the list-file is
certainly tedious and prone to manual error, but I'm not sure this
particular patch has a wide enough use-case to alleviate that pain
significantly.
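
(To make the FK limitation concrete, this is the behavior in question:)

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child  (parent_id int REFERENCES parent (id));

TRUNCATE parent;
-- ERROR:  cannot truncate a table referenced in a foreign key constraint

TRUNCATE parent, child;    -- allowed, but touches a table outside the restore set
TRUNCATE parent CASCADE;   -- likewise recurses into child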

Josh


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestion for --truncate-tables to pg_restore

2012-12-04 Thread Karl O. Pinc
On 12/04/2012 09:26:47 PM, Josh Kupershmidt wrote:
 Sorry for the delay in following up here.

No problem at all.

 Well, as far as I was able to tell, the use-case where this patch
 worked without trouble was limited to restoring a table, or schema
 with table(s), that:
  a.) has some view(s) dependent on it
  b.) has no other tables with FK references to it, so that we don't
 run into:
 ERROR:  cannot truncate a table referenced in a foreign key 
 constraint
  c.) is not so large that it takes forever for data to be restored
 with indexes and constraints left intact
  d.) and whose admin does not want to use --clean plus a list-file
 which limits pg_restore to the table and its views
 
 I was initially hoping that the patch would be more useful for
 restoring a table with FKs pointing to it, but it seems the only
 reliable way to do this kind of selective restore with pg_restore is
 with --clean and editing the list-file. Editing the list-file is
 certainly tedious and prone to manual error, but I'm not sure this
 particular patch has a wide enough use-case to alleviate that pain
 significantly.

I think there must be a reliable way to restore tables with FKs 
pointing to them, but getting pg_restore to do it seems perilous; at 
least given your expectations for the behavior of pg_restore in the
context of the existing option set.

As with you, I am not inclined to add another option to pg_restore
unless it's really useful.  (Pg_restore already has gobs of options, 
to the point where it's getting sticky.)  I don't think this 
patch meets the utility bar.  It might be able to be re-worked into 
something useful, or it might need to evolve into some sort of new 
restore utility per your thoughts.  For now better to reject it until
the right/comprehensive way to proceed is figured out.

Thanks for all your work.  I hope that this has at least
moved the discussion forward and not been a waste of everybody's
time.  I would like to see a better way of restoring
tables that are FK reference targets.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] pg_ping utility

2012-12-04 Thread Phil Sorber
On Mon, Dec 3, 2012 at 11:59 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Sun, Dec 2, 2012 at 5:56 AM, Phil Sorber p...@omniti.com wrote:

 Here is the updated patch. I renamed it, but using v5 to stay consistent.


 After looking at this patch, I found the following problems:
 - There are a couple of whitespaces still in the code, particularly at the
 end of those lines
 +   const char *pguser = NULL;
 +   const char *pgdbname = NULL;

Mystery how those got in there. Fixed.

 - When describing the values that are returned by pg_isready, it is awkward
 to refer to the returning values as plain integers as those values are part
 of an enum. You should add references to PQPING_OK, PQPING_REJECT,
 PQPING_NO_RESPONSE and PQPING_NO_ATTEMPT instead. Also add reference
 links in the docs redirecting to them, with things of the type <xref
 linkend="libpq-pqpingparams-pqping-ok"> or related.

Fixed. I changed the wording a little too.

 - Same thing with this example:
 +   <para>
 +    Standard Usage:
 +<screen>
 + <prompt>$</prompt> <userinput>pg_isready</userinput>
 + <prompt>$</prompt> <userinput>echo $?</userinput>
 + <computeroutput>0</computeroutput>
 +</screen>
 +   </para>
 For the time being PQPING_OK returns 0 because it is at the top of the enum
 PGPing, but this might change if for one reason or another the order of
 outputs is changed.

So I understand what you mean by the ordering might change, but this
is actual output from the shell. I'm not sure how to convey that
sentiment properly here and still have a real example. Perhaps just
remove the example?


 Once those things are fixed, I think this will be ready for committer review
 as everybody here seem to agree with your approach.

 --
 Michael Paquier
 http://michael.otacoo.com


pg_isready_bin_v6.diff
Description: Binary data


pg_isready_docs_v6.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] pg_ping utility

2012-12-04 Thread Michael Paquier
On 2012/12/05, at 14:46, Phil Sorber p...@omniti.com wrote:

 On Mon, Dec 3, 2012 at 11:59 PM, Michael Paquier
 
 So I understand what you mean by the ordering might change, but this
 is actual output from the shell. I'm not sure how to convey that
 sentiment properly here and still have a real example. Perhaps just
 remove the example?
Yes, removing the example would be ok.

Does someone have another idea of example?

Thanks,
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: Deparsing DDL command strings

2012-12-04 Thread Pavel Stehule
2012/12/4 Dimitri Fontaine dimi...@2ndquadrant.fr:
 Pavel Stehule pavel.steh...@gmail.com writes:
 I agree with these premisses. I am sure so normalised commands strings
 are very nice feature, that can be helpful for generation clean audit
 logs and messages. I see a one issue - testing a consistency between
 commands and their deparsed forms.

 Thanks.

 Do you have some idea, how to do these tests - Maybe we can write
 extension, that will try repeated parsing and deparsing - normalised
 string should be same. I am not sure, if this test should be part of
 regressions test, but we have to thinking about some tool for
 checking.

 we can take commands via hooks and we can check ORIGINAL CMD ==>
 tree1 ==> DEPARSED STRING ==> tree2 && tree1 = tree2

 We could have an event trigger that stores the statements in a table,
 does some object description like \d or \df+, then drops them all and
 replay all the statements from the table and describe the objects again.

 I'm not sure how to check that the descriptions are the same from the
 regression tests themselves, but once that's manually/externally sorted
 out the current facilities will sure provide guards against any
 regression.

 Please, can others write an agreement or any rejection now?
 Does somebody know why this concept is not acceptable? Speak now, please.

 +1 :)

 I've added support for create and drop function in the attached patch.
 Not all commands are rewritten yet though, and not all of them even have
 support for the TG_OBJECTID, TG_OBJECTNAME and TG_SCHEMANAME variables.

 I have no problem with defined list of fully unsupported statements.

 Cool. The goal still is to support all functions, but I need approval
 first. It's too much work to be wasted 2 months from now.

 I had a small problem with patching and compilation produce warnings too

 I will look into that. I know I had to switch in between -O0 and -O2 to
 be able to debug some really strange things that happened to have been
 due to our build infrastructure not being able to cope with some header
 changes. Days like that I really hate the C developer tooling.

 postgres=# create table bubuaa(a int unique not null check (a  20));
 NOTICE:  snitch: ddl_command_end CREATE TABLE CREATE TABLE
 public.bubuaa (a pg_catalog.int4 UNIQUE NOT NULL CHECK ((a  20)),
 CHECK ((a  20)));
 NOTICE:  snitch: ddl_command_end CREATE INDEX CREATE UNIQUE INDEX ON
 public.bubuaa USING btree (a);
 CREATE TABLE

 the constraint appears twice there - it is probably the same one, but it is confusing

 That might be due to some refactoring I just did for the ALTER TABLE.
 Sharing code means action at a distance sometimes. I need to begin
 adding regression tests, but they will look a lot like the ones Robert
 did insist on removing last time…

 postgres=# \dy
 ERROR:  column evtctxs does not exist
 LINE 1: ... array_to_string(array(select x   from unnest(evtctxs) a...

 My guess is that you didn't initdb the version you tested. As usual I
 didn't include the catalog bump in my patch, but initdb still is needed.

 I was a little bit surprised that a general event trigger without a tag
 predicate was not triggered for the CREATE FUNCTION statement. Is it
 documented somewhere?

 Works for me (from memory). Sequels of the initdb problem?

yes, the last two issues are related to the missing initdb

Regards

Pavel


 Regards,
 --
 Dimitri Fontaine
 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers