Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks

2012-01-19 Thread Kohei KaiGai
2012/1/19 Robert Haas robertmh...@gmail.com:
 On Wed, Jan 18, 2012 at 9:50 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 On the sepgsql side, it determines whether to apply permission checks
 according to the contextual information; that is the same technique we
 used when we implemented the create permission.
 Thus, it can check db_xxx:{drop} permission correctly.

 Why do we need the contextual information in this case?  Why
 can't/shouldn't the decision be made solely on the basis of what
 object is targeted?

Several code paths that remove a particular object are not appropriate
places to apply permission checks. For example...

[1] Cleanup of temporary objects
on_shmem_exit() registers RemoveTempRelationsCallback(), which
eventually calls deleteWhatDependsOn(), which in turn invokes
deleteOneObject().
This code path is just an internal cleanup process, not related to
the permissions of the client.

[2] Cleanup of transient table at VACUUM FULL/CLUSTER command
rebuild_relation() creates a temporary table with make_new_heap(),
then copies the contents of the original table in index order, and
calls finish_heap_swap(), which swaps the relation files and removes
the temporary table using performDeletion().
This code actually creates and drops a table; however, it is purely an
internal mechanism and not related to the permissions of the client.

So, I think sepgsql should apply permission checks only to object
deletions invoked by the user's operations, such as DropStmt.
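
To make the distinction concrete, a minimal sketch (the table name is
only illustrative):

-- User-initiated drop: DropStmt reaches deleteOneObject(), so sepgsql
-- should check db_table:{drop} on the target.
DROP TABLE accounts;

-- Internal drop: VACUUM FULL rebuilds the relation via make_new_heap()
-- and finish_heap_swap(), then removes the transient table with
-- performDeletion(); no client-level {drop} check should fire here.
VACUUM FULL accounts;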

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] Strange primary key constraint influence to grouping

2012-01-19 Thread Gražvydas Valeika
Thanks for the explanation.
Now I remember the discussion on the hackers list about this feature,
but it still surprised me a little bit.

G.


Re: [HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name

2012-01-19 Thread Hitoshi Harada
On Wed, Jan 18, 2012 at 1:11 AM, Hitoshi Harada umi.tan...@gmail.com wrote:
 On Sat, Jan 14, 2012 at 8:10 AM, Matthew Draper matt...@trebex.net wrote:

 I just remembered to make time to advance this from WIP to proposed
 patch this week... and then worked out I'm rudely dropping it into the
 last commitfest at the last minute. :/

 The patch applies cleanly against master but compiles with warnings.
 functions.c: In function ‘prepare_sql_fn_parse_info’:
 functions.c:212: warning: unused variable ‘argnum’
 functions.c: In function ‘sql_fn_post_column_ref’:
 functions.c:341: warning: implicit declaration of function ‘ParseFuncOrColumn’
 functions.c:345: warning: assignment makes pointer from integer without a cast

 (Now it occurred to me that a missing #include of parse_func.h might be
 causing this breakage..., so I'll fix it here and continue to test, but
 if you'd rather fix it yourself, let me know.)

I fixed it here and it now works in my environment. The regression
tests pass and the feature seems to work as intended, but it seems to
me that it needs more test cases and documentation. For the tests, I
believe we at least need the ambiguous case given upthread, so that we
can be sure compatibility is preserved. For the documentation, it should
describe the name resolution rule, as stated in the patch comment.
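
For reference, the kind of ambiguous case meant here could look like
this (an illustrative sketch; the table and names are hypothetical):

CREATE TABLE t (a int, b int);

-- Inside the body, a bare "a" could refer to either the parameter or
-- the column t.a; the tests and docs should pin down which one wins.
CREATE FUNCTION f(a int) RETURNS int AS $$
  SELECT b FROM t WHERE a = f.a
$$ LANGUAGE sql;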

Aside from that, I wondered at first what happens if the function is
schema-qualified. Say,

CREATE FUNCTION s.f(a int) RETURNS int AS $$
  SELECT b FROM t WHERE a = s.f.a
$$ LANGUAGE sql;

It actually errors out, since a function-name-qualified parameter
reference accepts only the function name without the schema name, but it
looked weird to me at first. I have no better idea at the moment, though.
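
For comparison, qualifying with just the function name (no schema) is
the form that is accepted:

CREATE FUNCTION s.f(a int) RETURNS int AS $$
  SELECT b FROM t WHERE a = f.a   -- f.a, not s.f.a
$$ LANGUAGE sql;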

I'm marking this Waiting on Author for now.

Thanks,
-- 
Hitoshi Harada



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Hitoshi Harada
On Sun, Jan 15, 2012 at 10:46 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 01/16/2012 01:28 AM, Greg Smith wrote:

 -I can't tell for sure if this is working properly when log_checkpoints is
 off.  This now collects checkpoint end time data in all cases, whereas
 before it ignored that work if log_checkpoints was off.


 ...and there's at least one I missed located already:  inside of md.c.  I'd
 forgotten how many spots where timing calls are optimized out are floating
 around this code path.

I think the CF app page for this patch has a broken link due to a wrong message-id.

Thanks,
-- 
Hitoshi Harada



[HACKERS] Re: WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation

2012-01-19 Thread Heikki Linnakangas

On 23.12.2011 02:01, Phil Sorber wrote:

On Thu, Dec 22, 2011 at 3:19 PM, Robert Haas robertmh...@gmail.com wrote:

On Thu, Dec 22, 2011 at 2:02 PM, Phil Sorber p...@omniti.com wrote:

On Thu, Dec 22, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes:

I'm wondering if we oughta just return NULL and be done with it.


+1.  There are multiple precedents for that sort of response, which we
introduced exactly so that SELECT some_function(oid) FROM some_catalog
wouldn't fail just because one of the rows had gotten deleted by the
time the scan got to it.  I don't think it's necessary for the
relation-size functions to be any smarter.  Indeed, I'd assumed that's
all that Phil's patch did, since I'd not looked closer till just now.


Here it is without the check for recently-dead relations. If it can't
open the relation, it simply returns NULL.


I think we probably ought to make pg_database_size() and
pg_tablespace_size() behave similarly.


Changes added.


Looks good to me, committed. I added a sentence to the docs mentioning 
the new behavior, and also a code comment to explain why returning NULL 
is better than throwing an error.
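
For reference, the catalog-sweep pattern this protects is along these
lines (a sketch); with the change, a concurrently dropped relation now
yields NULL instead of aborting the whole query:

SELECT c.oid::regclass AS relation,
       pg_relation_size(c.oid) AS bytes
FROM pg_class c
ORDER BY bytes DESC NULLS LAST;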


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] review: tracking temporary files in pg_stat_database

2012-01-19 Thread Pavel Stehule
Hello

This is a review of Tomas' patch for counting temporary file usage:

* The patch applied cleanly and the code compiled
* All 128 tests passed
* There are no regression tests of its own, but pg_stat_* functions are usually not tested anyway
* There is adequate documentation
* This patch was requested
http://archives.postgresql.org/pgsql-hackers/2011-12/msg00950.php
* The code follows the PostgreSQL coding standards
* The code works


postgres=# create table xx(a int);
CREATE TABLE
postgres=# insert into xx select (random()*10)::int from
generate_series(1,200000);
INSERT 0 200000
postgres=# \d+ xx;
                         Table "public.xx"
 Column |  Type   | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
 a      | integer |           | plain   |              |
Has OIDs: no

postgres=# \dt+ xx;
                    List of relations
 Schema | Name | Type  | Owner |  Size   | Description
--------+------+-------+-------+---------+-------------
 public | xx   | table | pavel | 7104 kB |
(1 row)

postgres=# set work_mem to '1MB';
SET
postgres=# select
pg_stat_get_db_temp_bytes(12899),pg_stat_get_db_temp_files(12899);
 pg_stat_get_db_temp_bytes | pg_stat_get_db_temp_files
---------------------------+---------------------------
                9889486560 |                      4103
(1 row)

postgres=# select * from xx order by 1;

postgres=# select
pg_stat_get_db_temp_bytes(12899),pg_stat_get_db_temp_files(12899);
 pg_stat_get_db_temp_bytes | pg_stat_get_db_temp_files
---------------------------+---------------------------
                9892288224 |                      4104
(1 row)
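
For completeness, the same counters can also be read per database from
the statistics view (a sketch, assuming the patch exposes them as
temp_files and temp_bytes in pg_stat_database):

select datname, temp_files, pg_size_pretty(temp_bytes) as temp_size
from pg_stat_database
where datname = current_database();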

This patch is ready for commit

Regards

Pavel Stehule



Re: [HACKERS] IDLE in transaction introspection

2012-01-19 Thread Magnus Hagander
On Wed, Jan 18, 2012 at 16:35, Scott Mead sco...@openscg.com wrote:

 On Wed, Jan 18, 2012 at 8:27 AM, Magnus Hagander mag...@hagander.net
 wrote:

 On Tue, Jan 17, 2012 at 01:43, Scott Mead sco...@openscg.com wrote:
 
 
  On Mon, Jan 16, 2012 at 6:10 PM, Scott Mead sco...@openscg.com wrote:
 
 
  On Sun, Jan 15, 2012 at 4:28 AM, Greg Smith g...@2ndquadrant.com
  wrote:
 
  On 01/12/2012 11:57 AM, Scott Mead wrote:
 
  Pretty delayed, but please find the attached patch that addresses all
  the issues discussed.
 
 
  The docs on this v4 look like they suffered a patch order problem
  here.
   In the v3, you added a whole table describing the pg_stat_activity
  documentation in more detail than before.  v4 actually tries to remove
  those
  new docs, a change which won't even apply as they don't exist
  upstream.
 
  My guess is you committed v3 to somewhere, applied the code changes
  for
  v4, but not the documentation ones.  It's easy to do that and end up
  with a
  patch that removes a bunch of docs the previous patch added.  I have
  to be
  careful to always do something like git diff origin/master to avoid
  this
  class of problem, until I got into that habit I did this sort of thing
  regularly.
 
  gak
 
 
  I did a 'backwards' diff last time.  This time around, I diff-ed off of
  a
  fresh pull of 'master' (and I did the diff in the right direction).
 
  Also includes whitespace cleanup and the pg_stat_replication (procpid
  ==
  pid) regression fix.
 

 I'm reviewing this again, and have changed a few things around. I came
 up with a question, too :-)

 Right now, if you turn off track activities, we put <command string
 not enabled> in the query text. Shouldn't this also be a state, such
 as disabled? It seems more consistent to me...


 Sure, I was going the route of 'client_addr', i.e. leaving it null when not
 in use just to keep screen clutter down, but I'm not married to it.

Applied with fairly extensive modifications. I moved things around,
switched to using enum instead of int+#define and a few things like
that. Also changed most of the markup in the docs - I may well have
broken some previously good language there, so if I did and someone
spots it, please mention it :-)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Alvaro Herrera

Excerpts from Hitoshi Harada's message of Thu Jan 19 07:08:52 -0300 2012:
 On Sun, Jan 15, 2012 at 10:46 PM, Greg Smith g...@2ndquadrant.com wrote:
  On 01/16/2012 01:28 AM, Greg Smith wrote:
 
  -I can't tell for sure if this is working properly when log_checkpoints is
  off.  This now collects checkpoint end time data in all cases, whereas
  before it ignored that work if log_checkpoints was off.
 
 
  ...and there's at least one I missed located already:  inside of md.c.  I'd
  forgotten how many spots where timing calls are optimized out are floating
  around this code path.
 
 I think the CF app page for this patch has a broken link due to a wrong message-id.

Fixed, thanks

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Inline Extension

2012-01-19 Thread Heikki Linnakangas

On 08.01.2012 22:36, Dimitri Fontaine wrote:

The extension mechanism we added in 9.1 is aimed at allowing a fully
integrated contrib management, which was big enough a goal to preclude
doing anything else in its first release.


Hooray!


Now we have it and we can think some more about what features we want
covered, and a pretty obvious one that's been left out is the ability to
define and update an extension without resorting to file system support
for those extensions that do not need a shared object library. We could
have called these “SQL only” extensions, but to simplify the grammar
support I used the “inline” keyword, so there we go.


Frankly I don't see the point of this. If the extension is an 
independent piece of (SQL) code, developed separately from an 
application, with its own lifecycle, a .sql file seems like the best way 
to distribute it. If it's not, ie. if it's an integral part of the 
database schema, then why package it as an extension in the first place?



Please find attached a WIP patch implementing that.  Note that the main
core benefit to integrating this feature is the ability to easily add
regression tests for extension related features.  Which is not done yet
in the attached.


I'm not sure I buy that argument. These inline extensions are 
sufficiently different from regular extensions that I think you'd need 
to have regression tests for both kinds, anyway.



I'm sending this quite soon because of the pg_dump support.  When an
extension is inline, we want to dump its content, as we currently do in
the binary dump output.  I had in mind that we could output a full
CREATE EXTENSION INLINE script in between some dollar-quoting rather
than adding each extension's object with an ALTER EXTENSION ... ADD line
like what pg_upgrade compatibility is currently doing.


I thought the main point of extensions is that they're not included 
in pg_dump. Again, if the extension is an integral part of the database, 
then it probably shouldn't be an extension in the first place.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Heikki Linnakangas

On 12.01.2012 14:31, Simon Riggs wrote:

In order to simulate real-world clog contention, we need to use
benchmarks that deal with real world situations.

Currently, pgbench pre-loads data using COPY and executes a VACUUM so
that all hint bits are set on every row of every page of every table.
Thus, as pgbench runs it sees zero clog accesses from historical data.
As a result, clog access is minimised and the effects of clog
contention in the real world go unnoticed.

The following patch adds a pgbench option -I to load data using
INSERTs, so that we can begin benchmark testing with rows that have
large numbers of distinct un-hinted transaction ids. With a database
pre-created using this we will be better able to simulate and thus
more easily measure clog contention. Note that current clog has space
for 1 million xids, so a scale factor of greater than 10 is required
to really stress the clog.


No doubt this is handy for testing this particular area, but overall I 
feel this is too much of a one-trick pony to include in pgbench.


Alternatively, you could do something like this:

do $$
declare
  i int4;
  naccounts int4;
begin
  select count(*) into naccounts from pgbench_accounts;
  for i in 1..naccounts loop
-- use a begin-exception block to create a new subtransaction
begin
  update pgbench_accounts set abalance = abalance where aid = i;
    exception
      when division_by_zero then raise 'unexpected division by zero error';
    end;

  end loop;
end;
$$;

after initializing the pgbench database, to assign distinct xmins to all 
rows. Another idea would be to run pg_dump in --inserts mode, edit the 
dump to remove BEGIN/COMMIT from it, and restore it back.
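
Either way, the resulting starting condition can be sanity-checked with
something like:

-- count the distinct (still unhinted) xids now carried by the rows
SELECT count(DISTINCT xmin::text) FROM pgbench_accounts;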


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] IDLE in transaction introspection

2012-01-19 Thread Fujii Masao
On Thu, Jan 19, 2012 at 10:31 PM, Magnus Hagander mag...@hagander.net wrote:
 Applied with fairly extensive modifications. I moved things around,
 switched to using enum instead of int+#define and a few things like
 that. Also changed most of the markup in the docs - I may well have
 broken some previously good language there, so if I did and someone
 spots it, please mention it :-)

It looks like the attached patch still needs to be committed.

BTW, the following change in the patch is not strictly required, but
ISTM that it's better to change it this way.

-SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
-   pg_stat_get_backend_activity(s.backendid) AS current_query
+SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
+   pg_stat_get_backend_activity(s.backendid) AS query
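
For reference, with the renamed columns a typical introspection query
becomes something like this (a sketch, assuming the committed names
pid, state and query):

SELECT pid, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;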

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


noname
Description: Binary data



Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Peter Geoghegan
On 19 January 2012 14:36, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 No doubt this is handy for testing this particular area, but overall I feel
 this is too much of a one-trick pony to include in pgbench.

I don't think that being conservative in accepting pgbench options is
the right way to go. It's already so easy for a non-expert to shoot
themselves in the foot that we don't do ourselves any favours by
carefully weighing the merits of an expert-orientated feature.

Have you ever read the man page for rsync? It's massive, with a huge
number of options, and rsync is supposed to be a tool that's widely
used by sysadmins, not a specialist database benchmarking tool.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-19 Thread Simon Riggs
On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Can I just check with you that the only review comment is a one line
 change? Seems better to make any additional review comments in one go.

 No, I haven't done a full review yet - I just noticed that right at
 the outset and wanted to be sure that got addressed.  I can look it
 over more carefully, and test it.

Corrected your point, and fixed another issue found during retest.

Works as advertised, docs corrected.
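
For reference, the documented usage boils down to this (a sketch; the
index name is hypothetical):

-- Must run outside a transaction block, names exactly one index,
-- and accepts no CASCADE option.
DROP INDEX CONCURRENTLY accounts_balance_idx;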

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..aeb1531 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+	[ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+	CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>
 </synopsis>
  </refsynopsisdiv>
 
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ..
   </varlistentry>
 
   <varlistentry>
+   <term><literal>CONCURRENTLY</literal></term>
+   <listitem>
+    <para>
+     When this option is used, <productname>PostgreSQL</> will drop the
+     index without taking any locks that prevent concurrent selects, inserts,
+     updates, or deletes on the table; whereas a standard index drop
+     waits for a lock that locks out everything on the table until it's done.
+     Concurrent drop index is a two stage process. First, we mark the index
+     both invalid and not ready then commit the change. Next we wait until
+     there are no users locking the table who can see the index.
+    </para>
+    <para>
+     There are several caveats to be aware of when using this option.
+     Only one index name can be specified if the <literal>CONCURRENTLY</literal>
+     parameter is specified. Only one concurrent index drop can occur on a
+     table at a time and no modifications on the table are allowed meanwhile.
+     Regular <command>DROP INDEX</> command can be performed within
+     a transaction block, but <command>DROP INDEX CONCURRENTLY</> cannot.
+     There is no CASCADE option when dropping an index concurrently.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
   <term><replaceable class="PARAMETER">name</replaceable></term>
   <listitem>
    <para>
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 0b3d489..6884fdb 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -171,9 +171,10 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
 	   DropBehavior behavior,
 	   int msglevel,
 	   const ObjectAddress *origObject);
-static void deleteOneObject(const ObjectAddress *object, Relation depRel);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void deleteOneObject(const ObjectAddress *object, Relation depRel,
+			int option_flags);
+static void doDeletion(const ObjectAddress *object, int option_flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int option_flags);
 static void ReleaseDeletionLock(const ObjectAddress *object);
 static bool find_expr_references_walker(Node *node,
 			find_expr_references_context *context);
@@ -224,7 +225,7 @@ performDeletion(const ObjectAddress *object,
 	 * Acquire deletion lock on the target object.	(Ideally the caller has
 	 * done this already, but many places are sloppy about it.)
 	 */
-	AcquireDeletionLock(object);
+	AcquireDeletionLock(object, 0);
 
 	/*
 	 * Construct a list of objects to delete (ie, the given object plus
@@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object,
 	{
 		ObjectAddress *thisobj = targetObjects->refs + i;
 
-		deleteOneObject(thisobj, depRel);
+		deleteOneObject(thisobj, depRel, 0);
 	}
 
 	/* And clean up */
@@ -276,6 +277,13 @@ void
 performMultipleDeletions(const ObjectAddresses *objects,
 		 DropBehavior behavior)
 {
+	performMultipleDeletionsWithOptions(objects, behavior, 0);
+}
+
+void
+performMultipleDeletionsWithOptions(const ObjectAddresses *objects,
+		 DropBehavior behavior, int option_flags)
+{
 	Relation	depRel;
 	ObjectAddresses *targetObjects;
 	int			i;
@@ -308,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
 		 * Acquire deletion lock on each target object.  (Ideally the caller
 		 * has done this already, but many places are sloppy about it.)
 		 */
-		AcquireDeletionLock(thisobj);
+		AcquireDeletionLock(thisobj, option_flags);
 
 		findDependentObjects(thisobj,
 			 DEPFLAG_ORIGINAL,
@@ -336,13 +344,17 @@ 

Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Simon Riggs
On Thu, Jan 19, 2012 at 2:36 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 12.01.2012 14:31, Simon Riggs wrote:

 In order to simulate real-world clog contention, we need to use
 benchmarks that deal with real world situations.

 Currently, pgbench pre-loads data using COPY and executes a VACUUM so
 that all hint bits are set on every row of every page of every table.
 Thus, as pgbench runs it sees zero clog accesses from historical data.
 As a result, clog access is minimised and the effects of clog
 contention in the real world go unnoticed.

 The following patch adds a pgbench option -I to load data using
 INSERTs, so that we can begin benchmark testing with rows that have
 large numbers of distinct un-hinted transaction ids. With a database
 pre-created using this we will be better able to simulate and thus
 more easily measure clog contention. Note that current clog has space
 for 1 million xids, so a scale factor of greater than 10 is required
 to really stress the clog.


 No doubt this is handy for testing this particular area, but overall I feel
 this is too much of a one-trick pony to include in pgbench.

 Alternatively, you could do something like this:

I think the one-trick pony is pgbench. It has exactly one starting
condition for its tests and that isn't even a real world condition.

The main point of including the option in pgbench is to have a
utility that produces an initial test condition that works the same
for everyone, so we can accept each other's benchmark results. We both
know that if someone posts that they have done $RANDOMSQL on a table
before running a test, it will just be ignored and people will say
"user error". Some people will get it wrong when reproducing things and
we'll have chaos.

The patch exists as a way of testing the clog contention improvement
patches and provides a route to long term regression testing that the
solution(s) still work.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Inline Extension

2012-01-19 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Frankly I don't see the point of this. If the extension is an independent
 piece of (SQL) code, developed separately from an application, with its own
 lifecycle, a .sql file seems like the best way to distribute it. If it's
 not, ie. if it's an integral part of the database schema, then why package
 it as an extension in the first place?

It allows you to easily deploy an extension to N databases (my current
use case has 256 databases) and to know which version is installed on
each server. It's easier to QA your procedures and upgrades when they
are packaged as extensions, too.

Now, as for the dependency on a SQL file hosting the content, it's
easier to just connect to the databases and send them the script in the
SQL command rather than deploying a set of files: that means OS-level
packaging, either RPM or Debian or some other variant, or some other
means of easily deploying the files. An SQL connection is all you need
if you're not shipping a .so.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Avoiding shutdown checkpoint at failover

2012-01-19 Thread Simon Riggs
On Wed, Jan 18, 2012 at 7:15 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Nov 13, 2011 at 5:13 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Nov 1, 2011 at 12:11 PM, Simon Riggs si...@2ndquadrant.com wrote:

 When I say skip the shutdown checkpoint, I mean remove it from the
 critical path of required actions at the end of recovery. We can still
 have a normal checkpoint kicked off at that time, but that no longer
 needs to be on the critical path.

 Any problems foreseen? If not, looks like a quick patch.

 Patch attached for discussion/review.

 This feature is what I want, and very helpful to shorten the failover time in
 streaming replication.

 Here are the review comments. Though I've not checked enough whether
 this feature works fine in all recovery patterns yet.

 LocalSetXLogInsertAllowed() must be called before LogEndOfRecovery().
 LocalXLogInsertAllowed must be set to -1 after LogEndOfRecovery().

 XLOG_END_OF_RECOVERY record is written to the WAL file with new
 assigned timeline ID. But it must be written to the WAL file with old one.
 Otherwise, when re-entering a recovery after failover, we cannot find
 XLOG_END_OF_RECOVERY record at all.

 Before XLOG_END_OF_RECOVERY record is written,
 RmgrTable[rmid].rm_cleanup() might write WAL records. They also
 should be written to the WAL file with old timeline ID.

 When a recovery target is specified, we cannot write new WAL to the file
 with the old timeline, because that would mean valid WAL records in it
 get overwritten with new WAL. So when a recovery target is specified,
 ISTM that we cannot skip end of recovery checkpoint. Or we might need
 to save all information about timelines in the database cluster instead
 of writing XLOG_END_OF_RECOVERY record, and use it when re-entering
 a recovery.

 LogEndOfRecovery() seems to need to call XLogFlush(). Otherwise,
 what if the server crashes after the new timeline history file is created
 and recovery.conf is removed, but before the XLOG_END_OF_RECOVERY record
 has been flushed to disk?

 During recovery, when we replay XLOG_END_OF_RECOVERY record, we
 should close the currently-opened WAL file and read the WAL file with
 the timeline which XLOG_END_OF_RECOVERY record indicates.
 Otherwise, when re-entering a recovery with old timeline, we cannot
 reach new timeline.



OK, some bad things there, thanks for the insightful comments.



I think you're right that we can't skip the checkpoint if xlog_cleanup
writes WAL records, since that implies at least one and maybe more
blocks have changed and need to be flushed. That can be improved upon,
but not now in 9.2. Cleanup WAL is written in either the old or the new
timeline, depending upon whether we increment it. So we don't need to
change anything there, IMHO.

The big problem is how we handle crash recovery after we startup
without a checkpoint. No quick fixes there.

So let me rethink this: The idea was that we can skip the checkpoint
if we promote to normal running during streaming replication.

WALReceiver has been writing to WAL files, so can write more data
without all of the problems noted. Rather than write the
XLOG_END_OF_RECOVERY record via XLogInsert we should write that **from
the WALreceiver** as a dummy record by direct injection into the WAL
stream. So the Startup process sees a WAL record that looks like it
was written by the primary saying "promote yourself", although it was
actually written locally by the WALreceiver when requested to shut down.
That doesn't damage anything because we know we've received all the
WAL there is. Most importantly we don't need to change any of the
logic in a way that endangers the other code paths at end of recovery.

Writing the record in that way means we would need to calculate the
new tli slightly earlier, so we can input the correct value into the
record. That also solves the problem of how to get additional standbys
to follow the new master. The XLOG_END_OF_RECOVERY record is simply
the contents of the newly written tli history file.

If we skip the checkpoint and then crash before the next checkpoint we
just change timeline when we see XLOG_END_OF_RECOVERY. When we replay
the XLOG_END_OF_RECOVERY we copy the contents to the appropriate tli
file and then switch to it.

So this solves 2 problems: having other standbys follow us when they
don't have archiving, and avoids the checkpoint.

Let me know what you think.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-19 Thread Noah Misch
Hi Mikko,

First, for everyone else: this patch provides a more-compact binary output
format for arrays.  When the array contains no NULL elements and has a
fixed-length element type, the new format saves four bytes per array element.
We could do more.  We could add support for arrays containing NULLs by way of
a nulls bitmap.  We could reduce the per-array overhead, currently 20 bytes
for a 1-D array, in addition to the per-element overhead.  Does anyone care
about these broader cases?  If so, speak now.

On Thu, Dec 01, 2011 at 04:42:43PM +0200, Mikko Tiihonen wrote:
 On 11/30/2011 06:52 PM, Merlin Moncure wrote:
 On Mon, Nov 28, 2011 at 9:18 AM, Mikko Tiihonen
 mikko.tiiho...@nitorcreations.com  wrote:
 Hi,

 As discussed few days ago it would be beneficial if we could change the v3
 backend-client protocol without breaking backwards compatibility.

 Here is a working patch that exports a supported_binary_minor constant
 and a binary_minor session variable that can be used by clients to enable
 newer features.

 I also added an example usage where the array encoding for constant size
 elements is optimized if binary_minor version is new enough.

 I have coded the client side support for binary_minor for jdbc driver and
 tested that it worked. But lets first discuss if this is an acceptable path
 forward.

 I think all references to
 'protocol' should be removed;  Binary wire formats != protocol: the
 protocol could bump to v4 but the wire formats (by happenstance) could
 all still continue to work -- therefore the version isn't minor (it's
 not dependent on protocol version but only on itself).  Therefore,
 don't much like the name 'supported_binary_minor'.  How about
 binary_format_version?

 I was thinking that it would be possible to use the new minor version to
 also introduce new protocol messages, such as streaming of large data
 into the database without knowing its size beforehand.

I agree with Merlin; the frontend/backend protocol is logically distinct from
the binary send/recv formats of data types.  For one key point, the latter is
not exclusively core-defined; third-party extensions change their send/recv
formats on a different schedule.  They can add myext.binary_format_version
GUCs of their own to cope in a similar way.

Client interfaces that do not interpret individual result column data, such as
libpq, require no update for send/recv format changes.  In contrast, all
client interfaces would need changes for a new protocol message.  Most clients
doing so may as well then advertise their support unconditionally.  In
contrast, send/recv format is something individual _users_ of the same client
library may set differently.  It's reasonable to have an application that
still reads its data in send/recv format v0 even after upgrading to a version
of libpq that talks frontend/backend protocol v4.

I do think this is a great way to handle send/recv format changes.
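
At the SQL level, the negotiation the patch enables is roughly this (a
sketch, assuming supported_binary_minor is exposed as a read-only GUC,
as the guc.c hunk quoted below suggests):

SHOW supported_binary_minor;  -- highest minor format version the server offers
SET binary_minor = 1;         -- opt in to the compact array format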

 There needs to be a decision on the official policy about breaking backwards
 compatibility of postgresql clients. Is it:

 A) Every x.y postgres release is free to break any parts of the
* protocol
* text encoding
* binary protocol
as long as it is documented
- all client libraries should be coded so that they refuse to
   support version x.y+1 or newer (they might have a option to
   override this but there are no guarantees that it would work)
Good: no random bugs when using old client library
Bad: initial complaints from users before they understand that
 this is the best option for everyone

The ability to use old client libraries with new servers is more valuable than
the combined benefits of all currently-contemplated protocol improvements.

 D) My proposed compromise where there is one minor_version setting
and code in server to support all different minor versions.
The client requests the minor version so that all releases
default to backwards compatible version.

This is workable.

When there combinations starts to create too much maintenance
overhead a new clean v4 version of protocol is specified.
Good: Keeps full backwards compatibility
Good: Allows introducing changes at any time
Bad: Accumulates conditional code to server and clients until
 new version of protocol is released

We would retain support for protocol V3 for years following the first release
to support protocol V4, so think of the conditional code as lasting forever.


The patch works as advertised.  After set binary_minor = 1, the output of
this command shrinks from 102 MiB to 63 MiB:

\copy (select array[0,1,2,3,4,5,6,7,8,9] from 
generate_series(1,1000000)) to mynums with binary

 --- a/src/backend/utils/misc/guc.c
 +++ b/src/backend/utils/misc/guc.c
 @@ -477,6 +477,7 @@ static int	segment_size;
  static int   wal_block_size;
  static int   wal_segment_size;
  static bool integer_datetimes;
 +static int  supported_binary_minor;
  static int   

Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 10:18 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Jan 19, 2012 at 2:36 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 12.01.2012 14:31, Simon Riggs wrote:

 In order to simulate real-world clog contention, we need to use
 benchmarks that deal with real world situations.

 Currently, pgbench pre-loads data using COPY and executes a VACUUM so
 that all hint bits are set on every row of every page of every table.
 Thus, as pgbench runs it sees zero clog accesses from historical data.
 As a result, clog access is minimised and the effects of clog
 contention in the real world go unnoticed.

 The following patch adds a pgbench option -I to load data using
 INSERTs, so that we can begin benchmark testing with rows that have
 large numbers of distinct un-hinted transaction ids. With a database
 pre-created using this we will be better able to simulate and thus
 more easily measure clog contention. Note that current clog has space
 for 1 million xids, so a scale factor of greater than 10 is required
 to really stress the clog.


 No doubt this is handy for testing this particular area, but overall I feel
 this is too much of a one-trick pony to include in pgbench.

 Alternatively, you could do something like this:

 I think the one-trick pony is pgbench. It has exactly one starting
 condition for its tests and that isn't even a real world condition.

 The main point of including the option into pgbench is to have a
 utility that produces as initial test condition that works the same
 for everyone, so we can accept each others benchmark results. We both
 know that if someone posts that they have done $RANDOMSQL on a table
 before running a test, it will just be ignored and people will say
 user error. Some people will get it wrong when reproducing things and
 we'll have chaos.

 The patch exists as a way of testing the clog contention improvement
 patches and provides a route to long term regression testing that the
 solution(s) still work.

I agree: I think this is useful.

However, I think we should follow the precedent of some of the other
somewhat-obscure options we've added recently and have only a long
form option for this: --inserts.

Also, I don't think the behavior described here should be joined at
the hip to --inserts:

+* We do this after a load by COPY, but before a load via INSERT
+*
+* This is done deliberately to ensure that no heap or index hints are
+* set before we start running the benchmark. This emulates the case
+* where data has arrived row at a time by INSERT, rather than being
+* bulkloaded prior to update.

I think that's also a useful behavior, but if we're going to have it,
we should have a separate option for it, like --create-indexes-early.
Otherwise, someone who wants to (for example) test only the impact of
doing inserts vs. COPY will get misleading answers.

Finally, it's occurred to me that it would be useful to make pgbench
respect -n even in initialization mode, and the SGML doc changes imply
that this patch does that somewhere or other, but maybe only when
you're doing INSERTS and not when you're doing copy, which would be
odd; and there's no sgml doc update for -n, and no command-line help
change either.

In short, I think the reason this patch seems like it's implementing
something fairly arbitrary is that it's really three pretty good ideas
that are somewhat jumbled together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Robert Haas
On Mon, Jan 16, 2012 at 1:28 AM, Greg Smith g...@2ndquadrant.com wrote:
 One of the most useful bits of feedback on how well checkpoint I/O is going
 is the amount of time taken to sync files to disk.  Right now the only way
 to get that is to parse the logs.  The attached patch publishes the most
 useful three bits of data you could only get from log_checkpoints before out
 to pg_stat_bgwriter.

It's not quite clear from your email, but I gather that the way that
this is intended to work is that these values increment every time we
checkpoint?

Also, forgive me for asking this possibly-stupid question, but of what
use is this information? I can't imagine why I'd care about a running
total of the number of files fsync'd to disk.  I also can't really
imagine why I'd care about the length of the write phase, which surely
will almost always be a function of checkpoint_completion_target and
checkpoint_timeout unless I manage to overrun the number of
checkpoint_segments I've allocated.  The only number that really seems
useful to me is the time spent syncing.  I have a clear idea what to
look for there: smaller numbers are better than bigger ones.  For the
rest I'm mystified.

And, it doesn't seem like it's necessarily going to save me a whole
lot either, because if it turns out that my sync phases are long, the
first question out of my mouth is going to be "what percentage of my
total sync time is accounted for by the longest sync?".  And so right
there I'm back to the logs.  It's not clear how such information could
be usefully exposed in pg_stat_bgwriter either, since you probably
want to know only the last few values, not a total over all time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Simon Riggs
On Thu, Jan 19, 2012 at 3:41 PM, Robert Haas robertmh...@gmail.com wrote:

 I agree: I think this is useful.

 However, I think we should follow the precedent of some of the other
 somewhat-obscure options we've added recently and have only a long
 form option for this: --inserts.

Yep, no problem.

 Also, I don't think the behavior described here should be joined at
 the hip to --inserts:

 +        * We do this after a load by COPY, but before a load via INSERT
 +        *
 +        * This is done deliberately to ensure that no heap or index hints are
 +        * set before we start running the benchmark. This emulates the case
 +        * where data has arrived row at a time by INSERT, rather than being
 +        * bulkloaded prior to update.

 I think that's also a useful behavior, but if we're going to have it,
 we should have a separate option for it, like --create-indexes-early.
 Otherwise, someone who wants to (for example) test only the impact of
 doing inserts vs. COPY will get misleading answers.

Creating indexes later would invalidate the test conditions I was
trying to create, so that doesn't add a useful new initialisation
mode. (Creating the indexes causes all of the hint bits to be set).

So that's just adding unrelated requirements for additional tests.
Yes, there are lots of additional tests we could get this code to
perform but we don't need to burden this patch with responsibility for
adding them, especially not when the tests mentioned don't refer to
any related patches in this commit fest and could be done at any time.
Any such change is clearly very low priority at this time.

 Finally, it's occurred to me that it would be useful to make pgbench
 respect -n even in initialization mode, and the SGML doc changes imply
 that this patch does that somewhere or other, but maybe only when
 you're doing INSERTS and not when you're doing copy, which would be
 odd; and there's no sgml doc update for -n, and no command-line help
 change either.

I'll fix the docs.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 3:51 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2012/1/19 Robert Haas robertmh...@gmail.com:
 On Wed, Jan 18, 2012 at 9:50 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
  On the sepgsql side, it determines whether to apply permission checks
  according to the contextual information; that is the same technique we
  used when we implemented the create permission.
  Thus, it can check db_xxx:{drop} permission correctly.

 Why do we need the contextual information in this case?  Why
 can't/shouldn't the decision be made solely on the basis of what
 object is targeted?

 Several code paths that remove a particular object are not appropriate
 places to apply permission checks. For example...

 [1] Cleanup of temporary objects
 on_shmem_exit() registers RemoveTempRelationsCallback(), which
 eventually calls deleteWhatDependsOn(), which in turn invokes
 deleteOneObject().
 This code path is just an internal cleanup process, not related to
 the permissions of the client.

 [2] Cleanup of transient table at VACUUM FULL/CLUSTER command
 rebuild_relation() creates a temporary table with make_new_heap(),
 then copies the contents of the original table in index order, and
 calls finish_heap_swap(), which swaps the relation files and removes
 the temporary table using performDeletion().
 This code actually creates and drops a table; however, it is purely an
 internal mechanism and not related to the permissions of the client.

 So, I think sepgsql should apply permission checks only to object
 deletions invoked by the user's operations, such as DropStmt.

I agree with that theory, but isn't this method of implementing that a
pretty horrible kludge?  For example, if you'd implemented it this way
for 9.1, the recent drop-statement refactoring would have broken it.
Or if, in the future, we add another type of statement that can drop
things, this code will still compile just fine but will no longer work
correctly.  ISTM that we need a way to either (1) not call the hook at
all unless the operation is user-initiated, or (2) call the hook, but
pass a flag indicating what sort of operation this is?

Let's imagine another possible use of this hook: we want to emit some
kind of log message every time a database object gets dropped.  I
think that's a plausible use-case, and in that case what we'd want is:
(1) VACUUM FULL or CLUSTER shouldn't call the hook at all, (2) cleanup
of temporary objects should probably call the hook, but ideally with a
flag to indicate that it's an internal (DB-initiated) operation, and
(3) user activity should definitely call the hook.

I'm not sure how we can cleanly get that behavior, but ISTM that's
what we want...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Also, I don't think the behavior described here should be joined at
 the hip to --inserts:

 +        * We do this after a load by COPY, but before a load via INSERT
 +        *
 +        * This is done deliberately to ensure that no heap or index hints 
 are
 +        * set before we start running the benchmark. This emulates the case
 +        * where data has arrived row at a time by INSERT, rather than being
 +        * bulkloaded prior to update.

 I think that's also a useful behavior, but if we're going to have it,
 we should have a separate option for it, like --create-indexes-early.
 Otherwise, someone who wants to (for example) test only the impact of
 doing inserts vs. COPY will get misleading answers.

 Creating indexes later would invalidate the test conditions I was
 trying to create, so that doesn't add a useful new initialisation
 mode. (Creating the indexes causes all of the hint bits to be set).

Right, but the point is that to address Heikki's objection that this
is a special-purpose hack, we should try to make it general, so that
it can be used by other people for other things.  For example, if the
options are separated, you can use this to measure how much slower
--inserts is than the regular way.  But if that also changes the way
indexes are created, then you can't.  Moreover, since the
documentation mentioned only one of those two changes and not the
other, you might reasonably think that you've conducted a valid test.
We could document that --inserts changes the behavior in multiple
ways, but then the switch will end up being a bit of a misnomer, so I
think it's better to have a separate switch for each behavior someone
might want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inline Extension

2012-01-19 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Frankly I don't see the point of this. If the extension is an independent
 piece of (SQL) code, developed separately from an application, with its own
 lifecycle, a .sql file seems like the best way to distribute it. If it's
 not, ie. if it's an integral part of the database schema, then why package
 it as an extension in the first place?

 It allows you to easily deploy an extension to N databases (my current
 use case has 256 databases) and to know which version is installed on
 each server. It's easier to QA your procedures and upgrades when they
 are packaged as extensions, too.

 Now, as for the dependency on a SQL file hosting the content, it's
 easier to just connect to the databases and send them the script in the
 SQL command rather than deploying a set of files: that means OS-level
 packaging, either RPM or Debian or some other variant, or some other
 means of easily deploying the files. An SQL connection is all you need
 if you're not shipping a .so.

I'm with Heikki on not believing that this is a good idea.  If you are
trying to do careful versioning of a set of object definitions, you want
to stick the things in a file, you don't want them just flying by in
submitted SQL.  Also, a large part of the point of the extension
facility is to be able to do uninstall/reinstall and version
upgrades/downgrades, none of which are possible unless the extension
scripts are stored somewhere.

ISTM your distribution concern would be much better addressed by
installing contrib/adminpack and then just using pg_file_write()
to put the new extension script into the remote server's library.

regards, tom lane



Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Simon Riggs
On Thu, Jan 19, 2012 at 4:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Also, I don't think the behavior described here should be joined at
 the hip to --inserts:

 +        * We do this after a load by COPY, but before a load via INSERT
 +        *
 +        * This is done deliberately to ensure that no heap or index hints 
 are
 +        * set before we start running the benchmark. This emulates the case
 +        * where data has arrived row at a time by INSERT, rather than being
 +        * bulkloaded prior to update.

 I think that's also a useful behavior, but if we're going to have it,
 we should have a separate option for it, like --create-indexes-early.
 Otherwise, someone who wants to (for example) test only the impact of
 doing inserts vs. COPY will get misleading answers.

 Creating indexes later would invalidate the test conditions I was
 trying to create, so that doesn't add a useful new initialisation
 mode. (Creating the indexes causes all of the hint bits to be set).

 Right, but the point is that to address Heikki's objection that this
 is a special-purpose hack, we should try to make it general, so that
 it can be used by other people for other things.

This supports running hundreds of different tests because it creates a
useful general starting condition. It's not a special purpose hack
because it's not a hack, nor is it special purpose.

Yes, we could have an option to run with no indexes. Or we could have
an option to run with 2 indexes as well. We could do all sorts of
things. None of that is important, because there aren't any patches in
the queue that need those tests and it's too late to do it in this
release. And if it really is important you can do it in the next
release.

If we have time to spend we should be spending it on running the patch
to test the effectiveness of other patches in the queue, not on
inventing new tests that have no relevance.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Simon Riggs
On Thu, Jan 19, 2012 at 3:52 PM, Robert Haas robertmh...@gmail.com wrote:

 And, it doesn't seem like it's necessarily going to save me a whole
 lot either, because if it turns out that my sync phases are long, the
 first question out of my mouth is going to be "what percentage of my
 total sync time is accounted for by the longest sync?".  And so right
 there I'm back to the logs.

It seems clear that the purpose of this is to quickly and easily
decide whether there is a potential problem.

If you decide there is a potential problem, you may wish to look at
more detailed information.

So there is a huge time saving from avoiding the need to look at the
detail when it's unnecessary and possibly not even enabled.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Inline Extension

2012-01-19 Thread David E. Wheeler
On Jan 19, 2012, at 7:21 AM, Dimitri Fontaine wrote:

 Now, as for the dependency on a SQL file hosting the content, it's
 easier to just connect to the databases and send them the script in the
 SQL command rather than deploying a set of files: that means OS-level
 packaging, either RPM or Debian or some other variant, or some other
 means of easily deploying the files. An SQL connection is all you need
 if you're not shipping a .so.

ISTM that if you are managing 256 servers, you’re likely already using a 
packaging system for the deployment of application dependencies. In which case, 
to keep things consistent, you ought to distribute your extensions in exactly 
the same way.

Best,

David




Re: [HACKERS] Group commit, revised

2012-01-19 Thread Robert Haas
On Wed, Jan 18, 2012 at 5:38 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Jan 18, 2012 at 1:23 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 17, 2012 at 12:37 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 I found it very helpful to reduce wal_writer_delay in pgbench tests, when
 running with synchronous_commit=off. The reason is that hint bits don't get
 set until the commit record is flushed to disk, so making the flushes more
 frequent reduces the contention on the clog. However, Simon made async
 commits nudge WAL writer if the WAL page fills up, so I'm not sure how
 relevant that experience is anymore.

 Still completely relevant and orthogonal to this discussion. The patch
 retains multi-modal behaviour.

I don't know what you mean by this.  I think removing wal_writer_delay
is premature, because I think it still may have some utility, and the
patch removes it.  That's a separate change that should be factored
out of this patch and discussed separately.

 There's still a small but measurable effect there in master.  I think
 we might be able to make it fully auto-tuning, but I don't think we're
 fully there yet (not sure how much this patch changes that equation).

 I suggested a design similar to the one you just proposed to Simon
 when he originally suggested this feature.  It seems that if the WAL
 writer is the only one doing WAL flushes, then there must be some IPC
 overhead - and context switching - involved whenever WAL is flushed.
 But clearly we're saving something somewhere else, on the basis of
 Peter's results, so maybe it's not worth worrying about.  It does seem
 pretty odd to have all the regular backends going through the WAL
 writer and the auxiliary processes using a different mechanism,
 though.  If we got rid of that, maybe WAL writer wouldn't even require
 a lock, if there's only one process that can be doing it at a time.

 When we did sync rep it made sense to have the WALSender do the work
 and for others to just wait. It would be quite strange to require a
 different design for essentially the same thing for normal commits and
 WAL flushes to local disk. I should mention the original proposal for
 streaming replication had each backend send data to standby
 independently and that was recognised as a bad idea after some time.
 Same for sync rep also.

I don't think those cases are directly comparable.  SR is talking to
another machine, and I can't imagine that there is a terribly
convenient or portable way for every backend that needs one to get a
hold of the file descriptor for the socket.  Even if it could, the
data is sent as a stream, so if multiple backends sent to the same
file descriptor you'd have to have some locking to prevent messages
from getting interleaved.  Or else you could have multiple
connections, or use UDP, but that gets rather complex.  Anyway, none
of this is an issue for file I/O: anybody can open the file, and if
two backends write data at different offsets at the same time - or the
same data at the same offset at the same time - there's no problem.
So the fact that it wasn't a good idea for SR doesn't convince me that
it's also a bad idea here.

On the other hand, I'm not saying we *should* do it that way, either -
i.e. I am not trying to require a different design just because it's
fun to make people change things.  Rather, I am trying to figure out
whether the design you've chosen is in fact the best one, and part of
that involves reasoning about why it might or might not be.  There are
obvious reasons to think that having process A kick process B and go
to sleep, then have process B do some work and wake up process A might
be less efficient than having process A just do the work itself, in
the uncontended case.  The fact that that isn't happening seems
awfully strange to me; it's hard to understand why 3 system calls are
faster than one.  That suggests that either the current system is
badly broken in some way that this patch fixes (in which case, it
would be nice to know what the broken thing is) or that there's an
opportunity for further optimization of the new patch (either now or
later, depending on how much work we're talking about).  Just to be
clear, it makes perfect sense that the new system is faster in the
contended case, and the benchmark numbers are very impressive.  It's a
lot harder to understand why it's not slower in the uncontended case.

 Not sure why it's odd to have backends do one thing and auxiliaries do
 another. The whole point of auxiliary processes is that they do a
 specific thing different to normal backends. Anyway, the important
 thing is to have auxiliary processes be independent of each other as
 much as possible, which simplifies error handling and state logic in
 the postmaster.

Yeah, I guess the shutdown sequence could get a bit complex if we try
to make everyone go through the WAL writer all the time.  But I wonder
if we could rejigger things somehow so that 

Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Marti Raudsepp
On Thu, Jan 19, 2012 at 18:12, Robert Haas robertmh...@gmail.com wrote:
 Right, but the point is that to address Heikki's objection that this
 is a special-purpose hack, we should try to make it general, so that
 it can be used by other people for other things.

Personally I would like to see support for more flexibility in pgbench
scripts. It would be useful to allow scripts to contain custom
initialization sections -- for scripts that want a custom schema, as
well as different ways to populate the standard schema.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 11:46 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Jan 19, 2012 at 4:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Also, I don't think the behavior described here should be joined at
 the hip to --inserts:

 +        * We do this after a load by COPY, but before a load via INSERT
 +        *
 +        * This is done deliberately to ensure that no heap or index hints are
 +        * set before we start running the benchmark. This emulates the case
 +        * where data has arrived row at a time by INSERT, rather than being
 +        * bulkloaded prior to update.

 I think that's also a useful behavior, but if we're going to have it,
 we should have a separate option for it, like --create-indexes-early.
 Otherwise, someone who wants to (for example) test only the impact of
 doing inserts vs. COPY will get misleading answers.

 Creating indexes later would invalidate the test conditions I was
 trying to create, so that doesn't add a useful new initialisation
 mode. (Creating the indexes causes all of the hint bits to be set).
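
To make that starting condition concrete, here is a minimal hand-rolled
sketch of the same idea in plain SQL (table name and row count are arbitrary,
not pgbench's actual schema): create the index first, then load through
ordinary INSERTs, so nothing scans the freshly loaded rows and no heap or
index hint bits are set when the benchmark begins.

CREATE TABLE accounts (aid integer PRIMARY KEY, abalance integer);
-- the index already exists, and no COPY/CREATE INDEX/VACUUM pass follows,
-- so the loaded rows start the run with their hint bits unset
INSERT INTO accounts
SELECT g, 0 FROM generate_series(1, 100000) AS g;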

 Right, but the point is that to address Heikki's objection that this
 is a special-purpose hack, we should try to make it general, so that
 it can be used by other people for other things.

 This supports running hundreds of different tests because it creates a
 useful general starting condition. It's not a special purpose hack
 because it's not a hack, nor is it special purpose.

 Yes, we could have an option to run with no indexes. Or we could have
 an option to run with 2 indexes as well. We could do all sorts of
 things. None of that is important, because there aren't any patches in
 the queue that need those tests and it's too late to do it in this
 release. And if it really is important you can do it in the next
 release.

 If we have time to spend we should be spending it on running the patch
 to test the effectiveness of other patches in the queue, not on
 inventing new tests that have no relevance.

I feel I've adequately explained why it makes sense to me to separate
those options.  If you want, I'll do the work myself; it will take
less time than arguing about it.

On the other hand, if you wish to insist that we should only commit
this patch if --inserts makes multiple, unrelated, undocumented
changes to the initial test configurations, then I'll join Heikki in
voting against this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-19 Thread Alex Shulgin

Greg Smith g...@2ndquadrant.com writes:

 One unicorn I would like to have here would give the CF app a database
 of recent e-mails to pgsql-hackers.  I login to the CF app, click on
 Add recent submission, and anything matching my e-mail address
 appears with a checkbox next to it.  Click on the patch submissions,
 and then something like you described would happen.  That would save
 me the annoying work around looking up message IDs so much.

Another idea: introduce some simple tag system in mails sent to -hackers
to be treated specially, e.g:

@fest add-to-current

to add new patch to the commit fest currently in progress, or

@fest add-to-next

to add it to the next scheduled fest.

Attribute your mail with

@fest comment COMMENT TEXT

or

@fest comment <<EOF
...
EOF

to add a (long) comment, ditto for patch and review.

How does that sound?

--
Alex

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Simon Riggs
On Thu, Jan 19, 2012 at 5:47 PM, Robert Haas robertmh...@gmail.com wrote:

 I feel I've adequately explained why it makes sense to me to separate
 those options.  If you want, I'll do the work myself; it will take
 less time than arguing about it.

If you have time to contribute, please use the patch as it stands to test
the other patches in the CF queue.

It's more important that we measure and fix clog contention than have
a new pgbench feature with no immediate value to the next release of
Postgres.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-19 Thread Robert Haas
On Sun, Jan 15, 2012 at 4:17 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 +1. I've been thinking we should do that for a long time, but haven't gotten
 around to it.

 I think it makes more sense to use the max read rate as the main knob,
 rather than write rate. That's because the max read rate is higher than the
 write rate, when you don't need to dirty pages. Or do you think saturating
 the I/O system with writes is so much bigger a problem than read I/O that it
 makes more sense to emphasize the writes?

 I was thinking of something like this, in postgresql.conf:

 # - Vacuum Throttling -

 #vacuum_cost_page_miss = 1.0            # measured on an arbitrary scale
 #vacuum_cost_page_dirty = 2.0           # same scale as above
 #vacuum_cost_page_hit = 0.1             # same scale as above
 #vacuum_rate_limit = 8MB                # max reads per second

 This is now similar to the cost settings for the planner, which is good.

I have to say that I find that intensely counterintuitive.  The
current settings are not entirely easy to tune correctly, but at least
they're easy to explain.  What does that 8MB mean and how does it
relate to vacuum_cost_page_miss?  If I double vacuum_cost_page_miss,
does that effectively also double the cost limit, so that dirty pages
and hits become relatively cheaper?  If so, then I think what that
really means is that the limit is 8MB only if there are no hits and no
dirtied pages - otherwise it's less, and the amount by which it is
less is the result of some arcane calculation.  Ugh!
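
To spell out the arithmetic behind that objection, using the proposed default
weights above purely as an illustration (the actual mapping in the patch may
differ):

SELECT 8.0 * 1.0 / (1.0 + 2.0) AS effective_read_mb_per_s;
-- ~2.67: a vacuum that dirties every page it misses would be throttled to
-- roughly a third of the nominal 8MB/s read limit.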

I can really imagine people wanting to limit two things here: either
they want to limit the amount of read I/O, or they want to limit the
amount of write I/O.  If your database fits in physical memory you
probably don't care about the cost of page misses very much at all,
but you probably do care about how much data you dirty.  OTOH, if your
database doesn't fit in physical memory and you have a relatively
small percentage of dirty pages because the tables are lightly
updated, dirtying might be pretty secondary; if you care at all, it's
going to be because busying the disk head with large sequential reads
eats up too much of the system's I/O capacity.  If we added
vacuum_read_rate_limit and vacuum_dirty_rate_limit, totally
independently of each other, and threw the current system where
those two things get mixed together in one big bucket out the window
completely, I could maybe sign onto that as an improvement to the UI.

But even then, I think we need to balance the amount of the gain
against the backward compatibility problems we're going to create.  If
we start removing autovacuum options, then, as Greg notes, we have to
figure out how to make old pg_dumps load into new databases and
hopefully do something close to what the DBA intended.  And the DBA
will have to learn the new system.  I'm not sure we're really going to
get enough mileage out of changing this to justify the hassle.  It's
basically a cosmetic improvement, and I think we should be careful
about breaking compatibility for cosmetic improvements, especially at
the end of a release cycle when we're under time pressure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-19 Thread Andrew Dunstan



On 01/19/2012 12:59 PM, Alex Shulgin wrote:

Greg Smithg...@2ndquadrant.com  writes:


One unicorn I would like to have here would give the CF app a database
of recent e-mails to pgsql-hackers.  I login to the CF app, click on
Add recent submission, and anything matching my e-mail address
appears with a checkbox next to it.  Click on the patch submissions,
and then something like you described would happen.  That would save
me the annoying work around looking up message IDs so much.

Another idea: introduce some simple tag system in mails sent to -hackers
to be treated specially, e.g:

@fest add-to-current

to add new patch to the commit fest currently in progress, or

@fest add-to-next

to add it to the next scheduled fest.

Attribute your mail with

@fest comment COMMENT TEXT

or

@fest comment <<EOF
...
EOF

to add a (long) comment, ditto for patch and review.

How does that sound?



Like a recipe for something that requires constant fixups, to be honest.

Seriously, adding something to the CF isn't *that* hard. I like Greg's 
idea of a list of recent emails that you could choose from.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres ReviewFest Patch: URI connection string support for libpq

2012-01-19 Thread Alex Shulgin

Nick Roosevelt nro...@gmail.com writes:

I am sorry; it seems my new MUA was misconfigured, so the two previous
attempts to reply to -hackers@ failed.  So here goes again.

Just reviewed the patch for adding URI connection string support for
libpq.

Thank you for taking your time on this.

There seem to be many tabs in the patch.  Perhaps the indentation is
not correct.

I believe tabs for indentation is the project's standard.

Also, the patch did not run correctly:
patching file src/interfaces/libpq/fe-connect.c
Hunk #1 succeeded at 282 with fuzz 1.
Hunk #2 FAILED at 300.
Hunk #3 FAILED at 583.
Hunk #4 FAILED at 3742.
Hunk #5 FAILED at 4132.
Hunk #6 FAILED at 4279.
Hunk #7 FAILED at 4455.
6 out of 7 hunks FAILED -- saving rejects to file
src/interfaces/libpq/fe-connect.c.rej
Seems like the file has changed since this patch was created.  Please
recreate the patch.

I've just tried to apply the original patch against a clean checkout of
master branch and it applies without a problem (no hunk failed, no
fuzziness.)

Could you please try again on a clean master branch?

--
Regards,
Alex

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simulating Clog Contention

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 1:02 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Jan 19, 2012 at 5:47 PM, Robert Haas robertmh...@gmail.com wrote:
 I feel I've adequately explained why it makes sense to me to separate
 those options.  If you want, I'll do the work myself; it will take
 less time than arguing about it.

 If you have time to contribute, please use the patch as it stands to test
 the other patches in the CF queue.

Those things aren't mutually exclusive; whether or not I spend an hour
whacking this patch around isn't going to have any impact on how much
benchmarking I get done.   Benchmarking is mostly waiting, and I can
do other things while the tests are going.

Just to reiterate a point I've made previously, Nate Boley's test
machine was running another big job for several weeks straight, and I
haven't been able to use the machine for anything.  It seems to be
unloaded at the moment so I'll try to squeeze in some tests, but I
don't know how long it will stay that way.  It's been great to have
nearly unimpeded access to this for most of the cycle, but all good
things must (and do) come to an end.  In any event, none of this has
much impact on the offer above, which is a small amount of code that I
will be happy to attend to if you do not wish to do so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-19 Thread Alex Shulgin

Andrew Dunstan and...@dunslane.net writes:

 On 01/19/2012 12:59 PM, Alex Shulgin wrote:

 Another idea: introduce some simple tag system in mails sent to -hackers
 to be treated specially, e.g:

 @fest add-to-current

 to add new patch to the commit fest currently in progress, or

 @fest add-to-next

 to add it to the next scheduled fest.

 Attribute your mail with

 @fest comment COMMENT TEXT

 or

 @fest comment <<EOF
 ...
 EOF

 to add a (long) comment, ditto for patch and review.

 How does that sound?


 Like a recipe for something that requires constant fixups, to be honest.

 Seriously, adding something to the CF isn't *that* hard. I like Greg's
 idea of a list of recent emails that you could choose from.

I've just added a comment about a patch, and here is what it took:

a. Login to commitfest app
b. Locate the patch and review I was replying to
c. Fetch the archives' thread index and refresh the index page for ~10 minutes
until my reply appears
d. Copy message id and finally register comment in the commitfest app

(IIRC, something close to that was already described in this thread)

With the proposed approach it would only require me to include

@fest comment Patch applies cleanly

and possibly

@fest status Needs Review

to update the patch status and that'd be it.

--
Alex

PS: yes, I could just copy message id from the sent mail in my MUA, but
I like to make sure links I post aren't broken, so I'll still need to
wait until the archives catch up to double-check.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review of: collation for (expr)

2012-01-19 Thread Peter Eisentraut
On Thu, 2012-01-12 at 21:25 -0800, probabble wrote:
 Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from
 2012-01-12 checkout.
 
 Bison upgraded to v2.5, and downgraded to v2.4.1
 
 Make process for both versions resulted in the following errors:
 
 make[3]: Leaving directory `/usr/local/src/pgbuild/src/backend/catalog'
 make -C parser gram.h
 make[3]: Entering directory `/usr/local/src/pgbuild/src/backend/parser'
 /usr/local/bin/bison -d  -o gram.c gram.y
 gram.y: conflicts: 370 reduce/reduce
 gram.y: expected 0 reduce/reduce conflicts
 gram.y:10482.27-10494.33: warning: rule useless in parser due to conflicts:
 func_expr: COLLATION FOR '(' a_expr ')'
 make[3]: *** [gram.c] Error 1

I can't reproduce that.

In the meantime, attached is a re-merged version of the patch; the old
version doesn't apply cleanly anymore.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ff9b8b0..4d77024 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13637,6 +13637,10 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
 primarypg_typeof/primary
/indexterm
 
+   indexterm
+primarycollation for/primary
+   /indexterm
+
   para
xref linkend=functions-info-catalog-table lists functions that
extract information from the system catalogs.
@@ -13790,6 +13794,11 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
entrytyperegtype/type/entry
entryget the data type of any value/entry
   /row
+  row
+   entryliteralfunctioncollation for (parameterany/parameter)/function/literal/entry
+   entrytypetext/type/entry
+   entryget the collation of the argument/entry
+  /row
  /tbody
 /tgroup
/table
@@ -13916,6 +13925,27 @@ SELECT typlen FROM pg_type WHERE oid = pg_typeof(33);
 /programlisting
   /para
 
+  para
+   The expression literalcollation for/literal returns the collation of the
+   value that is passed to it.  Example:
+programlisting
+SELECT collation for (description) FROM pg_description LIMIT 1;
+ pg_collation_for 
+--
+ default
+(1 row)
+
+SELECT collation for ('foo' COLLATE de_DE);
+ pg_collation_for 
+--
+ de_DE
+(1 row)
+/programlisting
+  The value might be quoted and schema-qualified.  If no collation is derived
+  for the argument expression, then a null value is returned.  If the argument
+  is not of a collatable data type, then an error is raised.
+  /para
+
indexterm
 primarycol_description/primary
/indexterm
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0ec039b..dd0e2e8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10466,6 +10466,19 @@ func_expr:	func_name '(' ')' over_clause
	n->location = @1;
 	$$ = (Node *)n;
 }
+			| COLLATION FOR '(' a_expr ')'
+{
+	FuncCall *n = makeNode(FuncCall);
	n->funcname = SystemFuncName("pg_collation_for");
	n->args = list_make1($4);
	n->agg_order = NIL;
	n->agg_star = FALSE;
	n->agg_distinct = FALSE;
	n->func_variadic = FALSE;
	n->over = NULL;
	n->location = @1;
+	$$ = (Node *)n;
+}
 			| CURRENT_DATE
 {
 	/*
@@ -11917,7 +11930,6 @@ unreserved_keyword:
 			| CLASS
 			| CLOSE
 			| CLUSTER
-			| COLLATION
 			| COMMENT
 			| COMMENTS
 			| COMMIT
@@ -12253,6 +12265,7 @@ reserved_keyword:
 			| CAST
 			| CHECK
 			| COLLATE
+			| COLLATION
 			| COLUMN
 			| CONSTRAINT
 			| CREATE
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 3de6a5c..84dad33 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -26,12 +26,14 @@
 #include commands/dbcommands.h
 #include funcapi.h
 #include miscadmin.h
+#include nodes/nodeFuncs.h
 #include parser/keywords.h
 #include postmaster/syslogger.h
 #include storage/fd.h
 #include storage/pmsignal.h
 #include storage/proc.h
 #include storage/procarray.h
+#include utils/lsyscache.h
 #include tcop/tcopprot.h
 #include utils/builtins.h
 #include utils/timestamp.h
@@ -492,3 +494,29 @@ pg_typeof(PG_FUNCTION_ARGS)
 {
 	PG_RETURN_OID(get_fn_expr_argtype(fcinfo-flinfo, 0));
 }
+
+
+/*
+ * Implementation of the COLLATE FOR expression; returns the collation
+ * of the argument.
+ */
+Datum
+pg_collation_for(PG_FUNCTION_ARGS)
+{
+	Oid typeid;
+	Oid collid;
+
+	typeid = get_fn_expr_argtype(fcinfo->flinfo, 0);
+	if (!typeid)
+		PG_RETURN_NULL();
+	if (!type_is_collatable(typeid) && typeid != UNKNOWNOID)
+		ereport(ERROR,
+(errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("collations are not supported by type %s",
+		format_type_be(typeid))));
+
+	collid = PG_GET_COLLATION();
+	if (!collid)
+		PG_RETURN_NULL();
+	PG_RETURN_TEXT_P(cstring_to_text(generate_collation_name(collid)));
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 9994468..e4ea0d5 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -1945,6 +1945,8 @@ DESCR(convert generic options array to 

Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 10:37 AM, Noah Misch n...@leadboat.com wrote:
 I agree with Merlin; the frontend/backend protocol is logically distinct from
 the binary send/recv formats of data types.  For one key point, the latter is
 not exclusively core-defined; third-party extensions change their send/recv
 formats on a different schedule.  They can add myext.binary_format_version
 GUCs of their own to cope in a similar way.

I agree.  It occurs to me that we recently changed the default *text*
output format for bytea for reasons not dissimilar to those
contemplated here.  Presumably, that's a much more disruptive change,
and yet we've had minimal complaints because anyone who gets bitten
can easily set bytea_output='escape' and the problem goes away.  The
same thing seems like it would work here, only the number of people
needing to change the parameter will probably be even smaller, because
fewer people use binary than text.
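
For reference, the escape hatch in that precedent is a single existing GUC
(real, current behaviour, not part of the proposal here):

SET bytea_output = 'escape';   -- per session; restores the pre-9.0 text format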

Having said that, if we're to follow the precedent set by
bytea_output, maybe we ought to just add
binary_array_format={huge,ittybitty} and be done with it, rather than
inventing a minor protocol version GUC for something that isn't really
a protocol version change at all.  We could introduce a
differently-named general mechanism, but I guess I'm not seeing the
point of that either.  Just because someone has a
backward-compatibility issue with one change of this type doesn't mean
they have a similar issue with all of them.  So I think adding a
special-purpose GUC is more logical and more parallel to what we've
done in the past, and it doesn't saddle us with having to be certain
that we've designed the mechanism generally enough to handle all the
cases that may come later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-19 Thread Alvaro Herrera

Excerpts from Alex Shulgin's message of Thu Jan 19 15:41:54 -0300 2012:

 PS: yes, I could just copy message id from the sent mail in my MUA, but
 I like to make sure links I post aren't broke, so still I'll need to
 wait until archives catches up to double check.

I find this a bad excuse.  If you're a pgsql-hackers regular, then you
already know your posts are going to show up with the correct
message-id.  The links might be broken for the next 10 minutes, but
links that stay broken for a longer period than that should be rare.
Surely you don't change your MUA once a month or anything.

I know I don't waste time waiting for my posts to show up in the
archives before adding links to the CF app.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of patch renaming constraints

2012-01-19 Thread Peter Eisentraut
On Thu, 2012-01-12 at 22:43 -0600, Joshua Berkus wrote:
 Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from 
 2012-01-12 git checkout.
 
 Patch applied fine.
 
 Docs are present, build, look good and are clear.
 
 Changes to gram.y required Bison 2.5 to compile.  Are we requiring Bison 2.5 
 now?  There's no configure check for it, so it took me quite a while to 
 figure out what was wrong.

I can't reproduce that.  I think there might be something wrong with
your installation.  The same issue was reported for my COLLATION FOR
patch from the same environment.

 Make check passed.  Patch has tests for rename constraint.
 
 Most normal uses of alter table ... rename constraint ... worked normally.  
 However, the patch does not deal correctly with constraints which are not 
 inherited, such as primary key constraints:

That appears to be because creating a primary key constraint does not
set pg_constraint.conisonly correctly.  This was introduced recently
with noninherited check constraints.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-19 Thread Simon Riggs
On Sun, Jan 15, 2012 at 9:17 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 I think it makes more sense to use the max read rate as the main knob,
 rather than write rate. That's because the max read rate is higher than the
 write rate, when you don't need to dirty pages. Or do you think saturating
 the I/O system with writes is so much bigger a problem than read I/O that it
 makes more sense to emphasize the writes?

Yes, the writes are more important of the two.

Too many writes at one time can overflow hardware caches, so things
tend to get much worse beyond a certain point.

Also, rate limiting writes means we rate limit WAL rate also which is
very important.

I'd like this to apply to large DDL, not just VACUUMs.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] automating CF submissions (was xlog location arithmetic)

2012-01-19 Thread Andrew Dunstan



On 01/19/2012 01:41 PM, Alex Shulgin wrote:


 With the proposed approach it would only require me to include

@fest comment Patch applies cleanly

and possibly

@fest status Needs Review

to update the patch status and that'd be it.



It will be easy if you get it right. My point was that it's way too easy 
to get it wrong.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-19 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of Thu Jan 19 16:05:36 -0300 2012:
 On Sun, Jan 15, 2012 at 9:17 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 
  I think it makes more sense to use the max read rate as the main knob,
  rather than write rate. That's because the max read rate is higher than the
  write rate, when you don't need to dirty pages. Or do you think saturating
  the I/O system with writes is so much bigger a problem than read I/O that it
  makes more sense to emphasize the writes?
 
 Yes, the writes are more important of the two.
 
 Too many writes at one time can overflow hardware caches, so things
 tend to get much worse beyond a certain point.
 
 Also, rate limiting writes means we rate limit WAL rate also which is
 very important.
 
 I'd like this to apply to large DDL, not just VACUUMs.

More generally, this can sometimes be useful in general queries as well.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Restore process during recovery

2012-01-19 Thread Simon Riggs
On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
 WALRestore process asynchronously executes restore_command while
 recovery continues working.

 Overlaps downloading of next WAL file to reduce time delays in file
 based archive recovery.

 Handles cases of file-only and streaming/file correctly.

 Though I've not reviewed the patch deeply yet, I observed the following
 two problems when I tested the patch.

 When I set up streaming replication + archive (i.e., restore_command is set)
 and started the standby, I got the following error:

    FATAL:  all AuxiliaryProcs are in use
    LOG:  walrestore process (PID 18839) exited with exit code 1

Fixed and better documented.

 When I started an archive recovery without setting restore_command,
 it successfully finished.

Not sure exactly what you mean, but I fixed a bug that might be
something you're seeing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce659ec..469e6d6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include pgstat.h
 #include postmaster/bgwriter.h
 #include postmaster/startup.h
+#include postmaster/walrestore.h
 #include replication/walreceiver.h
 #include replication/walsender.h
 #include storage/bufmgr.h
@@ -187,7 +188,6 @@ static bool InArchiveRecovery = false;
 static bool restoredFromArchive = false;
 
 /* options taken from recovery.conf for archive recovery */
-static char *recoveryRestoreCommand = NULL;
 static char *recoveryEndCommand = NULL;
 static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
@@ -575,8 +575,8 @@ bool reachedConsistency = false;
 
 static bool InRedo = false;
 
-/* Have we launched bgwriter during recovery? */
-static bool bgwriterLaunched = false;
+/* Have we launched background procs during archive recovery yet? */
+static bool ArchRecoveryBgProcsActive = false;
 
 /*
  * Information logged when we detect a change in one of the parameters
@@ -632,8 +632,6 @@ static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
 			 bool randAccess);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
-static bool RestoreArchivedFile(char *path, const char *xlogfname,
-	const char *recovername, off_t expectedSize);
 static void ExecuteRecoveryCommand(char *command, char *commandName,
 	   bool failOnerror);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -2706,19 +2704,47 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 
 	XLogFileName(xlogfname, tli, log, seg);
 
+#define TMPRECOVERYXLOG	RECOVERYXLOG
+
 	switch (source)
 	{
 		case XLOG_FROM_ARCHIVE:
+			/*
+			 * Check to see if the WALRestore process has already put the
+			 * next file in place while we were working. If so, use that.
+			 * If not, get it ourselves. This makes it easier to handle
+			 * initial state before the WALRestore is active, and also
+			 * handles the stop/start logic correctly when we have both
+			 * streaming and file based replication active.
+			 *
+			 * We queue up the next task for WALRestore after we've begun to
+			 * use this file later in XLogFileRead().
+			 *
+			 * If the WALRestore process is still active, the lock wait makes
+			 * us wait, which is just like we were executing the command
+			 * ourselves and so doesn't alter the logic elsewhere.
+			 */
+			if (XLogFileIsNowFullyRestored(tli, log, seg))
+			{
+snprintf(path, MAXPGPATH, XLOGDIR "/%s", TMPRECOVERYXLOG);
+restoredFromArchive = true;
+break;
+			}
+
 			/* Report recovery progress in PS display */
			snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
 	 xlogfname);
 			set_ps_display(activitymsg, false);
 
 			restoredFromArchive = RestoreArchivedFile(path, xlogfname,
-	  RECOVERYXLOG,
+	  TMPRECOVERYXLOG,
 	  XLogSegSize);
+
 			if (!restoredFromArchive)
+			{
+LWLockRelease(WALRestoreCommandLock);
 return -1;
+			}
 			break;
 
 		case XLOG_FROM_PG_XLOG:
@@ -2748,18 +2774,42 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
		if (stat(xlogfpath, &statbuf) == 0)
 		{
 			if (unlink(xlogfpath) != 0)
+			{
+LWLockRelease(WALRestoreCommandLock);
 ereport(FATAL,
 		(errcode_for_file_access(),
		 errmsg("could not remove file \"%s\": %m",
 xlogfpath)));
+			}
 			reload = true;
 		}
 
		if (rename(path, xlogfpath) < 0)
+		{
+			LWLockRelease(WALRestoreCommandLock);
 			ereport(ERROR,
 (errcode_for_file_access(),
 errmsg("could not rename file \"%s\" to \"%s\": %m",
 		path, xlogfpath)));
+		}
+
+		/*
+		 * Make sure we recover from the new filename, so we can reuse the
+		 * temporary 

Re: [HACKERS] logging in high performance systems.

2012-01-19 Thread Robert Haas
On Mon, Jan 16, 2012 at 3:51 AM, Greg Smith g...@2ndquadrant.com wrote:
 There is an important distinction to make here.  Statement logging is one of
 the largest producers of logging data you want to worry about optimizing for
 on a high performance system.  The decision about whether to log or not may
 need to be made by the hook.  Something that hooks EmitErrorReport has
 already lost an important opportunity to make a decision about whether the
 system is perhaps too busy to worry about logging right now at all; you've
 already paid a chunk of the logging overhead getting the log message to it.
  I think both areas are going to be important to hook eventually.

I would have dismissed this out of hand if you had said it a year ago,
but I'm older and wiser now.  At some point this cycle, I did some
benchmarking of the subtransaction abort path, since the slowness of
things like EXCEPTION blocks in PL/pgsql is a known sore point.  I
don't remember the exact numbers anymore, but I do remember the
general picture, which is that constructing the error message is
shockingly expensive compared to anything else that we do in that
path.  I dropped it at that point for lack of good ideas: it would be
awfully nice to postpone the error message construction until we know
that it's actually needed, but I don't see any clean (or even messy)
way of doing that.
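
For context, the kind of code whose subtransaction-abort path is being
described looks like this -- a sketch, not the actual benchmark used, but
every trip through the EXCEPTION handler pays for building an error report
that is then thrown away:

DO $$
BEGIN
  FOR i IN 1..100000 LOOP
    BEGIN
      PERFORM 1/0;                  -- raises division_by_zero
    EXCEPTION WHEN division_by_zero THEN
      NULL;                         -- swallowed, but the message was built anyway
    END;
  END LOOP;
END
$$;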

I'm not sure if the effect is quite as significant for toplevel
transaction abort, because sending the message to the client is going
to cost something, so the relative cost of generating the error
message on a busy system will be less.  But I still wouldn't like to
bet against it being significant if you're really hammering the
server.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Arithmetic operators for macaddr type

2012-01-19 Thread Robert Haas
On Tue, Jan 17, 2012 at 12:38 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Here is a patch for $SUBJECT.  I merely added support for ~, & and |
 operators for the macaddr type.  The patch itself is rather trivial,
 and includes regression tests and a doc update.

 The patch looks fine except that it uses the OIDs already used in pg_proc.h.
 Attached is the updated version of the patch, which fixes the above problem.

That same problem came back into existence, so I fixed it again, added
a catversion bump, and committed this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-01-19 Thread Heikki Linnakangas

On 22.11.2011 21:38, Alexander Korotkov wrote:

WIP patch with index support for regexp search for pg_trgm contrib is
attached.
Unlike techniques which extract continuous text parts from the regexp,
this patch presents a technique of automaton transformation. That allows more
comprehensive trigram extraction.


Nice!


The current version of the patch has some limitations:
1) The algorithm for extracting a logical expression over trigrams has high
computational complexity, so it can become really slow on regexps with many
branches. Improvements to this algorithm are probably possible.
2) Naturally, there is no performance benefit if no trigrams can be extracted
from the regexp. That's inevitable.
3) Currently, only the GIN index is supported. There are no serious problems;
the GiST code for it is just not written yet.
4) There appears to be a problem extracting multibyte-encoded
characters from pg_wchar. I've posted a question about it here:
http://archives.postgresql.org/pgsql-hackers/2011-11/msg01222.php
For now I've hardcoded a dirty solution, so
PG_EUC_JP, PG_EUC_CN, PG_EUC_KR, PG_EUC_TW, PG_EUC_JIS_2004 are not
supported yet.


This is pretty far from being in committable state, so I'm going to mark 
this as returned with feedback in the commitfest app. The feedback:


The code badly needs comments. There is no explanation of how the 
trigram extraction code in trgm_regexp.c works. Guessing from the 
variable names, it seems to be some sort of a coloring algorithm that 
works on a graph, but that all needs to be explained. Can this algorithm 
be found somewhere in literature, perhaps? A link to a paper would be nice.


Apart from that, the multibyte issue seems like the big one. Any way 
around that?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-19 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I'm with Heikki on not believing that this is a good idea.  If you are
 trying to do careful versioning of a set of object definitions, you want
 to stick the things in a file, you don't want them just flying by in
 submitted SQL.

I'm trying to open the extension facilities (versions being the first of
them, think \dx) to application PL code, and to hosted environments
where you're not granted access to the server's file system.

I think I would agree that the use case is not existing if the target is
traditional in-house deployments where the sys admins are your
colleagues. I've been told that's a smaller and smaller part of the
database world though.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Warning in views.c

2012-01-19 Thread Robert Haas
On Mon, Jan 16, 2012 at 3:47 PM, Magnus Hagander mag...@hagander.net wrote:
 Seems 1575fbcb caused this warning:

 view.c: In function ‘DefineVirtualRelation’:
 view.c:105:6: warning: variable ‘namespaceId’ set but not used
 [-Wunused-but-set-variable]

 Attached seems to be the easy fix - or am I missing something obvious?

No, I think you nailed it.  There is some more refactoring that should
be done there, I think, but until it is, we don't need that return
value for anything...

Thanks for fixing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] controlling the location of server-side SSL files

2012-01-19 Thread Robert Haas
On Sat, Jan 14, 2012 at 8:40 AM, Peter Eisentraut pete...@gmx.net wrote:
 On Mon, 2012-01-02 at 06:32 +0200, Peter Eisentraut wrote:
 I think I would like to have a set of GUC parameters to control the
 location of the server-side SSL files.

 Here is the patch for this.

 One thing that is perhaps worth thinking about:  Currently, we just
 ignore missing root.crt and root.crl files.  With this patch, we still
 do this, even if the user has given a specific nondefault location.
 That seems a bit odd, but I can't think of a simple way to do it better.

There's a review in the CF app for this finding only minor issues, so
I'm marking this patch therein as Ready for Committer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-19 Thread Robert Haas
On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan and...@dunslane.net wrote:
 Second, what should we do when the database encoding isn't UTF8? I'm
 inclined to emit a \u escape for any non-ASCII character (assuming it
 has a unicode code point - are there any code points in the non-unicode
 encodings that don't have unicode equivalents?). The alternative would be to
 fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
 to deal with JSON should be using UTF8 anyway, but we still have to deal
 with these things. What about SQL_ASCII? If there's a non-ASCII sequence
 there we really have no way of telling what it should be. There at least I
 think we should probably error out.

I don't see any reason to escape anything more than the minimum
required by the spec, which only requires it for control characters.
If somebody's got a non-ASCII character in there, we can simply allow
it to be represented by itself.  That's almost certainly more compact
(and very possibly more readable) than emitting \u for each such
instance, and it also matches what the current EXPLAIN (FORMAT JSON)
output does.

In other words, let's decree that when the database encoding isn't
UTF-8, *escaping* of non-ASCII characters doesn't work.  But
*unescaped* non-ASCII characters should still work just fine.
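
A sketch of what that decree would mean in practice, assuming the json type
from the patch under discussion (not yet in core at the time of this thread):

SELECT '"héllo"'::json;       -- unescaped non-ASCII: fine in any server
                              -- encoding that can represent the character
SELECT '"h\u00e9llo"'::json;  -- \u escape for a non-ASCII code point: only
                              -- guaranteed to work when the database
                              -- encoding is UTF-8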

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-01-19 Thread Alexander Korotkov
On Fri, Jan 20, 2012 at 12:30 AM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 The code badly needs comments. There is no explanation of how the trigram
 extraction code in trgm_regexp.c works.

 Sure. I hoped to find time to write comments before the commitfest started.
Unfortunately I didn't, sorry.


 Guessing from the variable names, it seems to be some sort of a coloring
 algorithm that works on a graph, but that all needs to be explained. Can
 this algorithm be found somewhere in literature, perhaps? A link to a paper
 would be nice.

I hope it's truly novel, at least in its application to regular expressions. I'm
going to write a paper about it.


 Apart from that, the multibyte issue seems like the big one. Any way
 around that?

Conversion of pg_wchar to a multibyte character is the only way I found to
avoid serious hacking of the existing regexp code. Do you think an additional
function in pg_wchar_tbl which converts a pg_wchar back to a multibyte
character is a possible solution?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Inline Extension

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 3:42 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 I'm with Heikki on not believing that this is a good idea.  If you are
 trying to do careful versioning of a set of object definitions, you want
 to stick the things in a file, you don't want them just flying by in
 submitted SQL.

 I'm trying to open the extension facilities (versions being the first of
 them, think \dx) to application PL code, and to hosted environments
 where you're not granted access to the server's file system.

I guess the question is: for what purpose?

As you recognized in your original email, if the extension is inline,
then the objects will need to be dumped, because a simple CREATE
EXTENSION command is bound to fail.  But my understanding was that a
major part of the reason - if not the entire reason - was to get
pg_dump to emit CREATE EXTENSION bob instead of the component SQL
commands.  If we take that away, what's the remaining benefit of
packaging those objects inside an extension instead of just dumping
them loose into the database?
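
For comparison, what pg_dump emits today for a file-based extension is just
the one command, something like the line below, and it only works because the
control and script files are already present on the target server:

CREATE EXTENSION IF NOT EXISTS hstore WITH SCHEMA public;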

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade and relkind filtering

2012-01-19 Thread Bruce Momjian
On Sat, Dec 31, 2011 at 07:41:00PM -0500, Noah Misch wrote:
 On Mon, Dec 05, 2011 at 05:06:37PM -0500, Bruce Momjian wrote:
  Pg_upgrade has the following check to make sure the cluster is safe for
  upgrading:
  
  res = executeQueryOrDie(conn,
  SELECT n.nspname, c.relname, a.attname
  
  FROM   pg_catalog.pg_class c, 
 pg_catalog.pg_namespace n, 
 pg_catalog.pg_attribute a 
  WHERE  c.oid = a.attrelid AND 
 NOT a.attisdropped AND 
 a.atttypid IN ( 
 'pg_catalog.regproc'::pg_catalog.regtype, 
 'pg_catalog.regprocedure'::pg_catalog.regtype, 
 'pg_catalog.regoper'::pg_catalog.regtype, 
 'pg_catalog.regoperator'::pg_catalog.regtype, 
  /* regclass.oid is preserved, so 'regclass' is OK */
  /* regtype.oid is preserved, so 'regtype' is OK */
 'pg_catalog.regconfig'::pg_catalog.regtype, 
 'pg_catalog.regdictionary'::pg_catalog.regtype) AND
  
 c.relnamespace = n.oid AND 
 n.nspname != 'pg_catalog' AND 
 n.nspname != 'information_schema');
  
  Based on a report from EnterpriseDB, I noticed that we check all
  pg_class entries, while there are cases where this is unnecessary
  because there is no data behind the entry, e.g. views.  Here are the
  relkinds supported:
  
  #define   RELKIND_RELATION'r'   /* ordinary table */
  #define   RELKIND_INDEX   'i'   /* secondary index */
  #define   RELKIND_SEQUENCE'S'   /* sequence object */
  #define   RELKIND_TOASTVALUE  't'   /* for out-of-line 
  values */
  #define   RELKIND_VIEW'v'   /* view */
  #define   RELKIND_COMPOSITE_TYPE  'c'   /* composite type */
  #define   RELKIND_FOREIGN_TABLE   'f'   /* foreign table */
  #define   RELKIND_UNCATALOGED 'u'   /* not yet cataloged */
  
  What types, other than views, can we skip in this query?
 
 RELKIND_UNCATALOGED should never appear on disk, and RELKIND_SEQUENCE and
 RELKIND_TOASTVALUE do not allow adding columns or changing column types.  We
 might as well keep validating them.  RELKIND_RELATION and RELKIND_INDEX have
 storage, so we must check those.
 
 The remaining three relkinds (RELKIND_VIEW, RELKIND_COMPOSITE_TYPE,
 RELKIND_FOREIGN_TABLE) have no storage, but all are usable as column types in
 other relations that do have storage.  You could skip them iff they're unused
 that way, per a check like find_composite_type_dependencies().

Good point.  I have applied the attached comment patch to document why
we check all relkinds for regtypes.  Thanks.
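
The case Noah describes is easy to reproduce: a relkind without storage can
still smuggle a reg* column into a table that does have storage. Illustrative
names only:

CREATE TYPE proc_note AS (fn regproc, note text);  -- relkind 'c', no storage
CREATE TABLE notes (n proc_note);                  -- has storage, and embeds
                                                   -- a regproc attribute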

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 8594d26..891eb9a
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** check_for_reg_data_type_usage(ClusterInf
*** 644,649 
--- 644,654 
  		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
  		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
  
+ 		/*
+ 		 *	While several relkinds don't store any data, e.g. views, they
+ 		 *	can be used to define data types of other columns, so we
+ 		 *	check all relkinds.
+ 		 */
  		res = executeQueryOrDie(conn,
  SELECT n.nspname, c.relname, a.attname 
  FROM	pg_catalog.pg_class c, 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-01-19 Thread Alexander Korotkov
I also have a question about pg_wchar.

/*
 *---
 * encoding info table
 * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
 *---
 */
pg_wchar_tbl pg_wchar_table[] = {
{pg_ascii2wchar_with_len, pg_ascii_mblen, pg_ascii_dsplen,
pg_ascii_verifier, 1}, /* PG_SQL_ASCII */
 {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen,
pg_eucjp_verifier, 3}, /* PG_EUC_JP */
 {pg_euccn2wchar_with_len, pg_euccn_mblen, pg_euccn_dsplen,
pg_euccn_verifier, 2}, /* PG_EUC_CN */
 {pg_euckr2wchar_with_len, pg_euckr_mblen, pg_euckr_dsplen,
pg_euckr_verifier, 3}, /* PG_EUC_KR */
 {pg_euctw2wchar_with_len, pg_euctw_mblen, pg_euctw_dsplen,
pg_euctw_verifier, 4}, /* PG_EUC_TW */
 {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen,
pg_eucjp_verifier, 3}, /* PG_EUC_JIS_2004 */
 {pg_utf2wchar_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4}, /*
PG_UTF8 */
 {pg_mule2wchar_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier,
4}, /* PG_MULE_INTERNAL */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN1 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN2 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN3 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN4 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN5 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN6 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN7 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN8 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN9 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_LATIN10 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1256 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1258 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN866 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN874 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_KOI8R */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1251 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1252 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* ISO-8859-5 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* ISO-8859-6 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* ISO-8859-7 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* ISO-8859-8 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1250 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1253 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1254 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1255 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_WIN1257 */
 {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen,
pg_latin1_verifier, 1}, /* PG_KOI8U */
 {0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}, /* PG_SJIS */
{0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2}, /* PG_BIG5 */
 {0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2}, /* PG_GBK */
{0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2}, /* PG_UHC */
 {0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4}, /*
PG_GB18030 */
{0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3}, /* PG_JOHAB */
 {0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2} /*
PG_SHIFT_JIS_2004 */
};

What do the last 7 zeros in the first column mean? Is no conversion to pg_wchar
possible from these encodings?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP -- renaming implicit sequences

2012-01-19 Thread Robert Haas
On Sat, Jan 14, 2012 at 5:51 PM, Thomas Munro mu...@ip9.org wrote:
 On 12 January 2012 00:58, Tom Lane t...@sss.pgh.pa.us wrote:
 Hmm ... this seems a bit inconsistent with the fact that we got rid of
 automatic renaming of indexes a year or three back.  Won't renaming of
 serials have all the same problems that caused us to give up on renaming
 indexes?

 I was sort of planning to do something similar for constraints (once
 the patch to support renaming constraints lands) and indexes (I didn't
 know they'd previously been automatically renamed and that had been
 dropped).

 Would you say that I should abandon this, no chance of being accepted?
  Is there a technical problem I'm missing, other than the gap between
 unique name generation and execution of the implicit ALTERs?

 Maybe I should look into writing a 'tidy rename' procedure for tables
 and columns instead, rather than modifying the behaviour of core ALTER
 TABLE.

+1 for that approach.  I think this is the kind of thing that seems
cooler on paper than it is in real life.  For example, consider this:

rhaas=# create table foo (a serial);
NOTICE:  CREATE TABLE will create implicit sequence foo_a_seq for
serial column foo.a
CREATE TABLE
rhaas=# alter sequence foo_a_seq rename to bob;
ALTER SEQUENCE

If somebody renames the table or the column at this point, it's a good
bet that they *don't* want bob renamed.

Also, some application code I've had in the past had sequence names
hardcoded into it - it did things like SELECT nextval(...) followed by
INSERT ..., for lack of INSERT .. RETURNING, which didn't exist at the
time.  So it's not implausible that renaming a sequence could be
unwanted, though in practice the likelihood is fairly low: had I
renamed a table, I probably would have renamed the sequence as well.
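
To make that concrete, a minimal sketch of the two styles, reusing the foo
table from the example above (the remembered value 42 is just for
illustration):

    -- old style: sequence name hardcoded in the application
    SELECT nextval('foo_a_seq');      -- client keeps the result, e.g. 42
    INSERT INTO foo (a) VALUES (42);  -- and supplies it explicitly

    -- with INSERT .. RETURNING, no sequence name appears in the application
    INSERT INTO foo DEFAULT VALUES RETURNING a;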

Another, admittedly minor consideration is that this can introduce
some subtle concurrency bugs that we don't have today.  For example,
suppose we choose a new name for the sequence which isn't in use, but
then between the time when we pick the name and the time we do the
insert the name becomes used, due to some concurrent transaction.  Now
we'll fail with a rather baffling error message.  That isn't
necessarily a huge problem - we have lots of code that is prone to
such race conditions - but it's not wonderful either.  Someday it
would be nice to figure out a cleaner solution to these kinds of
issues...  maybe allowing the btree AM to return normally with an
indication that there's a unique constraint violation, rather than
throwing an ERROR.

I think we should remove this from the TODO list, or at least document
that there are a number of reasons why it might be a deeper hole than
it appears to be at first glance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-19 Thread Andrew Dunstan



On 01/19/2012 03:49 PM, Robert Haas wrote:


In other words, let's decree that when the database encoding isn't
UTF-8, *escaping* of non-ASCII characters doesn't work.  But
*unescaped* non-ASCII characters should still work just fine.


The spec only allows unescaped Unicode chars (and for our purposes that 
means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will 
result in something that's not legal JSON. See 
http://www.ietf.org/rfc/rfc4627.txt?number=4627 section 3.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-01-19 Thread Alexander Korotkov
On Fri, Jan 20, 2012 at 1:07 AM, Alexander Korotkov aekorot...@gmail.comwrote:

 What does last 7 zeros in the first column means? No conversion to
 pg_wchar is possible from these encodings?

Uh, I see. These encodings are not supported as server encodings.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] JSON for PG 9.2

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 01/19/2012 03:49 PM, Robert Haas wrote:
 In other words, let's decree that when the database encoding isn't
 UTF-8, *escaping* of non-ASCII characters doesn't work.  But
 *unescaped* non-ASCII characters should still work just fine.

 The spec only allows unescaped Unicode chars (and for our purposes that
 means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
 result in something that's not legal JSON. See
 http://www.ietf.org/rfc/rfc4627.txt?number=4627 section 3.

I understand.  I'm proposing that we not care.  In other words, if the
server encoding is UTF-8, it'll really be JSON.  But if the server
encoding is something else, it'll be almost-JSON.  And specifically,
the \u syntax won't work, and there might be some non-Unicode
characters in there.  If that's not the behavior you want, then use
UTF-8.

It seems pretty clear that we're going to have to make some trade-off
to handle non-UTF8 encodings, and I think what I'm suggesting is a lot
less painful than disabling high-bit characters altogether.  If we do
that, then what happens if a user runs EXPLAIN (FORMAT JSON) and his
column label has a non-Unicode character in there?  Should we say, oh,
sorry, you can't explain that in JSON format?  That is mighty
unfriendly, and probably mighty complicated and expensive to figure
out, too.  We *do not support* mixing encodings in the same database,
and if we make it the job of this patch to fix that problem, we're
going to be in the same place for 9.2 that we have been for the last
several releases: nowhere.
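
To make that concrete, here is a hypothetical example: in a LATIN1 database,
the column alias below contains a non-ASCII character, and the JSON-format
plan output still has to represent it somehow:

    EXPLAIN (FORMAT JSON) SELECT 1 AS "résumé";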

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade with plpython is broken

2012-01-19 Thread Bruce Momjian
On Thu, Dec 22, 2011 at 11:42:23AM -0500, Robert Haas wrote:
 On Mon, Dec 19, 2011 at 10:16 AM, Peter Eisentraut pete...@gmx.net wrote:
  Upgrading an instance containing plpython from =8.4 to =9.0 is broken
  because the module plpython.so was renamed to plpython2.so, and so the
  pg_upgrade check for loadable libraries fails thus:
 
     Your installation references loadable libraries that are missing from the
     new installation.  etc.
 
  Installing a symlink fixes the issue.  Should we teach pg_upgrade about
  this renaming, or should we install the symlink as part of the standard
  installation?
 
 I feel like this is a pg_upgrade bug, not so much a PL/python problem.

I looked into this and the problem is coming from the checking of
pg_proc library functions (not explicitly _language_ functions):

 plpython_call_handler   | $libdir/plpython2
 plpython_inline_handler | $libdir/plpython2
 plpython_validator      | $libdir/plpython2

All three of these entries relate to plpython, and obviously you can see
the library name change.
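
For reference, the kind of query that surfaces these entries in the old
cluster looks roughly like this (a simplified sketch, not the exact query
pg_upgrade runs):

    SELECT DISTINCT probin
    FROM pg_catalog.pg_proc
    WHERE probin IS NOT NULL;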

The list of required libraries is generated in the old cluster.  One
interesting solution would be to look up the matching function names in
the new cluster's pg_pltemplate, and use that library name.  That would
fix the problem of language library files being renamed, but not address
non-language library file names being renamed --- there is no _template_
to look those new values up in.

I hate to add a complex fix for languages and leave the non-language
cases unfixed.

For that reason, I wonder if I should just hard-code the plpython rename
into the pg_upgrade test in check_loadable_libraries().

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-19 Thread Greg Smith

On 1/18/12 4:18 PM, Jim Nasby wrote:

What about doing away with all the arbitrary numbers completely, and just state 
data rate limits for hit/miss/dirty?


Since many workloads will have a mix of all three, it still seems like 
there's some need for weighing these individually, even if they each got 
their own rates.  If someone says read=8MB/s and write=4MB/s (the 
current effective defaults), I doubt they would be happy with seeing 
12MB/s happen.
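
For reference, a back-of-the-envelope check of where those 8MB/s and 4MB/s
figures come from, assuming the stock autovacuum settings (cost limit 200,
20ms delay) and 8kB pages--just a sketch of the arithmetic, not anything the
server actually runs:

    SELECT 200 / 0.020 / 10 * 8 AS miss_kb_per_sec,    -- page miss costs 10
           200 / 0.020 / 20 * 8 AS dirty_kb_per_sec;   -- dirtied page costs 20

That works out to 8000 kB/s of reads and 4000 kB/s of writes.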



BTW, this is a case where it would be damn handy to know if the miss was really 
a miss or not... in the case where we're already rate limiting vacuum, could we 
afford the cost of get_time_of_day() to see if a miss actually did have to come 
from disk?


We certainly might if it's a system where timing information is 
reasonably cheap, and measuring that exact area will be easy if the 
timing test contrib module submitted into this CF gets committed.  I 
could see using that to re-classify some misses as hits if the read 
returns fast enough.


There's not an obvious way to draw that line, though.  The fast=hit vs. 
slow=miss transition happens at a very different place on SSDs vs. 
regular disks, to take the simplest example.  I don't see any way to wander 
down this path that doesn't end up introducing multiple new GUCs, which 
is the opposite of what I'd hoped to do--which was at worst to keep the 
same number, but reduce how many were likely to be touched.



--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-19 Thread Greg Smith

On 1/19/12 1:10 PM, Robert Haas wrote:

I have to say that I find that intensely counterintuitive.  The
current settings are not entirely easy to tune correctly, but at least
they're easy to explain.


I attempt to explain those settings to people in training classes about 
once a month.  It's never been anything but a complete disaster.  I am 
barely concerned about preserving the current UI because, as far as I've 
been able to tell, there are only a handful of PostgreSQL installations 
on the planet that have managed to use it happily.  Even the ones that 
do have a non-default setup that works usually flailed about for some 
time until they got something that works, over a few frustrating months. 
 And the result is settings few dare touch for fear of breaking them.


It's also worth pointing out that VACUUM problems are very close to the 
top of the list of problems larger sites run into.  So right now we have 
an inscrutable UI around an often essential part of the database to 
tune, one that any production site that gets over a few hundred GB of 
data in it will run into problems with.  I wouldn't care about this area 
if it weren't for people screaming about how bad it is every time the 
topic comes up.


If there's anyone out there who has run a larger PostgreSQL database and 
not at some point been extremely frustrated with how the current VACUUM 
settings are controlled, please speak up and say I'm wrong about this. 
I thought it was well understood that the UI was nearly unusably bad; it just 
wasn't obvious what to do about it.



What does that 8MB mean and how does it
relate to vacuum_cost_page_miss?  If I double vacuum_rate_page_miss,
does that effectively also double the cost limit, so that dirty pages
and hits become relatively cheaper?  If so, then I think what that
really means is that the limit is 8MB only if there are no hits and no
dirtied pages - otherwise it's less, and the amount by which it is
less is the result of some arcane calculation.  Ugh!


Saying what I suggested is an arcane calculation strikes me as pretty 
weird--we'd be hard pressed to design a more arcane calculation than the 
one that's already happening.


The feedback here so far seems to lead toward making independent read 
and write knobs.  I'm going to chew on the scenarios Robert described 
and the ones Jim has been commenting on and see if I can refactor this 
into something friendlier that addresses them.


As for the suggestion that I'm bringing this up a bit late in the 
release cycle, I've been trying.  My first submission pushing in this 
direction--improving the logging first, which is needed before you can 
usefully measure a behavior change--happened back in September.  I've 
been moving this area as fast as I can get it to budge.  I'm concerned 
now that much will be made of improved performance in 9.2, leading to 
people converting even larger systems than they used to.  And it's not 
hard at all to find a large system where inability to tune vacuum easily 
is the top limiting factor on overall performance.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-19 Thread Andrew Dunstan



On 01/19/2012 04:12 PM, Robert Haas wrote:

On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstanand...@dunslane.net  wrote:

On 01/19/2012 03:49 PM, Robert Haas wrote:

In other words, let's decree that when the database encoding isn't
UTF-8, *escaping* of non-ASCII characters doesn't work.  But
*unescaped* non-ASCII characters should still work just fine.

The spec only allows unescaped Unicode chars (and for our purposes that
means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
result in something that's not legal JSON. See
http://www.ietf.org/rfc/rfc4627.txt?number=4627  section 3.

I understand.  I'm proposing that we not care.  In other words, if the
server encoding is UTF-8, it'll really be JSON.  But if the server
encoding is something else, it'll be almost-JSON.  And specifically,
the \u syntax won't work, and there might be some non-Unicode
characters in there.  If that's not the behavior you want, then use
UTF-8.

It seems pretty clear that we're going to have to make some trade-off
to handle non-UTF8 encodings, and I think what I'm suggesting is a lot
less painful than disabling high-bit characters altogether.  If we do
that, then what happens if a user runs EXPLAIN (FORMAT JSON) and his
column label has a non-Unicode character in there?  Should we say, oh,
sorry, you can't explain that in JSON format?  That is mighty
unfriendly, and probably mighty complicated and expensive to figure
out, too.  We *do not support* mixing encodings in the same database,
and if we make it the job of this patch to fix that problem, we're
going to be in the same place for 9.2 that we have been for the last
several releases: nowhere.



OK, then we need to say that very clearly and up front (including in the 
EXPLAIN docs.)


Of course, for data going to the client, if the client encoding is UTF8, 
they should get legal JSON, regardless of what the database encoding is, 
and conversely too, no?


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP -- renaming implicit sequences

2012-01-19 Thread Alvaro Herrera

Excerpts from Robert Haas's message of jue ene 19 18:07:33 -0300 2012:

 I think we should remove this from the TODO list, or at least document
 that there are a number of reasons why it might be a deeper hole than
 it appears to be at first glance.

Maybe not remove it, but instead add a link to this discussion.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: index support for regexp search

2012-01-19 Thread Erik Rijkers
On Thu, January 19, 2012 21:30, Heikki Linnakangas wrote:
 On 22.11.2011 21:38, Alexander Korotkov wrote:
 WIP patch with index support for regexp search for pg_trgm contrib is
 attached.
 In spite of techniques which extracts continuous text parts from regexp,
 this patch presents technique of automatum transformation. That allows more
 comprehensive trigrams extraction.

 Nice!


Yes, wonderful stuff; I tested quite a bit with this patch; FWIW, here's what I 
found.

The patch yields spectacular speedups with small, simple-enough regexen.  But 
it does not do a good enough job when guessing where to use the index and 
where to fall back to a Seq Scan.  This can lead to (also spectacular) 
slow-downs compared to a Seq Scan.

I used the following to generate 3 test tables with lines of 80 random chars 
(just in case it's
handy for others to use):

$ cat create_data.sh
#!/bin/sh

for power in 4 5 6
do
  table=azjunk${power}
  index=${table}_trgmrgx_txt_01_idx
  echo "-- generating table $table with index $index";
  time perl -E'
      sub ss{ join"",@_[ map{rand @_} 1 .. shift ] };
      say(ss(80,"a".."g"," ","h".."m"," ","n".."s"," ","t".."z"))
          for 1 .. 1e'${power} \
  | psql -aqXc "drop table if exists $table;
  create table $table(txt text); copy $table from stdin;
  -- set session maintenance_work_mem='20GB';
  create index $index on $table using gin(txt gin_trgm_ops);
  analyze $table;";
done


\dt+ public.azjunk*
  List of relations
 Schema |  Name   | Type  |   Owner   |  Size   | Description
--------+---------+-------+-----------+---------+-------------
 public | azjunk4 | table | breinbaas | 1152 kB |
 public | azjunk5 | table | breinbaas | 11 MB   |
 public | azjunk6 | table | breinbaas | 112 MB  |
(3 rows)



I guessed that MAX_COLOR_CHARS limits the character class size (to 4, in your 
patch), is that
true?   I can understand you want that value to be low to limit the above risk, 
but now it reduces
the usability of the feature a bit: one has to split up larger char-classes 
into several smaller
ones to make a statement use the index: i.e.:
 txt ~ 'f[aeio]n' OR txt ~ 'f[uy]n'
instead of
 txt ~ 'f[aeiouy]n'



I compiled instances with larger values for MAX_COLOR_CHARS (6 and 9), and 
sure enough they used the index for larger classes such as the above, but of 
course they also got into trouble more easily when quantifiers are added 
(*, +, {n,m}).

A better calculation to decide index use seems necessary, and ideally one that 
allows for a larger MAX_COLOR_CHARS than 4.  Quantifiers especially could 
perhaps be inspected with regard to that decision.  IMHO, the functionality 
would still be very useful even if only very simple regexen were considered.

Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I 
suppose this is normally necessary for performance reasons, but it would be 
useful to be able to compile a test version that allows it.  I don't know how 
hard that would be.




There is also a minor bug, I think, when running with  'set enable_seqscan=off' 
 in combination
with a too-large regex:

$ cat fail.sql:

set enable_bitmapscan=on;
set enable_seqscan=on;
explain analyze select txt from azjunk4 where txt ~ 'f[aeio]n';   -- OK
explain analyze select txt from azjunk4 where txt ~ 'f[aeiou]n';  -- OK

set enable_bitmapscan=on;
set enable_seqscan=off;
explain analyze select txt from azjunk4 where txt ~ 'f[aeio]n';   -- OK
explain analyze select txt from azjunk4 where txt ~ 'f[aeiou]n';  -- crashes


 $ psql -f fail.sql
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on azjunk4  (cost=52.01..56.02 rows=1 width=81) (actual time=1.011..5.291 rows=131 loops=1)
   Recheck Cond: (txt ~ 'f[aeio]n'::text)
   ->  Bitmap Index Scan on azjunk4_trgmrgx_txt_01_idx  (cost=0.00..52.01 rows=1 width=0) (actual time=0.880..0.880 rows=131 loops=1)
         Index Cond: (txt ~ 'f[aeio]n'::text)
 Total runtime: 5.700 ms
(5 rows)

                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on azjunk4  (cost=0.00..268.00 rows=1 width=81) (actual time=1.491..36.049 rows=164 loops=1)
   Filter: (txt ~ 'f[aeiou]n'::text)
   Rows Removed by Filter: 9836
 Total runtime: 36.112 ms
(4 rows)

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on azjunk4  (cost=52.01..56.02 rows=1 width=81) (actual time=0.346..0.927 rows=131 loops=1)
   Recheck Cond: (txt ~ 'f[aeio]n'::text)
   ->  Bitmap Index Scan on azjunk4_trgmrgx_txt_01_idx  (cost=0.00..52.01 rows=1 width=0) (actual time=0.316..0.316 rows=131 loops=1)
         Index Cond: (txt ~ 'f[aeio]n'::text)
 

Re: [HACKERS] WIP -- renaming implicit sequences

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 6:30 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Robert Haas's message of jue ene 19 18:07:33 -0300 2012:

 I think we should remove this from the TODO list, or at least document
 that there are a number of reasons why it might be a deeper hole than
 it appears to be at first glance.

 Maybe not remove it, but instead add a link to this discussion.

I don't see what that accomplishes, unless someone's arguing that we
really do want this behavior...?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP -- renaming implicit sequences

2012-01-19 Thread Alvaro Herrera

Excerpts from Robert Haas's message of jue ene 19 20:53:42 -0300 2012:
 
 On Thu, Jan 19, 2012 at 6:30 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  Excerpts from Robert Haas's message of jue ene 19 18:07:33 -0300 2012:
 
  I think we should remove this from the TODO list, or at least document
  that there are a number of reasons why it might be a deeper hole than
  it appears to be at first glance.
 
  Maybe not remove it, but instead add a link to this discussion.
 
 I don't see what that accomplishes, unless someone's arguing that we
 really do want this behavior...?

Well, it documents what happened when we tried to do it.  Sort of like
features we do not want.  But then, maybe it's just TODO bloat.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 5:59 PM, Andrew Dunstan and...@dunslane.net wrote:
 OK, then we need to say that very clearly and up front (including in the
 EXPLAIN docs.)

Can do.

 Of course, for data going to the client, if the client encoding is UTF8,
 they should get legal JSON, regardless of what the database encoding is, and
 conversely too, no?

Well, that would be nice, but I don't think it's practical.  It will
certainly be the case, under the scheme I'm proposing, or probably any
other sensible scheme also, that if a client whose encoding is UTF-8
gets a value of type json back from the database, it's strictly valid
JSON.  But it won't be possible to store every legal JSON value in the
database if the database encoding is anything other than UTF-8, even
if the client encoding is UTF-8.  The backend will get the client's
UTF-8 bytes and transcode them to the server encoding before calling
the type-input function, so if there are characters in there that
can't be represented in the server encoding, then we'll error out
before the JSON data type ever gets control.  In theory, it would be
possible to accept such strings if the client chooses to represent them
using a \u sequence, but I'm unexcited about doing the work required to
make that happen, because it will still be pretty half-baked: we'll
be able to accept some representations of the same JSON constant but
not others.

I think the real fix for this problem is to introduce an
infrastructure inside the database that allows us to have different
columns stored in different encodings.  People use bytea for that
right now, but that's pretty unfriendly: it would be nice to have a
better system.  However, I expect that to take a lot of work and break
a lot of things, and until we do it I don't feel that compelled to
provide buggy and incomplete support for it under the guise of
implementing a JSON datatype.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-19 Thread David E. Wheeler
On Jan 19, 2012, at 4:27 PM, Robert Haas wrote:

 I think the real fix for this problem is to introduce an
 infrastructure inside the database that allows us to have different
 columns stored in different encodings.  People use bytea for that
 right now, but that's pretty unfriendly: it would be nice to have a
 better system.  However, I expect that to take a lot of work and break
 a lot of things, and until we do it I don't feel that compelled to
 provide buggy and incomplete support for it under the guise of
 implementing a JSON datatype.

+1 This seems like a reasonable compromise and course of action, especially if 
someone is interested in taking on column-level encodings at some point in the 
next year or two.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP -- renaming implicit sequences

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 7:15 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
  I think we should remove this from the TODO list, or at least document
  that there are a number of reasons why it might be a deeper hole than
  it appears to be at first glance.
 
  Maybe not remove it, but instead add a link to this discussion.

 I don't see what that accomplishes, unless someone's arguing that we
 really do want this behavior...?

 Well, it documents what happened when we tried to do it.  Sort of like
 features we do not want.  But then, maybe it's just TODO bloat.

I share your urge to memorialize the conversation somewhere, but I
fear it will just cause future people to spend time thinking about it
that isn't really warranted.  I guess we could move it to the features
we do not want section, but that's mostly bigger picture stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-19 Thread Robert Haas
On Thu, Jan 19, 2012 at 5:39 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 1/19/12 1:10 PM, Robert Haas wrote:
 I have to say that I find that intensely counterintuitive.  The
 current settings are not entirely easy to tune correctly, but at least
 they're easy to explain.

 I attempt to explain those settings to people in training classes about once
 a month.  It's never been anything but a complete disaster.  I am barely
 concerned about preserving the current UI because, as far as I've been able
 to tell, there are only a handful of PostgreSQL installatinos on the planet
 that have managed to use it happily.  Even the ones that do have a
 non-default setup that works usually flailed about for some time until they
 get something that works, over a few frustrating months.  And the result are
 settings few dare touch for fear of breaking it.

 It's also worth pointing out that VACUUM problems are very close to the top
 of the list of problems larger sites run into.  So right now we have an
 inscrutable UI around an often essential part of the database to tune, one
 that any production site that gets over a few hundred GB of data in it will
 run into problems with.  I wouldn't care about this area if it weren't for
 people screaming about how bad it is every time the topic comes up.

 If there's anyone out there who has run a larger PostgreSQL database and not
 at some point been extremely frustrated with how the current VACUUM settings
 are controlled, please speak up and say I'm wrong about this. I thought it
 was well understood the UI was near unusably bad, it just wasn't obvious
 what to do about it.

 What does that 8MB mean and how does it
 relate to vacuum_cost_page_miss?  If I double vacuum_rate_page_miss,
 does that effectively also double the cost limit, so that dirty pages
 and hits become relatively cheaper?  If so, then I think what that
 really means is that the limit is 8MB only if there are no hits and no
 dirtied pages - otherwise it's less, and the amount by which it is
 less is the result of some arcane calculation.  Ugh!

 Saying what I suggested is an arcane calculation strikes me as pretty
 weird--we'd be hard pressed to design a more arcane calculation than the one
 that's already happening.

Perhaps so, but I'm willing to bet that if we have a variable that
looks like a pure read limit or a pure dirty limit and really is not,
we'll have succeeded.  :-)

 The feedback here so far seems to lead toward making independent read and
 write knobs.  I'm going to chew on the scenarios Robert described and the
 ones Jim has been commenting on and see if I can refactor this into
 something friendlier that addresses them.

 As for the suggestion that I'm bringing this up a bit late in the release
 cycle, I've been trying.  My first submission pushing in this
 direction--improving the logging first, which is needed before you can
 usefully measure a behavior change--happened back in September.  I've been
 moving this area as fast as I can get it to budge.  I'm concerned now that
 much will be made of improved performance in 9.2, leading to people
 converting even larger systems than they used to.  And it's not hard at all
 to find a large system where inability to tune vacuum easily is the top
 limiting factor on overall performance.

I certainly didn't intend to come across as disparaging your work on
this topic.  I understand that there are big problems with the way
things work now; I'm just cautious about trying to replace them too
hastily with something that may not turn out to be any better.  Of
course, if we can replace it with something that we're sure is
actually an improvement, I'm all in favor of that.  But, IMHO, the
problems in this area are too serious to be solved by renaming the
knobs.  At most, we're going to buy ourselves a little time to come up
with a better solution.

IMHO, and at the risk of repeating myself, one of the big problems in
this area is that we're making the user guess something that we really
ought to be figuring out for them.  Just as users want checkpoints to
run as slowly as possible while still not bumping into the next
checkpoint, they'd presumably like vacuum to run as slowly as possible
without bumping into the next vacuum.  Instead, we make them tell us
how fast they'd like it to run, which requires them to guess a value
high enough to finish soon enough but low enough to minimize the
impact on the rest of the system.

Another problem is that the vacuum algorithm itself could, I think, be
made much smarter.  We could teach HOT to prune pages that contain no
HOT chains but do contain dead tuples.  That would leave dead line
pointers behind, but that's not nearly as bad as leaving the entire
tuple behind.  We could, as Simon and others have suggested, have one
threshold for vacuuming the heap (i.e. reclaiming dead tuples) and
another for vacuuming the indexes (i.e. reclaiming dead line
pointers).  That would open the door to partial vacuuming: just 

Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility

2012-01-19 Thread Noah Misch
On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote:
 On Thu, Jan 19, 2012 at 10:37 AM, Noah Misch n...@leadboat.com wrote:
  I agree with Merlin; the frontend/backend protocol is logically distinct from
  the binary send/recv formats of data types.  For one key point, the latter is
  not exclusively core-defined; third-party extensions change their send/recv
  formats on a different schedule.  They can add myext.binary_format_version
  GUCs of their own to cope in a similar way.
 
 I agree.  It occurs to me that we recently changed the default *text*
 output format for bytea for reasons not dissimilar to those
 contemplated here.  Presumably, that's a much more disruptive change,
 and yet we've had minimal complaints because anyone who gets bitten
 can easily set bytea_output='escape' and the problem goes away.  The
 same thing seems like it would work here, only the number of people
 needing to change the parameter will probably be even smaller, because
 fewer people use binary than text.
 
 Having said that, if we're to follow the precedent set by
 bytea_format, maybe we ought to just add
 binary_array_format={huge,ittybitty} and be done with it, rather than
 inventing a minor protocol version GUC for something that isn't really
 a protocol version change at all.  We could introduce a
 differently-named general mechanism, but I guess I'm not seeing the
 point of that either.  Just because someone has a
 backward-compatibility issue with one change of this type doesn't mean
 they have a similar issue with all of them.  So I think adding a
 special-purpose GUC is more logical and more parallel to what we've
 done in the past, and it doesn't saddle us with having to be certain
 that we've designed the mechanism generally enough to handle all the
 cases that may come later.

That makes sense.  An attraction of a single binary format version was avoiding
the "Is this worth a GUC?" conversation for each change.  However, adding a GUC
should be no more notable than bumping a binary format version.
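
For reference, the bytea precedent mentioned above boils down to a one-line
setting on the user's side (a minimal sketch):

    SET bytea_output = 'escape';   -- revert to the pre-9.0 text format
    SHOW bytea_output;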

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-01-19 Thread Robert Haas
On Wed, Jan 18, 2012 at 4:23 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 Huh, isn't it simpler to just pass the triggers the parse tree *after*
 parse analysis?  I don't understand what you're doing here.

 I didn't realize that the parse analysis is in fact done from within
 standard_ProcessUtility() directly, which means your suggestion is
 indeed workable.

 Tom Lane t...@sss.pgh.pa.us writes:
 It's not the costs I'm worried about so much as the side effects ---

 Ok, so I'm now calling the command trigger procedures once the parse
 analysis is done, and guess what, I'm back to the same problem as
 before:

  https://github.com/dimitri/postgres/commit/4bfab6344a554c09f7322e861f9d09468f641bd9

  CREATE TABLE public."ab_foo-bar"
  (
    id serial NOT NULL,
    foo integer default 1,
    PRIMARY KEY(id)
  );
  NOTICE:  CREATE TABLE will create implicit sequence "ab_foo-bar_id_seq" for 
 serial column "ab_foo-bar.id"
  NOTICE:  snitch: CREATE SEQUENCE
  ERROR:  unrecognized node type: 904

 I'm not sure about the next step, and I'm quite sure I need to stop here
 for tonight. Any advice welcome, I'll be working on that again as soon
 as tomorrow.

My advice is to forget about trying to provide the command string to
the user for the first version of this patch.  As you're finding out,
there's no simple, easy, obvious way of doing it, and there are N0
useful things that can be done without that functionality.  I continue
to think that for a first version of this, we should be satisfied to
pass just the OID.  I know that's not really what you want, but
there's much to be said for picking a goal that is achievable in the
limited time available, and I fear that setting your sights too high
will lead to something that either (a) doesn't get committed, or (b)
gets committed, but turns out not to work very well, either of which
would be less than ideal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of patch renaming constraints

2012-01-19 Thread Nikhil Sontakke
  Make check passed.  Patch has tests for rename constraint.
 
  Most normal uses of alter table ... rename constraint ... worked
 normally.  However, the patch does not deal correctly with constraints
 which are not inherited, such as primary key constraints:

 That appears to be because creating a primary key constraint does not
 set pg_constraint.conisonly correctly.  This was introduced recently
 with noninherited check constraints.


 Umm, conisonly is set to false for primary key entries in pg_constraint.
And primary keys are not inherited anyway. So why is the conisonly field
interfering with rename? Seems quite orthogonal to me.

Regards,
Nikhils


Re: [HACKERS] WAL Restore process during recovery

2012-01-19 Thread Fujii Masao
On Fri, Jan 20, 2012 at 4:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
 WALRestore process asynchronously executes restore_command while
 recovery continues working.

 Overlaps downloading of next WAL file to reduce time delays in file
 based archive recovery.

 Handles cases of file-only and streaming/file correctly.

 Though I've not reviewed the patch deeply yet, I observed the following
 two problems when I tested the patch.

 When I set up streaming replication + archive (i.e., restore_command is set)
 and started the standby, I got the following error:

    FATAL:  all AuxiliaryProcs are in use
    LOG:  walrestore process (PID 18839) exited with exit code 1

 Fixed and better documented.

 When I started an archive recovery without setting restore_command,
 it successfully finished.

 Not sure exactly what you mean, but I fixed a bug that might be
 something you're seeing.

Thanks!

But you forgot to include walrestore.c and .h in the patch. Can you submit
the updated version of the patch?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Group commit, revised

2012-01-19 Thread Peter Geoghegan
On 19 January 2012 17:40, Robert Haas robertmh...@gmail.com wrote:
 I don't know what you mean by this.  I think removing wal_writer_delay
 is premature, because I think it still may have some utility, and the
 patch removes it.  That's a separate change that should be factored
 out of this patch and discussed separately.

FWIW, I don't really care too much if we keep wal_writer_delay,
provided it is only used in place of the patch's
WALWRITER_NORMAL_TIMEOUT constant. I will note in passing that I doubt
that the effect with asynchronous commits and hint bits is pronounced
enough to have ever transferred through to someone making a practical
recommendation to reduce wal_writer_delay to ameliorate clog
contention.

 When we did sync rep it made sense to have the WALSender do the work
 and for others to just wait. It would be quite strange to require a
 different design for essentially the same thing for normal commits and
 WAL flushes to local disk. I should mention the original proposal for
 streaming replication had each backend send data to standby
 independently and that was recognised as a bad idea after some time.
 Same for sync rep also.

 I don't think those cases are directly comparable.  SR is talking to
 another machine, and I can't imagine that there is a terribly
 convenient or portable way for every backend that needs one to get a
 hold of the file descriptor for the socket.  Even if it could, the
 data is sent as a stream, so if multiple backends sent to the same
 file descriptor you'd have to have some locking to prevent messages
 from getting interleaved.  Or else you could have multiple
 connections, or use UDP, but that gets rather complex.  Anyway, none
 of this is an issue for file I/O: anybody can open the file, and if
 two backends write data at different offsets at the same time - or the
 same data at the same offset at the same time - there's no problem.
 So the fact that it wasn't a good idea for SR doesn't convince me that
 it's also a bad idea here.

 On the other hand, I'm not saying we *should* do it that way, either -
 i.e. I am not trying to require a different design just because it's
 fun to make people change things.  Rather, I am trying to figure out
 whether the design you've chosen is in fact the best one, and part of
 that involves reasoning about why it might or might not be.  There are
 obvious reasons to think that having process A kick process B and go
 to sleep, then have process B do some work and wake up process A might
 be less efficient than having process A just do the work itself, in
 the uncontended case.  The fact that that isn't happening seems
 awfully strange to me; it's hard to understand why 3 system calls are
 faster than one.  That suggests that either the current system is
 badly broken in some way that this patch fixes (in which case, it
 would be nice to know what the broken thing is) or that there's an
 opportunity for further optimization of the new patch (either now or
 later, depending on how much work we're talking about).  Just to be
 clear, it makes perfect sense that the new system is faster in the
 contended case, and the benchmark numbers are very impressive.  It's a
 lot harder to understand why it's not slower in the uncontended case.

 Not sure why its odd to have backends do one thing and auxiliaries do
 another. The whole point of auxiliary processes is that they do a
 specific thing different to normal backends. Anyway, the important
 thing is to have auxiliary processes be independent of each other as
 much as possible, which simplifies error handling and state logic in
 the postmaster.

 Yeah, I guess the shutdown sequence could get a bit complex if we try
 to make everyone go through the WAL writer all the time.  But I wonder
 if we could rejiggering things somehow so that everything goes through
 WAL writer if its dead.

I'm not sure what you mean by this last bit. It sounds a bit hazardous.

My problem with nominating a backend to the status of an auxiliary is
that no matter what way you cut it, it increases the failure surface
area, so to speak.

I'm not sure why Heikki thinks that it follows that having a
dependency on some backend is simpler than having one on an auxiliary
process. As to the question of IPC and context switch overhead, I'd
speculate that protecting access to a data structure with bookkeeping
information regarding which backend is currently the driver and so on
might imply considerably more overhead than IPC and context switching.
It might also be that having WAL Writer almost solely responsible for
this might facilitate more effective use of CPU cache.

On most modern architectures, system calls don't actually cause a full
context switch. The kernel can just enter a mode switch (go from
user mode to kernel mode, and then back to user mode). You can observe
this effect with vmstat. That's how 3 system calls might not look much
more expensive than 1.

 +       if (delta  XLOG_SEG_SIZE * CheckPointSegments 

Re: [HACKERS] Online base backup from the hot-standby

2012-01-19 Thread Steve Singer

On 12-01-17 05:38 AM, Fujii Masao wrote:

On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masaomasao.fu...@gmail.com  wrote:

The amount of code changes to allow pg_basebackup to make a backup from
the standby seems to be small. So I ended up merging that changes and the
infrastructure patch. WIP patch attached. But I'd happy to split the patch again
if you want.

Attached is the updated version of the patch. I wrote the limitations of
standby-only backup in the document and changed the error messages.



Here is my review of this version of the patch. I think this patch has 
been in every CF for 9.2 and I feel it is getting close to being 
committed.  The only issue of significance is a crash I encountered 
while testing; see below.


I am fine with including the pg_basebackup changes in the patch; it also 
makes testing some of the other changes possible.



The documentation updates you have are good

I don't see any issues looking at the code.



Testing Review


I encountered this on my first replica (the one based on the master).  I 
am not sure if it is related to this patch; it happened after the 
pg_basebackup against the replica finished.


TRAP: FailedAssertion(!(((xid) != ((TransactionId) 0))), File: 
twophase.c, Line: 1238)

LOG:  startup process (PID 1) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

A little earlier this postmaster had printed.

LOG:  restored log file 0001001F from archive
LOG:  restored log file 00010020 from archive
cp: cannot stat 
`/usr/local/pgsql92git/archive/00010021': No such file 
or directory

LOG:  unexpected pageaddr 0/1900 in log file 0, segment 33, offset 0
cp: cannot stat 
`/usr/local/pgsql92git/archive/00010021': No such file 
or directory



I have NOT been able to replicate this error  and I am not sure exactly 
what I had done in my testing prior to that point.



In another test run I had

- set full_page_writes=off and did a checkpoint
- started the pg_basebackup
- set full_page_writes=on and did a HUP + some database activity that 
might have forced a checkpoint.


I got this message from pg_basebackup.
./pg_basebackup -D ../data3 -l foo -h localhost -p 5438
pg_basebackup: could not get WAL end position from server

I point this out because the message is different from the normal "could 
not initiate base backup: FATAL:  WAL generated with full_page_writes=off" 
that I normally see.  We might want to add a PQerrorMessage(conn) call to 
pg_basebackup to print the error details.  Since this patch didn't actually 
change pg_basebackup I don't think you're required to improve the error 
messages in it.  I am just mentioning this because it came up in testing.



The rest of the tests I did, involving changing full_page_writes 
with/without checkpoints and SIGHUPs and promoting the replica, seemed to 
work as expected.






Regards,








Re: [HACKERS] triggered_change_notification v3

2012-01-19 Thread Robert Haas
On Sun, Jan 15, 2012 at 11:05 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Attached is a version of a previously posted patch which has been
 modified based on on-list feedback from Álvaro.

 This is a generalized trigger function which can be used as an AFTER
 EACH ROW trigger on any table which has a primary key, and will send
 notifications of operations for which the trigger is attached, with a
 payload indicating the table, the operation, and the primary key. A
 channel can be specified in the CREATE TRIGGER command, but will
 default to tcn if omitted.

Nice work, Kevin.  I've committed this.
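
For anyone wanting to try it out, a minimal usage sketch (the table here is
just an example; the channel defaults to tcn when not given in CREATE TRIGGER):

    CREATE EXTENSION tcn;

    CREATE TABLE accounts (id integer PRIMARY KEY, balance numeric);

    CREATE TRIGGER accounts_tcn
        AFTER INSERT OR UPDATE OR DELETE ON accounts
        FOR EACH ROW EXECUTE PROCEDURE triggered_change_notification();

    LISTEN tcn;   -- payloads carry the table name, operation, and primary key
    INSERT INTO accounts VALUES (1, 100.00);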

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum rate limit in KBps

2012-01-19 Thread Greg Smith
I chewed a bit on Heikki's comment that similarity to the query planning 
parameters might be useful, and Robert's that being able to explain how 
the feature works more easily has value.  I have an initial adjustment 
of my general idea that I think moves usefully in both those directions.


The existing VACUUM cost constants look like this:

vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20

These could be adjusted to instead be ratios like the query planner ones 
(seq_page_cost, random_page_cost, etc.), referenced off a value of 1.0 
for page miss ~= a read is expected:


vacuum_cost_page_hit = 0.1
vacuum_cost_page_miss = 1.0
vacuum_cost_page_dirty = 2.0

Now add in the new setting, which is explicitly said to be the read value:

vacuum_cost_read_limit = 8000 # maximum page miss read rate in 
kilobytes/second


And I can shuffle the numbers around internally such that things still 
work exactly the same, at the default parameters.  And then anyone who 
spends time learning how either the query planning or vacuum cost ratio 
constants work will find the learning curve to pick up the other set easier.


An interesting fall-out of this refactoring is that old postgresql.conf 
settings moved forward for *all* these values will still work fine.  The 
ratios are right and the internal computation won't care.  The math is 
just more complicated to explain when vacuum_cost_page_miss is anything 
but 1.0, which is a problem the manual doesn't have to address.  We 
don't worry about making every query planner parameter discussion 
consider what happens if someone moves seq_page_cost around; this will 
put vacuum_cost_page_miss into the same reference-constant category.  
The only problem is for someone who changed one but not all of them in 
their old configuration; that's going to give an unexpected result.


It might be a bit more straightforward yet if things were renamed so it 
was more obvious that page miss~=read, but I haven't seen a good way to 
do that yet.  Renaming the reference cost value to vacuum_cost_page_read 
has two problems.  It makes the backward compatibility issues larger, 
and it's not quite true.  The way I think this should be explained, they 
really aren't the same; that's why I used ~= above.  A page miss is not 
guaranteed to be a read, it just is expected to be one in the worst 
case.  The read rate that vacuum page misses introduce will not be 
exactly the same as vacuum_cost_read_limit--but it will be below that 
limit, which is all it claims to be.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 19, 2012 at 3:42 PM, Dimitri Fontaine
 I'm trying to open the extension facilities (versions being the first of
 them, think \dx) to application PL code, and to hosted environments
 where you're not granted access to the server's file system.

 I guess the question is: for what purpose?

 As you recognized in your original email, if the extension is inline,
 then the objects will need to be dumped, because a simple CREATE
 EXTENSION command is bound to fail.  But my understanding was that a
 major part of the reason - if not the entire reason - was to get
 pg_dump to emit CREATE EXTENSION bob instead of the component SQL
 commands.  If we take that away, what's the remaining benefit of
 packaging those objects inside an extension instead of just dumping
 them loose into the database?

Indeed, it seems like such a thing is not an extension at all anymore,
or at least it gives up many of the useful properties of extensions.

Given the entire lack of demand from the field for such a cut-down
concept of extension, I think we should not be in a hurry to introduce
it.  Maybe in a year or two when we have a clearer idea of how people
are actually using extensions, there will be a better argument for it.
Right now I'm afraid that we might foreclose our options for other
future extension features because these things would be incompatible
with such ideas.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-01-19 Thread Greg Smith

On 01/19/2012 10:52 AM, Robert Haas wrote:

It's not quite clear from your email, but I gather that the way that
this is intended to work is that these values increment every time we
checkpoint?


Right--they get updated in the same atomic bump that moves up things 
like buffers_checkpoint



Also, forgive for asking this possibly-stupid question, but of what
use is this information? I can't imagine why I'd care about a running
total of the number of files fsync'd to disk.  I also can't really
imagine why I'd care about the length of the write phase, which surely
will almost always be a function of checkpoint_completion_target and
checkpoint_timeout unless I manage to overrun the number of
checkpoint_segments I've allocated.  The only number that really seems
useful to me is the time spent syncing.  I have a clear idea what to
look for there: smaller numbers are better than bigger ones.  For the
rest I'm mystified.


Priority #1 here is to reduce (but, admittedly, not always eliminate) 
the need for log file parsing of this particular area, so including all 
the major bits from the existing log message that can be published this 
way would include the write phase time.  You mentioned one reason why 
the write phase time might be interesting; there could be others.  One 
of the things expected here is that Munin will expand its graphing of 
values from pg_stat_bgwriter to include all these fields.  Most of the 
time the graph of time spent in the write phase will be boring and 
useless.  Making it easy for a look at a graph to spot those rare times 
when it isn't is one motivation for including it.


As for why to include the number of files being sync'd, one reason is 
again simply wanting to include everything that can easily be 
published.  A second is that it helps support ideas like my Checkpoint 
sync pause one; that's untunable in any reasonable way without some 
easy way of monitoring the number of files typically sync'd.  Sometimes 
when I'm investigating checkpoint spikes during sync, I wonder whether 
they were because more files than usual were synced, or if it's instead 
just because of more churn on a smaller number.  Making this easy to 
graph pulls that data out to where I can compare it with disk I/O 
trends.  And there's precedent now proving that an always incrementing 
number in pg_stat_bgwriter can be turned into such a graph easily by 
monitoring tools.



And, it doesn't seem like it's necessarily going to save me a whole
lot either, because if it turns out that my sync phases are long, the
first question out of my mouth is going to be "what percentage of my
total sync time is accounted for by the longest sync?".  And so right
there I'm back to the logs.  It's not clear how such information could
be usefully exposed in pg_stat_bgwriter either, since you probably
want to know only the last few values, not a total over all time.


This isn't ideal yet.  I mentioned how some future performance event 
logging history collector was really needed as a place to push longest 
sync times into, and we don't have it yet.  This is the best thing to 
instrument that I'm sure is useful, and that I can stick onto with the 
existing infrastructure.


The idea is that this change makes it possible to trigger a "sync times 
are too long" alert out of a tool that's based solely on database 
queries.  When that goes off, yes, you're possibly back to the logs again 
for more details about the longest individual sync time.  But the rest 
of the time, what's hopefully the normal state of things, you can ignore 
the logs and just track the pg_stat_bgwriter numbers.
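
As a sketch of the kind of query such a tool could run--note that the
checkpoint_write_time and checkpoint_sync_time columns here are assumed
names for what this patch would add, not existing pg_stat_bgwriter fields,
and the math assumes they are reported in milliseconds:

    SELECT checkpoints_timed + checkpoints_req AS checkpoints,
           checkpoint_write_time / 1000.0      AS write_seconds_total,  -- assumed new column
           checkpoint_sync_time  / 1000.0      AS sync_seconds_total    -- assumed new column
    FROM pg_stat_bgwriter;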


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade with plpython is broken

2012-01-19 Thread Peter Eisentraut
On tor, 2012-01-19 at 17:04 -0500, Bruce Momjian wrote:
 For that reason, I wonder if I should just hard-code the plpython
 rename into the pg_upgrade test in check_loadable_libraries().

Yes, I haven't come up with a better solution either.

If this becomes a general problem, we might need to add a command line
option to ignore certain names or something.  But not yet.





Re: [HACKERS] JSON for PG 9.2

2012-01-19 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/19/2012 04:12 PM, Robert Haas wrote:
 On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan and...@dunslane.net wrote:
 The spec only allows unescaped Unicode chars (and for our purposes that
 means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
 result in something that's not legal JSON.

 I understand.  I'm proposing that we not care.  In other words, if the
 server encoding is UTF-8, it'll really be JSON.  But if the server
 encoding is something else, it'll be almost-JSON.

 Of course, for data going to the client, if the client encoding is UTF8, 
 they should get legal JSON, regardless of what the database encoding is, 
 and conversely too, no?

Yes.  I think this argument has been mostly theologizing, along the
lines of how many JSON characters can dance on the head of a pin.
From a user's perspective, the database encoding is only a constraint on
which characters he can store.  He does not know or care what the bit
representation is inside the server.  As such, if we store a non-ASCII
character in a JSON string, it's valid JSON as far as the user is
concerned, so long as that character exists in the Unicode standard.
If his client encoding is UTF8, the value will be letter-perfect JSON
when it gets to him; and if his client encoding is not UTF8, then he's
already pretty much decided that he doesn't give a fig about the
Unicode-centricity of the JSON spec, no?

So I'm with Robert: we should just plain not care.  I would further
suggest that maybe what we should do with incoming JSON escape sequences
is convert them to Unicode code points and then to the equivalent
character in the database encoding (or throw error if there is none).
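
To illustrate, in a LATIN1 database the suggested handling of \u escapes 
would behave roughly like this (a sketch of the proposed behavior, not 
what the patch currently does):

  SELECT '"caf\u00e9"'::json;  -- U+00E9 exists in LATIN1, so this is accepted
  SELECT '"\u3042"'::json;     -- no LATIN1 equivalent, so this raises an error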

regards, tom lane



Re: [HACKERS] Review of patch renaming constraints

2012-01-19 Thread Peter Eisentraut
On Fri, 2012-01-20 at 09:08 +0530, Nikhil Sontakke wrote:
  Umm, conisonly is set to false for primary key entries in
 pg_constraint.
 And primary keys are not inherited anyway. So why is the conisonly
 field interfering with renaming? Seems quite orthogonal to me. 

In the past, each kind of constraint was either always inherited or
always not, implicitly.  Now, for check constraints we can choose what
we want, and in the future, perhaps we will want to choose for primary
keys as well.  So having conisonly is really a good step into that
future, and we should use it uniformly.
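
For anyone following along, the distinction conisonly captures looks 
roughly like this (a sketch using the ALTER TABLE ONLY syntax from the 
non-inherited check constraint work; flag and syntax details may still 
shift):

  CREATE TABLE parent (a int);
  CREATE TABLE child () INHERITS (parent);

  -- Inherited as usual: conisonly is false, child gets the constraint too.
  ALTER TABLE parent ADD CONSTRAINT a_positive CHECK (a > 0);

  -- Applies to the parent only: conisonly is true, nothing reaches child.
  ALTER TABLE ONLY parent ADD CONSTRAINT a_small CHECK (a < 1000000);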




Re: [HACKERS] Review of patch renaming constraints

2012-01-19 Thread Nikhil Sontakke
  And primary keys are not inherited anyway. So why is the conisonly
  field interfering with renaming? Seems quite orthogonal to me.

 In the past, each kind of constraint was either always inherited or
 always not, implicitly.  Now, for check constraints we can choose what
 we want, and in the future, perhaps we will want to choose for primary
 keys as well.  So having conisonly is really a good step into that
 future, and we should use it uniformly.


Agreed. And right now primary key constraints are not marked as "only",
which keeps them available for inheritance in the future. Or would you
prefer it otherwise?

Anyway, I fail to see the direct connection between this and renaming. I
might have to look at this patch for that.

Regards,
Nikhils


Re: [HACKERS] Patch review for logging hooks (CF 2012-01)

2012-01-19 Thread Greg Smith

On 01/18/2012 04:23 PM, Marti Raudsepp wrote:

The updated patch looks good, marking as 'Ready for Committer'


Patches without documentation are never ready for commit.  For this one, 
I'm not sure whether that documentation should take the form of a 
reference example in contrib, or just something that says the hook 
exists and what the ground rules are for grabbing it.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

