Re: [HACKERS] Client Messages

2012-01-26 Thread Heikki Linnakangas

On 25.01.2012 15:29, Jim Mlodgenski wrote:

On Tue, Jan 24, 2012 at 7:38 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

There's one little problem remaining with this, which is what to do if the
message is in a different encoding than used by the client? That's not a new
problem, we have the same problem with any string GUC, if you do SHOW
setting. We restricted application_name to ASCII characters, which is an
obvious way to avoid the problem, but I think it would be a shame to
restrict this to ASCII-only.

Isn't that an issue for the administrator understanding his audience?
Maybe some additional documentation explaining the encoding issue?


The thing is, there's currently no encoding conversion happening, so if 
you have one database in LATIN1 encoding and another in UTF-8, for 
example, whatever you put in your postgresql.conf is going to be wrong 
for one database. I'm happy to just document the issue for per-database 
messages (ALTER DATABASE ... SET welcome_message): the encoding used 
there needs to match the encoding of the database, or it's displayed as 
garbage. But what about per-user messages, when the user has access to 
several databases, or postgresql.conf?


For postgresql.conf I think we could make a rule that it's always in 
UTF-8. We haven't had to take a stance on the encoding used in 
postgresql.conf before, but IMHO UTF-8 only would be quite reasonable. 
We already have a problem there if for example you have two databases 
with different encodings, a schema with non-ASCII characters in it, and 
you try to put that schema in search_path in postgresql.conf. That's 
such a rare situation that we haven't heard any complaints, but it's not 
so unreasonable for welcome_message.


That still leaves the problem with per-user messages, though. We've 
managed to dodge the encoding issue in shared catalogs this far. It 
would be pretty hard to enforce any given encoding there, I think. 
Perhaps we could just document that, and advise creating per-database 
and per-user settings in that case.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stats_recovery view

2012-01-26 Thread Bernd Helmle



--On 15 January 2012 02:50:00 -0500 Jaime Casanova ja...@2ndquadrant.com 
wrote:



Attached is a patch that implements a pg_stat_recovery view that
keeps counters about processed wal records. I just noticed that it
still lacks documentation but I will add it during the week.


Hi Jaime,

do you have an updated patch? The current v1 patch doesn't apply cleanly 
anymore, and before I go and rebase the patch I thought I'd ask...


--
Thanks

Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-26 Thread Cédric Villemain
 To be clear, I'd love to have this feature.  But if there is a choice
 between reducing planning time significantly for everyone and NOT
 getting this feature, and increasing planning time significantly for
 everyone and getting this feature, I think we will make more people
 happy by doing the first one.

 We're not really talking about are we going to accept or reject a
 specific feature.  We're talking about whether we're going to decide
 that the last two years worth of planner development were headed in
 the wrong direction and we're now going to reject that and try to
 think of some entirely new concept.  This isn't an isolated patch,
 it's the necessary next step in a multi-year development plan.  The
 fact that it's a bit slower at the moment just means there's still
 work to do.


Those planner improvements are very promising despite the current
extra planning cost in some situations. And it looks like a good
solution for effective and interesting index-only scan usage.

We've hit the planner's limits, and a lower level of the planner needs
to be revisited. Please Tom, keep on.
And it might be that 9.2 is a very good release to do that in, because
if there is a performance impact from this patch (in the end), there
are so many improvements in other places that it should be easily
compensated for users. It might be harder to do the change in 9.3 if
the performance of 9.3 is lower than that of 9.2.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] xlog location arithmetic

2012-01-26 Thread Fujii Masao
On Sun, Jan 22, 2012 at 1:13 AM, Euler Taveira de Oliveira
eu...@timbira.com wrote:
 On 23-12-2011 12:05, Tom Lane wrote:
 I too think a datatype is overkill, if we're only planning on providing
 one function.  Just emit the values as numeric and have done.

 Here it is. Output changed to numeric.

Thanks!

When I compiled the source with xlogdiff.patch, I got the following warnings.

xlogfuncs.c:511:2: warning: format '%lX' expects argument of type
'long unsigned int *', but argument 3 has type 'uint64 *' [-Wformat]
xlogfuncs.c:511:2: warning: format '%lX' expects argument of type
'long unsigned int *', but argument 4 has type 'uint64 *' [-Wformat]
xlogfuncs.c:515:2: warning: format '%lX' expects argument of type
'long unsigned int *', but argument 3 has type 'uint64 *' [-Wformat]
xlogfuncs.c:515:2: warning: format '%lX' expects argument of type
'long unsigned int *', but argument 4 has type 'uint64 *' [-Wformat]
xlogfuncs.c:524:3: warning: format '%lX' expects argument of type
'long unsigned int', but argument 2 has type 'uint64' [-Wformat]
xlogfuncs.c:528:3: warning: format '%lX' expects argument of type
'long unsigned int', but argument 2 has type 'uint64' [-Wformat]
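
FWIW, one conventional way to silence those is to scan the two 32-bit
halves of the location with plain %X and then combine them, instead of
reading a uint64 with %lX. A sketch (function name hypothetical, not
the patch's actual code):

static uint64
parse_xlog_location(const char *str)
{
	uint32		hi;
	uint32		lo;

	if (sscanf(str, "%X/%X", &hi, &lo) != 2)
		elog(ERROR, "could not parse xlog location \"%s\"", str);
	return ((uint64) hi) << 32 | lo;
}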

When I tested the patch, I got the following error:

postgres=# SELECT pg_current_xlog_location();
 pg_current_xlog_location
--------------------------
 0/274
(1 row)

postgres=# SELECT pg_xlog_location_diff('0/274', '0/274');
ERROR:  xrecoff 274 is out of valid range, 0..A4A534C

In func.sgml
   <para>
    The functions shown in <xref
    linkend="functions-admin-backup-table"> assist in making on-line backups.
    These functions cannot be executed during recovery.
   </para>

Since pg_xlog_location_diff() can be executed during recovery,
the above needs to be updated.

 While the output was int8 I could use
 pg_size_pretty but now I couldn't. I attached another patch that implements
 pg_size_pretty(numeric).

I agree it's necessary.

 * Note: every entry in pg_proc.h is expected to have a DESCR() comment,
 * except for functions that implement pg_operator.h operators and don't
 * have a good reason to be called directly rather than via the operator.

According to the above source code comment in pg_proc.h, ISTM
pg_size_pretty() for numeric also needs to have its own DESCR().

+   buf = DatumGetCString(DirectFunctionCall1(numeric_out,
NumericGetDatum(size)));
+   result = strcat(buf, " kB");

According to man strcat, the dest string must have enough space for
the result. Does buf have enough space?
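
(For the record, guaranteeing the space explicitly would look something
like this; just a sketch, reusing the names from the quoted hunk:)

	char	   *numstr = DatumGetCString(DirectFunctionCall1(numeric_out,
											NumericGetDatum(size)));
	char	   *result = palloc(strlen(numstr) + strlen(" kB") + 1);

	sprintf(result, "%s kB", numstr);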

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Second thoughts on CheckIndexCompatible() vs. operator families

2012-01-26 Thread Noah Misch
On Wed, Jan 25, 2012 at 11:53:10PM -0500, Tom Lane wrote:
 Not only is that code spectacularly unreadable, but has nobody noticed
 that this commit broke the buildfarm?

Thanks for reporting the problem.  This arose because the new test case
temporarily sets client_min_messages=DEBUG1.  The default buildfarm
configuration sets log_statement=all in its postgresql.conf, so the client
gets those log_statement lines.  I would fix this as attached, resetting the
optional logging to defaults during the test cases in question.  Not
delightful, but that's what we have to work with.

nm
*** a/src/test/regress/expected/alter_table.out
--- b/src/test/regress/expected/alter_table.out
*************** where oid = 'test_storage'::regclass;
*** 1665,1671 ****
--- 1665,1679 ----
   t
  (1 row)
  
+ --
  -- SET DATA TYPE without a rewrite
+ --
+ -- Temporarily disable optional logging that "make installcheck" testers
+ -- reasonably might enable.
+ SET log_duration = off;
+ SET log_lock_waits = off;
+ SET log_statement = none;
+ SET log_temp_files = -1;
  CREATE DOMAIN other_textarr AS text[];
  CREATE TABLE norewrite_array(c text[] PRIMARY KEY);
  NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 
"norewrite_array_pkey" for table "norewrite_array"
*************** ALTER TABLE norewrite_array ALTER c TYPE text[]; -- no work
*** 1674,1679 ****
--- 1682,1691 ----
  ALTER TABLE norewrite_array ALTER c TYPE other_textarr; -- rebuild index
  DEBUG:  building index "norewrite_array_pkey" on table "norewrite_array"
  RESET client_min_messages;
+ RESET log_duration;
+ RESET log_lock_waits;
+ RESET log_statement;
+ RESET log_temp_files;
  --
  -- lock levels
  --
*** a/src/test/regress/sql/alter_table.sql
--- b/src/test/regress/sql/alter_table.sql
*************** select reltoastrelid <> 0 as has_toast_table
*** 1197,1203 ****
--- 1197,1213 ----
  from pg_class
  where oid = 'test_storage'::regclass;
  
+ --
  -- SET DATA TYPE without a rewrite
+ --
+ 
+ -- Temporarily disable optional logging that "make installcheck" testers
+ -- reasonably might enable.
+ SET log_duration = off;
+ SET log_lock_waits = off;
+ SET log_statement = none;
+ SET log_temp_files = -1;
+ 
  CREATE DOMAIN other_textarr AS text[];
  CREATE TABLE norewrite_array(c text[] PRIMARY KEY);
  SET client_min_messages = debug1;
*************** ALTER TABLE norewrite_array ALTER c TYPE text[]; -- no work
*** 1205,1210 ****
--- 1215,1225 ----
  ALTER TABLE norewrite_array ALTER c TYPE other_textarr; -- rebuild index
  RESET client_min_messages;
  
+ RESET log_duration;
+ RESET log_lock_waits;
+ RESET log_statement;
+ RESET log_temp_files;
+ 
  --
  -- lock levels
  --

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Second thoughts on CheckIndexCompatible() vs. operator families

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 6:55 AM, Noah Misch n...@leadboat.com wrote:
 On Wed, Jan 25, 2012 at 11:53:10PM -0500, Tom Lane wrote:
 Not only is that code spectacularly unreadable, but has nobody noticed
 that this commit broke the buildfarm?

 Thanks for reporting the problem.  This arose because the new test case
 temporarily sets client_min_messages=DEBUG1.  The default buildfarm
 configuration sets log_statement=all in its postgresql.conf, so the client
 gets those log_statement lines.  I would fix this as attached, resetting the
 optional logging to defaults during the test cases in question.  Not
 delightful, but that's what we have to work with.

I'm just going to remove the test.  This is not very future-proof and
an ugly pattern if it gets copied to other places.  We need to work on
a more sensible way for ALTER TABLE to report what it did, but a
solution based on what GUCs the build-farm happens to set doesn't seem
like it's justified for the narrowness of the case we're testing here.
Whether or not we allow this case to work without a rewrite is in
some sense arbitrary. There's no real reason it can't be done; rather,
we're just exercising restraint to minimize the risk of future bugs.
So I don't want to go to great lengths to test it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql NUL record and field separator

2012-01-26 Thread Abhijit Menon-Sen
At 2012-01-14 14:23:49 +0200, pete...@gmx.net wrote:

 Inspired by this question http://stackoverflow.com/questions/6857265 I
 have implemented a way to set the psql record and field separators to
 a zero byte (ASCII NUL character).

Since this patch is in the commitfest, I had a look at it.

I agree that the feature is useful. The patch applies and builds cleanly
with HEAD@9f9135d1, but needs a further minor tweak to work (attached).
Without it, both zero separators get overwritten with the default value
after option parsing. The code looks good otherwise.

There's one problem:

 psql --record-separator-zero -At -c 'select something from somewhere' | xargs 
 -0 dosomething

If you run find -print0 and it finds one file, it will still print
"filename\0", and xargs -0 will work fine.

But psql --record-separator-zero -At -c 'select 1' will print "1\n", not
"1\0" or even "1\0\n", so xargs -0 will use the value "1\n", not "1". If
you're doing this in a shell script, handling the last argument specially
would be painful.

At issue are (at least) these three lines from print_unaligned_text in
src/bin/psql/print.c:

 358 /* the last record needs to be concluded with a newline */
 359 if (need_recordsep)
 360 fputc('\n', fout);

Perhaps the right thing to do would be to change this to output \0 if
--record-separator-zero was used (but leave it at \n otherwise)? That
is what my second attached patch does:

$ bin/psql --record-separator-zero --field-separator-zero -At -c 'select 1,2 
union select 3,4'|xargs -0 echo
1 2 3 4
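
For reference, here is roughly what print_separator amounts to with the
patch applied; a sketch from reading the diff, not the verbatim source:

static void
print_separator(struct separator sep, FILE *fout)
{
	/* a zero separator takes precedence over any string separator */
	if (sep.separator_zero)
		fputc('\0', fout);
	else if (sep.separator)
		fputs(sep.separator, fout);
}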

Thoughts?

 I think the most common use of this would be to set the record
 separator from the command line, so we could use a short option
 such as -0 or -z for that.

I agree. The current option names are very unwieldy to type.

-- ams
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 69207c1..691ad14 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -150,12 +150,14 @@ main(int argc, char *argv[])
 
 	parse_psql_options(argc, argv, options);
 
-	if (!pset.popt.topt.fieldSep.separator)
+	if (!pset.popt.topt.fieldSep.separator &&
+		!pset.popt.topt.fieldSep.separator_zero)
 	{
 		pset.popt.topt.fieldSep.separator = pg_strdup(DEFAULT_FIELD_SEP);
 		pset.popt.topt.fieldSep.separator_zero = false;
 	}
-	if (!pset.popt.topt.recordSep.separator)
+	if (!pset.popt.topt.recordSep.separator &&
+		!pset.popt.topt.recordSep.separator_zero)
 	{
 		pset.popt.topt.recordSep.separator = pg_strdup(DEFAULT_RECORD_SEP);
 		pset.popt.topt.recordSep.separator_zero = false;
diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index 43ddab1..94535eb 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -357,7 +357,12 @@ print_unaligned_text(const printTableContent *cont, FILE *fout)
 		}
 		/* the last record needs to be concluded with a newline */
 		if (need_recordsep)
-			fputc('\n', fout);
+		{
+			if (cont->opt->recordSep.separator_zero)
+				print_separator(cont->opt->recordSep, fout);
+			else
+				fputc('\n', fout);
+		}
 	}
 }
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-26 Thread Magnus Hagander
On Sat, Jan 21, 2012 at 20:12, Tomas Vondra t...@fuzzy.cz wrote:
 On 21 January 2012, 18:13, Magnus Hagander wrote:
 On Sat, Jan 21, 2012 at 18:02, Tomas Vondra t...@fuzzy.cz wrote:
 Though I just looked at the tabstat code again, and we already split
 that message up at regular intervals. Which would make it quite weird
 to have global counters in it as well.

 But instead of there, perhaps we need a general non table, but more
 than one type of data message sent out at the same time. There is
 other stuff in the queue for it.

 I'm not convinced either way - I'm not against the original way in
 your patch either. I just wanted to throw the idea out there, and was
 hoping somebody else would have an opinion too :-)

 Let's make that in a separate patch. It seems like a good thing to do, but
 I don't think it should be mixed with this patch.

Yeah, I'm not sure we even want to do that yet, when we go down this
road. We might eventually, of course.


 I do like the idea of not sending the message for each temp file,
 though.
 One thing that worries me are long running transactions (think about a
 batch process that runs for several hours within a single transaction).
 By
 sending the data only at the commit, such transactions would not be
 accounted properly. So I'd suggest sending the message either at commit

 By that argument they are *already* not accounted properly, because
 the number of rows and those counters are wrong. By sending the temp
 file data in the middle of the transaction, you won't be able to
 correlate those numbers with the temp file usage.

 I'm not saying the other usecase isn't more common, but whichever way
 you do it, it's going to get inconsistent with *something*.

 The question is whether the patch should be modified to send the message
 only at commit (i.e. just like tabstats) or if it can remain the way it is
 now. And maybe we could change the way all those messages are sent (e.g.
 to the regular submits I've proposed in the previous post) to make them
 more consistent.

 I'm not asking for 100% consistency. What I don't like is a batch process
 consuming resources but not being reflected in the stats until the final
 commit, with a huge spike in the stats right after that, although the
 activity was actually spread over several hours.

Yeah, having thought about it some more, I think sending them when
they happen is a better idea. (It's obviously still only going to send
them when the file is closed, but that's as close as we can reasonably
get).

I've cleaned up the patch a bit and applied it - thanks!

Other than cosmetic, I changed the text in the description around a
bit (so if that's worse now, that's purely my fault) and added docs
to the section about pg_stat_database that you'd forgotten.

Also, I'd like to point you in the direction of the
src/include/catalog/find_unused_oids and duplicate_oids scripts. The
oids you'd picked conflicted with the pg_extension catalog from a year
ago. Easy enough to fix, and there are often conflicts with more
recent oids that the committer has to deal with anyway, but cleaning
those up before submission always helps a little bit. For the next one
:-)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2012-01-26 Thread Tomas Vondra
On 26 January 2012, 14:44, Magnus Hagander wrote:
 On Sat, Jan 21, 2012 at 20:12, Tomas Vondra t...@fuzzy.cz wrote:
 On 21 January 2012, 18:13, Magnus Hagander wrote:
 On Sat, Jan 21, 2012 at 18:02, Tomas Vondra t...@fuzzy.cz wrote:
 Though I just looked at the tabstat code again, and we already split
 that message up at regular intervals. Which would make it quite weird
 to have global counters in it as well.

 But instead of there, perhaps we need a general non table, but more
 than one type of data message sent out at the same time. There is
 other stuff in the queue for it.

 I'm not convinced either way - I'm not against the original way in
 your patch either. I just wanted to throw the idea out there, and was
 hoping somebody else would have an opinion too :-)

 Let's make that in a separate patch. It seems like a good thing to do,
 but
 I don't think it should be mixed with this patch.

 Yeah, I'm not sure we even want to do that yet, when we go down this
 road. We might eventually, of course.

Yes, that's one of the reasons why I suggested to do that in a separate
patch (that might be rejected if we find it's a bad idea).

 I've cleaned up the patch a bit and applied it - thanks!

 Other than cosmetic, I changed the text in the description around a
 bit (sol if that's worse now, that's purely my fault) and added docs
 to the section about pg_stat_database that you'd forgotten.

Great. Thanks for the fixes.

 Also, I'd like to point you in the direction of the
 src/include/catalog/find_unused_oids and duplicate_oids scripts. The
 oids you'd picked conflicted with the pg_extension catalog from a year
 ago. Easy enough to fix, and there are often conflicts with more
 recent oids that the committer has to deal with anyway, but cleaning
 those up before submission always helps a little bit. For the next one
 :-)

Aha! I've been wondering how the committers identify duplicate oids etc.,
but I was unaware of those scripts.

Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simulating Clog Contention

2012-01-26 Thread Abhijit Menon-Sen
At 2012-01-12 12:31:20 +, si...@2ndquadrant.com wrote:

 The following patch adds a pgbench option -I to load data using
 INSERTs

This is just to confirm that the patch applies and builds and works
fine (though of course it does take a long time… pity there doesn't
seem to be any easy way to add progress indication like -i has).

I'm aware of the subsequent discussion about using only a long option,
documenting -n, and adding a knob to control index creation timing. I
don't have a useful opinion on any of those things. It's just that the
patch was marked "Needs review" and it was only while waiting for 100k
inserts to run that I thought of checking if there was any discussion
about it. Oops.

 Yes, my laptop really is that slow.

It appears to be eight times as fast as mine.

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-26 Thread Robert Haas
On Wed, Jan 25, 2012 at 11:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Attached is a WIP patch for parameterized paths, along the
 lines we have discussed before: ...

 I've made considerable progress on the TODO items I listed: indxpath.c
 has been ripped apart and restructured, and I have it considering
 parameterized paths for hash sub-joins.  I made a deliberate policy
 decision not to work very hard on exploring parameterized mergejoin
 plans, because that seems to inflate the number of paths considered way
 more than the relative usefulness of merge over hash joins justifies.

I don't fully understand this code, especially not on my first time
through it, but it seems to me that the key to getting the performance
of this code up to where we'd like it to be is to control the number
of useless paths that get generated.

Is there a guard in here against joining a parameterized path to an
intermediate relation when no SJ is involved?  In other words, if
we're joining a parameterized path on A to a path on B, then either
the join to B should satisfy at least part of the parameterization
needed by A, or there should be a special join with A and B on one
side and a relation that satisfies at least part of the
parameterization of A on the other.  In the probably not uncommon case
where there are no SJs at all or all such SJs have only a single rel
on the nullable side, we ought to be able to avoid creating any more
paths than we do right now.  Even if there are SJs with multiple rels
on the outside, we could try to implement some fast check for whether
intermediate paths make any sense: e.g. union the set of rels on the
nullable sides of SJs.  Then, if the joinrel whose paths we're trying
to build isn't a subset of that set, the only thing worth considering
at this level is satisfying a parameterization by building a
parameter-driven nestloop: a bigger parameterized path will do us no
good.
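
In code terms, the check I have in mind would be something like the
following (a sketch, not tested; where exactly it would live in the
planner is hypothetical):

	Relids		nullable_rels = NULL;
	ListCell   *lc;

	foreach(lc, root->join_info_list)
	{
		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);

		/* collect every rel on the nullable side of a special join */
		nullable_rels = bms_add_members(nullable_rels,
										sjinfo->min_righthand);
	}

	if (!bms_is_subset(joinrel->relids, nullable_rels))
	{
		/* only a parameter-driven nestloop can help for this joinrel */
	}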

Maybe there's a heuristic like this already in there; I just didn't
see it on first glance.  I guess I'm surprised by the amount of
increase you're seeing in paths considered, though admittedly I
haven't seen your test cases.  If we're properly filtering out the
paths that don't matter, I wouldn't have expected it to have as much
impact as you're describing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 7:27 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 It seems to me reasonable design.
 The attached patch is rebased one according to your perform-deletion patch.

That looks pretty sensible.  But I don't think this is true any more:

+Please note that it shall not be checked on the objects removed by
+cascaded deletion according to the standard manner in SQL.

I've been thinking more about the question of object access hooks with
arguments as well.  In a couple of designs you've proposed, we've
needed InvokeObjectAccessHook0..11 or whatever, and I don't like that
design - it seems grotty and not type-safe.  However, looking at what
you did in this patch, I have another idea.  Suppose that we add ONE
additional argument to the object access hook function, as you've done
here, and if a particular type of hook invocation needs multiple
arguments, it can pass them with a struct.  In fact, let's use a
struct regardless, for uniformity, and pass the value as a void *.
That is:

typedef struct ObjectAccessDrop {
    int dropflags;
} ObjectAccessDrop;

At the call site, we do this:

if (object_access_hook)
{
    ObjectAccessDrop arg;

    arg.dropflags = flags;
    InvokeObjectAccessHook(..., &arg);
}

If there's no argument, then we can just do:

InvokeObjectAccessHook(..., NULL);

The advantage of this is that if we change the structure definition,
loadable modules compiled against a newer code base should either (1)
still work or (2) fail to compile.  The way we have it right now, if
we decide to pass a second argument (say, the DropBehavior) to the
hook, we're potentially going to silently break sepgsql no matter how
we do it.  But if we enforce use of a struct, then the only thing the
client should ever be doing with the argument it gets is casting it to
ObjectAccessDrop *.  Then, if we've added fields to the struct, the
code will still do the right thing even if the field order has been
changed or whatever.   If we've removed fields or changed their data
types, things should blow up fairly obviously instead of seeming to
work but actually failing.
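
On the module side, the hook would then just cast the opaque pointer
back. A sketch (the OAT_DROP tag follows this patch; the flag name is
an assumption based on the perform-deletion patch):

static void
sepgsql_object_access(ObjectAccessType access, Oid classId,
					  Oid objectId, int subId, void *arg)
{
	if (access == OAT_DROP)
	{
		ObjectAccessDrop *drop_arg = (ObjectAccessDrop *) arg;

		/* e.g. skip permission checks for internal cleanup */
		if (drop_arg != NULL &&
			(drop_arg->dropflags & PERFORM_DELETION_INTERNAL) != 0)
			return;
	}
}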

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: pg_basebackup (missing exit on error)

2012-01-26 Thread Robert Haas
On Tue, Jan 24, 2012 at 4:39 AM, Thomas Ogrisegg
tom-...@patches.fnord.at wrote:
 While testing a backup script, I noticed that pg_basebackup exits with
 0, even if it had errors while writing the backup to disk when the
 backup file is to be sent to stdout. The attached patch should fix this
 problem (and some special cases when the last write fails).

This part looks like a typo:

+   if (fflush (tarfile) != 0)
+   {
+       fprintf(stderr, _("%s: error flushing stdout: %s\n"),
+               strerror (errno));
+       disconnect_and_exit(1);
+   }

The error is in flushing the tarfile, not stdout, right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Psql names completion.

2012-01-26 Thread Josh Kupershmidt
On Mon, Jan 23, 2012 at 5:28 PM, Dominik Bylica dominik2c...@gmail.com wrote:
 Hello.

 It's probably not the right place to write, but I guess you are there to
 take care of it.

 When I was creating a trigger with command like:
 create trigger asdf before update on beginninOfTheNameOfMyTable...
 I hit tab and I got:
 create trigger asdf before update on SET

 That was obviously not what I expected to get.

Should be fixed in 9.2.

Josh

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoid FK validations for no-rewrite ALTER TABLE ALTER TYPE

2012-01-26 Thread Noah Misch
On Wed, Jan 25, 2012 at 10:39:56PM -0500, Noah Misch wrote:
 In any event, the patch needed a rebase, so I've attached it rebased and with
 that comment edited to reference ri_GenerateQualCollation(), that being the
 most-relevant source for the assumption in question.

Commit 9d35116611e6a1fc10f2298944fbf0e4e1a826be invalidated the test case hunks
again.  We'll need to either remove the test cases, as Robert chose to do for
that other patch, or bolster them per
http://archives.postgresql.org/message-id/20120126115536.GD15670%40tornado.leadboat.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_database deadlock counter

2012-01-26 Thread Magnus Hagander
On Sun, Jan 22, 2012 at 19:35, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Mon, Jan 16, 2012 at 3:19 PM, Magnus Hagander mag...@hagander.net wrote:
 Attached patch adds a counter for number of deadlocks in a database to
 pg_stat_database.


 A little review:

 - it applies with a few hunks
 - the oid you have chosen for the function pg_stat_get_db_deadlocks()
 is already in use, please choose another one using the unused_oids
 script (3150)

yeah, that always happens..

 - pg_stat_reset() doesn't reset deadlock counter (see
 pgstat_recv_resetcounter())

Good catch, thanks!

I've adjusted the patch for that, resolved the conflicts with the
other pg_stat_database patch, and committed it.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Different error messages executing CREATE TABLE or ALTER TABLE to create a column xmin

2012-01-26 Thread Robert Haas
On Tue, Jan 24, 2012 at 6:27 PM, Vik Reykja vikrey...@gmail.com wrote:
 I took my first stab at hacking the sources to fix the bug reported here:

 http://archives.postgresql.org/pgsql-bugs/2012-01/msg00152.php

 It's such a simple patch but it took me several hours with Google and IRC
 and I'm sure I did many things wrong.  It seems to work as advertised,
 though, so I'm submitting it.

 I suppose since I have now submitted a patch, it is my moral obligation to
 review at least one.  I'm not sure how helpful I'll be, but I'll go bite the
 bullet and sign myself up anyway.

I'm not sure that an error message that is accurate but slightly less
clear than you'd like qualifies as a bug, but I agree that it would
probably be worth improving, and I also agree with the general
approach you've taken here.  However, I think we ought to handle
renaming a column symmetrically to adding one.  So here's a revised
version of your patch that does that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


systemcolumn-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: better support for debugging of overloaded functions

2012-01-26 Thread Abhijit Menon-Sen
At 2011-11-24 17:44:16 +0100, pavel.steh...@gmail.com wrote:

 patch is relative long, but almost all are changes in regress tests.
 Changes in plpgsql are 5 lines

The change looks good in principle. The patch applies to HEAD with a bit
of fuzz and builds fine… but it fails tests, because it's incomplete.

Pavel, your patch doesn't contain any changes to pl_exec.c. Did you just
forget to submit them? Anyway, some errcontext() calls need to be taught
to print -fn_signature rather than -fn_name. I made those changes, and
found some more failing tests.

Updated patch attached. Ready for committer.

-- ams
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 6ba1e5e..eee6f6c 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -342,6 +342,7 @@ do_compile(FunctionCallInfo fcinfo,
 	compile_tmp_cxt = MemoryContextSwitchTo(func_cxt);
 
 	function->fn_name = pstrdup(NameStr(procStruct->proname));
+	function->fn_signature = format_procedure(fcinfo->flinfo->fn_oid);
 	function->fn_oid = fcinfo->flinfo->fn_oid;
 	function->fn_xmin = HeapTupleHeaderGetXmin(procTup->t_data);
 	function->fn_tid = procTup->t_self;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 5ce8d6e..57e337e 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -799,7 +799,7 @@ plpgsql_exec_error_callback(void *arg)
 			 * local variable initialization
 			 */
 			errcontext("PL/pgSQL function \"%s\" line %d %s",
-					   estate->func->fn_name,
+					   estate->func->fn_signature,
 					   estate->err_stmt->lineno,
 					   _(estate->err_text));
 		}
@@ -810,7 +810,7 @@
 			 * arguments into local variables
 			 */
 			errcontext("PL/pgSQL function \"%s\" %s",
-					   estate->func->fn_name,
+					   estate->func->fn_signature,
 					   _(estate->err_text));
 		}
 	}
@@ -818,13 +818,13 @@
 	{
 		/* translator: last %s is a plpgsql statement type name */
 		errcontext("PL/pgSQL function \"%s\" line %d at %s",
-				   estate->func->fn_name,
+				   estate->func->fn_signature,
 				   estate->err_stmt->lineno,
 				   plpgsql_stmt_typename(estate->err_stmt));
 	}
 	else
 		errcontext("PL/pgSQL function \"%s\"",
-				   estate->func->fn_name);
+				   estate->func->fn_signature);
 }
 
 
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 0aef8dc..739b9e4 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -679,6 +679,7 @@ typedef struct PLpgSQL_func_hashkey
 typedef struct PLpgSQL_function
 {/* Complete compiled function	  */
 	char	   *fn_name;
+	char	   *fn_signature;
 	Oid			fn_oid;
 	TransactionId fn_xmin;
 	ItemPointerData fn_tid;
diff --git a/src/test/regress/expected/domain.out b/src/test/regress/expected/domain.out
index 4f47374..4da1c53 100644
--- a/src/test/regress/expected/domain.out
+++ b/src/test/regress/expected/domain.out
@@ -493,7 +493,7 @@ begin
 end$$ language plpgsql;
 select doubledecrement(3); -- fail because of implicit null assignment
 ERROR:  domain pos_int does not allow null values
-CONTEXT:  PL/pgSQL function "doubledecrement" line 3 during statement block local variable initialization
+CONTEXT:  PL/pgSQL function "doubledecrement(pos_int)" line 3 during statement block local variable initialization
 create or replace function doubledecrement(p1 pos_int) returns pos_int as $$
 declare v pos_int := 0;
 begin
@@ -501,7 +501,7 @@ begin
 end$$ language plpgsql;
 select doubledecrement(3); -- fail at initialization assignment
 ERROR:  value for domain pos_int violates check constraint pos_int_check
-CONTEXT:  PL/pgSQL function "doubledecrement" line 3 during statement block local variable initialization
+CONTEXT:  PL/pgSQL function "doubledecrement(pos_int)" line 3 during statement block local variable initialization
 create or replace function doubledecrement(p1 pos_int) returns pos_int as $$
 declare v pos_int := 1;
 begin
@@ -514,10 +514,10 @@ select doubledecrement(0); -- fail before call
 ERROR:  value for domain pos_int violates check constraint pos_int_check
 select doubledecrement(1); -- fail at assignment to v
 ERROR:  value for domain pos_int violates check constraint pos_int_check
-CONTEXT:  PL/pgSQL function "doubledecrement" line 4 at assignment
+CONTEXT:  PL/pgSQL function "doubledecrement(pos_int)" line 4 at assignment
 select doubledecrement(2); -- fail at return
 ERROR:  value for domain pos_int violates check constraint pos_int_check
-CONTEXT:  PL/pgSQL function "doubledecrement" while casting return value to function's return type
+CONTEXT:  PL/pgSQL function "doubledecrement(pos_int)" while casting return value to function's return type
 select doubledecrement(3); -- good
  doubledecrement 
 -
@@ -566,7 +566,7 @@ end$$ language plpgsql;
 select array_elem_check(121.00);
 ERROR:  numeric field overflow
 DETAIL:  A field with precision 4, scale 2 must round to an absolute value less than 10^2.
-CONTEXT:  PL/pgSQL 

Re: [HACKERS] Client Messages

2012-01-26 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 The thing is, there's currently no encoding conversion happening, so if 
 you have one database in LATIN1 encoding and another in UTF-8, for 
 example, whatever you put in your postgresql.conf is going to be wrong 
 for one database. I'm happy to just document the issue for per-database 
 messages (ALTER DATABASE ... SET welcome_message): the encoding used 
 there needs to match the encoding of the database, or it's displayed as 
 garbage. But what about per-user messages, when the user has access to 
 several databases, or postgresql.conf?

I've not looked at the patch, but what exactly will happen if the string
has the wrong encoding?

The idea that occurs to me is to have the code that uses the GUC do a
verify_mbstr(noerror) on it, and silently ignore it if it doesn't pass
(maybe with a LOG message).  This would have to be documented of course,
but it seems better than the potential consequences of trying to send a
wrongly-encoded string.
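
Something along these lines, say (welcome_message being the GUC under
discussion; the exact call site is hypothetical):

	if (welcome_message != NULL &&
		!pg_verify_mbstr(pg_get_client_encoding(),
						 welcome_message, strlen(welcome_message),
						 true))
	{
		ereport(LOG,
				(errmsg("welcome_message is not valid in the client encoding")));
		return;				/* silently skip sending the message */
	}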

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Python result metadata

2012-01-26 Thread Abhijit Menon-Sen
At 2012-01-11 22:05:34 +0200, pete...@gmx.net wrote:

 I propose to add two functions to the result object:
 
 .colnames() returns a list of column names (strings)
 .coltypes() returns a list of type OIDs (integers) […]
 
 Patch attached.  Comments welcome.

Applies, builds, passes tests. Code looks simple and good. Ready for
committer, unless you want to add a typmod accessor too.

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pause at end of recovery

2012-01-26 Thread Magnus Hagander
On Thu, Jan 26, 2012 at 08:42, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Dec 28, 2011 at 7:27 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Dec 22, 2011 at 6:16 AM, Simon Riggs si...@2ndquadrant.com wrote:

 I can see a reason to do this now. I've written patch and will commit
 on Friday. Nudge me if I don't.

 It's hard to write this so it works in all cases and doesn't work in
 the right cases also.

 Basically, we can't get in the way of crash recovery, so the only way
 we can currently tell a crash recovery from an archive recovery is the
 presence of restore_command.

 If you don't have that and you haven't set a recovery target, it won't
 pause and there's nothing I can do, AFAICS.

 Please test this and review before commit.

 What if a wrong recovery target is specified and archive recovery reaches
 the end of WAL files unexpectedly? Even in this case, do we want to pause
 recovery at the end? Otherwise, we'll lose the chance to correct the
 recovery target and retry archive recovery.

Yes, we definitely want to pause then.


 One idea; starting archive recovery with standby_mode=on meets your needs?

I haven't tested, but probably, yes. But in that case, why do we need
the pause_at_recovery_target *at all*? It's basically overloaded
functionality already, but I figured it was set up that way to keep
replication and recovery a bit separated?

 When archive recovery reaches the end of WAL files, regardless of whether
 a recovery target is specified or not, recovery pauses at the end. If
 hot_standby is enabled, you can check the contents and, if they're OK,
 you can finish recovery by pg_ctl promote.

That is pretty much the usecase, yes. Or readjust the recovery target
(or, heck, add more files to the wal archive because you set it up
wrong somehow) and continue the recovery further along the line.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATCH] Fix float8 parsing of denormal values (on some platforms?)

2012-01-26 Thread Abhijit Menon-Sen
At 2011-12-21 18:24:17 +0200, ma...@juffo.org wrote:

  I think the least invasive fix, as proposed by Jeroen, is to fail only
  when ERANGE is set *and* the return value is 0.0 or +/-HUGE_VAL.
  Reading relevant specifications, this seems to be a fairly safe
  assumption. That's what the attached patch does.
 
 Oops, now attached the patch too.

Approach seems sensible, patch applies, builds, and passes tests.
Marking ready for committer and crossing fingers about buildfarm
failures…
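
For reference, the check described above amounts to something like this
(a sketch of the idea, not the patch verbatim):

	double		val;
	char	   *endptr;

	errno = 0;
	val = strtod(num, &endptr);
	if (errno == ERANGE &&
		(val == 0.0 || val == HUGE_VAL || val == -HUGE_VAL))
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("\"%s\" is out of range for type double precision",
						num)));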

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Is there a guard in here against joining a parameterized path to an
 intermediate relation when no SJ is involved?  In other words, if
 we're joining a parameterized path on A to a path on B, then either
 the join to B should satisfy at least part of the parameterization
 needed by A, or there should be a special join with A and B on one
 side and a relation that satisfies at least part of the
 parameterization of A on the other.

There is no such guard.  We could probably add one with not an
outrageous amount of expense, but I'm not clear on why you think it's
appropriate to limit the join ordering that way?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: information schema/aclexplode doesn't know about default privileges

2012-01-26 Thread Abhijit Menon-Sen
At 2012-01-09 20:23:59 +0200, pete...@gmx.net wrote:

 Nobody had a better idea, so here is the final patch.  I adjusted the
 regression tests a bit to avoid bloat from the now-visible owner
 privileges.

Patch applies, builds, and passes tests (and does report EXECUTE
privileges on a newly-created function). Code looks fine.

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Different error messages executing CREATE TABLE or ALTER TABLE to create a column xmin

2012-01-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 However, I think we ought to handle
 renaming a column symmetrically to adding one.

Yeah, I was thinking the same.

 So here's a revised version of your patch that does that.

This looks reasonable to me, except that possibly the new error message
text could do with a bit more thought.  It seems randomly unlike the
normal message, and I also have a bit of logical difficulty with the
wording equating a column with a column name.  The wording that
is in use in the existing CREATE TABLE case is

column name \%s\ conflicts with a system column name

We could do worse than to use that verbatim, so as to avoid introducing
a new translatable string.  Another possibility is

column "%s" of relation "%s" already exists as a system column

Or we could keep the primary errmsg the same as it is for a normal
column and instead add a DETAIL explaining that this is a system column.
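
That last variant would look something like this (a sketch; variable
names assumed, wording up for grabs):

	ereport(ERROR,
			(errcode(ERRCODE_DUPLICATE_COLUMN),
			 errmsg("column \"%s\" of relation \"%s\" already exists",
					colname, RelationGetRelationName(rel)),
			 errdetail("\"%s\" is a system column name.", colname)));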

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simulating Clog Contention

2012-01-26 Thread Merlin Moncure
On Thu, Jan 26, 2012 at 8:18 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 This is just to confirm that the patch applies and builds and works
 fine (though of course it does take a long time… pity there doesn't
 seem to be any easy way to add progress indication like -i has).

On any non server grade hardware you'd probably want to disable
synchronous_commit while loading.  FWIW, this is a great addition to
pgbench.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we add crc32 in libpgport?

2012-01-26 Thread Abhijit Menon-Sen
At 2012-01-17 13:43:11 -0800, dan...@heroku.com wrote:

  See the attached patch.  It has a detailed cover letter/comment at
  the top of the file.

The patch looks good, but the resetxlog and controldata Makefile hunks
didn't apply. I don't know why, but I've attached updated versions of
those hunks below, to save someone a moment. Everything else is fine.

-- ams
diff --git a/src/bin/pg_controldata/Makefile b/src/bin/pg_controldata/Makefile
index 0eff846..ebaa179 100644
--- a/src/bin/pg_controldata/Makefile
+++ b/src/bin/pg_controldata/Makefile
@@ -15,16 +15,13 @@ subdir = src/bin/pg_controldata
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS= pg_controldata.o pg_crc.o $(WIN32RES)
+OBJS= pg_controldata.o $(WIN32RES)
 
 all: pg_controldata
 
 pg_controldata: $(OBJS) | submake-libpgport
 	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
-pg_crc.c: $(top_srcdir)/src/backend/utils/hash/pg_crc.c
-	rm -f $@ && $(LN_S) $< .
-
 install: all installdirs
 	$(INSTALL_PROGRAM) pg_controldata$(X) '$(DESTDIR)$(bindir)/pg_controldata$(X)'
 
diff --git a/src/bin/pg_resetxlog/Makefile b/src/bin/pg_resetxlog/Makefile
index eb03b8a..b0dfd16 100644
--- a/src/bin/pg_resetxlog/Makefile
+++ b/src/bin/pg_resetxlog/Makefile
@@ -15,16 +15,13 @@ subdir = src/bin/pg_resetxlog
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS= pg_resetxlog.o pg_crc.o $(WIN32RES)
+OBJS= pg_resetxlog.o $(WIN32RES)
 
 all: pg_resetxlog
 
 pg_resetxlog: $(OBJS) | submake-libpgport
 	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
-pg_crc.c: $(top_srcdir)/src/backend/utils/hash/pg_crc.c
-	rm -f $@ && $(LN_S) $< .
-
 install: all installdirs
 	$(INSTALL_PROGRAM) pg_resetxlog$(X) '$(DESTDIR)$(bindir)/pg_resetxlog$(X)'
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BGWriter latch, power saving

2012-01-26 Thread Heikki Linnakangas

On 17.01.2012 14:38, Peter Geoghegan wrote:

On 17 January 2012 11:24, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

In the patch I sent, I did rearrange the sleeping logic. I think it's more
readable the way it is now.


I have no objection to either your refinement of the sleeping logic,
nor that you moved some things in both the existing code and my patch
so that they occur when no spinlock is held.


Ok, committed with some further cleanup.

Do you think the docs need to be updated for this, and if so, where? The 
only place I found in the docs that speaks about how the bgwriter works 
is in config.sgml, where bgwriter_delay is described. Want to suggest an 
update to that?



Should I proceed with a benchmark on V3, so that we can get this
committed? I imagine that a long pgbench-tools run is appropriate,
(after all, it was used to justify the re-write of the BGWriter for
8.3) at various scale factors, from smallish to quite large.


I did some testing on this, with a highly artificial test case that 
dirties pages in shared_buffers as fast as possible. I tried to make it 
a worst-case scenario, see attached script. I tested this with a 32-core 
HP Itanium box, and on my 2-core laptop, and didn't see any measurable 
slowdown from this patch. So I think we're good.


If setting the latch would become a contention issue, there would be a 
pretty easy solution: only try to do it every 10 or 100 dirtied pages, 
for example. A few dirty pages in the buffer cache don't mean anything, 
as long as we kick the bgwriter in a fairly timely fashion when a larger 
burst of activity begins.
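
A sketch of that throttling idea (names hypothetical; the real latch
would live in shared memory):

#define DIRTY_PAGES_PER_WAKEUP	100

static int	pages_dirtied_since_wakeup = 0;

static void
notify_bgwriter_if_needed(Latch *bgwriterLatch)
{
	if (++pages_dirtied_since_wakeup >= DIRTY_PAGES_PER_WAKEUP)
	{
		pages_dirtied_since_wakeup = 0;
		SetLatch(bgwriterLatch);
	}
}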


BTW, do you have some sort of a test setup for these power-saving 
patches, to actually measure the effect on number of interrupts or 
electricity use? Fewer wakeups should be a good thing, but it would be 
nice to see some figures to get an idea of how much progress we've done 
and what still needs to be done.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


dirtypages.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 11:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Is there a guard in here against joining a parameterized path to an
 intermediate relation when no SJ is involved?  In other words, if
 we're joining a parameterized path on A to a path on B, then either
 the join to B should satisfy at least part of the parameterization
 needed by A, or there should be a special join with A and B on one
 side and a relation that satisfies at least part of the
 parameterization of A on the other.

 There is no such guard.  We could probably add one with not an
 outrageous amount of expense, but I'm not clear on why you think it's
 appropriate to limit the join ordering that way?

Because I think that adding parameterized paths that fail that test is
bound to be worse - or at least no better - than just rearranging the
join order.  In other words, suppose I have this:

a some-join (b some-other-join c on b.x = c.x) on a.y = b.y

If some-join is a LEFT join and some-other-join is an INNER join, then
it makes sense to think about implementing this as a parameterized
nest loop with a on the outside and (b ij c) on the inside.  But if
the joins commute, then AFAICT it doesn't.  The trivial case is when
they are both inner joins; I could do:

Nest Loop
- Seq Scan A
- Nested Loop
- Index Scan B
- Index Scan C

But that's no better than:

Nest Loop
- Nest Loop
- Seq Scan A
- Index Scan B
- Index Scan C

...because those are alternative formulations of the same concept:
scan A, use that to drive index probes against B, and then use those
results to drive index probes against C.

But the same applies if both joins are left joins, or more generally:
if the joins commute, then the plans we're already generating are
sufficient.  When they *don't* commute, then we can potentially
benefit from joining a parameterized path against an intermediate
table.

I would expect a lot of people might have things like this:

a INNER JOIN b ON a.bid = b.bid
INNER JOIN c ON a.cid = c.cid
INNER JOIN d ON a.did = d.did
INNER JOIN e ON d.eid = e.eid
INNER JOIN f ON d.fid = f.fid
LEFT JOIN g ON a.gid = g.gid
LEFT JOIN h ON a.hid = h.hid
LEFT JOIN i ON a.iid = i.iid

AFAICS, such queries aren't going to benefit from this optimization,
so ideally we'd like them to not have to pay little or nothing for it.
 Now, if h happens to be a view defined as h1 INNER JOIN h2 ON h1.x =
h2.x, then it makes sense to try to join a parameterized path on
either h1 or h2 against the other relation before implementing it as a
nest loop against some set of relations that includes a, but we still
can't get any benefit from parameterization anywhere else - joining h1
or h2 against anything else in there is not legal, and join of a
parameterized path over d to, say, f is still no better than what we
can get by rearranging the join order: if the path over d is
parameterized, then whatever join (d join f) will be no better than
(whatever join d) join f.  So it seems like we ought to just not build
a parameterized d join f path in the first place, unless, of course, I
am missing something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Different error messages executing CREATE TABLE or ALTER TABLE to create a column xmin

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 However, I think we ought to handle
 renaming a column symmetrically to adding one.

 Yeah, I was thinking the same.

 So here's a revised version of your patch that does that.

 This looks reasonable to me, except that possibly the new error message
 text could do with a bit more thought.  It seems randomly unlike the
 normal message, and I also have a bit of logical difficulty with the
 wording equating a column with a column name.  The wording that
 is in use in the existing CREATE TABLE case is

 column name "%s" conflicts with a system column name

 We could do worse than to use that verbatim, so as to avoid introducing
 a new translatable string.  Another possibility is

 column "%s" of relation "%s" already exists as a system column

 Or we could keep the primary errmsg the same as it is for a normal
 column and instead add a DETAIL explaining that this is a system column.

I intended for the message to match the CREATE TABLE case.  I think it
does, except I see now that Vik's patch left out the word "name" where
the original string has it.  So I'll vote in favor of your first
option, above, since that's what I intended anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simulating Clog Contention

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 11:41 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Jan 26, 2012 at 8:18 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 This is just to confirm that the patch applies and builds and works
 fine (though of course it does take a long time… pity there doesn't
 seem to be any easy way to add progress indication like -i has).

 On any non server grade hardware you'd probably want to disable
 synchronous_commit while loading.  FWIW, this is a great addition to
 pgbench.

Do you object to separating out the three different things the patch
does and adding separate options for each one?  If so, why?  I find
them independently useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 11:54 AM, Robert Haas robertmh...@gmail.com wrote:
 AFAICS, such queries aren't going to benefit from this optimization,
 so ideally we'd like them to not have to pay little or nothing for it.

There are a few too many negations in that sentence.

Let me try again:

AFAICS, such queries aren't going to benefit from this optimization,
so ideally we'd like them to pay little or nothing for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Different error messages executing CREATE TABLE or ALTER TABLE to create a column xmin

2012-01-26 Thread Vik Reykja
On Thu, Jan 26, 2012 at 17:58, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jan 26, 2012 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  This looks reasonable to me, except that possibly the new error message
  text could do with a bit more thought.  It seems randomly unlike the
  normal message, and I also have a bit of logical difficulty with the
  wording equating a "column" with a "column name".  The wording that
  is in use in the existing CREATE TABLE case is
 
  column name \"%s\" conflicts with a system column name
 
  We could do worse than to use that verbatim, so as to avoid introducing
  a new translatable string.  Another possibility is
 
  column \"%s\" of relation \"%s\" already exists as a system column
 
  Or we could keep the primary errmsg the same as it is for a normal
  column and instead add a DETAIL explaining that this is a system column.

 I intended for the message to match the CREATE TABLE case.  I think it
 does, except I see now that Vik's patch left out the word "name" where
 the original string has it.  So I'll vote in favor of your first
 option, above, since that's what I intended anyway.


My intention was to replicate the CREATE TABLE message; I'm not sure how I
failed on that particular point.  Thank you guys for noticing and fixing it
(along with all the other corrections).


Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-01-26 Thread Robert Haas
On Tue, Jan 10, 2012 at 6:28 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 This patch adds a new GUC sepgsql.client_label that allows client
 process to switch its privileges into another one, as long as the
 system security policy admits this transition.
 Because of this feature, I ported two permissions from process class
 of SELinux; setcurrent and dyntransition. The first one checks
 whether the client has a right to switch its privilege. And the other
 one checks a particular transition path from X to Y.

 This feature might seem to break an assumption of sepgsql's security
 model. However, a one-way domain transition from a more-privileged
 domain to a less-privileged one, initiated by the user, is also
 supported at the operating-system level, and it is a useful way to
 restrict an application's capabilities at the beginning of a session.

 A few weeks ago, I got a requirement from Joshua Brindle. He is
 working on a Web application that uses a CAC (Common Access Card) for
 authentication, and he wanted to integrate its security credentials
 with the security labels of selinux/sepgsql.
 One problem was that the system environment could not use labeled
 networking (IPsec), so switching the security label of processes on
 the web-server side was not an option. The alternative solution is to
 port a dynamic-transition feature into sepgsql, by analogy with the
 operating system.

 An expected scenario is below:
 The web server runs in the WEBSERV domain. It is allowed to connect
 to PostgreSQL, and also to invoke a trusted procedure that takes the
 security credential from the CAC as an argument, but nothing else is
 allowed.
 The trusted procedure is allowed to look up a table that maps security
 credentials to the security labels to be assigned, and it then
 switches the client's security label to CLIENT_n.
 CLIENT_n is allowed to access tables, functions and other objects
 according to the security policy, and is also allowed to reset
 sepgsql.security_label to revert to WEBSERV. However, it cannot switch
 to any other domain without the security credential stored on the CAC
 card.

 I and Joshua agreed this scenario is reasonable and secure.
 So, we'd like to suggest this new feature towards v9.2 timeline.

I'm wondering if a function would be a better fit than a GUC.  I don't
think you can really restrict the ability to revert a GUC change -
i.e. if someone does a SET and then a RESET, you pretty much have to
allow that.  I think.  But if you expose a function then it can work
however you like.
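
To sketch what I mean, with a hypothetical function name:

    SELECT sepgsql_set_client_label('system_u:system_r:client_1_t:s0');

The function itself could check the setcurrent/dyntransition
permissions and simply refuse any transition the policy does not allow,
with no RESET escape hatch to worry about.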

On another note, this is an awfully large patch.  Is there a separate
patch here that just does code rearrangement that should be separated
out?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simulating Clog Contention

2012-01-26 Thread Merlin Moncure
On Thu, Jan 26, 2012 at 10:59 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 26, 2012 at 11:41 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Jan 26, 2012 at 8:18 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 This is just to confirm that the patch applies and builds and works
 fine (though of course it does take a long time… pity there doesn't
 seem to be any easy way to add progress indication like -i has).

 On any non server grade hardware you'd probably want to disable
 synchronous_commit while loading.  FWIW, this is a great addition to
 pgbench.

 Do you object to separating out the three different things the patch
 does and adding separate options for each one?  If so, why?  I find
 them independently useful.

I didn't take a position on that -- although superficially it seems
like more granular control is good (and you can always group options
together with a 'super option', as in cp -a) -- I was just making a
general comment on the usefulness of testing against records that
don't have the same xid.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-26 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 23, 2012 at 7:59 PM, Daniel Farina dan...@heroku.com wrote:
 Things are still a bit ugly in the more complex cases: consider
 PostGIS's linkage against libproj and other libraries.  In order to

After thinking about that for a while I think we should not include the
shared module (.so) in the scope of this patch nor 9.2.  Ensuring that
the .so dependencies are met is a problem well outside the reach of the
PostgreSQL system.

 cover all cases, I feel that what I need is an optional hook (for the
 same of argument, let's say it's another command type hook, e.g.
 archive_command) to be executed when extension (un)installation
 occurs on a primary or is replayed on a standby whereby I can acquire
 the necessary dependencies for an extension or signal some kind of
 error (as to exactly how that interfaces with the server is delicate,
 should one want to supply good error messages to the user).

Supporting command triggers on the replica is on the TODO list. Having
that in 9.2 is far from guaranteed, though the catalog and syntax
already support the feature.

 But, that's a bit unlike how we normally do things.  And if we're
 going to WAL-log the writing of the extension files, then I start to
 feel like we should go back to putting the data in a system catalog
 rather than the filesystem, because inventing a new WAL record type
 for this seems like it will make things quite a bit more complicated
 for no particular benefit.

Exactly, and we should talk about concurrent accesses to the filesystem
too, and how to have a clean transactional backup including the scripts
when that pg_dump option is used.

So I'm going to prepare the next version of the patch with this design:

 - in catalog extension scripts for inline extension

   pg_extension_script(extoid, oldversion, version, script)

   oldversion is null when CREATE EXTENSION is used, except when using
   the CREATE EXTENSION ... FROM 'unpackaged' form (see the sketch
   after this list)

 - see about adding more control properties in the catalog?

 - current code that parses the filenames to determine the upgrade
   path will have to be able to take the version strings from the new
   catalog as an alternative, and fetching the script content must be
   able to either select from the catalog or read a file on disk

 - pg_dump defaults to not dumping extension content

 - pg_dump --include-extension-scripts dumps the scripts found either in
   the filesystem or the catalog, a create script first then any number
   of update script as needed to reach the current installed version

 - same as we have -t, add -e --extension to pg_dump so that you can
   choose to dump only a given extension
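
As a sketch only, the catalog from the first item above might look
like this (column types and constraints are my assumptions, nothing is
settled):

    CREATE TABLE pg_extension_script (
        extoid     oid NOT NULL,  -- extension the script belongs to
        oldversion text,          -- NULL for a plain CREATE EXTENSION
        version    text NOT NULL, -- version the script installs
        script     text NOT NULL  -- the script content itself
    );

An upgrade from 1.0 to 1.2 would then follow a chain of rows (1.0 to
1.1, then 1.1 to 1.2) instead of a chain of extension--*.sql files.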

The extension dumping will not include the shared modules, so if your
extension depends on them being installed on the server, you will be
much better served by some OS-level packaging.

Not for 9.2, but I can't help thinking that if we could manage to host
the .so module itself in the catalogs, we could solve updating it in a
transactional way and more importantly host it per-database, rather than
having the modules work per major version (not even per cluster) and the
extension mechanism work per-database inside each cluster. But that's
work for another release.

Any comments before I spend time coding this?
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pause at end of recovery

2012-01-26 Thread Fujii Masao
On Fri, Jan 27, 2012 at 12:50 AM, Magnus Hagander mag...@hagander.net wrote:
 One idea: would starting archive recovery with standby_mode=on meet your needs?

 I haven't tested, but probably, yes. But in that case, why do we need
 the pause_at_recovery_target *at all*? It's basically overloaded
 functionality already, but I figured it was set up that way to keep
 replication and recovery a bit separated?

AFAIK, when standby_mode = on, archive recovery pauses only at the end
of WAL files. When a recovery target is specified and archive recovery
reaches the target, it doesn't pause. OTOH, when
pause_at_recovery_target is set, archive recovery pauses only at the
target, not at the end of WAL files. Neither covers all the use cases
by itself, which is why pause_at_recovery_target was implemented.
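
For example, a recovery.conf along these lines (paths and target name
made up) pauses at a named restore point rather than at the end of the
archived WAL:

    restore_command = 'cp /mnt/archive/%f %p'
    recovery_target_name = 'before_app_upgrade'
    pause_at_recovery_target = 'true'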

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Different error messages executing CREATE TABLE or ALTER TABLE to create a column xmin

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 12:02 PM, Vik Reykja vikrey...@gmail.com wrote:
 On Thu, Jan 26, 2012 at 17:58, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jan 26, 2012 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  This looks reasonable to me, except that possibly the new error message
  text could do with a bit more thought.  It seems randomly unlike the
  normal message, and I also have a bit of logical difficulty with the
  wording equating a column with a column name.  The wording that
  is in use in the existing CREATE TABLE case is
 
  column name \%s\ conflicts with a system column name
 
  We could do worse than to use that verbatim, so as to avoid introducing
  a new translatable string.  Another possibility is
 
  column \%s\ of relation \%s\ already exists as a system column
 
  Or we could keep the primary errmsg the same as it is for a normal
  column and instead add a DETAIL explaining that this is a system column.

 I intended for the message to match the CREATE TABLE case.  I think it
 does, except I see now that Vik's patch left out the word name where
 the original string has it.  So I'll vote in favor of your first
 option, above, since that's what I intended anyway.

 My intention was to replicate the CREATE TABLE message; I'm not sure how I
 failed on that particular point.  Thank you guys for noticing and fixing it
 (along with all the other corrections).

OK, committed with that further change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_bulkload ON_DUPLICATE_MERGE

2012-01-26 Thread Robert Haas
On Wed, Jan 25, 2012 at 2:49 PM, Benjamin Johnson
benjamin.john...@getcarbonblack.com wrote:
 PG Gurus,

 I have a table like this:

 CREATE TABLE filemods (
  guid                  BIGINT NOT NULL UNIQUE,
  filepath_guid         BIGINT NOT NULL,
  createtime            TIMESTAMP WITH TIME ZONE DEFAULT NULL,
  writetime             TIMESTAMP WITH TIME ZONE DEFAULT NULL,
  deletetime            TIMESTAMP WITH TIME ZONE DEFAULT NULL
 );

 One event might have (1, '2012-01-25 11:00:00', NULL, NULL) and
 another event might have (1, NULL, '2012-01-25 11:05:00', NULL) and the
 end result should be:
 (1, '2012-01-25 11:00:00', '2012-01-25 11:05:00', NULL).
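
 To make the merge rule concrete, the intended end state is what this
 plain SQL sketch would produce, assuming the incoming events sit in a
 staging table called newrows:

     UPDATE filemods f
        SET createtime = COALESCE(f.createtime, n.createtime),
            writetime  = COALESCE(f.writetime,  n.writetime),
            deletetime = COALESCE(f.deletetime, n.deletetime)
       FROM newrows n
      WHERE f.guid = n.guid;

 The goal is for pg_bulkload to produce that end state directly at
 load time.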


 I'm trying to modify pg_bulkload to merge two rows together.  The
 changes I have made seem to be working, although I would like input on
 what I am doing that is unsafe or terribly wrong.  You can be brutal.

 I've seen incredible write speed with using pg_bulkload.  If I can just
 get it to consolidate our rows based on the unique key it will remove
 a lot of complexity in our software.

 Also, I'm not entirely sure this mailing list is the correct one, but
 with all the internals you all know, I'm hoping you can help point out
 nasty flaws in my algorithm.  This is the first legitimate attempt I
 have made at modifying PG source, so I'm not real familiar with the
 proper way of loading pages and tuples and updating heaps and all that
 pg core stuff.

 Here's the modifications to pg_btree.c (from pg_bulkload HEAD):

 http://pastebin.com/U23CapvR

 I also attached the patch.

I am not sure who maintains pg_bulkload, but it's not part of the core
distribution, so this is the wrong mailing list; you may want to
look here:

http://pgfoundry.org/mail/?group_id=1000261

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Client Messages

2012-01-26 Thread Peter Eisentraut
On tor, 2012-01-26 at 10:34 +0200, Heikki Linnakangas wrote:
 For postgresql.conf I think we could make a rule that it's always in 
 UTF-8. We haven't had to take a stance on the encoding used in 
 postgresql.conf before, but IMHO UTF-8 only would be quite reasonable.

We have so far violently resisted forcing UTF-8 anywhere, and this
doesn't seem like a good place to start.

We could perhaps recognize an encoding declaration like this in the
file:

http://docs.python.org/reference/lexical_analysis.html#encoding-declarations
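
Following that convention, the first line of postgresql.conf could
carry something like

    # -*- coding: utf-8 -*-

and a file without a declaration would keep today's behavior.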


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring Postgres to Add A New Source File

2012-01-26 Thread Robert Haas
On Wed, Jan 25, 2012 at 7:06 AM, Tareq Aljabban
tareq.aljab...@gmail.com wrote:
 Hi,
 I'm doing some development on the storage manager module of Postgres.

 I have added a few source files already to the smgr folder, and I was able
 to have the make system include them by adding their names to the OBJS list
 in the Makefile inside the smgr folder. (i.e. when I add A.c, I would add
 A.o to the OBJS list.)

 That was working fine. Now I'm trying to add a new file hdfs_test.c to the
 project. The problem with this file is that it requires some extra
 directives in its compilation command (-I and -L directives).

 Using gcc the file is compiled with the command:

 gcc hdfs_test.c -I/HDFS_HOME/hdfs/src/c++/libhdfs
 -I/usr/lib/jvm/default-java/include -L/HDFS_HOME/hdfs/src/c++/libhdfs
 -L/HDFS_HOME/build/c++/Linux-i386-32/lib
 -L/usr/lib/jvm/default-java/jre/lib/i386/server -ljvm -lhdfs -o hdfs_test

 I was told that in order to add the extra directives, I need to modify
 $LDFLAGS in configure.in and $LIBS in the Makefile.

 I read about autoconf and checked configure.in of Postgres but still wasn't
 able to know exactly what I should be doing.

This is really a general question about make that has nothing to do
with PostgreSQL specifically; you might want to get a book on make.

Makefiles in subdirectories of src/backend are only intended to be
able to build the backend itself, so there's probably no particularly
easy way to do what you want there.  You likely want to put this code
someplace like src/test/hdfs_test and copy one of the other Makefiles
into the new directory, perhaps src/test/isolation/Makefile.

But even if you do that for starters, you're going to need to make
some modifications, which may be a bit tricky if you have no
experience at all using make.
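
Roughly, and entirely untested, such a Makefile might look like this,
with the -I/-L/-l flags lifted from your gcc command:

    # hypothetical src/test/hdfs_test/Makefile
    subdir = src/test/hdfs_test
    top_builddir = ../../..
    include $(top_builddir)/src/Makefile.global

    override CPPFLAGS := -I/HDFS_HOME/hdfs/src/c++/libhdfs \
        -I/usr/lib/jvm/default-java/include $(CPPFLAGS)
    LDFLAGS += -L/HDFS_HOME/hdfs/src/c++/libhdfs \
        -L/HDFS_HOME/build/c++/Linux-i386-32/lib \
        -L/usr/lib/jvm/default-java/jre/lib/i386/server
    LIBS += -lhdfs -ljvm

    hdfs_test: hdfs_test.o
    	$(CC) $(CFLAGS) hdfs_test.o $(LDFLAGS) $(LIBS) -o $@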

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hiding variable-length fields from Form_pg_* structs

2012-01-26 Thread Peter Eisentraut
On mån, 2012-01-09 at 14:04 -0500, Tom Lane wrote:
 #ifdef CATALOG_VARLEN   /* variable-length fields start here
 */
 
 to be even clearer.
 
 What would be appropriate to add instead of those inconsistently-used
 comments is explicit comments about the exception cases such as
 proargtypes, to make it clear that the placement of the #ifdef
 CATALOG_VARLEN is intentional and not a bug in those cases.
 
I implemented your suggestions in the attached patch.
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 1ba935c..cdb0bee 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -143,6 +143,7 @@ sub Catalogs
 elsif ($declaring_attributes)
 {
 next if (/^{|^$/);
+next if (/^#/);
 if (/^}/)
 {
 undef $declaring_attributes;
@@ -150,6 +151,7 @@ sub Catalogs
 else
 {
 my ($atttype, $attname) = split /\s+/, $_;
+die "parse error ($input_file)" unless $attname;
 if (exists $RENAME_ATTTYPE{$atttype})
 {
 $atttype = $RENAME_ATTTYPE{$atttype};
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index d0138fc..a59950e 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3351,15 +3351,30 @@ RelationGetIndexList(Relation relation)
 	while (HeapTupleIsValid(htup = systable_getnext(indscan)))
 	{
 		Form_pg_index index = (Form_pg_index) GETSTRUCT(htup);
+		Datum		indclassDatum;
+		oidvector  *indclass;
+		bool		isnull;
 
 		/* Add index's OID to result list in the proper order */
 		result = insert_ordered_oid(result, index->indexrelid);
 
+		/*
+		 * indclass cannot be referenced directly through the C struct, because
+		 * it comes after the variable-width indkey field.  Must extract the
+		 * datum the hard way...
+		 */
+		indclassDatum = heap_getattr(htup,
+									 Anum_pg_index_indclass,
+									 GetPgIndexDescriptor(),
+									 &isnull);
+		Assert(!isnull);
+		indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
 		/* Check to see if it is a unique, non-partial btree index on OID */
 		if (index->indnatts == 1 &&
 			index->indisunique && index->indimmediate &&
 			index->indkey.values[0] == ObjectIdAttributeNumber &&
-			index->indclass.values[0] == OID_BTREE_OPS_OID &&
+			indclass->values[0] == OID_BTREE_OPS_OID &&
 			heap_attisnull(htup, Anum_pg_index_indpred))
 			oidIndex = index->indexrelid;
 	}
diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h
index 5fe3a5b..bcf31e6 100644
--- a/src/include/catalog/genbki.h
+++ b/src/include/catalog/genbki.h
@@ -22,6 +22,16 @@
 /* Introduces a catalog's structure definition */
 #define CATALOG(name,oid)	typedef struct CppConcat(FormData_,name)
 
+/*
+ * This is never defined; it's here only for documentation.
+ *
+ * Variable-length catalog fields (except possibly the first not nullable one)
+ * should not be visible in C structures, so they are made invisible by #ifdefs
+ * of an undefined symbol.  See also MARKNOTNULL in bootstrap.c for how this is
+ * handled.
+ */
+#undef CATALOG_VARLEN
+
 /* Options that may appear after CATALOG (on the same line) */
 #define BKI_BOOTSTRAP
 #define BKI_SHARED_RELATION
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 4e10021..0c8a20c 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -44,7 +44,9 @@ CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
 	regproc		aggfinalfn;
 	Oid			aggsortop;
 	Oid			aggtranstype;
-	text		agginitval;		/* VARIABLE LENGTH FIELD */
+#ifdef CATALOG_VARLEN			/* variable-length fields start here */
+	text		agginitval;
+#endif
 } FormData_pg_aggregate;
 
 /* 
diff --git a/src/include/catalog/pg_attrdef.h b/src/include/catalog/pg_attrdef.h
index 01f663c..ad770e4 100644
--- a/src/include/catalog/pg_attrdef.h
+++ b/src/include/catalog/pg_attrdef.h
@@ -32,8 +32,10 @@ CATALOG(pg_attrdef,2604)
 {
 	Oid			adrelid;		/* OID of table containing attribute */
 	int2		adnum;			/* attnum of attribute */
+#ifdef CATALOG_VARLEN			/* variable-length fields start here */
 	pg_node_tree adbin;			/* nodeToString representation of default */
 	text		adsrc;			/* human-readable representation of default */
+#endif
 } FormData_pg_attrdef;
 
 /* 
diff --git a/src/include/catalog/pg_attribute.h b/src/include/catalog/pg_attribute.h
index 5cb16f6..45e38e4 100644
--- a/src/include/catalog/pg_attribute.h
+++ b/src/include/catalog/pg_attribute.h
@@ -145,11 +145,8 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK
 	/* attribute's collation */
 	Oid			attcollation;
 
-	/*
-	 * VARIABLE LENGTH FIELDS start here.  These fields may be NULL, too.
-	 *
-	 * NOTE: the following fields are not present in tuple descriptors!

Re: [HACKERS] Different error messages executing CREATE TABLE or ALTER TABLE to create a column xmin

2012-01-26 Thread Vik Reykja
On Thu, Jan 26, 2012 at 18:47, Robert Haas robertmh...@gmail.com wrote:

 OK, committed with that further change.


Thank you, Robert!  My first real contribution, even if tiny :-)

Just a small nit to pick, though: Giuseppe Sucameli contributed to this
patch but was not credited in the commit log.


Re: [HACKERS] Client Messages

2012-01-26 Thread Heikki Linnakangas

On 26.01.2012 17:31, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

The thing is, there's currently no encoding conversion happening, so if
you have one database in LATIN1 encoding and another in UTF-8, for
example, whatever you put in your postgresql.conf is going to be wrong
for one database. I'm happy to just document the issue for per-database
messages, ALTER DATABASE ... SET welcome_message, the encoding used
there need to match the encoding of the database, or it's displayed as
garbage. But what about per-user messages, when the user has access to
several databases, or postgresql.conf?


I've not looked at the patch, but what exactly will happen if the string
has the wrong encoding?


You get an incorrectly encoded string, ie. garbage, in your console, 
when you log in with psql.


You can also use current_setting() to copy the incorrectly-encoded 
string elsewhere in the system. If you insert it into a table and run 
pg_dump, I think the dump might not be restorable. That's a bit of a 
stretch, perhaps, but it would be nice to avoid that.


BTW, you can already do that if you set e.g default_text_search_config 
to something non-ASCII in postgresql.conf. Or if you do it with 
search_path, you get a warning at login. For example, I did ALTER USER 
foouser set search_path ='kääk'; in a LATIN1 database, and then 
connected to a UTF-8 database and got:


$ ~/pgsql.master/bin/psql postgres foouser
WARNING:  invalid value for parameter "search_path": "k��k"
DETAIL:  schema "k��k" does not exist
psql (9.2devel)
Type "help" for help.

(in case that didn't get across right, I set the search_path to a string 
containing two a-with-umlauts, and in the warning, they got replaced 
with question marks with inverse colors, which is apparently a character 
that the console uses to display bytes that are not valid UTF-8).


The problem with welcome_message would look just like that. No-one is 
likely to run into that with search_path, but it's quite reasonable and 
expected to use your native language in a welcome message.



The idea that occurs to me is to have the code that uses the GUC do a
verify_mbstr(noerror) on it, and silently ignore it if it doesn't pass
(maybe with a LOG message).  This would have to be documented of course,
but it seems better than the potential consequences of trying to send a
wrongly-encoded string.


Hmm, fine with me. It would be nice to plug the hole that these bogus 
characters can leak elsewhere into the system through current_setting, 
though. Perhaps we could put the verify_mbstr() call somewhere in guc.c, 
to forbid incorrectly encoded characters from being stored in the guc 
variable in the first place.
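
Something along these lines in the string-GUC assignment path, say (a
sketch only; where exactly it belongs in guc.c is the open question,
and the variable names assume the usual guc.c conventions):

    /* Reject values that are invalid in the database encoding. */
    if (!pg_verifymbstr(newval, strlen(newval), true))
    {
        ereport(elevel,
                (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
                 errmsg("invalid byte sequence in value for parameter \"%s\"",
                        name)));
        return false;    /* keep the previous value */
    }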


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Different error messages executing CREATE TABLE or ALTER TABLE to create a column xmin

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 1:13 PM, Vik Reykja vikrey...@gmail.com wrote:
 On Thu, Jan 26, 2012 at 18:47, Robert Haas robertmh...@gmail.com wrote:

 OK, committed with that further change.

 Thank you, Robert!  My first real contribution, even if tiny :-)

 Just a small nit to pick, though: Giuseppe Sucameli contributed to this
 patch but was not credited in the commit log.

Hmm, I should have credited him as the reporter: sorry about that.  I
didn't use any of his code, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-01-26 Thread Alvaro Herrera

Excerpts from Alvaro Herrera's message of mar ene 24 15:47:16 -0300 2012:

 Need more code changes for the following:

 * export FOR KEY UPDATE lock mode in SQL

While doing this, I realized that there's an open item here regarding a
transaction that locks a tuple, and then in an aborted savepoint deletes
it.  As things stand, what happens is that the original tuple lock is
forgotten entirely, which was one of the things I wanted to fix (and in
fact is fixed for all other cases AFAICS).  So what we need is to be
able to store a MultiXactId that includes a member for KeyUpdate locks,
which will represent an UPDATE that touches key columns as well as
DELETEs.  That closes the hole.  However, the problem with this is that
we have no more bits left in the flag bitmask, which is another concern
you had raised.  I chose the easy way out and added a full byte of flags
per transaction.

This means that we now have 1636 xacts per members page rather than
1900+, but I'm not too concerned about that.  (We could cut back to 4
flag bits per xact -- but then, having some room for future growth is
probably a good plan anyway).

So DELETEs can also create multis.  I'm going to submit an updated patch
shortly.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-01-26 Thread Kohei KaiGai
2012/1/26 Robert Haas robertmh...@gmail.com:
 I'm wondering if a function would be a better fit than a GUC.  I don't
 think you can really restrict the ability to revert a GUC change -
 i.e. if someone does a SET and then a RESET, you pretty much have to
 allow that.  I think.  But if you expose a function then it can work
 however you like.

One benefit of using a GUC is that we can reuse the existing mechanism
to revert a value set within a transaction block on error.
If we implement the same functionality with functions, would
XactCallback allow sepgsql to get control at the appropriate time?

 On another note, this is an awfully large patch.  Is there a separate
 patch here that just does code rearrangement that should be separated
 out?

OK. I moved some routines related to client_label into label.c.
I'll separate this part from the new functionality part.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Different error messages executing CREATE TABLE or ALTER TABLE to create a column xmin

2012-01-26 Thread Giuseppe Sucameli
Hi Robert,

On Thu, Jan 26, 2012 at 7:59 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 26, 2012 at 1:13 PM, Vik Reykja vikrey...@gmail.com wrote:
 On Thu, Jan 26, 2012 at 18:47, Robert Haas robertmh...@gmail.com wrote:

 OK, committed with that further change.

thank you for the fast fix and Vik for the base patch ;)

 Just a small nit to pick, though: Giuseppe Sucameli contributed to this
 patch but was not credited in the commit log.

 I didn't use any of his code, though.

I agree, Robert didn't use any part of my patch, so there's no
reason to credit me in the log.

Regards.

-- 
Giuseppe Sucameli

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 2:07 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2012/1/26 Robert Haas robertmh...@gmail.com:
 I'm wondering if a function would be a better fit than a GUC.  I don't
 think you can really restrict the ability to revert a GUC change -
 i.e. if someone does a SET and then a RESET, you pretty much have to
 allow that.  I think.  But if you expose a function then it can work
 however you like.

 One benefit to use GUC is that we can utilize existing mechanism to
 revert a value set within a transaction block on error.
 If we implement same functionality with functions, XactCallback allows
 sepgsql to get control on appropriate timing?

Not sure, but I thought the use case was to set this at connection
startup time and then hand the connection off to a client.  What keeps
the client from just issuing RESET?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Is there a guard in here against joining a parameterized path to an
 intermediate relation when no SJ is involved?  In other words, if
 we're joining a parameterized path on A to a path on B, then either
 the join to B should satisfy at least part of the parameterization
 needed by A, or there should be a special join with A and B on one
 side and a relation that satisfies at least part of the
 parameterization of A on the other.

I've implemented this idea, recast a bit to prevent generating a
parameterized join path in the first place unless it depends on a
parameter from a relation for which there's a join ordering constraint
still outstanding.  It seems to get us to where the planning time
penalty is only about 10%, which frankly is probably less than sampling
error considering the small set of test cases I'm looking at.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BGWriter latch, power saving

2012-01-26 Thread Peter Geoghegan
On 26 January 2012 16:48, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Ok, committed with some further cleanup.

 Do you think the docs need to be updated for this, and if so, where? The
 only place I found in the docs that speak about how the bgwriter works is in
 config.sgml, where bgwriter_delay is described. Want to suggest an update to
 that?

This new behaviour might warrant a mention, as in the attached doc-patch.

 I did some testing on this, with a highly artificial test case that dirties
 pages in shared_buffers as fast as possible. I tried to make it a worst-case
 scenario, see attached script. I tested this with a 32-core HP Itanium box,
 and on my 2-core laptop, and didn't see any measurable slowdown from this
 patch. So I think we're good.

Cool. I was pretty confident that it would be completely impossible to
show a regression under any circumstances, but I'm glad that you
tested it on a large server like that.

 BTW, do you have some sort of a test setup for these power-saving patches,
 to actually measure the effect on number of interrupts or electricity use?
 Fewer wakeups should be a good thing, but it would be nice to see some
 figures to get an idea of how much progress we've done and what still needs
 to be done.

The only thing I've been using is Intel's powertop utility. It is
pretty easy to see what's happening, and what you'll see is exactly
what you'd expect (So by process, I could see that the bgwriter had
exactly 5 wake-ups per second with bgwriter_delay at 200). It is
useful to sleep quite pro-actively, as CPUs will enter idle C-states
and move to lower P-states quite quickly (some will be more familiar
with the commercial names for P-states, such as Intel's speedstep
technology).  I might have been less conservative about the
circumstances that cause the bgwriter to go to sleep, but I was
conscious of the need to get something into this release.

It's difficult to know what a useful workload should be to show the
benefits of this work, apart from the obvious one, which is to show
Postgres when completely idle. I think we started at 11.5 wake-ups per
second, and I think that's down to about 6.5 now - the WAL Writer
still accounts for 5 of those, so that's why I feel it's particularly
important to get it done too, though obviously that's something that
should defer to however we end up implementing group commit.

I intend to blog about it in the next few days, and I'll present a
careful analysis of the benefits of this work there. Look out for it
on planet.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 309b6a5..0e4dd97
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** SET ENABLE_SEQSCAN TO OFF;
*** 1318,1334 ****
      </indexterm>
      <listitem>
       <para>
!       Specifies the delay between activity rounds for the
!       background writer.  In each round the writer issues writes
!       for some number of dirty buffers (controllable by the
!       following parameters).  It then sleeps for <varname>bgwriter_delay</>
!       milliseconds, and repeats.  The default value is 200 milliseconds
!       (<literal>200ms</>). Note that on many systems, the effective
!       resolution of sleep delays is 10 milliseconds; setting
!       <varname>bgwriter_delay</> to a value that is not a multiple of
!       10 might have the same results as setting it to the next higher
!       multiple of 10.  This parameter can only be set in the
!       <filename>postgresql.conf</> file or on the server command line.
      </para>
     </listitem>
    </varlistentry>
--- 1318,1335 ----
      </indexterm>
      <listitem>
       <para>
! 		 Specifies the delay between activity rounds for the background writer.
! 		 In each round the writer issues writes for some number of dirty buffers
! 		 (controllable by the following parameters).  It then sleeps for
! 		 <varname>bgwriter_delay</> milliseconds, and repeats, though it will
! 		 hibernate when it has looked through every buffer in
! 		 <varname>shared_buffers</varname> without finding a dirty buffer.  The
! 		 default value is 200 milliseconds (<literal>200ms</>). Note that on
! 		 many systems, the effective resolution of sleep delays is 10
! 		 milliseconds; setting <varname>bgwriter_delay</> to a value that is not
! 		 a multiple of 10 might have the same results as setting it to the next
! 		 higher multiple of 10.  This parameter can only be set in the
! 		 <filename>postgresql.conf</> file or on the server command line.
      </para>
     </listitem>
    </varlistentry>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-26 Thread Robert Haas
On Thu, Jan 19, 2012 at 1:47 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Thoughts?

I generated some random data using this query:

create table nodups (g) as select (g%10)*1000+g/10 from
generate_series(1,10000) g;

And then used pgbench to repeatedly sort it using this query:

select * from nodups order by g offset 10001;

The patch came out about 28% faster than master.  Admittedly, that's
with no client overhead, but still: not bad.

I don't like the API you've designed, though: as you have it,
PrepareSortSupportFromOrderingOp does this after calling the sort
support function:

+   ssup->usable_compar = ResolveComparatorProper(sortFunction);

I think it would be better to simply have the sort support functions
set usable_compar themselves.  That would allow any user-defined type
that happens to have the same binary representation and comparison
rules as one of the types for which we supply a custom qsort() to
still make use of the optimization.
There's no real reason to have a separate switch to decide how to
initialize that field: the sort support trampoline already does that,
and I don't see any reason to introduce a second way of doing the same
thing.
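
Concretely, I'm imagining a sort support function shaped like this
(usable_compar is your patch's field; the enum spelling here is made
up):

    Datum
    mytype_sortsupport(PG_FUNCTION_ARGS)
    {
        SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);

        ssup->comparator = mytype_fastcmp;
        /* same binary layout and ordering as int4, so reuse its qsort */
        ssup->usable_compar = INT4_COMPAR;
        PG_RETURN_VOID();
    }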

I am also a little unhappy that we have to duplicate the code for the
fastcmp functions from nbtcompare.c in builtins.h.  Wouldn't it make more
sense to declare those functions as inline in nbtcompare.c, and then
call the qsort-generating macro from that file?

There were a couple of comment updates in tuplesort.c that looked
independent from the reset of the patch, so I've committed those
separately.  I also committed your change to downgrade the
belt-and-suspenders check for self-comparison to an assert, with some
rewording of your proposed comment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 2:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Is there a guard in here against joining a parameterized path to an
 intermediate relation when no SJ is involved?  In other words, if
 we're joining a parameterized path on A to a path on B, then either
 the join to B should satisfy at least part of the parameterization
 needed by A, or there should be a special join with A and B on one
 side and a relation that satisfies at least part of the
 parameterization of A on the other.

 I've implemented this idea, recast a bit to prevent generating a
 parameterized join path in the first place unless it depends on a
 parameter from a relation for which there's a join ordering constraint
 still outstanding.  It seems to get us to where the planning time
 penalty is only about 10%, which frankly is probably less than sampling
 error considering the small set of test cases I'm looking at.

Awesome.  If you can post the updated patch, I'll poke at it a little
more and see if anything jumps out at me, but that sounds promising.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-26 Thread Noah Misch
On Wed, Jan 11, 2012 at 10:12:31PM +, Simon Riggs wrote:
 v7

This patch uses FPIs to guard against torn hint writes, even when the
checksums are disabled.  One could not simply condition them on the
page_checksums setting, though.  Suppose page_checksums=off and we're hinting
a page previously written with page_checksums=on.  If that write tears,
leaving the checksum intact, that block will now fail verification.  A couple
of ideas come to mind.  (a) Read the on-disk page and emit an FPI only if the
old page had a checksum.  (b) Add a checksumEnableLSN field to pg_control.
Whenever the cluster starts with checksums disabled, set the field to
InvalidXLogRecPtr.  Whenever the cluster starts with checksums enabled and the
field is InvalidXLogRecPtr, set the field to the next LSN.  When a checksum
failure occurs in a page with LSN older than the stored one, emit either a
softer warning or no message at all.

Not all WAL-bypassing operations use SetBufferCommitInfoNeedsSave() to dirty
the buffer.  Here are other sites I noticed that do MarkBufferDirty() without
bumping the page LSN:
heap_xlog_visible()
visibilitymap_clear()
visibilitymap_truncate()
lazy_scan_heap()
XLogRecordPageWithFreeSpace()
FreeSpaceMapTruncateRel()
fsm_set_and_search()
fsm_vacuum_page()
fsm_search_avail()
Though I have not investigated each of these in detail, I suspect most or all
represent continued torn-page hazards.  Since the FSM is just a hint, we do
not WAL-log it at all; it would be nicest to always skip checksums for it,
too.  The visibility map shortcuts are more nuanced.

When page_checksums=off and we read a page last written by page_checksums=on,
we still verify its checksum.  If a mostly-insert/read site uses checksums for
some time and eventually wishes to shed the overhead, disabling the feature
will not stop the costs for reads of old data.  They would need a dump/reload
to get the performance of a never-checksummed database.  Let's either
unconditionally skip checksum validation under page_checksums=off or add a new
state like page_checksums=ignore for that even-weaker condition.

 --- a/doc/src/sgml/config.sgml
 +++ b/doc/src/sgml/config.sgml

 +   para
 +Turning this parameter off speeds normal operation, but
 +might allow data corruption to go unnoticed. The checksum uses
 +16-bit checksums, using the fast Fletcher 16 algorithm. With this
 +parameter enabled there is still a non-zero probability that an error
 +could go undetected, as well as a non-zero probability of false
 +positives.
 +   /para

What sources of false positives do you intend to retain?
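
(For reference, a textbook Fletcher-16 over a byte buffer looks like
the sketch below; how the patch seeds it and how it skips the checksum
field itself I'm inferring from the docs rather than the code:

    static uint16
    fletcher16(const uint8 *data, int len)
    {
        uint32      sum1 = 0;
        uint32      sum2 = 0;
        int         i;

        for (i = 0; i < len; i++)
        {
            sum1 = (sum1 + data[i]) % 255;
            sum2 = (sum2 + sum1) % 255;
        }
        return (uint16) ((sum2 << 8) | sum1);
    }

With a 16-bit result, some small fraction of random corruptions will
always map back to the stored checksum, which is presumably the
undetected-error probability the quoted paragraph alludes to.)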

 --- a/src/backend/catalog/storage.c
 +++ b/src/backend/catalog/storage.c
 @@ -20,6 +20,7 @@
  #include postgres.h
  
  #include access/visibilitymap.h
 +#include access/transam.h
  #include access/xact.h
  #include access/xlogutils.h
  #include catalog/catalog.h
 @@ -70,6 +71,7 @@ static PendingRelDelete *pendingDeletes = NULL; /* head of 
 linked list */
  /* XLOG gives us high 4 bits */
  #define XLOG_SMGR_CREATE 0x10
  #define XLOG_SMGR_TRUNCATE   0x20
 +#define XLOG_SMGR_HINT   0x40
  
  typedef struct xl_smgr_create
  {
 @@ -477,19 +479,74 @@ AtSubAbort_smgr(void)
   smgrDoPendingDeletes(false);
  }
  
 +/*
 + * Write a backup block if needed when we are setting a hint.
 + *
 + * Deciding the "if needed" bit is delicate and requires us to either
 + * grab WALInsertLock or check the info_lck spinlock. If we check the
 + * spinlock and it says "Yes" then we will need to get WALInsertLock as well,
 + * so the design choice here is to just go straight for the WALInsertLock
 + * and trust that calls to this function are minimised elsewhere.
 + *
 + * Callable while holding share lock on the buffer content.
 + *
 + * Possible that multiple concurrent backends could attempt to write
 + * WAL records. In that case, more than one backup block may be recorded
 + * though that isn't important to the outcome and the backup blocks are
 + * likely to be identical anyway.
 + */
 +#define  SMGR_HINT_WATERMARK 13579
 +void
 +smgr_buffer_hint(Buffer buffer)
 +{
 + /*
 +  * Make an XLOG entry reporting the hint
 +  */
 + XLogRecPtr  lsn;
 + XLogRecData rdata[2];
 + int watermark = SMGR_HINT_WATERMARK;
 +
 + /*
 +  * Not allowed to have zero-length records, so use a small watermark
 +  */
 + rdata[0].data = (char *) &(watermark);
 + rdata[0].len = sizeof(int);
 + rdata[0].buffer = InvalidBuffer;
 + rdata[0].buffer_std = false;
 + rdata[0].next = &(rdata[1]);
 +
 + rdata[1].data = NULL;
 + rdata[1].len = 0;
 + rdata[1].buffer = buffer;
 + rdata[1].buffer_std = true;
 + rdata[1].next = NULL;
 +
 + lsn = XLogInsert(RM_SMGR_ID, XLOG_SMGR_HINT, rdata);
 +
 + /*
 +  * Set the page LSN if we wrote a backup block.
 +   

Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-26 Thread Peter Geoghegan
On 26 January 2012 19:45, Robert Haas robertmh...@gmail.com wrote:
 The patch came out about 28% faster than master.  Admittedly, that's
 with no client overhead, but still: not bad.

Thanks. There was a 28% reduction in the time it took to execute the
query, but there would have also been a larger reduction in the time
that the backend held on to all of that locally-allocated memory. That
might also be worth instrumenting directly, by turning on trace_sort -
can you report numbers on that, please?

 I don't like the API you've designed, though: as you have it,
 PrepareSortSupportFromOrderingOp does this after calling the sort
 support function:

 +       ssup->usable_compar = ResolveComparatorProper(sortFunction);

 I think it would be better to simply have the sort support functions
 set usable_compar themselves.  That would allow any user-defined type
 that happens to have the same binary representation and comparison
 rules as one of the types for which we supply a custom qsort() to
 still make use of the optimization.
 There's no real reason to have a separate switch to decide how to
 initialize that field: the sort support trampoline already does that,
 and I don't see any reason to introduce a second way of doing the same
 thing.

Hmm. You're right. I can't believe that that didn't occur to me. In
practice, types that use the SortSupport API are all going to be
façades on scalar types anyway, much like date and timestamp, and of
those a good proportion will surely have the same comparator
representation as the specialisations introduced by this patch. It
might be that virtually all third-party types that end up using the
API can avail of some specialisation.

 I am also a little unhappy that we have to duplicate the code for the
 fastcmp functions from nbtcompare.c in builtins.h.  Wouldn't it make more
 sense to declare those functions as inline in nbtcompare.c, and then
 call the qsort-generating macro from that file?

Maybe it would, but since the meta-qsort_arg introduced includes
partial duplicates of code from tuplesort.c, it kind of felt right to
instantiate specialisations there. It may be that doing it in
nbtcompare.c is the best option available to us. Off the top of my
head, I'm pretty sure that that's a good bit less code.

 There were a couple of comment updates in tuplesort.c that looked
 independent from the reset of the patch, so I've committed those
 separately.  I also committed your change to downgrade the
 belt-and-suspenders check for self-comparison to an assert, with some
 rewording of your proposed comment.

That seems reasonable.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 4:09 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 26 January 2012 19:45, Robert Haas robertmh...@gmail.com wrote:
 The patch came out about 28% faster than master.  Admittedly, that's
 with no client overhead, but still: not bad.

 Thanks. There was a 28% reduction in the time it took to execute the
 query, but there would have also been a larger reduction in the time
 that the backend held that all of that locally-allocated memory. That
 might also be worth instrumenting directly, by turning on trace_sort
 - can you report numbers on that, please?

Apparently not.  The sort is too short to register in the trace_sort
output.  I just get:

LOG:  begin tuple sort: nkeys = 1, workMem = 1024, randomAccess = f
LOG:  performsort starting: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG:  performsort done: CPU 0.00s/0.00u sec elapsed 0.00 sec
LOG:  internal sort ended, 853 KB used: CPU 0.00s/0.00u sec elapsed 0.00 sec

 I don't like the API you've designed, though: as you have it,
 PrepareSortSupportFromOrderingOp does this after calling the sort
 support function:

 +       ssup->usable_compar = ResolveComparatorProper(sortFunction);

 I think it would be better to simply have the sort support functions
 set usable_compar themselves.  That would allow any user-defined type
 that happens to have the same binary representation and comparison
 rules as one of the types for which we supply a custom qsort() to
 still make use of the optimization.
 There's no real reason to have a separate switch to decide how to
 initialize that field: the sort support trampoline already does that,
 and I don't see any reason to introduce a second way of doing the same
 thing.

 Hmm. You're right. I can't believe that that didn't occur to me. In
 practice, types that use the SortSupport API are all going to be
 façades on scalar types anyway, much like date and timestamp, and of
 those a good proportion will surely have the same comparator
 representation as the specialisations introduced by this patch. It
 might be that virtually all third-party types that end up using the
 API can avail of some specialisation.

Possibly.  At a minimum it keeps the door open.

 I am also a little unhappy that we have to duplicate the code for the
 fastcmp functions from nbtcompare.c in builtins.h.  Wouldn't it make more
 sense to declare those functions as inline in nbtcompare.c, and then
 call the qsort-generating macro from that file?

 Maybe it would, but since the meta-qsort_arg introduced includes
 partial duplicates of code from tuplesort.c, it kind of felt right to
 instantiate specialisations there. It may be that doing it in
 nbtcompare.c is the best option available to us. Off the top of my
 head, I'm pretty sure that that's a good bit less code.

I was hoping so...

 There were a couple of comment updates in tuplesort.c that looked
 independent from the reset of the patch, so I've committed those
 separately.  I also committed your change to downgrade the
 belt-and-suspenders check for self-comparison to an assert, with some
 rewording of your proposed comment.

 That seems reasonable.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-01-26 Thread Dimitri Fontaine
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Really I think there is not any single point where you can put the
 command-trigger hook and be done.  In almost every case, the right
 place is going to be buried somewhere within the execution of the
 command.

 I'm finally realizing it. I already introduced a structure called
 CommandContextData to keep track of the current command elements we want
 to pass down to command triggers, I think we're saying that this should
 be a static variable that commands will need to be editing.

In fact passing it as an argument to the command trigger API is much
simpler and done in the attached. I'm having problems here with my
install and not enough time this week (you don't speak English if you
don't use understatements here and there, right?), so please expect a
one-step-further patch to show the integration concept, not a
ready-for-commit one just yet.

Next steps are about adding support for more commands, and now that we
have settled on a simpler integration that will be easy. The parameters
sent to the trigger procedure are now the command tag, the main object
Oid, its schema name and its object name. Only the command tag will
never be NULL; all the other arguments can be NULL when calling an
ANY COMMAND trigger, or for a command on a schemaless object.
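
For instance, a trigger procedure matching that calling convention
could be declared like this (the signature follows the description
above; the names are mine):

    CREATE FUNCTION log_ddl(cmd_tag text, objectid oid,
                            schemaname text, objectname text)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        RAISE NOTICE 'command % on %.% (oid %)',
            cmd_tag, schemaname, objectname, objectid;
    END;
    $$;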

Note: the per-command integration means the Oid is generally available,
so let's just export it.

An ack on the way it's now implemented would be awesome, and we could
then start discussing exactly which set of commands we want supported
from the get-go (defaulting to all of them of course, though I don't
think that's necessarily the best use of our time given our command
list).

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



command-trigger.v6.patch.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-26 Thread Peter Geoghegan
Alright, so while I agree with everything you've asked for, the fact
is that there is a controversy in relation to binary bloat, and that's
the blocker here. How can we satisfactorily resolve that, or is that
question adequately addressed by the benchmark that I produced?

What if third party modules could include the template_qsort_arg.h
header, and instantiate their own specialisation? The sort support
function could either set an enum, or set a pointer to a qsort_arg
specialisation, if their comparator was a little different, but still
just a few instructions, as with float-based timestamps (I don't care
enough about them to provide one in core, though). It would also
essentially allow for user-defined sort functions, provided they
fulfilled a basic interface. They may not even have to be
comparison-based. I know that I expressed scepticism about the weird
and wonderful ideas that some people have put forward in that area,
but that's mostly because I don't think that GPU based sorting in a
database is going to be practical.

Why not add this capability to the SortSupport API, given that it
wouldn't be very hard? The user would set the enum too, so it would
appear in explain analyze output as custom sort or something like
that.

While a sorting specialisation of our qsort_arg is actually pretty
close to optimal for scalar types and façades thereof, there could be
a type that would benefit from using radix sort, for example, which
isn't even comparison-based, and a user couldn't very well do that
with the existing SortSupport API.

I certainly don't care about this capability enough to defend it
against any objections that anyone may have, especially at this late
stage in the cycle. I just think that we might as well have it.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parameterized inner paths

2012-01-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... In an ideal world, I'd like the amount of effort we spend
 planning to be somehow tied to the savings we can expect to get, and
 deploy optimizations like this only in cases where we have a
 reasonable expectation of that effort being repaid.

BTW, so far as that goes: the only way I know to significantly cut the
planner's join-planning resource consumption in any run-time-tunable
fashion is to restrict it to planning subproblems separately, as for
instance via from_collapse_limit/join_collapse_limit.  Now, whatever the
merits of those specific heuristics, the killer issue is that maybe you
really needed a join order different from what the subproblem division
entails.  I believe that this patch can help with that.  If the issue
is that you really don't want to form the entire result of a specific
subproblem subquery, that can be dealt with by treating the subquery as
a parameterizable path, such that it can be on the inside of a nestloop
with whatever other relation is supplying the parameter.

Or in other words, the reason from_collapse_limit/join_collapse_limit
sometimes lead to bad plans is really that they represent artificial
join order restrictions.  So I think this patch ought to be able to
alleviate the worst cases of that.  Or at least it did a few hours ago.
What we'll probably want to do is tweak the path-formation heuristic
you suggested so that joins to relations outside the current subproblem
are treated like outer joins and allowed to form parameterized paths.
Anyway this is the sort of thing that I hope to investigate after the
basic patch is in.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inline Extension

2012-01-26 Thread David E. Wheeler
On Jan 26, 2012, at 9:40 AM, Dimitri Fontaine wrote:

 So I'm going to prepare the next version of the patch with this design:
 
 - in-catalog extension scripts for inline extensions
 
   pg_extension_script(extoid, oldversion, version, script)
 
   oldversion is null when create extension is used
   except when using the create extension from 'unpackaged' form

Would you keep all the migration scripts used over time to upgrade from one 
version to another?

 - see about adding more control properties in the catalog?
 
 - current code that is parsing the filenames to determine the upgrade
   path will have to be able to take the version strings from the new
   catalog as an alternative, and getting to the script content must be
   able to select from the catalog or read a file on disk
 
 - pg_dump defaults to not dumping extension content
 
 - pg_dump --include-extension-scripts dumps the scripts found either in
   the filesystem or the catalog, a create script first then any number
   of update script as needed to reach the current installed version
 
 - same as we have -t, add -e --extension to pg_dump so that you can
   choose to dump only a given extension

Also --exclude-extension?

 The extension dumping will not include the shared modules, so if your
 extension depends on them being installed on the server, you will be much
 better served with some OS level packaging.

Or you must make sure it’s installed on the system before you restore.

 Not for 9.2, but I can't help thinking that if we could manage to host
 the .so module itself in the catalogs, we could solve updating it in a
 transactional way and more importantly host it per-database, rather than
 having the modules work per major version (not even per cluster) and the
 extension mechanism work per-database inside each cluster. But that's
 work for another release.

+1 Cloud vendors will *love* this.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-26 Thread Dan Scales
Some other comments on the checksum patch:

I'm not sure why you moved the checksum calculation (PageSetVerificationInfo) 
to mdwrite() rather than smgrwrite().  If there were ever another storage 
backend, it would have to duplicate the checksum check, right?  Is there a 
disadvantage to putting it in smgrwrite()?
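
For what it's worth, the layering concern looks like this in miniature
(simplified, invented names - the real dispatch table lives in smgr.c):

typedef char *Page;

/* toy stand-ins; the real ones are PageSetVerificationInfo() and md.c */
static void set_page_verification_info(Page page) { (void) page; }
static void mdwrite_impl(Page page) { (void) page; }

/* miniature of the storage-manager dispatch table */
typedef struct f_smgr
{
    void (*smgr_write) (Page page);
} f_smgr;

static const f_smgr smgrsw[] = {
    {mdwrite_impl},             /* magnetic disk */
    /* a future backend would add its own entry here */
};

/*
 * Checksumming in the dispatcher covers every backend once; doing it
 * inside mdwrite_impl() would have to be repeated in each new backend.
 */
static void
smgrwrite_sketch(int which, Page page)
{
    set_page_verification_info(page);
    smgrsw[which].smgr_write(page);
}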

You may have already noticed this, but a bunch of the comments are incorrect, 
now that you have moved the checksum calculation to mdwrite().

  config.sgml - says writes via temp_buffers (which I think means local 
buffers) are not checksummed -- that's no longer true, right?
  bufmgr.c, line 1914 - bufToWrite no longer exists.  You could revert changes 
from 1912-1920
  localbuf.c, line 203 - as mentioned below, this comment is no longer 
relevant, you are checksumming local buffers now


Dan

- Original Message -
From: Noah Misch n...@leadboat.com
To: Simon Riggs si...@2ndquadrant.com
Cc: Heikki Linnakangas heikki.linnakan...@enterprisedb.com, Robert Haas 
robertmh...@gmail.com, Andres Freund and...@anarazel.de, Kevin Grittner 
kevin.gritt...@wicourts.gov, da...@fetter.org, ai...@highrise.ca, 
st...@mit.edu, pgsql-hackers@postgresql.org
Sent: Thursday, January 26, 2012 12:20:39 PM
Subject: Re: [HACKERS] 16-bit page checksums for 9.2

On Wed, Jan 11, 2012 at 10:12:31PM +, Simon Riggs wrote:
 v7

This patch uses FPIs to guard against torn hint writes, even when the
checksums are disabled.  One could not simply condition them on the
page_checksums setting, though.  Suppose page_checksums=off and we're hinting
a page previously written with page_checksums=on.  If that write tears,
leaving the checksum intact, that block will now fail verification.  A couple
of ideas come to mind.  (a) Read the on-disk page and emit an FPI only if the
old page had a checksum.  (b) Add a checksumEnableLSN field to pg_control.
Whenever the cluster starts with checksums disabled, set the field to
InvalidXLogRecPtr.  Whenever the cluster starts with checksums enabled and the
field is InvalidXLogRecPtr, set the field to the next LSN.  When a checksum
failure occurs in a page with LSN older than the stored one, emit either a
softer warning or no message at all.
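
To spell out idea (b) - the field and function names here are invented
for illustration; a real version would live in pg_control and use
ereport():

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;    /* simplified stand-in */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* invented pg_control field, per idea (b) */
static XLogRecPtr checksumEnableLSN = InvalidXLogRecPtr;

/* run at cluster startup */
static void
checksum_startup(bool page_checksums, XLogRecPtr next_lsn)
{
    if (!page_checksums)
        checksumEnableLSN = InvalidXLogRecPtr;
    else if (checksumEnableLSN == InvalidXLogRecPtr)
        checksumEnableLSN = next_lsn;
}

/* run when a page fails verification (assumes checksums are enabled) */
static void
report_checksum_failure(XLogRecPtr page_lsn)
{
    if (page_lsn < checksumEnableLSN)
        /* written before checksums were enabled: may be a torn hint */
        fprintf(stderr, "WARNING: checksum mismatch on pre-checksum page\n");
    else
        fprintf(stderr, "ERROR: page checksum failure\n");
}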

Not all WAL-bypassing operations use SetBufferCommitInfoNeedsSave() to dirty
the buffer.  Here are other sites I noticed that do MarkBufferDirty() without
bumping the page LSN:
heap_xlog_visible()
visibilitymap_clear()
visibilitymap_truncate()
lazy_scan_heap()
XLogRecordPageWithFreeSpace()
FreeSpaceMapTruncateRel()
fsm_set_and_search()
fsm_vacuum_page()
fsm_search_avail()
Though I have not investigated each of these in detail, I suspect most or all
represent continued torn-page hazards.  Since the FSM is just a hint, we do
not WAL-log it at all; it would be nicest to always skip checksums for it,
too.  The visibility map shortcuts are more nuanced.

When page_checksums=off and we read a page last written by page_checksums=on,
we still verify its checksum.  If a mostly-insert/read site uses checksums for
some time and eventually wishes to shed the overhead, disabling the feature
will not stop the costs for reads of old data.  They would need a dump/reload
to get the performance of a never-checksummed database.  Let's either
unconditionally skip checksum validation under page_checksums=off or add a new
state like page_checksums=ignore for that even-weaker condition.

 --- a/doc/src/sgml/config.sgml
 +++ b/doc/src/sgml/config.sgml

 +   <para>
 +Turning this parameter off speeds normal operation, but
 +might allow data corruption to go unnoticed. The checksum uses
 +16-bit checksums, using the fast Fletcher 16 algorithm. With this
 +parameter enabled there is still a non-zero probability that an error
 +could go undetected, as well as a non-zero probability of false
 +positives.
 +   </para>

What sources of false positives do you intend to retain?
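
As a point of reference, the textbook form of the Fletcher 16 algorithm
named in that paragraph is tiny; the patch presumably runs an optimised
variant of it over the whole page:

#include <stddef.h>
#include <stdint.h>

/* naive Fletcher-16: two running 8-bit sums, modulo 255 */
static uint16_t
fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;

    for (size_t i = 0; i < len; i++)
    {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}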

 --- a/src/backend/catalog/storage.c
 +++ b/src/backend/catalog/storage.c
 @@ -20,6 +20,7 @@
  #include postgres.h
  
  #include access/visibilitymap.h
 +#include access/transam.h
  #include access/xact.h
  #include access/xlogutils.h
  #include catalog/catalog.h
 @@ -70,6 +71,7 @@ static PendingRelDelete *pendingDeletes = NULL; /* head of 
 linked list */
  /* XLOG gives us high 4 bits */
  #define XLOG_SMGR_CREATE 0x10
  #define XLOG_SMGR_TRUNCATE   0x20
 +#define XLOG_SMGR_HINT   0x40
  
  typedef struct xl_smgr_create
  {
 @@ -477,19 +479,74 @@ AtSubAbort_smgr(void)
   smgrDoPendingDeletes(false);
  }
  
 +/*
 + * Write a backup block if needed when we are setting a hint.
 + *
 + * Deciding the if needed bit is delicate and requires us to either
 + * grab WALInsertLock or check the info_lck spinlock. If we check the
 + * spinlock and it says Yes then we will need to get WALInsertLock as well,
 + * so the design choice here is to just go straight for the 

[HACKERS] PGCon 2012 Call for Papers - reminder

2012-01-26 Thread Dan Langille
A reminder, there are three days left to get in before the deadline…

PGCon 2012 will be held 17-18 May 2012, in Ottawa at the University of
Ottawa.  It will be preceded by two days of tutorials on 15-16 May 2012.

We are now accepting proposals for talks.  Proposals can be quite 
simple. We do not require academic-style papers.

If you are doing something interesting with PostgreSQL, please submit
a proposal.  You might be one of the backend hackers or work on a
PostgreSQL related project and want to share your know-how with
others. You might be developing an interesting system using PostgreSQL
as the foundation. Perhaps you migrated from another database to
PostgreSQL and would like to share details.  These, and other stories
are welcome. Both users and developers are encouraged to share their
experiences.

Here are some ideas to jump-start your proposal process:

- novel ways in which PostgreSQL is used
- migration of production systems from another database
- data warehousing
- tuning PostgreSQL for different work loads
- replication and clustering
- hacking the PostgreSQL code
- PostgreSQL derivatives and forks
- applications built around PostgreSQL
- benchmarking and performance engineering
- case studies
- location-aware and mapping software with PostGIS
- The latest PostgreSQL features and features in development
- research and teaching with PostgreSQL
- things the PostgreSQL project could do better
- integrating PostgreSQL with 3rd-party software

The schedule is:

8 Jan 2012 Proposal acceptance begins
29 Jan 2012 Proposal acceptance ends
19 Feb 2012 Confirmation of accepted proposals

See also http://www.pgcon.org/2012/papers.php

Instructions for submitting a proposal to PGCon 2012 are available
from: http://www.pgcon.org/2012/submissions.php

-- 
Dan Langille - http://langille.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hiding variable-length fields from Form_pg_* structs

2012-01-26 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On mån, 2012-01-09 at 14:04 -0500, Tom Lane wrote:
 #ifdef CATALOG_VARLEN   /* variable-length fields start here */
 
 to be even clearer.
 
 What would be appropriate to add instead of those inconsistently-used
 comments is explicit comments about the exception cases such as
 proargtypes, to make it clear that the placement of the #ifdef
 CATALOG_VARLEN is intentional and not a bug in those cases.

 I implemented your suggestions in the attached patch.

This looks ready to go to me, except for one trivial nit: in
pg_extension.h, please keep the comment pointing out that extversion
should never be null.
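
For anyone following along, the mechanism in miniature (catalog and
field names invented here; CATALOG_VARLEN is the real marker):

#include <stdint.h>

typedef int32_t int32;
typedef struct varlena
{
    char vl_len_[4];
    char vl_dat[1];
} text;                         /* simplified stand-in */

/*
 * Ordinary compilation never defines CATALOG_VARLEN, so C code sees
 * only the fixed-size prefix and cannot compute bogus offsets into the
 * variable-length part; the catalog machinery still reads the full
 * definition from the header text.
 */
typedef struct FormData_pg_example
{
    int32 examplefixed;         /* fixed-width, safe to access from C */
#ifdef CATALOG_VARLEN           /* variable-length fields start here */
    text  exampledata;          /* visible only to the bootstrap tooling */
#endif
} FormData_pg_example;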

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-26 Thread Robert Haas
On Thu, Jan 26, 2012 at 5:10 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Alright, so while I agree with everything you've asked for, the fact
 is that there is a controversy in relation to binary bloat, and that's
 the blocker here. How can we satisfactorily resolve that, or is that
 question adequately addressed by the benchmark that I produced?

I could go either way on that, honestly.  I took a look today and
found out that on my machine, a postgres 9.0.x executable was about
282kB larger than an 8.4 postgres executable.  9.1 was about 386kB
larger than that, and 9.2 current is 239kB larger still.  So it's not
as if your patch is the first one to enlarge the size of the binary -
it's obviously been growing steadily from release to release for
years.  I found that your patch adds 55kB to the executable size,
which is clearly a lot more than what a typical patch adds, but still
under 1%.  So I wouldn't be shocked and appalled if we decided to
commit this as-is.

But if we want to put it on a diet, the first thing I'd probably be
inclined to lose is the float4 specialization.  Some members of the
audience will recall that I take dim view of floating point arithmetic
generally, but I'll concede that there are valid reasons for using
float8.  I have a harder time coming up with a good reason to use
float4 - ever, for anything you care about.  So I would be inclined to
think that if we want to trim this back a bit, maybe that's the one to
let go.  If we want to be even more aggressive, the next thing I'd
probably lose is the optimization of multiple sortkey cases, on the
theory that single sort keys are probably by far the most common
practical case.

I'm not surprised that you weren't able to measure a performance
regression from the binary bloat.  Any such regression is bound to be
very small and probably quite difficult to notice most of the time;
it's really the cumulative effect of many binary-size-increasing
changes we're worried about, not each individual one.  Certainly,
trying to shrink the binary is micro-optimization at its finest, but
then so is inlining comparators.  I don't think it can be
realistically argued that increasing the size of the binary
arbitrarily will never get us in trouble, much like (for a typical
American family) spending $30 to have dinner at a cheap restaurant
will never break the budget.  But if you do it every day, it gets
expensive (and fattening).

 What if third party modules could include the template_qsort_arg.h
 header, and instantiate their own specialisation?

Seems reasonable to me.

 The sort support
 function could either set an enum, or set a pointer to a qsort_arg
 specialisation, if their comparator was a little different, but still

The latter seems better.

 just a few instructions, as with float-based timestamps (I don't care
 enough about them to provide one in core, though). It would also
 essentially allow for user-defined sort functions, provided they
 fulfilled a basic interface. They may not even have to be
 comparison-based. I know that I expressed scepticism about the weird
 and wonderful ideas that some people have put forward in that area,
 but that's mostly because I don't think that GPU based sorting in a
 database is going to be practical.

A question for another day.

 Why not add this capability to the SortSupport API, given that it
 wouldn't be very hard? The user would set the enum too, so it would
 appear in explain analyze output as "custom sort" or something like
 that.

I'm unenthused about going to any trouble to change the EXPLAIN format.

 While a sorting specialisation of our qsort_arg is actually pretty
 close to optimal for scalar types and façades thereof, there could be
 a type that would benefit from using radix sort, for example, which
 isn't even comparison-based, and a user couldn't very well do that
 with the existing SortSupport API.

 I certainly don't care about this capability enough to defend it
 against any objections that anyone may have, especially at this late
 stage in the cycle. I just think that we might as well have it.

I don't see any reason not to, assuming it's not a lot of code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-01-26 Thread Abhijit Menon-Sen
At 2012-01-15 11:08:05 -0500, and...@dunslane.net wrote:

 Here's an update that adds row_to_json, plus a bit more cleanup.

I've started reviewing this patch, but it'll take me a bit longer to go
through json.c properly. Here are a few preliminary notes:

1. The patch has a lot of whitespace errors (primarily lines ending in
   whitespace), but applying with git apply --whitespace=fix isn't safe,
   because the test results need some (but not all) of those spaces. I
   applied the patch, backed out the changes to expected/json.out, and
   created the file from the patch, then removed the superfluous
   whitespace.

2. I bumped some function OIDs to avoid conflicts.

3. One documentation typo.

Everything other than json.c (which I haven't read yet) looks fine
(builds, passes tests). I've attached a patch covering the changes
I made.

More later.

-- ams
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4f8b35e..ce4c4f6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9626,7 +9626,7 @@ table2-mapping
   /indexterm
 
   <para>
-This section descripbes the functions that are available for creating
+This section describes the functions that are available for creating
 JSON (see <xref linkend="datatype-json">) data.
   </para>
 
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cdfa4cc..02c8679 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4027,11 +4027,11 @@ DATA(insert OID = 323 (  json_recv		   PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 11
 DESCR("I/O");
 DATA(insert OID = 324 (  json_send		   PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 17 "114" _null_ _null_ _null_ _null_ json_send _null_ _null_ _null_ ));
 DESCR("I/O");
-DATA(insert OID = 3144 (  query_to_json	   PGNSP PGUID 12 1 0 0 0 f f f t f s 2 0 114 "25 16" _null_ _null_ _null_ _null_ query_to_json _null_ _null_ _null_ ));
+DATA(insert OID = 3153 (  query_to_json	   PGNSP PGUID 12 1 0 0 0 f f f t f s 2 0 114 "25 16" _null_ _null_ _null_ _null_ query_to_json _null_ _null_ _null_ ));
 DESCR("I/O");
-DATA(insert OID = 3145 (  array_to_json	   PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 114 "2277" _null_ _null_ _null_ _null_ array_to_json _null_ _null_ _null_ ));
+DATA(insert OID = 3154 (  array_to_json	   PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 114 "2277" _null_ _null_ _null_ _null_ array_to_json _null_ _null_ _null_ ));
 DESCR("I/O");
-DATA(insert OID = 3146 (  row_to_json	   PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 114 "2249" _null_ _null_ _null_ _null_ row_to_json _null_ _null_ _null_ ));
+DATA(insert OID = 3155 (  row_to_json	   PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 114 "2249" _null_ _null_ _null_ _null_ row_to_json _null_ _null_ _null_ ));
 DESCR("I/O");
 
 /* uuid */
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index b975d72..1984106 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -278,11 +278,11 @@ SELECT query_to_json('select x as b, x * 2 as c from generate_series(1,3) x',tru
 (1 row)
 
 SELECT query_to_json('
-  SELECT $$a$$ || x AS b, 
- y AS c, 
+  SELECT $$a$$ || x AS b,
+ y AS c,
  ARRAY[ROW(x.*,ARRAY[1,2,3]),
-   ROW(y.*,ARRAY[4,5,6])] AS z 
-  FROM generate_series(1,2) x, 
+   ROW(y.*,ARRAY[4,5,6])] AS z
+  FROM generate_series(1,2) x,
generate_series(4,5) y',true);
 query_to_json 
 --
@@ -299,19 +299,19 @@ SELECT query_to_json('select array_agg(x) as d from generate_series(5,10) x',fal
 (1 row)
 
 -- array_to_json
-SELECT array_to_json(array_agg(x)) 
+SELECT array_to_json(array_agg(x))
 FROM generate_series(1,10) x;
  array_to_json  
 
  [1,2,3,4,5,6,7,8,9,10]
 (1 row)
 
-SELECT array_to_json(array_agg(q)) 
-FROM (SELECT $$a$$ || x AS b, 
- y AS c, 
+SELECT array_to_json(array_agg(q))
+FROM (SELECT $$a$$ || x AS b,
+ y AS c,
  ARRAY[ROW(x.*,ARRAY[1,2,3]),
-   ROW(y.*,ARRAY[4,5,6])] AS z 
-  FROM generate_series(1,2) x, 
+   ROW(y.*,ARRAY[4,5,6])] AS z
+  FROM generate_series(1,2) x,
generate_series(4,5) y) q;
  array_to_json 
 ---
@@ -331,12 +331,12 @@ SELECT row_to_json(row(1,'foo'));
 {"f1":1,"f2":"foo"}
 (1 row)
 
-SELECT row_to_json(q) 
-FROM (SELECT $$a$$ || x AS b, 
- y AS c, 
+SELECT 

Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-26 Thread Peter Geoghegan
On 27 January 2012 03:32, Robert Haas robertmh...@gmail.com wrote:
 But if we want to put it on a diet, the first thing I'd probably be
 inclined to lose is the float4 specialization.  Some members of the
 audience will recall that I take dim view of floating point arithmetic
 generally, but I'll concede that there are valid reasons for using
 float8.  I have a harder time coming up with a good reason to use
 float4 - ever, for anything you care about.  So I would be inclined to
 think that if we want to trim this back a bit, maybe that's the one to
 let go.  If we want to be even more aggressive, the next thing I'd
 probably lose is the optimization of multiple sortkey cases, on the
 theory that single sort keys are probably by far the most common
 practical case.

Obviously I don't think that we should let anything go, as the
improvement in performance is so large that we're bound to be ahead -
we only really pay for what we use anyway, and when we use a
specialisation the difference is really quite big, particularly when
you look at sorting in isolation.  If a specialisation is never used,
it is more or less never paid for, so there's no point in worrying
about that. That said, float4 is obviously the weakest link. I'm
inclined to think that float8 is the second weakest though, mostly
because we get both dates and timestamps for free with the integer
specialisations.

 I'm not surprised that you weren't able to measure a performance
 regression from the binary bloat.  Any such regression is bound to be
 very small and probably quite difficult to notice most of the time;
 it's really the cumulative effect of many binary-size-increasing
 changes we're worried about, not each individual one.  Certainly,
 trying to shrink the binary is micro-optimization at its finest, but
 then so is inlining comparators.  I don't think it can be
 realistically argued that increasing the size of the binary
 arbitrarily will never get us in trouble, much like (for a typical
 American family) spending $30 to have dinner at a cheap restaurant
 will never break the budget.  But if you do it every day, it gets
 expensive (and fattening).

Sure. At the risk of stating the obvious, and of repeating myself, I
will point out that the true cost of increasing the size of the binary
is not necessarily linear - it's a complex equation. I hope that this
doesn't sound flippant, but if some naive person were to look at just
the increasing binary size of Postgres and its performance in each
successive release, they might conclude that there was a positive
correlation between the two (since we didn't add flab to the binary,
but muscle that pulls its own weight and then some).

At the continued risk of stating the obvious, CPUs don't just cache
instructions - they cache data too.  If we spend less than half the
time sorting data, which is the level of improvement I was able to
demonstrate against pre-SortSupport Postgres, that will surely very
often have the aggregate effect of ameliorating cache contention
between cores.

 just a few instructions, as with float-based timestamps (I don't care
 enough about them to provide one in core, though). It would also
 essentially allow for user-defined sort functions, provided they
 fulfilled a basic interface. They may not even have to be
 comparison-based. I know that I expressed scepticism about the weird
 and wonderful ideas that some people have put forward in that area,
 but that's mostly because I don't think that GPU based sorting in a
 database is going to be practical.

 A question for another day.

Fair enough.

 I certainly don't care about this capability enough to defend it
 against any objections that anyone may have, especially at this late
 stage in the cycle. I just think that we might as well have it.

 I don't see any reason not to, assuming it's not a lot of code.

Good.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-26 Thread Kyotaro HORIGUCHI
Thank you for the comment,

 First, my priority is on-the-fly result processing,
 not allocation optimization.  And this patch seems to make
 it possible, I can process results row-by-row, without the
 need to buffer all of them in PQresult.  Which is great!
 
 But the current API seems clumsy; I guess it's because the
 patch grew from trying to replace the low-level allocator.

 Exactly.

 I would like to propose better one-shot API with:
 
 void *(*RowStoreHandler)(PGresult *res, PGresAttValue *columns);
 
 where the PGresAttValue * is allocated once, inside PQresult.
 And the pointers inside point directly to network buffer.

 Good catch, thank you. The patch carries over too much from the
old implementation. There is no need to copy the data inside
getAnotherTuple to do that, as you say.

 Of course this requires replacing the current per-column malloc+copy
 pattern with per-row parse+handle pattern, but I think resulting
 API will be better:
 
 1) Pass-through processing does not need to care about unnecessary
per-row allocations.
 
 2) Handlers that want a copy of the row (like regular libpq)
can optimize allocations by having a global view of the row.
(Eg. One allocation for row header + data).
 
 This also optimizes call patterns - first libpq parses the packet,
 then the row handler processes the row, with no unnecessary back-and-forth.
 
 
 Summary - the current API has various assumptions about how the row is
 processed, let's remove those.

 Thank you, I will rewrite the patch to realize this.
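
To make the proposed calling convention concrete, an application-side
handler might look roughly like this; the struct layout, header, and
the return-NULL-to-abort convention are assumptions, since the rewrite
is still in progress:

#include <stdio.h>
#include <libpq-fe.h>           /* PGresult, PQnfields */

/* simplified stand-in for libpq's internal PGresAttValue */
typedef struct
{
    int   len;                  /* -1 means SQL NULL */
    char *value;                /* points directly into the network buffer */
} PGresAttValue;

/*
 * Process one row on the fly, without copying it into the PGresult.
 * Returning non-NULL to mean "keep going" is an assumed convention.
 */
static void *
print_row_handler(PGresult *res, PGresAttValue *columns)
{
    int nfields = PQnfields(res);

    for (int i = 0; i < nfields; i++)
    {
        if (columns[i].len < 0)
            printf("NULL\t");
        else
            printf("%.*s\t", columns[i].len, columns[i].value);
    }
    putchar('\n');
    return res;
}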

 regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BGWriter latch, power saving

2012-01-26 Thread Heikki Linnakangas

On 26.01.2012 21:37, Peter Geoghegan wrote:

On 26 January 2012 16:48, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Ok, committed with some further cleanup.

Do you think the docs need to be updated for this, and if so,
where? The only place I found in the docs that speak about how the
bgwriter works is in config.sgml, where bgwriter_delay is
described. Want to suggest an update to that?


This new behaviour might warrant a mention, as in the attached
doc-patch.


Hmm, the word "hibernate" might confuse people in this context. I
committed this as:


It then sleeps for <varname>bgwriter_delay</varname> milliseconds, and
repeats. When there are no dirty buffers in the buffer pool, though,
it goes into a longer sleep regardless of <varname>bgwriter_delay</varname>.
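
In code terms, the committed behaviour is roughly this shape (helper
names invented here; the real bgwriter waits on its process latch with
a timeout):

#include <stdbool.h>

/* invented stand-ins for illustration */
extern bool bgwriter_sync_round(void);   /* true if dirty buffers were seen */
extern void wait_on_latch_ms(int timeout_ms);

#define HIBERNATE_DELAY_MS (50 * 1000)   /* assumed long sleep */

void
bgwriter_main_sketch(int bgwriter_delay_ms)
{
    for (;;)
    {
        if (bgwriter_sync_round())
            wait_on_latch_ms(bgwriter_delay_ms);
        else
            /* pool is clean: sleep longer; a latch wakes us early */
            wait_on_latch_ms(HIBERNATE_DELAY_MS);
    }
}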


Thanks!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers