Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-06 Thread Dean Rasheed
On 4 August 2011 18:57, Peter Eisentraut pete...@gmx.net wrote:
 Have you considered just cataloging NOT NULL constraints as CHECK
 constraints and teaching the reverse parser to convert x CHECK (x IS
 NOT NULL) to x NOT NULL.  It seems to me that we're adding a whole
 lot of hoopla here that is essentially identical to the existing CHECK
 constraint support (it must be, per SQL standard), for no additional
 functionality.


With this approach, if the user explicitly wrote CHECK (x IS NOT
NULL), would it end up setting attnotnull? Presumably, since the
pg_constraint entry would be indistinguishable, it would be difficult
to do otherwise. From a performance standpoint that might be a good
thing, but it's going to be bad for compatibility.

What if they wrote CHECK (NOT x IS NULL), or CHECK (x IS DISTINCT
FROM NULL)? How many variants would it reverse parse?
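For illustration, each of the following spellings rejects NULLs in x, but only the first matches the single form the reverse parser would presumably be taught to recognize (a hypothetical sketch; table names are invented):

```sql
CREATE TABLE t1 (x int CHECK (x IS NOT NULL));           -- the canonical spelling
CREATE TABLE t2 (x int CHECK (NOT x IS NULL));           -- logically equivalent
CREATE TABLE t3 (x int CHECK (x IS DISTINCT FROM NULL)); -- also equivalent for a scalar x
```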

What would this do to error messages when the constraint is violated?

I'm not convinced this simplifies things much; it just moves the
complexity elsewhere, and at the cost of losing compatibility with the
current behaviour.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-06 Thread Dean Rasheed
On 6 August 2011 01:01, Peter Eisentraut pete...@gmx.net wrote:
 Before embarking on rewriting this patch from scratch, I would like to
 know what's your opinion (or the SQL standard's) on the fact that this
 patch separated the PRIMARY KEY from NOT NULL constraints, so that they
 don't act exactly alike (to wit, the not-nullness of a PK does not
 inherit while the one from a NOT NULL constraint does).

 The SQL standard deals with inheritance in terms of composite types,
 which don't have constraints, so that doesn't give any guidance.

 That said, I think the substitutability property of object-oriented
 systems, namely that you can use a child object in place of a parent
 object, requires in principle that we inherit all constraints (by
 default, at least).  We don't inherit primary keys because of
 implementation issues with indexes, but at some point in the future we
 should fix that.  So to some degree, inheriting the not-null property of
 primary keys while not inheriting the rest of it is a bit wrong, but it
 would appear to be a step in the right direction, and changing
 established behavior seems a bit gratuitous to me.


The current behaviour is inconsistent - the not-null property of a PK
is sometimes inherited and sometimes not, depending on whether the PK
is added at table-creation time or later. So a change in either
direction is a change to some current behaviour, unless we leave it
inconsistent, which seems unacceptable.

So I don't think compatibility arguments apply here, and I would tend
to agree that inheriting the not-null property of PKs while not
inheriting the rest seems wrong.

Regards,
Dean



Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-06 Thread Dean Rasheed
On 6 August 2011 08:17, Dean Rasheed dean.a.rash...@gmail.com wrote:
 The current behaviour is inconsistent - the not-null property of a PK
 is sometimes inherited and sometimes not, depending on whether the PK
 is added at table-creation time or later.


Oops, that should have been depending on whether the PK is defined
before or after the inheritance is set up.
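A minimal sketch of the inconsistency being described (as the behaviour stood in 2011; table names are invented):

```sql
-- PK defined before the inheritance is set up: not-null is inherited.
CREATE TABLE parent1 (a int PRIMARY KEY);
CREATE TABLE child1 () INHERITS (parent1);   -- child1.a is marked NOT NULL

-- PK defined after the inheritance is set up: not-null is not inherited.
CREATE TABLE parent2 (a int);
CREATE TABLE child2 () INHERITS (parent2);
ALTER TABLE parent2 ADD PRIMARY KEY (a);     -- child2.a stays nullable
```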



Re: [HACKERS] Reduce WAL logging of INSERT SELECT

2011-08-06 Thread Heikki Linnakangas

On 06.08.2011 06:32, Gokulakannan Somasundaram wrote:


However, for small operations it's a net loss - you avoid writing a WAL
record, but you have to fsync() the heap instead. If you only modify a few
rows, the extra fsync (or fsyncs if there are indexes too) is more expensive
than writing the WAL.

We'd need a heuristic to decide whether to write WAL or fsync at the end.
For regular INSERTs, UPDATEs and DELETEs, you have the planner's estimate of
number of rows affected. Another thing we should do is move the fsync call
from the end of COPY (and other such operations) to the end of transaction.
That way if you do e.g one COPY followed by a bunch of smaller INSERTs or
UPDATEs, you only need to fsync the files once.



Have you thought about recovery, especially when the page size is greater
than the disk block size (> 4K)? With WAL, there is a way to restore the
pages to their original state during recovery if there are partial page
writes. Is it possible to do the same with a direct fsync, without WAL?


The point of the optimization is that you can only skip WAL when the 
table has been created (or truncated) in the same transaction. In that 
case, if the transaction aborts because of a crash, you don't care about 
the contents of the table anyway.
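As a sketch, the case where WAL can be skipped looks like this (assuming wal_level = minimal; the file path is invented):

```sql
BEGIN;
CREATE TABLE bulk_target (id int, payload text);
-- bulk_target was created in this same transaction, so COPY may skip WAL
COPY bulk_target FROM '/tmp/data.csv' WITH (FORMAT csv);
COMMIT;  -- the heap file is fsync'ed instead of being WAL-logged
```

If the server crashes before the COMMIT, the table's creation is rolled back along with the transaction, so the unlogged heap contents are irrelevant.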


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-06 Thread Peter Eisentraut
On lör, 2011-08-06 at 08:04 +0100, Dean Rasheed wrote:
 On 4 August 2011 18:57, Peter Eisentraut pete...@gmx.net wrote:
  Have you considered just cataloging NOT NULL constraints as CHECK
  constraints and teaching the reverse parser to convert x CHECK (x IS
  NOT NULL) to x NOT NULL.  It seems to me that we're adding a whole
  lot of hoopla here that is essentially identical to the existing CHECK
  constraint support (it must be, per SQL standard), for no additional
  functionality.
 
 
 With this approach, if the user explicitly wrote CHECK (x IS NOT
 NULL) would it end up setting attnotnull?

Yes, I would assume so.

 Presumably, since the
 pg_constraint entry would be indistinguishable, it would be difficult
 to do otherwise. From a performance standpoint that might be a good
 thing, but it's going to be bad for compatibility.

What compatibility issue are you concerned about specifically?

 What if they wrote CHECK (NOT x IS NULL), or CHECK (x IS DISTINCT
 FROM NULL)? How many variants would it reverse parse?

Well, it's already the case that the user can write check constraints in
any number of forms that have the effect of restricting null values; and
attnotnull isn't set in those cases.  So in the beginning I'd be quite
happy if we just recognized CHECK (x IS NOT NULL).

Longer term, I think we could tie this in with more general nullability
detection.  For example, it is occasionally requested that we expose
nullability through views or CREATE TABLE AS.  The SQL standard has
rules for which cases we should detect (and they don't include the two
you give).

 What would this do to error messages when the constraint is violated?

That's a reasonable concern, although not a fatal one, and it can be
solved in any case.

 I'm not convinced this simplifies things much; it just moves the
 complexity elsewhere, and at the cost of losing compatibility with the
 current behaviour.

No, I don't think this would lose compatibility (except perhaps in cases
of error message wording).




Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-06 Thread Peter Eisentraut
On lör, 2011-08-06 at 08:17 +0100, Dean Rasheed wrote:
 The current behaviour is inconsistent - the not-null property of a PK
 is sometimes inherited and sometimes not, depending on whether the PK
 is added at table-creation time or later. So a change in either
 direction is a change to some current behaviour, unless we leave it
 inconsistent, which seems unacceptable. 

Yeah, OK, that's wrong then.




Re: [HACKERS] Reduce WAL logging of INSERT SELECT

2011-08-06 Thread Simon Riggs
On Sat, Aug 6, 2011 at 4:16 AM, Bruce Momjian br...@momjian.us wrote:

 Well, if the table is created in the same transaction (which is the only
 case under consideration), no other sessions can write to the table so
 you are just writing the entire table on commit, rather than to the WAL.

Below a certain point, skipping WAL is slower, and over an intermediate
range there is no benefit. So loading small amounts of data on large
servers goes slower.

heap_fsync() requires a scan of shared buffers, which may not be cheap.

There is a difficulty because you would need to calculate what the
cut-off is for a particular database, and then predict ahead of time
whether the number of rows that will be handled by the statement is low
enough to warrant using the optimisation. Both of which I would call a
hard problem.

I think we should remove the COPY optimisation because of this and
definitely not extend INSERT SELECT to perform it automatically.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-06 Thread Dean Rasheed
On 6 August 2011 11:03, Peter Eisentraut pete...@gmx.net wrote:
 On lör, 2011-08-06 at 08:04 +0100, Dean Rasheed wrote:
 On 4 August 2011 18:57, Peter Eisentraut pete...@gmx.net wrote:
  Have you considered just cataloging NOT NULL constraints as CHECK
  constraints and teaching the reverse parser to convert x CHECK (x IS
  NOT NULL) to x NOT NULL.  It seems to me that we're adding a whole
  lot of hoopla here that is essentially identical to the existing CHECK
  constraint support (it must be, per SQL standard), for no additional
  functionality.
 

 With this approach, if the user explicitly wrote CHECK (x IS NOT
 NULL) would it end up setting attnotnull?

 Yes, I would assume so.

 Presumably, since the
 pg_constraint entry would be indistinguishable, it would be difficult
 to do otherwise. From a performance standpoint that might be a good
 thing, but it's going to be bad for compatibility.

 What compatibility issue are you concerned about specifically?

 What if they wrote CHECK (NOT x IS NULL), or CHECK (x IS DISTINCT
 FROM NULL)? How many variants would it reverse parse?

 Well, it's already the case that the user can write check constraints in
 any number of forms that have the effect of restricting null values; and
 attnotnull isn't set in those cases.  So in the beginning I'd be quite
 happy if we just recognized CHECK (x IS NOT NULL).

 Longer term, I think we could tie this in with more general nullability
 detection.  For example, it is occasionally requested that we expose
 nullability through views or CREATE TABLE AS.  The SQL standard has
 rules for which cases we should detect (and they don't include the two
 you give).

 What would this do to error messages when the constraint is violated?

 That's a reasonable concern, although not a fatal one, and it can be
 solved in any case.

 I'm not convinced this simplifies things much; it just moves the
 complexity elsewhere, and at the cost of losing compatibility with the
 current behaviour.

 No, I don't think this would lose compatibility (except perhaps in cases
 of error message wording).


Hmm, maybe my compatibility concerns are not so serious. I'm still
trying to work out exactly what user-visible changes this approach
would lead to. Suppose you had:

CREATE TABLE foo
(
  a int NOT NULL,
  b int CHECK (b IS NOT NULL),
  c int CHECK (NOT c IS NULL)
);

Right now \d gives:

     Table "public.foo"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
 b      | integer |
 c      | integer |
Check constraints:
    "foo_b_check" CHECK (b IS NOT NULL)
    "foo_c_check" CHECK (NOT c IS NULL)

With this approach, one change would be that you'd gain an extra not
null in the Modifiers column for b.

But how many CHECK constraints would show? I guess it would show 3,
although it could be changed to just show 1. But it certainly couldn't
continue to show 2, since nothing in the catalogs could distinguish
the constraints on a from those on b.

Regards,
Dean



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-06 Thread Peter Geoghegan
On 5 August 2011 20:07, Tom Lane t...@sss.pgh.pa.us wrote:
 This patch is a truly horrid idea, as it eliminates the error checking
 the compiler is trying to provide, and does so globally not only in the
 trouble spots.

I think that the mistake I may have made here was assuming that the
existing code was perfect, and that it was therefore simply a matter of
tweaking. With that assumption in mind, the only fix could have been a
global fix. I should not have acted in haste.

 If I were trying to get rid of this warning, I'd be wondering why
 ReplNodeTag is a distinct enum in the first place.

Indeed, that doesn't seem to be justified anywhere, and seems to be a
violation of the abstraction of Node as a pseudo base class which
would have broken any downcasting that we might have attempted to
do. Since ReplNodeTag wasn't being used directly, just its enumeration
constants, simply moving the constants to the global, generic NodeTag
enum fixes the issue.

That is what the attached patch does.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index d8bc6b8..2c5e162 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -403,6 +403,13 @@ typedef enum NodeTag
 	T_CommonTableExpr,
 
 	/*
+	 * TAGS FOR REPLICATION PARSER (replnodes.h)
+	 */
+	T_IdentifySystemCmd = 950,
+	T_BaseBackupCmd,
+	T_StartReplicationCmd,
+
+	/*
 	 * TAGS FOR RANDOM OTHER STUFF
 	 *
 	 * These are objects that aren't part of parse/plan/execute node tree
@@ -410,7 +417,7 @@ typedef enum NodeTag
 	 * purposes (usually because they are involved in APIs where we want to
 	 * pass multiple object types through the same pointer).
 	 */
-	T_TriggerData = 950,		/* in commands/trigger.h */
+	T_TriggerData = 1000,		/* in commands/trigger.h */
 	T_ReturnSetInfo,			/* in nodes/execnodes.h */
 	T_WindowObjectData,			/* private in nodeWindowAgg.c */
 	T_TIDBitmap,/* in nodes/tidbitmap.h */
diff --git a/src/include/replication/replnodes.h b/src/include/replication/replnodes.h
index e027f92..8523e7e 100644
--- a/src/include/replication/replnodes.h
+++ b/src/include/replication/replnodes.h
@@ -18,16 +18,6 @@
 #include "nodes/primnodes.h"
 #include "nodes/value.h"
 
-/*
- * NodeTags for replication parser
- */
-typedef enum ReplNodeTag
-{
-	T_IdentifySystemCmd = 10,
-	T_BaseBackupCmd,
-	T_StartReplicationCmd
-}	ReplNodeTag;
-
 /* ----------
  *		IDENTIFY_SYSTEM command
  * ----------



Re: [HACKERS] cataloguing NOT NULL constraints

2011-08-06 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 [ wonders what psql's \d will do with NOT NULL constraints ]

I think this might be taking the notion of it should be backwards
compatible too far.  We're not expecting this patch to not change
the wording of error messages, for instance (in fact, I think a large
part of the point is to add constraint names to not-null errors).
Why would we expect it to not change what \d shows?

regards, tom lane



Re: [HACKERS] Reduce WAL logging of INSERT SELECT

2011-08-06 Thread Heikki Linnakangas

On 06.08.2011 13:13, Simon Riggs wrote:

I think we should remove the COPY optimisation because of this and
definitely not extend INSERT SELECT to perform it automatically.


It can be very helpful when loading a lot of data, so I'm not in favor 
of removing it altogether. Maybe WAL-log the first 10000 rows or so 
normally, and skip WAL after that. Of course, loading 10001 rows becomes 
the worst case then, but something along those lines...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Reduce WAL logging of INSERT SELECT

2011-08-06 Thread Jaime Casanova
On Sat, Aug 6, 2011 at 11:05 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 06.08.2011 13:13, Simon Riggs wrote:

 I think we should remove the COPY optimisation because of this and
 definitely not extend INSERT SELECT to perform it automatically.

 It can be very helpful when loading a lot of data, so I'm not in favor of
 removing it altogether. Maybe WAL-log the first 10000 rows or so normally,
 and skip WAL after that. Of course, loading 10001 rows becomes the worst
 case then, but something along those lines...


Why 10000 rows? Maybe the right solution is to move towards making a
normal table unlogged and vice versa... probably that's harder to do,
but we would have better control and fewer odd heuristics.

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] Reduce WAL logging of INSERT SELECT

2011-08-06 Thread Kevin Grittner
Heikki Linnakangas  wrote:
 On 06.08.2011 13:13, Simon Riggs wrote:
 I think we should remove the COPY optimisation because of this and
 definitely not extend INSERT SELECT to perform it automatically.
 
 It can be very helpful when loading a lot of data, so I'm not in
 favor of removing it altogether.
 
Yeah, it can currently help a lot.  Of course, if WAL-logging could
in any way facilitate hint bit and frozen xmin setting during bulk
loads, I'm sure the WAL-logged version would win easily.
 
-Kevin



Re: [HACKERS] Reduce WAL logging of INSERT SELECT

2011-08-06 Thread Tom Lane
Jaime Casanova ja...@2ndquadrant.com writes:
 On Sat, Aug 6, 2011 at 11:05 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 It can be very helpful when loading a lot of data, so I'm not in favor of
 removing it altogether. Maybe WAL-log the first 10000 rows or so normally,
 and skip WAL after that. Of course, loading 10001 rows becomes the worst
 case then, but something along those lines...

 why 10000 rows?

Yeah; any particular number is wrong.  Perhaps it'd be better to put the
behavior under user control.  In the case of COPY, where we already have
a place to stick random options, you could imagine writing something
like

COPY ... WITH (bulk)

to cue the system that a lot of data is coming in.  But I don't see any
nice way to do something similar for INSERT/SELECT.  I hesitate to
suggest a GUC, but something like SET bulk_load = on would be pretty
straightforward to use in pg_dump for instance.
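Sketched out, the two suggestions would look something like this (neither the bulk option nor the bulk_load GUC exists; both are hypothetical):

```sql
-- A per-statement cue, using COPY's existing options list:
COPY mytable FROM '/tmp/data.csv' WITH (bulk);

-- A session-level GUC that pg_dump could set:
SET bulk_load = on;
INSERT INTO mytable SELECT * FROM staging_table;
```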

regards, tom lane



[HACKERS] Module extension for parsing and rewriting functions with infixed syntax

2011-08-06 Thread Thomas Girault
Hello,

I am working on a PostgreSQL extension module which defines new
grammar rules completing the classical SQL syntax defined in the
src/backend/parser/gram.y file.

Basically, I want to handle predicates having an infixed syntax { X IS
Y } and rewrite them as classical Boolean functions with prefixed
syntax { Y(X) }.

For instance, the following query:
SELECT * FROM cars WHERE cars.color IS yellow;
would be rewritten into:
SELECT * FROM cars WHERE yellow(cars.color);
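The target of such a rewrite would just be an ordinary Boolean function of one argument, e.g. (an invented sketch, written in plain SQL rather than PL/pgSQL for brevity):

```sql
CREATE FUNCTION yellow(color text) RETURNS boolean AS $$
    SELECT color = 'yellow';
$$ LANGUAGE sql;

-- The rewritten form is then an ordinary function call:
SELECT * FROM cars WHERE yellow(cars.color);
```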

The new predicate could be rewritten as a plpgsql Boolean function
with a unique argument (cars.color IS yellow -> yellow(cars.color)).
I then added the following rule to the func_expr definition
(see gram.y:10280 in the postgresql-9.1beta3 source code):

func_expr:
...
| func_arg_expr IS func_name over_clause
{
FuncCall *n = makeNode(FuncCall);
n->funcname = $3;
n->args = list_make1($1);
n->agg_order = NIL;
n->agg_star = FALSE;
n->agg_distinct = FALSE;
n->func_variadic = FALSE;
n->over = $4;
n->location = @1;
$$ = (Node *)n;
}
...

However, my first attempt leads to the following errors:
/usr/bin/bison -d  -o gram.c gram.y
gram.y: conflicts: 84 shift/reduce, 807 reduce/reduce
gram.y: expected 0 shift/reduce conflicts
gram.y: expected 0 reduce/reduce conflicts

How can I avoid this kind of error without changing the entire grammar?

In addition, I would rather make this new functionality independent
of the original PostgreSQL source code.
Ideally, the newly defined bison rules would be defined in an
autonomous extension module.
I have seen that some contrib modules (such as SEG or CUBE) define
separate bison grammar rules.
However, I don't understand yet how such rules integrate with the
gram.y file without any conflicts.

Can I define my new bison rules separately from the gram.y file?
Can I use the new functionality dynamically after loading an extension
module (LOAD 'MY_EXTENSION';)?


I am new to the PostgreSQL community, and any ideas for solving these
problems would be very helpful.

Thanks in advance,

Thomas Girault



Re: [HACKERS] Module extension for parsing and rewriting functions with infixed syntax

2011-08-06 Thread Tom Lane
Thomas Girault toma.gira...@gmail.com writes:
 func_expr:
 ...
   | func_arg_expr IS func_name over_clause

 However, my first attempt leads to the following errors :
   /usr/bin/bison -d  -o gram.c gram.y
   gram.y: conflicts: 84 shift/reduce, 807 reduce/reduce
   gram.y: expected 0 shift/reduce conflicts
   gram.y: expected 0 reduce/reduce conflicts

Most likely, you need to think about how to distinguish this case from
IS NULL, IS TRUE, and the other things that can follow IS already.
I don't think all those words are considered fully reserved today,
but they'd have to be (or at least not be allowed as func_names)
to make this non-ambiguous.  Looking at bison -v output will help
you figure out for sure what it thinks is ambiguous.

 In addition, I would rather make this new functionality independent
 of the original PostgreSQL source code.
 Ideally, the newly defined bison rules would be defined in an
 autonomous extension module.

Bison doesn't support that; it needs to see the entire grammar at once.

(Many years ago I worked with systems at HP that did have run-time-
extensible parsers.  They were a pain: not all that flexible, and it
was easy to get grammar bugs wherein things parsed some other way than
you expected.  To me the main value of bison is that it tells you about
it when you wrote something ambiguous; and for that, it really has to
see all the rules not just some of them.)

 I have seen that some contrib modules (such as SEG or CUBE) define
 separate bison grammar rules.

Those grammars have nothing to do with parsing SQL; they're just used to
parse literal values of those data types.

 Can I define my new bison rules separately from the gram.y file?
 Can I use the new functionality dynamically after loading an extension
 module (LOAD 'MY_EXTENSION';)?

No, and no.

regards, tom lane



Re: [HACKERS] mosbench revisited

2011-08-06 Thread Jeff Janes
On Wed, Aug 3, 2011 at 11:21 AM, Robert Haas robertmh...@gmail.com wrote:
 About nine months ago, we had a discussion of some benchmarking that
 was done by the mosbench folks at MIT:

 http://archives.postgresql.org/pgsql-hackers/2010-10/msg00160.php

 Although the authors used PostgreSQL as a test harness for driving
 load, it's pretty clear from reading the paper that their primary goal
 was to stress the Linux kernel, so the applicability of the paper to
 real-world PostgreSQL performance improvement is less than it might
 be.  Still, having now actually investigated in some detail many of
 the same performance issues that they were struggling with, I have a
 much clearer understanding of what's really going on here.  In
 PostgreSQL terms, here are the bottlenecks they ran into:

 1. We configure PostgreSQL to use a 2 Gbyte application-level cache
 because PostgreSQL protects its free-list with a single lock and thus
 scales poorly with smaller caches.  This is a complaint about
 BufFreeList lock which, in fact, I've seen as a huge point of
 contention on some workloads.  In fact, on read-only workloads, with
 my lazy vxid lock patch applied, this is, I believe, the only
 remaining unpartitioned LWLock that is ever taken in exclusive mode;
 or at least the only one that's taken anywhere near often enough to
 matter.  I think we're going to do something about this, although I
 don't have a specific idea in mind at the moment.

I was going to ask if you had done any benchmarks with scale such
that the tables fit in RAM but not in shared_buffers.  I guess you
have.

The attached experimental patch fixed freelist contention on 8 cores.
It would be nice to see what happens above that.

It has been cherry picked up to HEAD, but not tested against it. (Last
tested in Dec 2010, my how time flies)

The approach is to move the important things from an LWLock to a
spinlock, and to not do any locking for the clock-hand increment and
numBufferAllocs.
That means that some buffers might occasionally get inspected twice
and some might not get inspected at all during any given clock cycle,
but this should not lead to any correctness problems.   (Disclosure:
Tom didn't like this approach when it was last discussed.)

I just offer this for whatever it is worth to you--I'm not proposing
it as an actual patch to be applied.

When data fits in RAM but not shared_buffers, maybe the easiest fix is
to increase shared_buffers.  Which brings up the other question I had
for you about your work with Nate's celebrated loaner machine.  Have
you tried to reproduce the performance problems that have been
reported (but without public disclosure of how to reproduce) with
shared_buffers > 8GB on machines with RAM > 8GB?

Cheers,

Jeff
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index bf9903b..304135e 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -28,7 +28,8 @@ typedef struct
 	int			nextVictimBuffer;
 
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
-	int			lastFreeBuffer; /* Tail of list of unused buffers */
+	int			lastFreeBuffer;		 /* Tail of list of unused buffers */
+	slock_t			mutex;			/* Protects all of these, except sometimes not nextVictimBuffer */
 
 	/*
 	 * NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
@@ -44,7 +45,7 @@ typedef struct
 } BufferStrategyControl;
 
 /* Pointers to shared state */
-static BufferStrategyControl *StrategyControl = NULL;
+static volatile BufferStrategyControl *StrategyControl = NULL;
 
 /*
  * Private (non-shared) state for managing a ring of shared buffers to re-use.
@@ -123,15 +124,13 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		}
 	}
 
-	/* Nope, so lock the freelist */
-	*lock_held = true;
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
-
 	/*
 	 * We count buffer allocation requests so that the bgwriter can estimate
 	 * the rate of buffer consumption.	Note that buffers recycled by a
 	 * strategy object are intentionally not counted here.
+	 * This is now done without a lock, and so updates can be lost
 	 */
+  
 	StrategyControl->numBufferAllocs++;
 
 	/*
@@ -142,6 +141,12 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 	 */
 	while (StrategyControl->firstFreeBuffer >= 0)
 	{
+		SpinLockAcquire(&StrategyControl->mutex);
+		if (StrategyControl->firstFreeBuffer < 0)
+		{
+			SpinLockRelease(&StrategyControl->mutex);
+			break;
+		}
 		buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
 		Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);
 
@@ -149,6 +154,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 		StrategyControl->firstFreeBuffer = buf->freeNext;
 		buf->freeNext = FREENEXT_NOT_IN_LIST;
 
+		SpinLockRelease(&StrategyControl->mutex);
+
 		/*
 		 * If the buffer is pinned or has a nonzero usage_count, we cannot use
 		 * it; discard it and retry.  (This can only happen if VACUUM put 

Re: [HACKERS] Reduce WAL logging of INSERT SELECT

2011-08-06 Thread Jeff Davis
On Fri, 2011-08-05 at 23:16 -0400, Bruce Momjian wrote:
 Well, if the table is created in the same transaction (which is the only
 case under consideration), no other sessions can write to the table so
 you are just writing the entire table on commit, rather than to the WAL.

The transaction can still write to many tables that way, and that could
mean many fsyncs.

Also, there may be a bunch of other transactions trying to write to the
WAL that have to wait as your transaction has to seek out to fsync the
data file and then seek back to the WAL.

Regards,
Jeff Davis





Re: [HACKERS] mosbench revisited

2011-08-06 Thread Jeff Janes
On Wed, Aug 3, 2011 at 3:21 PM, Jim Nasby j...@nasby.net wrote:
 On Aug 3, 2011, at 1:21 PM, Robert Haas wrote:
 1. We configure PostgreSQL to use a 2 Gbyte application-level cache
 because PostgreSQL protects its free-list with a single lock and thus
 scales poorly with smaller caches.  This is a complaint about
 BufFreeList lock which, in fact, I've seen as a huge point of
 contention on some workloads.  In fact, on read-only workloads, with
 my lazy vxid lock patch applied, this is, I believe, the only
 remaining unpartitioned LWLock that is ever taken in exclusive mode;
 or at least the only one that's taken anywhere near often enough to
 matter.  I think we're going to do something about this, although I
 don't have a specific idea in mind at the moment.

 This has been discussed before: 
 http://archives.postgresql.org/pgsql-hackers/2011-03/msg01406.php (which 
 itself references 2 other threads).

 The basic idea is: have a background process that proactively moves buffers 
 onto the free list so that backends should normally never have to run the 
 clock sweep (which is rather expensive). The challenge there is figuring out 
 how to get stuff onto the free list with minimal locking impact. I think one 
 possible option would be to put the freelist under its own lock (IIRC we 
 currently use it to protect the clock sweep as well). Of course, that still 
 means the free list lock could be a point of contention, but presumably it's 
 far faster to add or remove something from the list than it is to run the 
 clock sweep.

Hi Jim,

My experiments have shown that the freelist proper is not
substantially faster than the freelist clocksweep--and that is even
under the assumption that putting things back into the freelist is
absolutely free.  Under all the workloads I've been able to contrive,
other than ones contrived by actually hacking the code itself to make
it pathological, the average number of buffers inspected per run of
the clock sweep is 2.5.  Under contention, the mere act of acquiring
a lock is more traumatic than the actual work carried out under the
lock.
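For context, the clock sweep being measured works roughly like this — an illustrative Python sketch with made-up structures, not the actual C code in src/backend/storage/buffer/freelist.c:

```python
# Minimal sketch of a clock-sweep victim search over a buffer pool.
# Each buffer carries a usage_count; the sweep hand decrements counts
# as it passes, and the first unpinned buffer found with a zero count
# becomes the eviction victim.

class Buffer:
    def __init__(self):
        self.usage_count = 0
        self.pinned = False

def clock_sweep(buffers, hand):
    """Return (victim_index, new_hand_position, buffers_inspected)."""
    inspected = 0
    while True:
        buf = buffers[hand]
        inspected += 1
        if not buf.pinned and buf.usage_count == 0:
            victim = hand
            hand = (hand + 1) % len(buffers)
            return victim, hand, inspected
        if not buf.pinned and buf.usage_count > 0:
            buf.usage_count -= 1   # give it one more lap to prove its worth
        hand = (hand + 1) % len(buffers)
```

In a pool where most usage counts have already decayed to zero, the sweep terminates almost immediately, which is consistent with the ~2.5 buffers inspected per run reported above.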

Cheers,

Jeff



Re: [HACKERS] mosbench revisited

2011-08-06 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 It would be nice if the Linux guys would fix this problem for us, but
 I'm not sure whether they will.  For those who may be curious, the
 problem is in generic_file_llseek() in fs/read_write.c.  On a platform
 with 8-byte atomic reads, it seems like it ought to be very possible
 to read inode->i_size without taking a spinlock.  A little Googling
 around suggests that some patches along these lines have been proposed
 and - for reasons that I don't fully understand - rejected.  That now
 seems unfortunate.  Barring a kernel-level fix, we could try to
 implement our own cache to work around this problem.  However, any
 such cache would need to be darn cheap to check and update (since we
 can't assume that relation extension is an infrequent event) and must
 somehow avoid having the same sort of mutex contention that's killing the
 kernel in this workload.

What about making the relation extension much less frequent?  It's been
talked about before here, that instead of extending 8kB at a time we
could (should) extend by much larger chunks.  I would go as far as
preallocating the whole next segment (1GB) (in the background) as soon
as the current is more than half full, or such a policy.

Then you have the problem that you can't really use lseek() anymore to
guesstimate a relation's size, but Tom said in this thread that the
planner certainly doesn't need something that accurate.  Maybe
reltuples would do?  If not, maybe its accuracy could be adapted?
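The proposed policy — preallocate the next segment once the current one is more than half full — could look like this (illustrative sketch only; the 1 GB segment size matches PostgreSQL's default, the half-full trigger is just the threshold suggested here):

```python
SEGMENT_SIZE = 1 << 30   # 1 GB per segment file, as in PostgreSQL
BLOCK_SIZE = 8192        # 8 kB page

def should_preallocate_next_segment(relation_bytes):
    """True once the relation's last segment is more than half full,
    so a background process can extend ahead of foreground inserts."""
    used_in_last_segment = relation_bytes % SEGMENT_SIZE
    if used_in_last_segment == 0 and relation_bytes > 0:
        return True      # last segment exactly full: definitely time
    return used_in_last_segment > SEGMENT_SIZE // 2
```

The point of checking at the half-full mark rather than at the end is that the (slow) preallocation can complete in the background before any backend actually needs the new segment.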

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Transient plans versus the SPI API

2011-08-06 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I think we'll be a lot better off with the framework discussed last
 year: build a generic plan, as well as custom plans for the first few
 sets of parameter values, and then observe whether there's a significant
 reduction in estimated costs for the custom plans.

Another way here would be to cache more than a single plan and to keep
execution time samples or some other relevant runtime characteristics.
Then what we'd need is a way to switch from one plan to another at run
time under some conditions, like realizing that the reason why the
planner thought a nestloop would be perfect is obviously wrong, or maybe
just based on runtime characteristics.
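The build-a-few-custom-plans-then-compare framework Tom describes can be sketched as a small policy object — made-up names and thresholds, not the eventual plancache logic:

```python
CUSTOM_PLAN_TRIES = 5   # custom plans to build before judging them

class PlanChooser:
    """Decide per execution whether to replan with the actual
    parameter values or fall back to the cached generic plan."""

    def __init__(self, generic_cost):
        self.generic_cost = generic_cost
        self.custom_costs = []

    def use_custom_plan(self, estimate_custom_cost):
        # Always try custom plans for the first few executions.
        if len(self.custom_costs) < CUSTOM_PLAN_TRIES:
            self.custom_costs.append(estimate_custom_cost())
            return True
        # Afterwards, keep replanning only if the custom plans were
        # meaningfully cheaper than the generic one (10% here).
        avg_custom = sum(self.custom_costs) / len(self.custom_costs)
        return avg_custom < 0.9 * self.generic_cost
```

The sampled-costs list is also the natural place to hang the kind of runtime statistics Dimitri suggests keeping per cached plan.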

 But in any case, it's way premature to be debating this until we have
 the infrastructure in which we can experiment with different policies.

That too.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] plperl crash with Debian 6 (64 bit), pl/perlu, libwww and https

2011-08-06 Thread Alex Hunsaker
On Fri, Aug 5, 2011 at 08:53, Andrew Dunstan and...@dunslane.net wrote:


 On 08/04/2011 11:23 PM, Alex Hunsaker wrote:

 [ ... don't let people set signal handlers postgres sets ]

 This whole thing is a massive over-reaction to a problem we almost certainly
 know how to fix fairly simply and relatively painlessly, and attempts
 (unsuccessfully, at least insofar as comprehensiveness is concerned) to fix
 a problem nobody's actually reported having AFAIK.

*shrug* OK.

Find attached a version that does the equivalent of local %SIG for
each pl/perl(u) call. I was only able to test on 5.14.1, but I looked
at 5.8.9 to make sure it looks like it works.
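For readers who don't speak Perl: what `local %SIG` buys is that the interpreter's signal-handler table is saved on entry to the call and restored on exit, no matter what the called code did to it. A rough Python analogue of that save/restore discipline (a hypothetical helper for illustration, not part of the patch):

```python
import signal

def call_with_saved_handlers(func, sigs=(signal.SIGINT,)):
    """Run func(), then restore any signal handlers it may have
    replaced -- mimicking what local %SIG does for a pl/perl call."""
    saved = {s: signal.getsignal(s) for s in sigs}
    try:
        return func()
    finally:
        for s, handler in saved.items():
            signal.signal(s, handler)
```

The cost of the dictionary save/restore on every call is the Python analogue of the per-call overhead measured below.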

It's a tad slower (something like 0.00037 ms per call), but that's
quite acceptable IMHO (best of 5 runs):

= create or replace function simple() returns void as $$ $$ language plperl;
CREATE FUNCTION

-- pre patch
= select count(simple()) from generate_series(1, 1000);
Time: 10219.149 ms

-- patched
= select count(simple()) from generate_series(1, 1000);
Time: 13924.025 ms

Thoughts?
*** a/src/pl/plperl/expected/plperl.out
--- b/src/pl/plperl/expected/plperl.out
***
*** 639,641  CONTEXT:  PL/Perl anonymous code block
--- 639,643 
  DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE plperl;
  ERROR:  Useless use of sort in scalar context at line 1.
  CONTEXT:  PL/Perl anonymous code block
+ DO $do$ die "$SIG{'ALARM'} set!" if($SIG{'ALARM'}); $SIG{'ALARM'} = sub { print "alarm!\n"}; $do$ LANGUAGE plperl;
+ DO $do$ die "$SIG{'ALARM'} set!" if($SIG{'ALARM'}); $do$ LANGUAGE plperl;
*** a/src/pl/plperl/plperl.c
--- b/src/pl/plperl/plperl.c
***
*** 1895,1906  plperl_call_perl_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo)
--- 1895,1912 
  {
  	dSP;
  	SV		   *retval;
+ 	GV		   *gv;
  	int			i;
  	int			count;
  
  	ENTER;
  	SAVETMPS;
  
+ 	gv = gv_fetchpv("SIG", 0, SVt_PVHV);
+ 	if (!gv)
+ 		elog(ERROR, "couldn't fetch %%SIG");
+ 	save_hash(gv);			/* local %SIG */
+ 
  	PUSHMARK(SP);
  	EXTEND(sp, desc->nargs);
  
***
*** 1976,1981  plperl_call_perl_trigger_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo,
--- 1982,1988 
  	dSP;
  	SV		   *retval,
  			   *TDsv;
+ 	GV		   *gv;
  	int			i,
  count;
  	Trigger*tg_trigger = ((TriggerData *) fcinfo->context)->tg_trigger;
***
*** 1986,1995  plperl_call_perl_trigger_func(plperl_proc_desc *desc, FunctionCallInfo fcinfo,
  	TDsv = get_sv("_TD", 0);
  	if (!TDsv)
  		elog(ERROR, "couldn't fetch $_TD");
- 
  	save_item(TDsv);			/* local $_TD */
  	sv_setsv(TDsv, td);
  
  	PUSHMARK(sp);
  	EXTEND(sp, tg_trigger->tgnargs);
  
--- 1993,2006 
  	TDsv = get_sv("_TD", 0);
  	if (!TDsv)
  		elog(ERROR, "couldn't fetch $_TD");
  	save_item(TDsv);			/* local $_TD */
  	sv_setsv(TDsv, td);
  
+ 	gv = gv_fetchpv("SIG", 0, SVt_PVHV);
+ 	if (!gv)
+ 		elog(ERROR, "couldn't fetch %%SIG");
+ 	save_hash(gv);			/* local %SIG */
+ 
  	PUSHMARK(sp);
  	EXTEND(sp, tg_trigger->tgnargs);
  
*** a/src/pl/plperl/sql/plperl.sql
--- b/src/pl/plperl/sql/plperl.sql
***
*** 415,417  DO $do$ use strict; my $name = "foo"; my $ref = $$name; $do$ LANGUAGE plperl;
--- 415,420 
  -- check that we can use warnings (in this case to turn a warn into an error)
  -- yields ERROR:  Useless use of sort in scalar context.
  DO $do$ use warnings FATAL => qw(void) ; my @y; my $x = sort @y; 1; $do$ LANGUAGE plperl;
+ 
+ DO $do$ die "$SIG{'ALARM'} set!" if($SIG{'ALARM'}); $SIG{'ALARM'} = sub { print "alarm!\n"}; $do$ LANGUAGE plperl;
+ DO $do$ die "$SIG{'ALARM'} set!" if($SIG{'ALARM'}); $do$ LANGUAGE plperl;



Re: [HACKERS] Further news on Clang - spurious warnings

2011-08-06 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 5 August 2011 20:07, Tom Lane t...@sss.pgh.pa.us wrote:
 If I were trying to get rid of this warning, I'd be wondering why
 ReplNodeTag is a distinct enum in the first place.

 Indeed, that doesn't seem to be justified anywhere, and seems to be a
 violation of the abstraction of Node as a pseudo base class which
 would have broken any downcasting that we might have attempted to
 do. Since ReplNodeTag wasn't being used directly, just its enumeration
 constants, simply moving the constants to the global, generic NodeTag
 enum fixes the issue.

 That is what the attached patch does.

I did this plus moving replnodes.h into a saner spot: if they're
full-fledged Nodes, they ought to be defined in src/include/nodes/.

I did not renumber the existing node tags as your patch suggests.
We might want to do that at some point to make a bit more breathing
room, but I think it's too late in the 9.1 cycle to be doing that
in 9.1.  People have probably already built third-party code that
incorporates values like T_ReturnSetInfo.

regards, tom lane



Re: [HACKERS] mosbench revisited

2011-08-06 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 My experiments have shown that the freelist proper is not
 substantially faster than the freelist clocksweep--and that is even
 under the assumption that putting things back into the freelist is
 absolutely free.

The freelist isn't there to make buffer allocation faster, though;
it's there for allocation efficiency.  The point is that when some
buffers have become completely useless (eg, because we dropped the table
they were for), they'll be recycled in preference to reclaiming buffers
that contain still-possibly-useful data.  It would certainly be simple
to get rid of the freelist and only recycle dead buffers when the clock
sweep reaches them, but I think we'd be paying for that in extra,
unnecessary I/O.
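In other words, allocation becomes a two-step policy: take a known-dead buffer from the freelist if one exists, and only otherwise run the clock sweep against possibly-useful buffers. A toy sketch (hypothetical structures, not the actual buffer manager):

```python
def drop_table_buffers(buffers, freelist, table_oid):
    """When a table is dropped, its buffers become useless: push them
    onto the freelist so they are recycled before live buffers."""
    for i, buf in enumerate(buffers):
        if buf.get("table") == table_oid:
            buf.clear()
            freelist.append(i)

def allocate_buffer(buffers, freelist, run_clock_sweep):
    # Prefer a known-dead buffer; evicting it costs no useful data.
    if freelist:
        return freelist.pop()
    # Otherwise reclaim a possibly-still-useful buffer via the sweep.
    return run_clock_sweep()
```

This makes the trade-off explicit: the freelist never speeds up the common allocation path, but it keeps dead buffers from sitting idle while live, cached data gets evicted.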

regards, tom lane



Re: [HACKERS] Transient plans versus the SPI API

2011-08-06 Thread Dimitri Fontaine
Jeff Davis pg...@j-davis.com writes:
 A control knob sounds limited. For instance, what if the application
 knows that some parameters will be constant over the time that the plan
 is saved? It would be nice to be able to bind some parameters to come up
 with a generic (but less generic) plan, and then execute it many times.
 Right now that can only be done by inlining such constants in the SQL,
 which is what we want to avoid.

+1

I was already thinking in those terms at the application level for the
example I've been using earlier in this thread, and only on reading your
mail did I realize that maybe the backend should be able to do that itself.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] vacuumlo patch

2011-08-06 Thread Josh Kupershmidt
On Tue, Jul 26, 2011 at 12:18 PM, Timothy D. F. Lewis
elatl...@gmail.com wrote:
 I'm not sure what David did for me, so as per Robert's suggestion I have added
 this patch to the commit fest.
 I'm hoping I have not put this patch in more than one workflow.

Hi Tim,

I would be willing to review this patch for the next CommitFest. I'd
like to request that you send an updated version of your patch *as an
attachment* to avoid the problems with long lines getting
automatically wrapped, as Alvaro mentioned. I had trouble getting the
existing patches to apply.

A few preliminary comments about the patch:

 1. It wasn't clear to me whether you're OK with Aron's suggested
tweak, please include it in your patch if so.

 2. I think it might be better to use INT_MAX instead of hardcoding 2147483647.

 3. Your patch has some minor code style differences wrt. the existing
code, e.g.
+if(param->transaction_limit!=0 && deleted>=param->transaction_limit)
should have a space before the first '(' and around the '!=' and '>='

 4. The rest of the usage strings spell out 'large object(s)' instead
of abbreviating 'LO'
+printf("  -l LIMIT stop after removing LIMIT LOs\n");

 5. transaction_limit is an int, yet you're using strtol() which
returns long. Maybe you want to use atoi() or make transaction_limit a
long?

Thanks
Josh



Re: [HACKERS] vacuumlo patch

2011-08-06 Thread Tim
Hi Josh,

Thanks for help. Attached is a patch including changes suggested in your
comments.

Excerpts from Josh's message On Sat, Aug 6, 2011 at 9:57 PM:


  1. It wasn't clear to me whether you're OK with Aron's suggested
 tweak, please include it in your patch if so.


I decided to and included Aron's tweak.
I'm not sure if the PostgreSQL project prefers simpler code over faster
run time (if only under rare conditions).
Who knows, maybe the compiler will re-optimize it anyway.

  2. I think it might be better to use INT_MAX instead of hardcoding
 2147483647.


Fixed, though I'm not sure if I put the include in the preferred order.


   3. ... should have a space before the first '(' and around the '!=' and
 '='


Fixed.


   4. The rest of the usage strings spell out 'large object(s)' instead
 of abbreviating 'LO'...


Fixed, I omitted the brackets around the s to conform with the other usage
strings.


   5. transaction_limit is an int, yet you're using strtol() which
 returns long. Maybe you want to use atoi() or make transaction_limit a
 long?


The other int in the code is parsed using strtol(), so I also used strtol()
instead of changing more code.
I'm not sure if there are any advantages or disadvantages;
maybe it's easier to prevent/detect/report overflow or wraparound.

I can't think of a good reason anyone would want to limit transaction to
something more than INT_MAX.
Implementing that would best be done in separate and large patch as
PQntuples returns an int and is used on 271 lines across PostgreSQL.


Thanks again for the help.


vacuumlo.patch
Description: Binary data
