Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-16 Thread Hannu Krosing
On 12/16/2013 08:39 AM, David Rowley wrote:
 On Mon, Dec 16, 2013 at 6:00 AM, Ants Aasma ants.aa...@eesti.ee wrote:

 On Dec 15, 2013 6:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  David Rowley dgrowle...@gmail.com writes:
   I've attached an updated patch which includes some documentation.
   I've also added support for negfunc in CREATE AGGREGATE.
 Hopefully that's
   an ok name for the option, but if anyone has any better ideas
 please let
   them be known.
 
  I'd be a bit inclined to build the terminology around reverse
 instead of
  negative --- the latter seems a bit too arithmetic-centric.
  But that's
  just MHO.

 To contribute to the bike shedding, inverse is often used in
 similar contexts.

 I guess it's not really bike shedding; most of the work, I hope, is
 done, so I might as well try to get the docs polished up, and we'd need
 a consensus on what we're going to call them before I can get that done.
  
 I like both of these better than negative transition function and I
 agree negative implies arithmetic rather than opposite.
 Out of these 2 I do think inverse fits better than reverse, so I guess
 that would make it inverse aggregate transition function. 
 Would that make the CREATE AGGREGATE option be INVFUNC ?
  
 Any other ideas or +1's for any of the existing ones?
+1, inverse good :)

 Regards

 David Rowley

 --
 Ants Aasma




-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-16 Thread Heikki Linnakangas

On 12/15/2013 03:57 AM, Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

I think even the FLOAT case deserves some consideration.  What's the
worst-case drift?


Complete loss of all significant digits.

The case I was considering earlier of single-row windows could be made
safe (I think) if we apply the negative transition function first, before
incorporating the new row(s).  Then for example if you've got float8 1e20
followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
It's not so good with two-row windows though:

 Table   correct sum of       negative-transition
         this + next value    result
 1e20    1e20                 1e20 + 1 = 1e20
 1       1                    1e20 - 1e20 + 0 = 0
 0
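
A tiny standalone C illustration of the loss described above (plain doubles use
the same IEEE arithmetic as float8); building the second window's sum by
subtracting the departed row gives 0 instead of 1:

    #include <stdio.h>

    int main(void)
    {
        double correct = 1.0 + 0.0;                 /* window {1, 0} computed afresh */
        double inverse = (1e20 + 1.0) - 1e20 + 0.0; /* reuse prior state, remove 1e20 */

        printf("correct = %g, inverse-transition = %g\n", correct, inverse);
        return 0;
    }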


In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.


Au contraire.  People who know what they're doing expect the results
to be what an IEEE float arithmetic unit would produce for the given
calculation.  They know how the roundoff error ought to behave, and they
will not thank us for doing a calculation that's not the one specified.
I will grant you that there are plenty of clueless people out there
who *don't* know this, but they shouldn't be using float arithmetic
anyway.


And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?


I've said this before, but: we can make it arbitrarily fast if we don't
have to get the right answer.  I'd rather get "it's slow" complaints
than "this is the wrong answer" complaints.


There's another technique we could use which doesn't need a negative 
transition function, assuming the order you feed the values to the 
aggregate function doesn't matter: keep subtotals. For example, if the 
window first contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and 
then 1 + 2 + 7 = 10. Next, 1 leaves the window, and 5 enters it. Now you 
calculate  2 + 7 + 5 = 14. By keeping the subtotal (3 + 4 = 7) around, 
you saved one addition compared to calculating 2 + 3 + 4 + 5 from scratch.
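
A toy C sketch of the same idea, using exactly the numbers above (just an
illustration, not an implementation): the shared tail 3 + 4 = 7 is computed once
and reused when the window slides.

    #include <stdio.h>

    int main(void)
    {
        double subtotal = 3.0 + 4.0;            /* tail shared by both windows */
        double sum1 = 1.0 + 2.0 + subtotal;     /* window {1,2,3,4} -> 10 */
        double sum2 = 2.0 + subtotal + 5.0;     /* window {2,3,4,5} -> 14 */

        printf("%g %g\n", sum1, sum2);
        return 0;
    }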


The negative transition function is a lot simpler and faster for 
count(*) and integer operations, so we probably should implement that 
anyway. But the subtotals technique could be very useful for other data 
types.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: variant of regclass

2013-12-16 Thread Tatsuo Ishii
 Tatsuo Ishii is...@postgresql.org writes:
 Can I make sure that we want to keep the current behavior:
 
 test=# SELECT 'pg_klass'::regclass;
 ERROR:  relation "pg_klass" does not exist
 
 Yeah, I think the consensus is to not change the behavior of the input
 functions, just add some new ones.

Ok, here is the conceptual patch to implement toregclass (only for
now). If my direction is ok, I'll come up with complete patches to
implement more to* functions. Any advice will be appreciated.

Here is a sample session:

test=# select toregclass('foo');
 toregclass 
------------
 -
(1 row)

test=# select toregclass('pg_class');
 toregclass 
------------
 pg_class
(1 row)

test=# select toregclass('pg_class')::oid;
 toregclass 
------------
       1259
(1 row)

test=# select toregclass('foo')::oid;
 toregclass 
------------
          0
(1 row)

Implementation notes:

To implement toregclass, which does not throw an error when an invalid
argument is given, src/backend/utils/adt/regproc.c is modified. I
added two static functions:

static Datum regclass_gut(char *class_name_or_oid, bool raiseError);
static List *stringToQualifiedNameList_gut(const char *string, bool raiseError);

regclass_gut is called from regclassin and toregclass and does most of
the job that regclassin did before. The raiseError flag controls whether
an error is raised when an invalid argument (for example, a non-existent
relation) is given. For this purpose, regclass_gut wraps the call to
oidin using a PG_TRY block.

Secondly, when called during bootstrap with raiseError false, it returns
InvalidOid instead of raising a "relation XXX does not exist" error.
However, I doubt anyone calls regclass_gut with raiseError set to false
during bootstrap.

Thirdly, stringToQualifiedNameList_gut is added to replace
stringToQualifiedNameList. The reason I don't use a PG_TRY block there is
that I need to free some memory allocated inside the function in the
error path.

Finally, I modified the call to RangeVarGetRelid to set the
missing_ok flag according to the raiseError argument.

One thing I still need to do is modify makeRangeVarFromNameList. If a
strange schema-qualified name like a.b.c.d.e.f is given, an error is
still raised.

So, any advice will be appreciated.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index c24a2c1..d3532d7 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -45,6 +45,8 @@ static char *format_operator_internal(Oid operator_oid, bool force_qualify);
 static char *format_procedure_internal(Oid procedure_oid, bool force_qualify);
 static void parseNameAndArgTypes(const char *string, bool allowNone,
 	 List **names, int *nargs, Oid *argtypes);
+static Datum regclass_gut(char *class_name_or_oid, bool raiseError);
+static List *stringToQualifiedNameList_gut(const char *string, bool raiseError);
 
 
 /*
@@ -804,21 +806,55 @@ Datum
 regclassin(PG_FUNCTION_ARGS)
 {
 	char	   *class_name_or_oid = PG_GETARG_CSTRING(0);
+	Oid			result;
+
+	result = regclass_gut(class_name_or_oid, true);
+	PG_RETURN_OID(result);
+}
+
+Datum
+toregclass(PG_FUNCTION_ARGS)
+{
+	char	   *class_name_or_oid = PG_GETARG_CSTRING(0);
+	Oid			result;
+
+	result = regclass_gut(class_name_or_oid, false);
+	PG_RETURN_OID(result);
+}
+
+/*
+ * Gut of regclassin and toregclass.
+ * If raiseError is false, returns InvalidOid upon error.
+ */
+static Datum regclass_gut(char *class_name_or_oid, bool raiseError)
+{
 	Oid			result = InvalidOid;
 	List	   *names;
 
 	/* '-' ? */
 	if (strcmp(class_name_or_oid, "-") == 0)
-		PG_RETURN_OID(InvalidOid);
+		return result;
 
 	/* Numeric OID? */
 	if (class_name_or_oid[0] >= '0' &&
 		class_name_or_oid[0] <= '9' &&
 		strspn(class_name_or_oid, "0123456789") == strlen(class_name_or_oid))
 	{
-		result = DatumGetObjectId(DirectFunctionCall1(oidin,
-		CStringGetDatum(class_name_or_oid)));
-		PG_RETURN_OID(result);
+		PG_TRY();
+		{
+			result = DatumGetObjectId(DirectFunctionCall1(oidin,
+		  CStringGetDatum(class_name_or_oid)));
+		}
+		PG_CATCH();
+		{
+			if (raiseError)
+				PG_RE_THROW();
+			else
+				return InvalidOid;
+		}
+		PG_END_TRY();
+
+		return result;
 	}
 
 	/* Else it's a name, possibly schema-qualified */
@@ -848,28 +884,36 @@ regclassin(PG_FUNCTION_ARGS)
 		if (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
 			result = HeapTupleGetOid(tuple);
 		else
-			ereport(ERROR,
-					(errcode(ERRCODE_UNDEFINED_TABLE),
-					 errmsg("relation \"%s\" does not exist", class_name_or_oid)));
+			if (raiseError)
+				ereport(ERROR,
+						(errcode(ERRCODE_UNDEFINED_TABLE),
+						 errmsg("relation \"%s\" does not exist", class_name_or_oid)));
+			else
+				return InvalidOid;
 
 		/* We assume there can be only one match */
 
 		systable_endscan(sysscan);
 		

Re: [HACKERS] Like operator for name type

2013-12-16 Thread Albe Laurenz
Mohsen SM wrote:
 I can't find where in the code the LIKE operation for the name type is run.
 
 Can you tell me where a LIKE clause is compared with a column of name type?
 
 I couldn't find a function for this operation in src/backend/utils/adt/name.c
 when I was in debugging mode
 and set breakpoints on all functions.

Do you mean LIKE in a query like this?

test=> EXPLAIN VERBOSE SELECT oid FROM pg_class WHERE relname LIKE 'pg_%';
                              QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class  (cost=0.00..12.80 rows=230 width=4)
   Output: oid
   Filter: (pg_class.relname ~~ 'pg_%'::text)
(3 rows)

That would use textlike() in backend/utils/adt/like.c

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-16 Thread David Rowley
On Mon, Dec 16, 2013 at 9:39 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:


 There's another technique we could use which doesn't need a negative
 transition function, assuming the order you feed the values to the aggregate
 function doesn't matter: keep subtotals. For example, if the window first
 contains values 1, 2, 3, 4, you calculate 3 + 4 = 7, and then 1 + 2 + 7 =
 10. Next, 1 leaves the window, and 5 enters it. Now you calculate  2 + 7 +
 5 = 14. By keeping the subtotal (3 + 4 = 7) around, you saved one addition
 compared to calculating 2 + 3 + 4 + 5 from scratch.

 The negative transition function is a lot simpler and faster for count(*)
 and integer operations, so we probably should implement that anyway. But
 the subtotals technique could be very useful for other data types.

 - Heikki


That's quite interesting. I guess we would need another flag in
pg_aggregate to mark if the order of the tuples matters; string_agg would
be an example of one that would have to skip this.

At least for ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING, if the
aggregate order did not matter then it would likely be quite efficient just
to aggregate from the bottom up, materialise the results at each tuple,
store them in the tuple store, and then just use that materialised value when
that tuple is processed. This method won't work when the frame base is not
fixed.

I had been thinking ahead on how to improve the MIN and MAX cases too. I came
up with something called "tuple store indexes" that could be built as
binary search trees with a composite index on the tuple position and the
aggregate's sort operator... Something similar to how the following query
could use an index on (id,value) to calculate max():
select max(value) from test where id between 1 and 100;
It's certainly not something for this patch, but it was an idea I came up
with which I think would be possible without adding any more columns to
pg_aggregate.

Regards

David Rowley


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-16 Thread David Rowley
On Sun, Dec 15, 2013 at 12:17 PM, David Rowley dgrowle...@gmail.com wrote:


 One thing that is currently on my mind is what to do when passing volatile
 functions to the aggregate. Since the number of times we execute a volatile
 function will depend greatly on the window frame options, I think we should
 include some sort of warning in the documentation that "The number of times
 that the expression is evaluated within the aggregate function when used in
 the context of a WINDOW is undefined". The other option would be to disable
 this optimisation if the aggregate expression contained a volatile
 function, but doing this seems a bit weird to me, as is anyone actually
 going to be depending on a volatile function being executed so many times?



I just wanted to bring this back into people's minds.
While writing this patch I found and removed a comment which was a todo
item to implement these negative transition functions. This comment
mentioned maybe disallowing the use of these if the expression of the function
contained a volatile function. I wondered why this was important and I
still don't really see why we would disallow this only to enforce that we
call that function an undefined number of times anyway.

nextval was the only volatile function I could think of that would
let me give an example where it's easy to understand what is going on
here.

CREATE SEQUENCE test_seq;
SELECT currval('test_seq'),
   COUNT(*) OVER (ORDER BY x.x ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING),
   SUM(nextval('test_seq')) OVER (ORDER BY x.x ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING)
FROM generate_series(1,2) x (x);
DROP SEQUENCE test_seq;

The results are:
 currval | count | sum
---------+-------+-----
       2 |     2 |   3
       3 |     1 |   3

I've not looked to see if the spec has anything to say about this, but the
first row will have sum as 1+2, then the 2nd row will have just 1 row to
aggregate and the value will be 3 due to nextval returning 3. I could see an
argument that the results for this should actually be:

 currval | count | sum
---------+-------+-----
       2 |     2 |   3
       3 |     1 |   2

If it was, then the solution would have to be to materialise the expression
by evaluating it once for each tuple, which sounds like a big change. I
thought maybe if we're going to be playing around with the number of times
these expressions are evaluated then we should stick a note in the docs to
tell our users not to depend on this.

Something like the attached maybe.


aggfuncwindow_note_doc.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum_work_mem

2013-12-16 Thread Heikki Linnakangas

On 12/13/2013 08:40 PM, Alvaro Herrera wrote:

Heikki Linnakangas escribió:


I haven't been following this thread in detail, but would it help if
we implemented a scheme to reduce (auto)vacuum's memory usage? Such
schemes have been discussed in the past, packing the list of dead
items more tightly.


Well, it would help some, but it wouldn't eliminate the problem
completely.  Autovacuum scales its memory usage based on the size of the
table.  There will always be a table so gigantic that a maximum
allocated memory is to be expected; and DBAs will need a way to limit
the memory consumption even if we pack dead items more densely.

I was playing with keeping item pointers for each page in a bitmapset.
This was pretty neat and used a lot less memory than currently, except
that I needed to allocate a large chunk of memory and then have
bitmapsets use words within that large allocation space.  It turned out
to be too ugly so I abandoned it.  With the varbit encoding thingy in
the recent GIN patchset, maybe it would be workable.


The varbyte encoding is actually a very poor fit for vacuum. Vacuum 
needs fast random access into the array when scanning indexes, and the 
varbyte encoded item pointer lists used in gin don't allow that.


I couldn't find it in the archives now, but when we last discussed this, 
Tom suggested that we divide the large chunk of memory that vacuum 
allocates into two parts. The first part grows from the bottom up, and 
the second part from top down, until there is no free space in the 
middle anymore. For each heap page, there is one entry in the first 
part, with the block number, and a pointer to an entry in the second 
part. In the second part, there's a list of offset numbers on that page 
(or a bitmap).
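
A rough standalone sketch of that layout (the struct and field names here are
illustrative assumptions, not actual vacuum code): one allocation, per-page
entries filling it from the bottom, per-page offset lists filling it from the
top, until the two regions meet.

    #include <stdint.h>
    #include <stddef.h>

    typedef uint32_t BlockNumber;
    typedef uint16_t OffsetNumber;

    typedef struct PageEntry
    {
        BlockNumber blkno;          /* heap page number */
        uint32_t    list_offset;    /* byte offset of this page's item list,
                                     * counted down from the top of the space */
        uint16_t    nitems;         /* number of dead item offsets on the page */
    } PageEntry;

    typedef struct DeadTupleStore
    {
        char   *space;              /* one big maintenance_work_mem allocation */
        size_t  space_size;
        size_t  bottom_used;        /* bytes used by the PageEntry array */
        size_t  top_used;           /* bytes used by the offset lists */
    } DeadTupleStore;

    /* true if another page with nitems dead offsets still fits */
    static int
    store_has_room(const DeadTupleStore *s, int nitems)
    {
        size_t need = sizeof(PageEntry) + nitems * sizeof(OffsetNumber);

        return s->bottom_used + s->top_used + need <= s->space_size;
    }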


Another idea: Store only the least significant 20 bits of the block number 
of each item pointer, and use the remaining 12 bits for the offset 
number. So each item pointer is stored as a single 32 bit integer. For 
the top 12 bits of the block number, build a separate lookup table of 
4096 entries, indexed by the top bits. Each entry in the lookup table 
points to the beginning and end index in the main array where the 
entries for that page range are stored. That would reduce the memory 
usage by about 1/3, which isn't as good as the bitmap method when there 
are a lot of dead tuples on the same pages, but would probably be a smaller patch.
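
A quick sketch of the packing just described (all names are made up for
illustration; this is not an actual patch): the low 20 bits of the block number
plus a 12-bit offset number share one uint32, and a 4096-entry table keyed by
the high 12 bits of the block number points at a slice of the main array.

    #include <stdint.h>

    typedef struct RangeSlice
    {
        uint32_t begin;             /* first index in the packed array */
        uint32_t end;               /* one past the last index */
    } RangeSlice;

    typedef struct PackedDeadTuples
    {
        uint32_t   *items;          /* packed (block low bits, offset) entries */
        RangeSlice  ranges[4096];   /* indexed by the top 12 bits of the block */
    } PackedDeadTuples;

    static inline uint32_t
    pack_item(uint32_t blkno, uint16_t offnum)
    {
        return ((blkno & 0xFFFFF) << 12) | ((uint32_t) offnum & 0xFFF);
    }

    static inline void
    unpack_item(uint32_t packed, uint32_t blk_hi,
                uint32_t *blkno, uint16_t *offnum)
    {
        *blkno  = (blk_hi << 20) | (packed >> 12);
        *offnum = (uint16_t) (packed & 0xFFF);
    }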


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2013-12-16 Thread Shigeru Hanada
KaiGai-san,

2013/12/16 KaiGai Kohei kai...@ak.jp.nec.com:
 (2013/12/16 14:15), Shigeru Hanada wrote:
 (1) ctidscan
 Is session_preload_libraries available to enable the feature, like
 shared_*** and local_***?  According to my trial it works fine like
 two similar GUCs.

 It should be available; nothing is different from the two parameters that
 we have supported for a long time. Sorry, I missed mentioning that new
 feature.

Check.

 (2) postgres_fdw
 JOIN push-down is a killer application of the Custom Scan Provider
 feature, so I think it's good to mention it in the Remote Query
 Optimization section.

 I added an explanation about remote join execution in that section.
 It will probably help users understand why a Custom Scan node appears
 here instead of a Join node. Thanks for your suggestion.

Check.

I think that these patches have been considered enough to be marked as
Ready for Committer.

Regards,
-- 
Shigeru HANADA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum_work_mem

2013-12-16 Thread Simon Riggs
On 16 December 2013 10:12, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 12/13/2013 08:40 PM, Alvaro Herrera wrote:

 Heikki Linnakangas escribió:

 I haven't been following this thread in detail, but would it help if
 we implemented a scheme to reduce (auto)vacuum's memory usage? Such
 schemes have been discussed in the past, packing the list of dead
 items more tightly.


 Well, it would help some, but it wouldn't eliminate the problem
 completely.  Autovacuum scales its memory usage based on the size of the
 table.  There will always be a table so gigantic that a maximum
 allocated memory is to be expected; and DBAs will need a way to limit
 the memory consumption even if we pack dead items more densely.

The problem is allocation of memory, not one of efficient usage once
we have been allocated.

 Another idea: Store only the least significant 20 bits of the block number of
 each item pointer, and use the remaining 12 bits for the offset number. So
 each item pointer is stored as a single 32 bit integer. For the top 12 bits
 of the block number, build a separate lookup table of 4096 entries, indexed
 by the top bits. Each entry in the lookup table points to the beginning and
 end index in the main array where the entries for that page range are stored.
 That would reduce the memory usage by about 1/3, which isn't as good as the
 bitmap method when there are a lot of dead tuples on the same pages, but would
 probably be a smaller patch.

We would do better to just use memory from shared_buffers and then we
wouldn't need a memory allocation or limit.
If we split the allocation into a series of BLCKSZ blocks of memory,
we can use your compression down to 4 bytes/row and then index the
blocks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.8

2013-12-16 Thread Andres Freund
Hi Robert,

On 2013-12-16 00:53:10 -0500, Robert Haas wrote:
 On Wed, Dec 11, 2013 at 7:08 PM, Andres Freund and...@2ndquadrant.com wrote:
  I don't think there's much point in including remapping in all of
  the error messages.  It adds burden for translators and users won't
  know what a remapping file is anyway.
 
  It helps in locating which part of the code caused a problem. I utterly
  hate to get reports with error messages that I can't correlate to a
  source file. Yes, I know that verbose error output exists, but usually
  you don't get it that way... That said, I'll try to make the messages
  simpler.
 
 Well, we could adopt a policy of not making messages originating from
 different locations in the code the same.  However, it looks to me
 like that's NOT the current policy, because some care has been taken
 to reuse messages rather than distinguish them.

To me that mostly looks like cases where we either don't want to tell
more for various security-ish purposes or where they have been copy and
pasted...

 There's no hard and
 fast rule here, because some cases are distinguished, but my gut
 feeling is that all of the errors your patch introduces are
 sufficiently obscure cases that separate messages with separate
 translations are not warranted.

Perhaps we should just introduce a marker that some such strings are not
to be translated if they are of the unexpected kind. That would probably
make debugging easier too ;)

 I need to examine that logic in more detail,
 but I had trouble following it and was hoping the next version would
 be better-commented.

Yea, I've started expanding the comments about the higher level concerns
- I've been so knee deep in this that I didn't realize they weren't
there.

  b) make hot_standby_feedback work across disconnections of the walsender
 connection (i.e peg xmin, not just for catalogs though)
 
 Check; might need to be optional.

Yea, I am pretty sure it will. It'd probably be pretty nasty to set
min_recovery_apply_delay=7d and force xmin to be kept to that...

  c) Make sure we can transport those across cascading
 replication.
 
 Not sure I follow.

Consider a replication scenario like primary -> standby-1 ->
standby-2. The primary not only must not remove data that standby-1
requires, but also must not remove data that standby-2 needs. Not really
necessary for WAL, since that will also reside on standby-1, but definitely
for the xmin horizon.
So standby-1 will need to signal not only its own needs, but also those of
the nodes below.

  The hard questions that I see are like:
  * How do we manage standby registration? Does the admin have to do that
manually? Or does a standby register itself automatically if some config
parameter is set?
  * If automatically, how do we deal with the situation where the registrant
dies before noting its own identifier somewhere persistent? My best idea
is a two-phase registration process where registrations in phase 1 are
thrown away after a restart, but yuck.

 If you don't know the answers to these questions for the kind of
 replication that we have now, then how do you know the answers for
 logical replication?  Conversely, what makes the answers that you've
 selected for logical replication unsuitable for our existing
 replication?

There's a pretty fundamental difference imo - with the logical decoding
stuff we only supply support for change producing nodes, with physical
rep we supply both.
There's no need to decide about the way node ids are stored in in-core logical
rep. consumers since there are no in-core ones. Yet. Also, physical rep
by now is a pretty established thing, we need to be much more careful
about compatibility there.

 I have to admit that before I saw your design for the logical
 replication slots, the problem of making this work for our existing
 replication stuff seemed almost intractable to me; I had no idea how
 that was going to work.

Believe me, it caused me some headaches to devise it for decoding
too. Oh, and I think I watched just about all episodes of some stupid TV
show during it ;)

 Keeping the data in shared memory,
 persisting them across shutdowns, and managing them via either
 function calls or the replication command language seems perfect.

Thanks. I think the concept has quite some merits. The implementation is
a bit simplistic atm, we e.g. might want to work harder at coalescing
fsync()s and such, but that's a further step when we see whether it's
worthwhile in the real world.

 Now, in terms of how registration works, whether for physical
 replication or logical, it seems to me that the DBA will have to be
 responsible for telling each client the name of the slot to which that
 client should connect (omitting it at said DBA's option if, as for
 physical replication, the slot is not mandatory).

It seems reasonable to me to reuse the application_name for the slot's
name, similar to the way it's used for synchronous rep. It seems odd to
use two different ways to identify nodes. t should 

Re: [HACKERS] GIN improvements part 1: additional information

2013-12-16 Thread Heikki Linnakangas

On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

I've thought about different algorithms a little more. The general problem I
see is online update. We need it, while it is typically not covered by the
research at all. We already have to invent a small index at the end of the
page. Different encoding methods add more challenges. In general, the methods
can be classified into two groups:
1) Values aren't aligned to bytes (gamma codes, PFOR, etc.)
2) Multiple values are packed together in a small group (simple-9, simple-18)


Ok.


For the first group of methods, when inserting in the middle of the page we
would have to do a non-byte-aligned shift of the right part of the values. I
don't know how expensive this shift is, but I expect that it would be much
slower than memmove.


Agreed.


When values are packed into small groups, we have to either insert an
inefficiently encoded value or re-encode the whole right part of the values.


It would probably be simplest to store newly inserted items 
uncompressed, in a separate area in the page. For example, grow the list 
of uncompressed items downwards from pg_upper, and the compressed items 
upwards from pg_lower. When the page fills up, re-encode the whole page.
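
A conceptual sketch of that approach (not the actual GIN page layout; the type
and function names here are illustrative only): cheap appends go into a small
uncompressed area, and when it fills up the caller re-encodes the whole page.

    #include <stdbool.h>
    #include <stdint.h>

    /* stand-in for an item pointer (block number, offset number) */
    typedef struct TidEntry
    {
        uint32_t blkno;
        uint16_t offnum;
    } TidEntry;

    #define UNCOMPRESSED_CAP 32

    typedef struct GinPageWorkspace
    {
        /* compressed posting list grows up from pg_lower (not shown here) */
        TidEntry uncompressed[UNCOMPRESSED_CAP];  /* grows down from pg_upper */
        int      nuncompressed;
    } GinPageWorkspace;

    static bool
    gin_page_insert(GinPageWorkspace *ws, TidEntry tid)
    {
        if (ws->nuncompressed < UNCOMPRESSED_CAP)
        {
            ws->uncompressed[ws->nuncompressed++] = tid;  /* cheap append */
            return true;
        }
        return false;   /* caller re-encodes the whole page, then retries */
    }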


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-16 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 One thing that is currently on my mind is what to do when passing volatile
 functions to the aggregate. Since the number of times we execute a volatile
 function will depend greatly on the window frame options, I think we should
 include some sort of warning in the documentation that "The number of times
 that the expression is evaluated within the aggregate function when used in
 the context of a WINDOW is undefined". The other option would be to disable
 this optimisation if the aggregate expression contained a volatile
 function, but doing this seems a bit weird to me, as is anyone actually
 going to be depending on a volatile function being executed so many times?

Once again: this patch has no business changing any user-visible behavior.
That would include not changing the number of evaluations of volatile
functions.  The planner is full of places where optimizations are disabled
for volatile subexpressions, and I don't see why this should be different.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery

2013-12-16 Thread MauMau

Hi, Fujii san,


On Wed, Aug 7, 2013 at 7:03 AM, Fujii Masao masao.fu...@gmail.com wrote:

On second thought, probably we cannot remove the restored WAL files early
because they might be required for fast promotion, which is a new feature in
9.3. In fast promotion, an end-of-recovery checkpoint is not executed. After
the end of recovery, a normal online checkpoint starts. If the server crashes
before such an online checkpoint completes, the server needs to replay again
all the WAL files which it replayed before the promotion. Since this is crash
recovery, all those WAL files need to exist in the pg_xlog directory. So if
we remove the restored WAL files from pg_xlog early, such a crash recovery
might fail.

So even if cascade replication is disabled, if standby_mode = on, i.e., fast
promotion can be performed, we cannot remove the restored WAL files early.


Following Fujii-san's advice, I've made the attached patch.  This prevents 
unnecessary WAL accumulation in pg_xlog/ during archive recovery (not 
standby recovery).  To do this, I had to revive some code in 
exitArchiveRecovery() from 9.1.


Could you commit this to 9.2 and later?


Regards
MauMau


wal_increase_in_pitr.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Useless Replica Identity: NOTHING noise from psql \d

2013-12-16 Thread Andres Freund
On 2013-12-14 17:43:36 +0100, Andres Freund wrote:
 On 2013-12-14 11:27:53 -0500, Tom Lane wrote:
  In HEAD:
  
  regression=# \d pg_depend
  Table "pg_catalog.pg_depend"
    Column    |  Type   | Modifiers 
 -------------+---------+-----------
  classid     | oid     | not null
  objid       | oid     | not null
  objsubid    | integer | not null
  refclassid  | oid     | not null
  refobjid    | oid     | not null
  refobjsubid | integer | not null
  deptype     | "char"  | not null
 Indexes:
     "pg_depend_depender_index" btree (classid, objid, objsubid)
     "pg_depend_reference_index" btree (refclassid, refobjid, refobjsubid)
 Replica Identity: NOTHING
  
  Where did that last line come from, and who thinks it's so important
  that it should appear by default?  It seems absolutely content-free
  even if I were using whatever feature it refers to, since it is
  (I presume) the default state.
 
 Hm. Yes, that's slightly inelegant. It's shown because it's not
 actually the normal default for normal tables, just for system tables. Maybe
 we should just set it to the default (in pg_class) for system tables as
 well, and just change it in the relcache.

Hm. I don't like that choice much after thinking for a bit. It seems to
make querying the catalog unnecessarily complex.
How about making it conditional on the table's namespace instead? That
will do the wrong thing if somebody moves a table to pg_catalog and
configures a replica identity, but I think we can live with that, given
how many other things work strangely around that.

Patch attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 96322ca..4e132c8 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2310,7 +2310,13 @@ describeOneTableDetails(const char *schemaname,
 			printTableAddFooter(cont, buf.data);
 		}
 
-		if ((tableinfo.relkind == 'r' || tableinfo.relkind == 'm') &&
+		/*
+		 * Show the table's replica identity, if nondefault, not USING INDEX,
+		 * and not a system catalog. If configured using an index, it will
+		 * display the identity when displaying the selected index above.
+		 */
+		if (strcmp(schemaname, "pg_catalog") != 0 &&
+			(tableinfo.relkind == 'r' || tableinfo.relkind == 'm') &&
 			tableinfo.relreplident != 'd' && tableinfo.relreplident != 'i')
 		{
 			const char *s = _(Replica Identity);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-12-16 Thread Peter Eisentraut
Amit, if you think this patch is ready now, please mark it as Ready for
Committer.  Otherwise, I encourage your to continue the discussion and
possibly resubmit the patch to the next commitfest.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: variant of regclass

2013-12-16 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 To implement toregclass, which does not throw errors when invalid
 argument is given, src/backend/utils/adt/regproc.c is modified. I
 added two static functions:

 static Datum regclass_gut(char *class_name_or_oid, bool raiseError);
 static List *stringToQualifiedNameList_gut(const char *string, bool 
 raiseError);

Please spell that as "guts" not "gut".

 regclass_gut is called from regclassin and toregclass and does most of
 the job that regclassin did before. The raiseError flag controls whether
 an error is raised when an invalid argument (for example, a non-existent
 relation) is given. For this purpose, regclass_gut wraps the call to
 oidin using a PG_TRY block.

I do not think that use of PG_TRY is either necessary or acceptable
--- for example, it will happily trap and discard query-cancel errors,
as well as other errors that are not necessarily safe to ignore.
If you don't want to risk an error on an invalid numeric string,
how about parsing the integer yourself with sscanf?

More generally, though, I don't see a great need to try to promise
that this function *never* throws any errors, and so I'm also suspicious
of the hacking you've done on stringToQualifiedNameList.  I'm even
less happy about the idea that this patch might start reaching into
things like makeRangeVarFromNameList.  I think it's sufficient if it
doesn't throw an error on name-not-found; you don't have to try to
prevent weird syntax problems from throwing errors.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Assertion failure in base backup code path

2013-12-16 Thread Antonin Houska
In HEAD, pg_basebackup causes WAL sender to fail when the replication
user is not a superuser:


#0  0x7f34f671dd25 in raise () from /lib64/libc.so.6
#1  0x7f34f671f1a8 in abort () from /lib64/libc.so.6
#2  0x008989a9 in ExceptionalCondition (conditionName=0xa51ac1
!(IsTransactionState()), errorType=0xa51734 FailedAssertion,
fileName=0xa516e0 catcache.c, lineNumber=) at assert.c:54
#3  0x0087ea36 in SearchCatCache (cache=0x23c4fb8, v1=16384,
v2=0, v3=0, v4=0) at catcache.c:
#4  0x00890cdd in SearchSysCache (cacheId=11, key1=16384,
key2=0, key3=0, key4=0) at syscache.c:909
#5  0x008a9a99 in has_rolreplication (roleid=16384) at
miscinit.c:401
#6  0x005146d1 in do_pg_start_backup (backupidstr=0x239d5b0
bkp_01, fast=0 '\000', starttli_p=0x7fff78e4f8ec,
labelfile=0x7fff78e4f8e0) at xlog.c:9633
#7  0x00733a24 in perform_base_backup (opt=0x7fff78e4fa30,
tblspcdir=0x242c6a0) at basebackup.c:106
#8  0x00735013 in SendBaseBackup (cmd=0x239dbf8) at basebackup.c:563
#9  0x0072f4f2 in exec_replication_command (cmd_string=0x23ea078
BASE_BACKUP LABEL 'bkp_01'  WAL  NOWAIT) at walsender.c:668
#10 0x0077c5c4 in PostgresMain (argc=1, argv=0x2385358,
dbname=0x2385248 , username=0x2385210 postgres_replication) at
postgres.c:4009
#11 0x00717c94 in BackendRun (port=0x23a2e90) at postmaster.c:4085
#12 0x0071742e in BackendStartup (port=0x23a2e90) at
postmaster.c:3774
#13 0x00713cc9 in ServerLoop () at postmaster.c:1585
#14 0x00713370 in PostmasterMain (argc=3, argv=0x2381f60) at
postmaster.c:1240
#15 0x00677698 in main (argc=3, argv=0x2381f60) at main.c:196

Some additional condition may be needed in the Assert() statement?

// Antonin Houska (Tony)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertion failure in base backup code path

2013-12-16 Thread Andres Freund
Hi,

On 2013-12-16 15:08:51 +0100, Antonin Houska wrote:
 In HEAD, pg_basebackup causes WAL sender to fail when the replication
 user is not a superuser:
 
 
 #0  0x7f34f671dd25 in raise () from /lib64/libc.so.6
 #1  0x7f34f671f1a8 in abort () from /lib64/libc.so.6
 #2  0x008989a9 in ExceptionalCondition (conditionName=0xa51ac1
 !(IsTransactionState()), errorType=0xa51734 FailedAssertion,
 fileName=0xa516e0 catcache.c, lineNumber=) at assert.c:54
 #3  0x0087ea36 in SearchCatCache (cache=0x23c4fb8, v1=16384,
 v2=0, v3=0, v4=0) at catcache.c:
 #4  0x00890cdd in SearchSysCache (cacheId=11, key1=16384,
 key2=0, key3=0, key4=0) at syscache.c:909
 #5  0x008a9a99 in has_rolreplication (roleid=16384) at
 miscinit.c:401
 #6  0x005146d1 in do_pg_start_backup (backupidstr=0x239d5b0
 bkp_01, fast=0 '\000', starttli_p=0x7fff78e4f8ec,
 labelfile=0x7fff78e4f8e0) at xlog.c:9633
 #7  0x00733a24 in perform_base_backup (opt=0x7fff78e4fa30,
 tblspcdir=0x242c6a0) at basebackup.c:106
 #8  0x00735013 in SendBaseBackup (cmd=0x239dbf8) at basebackup.c:563
 #9  0x0072f4f2 in exec_replication_command (cmd_string=0x23ea078
 BASE_BACKUP LABEL 'bkp_01'  WAL  NOWAIT) at walsender.c:668
 #10 0x0077c5c4 in PostgresMain (argc=1, argv=0x2385358,
 dbname=0x2385248 , username=0x2385210 postgres_replication) at
 postgres.c:4009
 #11 0x00717c94 in BackendRun (port=0x23a2e90) at postmaster.c:4085
 #12 0x0071742e in BackendStartup (port=0x23a2e90) at
 postmaster.c:3774
 #13 0x00713cc9 in ServerLoop () at postmaster.c:1585
 #14 0x00713370 in PostmasterMain (argc=3, argv=0x2381f60) at
 postmaster.c:1240
 #15 0x00677698 in main (argc=3, argv=0x2381f60) at main.c:196
 
 Some additional condition may be needed in the Assert() statement?

Actually it looks more like a bug in the basebackup facility. It
shouldn't do syscache lookups without having the resource management
stuff around them (the transaction state asserted in SearchCatCache()).
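
A minimal sketch of one way the lookup could be made safe (purely illustrative;
the helper name is made up and this is not the actual fix): wrap the syscache
access in a transaction command so the transaction state that SearchCatCache()
asserts on is present.

    #include "postgres.h"
    #include "access/xact.h"
    #include "miscadmin.h"

    /* hypothetical helper for the basebackup path */
    static bool
    backup_user_can_replicate(void)
    {
        bool    ok;

        StartTransactionCommand();      /* sets up transaction state */
        ok = superuser() || has_rolreplication(GetUserId());
        CommitTransactionCommand();

        return ok;
    }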

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-12-16 Thread Amit Kapila
On Mon, Dec 16, 2013 at 6:51 PM, Peter Eisentraut pete...@gmx.net wrote:
 Amit, if you think this patch is ready now, please mark it as Ready for
 Committer.  Otherwise, I encourage your to continue the discussion and
 possibly resubmit the patch to the next commitfest.

Only one test is pending from my side to conclude. I am planning to
complete it tomorrow.
Is that okay?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stuck spinlock

2013-12-16 Thread Merlin Moncure
On Sat, Dec 14, 2013 at 6:20 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote:
 On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas robertmh...@gmail.com wrote:
  On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  And while we're on the subject ... isn't bgworker_die() utterly and
  completely broken?  That unconditional elog(FATAL) means that no process
  using that handler can do anything remotely interesting, like say touch
  shared memory.
 
  Yeah, but for the record (since I see I got cc'd here), that's not my
  fault.  I moved it into bgworker.c, but it's been like that since
  Alvaro's original commit of the bgworker facility
  (da07a1e856511dca59cbb1357616e26baa64428e).


 Is this an edge case or something that will hit a lot of users?
 Arbitrary server panics seems pretty serious...

 Is your question about the bgworker part you're quoting or about the
 stuck spinlock stuff? I don't think the bgworker bug is too bad in
 practice but the one in handle_sig_alarm() stuff certainly is.

 I think while it looks possible to hit problems without statement/lock
 timeout, it's relatively unlikely that those are hit in practice.

Well, both -- I was just wondering out loud what the severity level of
this issue was.  In particular, is it advisable for the general public
to avoid this release?   My read on this is 'probably'.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] row security roadmap proposal

2013-12-16 Thread Craig Ringer
Hi all

I'd like to outline a path toward getting row-security into core.
Comments / discussion appreciated.

I think I/we need to:

- Finish and commit updatable security barrier views. I've still got a
lot of straightening out to do there.

- Add an attribute to portals that stores the user ID at the time the
portal was planned. Possibly extensibly; I'd be surprised if we won't
need to associate other local info with a portal later.

- Allow storage of the pre-rewrite query plan and let saved plans be
marked as needing rewrite when replanned. We'll need this to permit some
users (initially just a hardcoded superuser; later by possession of a
certain right) to be totally exempt from row-security policy.
(Alternative: store separate with- and without-row-security plan trees
and pick which we use).

- Decide on and implement a structure for row-security functionality
itself. I'm persuaded by Robert's comments here, that whatever we expose
must be significantly more usable than a DIY on top of views (with the
fix for updatable security barrier views to make that possible). I
outlined the skeleton of a possible design in my earlier post, with the
hierarchical and non-hierarchical access labels. Should be implemented
using the same API we expose for extensions (like SEPostgreSQL RLS).

- Produce and commit a patch that adds the C API for row-security,
including calls to make it easy to wrap any relation in a dynamically
created or stored updatable security barrier subquery during rewrite.

- Produce and commit a patch series implementing the syntax, catalog
tables / catalog changes, documentation, etc for row-security that uses
this C API and commit it to core.

SEPostgreSQL RLS can then be built on top of the same API, using the
same core support.

Thoughts?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] row security roadmap proposal

2013-12-16 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 - Add an attribute to portals that stores the user ID at the time the
 portal was planned. Possibly extensibly; I'd be surprised if we won't
 need to associate other local info with a portal later.

This bit seems rather confused.  A portal is a runnable query; we
do not support replanning midstream, and I don't think we support
changes of UID either.

I agree that the plan cache needs to support treating change of UID
as a reason to discard a cached plan, but that's nothing to do with
portals.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stuck spinlock

2013-12-16 Thread Andres Freund
On 2013-12-16 08:36:51 -0600, Merlin Moncure wrote:
 On Sat, Dec 14, 2013 at 6:20 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote:
  Is this an edge case or something that will hit a lot of users?
  Arbitrary server panics seems pretty serious...
 
  Is your question about the bgworker part you're quoting or about the
  stuck spinlock stuff? I don't think the bgworker bug is too bad in
  practice but the one in handle_sig_alarm() stuff certainly is.
 
  I think while it looks possible to hit problems without statement/lock
  timeout, it's relatively unlikely that those are hit in practice.
 
 Well, both -- I was just wondering out loud what the severity level of
 this issue was.  In particular, is it advisable for the general public
 to avoid this release?   My read on this is 'probably'.

Hard to say, the issues fixed in the release are quite important as
well. I'd tend to say they are more important. I think we just need to
release 9.3.3 pretty soon.

The multixact fixes in 9.3.2 weren't complete either... (see recent push)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stuck spinlock

2013-12-16 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Hard to say, the issues fixed in the release are quite important as
 well. I'd tend to say they are more important. I think we just need to
 release 9.3.3 pretty soon.

Yeah.

 The multixact fixes in 9.3.2 weren't complete either... (see recent push)

Are they complete now?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stuck spinlock

2013-12-16 Thread Andres Freund
On 2013-12-16 09:46:19 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  The multixact fixes in 9.3.2 weren't complete either... (see recent push)
 
 Are they complete now?

Hm. There's two issues I know of left, both discovered in #8673:
- slru.c:SlruScanDirectory() doesn't support long enough
  filenames. Afaics that should be a fairly easy fix.
- multixact/members isn't protected against wraparounds, only
  multixact/offsets is. That's a pretty longstanding bug though,
  although more likely to be hit these days.

Furthermore there's some missing optimizations (like the useless
multixact generation you noted upon in Update with subselect sometimes
returns wrong result), but those shouldn't hold up a release.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] row security roadmap proposal

2013-12-16 Thread Craig Ringer
On 12/16/2013 10:43 PM, Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 - Add an attribute to portals that stores the user ID at the time the
 portal was planned. Possibly extensibly; I'd be surprised if we won't
 need to associate other local info with a portal later.
 
 This bit seems rather confused.  A portal is a runnable query; we
 do not support replanning midstream, and I don't think we support
 changes of UID either.

We _do_ support changes of UID, or rather, current_user returns the
session user ID at the point in time it runs in the portal.

This can be observed with SECURITY DEFINER pl/pgsql functions returning
refcursor, and with cursors that're retained across a SET SESSION
AUTHORIZATION. They don't even need to be WITH HOLD, and s.s.a. can
occur within a transaction.

The point is to return the user ID at the time the portal was created,
rather than whatever the session now is.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stuck spinlock

2013-12-16 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-12-16 09:46:19 -0500, Tom Lane wrote:
 Are they complete now?

 Hm. There's two issues I know of left, both discovered in #8673:
 - slru.c:SlruScanDirectory() doesn't support long enough
   filenames. Afaics that should be a fairly easy fix.
 - multixact/members isn't protected against wraparounds, only
   multixact/offsets is. That's a pretty longstanding bug though,
   although more likely to be hit these days.

Actually, isn't this one a must-fix as well?

http://www.postgresql.org/message-id/CAPweHKe5QQ1747X2c0tA=5zf4yns2xcvgf13opd-1mq24rf...@mail.gmail.com

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes

2013-12-16 Thread Peter Eisentraut
On 12/6/13, 9:58 AM, Antonin Houska wrote:
 Tested git apply and build again. No warnings.
 
 The regression test also looks good to me now.
 
 I'm done with this review.
 
 (Not sure if I should move it to 'ready for committer' status or the CFM
 should do).

You should do that, but I'll do it now.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stuck spinlock

2013-12-16 Thread Alvaro Herrera
Tom Lane escribió:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-12-16 09:46:19 -0500, Tom Lane wrote:
  Are they complete now?
 
  Hm. There's two issues I know of left, both discovered in #8673:
  - slru.c:SlruScanDirectory() doesn't support long enough
filenames. Afaics that should be a fairly easy fix.
  - multixact/members isn't protected against wraparounds, only
multixact/offsets is. That's a pretty longstanding bug though,
although more likely to be hit these days.
 
 Actually, isn't this one a must-fix as well?
 
 http://www.postgresql.org/message-id/CAPweHKe5QQ1747X2c0tA=5zf4yns2xcvgf13opd-1mq24rf...@mail.gmail.com

Yep, I'm going through that one now.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing

2013-12-16 Thread Alvaro Herrera
Noah Misch wrote:
 On Tue, Dec 03, 2013 at 07:26:38PM +0100, Andres Freund wrote:
  On 2013-12-03 13:14:38 -0500, Noah Misch wrote:
   On Tue, Dec 03, 2013 at 04:37:58PM +0100, Andres Freund wrote:
I currently don't see fixing the erroneous freezing of lockers (not the
updater though) without changing the WAL format or synchronously waiting
for all lockers to end. Which both seem like a no-go?
   
   Not fixing it at all is the real no-go.  We'd take both of those 
   undesirables
   before just tolerating the lost locks in 9.3.
  
  I think it's changing the wal format then.
 
 I'd rather have an readily-verifiable fix that changes WAL format than a
 tricky fix that avoids doing so.  So, modulo not having seen the change, +1.

I've committed a patch which hopefully fixes the problem using this
approach.  Thanks, Noah, for noticing the issue, and thanks, Andres, for
collaboration in getting the code in the right state.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-16 Thread Robert Haas
On Sat, Dec 14, 2013 at 8:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark st...@mit.edu writes:
 On 14 Dec 2013 15:40, Tom Lane t...@sss.pgh.pa.us wrote:
 I think you *can't* cover them for the float types; roundoff error
 would mean you don't get the same answers as before.

 I was going to say the same thing. But then I started to wonder "What's
 so special about the answers we used to give?" They are also subject to
 round off and the results are already quite questionable in those cases.

 Well, we can't easily do better than the old answers, and the new ones
 might be arbitrarily worse.  Example: sum or average across single-row
 windows ought to be exact in any case, but it might be arbitrarily wrong
 with the negative-transition technique.

 More generally, this is supposed to be a performance enhancement only;
 it's not supposed to change the results.

 This consideration also makes me question whether we should apply the
 method for NUMERIC.  Although in principle numeric addition/subtraction
 is exact, such a sequence could leave us with a different dscale than
 is returned by the existing code.  I'm not sure if changing the number of
 trailing zeroes is a big enough behavior change to draw complaints.

I tend to think it is.  I'm not sure if it's worth it, but you could
finesse this problem with a more complex transition state - keep track
of how many values with any given scale are part of the current
window.
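
A minimal sketch of what such a transition state might look like (names,
bounds, and layout are assumptions for illustration, not the actual numeric
aggregate code): count how many in-window inputs have each dscale, so that
when the value with the largest scale leaves the window the result scale can
be walked back down correctly.

    #include <stdint.h>

    #define MAX_DSCALE 1000     /* illustrative bound, not NUMERIC's real limit */

    typedef struct SumWindowState
    {
        /* the running sum itself would live here as well */
        int64_t count;                        /* values currently in the window */
        int     dscale_count[MAX_DSCALE + 1]; /* inputs per dscale */
        int     max_dscale;                   /* scale to use for the result */
    } SumWindowState;

    static void
    state_add(SumWindowState *st, int dscale)
    {
        st->count++;
        st->dscale_count[dscale]++;
        if (dscale > st->max_dscale)
            st->max_dscale = dscale;
    }

    static void
    state_remove(SumWindowState *st, int dscale)
    {
        st->count--;
        st->dscale_count[dscale]--;
        /* if the last value with the maximum scale just left, walk down */
        while (st->max_dscale > 0 && st->dscale_count[st->max_dscale] == 0)
            st->max_dscale--;
    }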

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_rewarm status

2013-12-16 Thread Jeff Amiel
Trying to follow the threads and other references - but I can't determine where 
this patch ended up.
(http://www.postgresql.org/message-id/ca+tgmobrrrxco+t6gcqrw_djw+uf9zedwf9bejnu+rb5teb...@mail.gmail.com)

I'm
 trying to experiment with some new hardware - and the functionality to 
add specific tables/indexes into cache ahead of time will benefit me 
greatly.

I found a page describing how to apply the patch to 9.2.4 (jumping through some 
hoops - http://issues.collectionspace.org/browse/UCJEPS-432) and was hoping to 
get a version to apply to 9.3.X

Can
 anyone advise me on how I might get this 'applied' to a 9.3.X source 
code base or any other options to denote specific relations that I'd 
like to get directly into shared_buffers?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.8

2013-12-16 Thread Robert Haas
On Mon, Dec 16, 2013 at 6:01 AM, Andres Freund and...@2ndquadrant.com wrote:
 There's no hard and
 fast rule here, because some cases are distinguished, but my gut
 feeling is that all of the errors your patch introduces are
 sufficiently obscure cases that separate messages with separate
 translations are not warranted.

 Perhaps we should just introduce a marker that some such strings are not
 to be translated if they are of the unexpected kind. That would probably
 make debugging easier too ;)

Well, we have that: it's called elog.  But that doesn't seem like the
right thing here.

  b) make hot_standby_feedback work across disconnections of the walsender
 connection (i.e peg xmin, not just for catalogs though)

 Check; might need to be optional.

 Yea, I am pretty sure it will. It'd probably be pretty nasty to set
 min_recovery_apply_delay=7d and force xmin to be kept to that...

Yes, that would be... unfortunate.

  c) Make sure we can transport those across cascading
 replication.

 Not sure I follow.

 Consider a replication scenario like primary -> standby-1 ->
 standby-2. The primary not only must not remove data that standby-1
 requires, but also must not remove data that standby-2 needs. Not really
 necessary for WAL, since that will also reside on standby-1, but definitely
 for the xmin horizon.
 So standby-1 will need to signal not only its own needs, but also those of
 the nodes below.

True.

  The hard questions that I see are like:
  * How do we manage standby registration? Does the admin have to do that
manually? Or does a standby register itself automatically if some config
parameter is set?
  * If automatically, how do we deal with the situation where the registrant
dies before noting its own identifier somewhere persistent? My best idea
is a two-phase registration process where registrations from phase 1 are
thrown away after a restart, but yuck.

 If you don't know the answers to these questions for the kind of
 replication that we have now, then how do you know the answers for
 logical replication?  Conversely, what makes the answers that you've
 selected for logical replication unsuitable for our existing
 replication?

 There's a pretty fundamental difference imo - with the logical decoding
 stuff we only supply support for change producing nodes, with physical
 rep we supply both.

I'm not sure I follow this.  Both what and what?

 There's no need to decide about the way node ids are stored in in-core logical
 rep. consumers since there are no in-core ones. Yet.

I don't know that we have or need to make any judgements about how to
store node IDs.  You have decided that slots have names, and I see no
problem there.

 Also, physical rep
 by now is a pretty established thing, we need to be much more careful
 about compatibility there.

I don't think we should change anything in backward-incompatible
fashion.  If we add any new behavior, it'd surely be optional.

 I think we need to improve the monitoring facilities a bit, and that
 should be it. Like
 * expose xmin in pg_stat_activity, pg_prepared_xacts,
   pg_replication_slots (or whatever it's going to be called)
 * expose the last restartpoint's redo pointer in pg_stat_replication, 
 pg_replication_slots

+1.

 That said, the consequences can be a bit harsher than a full disk - the
 anti-wraparound security measures might kick in requiring a restart into
 single user mode. That's way more confusing than cleaning up a bit of
 space on the disk.

Yes, true.  I'm not sure what approach to that problem is best.  It's
long seemed to me that well before we get to the point of shutting
down the whole cluster we ought to just start killing sessions with
old xmins.  But that doesn't generalize well to prepared transactions,
which can't just be rolled back or committed without guidance; and
killing slots seems a bit dicey too.

 Consider what happens though, if you promote a node for physical rep. As
 soon as it's promoted, it will accept writes and then start a
 checkpoint. Unless other standbys have started to follow that node
 before either that checkpoint happens (removing WAL) or
 autovacuuming/hot-pruning is performed (creating recovery conflicts),
 we'll possibly lose the data required to let the standbys follow the
 promotion. Note that wal_keep_segments and vacuum_defer_cleanup_age both
 sorta work for that...

True.

 Could somebody please deliver me a time dilation device?

Upon reflection, I am less concerned with actually having physical
slots in this release than I am with making sure we're not boxing
ourselves into a corner that will make them hard to add later.  If
we've got a clear design that can be generalized to that case, but the
SMOP required exceeds what can be done in the time available, I am OK
to punt it.  But I am not sure we're at that point yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical changeset generation v6.8

2013-12-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Dec 16, 2013 at 6:01 AM, Andres Freund and...@2ndquadrant.com wrote:
 Perhaps we should just introduce a marker that some such strings are not
 to be translated if they are of the unexpected kind. That would probably
 make debugging easier too ;)

 Well, we have that: it's called elog.  But that doesn't seem like the
 right thing here.

errmsg_internal?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 reference constraint regression

2013-12-16 Thread Alvaro Herrera
Alvaro Herrera wrote:

 The fix for the immediate bug is to add some code to HTSU so that it
 checks for locks by other transactions even when the tuple was created
 by us.  I haven't looked at the other tqual routines yet, but I imagine
 they will need equivalent fixes.

This POC patch changes the two places in HeapTupleSatisfiesUpdate that
need to be touched for this to work.  This is probably too simplistic,
in that I make the involved cases return HeapTupleBeingUpdated without
checking that there actually are remote lockers, which is the case of
concern.  I'm not yet sure if this is the final form of the fix, or
instead we should expand the Multi (in the cases where there is a multi)
and verify whether any lockers are transactions other than the current
one.  As is, nothing seems to break, but I think that's probably just
chance and should not be relied upon.

Attached are two isolation specs which illustrate both the exact issue
reported by Dan, and a similar one which involves an aborted
subtransaction having updated the second version of the row.  (This
takes a slightly different code path.)

As far as I can tell, no other routine in tqual.c needs to change other
than HeapTupleSatisfiesUpdate.  The ones that test for visibility
(Dirty, Any, Self) are only concerned with whether the tuple is visible,
and of course that won't be affected by the tuple being locked; and
HeapTupleSatisfiesVacuum is only concerned with the tuple being dead,
which similarly won't.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
setup
{
create extension pageinspect;
create table parent (i int, c char(3));
create unique index parent_idx on parent (i);
insert into parent values (1, 'AAA');
create table child (i int references parent(i));
}

teardown
{
--  drop table child, parent;
--  drop extension pageinspect;
}

session s1
step s1b  { BEGIN; select txid_current(); }
step s1i  { INSERT INTO child VALUES (1); }

step s1e  { SELECT lp, t_xmin, t_xmax, to_hex(t_infomask) as mask, 
to_hex(t_infomask2) as mask2 FROM heap_page_items(get_raw_page('parent', 0)); }
step s1e2 { SELECT lp, t_xmax, members.* FROM 
heap_page_items(get_raw_page('parent', 0)), pg_get_multixact_members(t_xmax) 
members where t_infomask & x'1000'::int <> 0; }

step s1c  { COMMIT; }

session s2
step s2b  { BEGIN; select txid_current(); }
step s2u  { UPDATE parent SET c=lower(c); }
# step s2u2   { UPDATE parent SET i=i; }
step s2svu { SAVEPOINT f; UPDATE parent SET c = 'bbb'; ROLLBACK TO f; }
step s2d  { DELETE FROM parent returning ctid; }
step s2c  { COMMIT; }



permutation s1b s1i s2b s2u s2svu s1e s1e2 s2d s1e s1e2 
s1c s2c
#permutation s1b s1i s2b s2u2 s2d s1c s2c
#permutation s1b s1i s2b s2d s1c s2c
setup
{
create extension pageinspect;
create table parent (i int, c char(3));
create unique index parent_idx on parent (i);
insert into parent values (1, 'AAA');
create table child (i int references parent(i));
}

teardown
{
--  drop table child, parent;
--  drop extension pageinspect;
}

session s1
step s1b  { BEGIN; select txid_current(); }
step s1i  { INSERT INTO child VALUES (1); }

step s1e  { SELECT lp, t_xmin, t_xmax, to_hex(t_infomask) as mask, 
to_hex(t_infomask2) as mask2 FROM heap_page_items(get_raw_page('parent', 0)); }
step s1e2 { SELECT lp, t_xmax, members.* FROM 
heap_page_items(get_raw_page('parent', 0)), pg_get_multixact_members(t_xmax) 
members where t_infomask & x'1000'::int <> 0; }

step s1c  { COMMIT; }

session s2
step s2b  { BEGIN; select txid_current(); }
step s2u  { UPDATE parent SET c=lower(c); }
step s2u2 { UPDATE parent SET i=i; }
step s2d  { DELETE FROM parent returning ctid; }
step s2c  { COMMIT; }



permutation s1b s1i s2b s2u s1e s1e2 s2d s1e s1e2 s1c s2c
#permutation s1b s1i s2b s2u2 s2d s1c s2c
#permutation s1b s1i s2b s2d s1c s2c
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***
*** 687,693  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
--- 687,699 
  return HeapTupleMayBeUpdated;
  
  			if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))	/* not deleter */
+ 			{
+ if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
+ 	return HeapTupleBeingUpdated;
+ else if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmax(tuple)))
+ 	return HeapTupleBeingUpdated;
  return HeapTupleMayBeUpdated;
+ 			}
  
  			if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  			{
***
*** 700,706  HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
  
  /* updating subtransaction must have aborted */
  if (!TransactionIdIsCurrentTransactionId(xmax))
! 	return HeapTupleMayBeUpdated;
  else
  {
  	if (HeapTupleHeaderGetCmax(tuple) >= curcid)
--- 706,712 
  
  /* updating subtransaction must have 

[HACKERS] planner missing a trick for foreign tables w/OR conditions

2013-12-16 Thread Robert Haas
Consider a query such as:

SELECT * FROM a, b WHERE (a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45);

If a and/or b are regular tables, the query planner will cleverly
consider the possibility of using an index on a to filter for rows
with a.x = 42 OR a.x = 44, or of using an index on b to filter for
rows where b.y = 43 OR b.z = 45.  But if they are foreign tables, this
optimization isn't considered, because we don't intrinsically know
anything about what indexes are present on the foreign side.  However,
this optimization could potentially be quite valuable.  In fact, it's
arguably more useful here than for regular tables, because even if no index
is present on the foreign side, applying the condition on the remote
side might eliminate enough data transfer overhead to win.  The only
situation in which I can really see it losing is if the simplified
qual ends up eliminating too few rows to cover the remote side's
processing costs; I'm not sure how possible that is, or how to know
whether it might be the case.

To see how this can torpedo performance, run the attached SQL file on
an empty database, and then run these queries:

explain analyze SELECT other.id, other.title, local.id, local.title
FROM other INNER JOIN local ON other.id = local.id WHERE local.title =
md5(1::text) OR (local.title = md5(3::text) AND other.id = 3);

explain analyze SELECT other.id, other.title, frgn.id, frgn.title FROM
other INNER JOIN frgn ON other.id = frgn.id WHERE frgn.title =
md5(1::text) OR (frgn.title = md5(3::text) AND other.id = 3);

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


rm32176.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.8

2013-12-16 Thread Robert Haas
On Mon, Dec 16, 2013 at 12:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Dec 16, 2013 at 6:01 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 Perhaps we should just introduce a marker that some such strings are not
 to be translated if they are of the unexpected kind. That would probably
 make debugging easier too ;)

 Well, we have that: it's called elog.  But that doesn't seem like the
 right thing here.

 errmsg_internal?

There's that, too.  But again, these messages are not can't-happen
scenarios.  The argument is just whether to reuse existing error
message text (like could not write file) or invent a new variation
(like could not write remapping file).  Andres' argument (which is
valid) is that distinguished messages make it easier to troubleshoot
without needing to turn on verbose error messages.  My argument (which
I think is also valid) is that a user isn't likely to know what a
remapping file is, and having more messages increases the translation
burden.  Is there a project policy on this topic?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planner missing a trick for foreign tables w/OR conditions

2013-12-16 Thread David Fetter
On Mon, Dec 16, 2013 at 12:41:43PM -0500, Robert Haas wrote:
 Consider a query such as:
 
 SELECT * FROM a, b WHERE (a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45);
 
 If a and/or b are regular tables, the query planner will cleverly
 consider the possibility of using an index on a to filter for rows
 with a.x = 42 OR a.x = 44, or of using an index on b to filter for
 rows where b.y = 43 OR b.z = 45.  But if they are foreign tables, this
 optimization isn't considered, because we don't intrinsically know
 anything about what indexes are present on the foreign side.  However,
 this optimization could potentially be quite valuable.  In fact, it's
 arguably more useful here for regular tables, because even if no index
 is present on the foreign side, applying the condition on the remote
 side might eliminate enough data transfer overhead to win.  The only
 situation in which I can really see it losing is if the simplified
 qual ends up eliminating too few rows to cover the remote side's
 processing costs; I'm not sure how possible that is, or how to know
 whether it might be the case.
 
 To see how this can torpedo performance, run the attached SQL file on
 an empty database, and then run these queries:

+1 for fixing this bug :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewarm status

2013-12-16 Thread Robert Haas
On Mon, Dec 16, 2013 at 12:02 PM, Jeff Amiel becauseimj...@yahoo.com wrote:
 Trying to follow the threads and other references - but I can't determine 
 where this patch ended up.
 (http://www.postgresql.org/message-id/ca+tgmobrrrxco+t6gcqrw_djw+uf9zedwf9bejnu+rb5teb...@mail.gmail.com)

Well, the patch was rejected, more or less because people felt it
overlapped with pgfincore too much.  I don't particularly agree,
because pgfincore can't load data into shared buffers and doesn't work
on Windows, but other people felt differently.  There was talk of
polishing up pgfincore for possible inclusion in contrib, perhaps
adding this functionality along the way, but AFAIK there's been no
activity on that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewarm status

2013-12-16 Thread Jeff Janes
On Mon, Dec 16, 2013 at 9:02 AM, Jeff Amiel becauseimj...@yahoo.com wrote:

 Trying to follow the threads and other references - but I can't determine
 where this patch ended up.
 (
 http://www.postgresql.org/message-id/ca+tgmobrrrxco+t6gcqrw_djw+uf9zedwf9bejnu+rb5teb...@mail.gmail.com
 )

 I'm
  trying to experiment with some new hardware - and the functionality to
 add specific tables/indexes into cache ahead of time will benefit me
 greatly.

 I found a page describing how to apply the patch to 9.2.4 (jumping through
 some hoops - http://issues.collectionspace.org/browse/UCJEPS-432) and was
 hoping to get a version to apply to 9.3.X

 Can
  anyone advise me on how I might get this 'applied' to a 9.3.X source
 code base or any other options to denote specific relations that I'd
 like to get directly into shared_buffers?



In my experience the installation works in 9.3.X the same way as it does in 9.2.4.

Cheers,

Jeff


Re: [HACKERS] pg_rewarm status

2013-12-16 Thread Jeff Janes
On Mon, Dec 16, 2013 at 10:02 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Dec 16, 2013 at 12:02 PM, Jeff Amiel becauseimj...@yahoo.com
 wrote:
  Trying to follow the threads and other references - but I can't
 determine where this patch ended up.
  (
 http://www.postgresql.org/message-id/ca+tgmobrrrxco+t6gcqrw_djw+uf9zedwf9bejnu+rb5teb...@mail.gmail.com
 )

 Well, the patch was rejected, more or less because people felt it
 overlapped with pgfincore too much.  I don't particularly agree,
 because pgfincore can't load data into shared buffers and doesn't work
 on Windows, but other people felt differently.  There was talk of
 polishing up pgfincore for possible inclusion in contrib, perhaps
 adding this functionality along the way, but AFAIK there's been no
 activity on that.


It wasn't rejected, it was returned with feedback with generally positive
reviews.  I think the main feedback was that it should provide a
single-argument overloaded function that takes just the object name and
applies reasonable defaults for the remaining arguments,  for example
'main', 'buffer',NULL,NULL.   I had thought that the worry about overlap
with pgfincore was mostly resolved favorably, but perhaps I misread the
situation.

I'd like to see it revived for 9.4 if you are willing.

Cheers,

Jeff


Re: [HACKERS] logical changeset generation v6.8

2013-12-16 Thread Alvaro Herrera
Robert Haas escribió:

 There's that, too.  But again, these messages are not can't-happen
 scenarios.  The argument is just whether to reuse existing error
 message text (like could not write file) or invent a new variation
 (like could not write remapping file).  Andres' argument (which is
 valid) is that distinguished messages make it easier to troubleshoot
 without needing to turn on verbose error messages.  My argument (which
 I think is also valid) is that a user isn't likely to know what a
 remapping file is, and having more messages increases the translation
 burden.  Is there a project policy on this topic?

I would vote for a generic "could not write file %s" where the %s lets
the troubleshooter know the path of the file, and thus in what context
it is being read.  We already have a similar case where slru.c reports
error as pertaining to transaction 12345 but the path is
pg_subtrans/xyz or multixact etc; while it doesn't explicitly say
what module is raising the error, it's pretty clear from the path.

Would that not work here?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-16 Thread Josh Berkus
On 12/15/2013 10:47 PM, Jeff Davis wrote:
 The patch offers an alternative to dropping files on the filesystem
 before doing CREATE EXTENSION. Instead, if the extension has no C code,
 you can put it in the catalog using ordinary SQL access, and execute the
 same kind of CREATE EXTENSION. Aside from that, it's pretty much
 identical to existing extensions.

OK, so the idea is that for SQL-only extensions (i.e. ones without
.so's) the user shouldn't be required to create a file on disk.  That
seems straightforwards and acceptable.

 Stephen doesn't like the idea that the SQL in an extension is a blob of
 text. 

I, personally, would prefer per-object line-items, but I don't think
that's a deal-breaker.  Having a single text blob does match up with
existing Extension design.

Note for Dimitri, though: I think that having line-item objects in
dependency order would make this feature vastly more useful for
schema-template maintenance.  Give it some thought.

 There are weird cases, like if you make local modifications to
 objects held in an extension, then dump/reload will lose those local
 modifications.

What does DUMP/Reload do with regular Extensions currently in that case?

 Another issue, which I agree is dubious in many
 situations, is that the version of an extension is not preserved across
 dump/reload (this is actually a feature, which was designed with
 contrib-style extensions in mind, but can be surprising in other
 circumstances).

Well, this should work with a versioning system, in which dump/reload
can load older versions of the extension if they are present, the same
as external Extensions do now.  Is that built in?

 This isn't necessarily a dead-end, but there are a lot of unsettled
 issues, and it will take some soul-searching to answer them. Is an
 extension a blob of text with a version, that's maintained in some
 external repo?

Well, plus potentially binaries and library references, yes.  Although
you could describe all of Postgres as a bunch of text blobs and some
library references, when you get right down to it.

 Is it the job of postgres to ensure that dump/reload
 creates the same situation that you started with, including local
 modifications to objects that are part of an extension?

IMHO: No.  AFAIK, if a user modifies, say, information_schema views in
PostgreSQL, we don't respect that in dump/restore either.

Now, I can see adding to this mechanism a method for tracking such
modifications in a way that pgdump can support them.  But that can
easily be a version 2 feature.

 Should
 everything be an extension, or do we need to invent a new concept for
 some of the use cases? What role to external tools play in all of this?

So, the reason I was confused by this feature -- and the reason Stephen
hates it, I think -- is that I thought it was solving the "Extensions
don't follow replication, and they are complicated to install if your OS
doesn't have good packages" problem.  It's not, and it never will solve
that issue.

It's solving a completely different problem, to wit:

Some PostgreSQL shops with lots of servers have large internal libraries
of functions, views, and similar code that they've written to support
their applications, which don't comprise a complete database.  This
feature would allow them to package those libraries, and version,
upgrade and track them, without requiring a filesystem-based install.  I
myself have a couple clients who could benefit from this.

I think the name Extension Templates is horrible because it misleads
all of us on this list into thinking the proposed feature is completely
something other than what it is.  I don't have a better name offhand,
but that's got to change before it becomes a feature.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-16 Thread Josh Berkus
On 12/16/2013 10:53 AM, Josh Berkus wrote:
 Some PostgreSQL shops with lots of servers have large internal libraries
 of functions, views, and similar code that they've written to support
 their applications, which don't comprise a complete database.  This
 feature would allow them to package those libraries, and version,
 upgrade and track them, without requiring a filesystem-based install.  I
 myself have a couple clients who could benefit from this.

cc'ing Jim Nasby, since I think Enova is part of the target market for
this feature.  Jim, have you taken a look at this?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planner missing a trick for foreign tables w/OR conditions

2013-12-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Consider a query such as:
 SELECT * FROM a, b WHERE (a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45);

 If a and/or b are regular tables, the query planner will cleverly
 consider the possibility of using an index on a to filter for rows
 with a.x = 42 OR a.x = 44, or of using an index on b to filter for
 rows where b.y = 43 OR b.z = 45.  But if they are foreign tables, this
 optimization isn't considered, because we don't intrinsically know
 anything about what indexes are present on the foreign side.  However,
 this optimization could potentially be quite valuable.  In fact, it's
 arguably more useful here for regular tables, because even if no index
 is present on the foreign side, applying the condition on the remote
 side might eliminate enough data transfer overhead to win.  The only
 situation in which I can really see it losing is if the simplified
 qual ends up eliminating too few rows to cover the remote side's
 processing costs; I'm not sure how possible that is, or how to know
 whether it might be the case.

 Thoughts?

The problem is that that optimization is a crock; see the comments
for create_or_index_quals().  We can't just turn it loose to CNF-ify
every OR it might find.  The case that we support at the moment is
to CNF-ify whichever single OR condition looks like the best win,
and it's hard to see how to do that without any index knowledge.

In principle, when we're using remote estimates, we could probably
ask the remote server about each possibility ... but that could be
expensive.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.8

2013-12-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 There's that, too.  But again, these messages are not can't-happen
 scenarios.  The argument is just whether to reuse existing error
 message text (like could not write file) or invent a new variation
 (like could not write remapping file).

As long as the message includes the file name, which it surely oughta,
I don't see that we need any explanation of what Postgres thinks the
file is for.  If someone cares about that they can reverse-engineer
it from the file name; while as you said upthread, most of the time
the directory path is going to be the key piece of useful information.

So +1 for could not write file.

 Andres' argument (which is
 valid) is that distinguished messages make it easier to troubleshoot
 without needing to turn on verbose error messages.  My argument (which
 I think is also valid) is that a user isn't likely to know what a
 remapping file is, and having more messages increases the translation
 burden.  Is there a project policy on this topic?

I think Andres' argument is a thinly veiled version of let's put the
routine name into the message text, which there definitely is project
policy against (see 49.3.13 in the message style guide).  If you want to
know the code location where the error was thrown, the answer is to get
a verbose log, not to put identifying information into the user-facing
message text.  And this is only partially-identifying information,
which seems like the worst of both worlds: you've got confused users and
overworked translators, and you still don't know exactly where it was
thrown from.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-16 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 So, the reason I was confused by this feature -- and the reason Stephen
 hates it, I think -- is that I thought it was solving the "Extensions
 don't follow replication, and they are complicated to install if your OS
 doesn't have good packages" problem.  It's not, and it never will solve
 that issue.

 It's solving a completely different problem, to wit:

 Some PostgreSQL shops with lots of servers have large internal libraries
 of functions, views, and similar code that they've written to support
 their applications, which don't comprise a complete database.  This
 feature would allow them to package those libraries, and version,
 upgrade and track them, without requiring a filesystem-based install.  I
 myself have a couple clients who could benefit from this.

Right.  I think a lot of the tension comes from people being unconvinced
that the existing extension feature is an ideal model for this sort of
use-case.  Extensions were mainly designed around the notion of a .so
with some SQL datatype/function/etc declarations that have to match up
with the C code.  So it made sense for them to be relatively static things
that live in the filesystem.  Notably, if you're migrating to a new PG
major version, you're at the very least gonna have to recompile the C code
and quite likely will need to change it some.  So adapting the SQL code
if needed goes along with that, and would never be automatic in any case.

OTOH, for a set of pure-SQL objects, it's not necessary that there be a
canonical text file somewhere, and we have in principle complete knowledge
of the objects' semantics as well as the ability to dump-and-restore into
newer PG versions.  So it's not at all clear that we should just adopt the
existing model with the smallest possible changes --- which AFAICS is
basically what this proposal is.  Maybe that's the way to go, but we
should consider alternatives, and in particular I think there is much
more reason to allow inside-the-database mutation of the SQL objects.

 I think the name Extension Templates is horrible because it misleads
 all of us on this list into thinking the proposed feature is completely
 something other than what it is.  I don't have a better name offhand,
 but that's got to change before it becomes a feature.

Given your previous para, I wonder if "library" or "package" would work
better.  I agree that "template" isn't le mot juste.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] row security roadmap proposal

2013-12-16 Thread Gregory Smith

On 12/16/13 9:36 AM, Craig Ringer wrote:

- Finish and commit updatable security barrier views. I've still got a
lot of straightening out to do there.
I don't follow why you've put this part first.  It has a lot of new 
development and the risks that go along with that, but the POC projects 
I've been testing are more interested in the view side issues.



- Decide on and implement a structure for row-security functionality its
self. I'm persuaded by Robert's comments here, that whatever we expose
must be significantly more usable than a DIY on top of views (with the
fix for updatable security barrier views to make that possible).


Can't say I agree we should be getting tangled into making this slick on 
the user side right now.  If you open a new can of worms like 
hierarchical access labels, this goes back into basic design, and we'll 
spend months on that before returning to exactly the same implementation 
details.  I tried to make a case for how having a really generic 
mechanism that doesn't presume to know how labels will be assigned can 
be a good thing.


Given the state this is all in right now, I'd much rather publish a hard 
to use but powerful API than to presume you're going to get an easier to 
use design right.  The place I'm at here is trying to figure out the 
simplest useful thing that could be committed and then hammer on the 
details.  (Not the first time I've beat that drum on a feature)  Your 
roadmap goes way past that, which is great to avoid being painted into a 
corner, but I'm thinking more about what a useful feature freeze point 
would look like given it's December 16 now.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 reference constraint regression

2013-12-16 Thread Alvaro Herrera
Alvaro Herrera wrote:

 This POC patch changes the two places in HeapTupleSatisfiesUpdate that
 need to be touched for this to work.  This is probably too simplistic,
 in that I make the involved cases return HeapTupleBeingUpdated without
 checking that there actually are remote lockers, which is the case of
 concern.  I'm not yet sure if this is the final form of the fix, or
 instead we should expand the Multi (in the cases where there is a multi)
 and verify whether any lockers are transactions other than the current
 one.  As is, nothing seems to break, but I think that's probably just
 chance and should not be relied upon.

After playing with this, I think the reason this seems to work without
fail is that all callers of HeapTupleSatisfiesUpdate are already
prepared to deal with the case where HeapTupleBeingUpdated is returned
but there is no actual transaction that would block the operation.
So I think the proposed patch is okay, barring a few more comments.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing

2013-12-16 Thread Alvaro Herrera
BTW, there are a couple of spec files floating around which perhaps we
should consider getting into the source repo (in some cleaned up form).
Noah published one, Andres shared a couple more with me, and I think I
have two more.  They can't be made to work in normal circumstances,
because they depend on concurrent server activity.  But perhaps we
should add them anyway and perhaps list them in a separate schedule
file, so that any developer interested in messing with this stuff has
them readily available for testing.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
setup {
CREATE EXTENSION pageinspect;
CREATE TABLE freezetab (
a int
) WITH OIDS;
INSERT INTO freezetab VALUES (1);
}

teardown {
DROP EXTENSION pageinspect;
}

session s1
step s1b  { BEGIN; }
step s1s  { SELECT txid_current(), * FROM freezetab FOR KEY 
SHARE; }
step s1c  { COMMIT; }
step s1e  { SELECT lp, t_xmin, t_xmax, to_hex(t_infomask) as 
mask, to_hex(t_infomask2) as mask2 FROM 
heap_page_items(get_raw_page('freezetab', 0)); }
step s1e2 { SELECT lp, t_xmax, members.* FROM 
heap_page_items(get_raw_page('freezetab', 0)), pg_get_multixact_members(t_xmax) 
members; }

session s2
step s2b  { BEGIN; }
step s2s  { SELECT txid_current(), * FROM freezetab FOR KEY 
SHARE; }
step s2c  { COMMIT; }

session s3
step s3b  { BEGIN; }
step s3u  { UPDATE freezetab SET a = 2; }
step s3c  { COMMIT; }

session s4
step s4f  { VACUUM FREEZE freezetab; }


permutation s1b s1s s2b s2s s3b s3u s1c s1e s4f s1e s1e2 
s2c s3c

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 reference constraint regression

2013-12-16 Thread Andres Freund
On 2013-12-16 17:43:37 -0300, Alvaro Herrera wrote:
 Alvaro Herrera wrote:
 
  This POC patch changes the two places in HeapTupleSatisfiesUpdate that
  need to be touched for this to work.  This is probably too simplistic,
  in that I make the involved cases return HeapTupleBeingUpdated without
  checking that there actually are remote lockers, which is the case of
  concern.  I'm not yet sure if this is the final form of the fix, or
  instead we should expand the Multi (in the cases where there is a multi)
  and verify whether any lockers are transactions other than the current
  one.  As is, nothing seems to break, but I think that's probably just
  chance and should not be relied upon.
 
 After playing with this, I think the reason this seems to work without
 fail is that all callers of HeapTupleSatisfiesUpdate are already
 prepared to deal with the case where HeapTupleBeingUpdated is returned
 but there is no actual transaction that would block the operation.
 So I think the proposed patch is okay, barring a few more comments.

Are you sure? The various wait/nowait cases don't seem to handle that
correctly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] planner missing a trick for foreign tables w/OR conditions

2013-12-16 Thread Robert Haas
On Mon, Dec 16, 2013 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The problem is that that optimization is a crock; see the comments
 for create_or_index_quals().  We can't just turn it loose to CNF-ify
 every OR it might find.  The case that we support at the moment is
 to CNF-ify whichever single OR condition looks like the best win,
 and it's hard to see how to do that without any index knowledge.

 In principle, when we're using remote estimates, we could probably
 ask the remote server about each possibility ... but that could be
 expensive.

Could we get by without actually converting to CNF?

Suppose we define JOINQUAL-STRIP(P, B), where P is a join clause and B
is a truth value.  Something like this:

JOINQUAL-STRIP(NOT Q, B) = NOT (JOINQUAL-STRIP(Q, NOT B))
JOINQUAL-STRIP(Q1 AND Q2 AND ... AND Qn, B) = the conjunction of
JOINQUAL-STRIP(Qi, B) for all i
JOINQUAL-STRIP(Q1 OR Q2 OR ... OR Qn, B) = the disjunction of
JOINQUAL-STRIP(Qi, B) for all i

For any join clause not of one of the above forms, JOINQUAL-STRIP(P,
B) is equal to P if there are no Vars in P referring to any table
other than the one for which we're constructing baserestrictinfo, and
to B otherwise.

Given this definition, we can take each join clause P and apply
JOINQUAL-STRIP(P, true) to it.  If the result is true, forget it.  If
the result is anything else, it's a simplified version of the original
AND/OR/NOT tree that we can apply on the remote side (if it's
pushdown-safe) to pre-filter the results.  In plain English, we walk
down through AND, OR, and NOT nodes and inspect what's underneath.
Whenever we find something that references only the remote table under
consideration, we keep it as is.  If we find something that touches
any other table, we assume it's true unless we're beneath an odd
number of negations, in which case we assume it's false.

e.g. if we start with (L1 AND R1) OR NOT(L2 OR R2), where the Li
reference local vars and the Ri only remote vars, we get (true AND R1)
OR NOT(false OR R2); after some trivial simplification, this reduces
to R1 OR NOT R2, which is indeed a suitable pre-filter.
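
For illustration, here is a small self-contained sketch of that recursion
over a toy expression tree -- the node types, names, and binary AND/OR
shape are made up for the example, and PostgreSQL's actual Node machinery
is not used.  Running it on the example above prints (R1 OR NOT (R2)):

/* Toy sketch of the JOINQUAL-STRIP recursion over a made-up expression tree.
 * Leaves carry a flag saying whether they touch a relation other than the
 * one we are building a remote filter for. */
#include <stdio.h>
#include <stdlib.h>

typedef enum { N_LEAF, N_NOT, N_AND, N_OR, N_CONST } NodeKind;

typedef struct Expr
{
    NodeKind    kind;
    int         value;          /* for N_CONST: 0 = false, 1 = true */
    int         uses_other;     /* for N_LEAF: references another table? */
    const char *text;           /* for N_LEAF: printable form */
    struct Expr *left, *right;  /* children for NOT (left only), AND, OR */
} Expr;

static Expr *mk(NodeKind k, Expr *l, Expr *r)
{
    Expr *e = calloc(1, sizeof(Expr));
    e->kind = k; e->left = l; e->right = r;
    return e;
}
static Expr *leaf(const char *text, int uses_other)
{
    Expr *e = mk(N_LEAF, NULL, NULL);
    e->text = text; e->uses_other = uses_other;
    return e;
}
static Expr *konst(int v) { Expr *e = mk(N_CONST, NULL, NULL); e->value = v; return e; }

/* JOINQUAL-STRIP(P, B): replace anything touching other relations with B,
 * flipping B under NOT, and fold away constants as we go. */
static Expr *strip(Expr *p, int b)
{
    switch (p->kind)
    {
        case N_LEAF:
            return p->uses_other ? konst(b) : p;
        case N_NOT:
        {
            Expr *c = strip(p->left, !b);
            if (c->kind == N_CONST) return konst(!c->value);
            return mk(N_NOT, c, NULL);
        }
        case N_AND:
        case N_OR:
        {
            Expr *l = strip(p->left, b);
            Expr *r = strip(p->right, b);
            int  unit = (p->kind == N_AND);     /* AND: true is the identity */
            if (l->kind == N_CONST) return l->value == unit ? r : l;
            if (r->kind == N_CONST) return r->value == unit ? l : r;
            return mk(p->kind, l, r);
        }
        default:
            return p;
    }
}

static void print(Expr *e)
{
    switch (e->kind)
    {
        case N_CONST: printf(e->value ? "true" : "false"); break;
        case N_LEAF:  printf("%s", e->text); break;
        case N_NOT:   printf("NOT ("); print(e->left); printf(")"); break;
        case N_AND: case N_OR:
            printf("("); print(e->left);
            printf(e->kind == N_AND ? " AND " : " OR ");
            print(e->right); printf(")");
            break;
    }
}

int main(void)
{
    /* (L1 AND R1) OR NOT (L2 OR R2) from the example above */
    Expr *p = mk(N_OR,
                 mk(N_AND, leaf("L1", 1), leaf("R1", 0)),
                 mk(N_NOT, mk(N_OR, leaf("L2", 1), leaf("R2", 0)), NULL));
    print(strip(p, 1));
    printf("\n");
    return 0;
}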

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewarm status

2013-12-16 Thread Robert Haas
On Mon, Dec 16, 2013 at 1:34 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, Dec 16, 2013 at 10:02 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 16, 2013 at 12:02 PM, Jeff Amiel becauseimj...@yahoo.com
 wrote:
  Trying to follow the threads and other references - but I can't
  determine where this patch ended up.
 
  (http://www.postgresql.org/message-id/ca+tgmobrrrxco+t6gcqrw_djw+uf9zedwf9bejnu+rb5teb...@mail.gmail.com)

 Well, the patch was rejected, more or less because people felt it
 overlapped with pgfincore too much.  I don't particularly agree,
 because pgfincore can't load data into shared buffers and doesn't work
 on Windows, but other people felt differently.  There was talk of
 polishing up pgfincore for possible inclusion in contrib, perhaps
 adding this functionality along the way, but AFAIK there's been no
 activity on that.

 It wasn't rejected, it was returned with feedback with generally positive
 reviews.  I think the main feedback was that it should provide a
 single-argument overloaded function that takes just the object name and
 applies reasonable defaults for the remaining arguments,  for example
 'main', 'buffer',NULL,NULL.   I had thought that the worry about overlap
 with pgfincore was mostly resolved favorably, but perhaps I misread the
 situation.

 I'd like to see it revived for 9.4 if you are willing.

I don't mind rebasing the patch and tweaking the API if there's real
support for including this in contrib, but my recollection of the
previous discussions is less positive than yours.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE sampling is too good

2013-12-16 Thread Jeff Janes
On Mon, Dec 9, 2013 at 3:14 PM, Heikki Linnakangas
hlinnakan...@vmware.comwrote:



  I took a stab at using posix_fadvise() in ANALYZE. It turned out to be
 very easy, patch attached. Your mileage may vary, but I'm seeing a nice
 gain from this on my laptop. Taking a 3 page sample of a table with
 717717 pages (ie. slightly larger than RAM), ANALYZE takes about 6 seconds
 without the patch, and less than a second with the patch, with
 effective_io_concurrency=10. If anyone with a good test data set loaded
 would like to test this and post some numbers, that would be great.
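
For what it's worth, the prefetch pattern described above can be sketched in
standalone C like this -- this is not the attached patch; the block size, the
window of 10, and the command-line interface are made up for illustration:

/* Standalone sketch of prefetching a sparse sample of blocks with
 * posix_fadvise(POSIX_FADV_WILLNEED) before reading them. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ    8192
#define PREFETCH  10                /* cf. effective_io_concurrency = 10 */

int main(int argc, char **argv)
{
    if (argc < 3)
    {
        fprintf(stderr, "usage: %s file blkno...\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    int  nblocks = argc - 2;
    char buf[BLCKSZ];

    /* advise the first window of blocks up front */
    for (int j = 0; j < PREFETCH && j < nblocks; j++)
        posix_fadvise(fd, (off_t) atol(argv[2 + j]) * BLCKSZ,
                      BLCKSZ, POSIX_FADV_WILLNEED);

    for (int i = 0; i < nblocks; i++)
    {
        /* keep PREFETCH advisories outstanding ahead of the current read */
        if (i + PREFETCH < nblocks)
            posix_fadvise(fd, (off_t) atol(argv[2 + i + PREFETCH]) * BLCKSZ,
                          BLCKSZ, POSIX_FADV_WILLNEED);

        if (pread(fd, buf, BLCKSZ, (off_t) atol(argv[2 + i]) * BLCKSZ) < 0)
            perror("pread");
        /* ... the sampler would consume the block here ... */
    }
    close(fd);
    return 0;
}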


Performance is often chaotic near transition points, so I try to avoid data
sets that are slightly bigger or slightly smaller than RAM (or some other
limit).

Do you know how many io channels your SSD has (or whatever the term of art
is for SSD drives)?

On a RAID with 12 spindles, analyzing pgbench_accounts at scale 1000 (13GB)
with 4 GB of RAM goes from ~106 seconds to ~19 seconds.

However, I'm not sure what problem we want to solve here.  I certainly
would not wish to give a background maintenance process permission to
confiscate my entire RAID throughput for its own operation.  Perhaps this
could only be active for explicit analyze, and only if vacuum_cost_delay=0?

Perhaps there should be something like alter background role autovac set
  Otherwise we are going to end up with an autovacuum_* shadow
parameter for many of our parameters, see autovacuum_work_mem discussions.

Cheers,

Jeff


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-16 Thread David Rowley
On Tue, Dec 17, 2013 at 1:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Once again: this patch has no business changing any user-visible behavior.
 That would include not changing the number of evaluations of volatile
 functions.  The planner is full of places where optimizations are disabled
 for volatile subexpressions, and I don't see why this should be different.


My point was meant to be more along the lines of that I thought it was
already broken and it perhaps should be fixed or at the very least we could
warn the users about it.
I would imagine that most of those other places in the planner are to
prevent extra evaluations of volatile functions? In this particular case
we're already evaluating these multiple extra times when a tuple moves off
the top of the frame. I would have thought that we should only evaluate the
volatile function once per tuple. This is not what the current
implementation does.

I don't have an issue skipping this optimisation when the aggregate's
expression contain any volatile functions. I just wanted to raise my
concerns about the current behaviour, which I find a bit bizarre.

Regards

David Rowley


 regards, tom lane



Re: [HACKERS] GIN improvements part 1: additional information

2013-12-16 Thread Alexander Korotkov
On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

 I've thought about different algorithms a little more. The general problem I
 see is online update. We need it, while it is typically not covered by the
 research at all. We already have to invent a small index at the end of the
 page. Different encoding methods add more challenges. In general, the methods
 can be classified into two groups:
 1) Values aren't aligned by bytes (gamma-codes, PFOR etc.)
 2) Multiple values are packed together in small group (simple-9,
 simple-18)


 Ok.


  For the first group of methods, when inserting in the middle of the page we
 would have to do a non-byte-aligned shift of the right part of the values. I
 don't know how expensive this shift is, but I expect it would be much slower
 than a plain memmove.


 Agreed.


  When values are packed into small groups, we have to either insert
 inefficiently encoded value or re-encode whole right part of values.


 It would probably be simplest to store newly inserted items uncompressed,
 in a separate area in the page. For example, grow the list of uncompressed
 items downwards from pg_upper, and the compressed items upwards from
 pg_lower. When the page fills up, re-encode the whole page.


Good idea. But:
1) We'll still need item indexes in the end of page for fast scan.
2) Storage would be easily extendable to hold additional information as
well.
Better compression shouldn't block more serious improvements.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-16 Thread Heikki Linnakangas

On 12/17/2013 12:22 AM, Alexander Korotkov wrote:

On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas hlinnakan...@vmware.com

wrote:



On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

  When values are packed into small groups, we have to either insert

inefficiently encoded value or re-encode whole right part of values.


It would probably be simplest to store newly inserted items uncompressed,
in a separate area in the page. For example, grow the list of uncompressed
items downwards from pg_upper, and the compressed items upwards from
pg_lower. When the page fills up, re-encode the whole page.


I hacked together an implementation of a variant of Simple9, to see how 
it performs. Insertions are handled per the above scheme.


In a limited pg_trgm test case I've been using a lot for this, this 
reduces the index size about 20%, compared to varbyte encoding. It might 
be possible to squeeze it a bit more, I handcrafted the selectors in 
the encoding algorithm to suit our needs, but I don't actually have a 
good idea of how to choose them optimally. Also, the encoding can encode 
0 values, but we never need to do that, so you could take advantage of 
that to pack items tighter.
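
For reference, a textbook Simple-9 round trip looks like this in standalone
C -- this is not the code in the attached patch, and it uses the standard
selector table rather than handcrafted selectors:

/* Standalone textbook Simple-9 sketch: each 32-bit word holds a 4-bit
 * selector plus 28 payload bits packing n values of b bits each. */
#include <stdint.h>
#include <stdio.h>

static const struct { int n, bits; } sel[9] = {
    {28, 1}, {14, 2}, {9, 3}, {7, 4}, {5, 5}, {4, 7}, {3, 9}, {2, 14}, {1, 28}
};

/* Encode nvals values (each < 2^28) into out[]; returns the word count. */
static int simple9_encode(const uint32_t *vals, int nvals, uint32_t *out)
{
    int i = 0, nwords = 0;

    while (i < nvals)
    {
        for (int s = 0; s < 9; s++)
        {
            int n = sel[s].n, bits = sel[s].bits;

            if (n > nvals - i)
                continue;               /* not enough values left for this mode */

            int fits = 1;
            for (int j = 0; j < n; j++)
                if (vals[i + j] >= ((uint32_t) 1 << bits)) { fits = 0; break; }
            if (!fits)
                continue;

            uint32_t word = (uint32_t) s << 28;
            for (int j = 0; j < n; j++)
                word |= vals[i + j] << (j * bits);
            out[nwords++] = word;
            i += n;
            break;                      /* greedy: densest mode that fits wins */
        }
    }
    return nwords;
}

static int simple9_decode(const uint32_t *in, int nwords, uint32_t *vals)
{
    int nvals = 0;

    for (int w = 0; w < nwords; w++)
    {
        int n = sel[in[w] >> 28].n, bits = sel[in[w] >> 28].bits;
        uint32_t mask = ((uint32_t) 1 << bits) - 1;

        for (int j = 0; j < n; j++)
            vals[nvals++] = (in[w] >> (j * bits)) & mask;
    }
    return nvals;
}

int main(void)
{
    uint32_t deltas[10] = {3, 1, 4, 1, 5, 9, 2, 6, 300, 70000};
    uint32_t words[16], back[64];

    int nw = simple9_encode(deltas, 10, words);
    int nv = simple9_decode(words, nw, back);

    printf("%d values packed into %d words:", nv, nw);
    for (int i = 0; i < nv; i++)
        printf(" %u", back[i]);
    printf("\n");
    return 0;
}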


Compression and decompression speed seems to be about the same.

Patch attached if you want to play with it. WAL replay is still broken, 
and there are probably bugs.



Good idea. But:
1) We'll still need item indexes in the end of page for fast scan.


Sure.


2) Storage would be easily extendable to hold additional information as
well.
Better compression shouldn't block more serious improvements.


I'm not sure I agree with that. For all the cases where you don't care 
about additional information - which covers all existing users for 
example - reducing disk size is pretty important. How are you planning 
to store the additional information, and how does using another encoding 
gets in the way of that?


- Heikki


gin-packed-postinglists-simple9-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cost estimation problem on seq scan in a loop

2013-12-16 Thread Jeff Janes
While doing ad hoc queries I've seen several different problems that all
seem to be variations on a theme.

The plan comes out looking like this:

 Nested Loop  (cost=826867.95..877038.04 rows=1 width=125)
   Join Filter: (foo.bar = smallish_table.bar)
   ->  Something Complicated  (cost=826867.95..876800.28 rows=1 width=81)
.
   ->  Seq Scan on smallish_table  (cost=0.00..142.89 rows=7389 width=44)


The estimate of rows=1 for Something Complicated is wrong and you really
get 1000 or 100,000 rows.  Meaning the seq scan on smallish_table gets
iterated a lot, and the time really adds up.

It would be great if Something Complicated had the correct row estimate,
but since I've seen this situation arise with a lot of different Something
Complicated that don't have much to do with each other (although usually an
antijoin of some kind is involved), there is little reason to think we can
squash every one of them.

Is there some principled way to go about teaching the planner that hashing
smallish_table on the join filter key is a cheap insurance policy against
underestimating the row count of the outer loop?

Cheers,

Jeff


Re: [HACKERS] planner missing a trick for foreign tables w/OR conditions

2013-12-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Dec 16, 2013 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The problem is that that optimization is a crock; see the comments
 for create_or_index_quals().  We can't just turn it loose to CNF-ify
 every OR it might find.  The case that we support at the moment is
 to CNF-ify whichever single OR condition looks like the best win,
 and it's hard to see how to do that without any index knowledge.

 Could we get by without actually converting to CNF?

The hard part is not extracting the partial qual.  The hard part is
trying to make sure that adding this entirely-redundant scan qual doesn't
catastrophically degrade join size estimates.  The hack of making an
inverse adjustment to the original OR clause's selectivity works, more or
less, for a single join OR condition.  I don't think it works if there's
several modified OR conditions (possibly covering different sets of
relations).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost estimation problem on seq scan in a loop

2013-12-16 Thread Greg Stark
On Mon, Dec 16, 2013 at 11:41 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Is there some principled way to go about teaching the planner that hashing
 smallish_table on the join filter key is a cheap insurance policy against
 underestimating the row count of the outer loop?

The problem is that cheap protection can end up being very expensive
when it's the most expensive part of the query and it's repeated many
times. In this query it's expecting thousands of rows. The executor
goes to some effort to avoid having to do unnecessary copies of data
and be able to use it straight out of the disk buffers so having to
copy it an unnecessary time to a hash table would be annoying.

What's more likely, I think is having plan nodes that make decisions
at run-time. There's been some movement in this direction already and
lots of discussion about it. Having a join type that retrieves the
first few rows from the lhs and then decides whether to do a hash or
nested loop on the rhs based on how many it finds might be more
tractable than most other strategies.
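
As a toy illustration of that kind of run-time decision (not an executor
patch; the threshold, the semi-join shape, and the data are made up, and the
outer side is simply materialized rather than peeked at incrementally):

/* Toy standalone sketch of a join that picks its strategy at run time:
 * if the outer side turns out to be small, rescan the inner side per row;
 * otherwise pay once to build a hash table over the inner side. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define PEEK_THRESHOLD 8            /* made-up cutoff for "more rows than planned" */

static bool inner_contains(const int *inner, int n, int key)
{
    for (int i = 0; i < n; i++)     /* nested-loop style probe */
        if (inner[i] == key)
            return true;
    return false;
}

static void adaptive_semijoin(const int *outer, int n_outer,
                              const int *inner, int n_inner)
{
    if (n_outer <= PEEK_THRESHOLD)
    {
        /* few outer rows: just rescan the inner side for each of them */
        for (int i = 0; i < n_outer; i++)
            if (inner_contains(inner, n_inner, outer[i]))
                printf("match: %d\n", outer[i]);
        return;
    }

    /* many outer rows: build an open-addressing hash over the inner side */
    int nbuckets = 2 * n_inner + 1;
    int *bucket = malloc(nbuckets * sizeof(int));
    if (bucket == NULL)
        return;
    for (int i = 0; i < nbuckets; i++)
        bucket[i] = -1;                             /* -1 marks an empty slot */

    for (int i = 0; i < n_inner; i++)
    {
        int h = (unsigned) inner[i] % nbuckets;
        while (bucket[h] != -1)
            h = (h + 1) % nbuckets;
        bucket[h] = inner[i];
    }
    for (int i = 0; i < n_outer; i++)
    {
        int h = (unsigned) outer[i] % nbuckets;
        while (bucket[h] != -1 && bucket[h] != outer[i])
            h = (h + 1) % nbuckets;
        if (bucket[h] == outer[i])
            printf("match: %d\n", outer[i]);
    }
    free(bucket);
}

int main(void)
{
    int inner[] = {2, 4, 6, 8, 10};
    int outer[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};

    adaptive_semijoin(outer, 12, inner, 5);     /* 12 > 8, so the hash path runs */
    return 0;
}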


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why no INSTEAD OF triggers on tables?

2013-12-16 Thread Josh Berkus
Hackers,

I've looked in the archives, but I can't find a reason why INSTEAD OF
triggers were never enabled for tables.  I'm interested in them in order
to return a rowcount to JDBC for INSERTs into partitioned tables.

Was there a technical obstacle, or is this just a TUIT issue?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why no INSTEAD OF triggers on tables?

2013-12-16 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 I've looked in the archives, but I can't find a reason why INSTEAD OF
 triggers were never enabled for tables.

What would that mean exactly?  And how would you do the actual update
when it came time to?

 Was there a technical obstacle, or is this just a TUIT issue?

It sounds more like a requestor-hasn't-thought-it-through issue.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: variant of regclass

2013-12-16 Thread Tatsuo Ishii
 static Datum regclass_gut(char *class_name_or_oid, bool raiseError);
 static List *stringToQualifiedNameList_gut(const char *string, bool 
 raiseError);
 
 Please spell that as guts not gut.

Thanks. I see.

 regclass_gut is called from regclassin and toregclass and does most of the
 job that regclassin did before. The raiseError flag controls whether an error
 is raised or not when an invalid argument (for example non existent
 relation) is given. For this purpose, regclass_gut wraps the call to
 oidin using a PG_TRY block.
 
 I do not think that use of PG_TRY is either necessary or acceptable
 --- for example, it will happily trap and discard query-cancel errors,
 as well as other errors that are not necessarily safe to ignore.
 If you don't want to risk an error on an invalid numeric string,
 how about parsing the integer yourself with sscanf?

Fair enough. I will remove the part.
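
A minimal standalone sketch of the sscanf approach suggested above (not the
actual patch; unsigned long stands in for Oid here):

/* Decide whether the input looks like a plain numeric OID without letting
 * the conversion routine throw (and then having to trap) an error. */
#include <stdio.h>

static int parse_numeric_oid(const char *str, unsigned long *oid)
{
    char    junk;

    /* exactly one unsigned integer and nothing else */
    if (sscanf(str, "%lu %c", oid, &junk) == 1)
        return 1;
    return 0;
}

int main(void)
{
    unsigned long oid;

    printf("%d\n", parse_numeric_oid("16384", &oid));       /* 1: numeric OID */
    printf("%d\n", parse_numeric_oid("pg_class", &oid));    /* 0: treat as a name */
    return 0;
}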

 More generally, though, I don't see a great need to try to promise
 that this function *never* throws any errors, and so I'm also suspicious
 of the hacking you've done on stringToQualifiedNameList.  I'm even
 less happy about the idea that this patch might start reaching into
 things like makeRangeVarFromNameList.  I think it's sufficient if it
 doesn't throw an error on name-not-found; you don't have to try to
 prevent weird syntax problems from throwing errors.

For the pgpool-II use case, I'm happy to follow you because pgpool-II
always does a grammatical check (using the raw parser) on a query first,
so such syntax problems will be pointed out before we ever reach the
point of calling toregclass.

One concern is that other users might expect toregclass to detect such syntax
problems. Anybody?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extension Templates S03E11

2013-12-16 Thread Jim Nasby

On 12/16/13, 1:00 PM, Josh Berkus wrote:

On 12/16/2013 10:53 AM, Josh Berkus wrote:

Some PostgreSQL shops with lots of servers have large internal libraries
of functions, views, and similar code that they've written to support
their applications, which don't comprise a complete database.  This
feature would allow them to package those libraries, and version,
upgrade and track them, without requiring a filesystem-based install.  I
myself have a couple clients who could benefit from this.


cc'ing Jim Nasby, since I think Enova is part of the target market for
this feature.  Jim, have you taken a look at this?


The name rings a bell; I think I looked at it in the past.

I've read all of this thread that I've currently got (back to 12/11), so I 
think I've got some idea what's going on.

Enova definitely has libraries of objects, and in fact we're currently working 
on releasing them via PGXN. That's proving a bit challenging since now we have 
to find a way to incorporate PGXN into our existing deployment framework (I do 
NOT want users to have to manually run pgxn client commands). Another 
complication is that we don't want our production servers downloading random, 
un-audited code, so we need an internal PGXN mirror.

I think it's probably best if I describe the issues that we've run across, to 
help the rest of the community understand the pain points. I'll work on doing 
that.

In the meantime, I can say this:

- Being forced to put files down for extensions is a PITA
- We don't have a good way to deal with extensions that have been installed in 
a non-standard schema, other than search_path, which for a complex database is 
impractical
- There's a lot that could potentially be done without any external libraries 
(we've got the equivalent of probably 6-8 modules, none of which require C, and 
only one uses a Perl module, which is part of the module itself; the only 
reason for the .pm is to properly factor the code between plperl functions)
- We definitely need a mechanism for declaring deps between modules

Somewhat related to this, I really wish Postgres had the idea of a class that could 
contain any type of object and be instantiated when needed. For example, if we had an 
address class, we could instantiate it once for tracking our customer addresses, and a second 
time for tracking the addresses customers supply for their employers. Such a mechanism would probably be 
ideal for what we need, but of course you'd still have the question of how to load a class definition that 
someone else has published.
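
(Purely to illustrate what's being wished for, here is a hedged sketch of the
closest approximation available today: duplicating the same 'address' objects
into two schemas by hand. All names are hypothetical:)

-- One "instantiation" per schema, created and maintained manually.
CREATE SCHEMA customer_address;
CREATE TABLE customer_address.address (
    street  text,
    city    text,
    zip     text
);
CREATE FUNCTION customer_address.format_address(a customer_address.address)
RETURNS text LANGUAGE sql
AS $$ SELECT a.street || ', ' || a.city || ' ' || a.zip $$;

-- A second copy for employer addresses; with a real "class" mechanism the
-- duplication (and keeping the copies in sync) would go away.
CREATE SCHEMA employer_address;
CREATE TABLE employer_address.address (LIKE customer_address.address);
CREATE FUNCTION employer_address.format_address(a employer_address.address)
RETURNS text LANGUAGE sql
AS $$ SELECT a.street || ', ' || a.city || ' ' || a.zip $$;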
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Race condition in b-tree page deletion

2013-12-16 Thread Kevin Grittner
Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Here's a patch implementing this scheme.

I've read through the papers on the btree technique, read the
thread, and have been reviewing the patch.  It will take me another
day or two to complete a close review and testing, but I wanted to
give preliminary results, and just let people know that I am
working on it.

After reading through the papers and the README for btree, I have
to say that the approach to deletion seemed like the weak link. 
Searches, scans, and insertions all were pretty cohesive.  The
logic was all based on the fact that missing *upper* levels was OK
because the sibling pointers would be automatically used to search
until the correct page was found at each level.  The old deletion
approach turned this on its head by having missing pages at the
bottom, which introduces whole new classes of problems which
otherwise don't exist.  This patch makes interim states in the
deletion process look almost exactly the same as interim states in
the insertion process, which I think is A Very Good Thing.

The approach to limiting the number of locks to a finite (and
small) number is clever, but seems quite sound.  The locks are
always taken in an order which cannot cause a deadlock.

The write-up lacks any explicit mention of what happens if the
branch being considered for deletion reaches all the way to the
root.  Although an astute reader will notice that since the root page
will always be the only page at its level, it must always be the
rightmost page at its level, and therefore is correctly handled by
the logic dealing with that, I think it merits an explicit mention
in the README.

I agree with others that this patch is not suitable for
back-patching.  I wonder whether some other solution might be worth
back-patching, since it is clearly a bug.  The "never delete
internal pages" approach might be safe enough to consider.  It would lead to
some degree of bloat, but perhaps bloat is better than errors.  Of
course, the argument can certainly be made that people hit the bug
so rarely that we're better off just leaving it alone in stable
branches.  Anyway, that would be a different patch.

More on *this* patch in a day or two.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why no INSTEAD OF triggers on tables?

2013-12-16 Thread Josh Berkus
On 12/16/2013 04:22 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 I've looked in the archives, but I can't find a reason why INSTEAD OF
 triggers were never enabled for tables.
 
 What would that mean exactly?  And how would you do the actual update
 when it came time to?

Well, I'm specifically thinking of master partition tables.  In that
case, we really want an INSTEAD OF trigger.

It seems a little silly that I need to create a separate view, and then
an INSTEAD OF trigger on the view, in order to get a rows-updated count
back from an INSERT which hits a partitioned table.
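
(For readers following along, a hedged sketch of the workaround being called
silly here, with made-up table names; the point is that returning NEW from an
INSTEAD OF trigger on a view makes the rows-inserted count come out right:)

-- Trigger-based partitioning setup (parent plus one child).
CREATE TABLE events (ts timestamptz, val int);
CREATE TABLE events_2013 () INHERITS (events);

-- The extra view one currently has to create just to get a usable count.
CREATE VIEW events_v AS SELECT * FROM events;

CREATE FUNCTION events_v_insert() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO events_2013 VALUES (NEW.*);
    RETURN NEW;   -- counted, so INSERT reports the real number of rows
END;
$$;

CREATE TRIGGER events_v_insert
INSTEAD OF INSERT ON events_v
FOR EACH ROW EXECUTE PROCEDURE events_v_insert();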

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] dpkg-buildpackage fails on 9.2.6 on ubuntu 12.04.3 LTS

2013-12-16 Thread imagene...@gmail.com
Preface on the venue: It is specified in the description of the mailing
lists that pgsql-general is for all topics except those relating to
failed compiles. However, searching for the word compile fails to
yield any other mailing list for compilation failures, so presumably
this is the appropriate list for them.

The compilation appears to fail at initdb, which reports (as expected)
that initdb must be run by the user that runs the daemon, postgres,
when executing:

dpkg-buildpackage

Executing:

sudo -u postgres dpkg-buildpackage

Does not work. How does one build a debian package from source?




Re: [HACKERS] Race condition in b-tree page deletion

2013-12-16 Thread Jim Nasby

On 11/9/13, 10:02 AM, Heikki Linnakangas wrote:

3. Another approach would be to get rid of the can't delete rightmost child 
limitation. We currently have that limitation because it ensures that we never need to 
change the high-key of a page. If we delete a page that is the rightmost child of its 
parent, we transfer the deleted keyspace from the parent page to its right sibling. To do 
that, we need to update the high key of the parent, as well as the downlink of the right 
sibling at the grandparent level. That's a bit complicated, because updating the high key 
might require splitting the page.


Is the rightmost child issue likely to affect indexes on increasing values, 
like a queue table? Because we get significant bloat on our indexes, especially 
ones that are on things like timestamps in a queue table (FIFO queue, not a 
LIFO stack):

INFO:  index page_hits_pkey now contains 6083 row versions in 9363 pages

INFO:  vacuuming cnu_stats.page_hits_queue
INFO:  scanned index page_hits_pkey to remove 4329 row versions
DETAIL:  CPU 0.07s/0.02u sec elapsed 0.60 sec.
INFO:  page_hits_queue: removed 4329 row versions in 436 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  index page_hits_pkey now contains 7891 row versions in 9363 pages
DETAIL:  4329 index row versions were removed.
9338 index pages have been deleted, 9227 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  page_hits_queue: found 4329 removable, 7891 nonremovable row versions 
in 548 out of 548 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 10210 unused item pointers.
0 pages are entirely empty.
CPU 0.08s/0.02u sec elapsed 0.62 sec.
INFO:  vacuuming pg_toast.pg_toast_25287
INFO:  index pg_toast_25287_index now contains 0 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_toast_25287: found 0 removable, 0 nonremovable row versions in 0 
out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] dpkg-buildpackage fails on 9.2.6 on ubuntu 12.04.3 LTS

2013-12-16 Thread imagene...@gmail.com
Tail of the failure output:
http://pastebin.com/GP7b4FPT

Running:

sudo -u postgres dpkg-buildpackage -rsudo

Ends in the same failure.

On Mon, Dec 16, 2013 at 9:50 PM, imagene...@gmail.com
imagene...@gmail.com wrote:
 Preface on the venue: It is specified in the description of the mailing
 lists that pgsql-general is for all topics except those relating to
 failed compiles. However, searching for the word compile fails to
 yield any other mailing list for compilation failures, so presumably
 this is the appropriate list for them.

 The compilation appears to fail at initdb, which reports (as expected)
 that initdb must be run by the user that runs the daemon, postgres,
 when executing:

 dpkg-buildpackage

 Executing:

 sudo -u postgres dpkg-buildpackage

 Does not work. How does one build a debian package from source?




Re: [HACKERS] planner missing a trick for foreign tables w/OR conditions

2013-12-16 Thread Robert Haas
On Mon, Dec 16, 2013 at 6:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Dec 16, 2013 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The problem is that that optimization is a crock; see the comments
 for create_or_index_quals().  We can't just turn it loose to CNF-ify
 every OR it might find.  The case that we support at the moment is
 to CNF-ify whichever single OR condition looks like the best win,
 and it's hard to see how to do that without any index knowledge.

 Could we get by without actually converting to CNF?

 The hard part is not extracting the partial qual.  The hard part is
 trying to make sure that adding this entirely-redundant scan qual doesn't
 catastrophically degrade join size estimates.

OK, I had a feeling that's where the problem was likely to be.  Do you
have any thoughts about a more principled way of solving this problem?

I mean, off-hand, it's not clear to me that the comments about this
being a MAJOR HACK aren't overstated.  I mean, if we expect a join
qual for {X Y} to have a given selectivity, but then we pre-filter X
with a clause that is deliberately redundant with that join qual, then
the join qual does indeed become less selective when viewed as applying
to the surviving rows.  This does not strike me as very much different
from the oft-encountered problem of estimating selectivity for a = 1
AND b = 1, where a and b are correlated.  Whichever qual we apply
first changes the data distribution such that the selectivity of the
second qual is not the same as it would have been when applied to the
entirety of the data.  Estimating it that way would not be a hack; it
would be reality.
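
(For concreteness, a hedged sketch of the kind of redundant scan qual under
discussion; the table and column names are made up:)

-- A join OR condition that, as written, can only be checked at the join:
SELECT *
FROM   local_tab   a
JOIN   foreign_tab b ON a.id = b.a_id
WHERE  (a.x = 1 AND b.y = 10)
    OR (a.x = 2 AND b.y = 20);

-- The partial qual that can be extracted and applied to "a" alone (and, for
-- a foreign table, shipped to the remote side), deliberately redundant with
-- the OR above:
--     a.x = 1 OR a.x = 2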

Now, it might be true that frobbing what is intended as a cache based
on the knowledge that the cache will never be flushed is a hack.

 The hack of making an
 inverse adjustment to the original OR clause's selectivity works, more or
 less, for a single join OR condition.  I don't think it works if there's
 several modified OR conditions (possibly covering different sets of
 relations).

I might be missing something, but I suspect it works fine if every
path for the relation is generating the same rows.  The partial qual
is definitely going to be applied before the join qual, so the join
qual will surely be hitting only data that's been pre-filtered by the
partial qual, and so its selectivity will be correspondingly higher.
Where it seems to me that you'd run into trouble is if we created one
path that doesn't bother enforcing the partial qual (relying on the
fact that it will be applied post-join) and another path that does.
Now you've got the same selectivity estimate for the join in two cases
where the incoming data distribution is significantly different, which
is bad.

Do you see another danger?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why no INSTEAD OF triggers on tables?

2013-12-16 Thread Robert Haas
On Mon, Dec 16, 2013 at 9:16 PM, Josh Berkus j...@agliodbs.com wrote:
 On 12/16/2013 04:22 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 I've looked in the archives, but I can't find a reason why INSTEAD OF
 triggers were never enabled for tables.

 What would that mean exactly?  And how would you do the actual update
 when it came time to?

 Well, I'm specifically thinking of master partition tables.  In that
 case, we really want an INSTEAD OF trigger.

/me scratches head.

So, put a BEFORE trigger, and make it return NULL.  Same effect,
different notation.
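
(A minimal sketch of that notation, with hypothetical table names; note that
because the trigger returns NULL, the command tag reports INSERT 0 0, which
is exactly the count problem raised upthread:)

CREATE TABLE measurements (ts timestamptz, val int);
CREATE TABLE measurements_2013 () INHERITS (measurements);

CREATE FUNCTION measurements_redirect() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO measurements_2013 VALUES (NEW.*);
    RETURN NULL;  -- suppress the insert into the parent table
END;
$$;

CREATE TRIGGER measurements_redirect
BEFORE INSERT ON measurements
FOR EACH ROW EXECUTE PROCEDURE measurements_redirect();

-- INSERT INTO measurements VALUES (now(), 1);   reports INSERT 0 0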

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction Interfaces

2013-12-16 Thread Robert Haas
On Sat, Dec 14, 2013 at 12:37 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-12-14 11:50:00 -0500, Robert Haas wrote:
 On Fri, Dec 13, 2013 at 9:14 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  But can you imagine those users needing an exported snapshot? I can
  think of several short-lived usages, but all of those are unlikely to
  need a consistent view of the overall database. And those are less
  likely to be full blown replication solutions.
  I.e. it's not the DBA making that decision but the developer making the
  decision based on whether he requires the snapshot or not.

 Well, it still seems to me that the right way to think about this is
 that the change stream begins at a certain point, and then once you
 cross a certain threshold (all transactions in progress at that time
 have ended) any subsequent snapshot is a possible point from which to
 roll forward.

 Unfortunately it's not possible to build exportable snapshots at any
 time - it requires keeping far more state around since we need to care
 about all transactions, not just transactions touching the
 catalog. Currently you can only export the snapshot at the one point where we
 become consistent; after that we stop maintaining that state.

I don't get it.  Once all the old transactions are gone, I don't see
why you need any state at all to build an exportable snapshot.  Just
take a snapshot.
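
(For context, a small sketch of what snapshot export already looks like at
the SQL level; the logical-decoding case under discussion would presumably
hand back a snapshot name to be used the same way. The identifier shown is
just an example:)

-- Session 1: export the snapshot of an open transaction.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();        -- e.g. '000003A1-1'

-- Session 2: adopt it, so both sessions see exactly the same data.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';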

 1. Slots.  We know we need physical slots as well as logical slots,
 but the patch as currently constituted only offers logical slots.

 Well, then tell me the way you want to go forward on that end. I can
 make the slot interface more generic if we know exactly what we need,
 but it doesn't seem fair to take this patch hostage until I develop a
 separate, not-so-small feature. Why is that my task?
 Because I think it's important, and because by now I know the related
 code pretty well, I am willing to provide the parts that
 prevent required WAL from being deleted, peg xmin and report the current
 state to SQL, but somebody else is going to have to do the rest.

The part that you're expressing willingness to do sounds entirely
satisfactory to me.  As I mentioned on the other thread, I'm perhaps
even willing to punt that feature entirely provided that we have a
clear design for how to add it later, but I think it'd be nicer to get
it done now.

And just for the record, I think the idea that I am holding this patch
hostage is absurd.  I have devoted a large amount of time and energy
to moving this forward and plan to devote more.  Because of that work,
big chunks of what is needed here are already committed.  If my secret
plan is to make it as difficult as possible for you to get this
committed, I'm playing a deep game.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] commit fest 2013-11 final report

2013-12-16 Thread Peter Eisentraut
The commit fest is closed.  Final score:

Needs Review: 8, Ready for Committer: 13, Committed: 36, Returned with
Feedback: 42, Rejected: 9. Total: 108.


All patches that were in Waiting on Author or Needs Review state and
have received a decent review have been moved to Returned with Feedback
state.  Authors are encouraged to review the feedback and provide an
updated patch for the next commit fest.  Of course, you can also work
with the reviewers and committers to arrive at a resolution before then.

Committers should now try to get the Ready for Committer queue cleared.

That leaves 8 patches that haven't been reviewed.  If you take a look,
you can imagine why: They're the most complicated and obscure topics,
combined with reviewers who signed up but didn't get to do a review,
plus perhaps authors who were too busy to follow up aggressively.  I'll
leave those open in case someone still wants to take a look.  Authors
should move those patches forward to the next commit fest if they want.




Re: [HACKERS] pg_rewarm status

2013-12-16 Thread Amit Kapila
On Tue, Dec 17, 2013 at 12:04 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, Dec 16, 2013 at 10:02 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 16, 2013 at 12:02 PM, Jeff Amiel becauseimj...@yahoo.com
 wrote:

 Well, the patch was rejected, more or less because people felt it
 overlapped with pgfincore too much.  I don't particularly agree,
 because pgfincore can't load data into shared buffers and doesn't work
 on Windows, but other people felt differently.

As far as I can see, pgfincore doesn't have the PREWARM_READ mode where
specified pages can be read. Another point is that because pg_prewarm
uses internal APIs it has certain other advantages as well; for example,
before a PREFETCH it can check whether the block is already in the buffer
cache.


There was talk of
 polishing up pgfincore for possible inclusion in contrib, perhaps
 adding this functionality along the way, but AFAIK there's been no
 activity on that.


 It wasn't rejected, it was returned with feedback with generally positive
 reviews.  I think the main feedback was that it should provide a
 single-argument overloaded function that takes just the object name and
 applies reasonable defaults for the remaining arguments,  for example
 'main', 'buffer',NULL,NULL.   I had thought that the worry about overlap
 with pgfincore was mostly resolved favorably, but perhaps I misread the
 situation.

 I'd like to see it revived for 9.4 if you are willing.

I have used pg_prewarm during some of my work related to Buffer Management and
other performance-related work. It is quite a useful utility.
+1 for reviving this patch for 9.4
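
(A hedged sketch of the single-argument convenience wrapper Jeff describes
above, assuming the five-argument pg_prewarm(regclass, text, text, int8, int8)
form and the default values listed in his mail:)

CREATE FUNCTION pg_prewarm(relname regclass)
RETURNS int8
LANGUAGE sql
AS $$ SELECT pg_prewarm($1, 'main', 'buffer', NULL, NULL) $$;

-- Usage:
--   SELECT pg_prewarm('my_table');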

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] dpkg-buildpackage fails on 9.2.6 on ubuntu 12.04.3 LTS

2013-12-16 Thread imagene...@gmail.com
This is resolved by running dpkg-buildpackage as postgres (with
dpkg-buildpackage -rsudo), and with the postgresql source folder placed in a
directory owned by postgres, since dpkg creates a temporary folder in that
parent directory.

However, I am now running into the following issue:

The reason I am compiling is that I must change #define NAMEDATALEN
64 to a larger value. Setting it to 256 has presumably caused the
following failures in make check:

It fails on the name and enum tests.

I am rerunning the compilation to confirm this is a result of the
change. If it is (or if it is caused by something else on this
platform), please explain how to remove the regression check from the
dpkg build, or how to resolve the failures (is there a maximum value
that will not fail?).




Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-12-16 Thread Amit Kapila
On Thu, Dec 12, 2013 at 12:24 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 06 December 2013 11:57 Amit Kapila wrote:

 A simplified test and an updated patch taking care of the above comment are
 attached to this mail.
 I was not able to reduce the test duration, but I changed it so that the test
 automatically exits after a 45 minute run.
 Please check the vacuum_test.sh file for more details on running the test.

                 Auto vacuum count   Bloat size
  Master         15                  220MB
  Patched_nkeep  18                  213MB


I ran the attached test and the numbers are as below:
                 Auto vacuum count   Bloat size
 Master          16                  222MB
 Patched_nkeep   23                  216MB

Here, Bloat size means the table size after the test finished, and
Auto vacuum count means the number of times autovacuum was
triggered during the test run.

It clearly shows that setting the number of dead tuples at the end of Vacuum
improves the situation.
I am marking this (based on patch vacuum_fix_v7_nkeep.patch) as Ready
For Committer.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-16 Thread David Rowley
On Tue, Dec 17, 2013 at 11:06 AM, David Rowley dgrowle...@gmail.com wrote:

 On Tue, Dec 17, 2013 at 1:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Once again: this patch has no business changing any user-visible behavior.
 That would include not changing the number of evaluations of volatile
 functions.  The planner is full of places where optimizations are disabled
 for volatile subexpressions, and I don't see why this should be different.


 My point was meant to be more along the lines that I thought it was
 already broken and it perhaps should be fixed, or at the very least we could
 warn users about it.
 I would imagine that most of those other places in the planner are there to
 prevent extra evaluations of volatile functions? In this particular case
 we're already evaluating these multiple extra times when a tuple moves off
 the top of the frame. I would have thought that we should only evaluate the
 volatile function once per tuple. This is not what the current
 implementation does.

 I don't have an issue skipping this optimisation when the aggregate's
 expression contains any volatile functions. I just wanted to raise my
 concerns about the current behaviour, which I find a bit bizarre.



To improve on the example I used to try and get my point across:

These were all run on an unpatched copy of HEAD. Ignore the actual results
from sum() and look at what currval() is set to after each query.

create sequence seq;
select sum(nextval('seq')) over (order by n rows between current row and
unbounded following)
from generate_series(1,10) n(n);
select currval('seq');

drop sequence seq;
create sequence seq;
select sum(nextval('seq')) over (order by n)
from generate_series(1,10) n(n);
select currval('seq');

nextval() is executed 55 times with the first query (10 + 9 + ... + 1, since
the remaining frame is re-aggregated each time its head row moves out of
scope) and 10 times with the 2nd query. Of course this is because the current
implementation requires that when a tuple moves out of scope, the whole frame
be re-aggregated. I had thought that all of the places that disable
optimisations due to there being something volatile somewhere were there to
stop the number of executions being undefined; this case seems undefined
already, or at least I can't find anywhere in the docs that says the
expression will be executed this number of times.

Once again, I'm not fighting to have inverse transitions used when volatile
functions are involved; I'll happily disable that. I just wanted to raise
this to find out if it's intended or not and it seemed like a good thread
to do it on.

Regards

David Rowley


Re: [HACKERS] Race condition in b-tree page deletion

2013-12-16 Thread Heikki Linnakangas

On 12/17/2013 03:57 AM, Kevin Grittner wrote:

I agree with others that this patch is not suitable for
back-patching.  I wonder whether some other solution might be worth
back-patching, since it is clearly a bug.  The never delete
internal pages might be safe enough to consider.  It would lead to
some degree of bloat, but perhaps bloat is better than errors.  Of
course, the argument can certainly be made that people hit the bug
so rarely that we're better off just leaving it alone in stable
branches.  Anyway, that would be a different patch.


I don't think "never delete internal pages" would be acceptable. In 
particular, it would bloat indefinitely for a smallish FIFO queue table 
with a timestamp column.


- Heikki

