date:20100901

Re: [HACKERS] compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

2010-09-01 Thread Jeff Davis

On Wed, 2010-09-01 at 20:57 -0400, Tom Lane wrote:
> I wrote:
> > Probably the best fix would be to make typcache flushing fully
> > independent of the relcache, but that would mean making sure that all
> > ALTER TABLE variants that affect the rowtype will issue an explicit
> > typcache flush.  That seems a bit too invasive to be back-patchable.
> > I'm not entirely sure this sort of failure can occur without
> > RELCACHE_FORCE_RELEASE, but I'm definitely not sure it can't, so a
> > backpatchable fix would be nice.
> 
> After a bit more study it seems that there is a reasonably
> back-patchable approach to this.  We can continue to drive flushing of
> composite-type typcache entries off of relcache flush, but it has to
> occur when we do RelationCacheInvalidateEntry() or
> RelationCacheInvalidate() due to a SI invalidate event, not just
> anytime a relcache entry is closed.  We can do that by plugging in a
> callback function with CacheRegisterRelcacheCallback.
> 

I think I see how this fixes the problem, but I still don't completely
understand.

Why can't we just make a real copy of the tuple descriptor for the type
cache entry, rather than sharing it between the relcache and the type
cache?

Thank you for the quick response.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] leaky views, yet again

2010-09-01 Thread KaiGai Kohei

(2010/09/02 12:38), Robert Haas wrote:
> 2010/9/1 KaiGai Kohei:
>> (2010/09/02 11:57), Robert Haas wrote:
>>> 2010/9/1 KaiGai Kohei:
 Right now, it stands on a strict assumption that considers operators
 implemented with built-in functions are safe; it does not have no
 possibility to leak supplied arguments anywhere.

 Please note that this patch does not case about a case when
 a function inside a view and a function outside a view are
 distributed into same level and the later function has lower
 cost value.
>>>
>>> Without making some attempt to address these two points, I don't see
>>> the point of this patch.
>>>
>>> Also, I believe we decided previously do this deoptimization only in
>>> case the user requests it with CREATE SECURITY VIEW.
>>>
>> Perhaps, I remember the previous discussion incorrectly.
>>
>> If we have a hint about whether the supplied view is intended to security
>> purpose, or not, it seems to me it is a reliable method to prevent pulling
>> up the subqueries come from security views.
>> Is it too much deoptimization?
> 
> Well, that'd prevent something like id = 3 from getting pushed down,
> which seems a bit harsh.
> 
Hmm. If so, we need to remember what FromExpr was come from subqueries
of security views, and what were not. Then, we need to prevent leakable
clause will be distributed to inside of them.
In addition, we also need to care about the order of function calls in
same level, because it is not implicitly solved.

At first, let's consider top-half of the matter.

When views are expanded into subqueries in query-rewriter, Query tree
lost an information OID of the view, because RangeTblEntry does not have
OID of the relation when it is RTE_SUBQUERY. So, we need to patch here
to mark a flag whether the supplied view is security focused, or not.

Then, pull_up_simple_subquery() pulls up a supplied subquery into normal
join, if possible. In this case, FromExpr is chained into the upper level.
Of course, FromExpr does not have a flag to show its origin, so we also
need to copy the new flag in RangeTblEntry to FromExpr.

Then, when distribute_qual_to_rels() is called, the caller also provides
a Bitmapset of relation-Ids which are contained under the FromExpr with
the flag saying it came from the security views.
Even if the supplied clause references a part of the Bitmapset, we need
to prevent the clause being pushed down into the relations came from
security views, except for ones we can make sure these are safe.

Perhaps, it is not impossible

Thanks,
-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-01 Thread Fujii Masao

On Wed, Sep 1, 2010 at 4:11 PM, Heikki Linnakangas
 wrote:
> The obvious next question is how to wait for multiple sockets and a latch at
> the same time? Perhaps we should have a select()-like interface where you
> can pass multiple file descriptors. Then again, looking at the current
> callers of select() in the backend, apart from postmaster they all wait for
> only one fd.

Currently backends have not waited for multiple sockets, so I don't think that
interface is required for now. Similarly, we don't need to wait for the socket
to be ready to *write* because there is no use case for now.

>> Windows implementation of WaitLatchAndSocket() seems not to be
>> so simple. We would need to wait for both the latch event and
>> the packet from the socket by using WaitForMultipleObjectsEx().
>
> Well, we already use WaitForMultipleObjectsEx() to implement select() on
> Windows, so it should be straightforward to copy that. I'll look into that.

Agreed.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] leaky views, yet again

2010-09-01 Thread Robert Haas

2010/9/1 KaiGai Kohei :
> (2010/09/02 11:57), Robert Haas wrote:
>> 2010/9/1 KaiGai Kohei:
>>> Right now, it stands on a strict assumption that considers operators
>>> implemented with built-in functions are safe; it does not have no
>>> possibility to leak supplied arguments anywhere.
>>>
>>> Please note that this patch does not case about a case when
>>> a function inside a view and a function outside a view are
>>> distributed into same level and the later function has lower
>>> cost value.
>>
>> Without making some attempt to address these two points, I don't see
>> the point of this patch.
>>
>> Also, I believe we decided previously do this deoptimization only in
>> case the user requests it with CREATE SECURITY VIEW.
>>
> Perhaps, I remember the previous discussion incorrectly.
>
> If we have a hint about whether the supplied view is intended to security
> purpose, or not, it seems to me it is a reliable method to prevent pulling
> up the subqueries come from security views.
> Is it too much deoptimization?

Well, that'd prevent something like id = 3 from getting pushed down,
which seems a bit harsh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] leaky views, yet again

2010-09-01 Thread KaiGai Kohei

(2010/09/02 11:57), Robert Haas wrote:
> 2010/9/1 KaiGai Kohei:
>> Right now, it stands on a strict assumption that considers operators
>> implemented with built-in functions are safe; it does not have no
>> possibility to leak supplied arguments anywhere.
>>
>> Please note that this patch does not case about a case when
>> a function inside a view and a function outside a view are
>> distributed into same level and the later function has lower
>> cost value.
> 
> Without making some attempt to address these two points, I don't see
> the point of this patch.
> 
> Also, I believe we decided previously do this deoptimization only in
> case the user requests it with CREATE SECURITY VIEW.
> 
Perhaps, I remember the previous discussion incorrectly.

If we have a hint about whether the supplied view is intended to security
purpose, or not, it seems to me it is a reliable method to prevent pulling
up the subqueries come from security views.
Is it too much deoptimization?

If we keep security views as subqueries, the later point shall be solved
implicitly. Even if we try to optimize security views in a certain level,
it is not difficult to solve, as I demonstrated before.

Thanks,
-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] git: uh-oh

2010-09-01 Thread Robert Haas

On Wed, Sep 1, 2010 at 6:39 AM, Magnus Hagander  wrote:
>> That definitely didn't fix it, although I'm not quite sure why.  Can
>> you throw the modified CVS you ran this off of up somewhere I can
>> rsync it?
>
> no rsync server on that box, but I put up a tarball for you at
> http://www.hagander.net/pgsql/cvsrepo.tgz

OK, color me baffled.  I looked at gram.c and I believe you obsoleted
the right revs. The only difference I see between this and some other
random deleted file is that it has a couple of tags pointing to revs
that don't exist any more, but I can't see how that would cause the
observed weirdness.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Rewrite, normal execution vs. EXPLAIN ANALYZE

2010-09-01 Thread Robert Haas

On Mon, Aug 30, 2010 at 5:30 PM, Marko Tiikkaja
 wrote:
> On 2010-08-31 12:07 AM +0300, I wrote:
>>
>> I looked at fixing this..
>
> The previous patch had a bug in fmgr_sql() our regression tests didn't
> catch.  Fixed version attached.

It would probably be a good idea to add this to the currently open CommitFest.

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ECPG fix for mixed case cursor names

2010-09-01 Thread Robert Haas

2010/8/25 Boszormenyi Zoltan :
> PostgreSQL allows in plain SQL to declare a cursor
> e.g. in all lower case and fetch from is in all upper case.
> We need to allow this from ECPG, too, but strictly when
> the cursor name is not in a variable. Otherwise this code
> below doesn't notice the cursor's double declaration
> and complains using an undeclared cursor:

It might be a good idea to add this to the open CommitFest so we don't
lose track of it.

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multibyte charater set in levenshtein function

2010-09-01 Thread Robert Haas

2010/8/28 Alexander Korotkov :
> Now test for levenshtein_less_equal performance.

Nice results.  I'll try to find time to look at this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] leaky views, yet again

2010-09-01 Thread Robert Haas

2010/9/1 KaiGai Kohei :
> Right now, it stands on a strict assumption that considers operators
> implemented with built-in functions are safe; it does not have no
> possibility to leak supplied arguments anywhere.
>
> Please note that this patch does not case about a case when
> a function inside a view and a function outside a view are
> distributed into same level and the later function has lower
> cost value.

Without making some attempt to address these two points, I don't see
the point of this patch.

Also, I believe we decided previously do this deoptimization only in
case the user requests it with CREATE SECURITY VIEW.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

2010-09-01 Thread Tom Lane

I wrote:
> Probably the best fix would be to make typcache flushing fully
> independent of the relcache, but that would mean making sure that all
> ALTER TABLE variants that affect the rowtype will issue an explicit
> typcache flush.  That seems a bit too invasive to be back-patchable.
> I'm not entirely sure this sort of failure can occur without
> RELCACHE_FORCE_RELEASE, but I'm definitely not sure it can't, so a
> backpatchable fix would be nice.

After a bit more study it seems that there is a reasonably
back-patchable approach to this.  We can continue to drive flushing of
composite-type typcache entries off of relcache flush, but it has to
occur when we do RelationCacheInvalidateEntry() or
RelationCacheInvalidate() due to a SI invalidate event, not just
anytime a relcache entry is closed.  We can do that by plugging in a
callback function with CacheRegisterRelcacheCallback.

Because the callback will only have the relation OID not the type OID,
it will have to scan the whole TypeCacheHash to see if there's a
matching entry.  However, that's not as bad as it sounds, because there
aren't likely to be very many entries in that hashtable.  I put in some
quick-hack instrumentation to see how big the table gets during the
regression tests, and find that of the hundred-odd backends launched
during the tests, none get above 26 typcache entries, and only 8 get as
many as 10 entries.  Based on those numbers, I'm not sure it'd ever be
worth adding the additional infrastructure to allow a direct hash lookup
instead.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] security hook on table creation

2010-09-01 Thread KaiGai Kohei

This patch allows external security providers to check privileges
to create a new relation and to inform the security labels to be
assigned on the new one.

This hook is put on DefineRelation() and OpenIntoRel(), but isn't
put on Boot_CreateStmt, create_toast_table() and make_new_heap(),
although they actually create a relation, because I assume these
are the cases when we don't need individual checks and security
labels.

DefineIndex() also creates a new relation (RELKIND_INDEX), but
it seems to me the PG implementation considers indexes are a part
of properties of tables, because it always has same owner-id.
So, it shall be checked at the upcoming ALTER TABLE hook, instead.

The ESP plugins can return a list of security labels of the new
relation, columns and relation-type. If multiple ESPs are installed,
it shall append its security labels on the security labels of the
secondary plugin.
The list shall be a list of SecLabelItem as follows:
  typedef struct
  {
  ObjectAddress   object;
  const char *tag;
  const char *seclabel;
  } SecLabelItem;
OID of them are decided in heap_create_with_catalog(), so ESP cannot
know what OID shall be supplied at the point where it makes access
control decision.
So, the core routine fix up the SecLabelItem::object->objectId later,
if InvalidOid is given. I think it is a reasonable idea rather than
putting one more hook to store security labels after the table creation.

Please also note that this patch depends on the security label support
patch that I submitted before.

Thanks,
-- 
KaiGai Kohei 
*** a/src/backend/bootstrap/bootparse.y
--- b/src/backend/bootstrap/bootparse.y
***
*** 246,252  Boot_CreateStmt:
  	  (Datum) 0,
  	  false,
  	  true,
! 	  false);
  		elog(DEBUG4, "relation created with oid %u", id);
  	}
  	do_end();
--- 246,253 
  	  (Datum) 0,
  	  false,
  	  true,
! 	  false,
! 	  NIL);
  		elog(DEBUG4, "relation created with oid %u", id);
  	}
  	do_end();
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***
*** 49,54 
--- 49,55 
  #include "catalog/pg_type.h"
  #include "catalog/pg_type_fn.h"
  #include "catalog/storage.h"
+ #include "commands/seclabel.h"
  #include "commands/tablecmds.h"
  #include "commands/typecmds.h"
  #include "miscadmin.h"
***
*** 94,99  static Oid AddNewRelationType(const char *typeName,
--- 95,101 
  static void RelationRemoveInheritance(Oid relid);
  static void StoreRelCheck(Relation rel, char *ccname, Node *expr,
  			  bool is_local, int inhcount);
+ static void StoreSecLabels(Relation rel, Oid typeId, List *seclabels);
  static void StoreConstraints(Relation rel, List *cooked_constraints);
  static bool MergeWithExistingConstraint(Relation rel, char *ccname, Node *expr,
  			bool allow_merge, bool is_local);
***
*** 882,887  AddNewRelationType(const char *typeName,
--- 884,890 
   *	use_user_acl: TRUE if should look for user-defined default permissions;
   *		if FALSE, relacl is always set NULL
   *	allow_system_table_mods: TRUE to allow creation in system namespaces
+  *	seclabels : list of security labels to be assigned on, if exist
   *
   * Returns the OID of the new relation
   * 
***
*** 905,911  heap_create_with_catalog(const char *relname,
  		 Datum reloptions,
  		 bool use_user_acl,
  		 bool allow_system_table_mods,
! 		 bool if_not_exists)
  {
  	Relation	pg_class_desc;
  	Relation	new_rel_desc;
--- 908,915 
  		 Datum reloptions,
  		 bool use_user_acl,
  		 bool allow_system_table_mods,
! 		 bool if_not_exists,
! 		 List *seclabels)
  {
  	Relation	pg_class_desc;
  	Relation	new_rel_desc;
***
*** 1189,1194  heap_create_with_catalog(const char *relname,
--- 1193,1203 
  	}
  
  	/*
+ 	 * Store any supplied security labels of the relation
+ 	 */
+ 	StoreSecLabels(new_rel_desc, new_type_oid, seclabels);
+ 
+ 	/*
  	 * Store any supplied constraints and defaults.
  	 *
  	 * NB: this may do a CommandCounterIncrement and rebuild the relcache
***
*** 1808,1813  StoreRelCheck(Relation rel, char *ccname, Node *expr,
--- 1817,1861 
  }
  
  /*
+  * Store any supplied security labels of the new relation
+  */
+ static void
+ StoreSecLabels(Relation rel, Oid typeId, List *seclabels)
+ {
+ 	ListCell   *l;
+ 
+ 	foreach (l, seclabels)
+ 	{
+ 		SecLabelItem   *sl = lfirst(l);
+ 		ObjectAddress	object;
+ 
+ 		Assert(sl->tag != NULL && sl->seclabel != NULL);
+ 
+ 		memcpy(&object, &sl->object, sizeof(object));
+ 
+ 		/*
+ 		 * It fixes up OID of the new relation and type, because these IDs were
+ 		 * not decided at the point when external security provider made access
+ 		 * control decision and returns security label to be assigned on.
+ 		 *
+

Re: [HACKERS] compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

2010-09-01 Thread Tom Lane

Jeff Davis  writes:
> I have attached regression.diffs after a "make check". The diffs don't
> look trivial, and actually look quite strange to me.

BTW, if I dike out the flush_rowtype_cache call at relcache.c:1929,
the regression tests do pass for me, so it seems there is only one
bug exposed by this test (as of HEAD anyway).  I don't find that
to be a credible real fix, though, as stated previously.

Once we do have a fix in place for this, it might be a good idea for
someone to run a buildfarm member with -DRELCACHE_FORCE_RELEASE ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix for pg_upgrade's forcing pg_controldata into English

2010-09-01 Thread Tom Lane

Bruce Momjian  writes:
> Tom Lane wrote:
>> I certainly hope that pg_regress isn't freeing the strings it passes
>> to putenv() ...

> pg_regress does not restore these settings (it says with C/English) so
> the code is different.

That's not what I'm on about.  You're trashing strings that are part of
the live environment.  It might accidentally fail to fail for you, if
your version of free() doesn't immediately clobber the released storage,
but it's still broken.  Read the putenv() man page.

+ #ifndef WIN32
+   char   *envstr = (char *) pg_malloc(ctx, strlen(var) +
+   strlen(val) + 1);
+ 
+   sprintf(envstr, "%s=%s", var, val);
+   putenv(envstr);
+   pg_free(envstr);

+ #else
+   SetEnvironmentVariableA(var, val);
+ #endif

The fact that there is no such free() in pg_regress is not an oversight
or shortcut.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix for pg_upgrade's forcing pg_controldata into English

2010-09-01 Thread Bruce Momjian

Tom Lane wrote:
> Bruce Momjian  writes:
> > Tom Lane wrote:
> >> However, that's something for 9.1 and beyond.  Bruce's immediate problem
> >> is what to do in pg_upgrade in 9.0, and there I concur that he should
> >> duplicate what pg_regress is doing.
> 
> > OK, here is a patch that sets all the variables that pg_regress.c sets.
> 
> I certainly hope that pg_regress isn't freeing the strings it passes
> to putenv() ...

pg_regress does not restore these settings (it says with C/English) so
the code is different.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

2010-09-01 Thread Tom Lane

I wrote:
> Jeff Davis  writes:
>> I have attached regression.diffs after a "make check". The diffs don't
>> look trivial, and actually look quite strange to me.

> Hmm, sorta looks like that breaks something related to checking for
> composite types.  That would be a bug.  Will look.

So here is the culprit:

#0  0x007c08e7 in flush_rowtype_cache (type_id=49022) at typcache.c:531
#1  0x007b4797 in RelationClearRelation (relation=0x7fc490ccebe0, 
rebuild=0 '\000') at relcache.c:1929
#2  0x007b4131 in RelationClose (relation=0x7fc490ccebe0)
at relcache.c:1680
#3  0x00472721 in relation_close (relation=0x7fc490ccebe0, lockmode=1)
at heapam.c:1051
#4  0x007c01fe in lookup_type_cache (type_id=49022, flags=64)
at typcache.c:299
#5  0x007c023f in lookup_rowtype_tupdesc_internal (type_id=49022, 
typmod=-1, noError=0 '\000') at typcache.c:321
#6  0x007c03e7 in lookup_rowtype_tupdesc_copy (type_id=49022, 
typmod=-1) at typcache.c:395
#7  0x007cd087 in get_expr_result_type (expr=0x1395d08, 
resultTypeId=0x0, resultTupleDesc=0x7fffbd6e5830) at funcapi.c:256

lookup_type_cache() opens the relcache entry to copy its tupdesc, does
so, and closes the relcache entry again.  But then (if
RELCACHE_FORCE_RELEASE is on) we blow away the relcache entry ...
and RelationClearRelation carefully reaches around and blows away
the tupdesc in the typcache too!

So that's overly aggressive, but I'm not sure what's a simple fix for
it.  If we just take out that flush_rowtype_cache call in
RelationClearRelation, I think we run the risk of the typcache not being
flushed when it needs to be, namely when an ALTER TABLE changes the
table's rowtype at an instant when some particular backend doesn't
have an active relcache entry for it.

Probably the best fix would be to make typcache flushing fully
independent of the relcache, but that would mean making sure that all
ALTER TABLE variants that affect the rowtype will issue an explicit
typcache flush.  That seems a bit too invasive to be back-patchable.
I'm not entirely sure this sort of failure can occur without
RELCACHE_FORCE_RELEASE, but I'm definitely not sure it can't, so a
backpatchable fix would be nice.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] "serializable" in comments and names

2010-09-01 Thread Robert Haas

On Sep 1, 2010, at 11:02 AM, "Kevin Grittner"  
wrote:
> The patch to implement true serializable transactions using SSI
> renames/rewords these things to avoid confusion.  However, there are
> seven files which are changed only for this reason, and another
> where there is one "real" change of two lines hidden in the midst of
> dozens of lines of such wording changes.  I find it distracting to
> have all this mixed in, and I fear that it will be a time-waster for
> anyone reviewing the meat of the patch.  I'd like to submit a
> "no-op" patch to cover these issues in advance of the CF, to get
> them out of the way.  Joe suggested that I pose the issue to the
> -hackers list.

+1.

> I could knock out a couple other files from the main patch if people
> considered it acceptable to enable the SHMQueueIsDetached function
> now, which the patch uses in several places within asserts.  I would
> remove the #ifdef NOT_USED from around the (very short) function,
> and add it to the .h file.

-1.

> The changes to the comments and local variables seem pretty safe. 
> The change of IsXactIsoLevelSerializable to
> IsXactIsoLevelXactSnapshotBased (or whatever name the community
> prefers)

How about IsXactIsoLevelSnapshot?  Just to be a bit shorter.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Path question

2010-09-01 Thread Robert Haas

On Sep 1, 2010, at 10:21 AM, Greg Stark  wrote:
> For what it's worth I disagree with Tom. I think this is a situation
> where we need *both* types of solution. Ideally we will be able to use
> a plain Append node for cases where we know the relative ordering of
> the data in different partitions, but there will always be cases where
> the structured partition data doesn't actually match up with the
> ordering requested and we'll need to fall back to a merge-append node.

I agree. Explicit partitioning may open up some additional optimization 
possibilities in certain cases, but Merge Append is more general and extremely 
valuable in its own right.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix for pg_upgrade's forcing pg_controldata into English

2010-09-01 Thread Tom Lane

Bruce Momjian  writes:
> Tom Lane wrote:
>> However, that's something for 9.1 and beyond.  Bruce's immediate problem
>> is what to do in pg_upgrade in 9.0, and there I concur that he should
>> duplicate what pg_regress is doing.

> OK, here is a patch that sets all the variables that pg_regress.c sets.

I certainly hope that pg_regress isn't freeing the strings it passes
to putenv() ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

2010-09-01 Thread Tom Lane

Jeff Davis  writes:
> On Wed, 2010-09-01 at 15:31 -0400, Tom Lane wrote:
>> Jeff Davis  writes:
>>> Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my
>>> machine.

>> What happens exactly?

> I have attached regression.diffs after a "make check". The diffs don't
> look trivial, and actually look quite strange to me.

Hmm, sorta looks like that breaks something related to checking for
composite types.  That would be a bug.  Will look.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix for pg_upgrade's forcing pg_controldata into English

2010-09-01 Thread Bruce Momjian

Tom Lane wrote:
> Alvaro Herrera  writes:
> > Excerpts from Bruce Momjian's message of mi?? sep 01 16:35:18 -0400 2010:
> >> I have implemented your suggestion of setting LANG, LANGUAGE, and
> >> LC_MESSAGES based on code in pg_regress.c;  that is the only place I see
> >> where we force English, and it certainly has received more testing.
> 
> > I think a real long-term solution is to have a machine-readable format
> > for pg_controldata, perhaps something very simple like
> 
> > $ pg_controldata --machine
> > PGCONTROL_VERSION_NUMBER=903
> > CATALOG_VERSION_NUMBER=201008051
> > DATABASE_SYSIDENTIFIER=5504177303240039672
> > etc
> 
> > This wouldn't be subject to translation and thus much easier to process.
> 
> +1.  pg_controldata was never written with the idea that its output
> would be read by programs.  If we're going to start doing that, we
> should provide an output format that's suitable, not try to work around
> it in the callers.
> 
> However, that's something for 9.1 and beyond.  Bruce's immediate problem
> is what to do in pg_upgrade in 9.0, and there I concur that he should
> duplicate what pg_regress is doing.

OK, here is a patch that sets all the variables that pg_regress.c sets.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
Index: contrib/pg_upgrade/controldata.c
===
RCS file: /cvsroot/pgsql/contrib/pg_upgrade/controldata.c,v
retrieving revision 1.9
diff -c -c -r1.9 controldata.c
*** contrib/pg_upgrade/controldata.c	6 Jul 2010 19:18:55 -	1.9
--- contrib/pg_upgrade/controldata.c	1 Sep 2010 22:38:59 -
***
*** 11,16 
--- 11,17 
  
  #include 
  
+ static void putenv2(migratorContext *ctx, const char *var, const char *val);
  
  /*
   * get_control_data()
***
*** 51,69 
  	bool		got_toast = false;
  	bool		got_date_is_int = false;
  	bool		got_float8_pass_by_value = false;
  	char	   *lang = NULL;
  
  	/*
! 	 * Because we test the pg_resetxlog output strings, it has to be in
! 	 * English.
  	 */
  	if (getenv("LANG"))
  		lang = pg_strdup(ctx, getenv("LANG"));
  #ifndef WIN32
! 	putenv(pg_strdup(ctx, "LANG=C"));
  #else
! 	SetEnvironmentVariableA("LANG", "C");
  #endif
  	snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/%s \"%s\"" SYSTEMQUOTE,
  			 cluster->bindir,
  			 live_check ? "pg_controldata\"" : "pg_resetxlog\" -n",
--- 52,106 
  	bool		got_toast = false;
  	bool		got_date_is_int = false;
  	bool		got_float8_pass_by_value = false;
+ 	char	   *lc_collate = NULL;
+ 	char	   *lc_ctype = NULL;
+ 	char	   *lc_monetary = NULL;
+ 	char	   *lc_numeric = NULL;
+ 	char	   *lc_time = NULL;
  	char	   *lang = NULL;
+ 	char	   *language = NULL;
+ 	char	   *lc_all = NULL;
+ 	char	   *lc_messages = NULL;
  
  	/*
! 	 * Because we test the pg_resetxlog output as strings, it has to be in
! 	 * English.  Copied from pg_regress.c.
  	 */
+ 	if (getenv("LC_COLLATE"))
+ 		lc_collate = pg_strdup(ctx, getenv("LC_COLLATE"));
+ 	if (getenv("LC_CTYPE"))
+ 		lc_ctype = pg_strdup(ctx, getenv("LC_CTYPE"));
+ 	if (getenv("LC_MONETARY"))
+ 		lc_monetary = pg_strdup(ctx, getenv("LC_MONETARY"));
+ 	if (getenv("LC_NUMERIC"))
+ 		lc_numeric = pg_strdup(ctx, getenv("LC_NUMERIC"));
+ 	if (getenv("LC_TIME"))
+ 		lc_time = pg_strdup(ctx, getenv("LC_TIME"));
  	if (getenv("LANG"))
  		lang = pg_strdup(ctx, getenv("LANG"));
+ 	if (getenv("LANGUAGE"))
+ 		language = pg_strdup(ctx, getenv("LANGUAGE"));
+ 	if (getenv("LC_ALL"))
+ 		lc_all = pg_strdup(ctx, getenv("LC_ALL"));
+ 	if (getenv("LC_MESSAGES"))
+ 		lc_messages = pg_strdup(ctx, getenv("LC_MESSAGES"));
+ 
+ 	putenv2(ctx, "LC_COLLATE", NULL);
+ 	putenv2(ctx, "LC_CTYPE", NULL);
+ 	putenv2(ctx, "LC_MONETARY", NULL);
+ 	putenv2(ctx, "LC_NUMERIC", NULL);
+ 	putenv2(ctx, "LC_TIME", NULL);
+ 	putenv2(ctx, "LANG",
  #ifndef WIN32
! 			NULL);
  #else
! 			/* On Windows the default locale cannot be English, so force it */
! 			"en");
  #endif
+ 	putenv2(ctx, "LANGUAGE", NULL);
+ 	putenv2(ctx, "LC_ALL", NULL);
+ 	putenv2(ctx, "LC_MESSAGES", "C");
+ 
  	snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/%s \"%s\"" SYSTEMQUOTE,
  			 cluster->bindir,
  			 live_check ? "pg_controldata\"" : "pg_resetxlog\" -n",
***
*** 334,361 
  	if (output)
  		pclose(output);
  
! 	/* restore LANG */
! 	if (lang)
! 	{
! #ifndef WIN32
! 		char	   *envstr = (char *) pg_malloc(ctx, strlen(lang) + 6);
! 
! 		sprintf(envstr, "LANG=%s", lang);
! 		putenv(envstr);
! #else
! 		SetEnvironmentVariableA("LANG", lang);
! #endif
! 		pg_free(lang);
! 	}
! 	else
! 	{
! #ifndef WIN32
! 		unsetenv("LANG");
! #else
! 		SetEnvironmentVariableA("LANG", "");
! #endif
! 	}
! 
  	/* verify that we got all the mandatory pg_control data */
  	if (!got_xid || !got_oid ||
  		(!live_check && !got_log_id) ||
--- 371,399 
  	if (output)
  		pclose(output);
  
! 	/*
! 	 *	Restore environ

Re: [HACKERS] Fix for pg_upgrade's forcing pg_controldata into English

2010-09-01 Thread Tom Lane

Alvaro Herrera  writes:
> Excerpts from Bruce Momjian's message of miÃ© sep 01 16:35:18 -0400 2010:
>> I have implemented your suggestion of setting LANG, LANGUAGE, and
>> LC_MESSAGES based on code in pg_regress.c;  that is the only place I see
>> where we force English, and it certainly has received more testing.

> I think a real long-term solution is to have a machine-readable format
> for pg_controldata, perhaps something very simple like

> $ pg_controldata --machine
> PGCONTROL_VERSION_NUMBER=903
> CATALOG_VERSION_NUMBER=201008051
> DATABASE_SYSIDENTIFIER=5504177303240039672
> etc

> This wouldn't be subject to translation and thus much easier to process.

+1.  pg_controldata was never written with the idea that its output
would be read by programs.  If we're going to start doing that, we
should provide an output format that's suitable, not try to work around
it in the callers.

However, that's something for 9.1 and beyond.  Bruce's immediate problem
is what to do in pg_upgrade in 9.0, and there I concur that he should
duplicate what pg_regress is doing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix for pg_upgrade's forcing pg_controldata into English

2010-09-01 Thread Alvaro Herrera

Excerpts from Bruce Momjian's message of mié sep 01 16:35:18 -0400 2010:

> I have implemented your suggestion of setting LANG, LANGUAGE, and
> LC_MESSAGES based on code in pg_regress.c;  that is the only place I see
> where we force English, and it certainly has received more testing.

I think a real long-term solution is to have a machine-readable format
for pg_controldata, perhaps something very simple like

$ pg_controldata --machine
PGCONTROL_VERSION_NUMBER=903
CATALOG_VERSION_NUMBER=201008051
DATABASE_SYSIDENTIFIER=5504177303240039672
etc

This wouldn't be subject to translation and thus much easier to process.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Simon Riggs

On Wed, 2010-09-01 at 13:23 +0300, Heikki Linnakangas wrote:
> On 01/09/10 10:53, Fujii Masao wrote:
> > Before discussing about that, we should determine whether registering
> > standbys in master is really required. It affects configuration a lot.
> > Heikki thinks that it's required, but I'm still unclear about why and
> > how.
> >
> > Why do standbys need to be registered in master? What information
> > should be registered?
> 
> That requirement falls out from the handling of disconnected standbys. 
> If a standby is not connected, what does the master do with commits? If 
> the answer is anything else than acknowledge them to the client 
> immediately, as if the standby never existed, the master needs to know 
> what standby servers exist. Otherwise it can't know if all the standbys 
> are connected or not.

"All the standbys" presupposes that we know what they are, i.e. we have
registered them, so I see that argument as circular. Quorum commit does
not need registration, so quorum commit is the "easy to implement"
option and registration is the more complex later feature. I don't have
a problem with adding registration later and believe it can be done
later without issues.

> >> What does synchronous replication mean, when is a transaction
> >> acknowledged as committed?
> >
> > I proposed four synchronization levels:
> >
> > 1. async
> >doesn't make transaction commit wait for replication, i.e.,
> >asynchronous replication. This mode has been already supported in
> >9.0.
> >
> > 2. recv
> >makes transaction commit wait until the standby has received WAL
> >records.
> >
> > 3. fsync
> >makes transaction commit wait until the standby has received and
> >flushed WAL records to disk
> >
> > 4. replay
> >makes transaction commit wait until the standby has replayed WAL
> >records after receiving and flushing them to disk
> >
> > OTOH, Simon proposed the quorum commit feature. I think that both
> > is required for various our use cases. Thought?
> 
> I'd like to keep this as simple as possible, yet flexible so that with 
> enough scripting and extensions, you can get all sorts of behavior. I 
> think quorum commit falls into the "extension" category; if you're setup 
> is complex enough, it's going to be impossible to represent that in our 
> config files no matter what. But if you write a little proxy, you can 
> implement arbitrary rules there.
> 
> I think recv/fsync/replay should be specified in the standby. 

I think the wait mode (i.e. recv/fsync/replay or others) should be
specified in the master. This allows the application to specify whatever
level of protection it requires, and also allows the behaviour to be
different for user-specifiable parts of the application. As soon as you
set this on the standby then you have the one-size fits all approach to
synchronisation.

We already know performance of synchronous rep is poor, which is exactly
why I want to be able to control it at the application level. Fine
grained control is important, otherwise we may as well just use DRBD and
skip this project completely, since we already have that. It will also
be a feature that no other database has, taking us truly beyond what has
gone before.

The master/standby decision is not something that is easily changed.
Whichever we decide now will be the thing we stick with.

> It has no 
> direct effect on the master, the master would just relay the setting to 
> the standby when it connects, or the standby would send multiple 
> XLogRecPtrs and let the master decide when the WAL is persistent enough. 
> And what if you write a proxy that has some other meaning of "persistent 
> enough"? Like when it has been written to the OS buffers but not yet 
> fsync'd, or when it has been fsync'd to at least one standby and 
> received by at least three others. recv/fsync/replay is not going to 
> represent that behavior well.
> 
> "sync vs async" on the other hand should be specified in the master, 
> because it has a direct impact on the behavior of commits in the master.
> 

> I propose a configuration file standbys.conf, in the master:
> 
> # STANDBY NAMESYNCHRONOUS   TIMEOUT
> importantreplica  yes   100ms
> tempcopy  no10s
> 
> Or perhaps this should be stored in a system catalog.

That part sounds like complexity that can wait until later. I would not
object if you really want this, but would prefer it to look like this:

# STANDBY NAMEDEFAULT_WAIT_MODE   TIMEOUT
importantreplica  sync  100ms
tempcopy  async 10s

You don't *have* to use the application level control if you don't want
it. But its an important capability for real world apps, since the
alternative is deliberately splitting an application across two database
servers each with different wait modes.

> >> What to do if a standby server dies and never
> >> acknowledges a commit?
> >
> > The master's reaction to that situation should be configurab

[HACKERS] Fix for pg_upgrade's forcing pg_controldata into English

2010-09-01 Thread Bruce Momjian

Tom Lane wrote:
> Bruce Momjian  writes:
> > Greg Sabino Mullane wrote:
> >> Specifically, LANGUAGE changes the headers of pg_controldata 
> >> (but not the actual output, LC_ALL does that). Thanks for the 
> >> nudge, I'll get to rewriting some code.
> 
> > pg_upgrade does this in controldata.c for this exact reason:
> 
> > /*
> >  * Because we test the pg_resetxlog output strings, it has to be in
> >  * English.
> >  */
> > if (getenv("LANG"))
> > lang = pg_strdup(ctx, getenv("LANG"));
> > #ifndef WIN32
> > putenv(pg_strdup(ctx, "LANG=C"));
> > #else
> > SetEnvironmentVariableA("LANG", "C");
> > #endif
> 
> You do realize that's far from bulletproof?  To be sure that that does
> anything, you'd need to set (or unset) LC_ALL and LC_MESSAGES as well.
> And I thought Windows spelled it LANGUAGE not LANG ...

I have implemented your suggestion of setting LANG, LANGUAGE, and
LC_MESSAGES based on code in pg_regress.c;  that is the only place I see
where we force English, and it certainly has received more testing.

Should I set LC_ALL too?  The other variables mentioned in pg_regress.c?
Is this something I should patch to 9.0.X?

Patch attached.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
Index: contrib/pg_upgrade/controldata.c
===
RCS file: /cvsroot/pgsql/contrib/pg_upgrade/controldata.c,v
retrieving revision 1.9
diff -c -c -r1.9 controldata.c
*** contrib/pg_upgrade/controldata.c	6 Jul 2010 19:18:55 -	1.9
--- contrib/pg_upgrade/controldata.c	1 Sep 2010 20:30:45 -
***
*** 11,16 
--- 11,17 
  
  #include 
  
+ static void putenv2(migratorContext *ctx, const char *var, const char *val);
  
  /*
   * get_control_data()
***
*** 52,69 
  	bool		got_date_is_int = false;
  	bool		got_float8_pass_by_value = false;
  	char	   *lang = NULL;
  
  	/*
! 	 * Because we test the pg_resetxlog output strings, it has to be in
! 	 * English.
  	 */
  	if (getenv("LANG"))
  		lang = pg_strdup(ctx, getenv("LANG"));
  #ifndef WIN32
! 	putenv(pg_strdup(ctx, "LANG=C"));
  #else
! 	SetEnvironmentVariableA("LANG", "C");
  #endif
  	snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/%s \"%s\"" SYSTEMQUOTE,
  			 cluster->bindir,
  			 live_check ? "pg_controldata\"" : "pg_resetxlog\" -n",
--- 53,82 
  	bool		got_date_is_int = false;
  	bool		got_float8_pass_by_value = false;
  	char	   *lang = NULL;
+ 	char	   *language = NULL;
+ 	char	   *lc_messages = NULL;
  
  	/*
! 	 * Because we test the pg_resetxlog output as strings, it has to be in
! 	 * English.  Copied from pg_regress.c.
  	 */
  	if (getenv("LANG"))
  		lang = pg_strdup(ctx, getenv("LANG"));
+ 	if (getenv("LANGUAGE"))
+ 		language = pg_strdup(ctx, getenv("LANGUAGE"));
+ 	if (getenv("LC_MESSAGES"))
+ 		lc_messages = pg_strdup(ctx, getenv("LC_MESSAGES"));
+ 
+ 	putenv2(ctx, "LANG",
  #ifndef WIN32
! 			NULL);
  #else
! 			/* On Windows the default locale cannot be English, so force it */
! 			"en");
  #endif
+ 	putenv2(ctx, "LANGUAGE", NULL);
+ 	putenv2(ctx, "LC_MESSAGES", "C");
+ 
  	snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/%s \"%s\"" SYSTEMQUOTE,
  			 cluster->bindir,
  			 live_check ? "pg_controldata\"" : "pg_resetxlog\" -n",
***
*** 334,361 
  	if (output)
  		pclose(output);
  
! 	/* restore LANG */
! 	if (lang)
! 	{
! #ifndef WIN32
! 		char	   *envstr = (char *) pg_malloc(ctx, strlen(lang) + 6);
! 
! 		sprintf(envstr, "LANG=%s", lang);
! 		putenv(envstr);
! #else
! 		SetEnvironmentVariableA("LANG", lang);
! #endif
! 		pg_free(lang);
! 	}
! 	else
! 	{
! #ifndef WIN32
! 		unsetenv("LANG");
! #else
! 		SetEnvironmentVariableA("LANG", "");
! #endif
! 	}
! 
  	/* verify that we got all the mandatory pg_control data */
  	if (!got_xid || !got_oid ||
  		(!live_check && !got_log_id) ||
--- 347,360 
  	if (output)
  		pclose(output);
  
! 	/* restore environment variable settings */
! 	putenv2(ctx, "LANG", lang);
! 	putenv2(ctx, "LANGUAGE", language);
! 	putenv2(ctx, "LC_MESSAGES", lc_messages);
! 	pg_free(lang);
! 	pg_free(language);
! 	pg_free(lc_messages);
!  
  	/* verify that we got all the mandatory pg_control data */
  	if (!got_xid || !got_oid ||
  		(!live_check && !got_log_id) ||
***
*** 492,494 
--- 491,524 
  		pg_log(ctx, PG_FATAL, "Unable to rename %s to %s.\n", old_path, new_path);
  	check_ok(ctx);
  }
+ 
+ 
+ /*
+  *	This is like putenv(), but takes two arguments
+  *	It also does unsetenv() if val is NULL.
+  */
+ static void
+ putenv2(migratorContext *ctx, const char *var, const char *val)
+ {
+ 	if (val)
+ 	{
+ #ifndef WIN32
+ 		char	   *envstr = (char *) pg_malloc(ctx, strlen(var) +
+ strlen(val) + 1);
+ 
+ 		sprintf(envstr, "%s=%s", var, val);
+ 		putenv(envstr);
+ 		pg_free(envstr);
+ #else
+ 		Se

Re: [HACKERS] compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

2010-09-01 Thread Tom Lane

Jeff Davis  writes:
> Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my
> machine.

What happens exactly?

> Is it supposed to pass?

It shouldn't crash, but I'm not certain whether you'd see any
visible diffs in the tests.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Pavel Stehule

2010/9/1 Tom Lane :
> Pavel Stehule  writes:
>> 2010/9/1 Tom Lane :
>>> Well, you can build your own version of array_agg with the same
>>> implementation, except you mark the transition function as strict ...
>
>> I am checking this now, and it is not possible - it needs a some
>> initial value and there isn't possible to set a "internal" value.
>
> Well, you can cheat a bit ...
>
> regression=# create or replace function array_agg_transfn_strict(internal, 
> anyelement) returns internal as 'array_agg_transfn' language internal 
> immutable;
> CREATE FUNCTION
> regression=# create aggregate array_agg_strict(anyelement) (stype = internal,
> sfunc = array_agg_transfn_strict, finalfunc = array_agg_finalfn);
> CREATE AGGREGATE
> regression=# create or replace function array_agg_transfn_strict(internal, 
> anyelement) returns internal as 'array_agg_transfn' language internal strict 
> immutable;
> CREATE FUNCTION
>

nice dark trick :) -  but it doesn't work

ERROR:  aggregate 16395 needs to have compatible input type and transition type
postgres=#

Pavel



>                        regards, tom lane
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Tom Lane

Pavel Stehule  writes:
> 2010/9/1 Tom Lane :
>> Well, you can build your own version of array_agg with the same
>> implementation, except you mark the transition function as strict ...

> I am checking this now, and it is not possible - it needs a some
> initial value and there isn't possible to set a "internal" value.

Well, you can cheat a bit ...

regression=# create or replace function array_agg_transfn_strict(internal, 
anyelement) returns internal as 'array_agg_transfn' language internal immutable;
CREATE FUNCTION
regression=# create aggregate array_agg_strict(anyelement) (stype = internal,
sfunc = array_agg_transfn_strict, finalfunc = array_agg_finalfn);
CREATE AGGREGATE
regression=# create or replace function array_agg_transfn_strict(internal, 
anyelement) returns internal as 'array_agg_transfn' language internal strict 
immutable;
CREATE FUNCTION

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

2010-09-01 Thread Jeff Davis

Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my
machine.

Is it supposed to pass?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Pavel Stehule

2010/9/1 Tom Lane :
> "David E. Wheeler"  writes:
>> On Sep 1, 2010, at 11:09 AM, Pavel Stehule wrote:
>>> Then you can eliminate NULLs with simple function
>
>> Kind of defeats the purpose of the efficiency of the aggregate.
>
> Well, you can build your own version of array_agg with the same
> implementation, except you mark the transition function as strict ...
>

I am checking this now, and it is not possible - it needs a some
initial value and there isn't possible to set a "internal" value.
probably some C coding is necessary.

Regards

Pavel

>                        regards, tom lane
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Tom Lane

"David E. Wheeler"  writes:
> On Sep 1, 2010, at 11:09 AM, Pavel Stehule wrote:
>> Then you can eliminate NULLs with simple function

> Kind of defeats the purpose of the efficiency of the aggregate.

Well, you can build your own version of array_agg with the same
implementation, except you mark the transition function as strict ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Tom Lane

"David E. Wheeler"  writes:
> On Sep 1, 2010, at 10:30 AM, Tom Lane wrote:
>> Most aggregate functions ignore null inputs, so that rows in which
>> one or more of the expression(s) yield null are discarded.  (This
>> can be assumed to be true, unless otherwise specified, for all
>> built-in aggregates.)

> I don't think you need the parentheses, though without them, "This" might be 
> better written as "The ignoring of NULLs".

Done, without the parentheses.  I didn't add "The ignoring of NULLs",
it seemed a bit too verbose.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread David E. Wheeler

On Sep 1, 2010, at 11:09 AM, Pavel Stehule wrote:

> Then you can eliminate NULLs with simple function
> 
> CREATE OR REPLACE FUNCTION remove_null(anyarray)
> RETURNS anyarray AS $$
> SELECT ARRAY(SELECT x FROM unnest($1) g(x) WHERE x IS NOT NULL)
> $$ LANGUAGE sql;

Kind of defeats the purpose of the efficiency of the aggregate.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Pavel Stehule

2010/9/1 Thom Brown :
> On 1 September 2010 18:47, David Fetter  wrote:
>> On Wed, Sep 01, 2010 at 08:16:41AM -0700, David Wheeler wrote:
>>> On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:
>>>
>>> > Docs is wrong :) I like current implementation.  You can remove a
>>> > NULLs from aggregation very simply, but different direction isn't
>>> > possible
>>>
>>> Would appreciate the recipe for removing the NULLs.
>>
>> WHERE clause :P
>
> There may be cases where that's undesirable, such as there being more
> than one aggregate in the SELECT list, or the column being grouped on
> needing to return rows regardless as to whether there's NULLs in the
> column being targeted by array_agg() or not.

Then you can eliminate NULLs with simple function

CREATE OR REPLACE FUNCTION remove_null(anyarray)
RETURNS anyarray AS $$
SELECT ARRAY(SELECT x FROM unnest($1) g(x) WHERE x IS NOT NULL)
$$ LANGUAGE sql;
> --
> Thom Brown
> Twitter: @darkixion
> IRC (freenode): dark_ixion
> Registered Linux user: #516935
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread David E. Wheeler

On Sep 1, 2010, at 10:30 AM, Tom Lane wrote:

> Hm, actually the whole para needs work.  It was designed at a time when
> DISTINCT automatically discarded nulls, which isn't true anymore, and
> that fact was patched-in in a very awkward way too.  Perhaps something
> like
> 
>The first form of aggregate expression invokes the aggregate
>once for each input row.
>The second form is the same as the first, since
>ALL is the default.
>The third form invokes the aggregate once for each distinct value,
>or set of values, of the expression(s) found in the input rows.
>The last form invokes the aggregate once for each input row; since no
>particular input value is specified, it is generally only useful
>for the count(*) aggregate function.
> 
>Most aggregate functions ignore null inputs, so that rows in which
>one or more of the expression(s) yield null are discarded.  (This
>can be assumed to be true, unless otherwise specified, for all
>built-in aggregates.)

I don't think you need the parentheses, though without them, "This" might be 
better written as "The ignoring of NULLs".

Just my $0.02.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Simon Riggs

On Wed, 2010-09-01 at 08:33 +0300, Heikki Linnakangas wrote:
> On 01/09/10 04:02, Robert Haas wrote:
> >  See the thread on interruptible sleeps.  The problem
> > right now is that there are some polling loops that act to throttle
> > the maximum rate at which a node doing sync rep can make forward
> > progress, independent of the capabilities of the hardware.
> 
> To be precise, the polling doesn't affect the "bandwidth" the 
> replication can handle, but it introduces a delay wh

We're sending the WAL data in batches. We can't really escape from the
fact that we're effectively using group commit when we use synch rep.
That will necessarily increase delay and require more sessions to get
same throughput.

> >  Those need
> > to be replaced with a system that doesn't inject unnecessary delays
> > into the process, which is what Heikki is working on.
> 
> Right.

> Once we're done with that, all the big questions are still left. How to 
> configure it? What does synchronous replication mean, when is a 
> transaction acknowledged as committed? What to do if a standby server 
> dies and never acknowledges a commit? All these issues have been 
> discussed, but there is no consensus yet.

That sounds an awful lot like performance tuning first and the feature
additions last.

And if you're in the middle of performance tuning, surely some objective
performance tests would help us, no?

IMHO we should be concentrating on how to add the next features because
its clear to me that if you do things in the wrong order you'll be
wasting time. And we don't have much of that, ever.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread David E. Wheeler

On Sep 1, 2010, at 10:52 AM, Thom Brown wrote:

>>> ould appreciate the recipe for removing the NULLs.
>> 
>> WHERE clause :P
> 
> There may be cases where that's undesirable, such as there being more
> than one aggregate in the SELECT list, or the column being grouped on
> needing to return rows regardless as to whether there's NULLs in the
> column being targeted by array_agg() or not.

Exactly the issue I ran into:

SELECT name AS distribution,
   array_agg(
   CASE relstatus WHEN 'stable'
   THEN version
   ELSE NULL
   END ORDER BY version) AS stable,
   array_agg(
   CASE relstatus
   WHEN 'testing'
   THEN version
   ELSE NULL
   END ORDER BY version) AS testing
  FROM distributions
 GROUP BY name;

  distribution │  stable   │  testing   
 ──┼───┼
  pair │ {NULL,1.0.0,NULL} │ {0.0.1,NULL,1.2.0}
  pgtap│ {NULL}│ {0.0.1}
 (2 rows)

Annoying.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Thom Brown

On 1 September 2010 18:47, David Fetter  wrote:
> On Wed, Sep 01, 2010 at 08:16:41AM -0700, David Wheeler wrote:
>> On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:
>>
>> > Docs is wrong :) I like current implementation.  You can remove a
>> > NULLs from aggregation very simply, but different direction isn't
>> > possible
>>
>> Would appreciate the recipe for removing the NULLs.
>
> WHERE clause :P

There may be cases where that's undesirable, such as there being more
than one aggregate in the SELECT list, or the column being grouped on
needing to return rows regardless as to whether there's NULLs in the
column being targeted by array_agg() or not.
-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread David Fetter

On Wed, Sep 01, 2010 at 08:16:41AM -0700, David Wheeler wrote:
> On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:
> 
> > Docs is wrong :) I like current implementation.  You can remove a
> > NULLs from aggregation very simply, but different direction isn't
> > possible
> 
> Would appreciate the recipe for removing the NULLs.

WHERE clause :P

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Tom Lane

"David E. Wheeler"  writes:
> On Sep 1, 2010, at 10:12 AM, Tom Lane wrote:
>> Even more to the point, should we deliberately make this vaguer so that
>> we aren't finding ourselves with obsolete text again and again?  You can
>> bet that people adding new aggregates in the future aren't going to
>> think to update this sentence, any more than happened with array_agg.

> Perhaps consult the docs for each aggregate to determine how it handles 
> NULLs.

Hm, actually the whole para needs work.  It was designed at a time when
DISTINCT automatically discarded nulls, which isn't true anymore, and
that fact was patched-in in a very awkward way too.  Perhaps something
like

The first form of aggregate expression invokes the aggregate
once for each input row.
The second form is the same as the first, since
ALL is the default.
The third form invokes the aggregate once for each distinct value,
or set of values, of the expression(s) found in the input rows.
The last form invokes the aggregate once for each input row; since no
particular input value is specified, it is generally only useful
for the count(*) aggregate function.

Most aggregate functions ignore null inputs, so that rows in which
one or more of the expression(s) yield null are discarded.  (This
can be assumed to be true, unless otherwise specified, for all
built-in aggregates.)

Then we have to make sure array_agg is properly documented, but we
don't have to insert something into the description of every single
aggregate, which is what your proposal would require.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread David E. Wheeler

On Sep 1, 2010, at 10:12 AM, Tom Lane wrote:

> I think when that text was written, it was meant to imply "all the
> aggregates defined in SQL92".  There seems to be a lot of confusion
> in this thread about whether "standard" means "defined by SQL spec"
> or "built-in in Postgres".  Should we try to refine the wording to
> clarify that?

Yes please.

> Even more to the point, should we deliberately make this vaguer so that
> we aren't finding ourselves with obsolete text again and again?  You can
> bet that people adding new aggregates in the future aren't going to
> think to update this sentence, any more than happened with array_agg.

Perhaps “consult the docs for each aggregate to determine how it handles NULLs.”

Best,

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Tom Lane

"David E. Wheeler"  writes:
> *** 1543,1549 
>   The first form of aggregate expression invokes the aggregate
>   across all input rows for which the given expression(s) yield
>   non-null values.  (Actually, it is up to the aggregate function
> ! whether to ignore null values or not — but all the standard ones 
> do.)
>   The second form is the same as the first, since
>   ALL is the default.  The third form invokes the
>   aggregate for all distinct values of the expressions found
> --- 1543,1550 
>   The first form of aggregate expression invokes the aggregate
>   across all input rows for which the given expression(s) yield
>   non-null values.  (Actually, it is up to the aggregate function
> ! whether to ignore null values or not — but all the standard
> ! ones except array_agg do.)
>   The second form is the same as the first, since
>   ALL is the default.  The third form invokes the
>   aggregate for all distinct values of the expressions found

I think when that text was written, it was meant to imply "all the
aggregates defined in SQL92".  There seems to be a lot of confusion
in this thread about whether "standard" means "defined by SQL spec"
or "built-in in Postgres".  Should we try to refine the wording to
clarify that?

Even more to the point, should we deliberately make this vaguer so that
we aren't finding ourselves with obsolete text again and again?  You can
bet that people adding new aggregates in the future aren't going to
think to update this sentence, any more than happened with array_agg.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread David E. Wheeler

On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:

>> So are the docs right, or is array_agg() right?
> 
> Docs is wrong :) I like current implementation. You can remove a NULLs
> from aggregation very simply, but different direction isn't possible

Patch:

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 9f91939..e301019 100644
*** a/doc/src/sgml/syntax.sgml
--- b/doc/src/sgml/syntax.sgml
*** sqrt(2)
*** 1543,1549 
  The first form of aggregate expression invokes the aggregate
  across all input rows for which the given expression(s) yield
  non-null values.  (Actually, it is up to the aggregate function
! whether to ignore null values or not — but all the standard ones 
do.)
  The second form is the same as the first, since
  ALL is the default.  The third form invokes the
  aggregate for all distinct values of the expressions found
--- 1543,1550 
  The first form of aggregate expression invokes the aggregate
  across all input rows for which the given expression(s) yield
  non-null values.  (Actually, it is up to the aggregate function
! whether to ignore null values or not — but all the standard
! ones except array_agg do.)
  The second form is the same as the first, since
  ALL is the default.  The third form invokes the
  aggregate for all distinct values of the expressions found

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Robert Haas

On Wed, Sep 1, 2010 at 6:23 AM, Heikki Linnakangas
 wrote:
> I'm not sure what we should aim for in the first phase. But if you want as
> little code as possible yet have something useful, I think 'replay' mode
> with no standby registration is the way to go.

IMHO, less is more.  Trying to do too much at once can cause us to
miss the release window (and can also create more bugs).  We just need
to leave the door open to adding later whatever we leave out now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread David E. Wheeler

On Sep 1, 2010, at 1:06 AM, Thom Brown wrote:

>> I think it might be both.  array_agg doesn't return NULL, it returns
>> an array which contains NULL.
> 
> The second I wrote that, I realised it was b*ll%$ks, as I was still in
> the process of waking up.

I know that feeling.

/me sips his coffee

Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] permissions bug in RI checks?

2010-09-01 Thread Tom Lane

David Christensen  writes:
> In doing a schema upgrade, we noticed the following behavior, which certainly 
> seems like a bug.  Steps to reproduce:
> ...
> The bug in this case is that "b" has full permissions on all of the
> underlying tables, but runs into issues when trying to access the
> referenced tables.

Permissions checks for RI operations involve the owner of the table,
from whom you've revoked all permissions.  If the RI operations were
done as the caller, as you seem to expect, that would *not* be an
improvement; callers would have to have more privileges than one really
wants.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] permissions bug in RI checks?

2010-09-01 Thread David Christensen

Hey -hackers,

In doing a schema upgrade, we noticed the following behavior, which certainly 
seems like a bug.  Steps to reproduce:

CREATE USER a;
CREATE USER b;

CREATE TABLE t1 (id serial primary key);
CREATE TABLE t2 (id int references t1(id));

ALTER TABLE t1 OWNER TO a;
ALTER TABLE t2 OWNER TO a;

\c - a

REVOKE ALL ON t1 FROM a;
REVOKE ALL ON t2 FROM a;

GRANT ALL ON t1 TO b;
GRANT ALL ON t2 TO b;

\c - b

INSERT INTO t2 (id) VALUES (1);

ERROR:  permission denied for relation t1
CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."t1" x WHERE "id" 
OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"

The bug in this case is that "b" has full permissions on all of the underlying 
tables, but runs into issues when trying to access the referenced tables.  I 
traced this down to the RI checks, specifically the portion in ri_PlanCheck() 
where it calls SetUserIdAndSecContext() and then later runs the queries in the 
context of the owner of the relation.  Since the owner "a" lacks SELECT and 
UPDATE privileges on the table, it is not able to take the ShareLock, and spits 
out the above error.  This behavior does not occur if the object owner is a 
database superuser.  This is presumably because the superuser bypasses the 
regular ACL checks and succeeds regardless.

The behavior was originally noted in 8.1.21, but exists as well in HEAD.

No real resolution proposed, but I wanted to understand the reason behind the 
restrictions if it was intentional behavior.

Thanks,

David

Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Path question

2010-09-01 Thread PostgreSQL - Hans-Jürgen Schönig

hello tom,

yeah, we have followed quite a lot of discussion as well ...
and yes, no patches.

as far as this problem is concerned: we are working on a patch and did some 
prototyping inside the planner already (attached).
the code we have is pretty limited atm (such as checking for a sort clause 
explicitly and so on - it has no support for windowing related optimizations 
and so on so far).

the cost model is not our problem - it is a lot easier to read than the code we 
are fighting with (it is a different level of complexity). i think costs can be 
handled.

yes, this merging adds some costs for sure. we might see a hell amount of 
operators being called - but i think given a reasonable number of partitions it 
is still a lot cheaper than actually resorting ... and, it is a lot more space 
efficient.
in my practical case i cannot resort because we would simply run out of space. 
an index scan is expensive but needs no additional sort space ...
and, merge is O(n) which sort is clearly not.

advise is highly appreciated.

many thanks,

hans

push-down-sort-into-inh-2.patch
Description: Binary data

On Sep 1, 2010, at 5:00 PM, Tom Lane wrote:

> =?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?=  
> writes:
>> On Sep 1, 2010, at 4:10 PM, Tom Lane wrote:
>>> This is really premature, and anything you do along those lines now will
>>> probably never get committed.
> 
>> well, why non-overlapping? the idea is to make append smart enough to
>> take the sorted lists from below and merge them which will give sorted
>> output as well.
> 
> Well, an extra merge step is going to change the cost comparisons quite
> a bit; see Greg Starks' comments.  But in any case, my point wasn't that
> this is something we should never do; it was that it makes more sense to
> wait till something has happened with explicit partitioning.
> 
>>> The project direction is that we are going to add some explicit
>>> representation of partitioned tables.
> 
>> can you outline some ideas here and maybe point to some useful discussion 
>> here?
> 
> There's been boatloads of discussion of partitioning, and at least one
> submitted patch, over the past year or so ...
> 
>   regards, tom lane
> 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-09-01 Thread Cristian Bittel

Maybe the issue, for the momtent, could be avoided modifying the shared heap
for sessions on Windows. But I don't really have idea how much to increase
or decrease the values. Try and error? But, inside the opened Windows
sessions nothing alerts of a heap exaust so could be unpredictable how much
to change the values until the next PostgreSQL service crash...
32-bits: http://support.microsoft.com/kb/184802

There are several reports for another services with the same behavior
including exit code 128 and a workaround to increase the heap on old Windows
versions but the Exit Code 128 seems to apply to Windows 2003 Server x64
also. And seems to be improved in Windows 2008 where heap is not fixed.
https://fogbugz.bitvise.com/default.asp?WinSSHD.1.12888.2
http://support.microsoft.com/kb/824422

2010/8/31 Bruce Momjian 

> Dave Page wrote:
> > On Tue, Aug 31, 2010 at 4:35 PM, Bruce Momjian  wrote:
> > > Dave Page wrote:
> > >> On Tue, Aug 31, 2010 at 4:27 PM, Bruce Momjian 
> wrote:
> > >> > We have already found that exceeding desktop heap might cause a
> > >> > CreateProcess to return success but later fail with a return code of
> > >> > 128, which causes a server restart.
> > >>
> > >> That doesn't mean that this is desktop heap exhaustion though - just
> > >> that it can cause the same effect.
> > >
> > > Right, but it is the only possible server crash cause we have come up
> > > with so far.
> >
> > Understood - I'm just unconvinced it's the cause - aside from the
> > point I made earlier about heap exhaustion being very predictable and
> > reproducible (which this issue apparently is not), when the server is
> > run under the SCM, it creates a logon session for that service alone
> > which has it's own heap allocation which is entirely independent of
> > the allocation used by any interactive logon sessions.
> >
> > So unless there's a major isolation bug in Windows, any desktop heap
> > usage in an interactive session for one user should have zero effect
> > on a non-interactive session for another user.
>
> Well, the only description that we have ever heard that makes sense is
> some kind of heap exhaustion, perhaps triggered by a Windows bug that
> doesn't properly track heap allocations sometimes.
>
> Of course, the cause might be aliens, but we don't have any evidence of
> that either.  :-|
>
> What we do know is that CreateProcess is returning success, and the
> child is exiting with 128 no_such_child, and that logging out can
> trigger it sometimes.
>
> --
>  Bruce Momjian  http://momjian.us
>  EnterpriseDB http://enterprisedb.com
>
>  + It's impossible for everything to be true. +
>

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread David E. Wheeler

On Sep 1, 2010, at 12:30 AM, Pavel Stehule wrote:

> Docs is wrong :) I like current implementation. You can remove a NULLs
> from aggregation very simply, but different direction isn't possible

Would appreciate the recipe for removing the NULLs.

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] BUG #5305: Postgres service stops when closing Windows session

2010-09-01 Thread Dave Page

On Wed, Sep 1, 2010 at 3:49 PM, Cristian Bittel  wrote:
> Maybe the issue, for the momtent, could be avoided modifying the shared heap
> for sessions on Windows. But I don't really have idea how much to increase
> or decrease the values. Try and error? But, inside the opened Windows
> sessions nothing alerts of a heap exaust so could be unpredictable how much
> to change the values until the next PostgreSQL service crash...
> 32-bits: http://support.microsoft.com/kb/184802
>
> There are several reports for another services with the same behavior
> including exit code 128 and a workaround to increase the heap on old Windows
> versions but the Exit Code 128 seems to apply to Windows 2003 Server x64
> also. And seems to be improved in Windows 2008 where heap is not fixed.
> https://fogbugz.bitvise.com/default.asp?WinSSHD.1.12888.2
> http://support.microsoft.com/kb/824422

Given the unpredictability, if this is connected to desktop heap I
don't think it's running out of per-session memory, so much as the
system-wide heap (which, afaict, is fixed at 48MB). That might explain
why a desktop session could affect other sessions.

Is this a terminal server, with lots of interactive users? Can you
check the heap usage using the desktop heap monitor:
http://www.microsoft.com/downloads/details.aspx?familyid=5cfc9b74-97aa-4510-b4b9-b2dc98c8ed8b&displaylang=en


-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] "serializable" in comments and names

2010-09-01 Thread Kevin Grittner

There are many comments in the code which use "serializable
transaction" to mean "transaction snapshot based transaction".  This
doesn't matter much as long as REPEATABLE READ and SERIALIZABLE
transaction isolation levels behave identically, but will be
confusing and inaccurate when there is any difference between the
two.  In a similar way, the static bool registered_serializable in
snapmgr.c will become misleading, and the IsXactIsoLevelSerializable
macro is bound to lead to confusion.
 
The patch to implement true serializable transactions using SSI
renames/rewords these things to avoid confusion.  However, there are
seven files which are changed only for this reason, and another
where there is one "real" change of two lines hidden in the midst of
dozens of lines of such wording changes.  I find it distracting to
have all this mixed in, and I fear that it will be a time-waster for
anyone reviewing the meat of the patch.  I'd like to submit a
"no-op" patch to cover these issues in advance of the CF, to get
them out of the way.  Joe suggested that I pose the issue to the
-hackers list.
 
I could knock out a couple other files from the main patch if people
considered it acceptable to enable the SHMQueueIsDetached function
now, which the patch uses in several places within asserts.  I would
remove the #ifdef NOT_USED from around the (very short) function,
and add it to the .h file.
 
The changes to the comments and local variables seem pretty safe. 
The change of IsXactIsoLevelSerializable to
IsXactIsoLevelXactSnapshotBased (or whatever name the community
prefers) could break the compile of external code; but the fix would
be pretty obvious.  It's hard to see how the change of a local
variable name would present a lot of risk.  So I see this as an
extremely low-risk no-op patch to lay the groundwork for the main
patch, so that it is easier to read.
 
Thoughts?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Path question

2010-09-01 Thread Tom Lane

=?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?=  
writes:
> On Sep 1, 2010, at 4:10 PM, Tom Lane wrote:
>> This is really premature, and anything you do along those lines now will
>> probably never get committed.

> well, why non-overlapping? the idea is to make append smart enough to
> take the sorted lists from below and merge them which will give sorted
> output as well.

Well, an extra merge step is going to change the cost comparisons quite
a bit; see Greg Starks' comments.  But in any case, my point wasn't that
this is something we should never do; it was that it makes more sense to
wait till something has happened with explicit partitioning.

>> The project direction is that we are going to add some explicit
>> representation of partitioned tables.

> can you outline some ideas here and maybe point to some useful discussion 
> here?

There's been boatloads of discussion of partitioning, and at least one
submitted patch, over the past year or so ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: register/unregister standby Re: [HACKERS] Synchronous replication

2010-09-01 Thread Heikki Linnakangas


On 30/08/10 15:14, Fujii Masao wrote:

I think that the advantage of registering standbys is that we can
specify which WAL files the master has to keep for the upcoming
standby. IMO, it's usually called together with pg_start_backup
as follows:

 SELECT register_standby('foo', pg_start_backup())

This requests the master keep to all the WAL files following the
backup starting location which pg_start_backup returns.


Hmm, that's clever. I was thinking that you'd initialize the standby 
from an existing backup, and in that context the standby would not need 
to connect to the master except via the replication connection. To take 
a base backup, you'll need not only that but also access to the 
filesystem in the master, ie. shell access.


There's been some talk of being able to stream a base backup over the 
replication connection too, which would be extremely handy. And that 
makes my point even stronger that registering a standby should be 
possible via the replication connection.


Of course, you could well expose the functionality as both a built-in 
function and a command in replication mode, so this detail isn't really 
that important right now.



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: ECPG dynamic cursor fix for UPDATE/DELETE ... WHERE CURRENT OF :curname

2010-09-01 Thread Michael Meskes

> 1. The statement
> 
> UPDATE table SET fld1 = :input1
> WHERE CURRENT OF :curname
> RETURNING id + :input2;
> 
> is transformed into
> 
> UPDATE table SET fld1 = $1
> WHERE CURRENT OF $0
> RETURNING id + $2;
> 
> and the $0 is past $1. The current code cannot deal with such
> a messed up order, and scanning the original query twice is
> needed, once for $0 substitution, once for mapping $1, etc. to
> the other input variables.

I cannot seem to reproduce this bug. Could you please send me an example that
makes this reproducible? Yes, I know that I have to change preproc.y to allow
for variable cursor names but in my test case everything seems to work well and
$0 gets replaced by the cursor name.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ 179140304, AIM/Yahoo/Skype michaelmeskes, Jabber mes...@jabber.org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Path question

2010-09-01 Thread PostgreSQL - Hans-Jürgen Schönig

On Sep 1, 2010, at 4:10 PM, Tom Lane wrote:

> Boszormenyi Zoltan  writes:
>> we are experimenting with modifying table partitioning
>> so the ORDER BY clause can be pushed down to
>> child nodes on the grounds that:
> 
> This is really premature, and anything you do along those lines now will
> probably never get committed.  The problem is that the transformation
> you propose is wrong unless the planner can prove that the different
> child tables contain nonoverlapping ranges of the sort key.  Now you
> might be intending to add logic to try to prove that from inspection of
> constraints, but I don't believe that reverse-engineering such knowledge
> on the fly is a sane approach: it will be hugely expensive and will add
> that cost even in many situations where the optimization fails to apply.
> 


well, why non-overlapping? the idea is to make append smart enough to take the 
sorted lists from below and merge them which will give sorted output as well.
my original idea was what you described but given Martijn van Oosterhout's 
posting we were pretty confident that we can get along without non-overlapping 
partitions.


> The project direction is that we are going to add some explicit
> representation of partitioned tables.  After that, the planner can just
> know immediately that a range-partitioned sort key is amenable to this
> treatment, and at that point it'll make sense to work on it.
> 


can you outline some ideas here and maybe point to some useful discussion here?


many thanks,

hans


--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Path question

2010-09-01 Thread Greg Stark

2010/9/1 Boszormenyi Zoltan :
> we are experimenting with modifying table partitioning
> so the ORDER BY clause can be pushed down to
> child nodes on the grounds that:
> 1. smaller datasets are faster to sort, e.g. two datasets that almost
>    spill out to disk are faster to sort in memory and later merge them
>    than the union set that spills out to disk, or two larger sets
>    that spill out to disk are faster to sort individually than the
>    union dataset (because of the longer seeks, etc)

For what it's worth this logic doesn't necessarily hold. Sorting two
data sets in memory is only faster if you only need one result set at
a time. If you know the first set contains only records which come
before the second then you can do this but if not then you'll need all
the result sets to be available at the same time which will require
spilling them to disk. If you have to do that then you might as well
do a tapesort anyways.

> 2. individual child nodes can have indexes that produce
>    the sorted output already

This is the big win. Currently Append nodes mean throwing out any
ordering from the child nodes and that's important information to be
losing.

> Currently I am abusing the AppendPath node but later I will add a new
> executor node that will merge the sorted input coming from the children.

You realize I already did this a few years ago. Search for "merge
append" on the lists. The hard part -- as usual -- ends up being
figuring out in the planner when to do this optimization and how to
avoid exponential growth in the number of plans considered and the
amount of data retained about each plan.


For what it's worth I disagree with Tom. I think this is a situation
where we need *both* types of solution. Ideally we will be able to use
a plain Append node for cases where we know the relative ordering of
the data in different partitions, but there will always be cases where
the structured partition data doesn't actually match up with the
ordering requested and we'll need to fall back to a merge-append node.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Path question

2010-09-01 Thread Tom Lane

Boszormenyi Zoltan  writes:
> we are experimenting with modifying table partitioning
> so the ORDER BY clause can be pushed down to
> child nodes on the grounds that:

This is really premature, and anything you do along those lines now will
probably never get committed.  The problem is that the transformation
you propose is wrong unless the planner can prove that the different
child tables contain nonoverlapping ranges of the sort key.  Now you
might be intending to add logic to try to prove that from inspection of
constraints, but I don't believe that reverse-engineering such knowledge
on the fly is a sane approach: it will be hugely expensive and will add
that cost even in many situations where the optimization fails to apply.

The project direction is that we are going to add some explicit
representation of partitioned tables.  After that, the planner can just
know immediately that a range-partitioned sort key is amenable to this
treatment, and at that point it'll make sense to work on it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Path question

2010-09-01 Thread Boszormenyi Zoltan

Hi,

we are experimenting with modifying table partitioning
so the ORDER BY clause can be pushed down to
child nodes on the grounds that:
1. smaller datasets are faster to sort, e.g. two datasets that almost
spill out to disk are faster to sort in memory and later merge them
than the union set that spills out to disk, or two larger sets
that spill out to disk are faster to sort individually than the
union dataset (because of the longer seeks, etc)
2. individual child nodes can have indexes that produce
the sorted output already

Currently I am abusing the AppendPath node but later I will add a new
executor node that will merge the sorted input coming from the children.

I added the pathkey to the AppendPath to reflect that it produces
sorted output and I am adding the Sort plan to the children.

My problem is:

zozo=# explain select * from t1 where d = '2008-01-01' order by d;
QUERY
PLAN
---
 Result  (cost=8.28..33.13 rows=4 width=40)
   ->  Append  (cost=8.28..33.13 rows=4 width=40)
 ->  Sort  (cost=8.28..8.28 rows=1 width=40)
   Sort Key: public.t1.d
   ->  Index Scan using t1_d_key on t1  (cost=0.00..8.27
rows=1 width=40)
 Index Cond: (d = '2008-01-01'::date)
 ->  Sort  (cost=8.28..8.28 rows=1 width=40)
   Sort Key: public.t1.d
   ->  Index Scan using t1_2008_d_key on t1_2008 t1 
(cost=0.00..8.27 rows=1 width=40)
 Index Cond: (d = '2008-01-01'::date)
 ->  Sort  (cost=8.28..8.28 rows=1 width=40)
   Sort Key: public.t1.d
   ->  Index Scan using t1_2009_d_key on t1_2009 t1 
(cost=0.00..8.27 rows=1 width=40)
 Index Cond: (d = '2008-01-01'::date)
 ->  Sort  (cost=8.28..8.28 rows=1 width=40)
   Sort Key: public.t1.d
   ->  Index Scan using t1_2010_d_key on t1_2010 t1 
(cost=0.00..8.27 rows=1 width=40)
 Index Cond: (d = '2008-01-01'::date)
(18 rows)

In one leaf, e.g.:

 ->  Sort  (cost=8.28..8.28 rows=1 width=40)
   Sort Key: public.t1.d
   ->  Index Scan using t1_2010_d_key on t1_2010 t1 
(cost=0.00..8.27 rows=1 width=40)
 Index Cond: (d = '2008-01-01'::date)

The plan is scanning the t_2010 child table, but the Sort is trying to
sort by the fully qualified parent table. I think this is a problem
but I don't know how to solve it. I have tried transforming the
parent query with

 adjust_appendrel_attrs((Node *) parse, appinfo)

where parse is

 Query  *parse = root->parse;

in set_append_rel_pathlist() and the transformed query trees are
used for the children with

make_sort_from_sortclauses(root, query->sortClause, subplan)

in create_append_plan(). adjust_appendrel_attrs() seems to be prepared
to be called with a Query * , so I don't know why the above leaf plan
doesn't show "Sort Key: public.t1_2010.d" and so on.

Can someone help me?

Best regards,
Zoltán Böszötményi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] git: uh-oh

2010-09-01 Thread Magnus Hagander

On Wed, Sep 1, 2010 at 02:33, Robert Haas  wrote:
> On Tue, Aug 31, 2010 at 5:07 PM, Magnus Hagander  wrote:
>> On Tue, Aug 31, 2010 at 19:46, Magnus Hagander  wrote:
>>> On Tue, Aug 31, 2010 at 19:44, Tom Lane  wrote:
 Magnus Hagander  writes:
> Ok. I've got a new migration runinng. Here's the revisions removed:
> RCS file: /usr/local/cvsroot/pgsql/src/backend/parser/Attic/gram.c,v
> deleting revision 2.88
> RCS file: 
> /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c,v
> deleting revision 1.2
> RCS file: 
> /usr/local/cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c,v
> deleting revision 1.6

 Hmm, it looks like you deleted the file deletion events (the versions
 cited above).  Not sure this is the right thing.  Check to see if the
 files are still there according to the converted git history ...
>>>
>>> Oh, drat. That's right. It shouldn't have been inclusive :S
>>>
>>> I'll abort the conversion and run it again :)
>>
>> Ok, I've pushed a clone of the new repository with these modifications to:
>>
>> http://git.postgresql.org/gitweb?p=git-migration-test.git;a=summary
>>
>> Haven't had the time to dig into it yet, so please go ahead anybody
>> who wants to :-)
>
> That definitely didn't fix it, although I'm not quite sure why.  Can
> you throw the modified CVS you ran this off of up somewhere I can
> rsync it?

no rsync server on that box, but I put up a tarball for you at
http://www.hagander.net/pgsql/cvsrepo.tgz


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Heikki Linnakangas


On 01/09/10 10:53, Fujii Masao wrote:

Before discussing about that, we should determine whether registering
standbys in master is really required. It affects configuration a lot.
Heikki thinks that it's required, but I'm still unclear about why and
how.

Why do standbys need to be registered in master? What information
should be registered?


That requirement falls out from the handling of disconnected standbys. 
If a standby is not connected, what does the master do with commits? If 
the answer is anything else than acknowledge them to the client 
immediately, as if the standby never existed, the master needs to know 
what standby servers exist. Otherwise it can't know if all the standbys 
are connected or not.



What does synchronous replication mean, when is a transaction
acknowledged as committed?


I proposed four synchronization levels:

1. async
   doesn't make transaction commit wait for replication, i.e.,
   asynchronous replication. This mode has been already supported in
   9.0.

2. recv
   makes transaction commit wait until the standby has received WAL
   records.

3. fsync
   makes transaction commit wait until the standby has received and
   flushed WAL records to disk

4. replay
   makes transaction commit wait until the standby has replayed WAL
   records after receiving and flushing them to disk

OTOH, Simon proposed the quorum commit feature. I think that both
is required for various our use cases. Thought?


I'd like to keep this as simple as possible, yet flexible so that with 
enough scripting and extensions, you can get all sorts of behavior. I 
think quorum commit falls into the "extension" category; if you're setup 
is complex enough, it's going to be impossible to represent that in our 
config files no matter what. But if you write a little proxy, you can 
implement arbitrary rules there.


I think recv/fsync/replay should be specified in the standby. It has no 
direct effect on the master, the master would just relay the setting to 
the standby when it connects, or the standby would send multiple 
XLogRecPtrs and let the master decide when the WAL is persistent enough. 
And what if you write a proxy that has some other meaning of "persistent 
enough"? Like when it has been written to the OS buffers but not yet 
fsync'd, or when it has been fsync'd to at least one standby and 
received by at least three others. recv/fsync/replay is not going to 
represent that behavior well.


"sync vs async" on the other hand should be specified in the master, 
because it has a direct impact on the behavior of commits in the master.


I propose a configuration file standbys.conf, in the master:

# STANDBY NAMESYNCHRONOUS   TIMEOUT
importantreplica  yes   100ms
tempcopy  no10s

Or perhaps this should be stored in a system catalog.


What to do if a standby server dies and never
acknowledges a commit?


The master's reaction to that situation should be configurable. So
I'd propose new configuration parameter specifying the reaction.
Valid values are:

- standalone
   When the master has waited for the ACK much longer than the timeout
   (or detected the failure of the standby), it closes the connection
   to the standby and restarts transactions.

- down
   When that situation occurs, the master shuts down immediately.
   Though this is unsafe for the system requiring high availability,
   as far as I recall, some people wanted this mode in the previous
   discussion.


Yeah, though of course you might want to set that per-standby too..


Let's step back a bit and ask what would be the simplest thing that you 
could call "synchronous replication" in good conscience, and also be 
useful at least to some people. Let's leave out the "down" mode, because 
that requires registration. We'll probably have to do registration at 
some point, but let's take as small steps as possible.


Without the "down" mode in the master, frankly I don't see the point of 
the "recv" and "fsync" levels in the standby. Either way, when the 
master acknowledges a commit to the client, you don't know if it has 
made it to the standby yet because the replication connection might be 
down for some reason.


That leaves us the 'replay' mode, which *is* useful, because it gives 
you the guarantee that when the master acknowledges a commit, it will 
appear committed in all hot standby servers that are currently 
connected. With that guarantee you can build a reliable cluster with 
something pgpool-II where all writes go to one node, and reads are 
distributed to multiple nodes.


I'm not sure what we should aim for in the first phase. But if you want 
as little code as possible yet have something useful, I think 'replay' 
mode with no standby registration is the way to go.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hacker

Re: I: [HACKERS] About "Our CLUSTER implementation is pessimal" patch

2010-09-01 Thread Itagaki Takahiro

On Tue, Aug 31, 2010 at 8:04 PM, Itagaki Takahiro
 wrote:
> * How to determine which method was used?
>  We can get some information from trace_sort logs,
>  but CLUSTER VERBOSE would be better to log
>  which CLUSTER method was used.

I wrote a patch to improve CLUSTER VERBOSE (and VACUUM FULL VERBOSE).
The patch should be applied after sorted_cluster-20100721.patch .

* clustering "schema.table" by index scan on "index"
* clustering "schema.table" by sequential scan and sort

It also adds VACUUM VERBOSE-like logs:
INFO:  "table": found 1 removable, 11 nonremovable row versions in
1655 pages
DETAIL:  1 dead row versions cannot be removed yet.
CPU 0.03s/0.06u sec elapsed 0.21 sec.

Note that the patch says nothing about reindexing. But if
required, I'm willing to add some VERBOSE messages for
indexes (ex. REINDEX VERBOSE)

-- 
Itagaki Takahiro

cluster_verbose-20100901.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] leaky views, yet again

2010-09-01 Thread KaiGai Kohei

(2010/07/21 19:35), KaiGai Kohei wrote:
> (2010/07/21 19:26), Robert Haas wrote:
>> 2010/7/21 KaiGai Kohei:
 On the other hand, if it's enough from a performance
 point of view to review and mark only a few built-in functions like
 index operators, maybe it's ok.

>>> I also think it is a worthful idea to try as a proof-of-concept.
>>
>> Yeah. So, should we mark this patch as Returned with Feedback, and
>> you can submit a proof-of-concept patch for the next CF?
>>
> Yes, it's fair enough.
> 
The attached patch is a proof-of-concept one according to the suggestion
from Heikki before.

Right now, it stands on a strict assumption that considers operators
implemented with built-in functions are safe; it does not have no
possibility to leak supplied arguments anywhere.

This patch modifies the logic that the planner decides where the
given qualifier clause should be distributed.

If the clause does not contain any "leakable" functions, nothing
were changed. In this case, the clause shall be pushed down into
inside of the join, if it depends on one-side of the join.

Elsewhere, if the clause contains any "leakable" functions, this
patch prevents to push down the clause into join loop, because
the clause need to be evaluated after the condition of join.
If it would not be prevented, the "leakable" function may expose
the contents to be invisible for users; due to the VIEWs for
row-level security purpose.

Example

postgres=# CREATE OR REPLACE FUNCTION f_policy(int)
   RETURNS bool LANGUAGE 'plpgsql'
   AS 'begin return $1 % 2 = 0; end;';
CREATE FUNCTION
postgres=# CREATE OR REPLACE FUNCTION f_malicious(text)
   RETURNS bool LANGUAGE 'plpgsql' COST 0.001
   AS 'begin raise notice ''leak: %'', $1; return true; end';
CREATE FUNCTION
postgres=# CREATE TABLE t1 (a int primary key, b text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for 
table "t1"
CREATE TABLE
postgres=# CREATE TABLE t2 (x int primary key, y text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t2_pkey" for 
table "t2"
CREATE TABLE
postgres=# INSERT INTO t1 VALUES (1,'aaa'), (2,'bbb'), (3,'ccc');
INSERT 0 3
postgres=# INSERT INTO t2 VALUES (2,'xxx'), (3,'yyy'), (4,'zzz');
INSERT 0 3
postgres=# CREATE VIEW v AS SELECT * FROM t1 JOIN t2 ON f_policy(a+x);
CREATE VIEW

* SELECT * FROM v WHERE f_malicious(b);

[without this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b);
QUERY PLAN
--
 Nested Loop  (cost=0.00..133685.13 rows=168100 width=72)
   Join Filter: f_policy((t1.a + t2.x))
   ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
   ->  Materialize  (cost=0.00..24.35 rows=410 width=36)
 ->  Seq Scan on t1  (cost=0.00..22.30 rows=410 width=36)
   Filter: f_malicious(b)
(6 rows)
 ==> f_malicious() may be raise a notice about invisible tuples.

[with this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE f_malicious(b);
QUERY PLAN
---
 Nested Loop  (cost=0.00..400969.96 rows=168100 width=72)
   Join Filter: (f_malicious(t1.b) AND f_policy((t1.a + t2.x)))
   ->  Seq Scan on t1  (cost=0.00..22.30 rows=1230 width=36)
   ->  Materialize  (cost=0.00..28.45 rows=1230 width=36)
 ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
(5 rows)
 ==> f_malicious() is moved to outside of the join.
 It is evaluated earlier than f_policy() in same level due to
 the function cost, but it is another matter.


* SELECT * FROM v WHERE a = 2;
[without this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a = 2;
   QUERY PLAN
-
 Nested Loop  (cost=0.00..353.44 rows=410 width=72)
   Join Filter: f_policy((t1.a + t2.x))
   ->  Index Scan using t1_pkey on t1  (cost=0.00..8.27 rows=1 width=36)
 Index Cond: (a = 2)
   ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
(5 rows)

[with this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a = 2;
   QUERY PLAN
-
 Nested Loop  (cost=0.00..353.44 rows=410 width=72)
   Join Filter: f_policy((t1.a + t2.x))
   ->  Index Scan using t1_pkey on t1  (cost=0.00..8.27 rows=1 width=36)
 Index Cond: (a = 2)
   ->  Seq Scan on t2  (cost=0.00..22.30 rows=1230 width=36)
(5 rows)

  ==> "a = 2" is a built-in operator, so we assume it is safe.
  This clause was pushed down into the join loop, then utilized to
  index scan.


* SELECT * FROM v WHERE a::text = 'a';

[without this patch]
postgres=# EXPLAIN SELECT * FROM v WHERE a::text = '2';
   QUERY PLAN
-

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Thom Brown

On 1 September 2010 07:56, Thom Brown  wrote:
> On 1 September 2010 06:45, David E. Wheeler  wrote:
>> The aggregate docs say:
>>
>>> The first form of aggregate expression invokes the aggregate across all 
>>> input rows for which the given expression(s) yield non-null values. 
>>> (Actually, it is up to the aggregate function whether to ignore null values 
>>> or not — but all the standard ones do.)
>>
>> -- 
>> http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES
>>
>> That, however, is not true of array_agg():
>>
>> try=# CREATE TABLE foo(id int);
>> CREATE TABLE
>> try=# INSERT INTO foo values(1), (2), (NULL), (3);
>> INSERT 0 4
>> try=# select array_agg(id) from foo;
>>  array_agg
>> ──
>>  {1,2,NULL,3}
>> (1 row)
>>
>> So are the docs right, or is array_agg() right?
>
> I think it might be both.  array_agg doesn't return NULL, it returns
> an array which contains NULL.

The second I wrote that, I realised it was b*ll%$ks, as I was still in
the process of waking up.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Fujii Masao

On Wed, Sep 1, 2010 at 2:33 PM, Heikki Linnakangas
 wrote:
> Once we're done with that, all the big questions are still left.

Yeah, let's discuss about those topics :)

> How to configure it?

Before discussing about that, we should determine whether registering
standbys in master is really required. It affects configuration a lot.
Heikki thinks that it's required, but I'm still unclear about why and
how.

Why do standbys need to be registered in master? What information
should be registered?

> What does synchronous replication mean, when is a transaction
> acknowledged as committed?

I proposed four synchronization levels:

1. async
  doesn't make transaction commit wait for replication, i.e.,
  asynchronous replication. This mode has been already supported in
  9.0.

2. recv
  makes transaction commit wait until the standby has received WAL
  records.

3. fsync
  makes transaction commit wait until the standby has received and
  flushed WAL records to disk

4. replay
  makes transaction commit wait until the standby has replayed WAL
  records after receiving and flushing them to disk

OTOH, Simon proposed the quorum commit feature. I think that both
is required for various our use cases. Thought?

> What to do if a standby server dies and never
> acknowledges a commit?

The master's reaction to that situation should be configurable. So
I'd propose new configuration parameter specifying the reaction.
Valid values are:

- standalone
  When the master has waited for the ACK much longer than the timeout
  (or detected the failure of the standby), it closes the connection
  to the standby and restarts transactions.

- down
  When that situation occurs, the master shuts down immediately.
  Though this is unsafe for the system requiring high availability,
  as far as I recall, some people wanted this mode in the previous
  discussion.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread Pavel Stehule

2010/9/1 David E. Wheeler :
> The aggregate docs say:
>
>> The first form of aggregate expression invokes the aggregate across all 
>> input rows for which the given expression(s) yield non-null values. 
>> (Actually, it is up to the aggregate function whether to ignore null values 
>> or not — but all the standard ones do.)
>
> -- 
> http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES
>
> That, however, is not true of array_agg():
>
> try=# CREATE TABLE foo(id int);
> CREATE TABLE
> try=# INSERT INTO foo values(1), (2), (NULL), (3);
> INSERT 0 4
> try=# select array_agg(id) from foo;
>  array_agg
> ──
>  {1,2,NULL,3}
> (1 row)
>
> So are the docs right, or is array_agg() right?

Docs is wrong :) I like current implementation. You can remove a NULLs
from aggregation very simply, but different direction isn't possible

Regards
Pavel Stehule
>
> Best,
>
> David
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)

2010-09-01 Thread Heikki Linnakangas


On 31/08/10 15:47, Fujii Masao wrote:

On Fri, Aug 27, 2010 at 4:39 PM, Fujii Masao  wrote:

/*
  * XXX: Should we invent an API to wait for data coming from the
  * client connection too? It's not critical, but we could then
  * eliminate the timeout altogether and go to sleep for good.
  */


Yes, it would be very helpful when walsender waits for the ACK from
the standby in upcoming synchronous replication.


I propose to change WaitLatch() so that it accepts the socket
descriptor as an argument, to wait for data coming from the
client connection.


Yeah, we probably should do that now.


 WaitLatch() monitors not only the self-pipe
but also the given socket. If -1 is supplied, it checks only
the self-pipe.


The obvious next question is how to wait for multiple sockets and a 
latch at the same time? Perhaps we should have a select()-like interface 
where you can pass multiple file descriptors. Then again, looking at the 
current callers of select() in the backend, apart from postmaster they 
all wait for only one fd.



The socket should be monitored by using poll() if the platform
has it, since poll() is usually more efficient.


Yeah, I guess.


So I'd like to change Unix implementation of WaitLatch() as
follows. Thought?

---
define WaitLatch(latch, timeout) WaitLatchAndSocket(latch, -1, timeout)

void WaitLatchAndSocket(Latch *latch, int sock, long timeout);
{
 ...

 FD_SET(selfpipe_readfd,&input_mask);
 if (sock != -1)
 FD_SET(sock,&input_mask);

#ifdef HAVE_POLL
 poll(...)
#else
 select(...)
  #endif   /* HAVE_POLL */

 ...
}
---


Yep.


Windows implementation of WaitLatchAndSocket() seems not to be
so simple. We would need to wait for both the latch event and
the packet from the socket by using WaitForMultipleObjectsEx().


Well, we already use WaitForMultipleObjectsEx() to implement select() on 
Windows, so it should be straightforward to copy that. I'll look into that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_agg() NULL Handling

2010-09-01 Thread David E. Wheeler

On Aug 31, 2010, at 11:56 PM, Thom Brown wrote:

>>> The first form of aggregate expression invokes the aggregate across all 
>>> input rows for which the given expression(s) yield non-null values. 
>>> (Actually, it is up to the aggregate function whether to ignore null values 
>>> or not — but all the standard ones do.)
>> 
>> -- 
>> http://developer.postgresql.org/pgdocs/postgres/sql-expressions.html#SYNTAX-AGGREGATES
>> 
>> That, however, is not true of array_agg():
>> 
>> try=# CREATE TABLE foo(id int);
>> CREATE TABLE
>> try=# INSERT INTO foo values(1), (2), (NULL), (3);
>> INSERT 0 4
>> try=# select array_agg(id) from foo;
>>  array_agg
>> ──
>>  {1,2,NULL,3}
>> (1 row)
>> 
>> So are the docs right, or is array_agg() right?
> 
> I think it might be both.  array_agg doesn't return NULL, it returns
> an array which contains NULL.

No, string_agg() doesn't work this way, for example:

select string_agg(id::text, ',') from foo;
 string_agg 

 1,2,3
(1 row)

Note that it's not:

select string_agg(id::text, ',') from foo;
 string_agg 

 1,2,,3
(1 row)

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

68 matches

Mail list logo