Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-24 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
   Finally, sorry it's kind of a fugly patch; it's just a proof of
   concept, and I'd be happy to clean it up if others feel it's worthwhile
   and a reasonable approach, but I really need to get it out there and
   take a break from it (I've been a bit obsessive-compulsive about it
   since PGCon... :D).

Erm, sorry, just to clarify: while it's a P-O-C patch, it does compile
cleanly and passes all the regression tests, so it's something that one
can play with at least.  Not sure if it'd be worth benchmarking until
we feel comfortable that this is a decent approach, but I wouldn't
complain if someone decided to...

Thanks,

Stephen




Re: [HACKERS] Should partial dumps include extensions?

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 4:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 There's a complaint here
 http://archives.postgresql.org/pgsql-general/2011-05/msg00714.php
 about the fact that 9.1 pg_dump always dumps CREATE EXTENSION commands
 for all loaded extensions.  Should we change that?  A reasonable
 compromise might be to suppress extensions in the same cases where we
 suppress procedural languages, ie if --schema or --table was used
 (see include_everything switch in pg_dump.c).

Making it work like procedural languages seems sensible to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Pre-alloc ListCell's optimization

2011-05-24 Thread Alvaro Herrera
Excerpts from Stephen Frost's message of mar may 24 22:56:21 -0400 2011:

   A couple of notes regarding the patch:
 
   First, it uses ffs(), which might not be fully portable...  We could
   certainly implement the same thing in userspace and use ffs() when
   it's available.

Err, see RIGHTMOST_ONE in bitmapset.c.
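
For the archives, a minimal sketch of such a userspace fallback; the
x & (~x + 1) step is the same rightmost-bit isolation that RIGHTMOST_ONE
performs in bitmapset.c (a sketch, not the bitmapset.c code itself):

    /*
     * Portable ffs() substitute: returns the 1-based position of the
     * least-significant set bit, or 0 if no bit is set.
     */
    static int
    my_ffs(unsigned int x)
    {
        int         pos = 1;

        if (x == 0)
            return 0;
        x &= ~x + 1;            /* keep only the rightmost set bit */
        while ((x & 1) == 0)
        {
            x >>= 1;
            pos++;
        }
        return pos;
    }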

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] tackling full page writes

2011-05-24 Thread Bruce Momjian
Robert Haas wrote:
 2. The other fairly obvious alternative is to adjust our existing WAL
 record types to be idempotent - i.e. to not rely on the existing page
 contents.  For XLOG_HEAP_INSERT, we currently store the target tid and
 the tuple contents.  I'm not sure if there's anything else, but we
 would obviously need the offset where the new tuple should be written,
 which we currently infer from reading the existing page contents.  For
 XLOG_HEAP_DELETE, we store just the TID of the target tuple; we would
 certainly need to store its offset within the block, and maybe the
 infomask.  For XLOG_HEAP_UPDATE, we'd need the old and new offsets and
 perhaps also the old and new infomasks.  Assuming that's all we need
 and I'm not missing anything (which I won't bet on), that means we'd
 be adding, say, 4 bytes per insert or delete and 8 bytes per update.
 So, if checkpoints are spread out widely enough that there will be
 more than ~2K operations per page between checkpoints, then it makes
 more sense to just do a full page write and call it good.  If not,
 this idea might have legs.

I vote for wal_level = idempotent because so few people will know what
idempotent means.  ;-)

Idempotent does seem like the most promising idea.
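
To illustrate the property in question: an idempotent redo routine stores
recorded absolute state instead of deriving new state from the current page
contents, so replaying it twice yields the same page image as replaying it
once.  A sketch with an invented record layout (the bufpage.h accessors are
real, but this is not the actual redo code):

    static void
    redo_heap_delete_idem(Page page, OffsetNumber offnum, uint16 new_infomask)
    {
        ItemId          lp = PageGetItemId(page, offnum);
        HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, lp);

        /* store the recorded final state; never read-modify-write */
        htup->t_infomask = new_infomask;
    }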

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] The way to know whether the standby has caught up with the master

2011-05-24 Thread Fujii Masao
Hi,

For reliable high availability, when the master crashes, the clusterware must
know whether it can promote the standby safely, without any data loss,
before actually promoting it.  IOW, it must know whether the standby has
already caught up with the primary.  Otherwise, failover might cause data loss.
We can learn that from pg_stat_replication on the master, but the problem
is that pg_stat_replication is not available when the master is not running.
So that information should also be available on the standby.

To achieve that, I'm thinking of changing walsender so that, when the standby
has caught up with the master, it sends a message indicating that back to
the standby.  I'm also thinking of adding a new function (or a view like
pg_stat_replication), available on the standby, which shows that information.

Thoughts?
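
To make the intended use concrete, the clusterware's pre-promotion check
might look something like this (the function name is invented; nothing like
it exists yet):

    -- Hypothetical, run on the standby once the master is down: promote
    -- only if the walsender had reported catch-up before the crash.
    SELECT pg_standby_caught_up() AS safe_to_promote;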

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] The way to know whether the standby has caught up with the master

2011-05-24 Thread Heikki Linnakangas

On 25.05.2011 07:42, Fujii Masao wrote:

For reliable high availability, when the master crashes, the clusterware must
know whether it can promote the standby safely, without any data loss,
before actually promoting it.  IOW, it must know whether the standby has
already caught up with the primary.  Otherwise, failover might cause data loss.
We can learn that from pg_stat_replication on the master, but the problem
is that pg_stat_replication is not available when the master is not running.
So that information should also be available on the standby.

To achieve that, I'm thinking of changing walsender so that, when the standby
has caught up with the master, it sends a message indicating that back to
the standby.  I'm also thinking of adding a new function (or a view like
pg_stat_replication), available on the standby, which shows that information.


By the time the standby has received that message, it might not be
caught up anymore, because new WAL might already have been generated
on the master.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] The way to know whether the standby has caught up with the master

2011-05-24 Thread Fujii Masao
On Wed, May 25, 2011 at 2:16 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 25.05.2011 07:42, Fujii Masao wrote:

 For reliable high availability, when the master crashes, the clusterware must
 know whether it can promote the standby safely, without any data loss,
 before actually promoting it.  IOW, it must know whether the standby has
 already caught up with the primary.  Otherwise, failover might cause data loss.
 We can learn that from pg_stat_replication on the master, but the problem
 is that pg_stat_replication is not available when the master is not running.
 So that information should also be available on the standby.

 To achieve that, I'm thinking of changing walsender so that, when the standby
 has caught up with the master, it sends a message indicating that back to
 the standby.  I'm also thinking of adding a new function (or a view like
 pg_stat_replication), available on the standby, which shows that information.

 By the time the standby has received that message, it might not be
 caught up anymore, because new WAL might already have been generated
 on the master.

Right.  But thanks to sync rep, until such new WAL has been replicated to
the standby, the commit of the transaction is not visible to the client.  So
even if some WAL has not been replicated to the standby, the clusterware can
promote the standby safely without any data loss (from the client's point of
view), I think.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Foreign memory context read

2011-05-24 Thread Vaibhav Kaushal
Indeed I was acting weird there. I had completely forgotten about the
bool pointer. Moreover, I actually got confused about palloc0's
return type: whether it returns a Datum or a pointer to a Datum. I looked
back at the expansion and got it clear.

Thanks a lot Mr. Tom. 

Regards,
Vaibhav

On Mon, 2011-05-23 at 09:58 -0400, Tom Lane wrote:
 Vaibhav Kaushal vaibhavkaushal...@gmail.com writes:
  My mind started wandering after that error. Now, actually, I was trying to
  do something like this:
 
  *last_result = palloc0(sizeof(Datum));
  bool *isnnuull = true;
  *last_result = slot_getattr(slot, num_atts, *isnnuull);
 
 This seems utterly confused about data types.  The first line thinks
 that last_result is of type Datum ** (ie, pointer to pointer to Datum),
 since it's storing a pointer-to-Datum through it.  The third line
 however is treating last_result as of type Datum *, since it's storing
 a Datum (not pointer to Datum) through it.  And the second line is
 assigning true (a bool value) to a variable declared as pointer to
 bool, which you then proceed to incorrectly dereference while passing it
 as the last argument to slot_getattr.  The code will certainly crash on
 that deref, independently of the multiple other bugs here.
 
 Recommendation: gcc is your friend.  Pay attention to the warnings it
 gives you.
 
   regards, tom lane
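
For reference, a corrected sketch of what that fragment was presumably
aiming at (the declarations of slot and num_atts are assumed from the
surrounding code):

    bool    isnull = false;
    Datum  *last_result = (Datum *) palloc0(sizeof(Datum));

    /*
     * slot_getattr returns the attribute as a Datum and sets isnull
     * through the pointer; store the Datum itself, not a pointer to it.
     */
    *last_result = slot_getattr(slot, num_atts, &isnull);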





[HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-24 Thread Pavan Deolasee
Hi All,

Some of the ideas regarding vacuum improvements were discussed here:
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00863.php
http://archives.postgresql.org/pgsql-patches/2008-06/msg00059.php

A recent thread was started by Robert Haas, but I don't know if we logically
concluded that either.
http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php

This was once again brought up by Robert Haas in a discussion with Tom and
me during PGCon, and we agreed there are a few things we can do to make
vacuum more performant. One of the things that Tom mentioned is that
vacuum today is not aware of the fact that it's a periodic operation, and
there might be ways to utilize that in some way.

The biggest gripe today is that vacuum needs two heap scans, and each scan
dirties buffers. While the visibility map ensures that not all blocks are
read and written during the scan, for a very large table even a small
percentage of blocks can be significant. Further, post-HOT, the second scan
of the heap does not really reclaim any significant space, except for dead
line pointers. So there is a good reason to avoid that. I wanted to start a
discussion just about that. I am proposing one solution below, but I am not
married to the idea.

So the idea is to separate the index vacuum (removing index pointers to dead
tuples) from the heap vacuum. When we do heap vacuum (either by HOT pruning
or using regular vacuum), we can spool the dead line pointers somewhere. To
avoid any hot spots during normal processing, the spooling can be done
periodically, like the stats collection. One obvious choice for spooling dead
line pointers is to use a relation fork. The index vacuum will be kicked off
periodically, depending on the number of spooled dead line pointers. When
that happens, the index vacuum will remove all index pointers pointing to
those dead line pointers and forget the spooled line pointers.

The dead line pointers themselves will be removed whenever a heap page is
later vacuumed, either as part of HOT pruning or the next heap vacuum. We
would need some mechanism, though, to know that the index pointers to the
existing dead line pointers have been vacuumed and that it's safe to remove
them now. Maybe we can track the last operation that generated a dead line
pointer in the page using an LSN in the page header, and also keep track of
the LSN of the last successful index vacuum. If the index vacuum LSN is
greater than the page header LSN, we can safely remove the existing
dead line pointers. I am deliberately not suggesting how to track the index
vacuum LSN, since my last proposal to do something similar through a pg_class
column was shot down by Tom :-)
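
A sketch of that check, with invented names (where the index vacuum LSN
would actually live is precisely the open question):

    /*
     * Hypothetical: the page's dead line pointers can be recycled once
     * the last successful index vacuum began after the operation that
     * created them, since no index pointer can still reference them.
     */
    if (last_index_vacuum_lsn > PageGetDeadLSN(page))   /* invented */
        remove_dead_line_pointers(page);                /* invented */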

In a nutshell, what I am suggesting is to do heap and index vacuuming
independently. The heap will be vacuumed either by HOT pruning or a periodic
heap vacuum, and the dead line pointers will be collected. An index vacuum
will remove the index pointers to those dead line pointers. And at some
later point, the dead line pointers will be removed, either as part of a
retail or complete heap vacuum. It's not clear whether it's useful, but a
single index vacuum could follow multiple heap vacuums, or vice versa.

Another advantage of this technique would be that we could then support
start/stop heap vacuum, or vacuuming a range of blocks at a time, or even
vacuuming only those blocks which are already cached in the buffer cache.
Just hand-waving at this point, but it seems possible.

Suggestions/comments/criticism are all welcome, but please don't shoot down
the idea on implementation details, since I have really not spent time on
that, so it will be easy to find holes and corner cases. Those can be worked
out if we believe something like this will be useful.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Reducing overhead of frequent table locks

2011-05-24 Thread Noah Misch
On Mon, May 23, 2011 at 09:15:27PM -0400, Robert Haas wrote:
 On Fri, May 13, 2011 at 4:16 PM, Noah Misch n...@leadboat.com wrote:
        if (level >= ShareUpdateExclusiveLock)
                ++strong_lock_counts[my_strong_lock_count_partition]
                sfence
                if (strong_lock_counts[my_strong_lock_count_partition] == 1)
                        /* marker 1 */
                        import_all_local_locks
                normal_LockAcquireEx
        else if (level <= RowExclusiveLock)
                lfence
                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
                        /* marker 2 */
                        local_only
                        /* marker 3 */
                else
                        normal_LockAcquireEx
        else
                normal_LockAcquireEx
 
  At marker 1, we need to block until no code is running between markers two
  and three.  You could do that with a per-backend lock (LW_SHARED by the
  strong locker, LW_EXCLUSIVE by the backend).  That would probably still be
  a win over the current situation, but it would be nice to have something
  even cheaper.
 
 Barring some brilliant idea, or anyway for a first cut, it seems to me
 that we can adjust the above pseudocode by assuming the use of a
 LWLock.  In addition, two other adjustments: first, the first line
 should test level > ShareUpdateExclusiveLock, rather than >=, per
 previous discussion.  Second, import_all_local_locks needn't really
 move everything; just those locks with a matching locktag.  Thus:
 
 !        if (level > ShareUpdateExclusiveLock)
 !                ++strong_lock_counts[my_strong_lock_count_partition]
 !                sfence
 !                for each backend
 !                        take per-backend lwlock for target backend
 !                        transfer fast-path entries with matching locktag
 !                        release per-backend lwlock for target backend
 !                normal_LockAcquireEx
 !        else if (level <= RowExclusiveLock)
 !                lfence
 !                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
 !                        take per-backend lwlock for own backend
 !                        fast-path lock acquisition
 !                        release per-backend lwlock for own backend
 !                else
 !                        normal_LockAcquireEx
 !        else
 !                normal_LockAcquireEx

This drops the part about only transferring fast-path entries once when a
strong_lock_counts cell transitions from zero to one.  Granted, that itself
requires some yet-undiscussed locking.  For that matter, we can't have
multiple strong lockers completing transfers on the same cell in parallel.
Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
strong locker holds for that entire if body and while decrementing the
strong_lock_counts cell at lock release.

As far as the level of detail of this pseudocode goes, there's no need to hold
the per-backend LWLock while transferring the fast-path entries.  You just
need to hold it sometime between bumping strong_lock_counts and transferring
the backend's locks.  This ensures that, for example, the backend is not
sleeping in the middle of a fast-path lock acquisition for the whole duration
of this code.

 Now, a small fly in the ointment is that we haven't got, with
 PostgreSQL, a portable library of memory primitives.  So there isn't
 an obvious way of doing that sfence/lfence business.

I was thinking that, if the final implementation could benefit from memory
barrier interfaces, we should create those interfaces now.  Start with only a
platform-independent dummy implementation that runs a lock/unlock cycle on a
spinlock residing in backend-local memory.  I'm 75% sure that would be
sufficient on all architectures for which we support spinlocks.  It may turn
out that we can't benefit from such interfaces at this time ...
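
Something along these lines, say; the interface name is hypothetical, while
slock_t and S_LOCK/S_UNLOCK are the existing spinlock primitives:

    /* backend-local, so always uncontended; S_INIT_LOCK() it at startup */
    static slock_t dummy_barrier_lock;

    /*
     * Hypothetical fallback full memory barrier: a spinlock
     * acquire/release cycle must order loads and stores on any platform
     * where spinlocks work at all, or LWLocks would already be broken.
     */
    #define pg_memory_barrier() \
        do { \
            S_LOCK(&dummy_barrier_lock); \
            S_UNLOCK(&dummy_barrier_lock); \
        } while (0)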

 Now, it seems to
 me that in the strong lock case, the sfence isn't really needed
 anyway, because we're about to start acquiring and releasing an lwlock
 for every backend, and that had better act as a full memory barrier
 anyhow, or we're doomed.  The weak lock case is more interesting,
 because we need the fence before we've taken any LWLock.

Agreed.

 But perhaps it'd be sufficient to just acquire the per-backend lwlock
 before checking strong_lock_counts[].  If, as we hope, we get back a
 zero, then we do the fast-path lock acquisition, release the lwlock,
 and away we go.  If we get back any other value, then we've wasted an
 lwlock acquisition cycle.  Or actually maybe not: it seems to me that
 in that case we'd better transfer all of our fast-path entries into
 the main hash table before trying to acquire any lock the slow way, at
 least if we don't want the deadlock detector to have to know about the
 fast-path.  So then we get this:
 
 !        if (level > 

Re: [HACKERS] SSI predicate locking on heap -- tuple or row?

2011-05-24 Thread Kevin Grittner
Kevin Grittner  wrote:
 Dan Ports  wrote:
 
 Does that make sense to you?
 
 Makes sense to me. Like the proof I offered, you have shown that
 there is no cycle which can develop with the locks copied which
 isn't there anyway if we don't copy the locks.
 
I woke up with the nagging thought that while the above is completely
accurate, it deserves some slight elaboration. These proofs show that
there is no legitimate cycle which could cause an anomaly which the
move from row-based to tuple-based logic will miss.  They don't prove
that the change will generate all the same serialization failures;
and in fact, some false positives are eliminated by the change. 
That's a good thing.  In addition to the benefits mentioned in prior
posts, there will be a reduction in the rate of rollbacks (in
particular corner cases) from what people see in beta1 without a loss
of correctness.
 
-Kevin



[HACKERS] Operator families vs. casts

2011-05-24 Thread Noah Misch
PostgreSQL 9.1 will implement ALTER TABLE ALTER TYPE operations that use a
binary coercion cast without rewriting the table or unrelated indexes.  It
will always rewrite any indexes and recheck any foreign key constraints that
depend on a changing column.  This is unnecessary for 100% of core binary
coercion casts.  In my original design[1], I planned to detect this by
comparing the operator families of the old and would-be-new indexes.  (This
still yields some unnecessary rewrites; oid_ops and int4_ops are actually
compatible, for example.)  When I implemented[2] it, I found that the
contracts[3] for operator families are not strong enough to prove that the
existing indexes and constraints remain valid.  Specifically, I wished to
assume val0 = val1 iff val0::a = val1::b for any val0, val1, a, b such that
we resolve both equality operators in the same operator family.  The operator
family contracts say nothing about consistency with casts.  Is there a
credible use case for violating that assumption?  If not, I'd like to document
it as a requirement for operator family implementors.
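
A concrete instance of the assumption, and of the rewrite to be skipped:
varchar is binary-coercible to text, and a varchar column's index already
uses text's B-tree operator family, so the index order remains valid across
the change:

    CREATE TABLE t (c varchar(10));
    CREATE INDEX t_c_idx ON t (c);

    -- Binary coercion; old and new indexes would share an operator
    -- family, so v0 = v1 iff v0::text = v1::text.  Rebuilding t_c_idx
    -- here is exactly the unnecessary work in question.
    ALTER TABLE t ALTER COLUMN c TYPE text;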

The above covers B-tree and hash operator families.  GIN and GiST have no
operator family contracts.  Here was the comment in my first patch intended to
sweep that under the table:

!  * We do not document a contract for GIN or GiST operator families.  Only the
!  * GIN operator family array_ops has more than one constituent operator class,
!  * and only typmod-only changes to arrays can avoid a rewrite.  Preserving a
!  * GIN index across such a change is safe.  We therefore support GiST and GIN
!  * here using the same rules as for B-tree and hash indexes, but that is
!  * mostly academic.  Any forthcoming contract for GiST or GIN operator
!  * families should, all other things being equal, bolster the validity of
!  * this assumption.
!  *
!  * Exclusion constraints raise the question: can we trust that the operator
!  * has the same semantics with the new type?  The operator will fall in the
!  * index's operator family.  For B-tree or hash, the operator will be = or <>,
!  * yielding an affirmative answer from contractual requirements.  For GiST
!  * and GIN, we assume that a similar requirement would fall out of any
!  * contract for their operator families, should one arise.  We therefore
!  * support exclusion constraints without any special treatment, but this is
!  * again mostly academic.

Any thoughts on what to do here?  We could just add basic operator family
contracts requiring what we need.  Perhaps, instead, the ALTER TABLE code
should require an operator family match for B-tree and hash but an operator
class match for other access methods.

For now, I plan to always rewrite indexes on expressions or having predicates.
With effort, we could detect compatible changes there, too.

I also had a more mundane design question in the second paragraph of [2].  It
can probably wait for the review of the next version of the patch.  However,
given that it affects a large percentage of the patch, I'd appreciate any
early feedback on it.

Thanks,
nm

[1] http://archives.postgresql.org/message-id/20101229125625.ga27...@tornado.gateway.2wire.net
[2] http://archives.postgresql.org/message-id/20110113230124.ga18...@tornado.gateway.2wire.net
[3] http://www.postgresql.org/docs/9.0/interactive/xindex.html#XINDEX-OPFAMILY



Re: [HACKERS] sepgsql: fix relkind handling on foreign tables

2011-05-24 Thread Kohei KaiGai
2011/5/23 Robert Haas robertmh...@gmail.com:
 On Sun, May 22, 2011 at 5:52 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The attached patch fixes up case handling in foreign tables.

 Previously, it didn't assign a security label to a foreign table at creation
 time, and didn't check access rights in the DML hook.
 This patch fixes these problems; it allows foreign tables default
 labeling and access checks as the db_table object class.

 A foreign table is really more like a view, or a function call.  Are
 you sure you want to handle it like a table?

It was a tentative solution, so I'd like to withdraw this patch.

Its nature is indeed more similar to a function call than to a table,
but it is not a function itself. So it might be a better idea to define its
own object class, such as db_foreign_table, instead of using existing
object classes.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] sepgsql: fix relkind handling on foreign tables

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 6:57 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2011/5/23 Robert Haas robertmh...@gmail.com:
 On Sun, May 22, 2011 at 5:52 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The attached patch fixes up case handling in foreign tables.

 Previously, it didn't assign a security label to a foreign table at creation
 time, and didn't check access rights in the DML hook.
 This patch fixes these problems; it allows foreign tables default
 labeling and access checks as the db_table object class.

 A foreign table is really more like a view, or a function call.  Are
 you sure you want to handle it like a table?

 It was a tentative solution, so I'd like to withdraw this patch.

 Its nature is indeed more similar to a function call than to a table,
 but it is not a function itself. So it might be a better idea to define its
 own object class, such as db_foreign_table, instead of using existing
 object classes.

Perhaps.  Or else use db_view.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Small patch for GiST: move childoffnum to child

2011-05-24 Thread Alexander Korotkov
While preparing the patch for my GSoC project, I found it reasonable to
move childoffnum (in the GISTInsertStack structure) from the parent to the
child. This means that the child now holds the childoffnum of the parent's
link to that child. It allows entire parts of the tree to be maintained in
those GISTInsertStack structures, and it also simplifies the existing code
a bit.
Heikki advised me that since this change simplifies the existing code, it
can be considered as a separate patch.
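
Roughly, the structure in question (simplified from gist_private.h), with
the relocated field:

    typedef struct GISTInsertStack
    {
        BlockNumber blkno;          /* this page's block number */
        Buffer      buffer;         /* buffer containing the page */
        GistNSN     lsn;            /* page LSN when we examined it */
        OffsetNumber childoffnum;   /* moved here: offset, within the
                                     * parent page, of the downlink that
                                     * points to this page */
        struct GISTInsertStack *parent;
    } GISTInsertStack;

With childoffnum stored in the child entry, a chain of these structures
describes a complete root-to-leaf path on its own.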

--
With best regards,
Alexander Korotkov.


gist_childoffnum.path
Description: Binary data



Re: [HACKERS] 9.1 support for hashing arrays

2011-05-24 Thread Bruce Momjian
Robert Haas wrote:
 On Sun, May 22, 2011 at 11:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  I believe, however, that applying this will invalidate the contents of
  any hash indexes on array types that anyone has built using 9.1beta1.
  Do we need to do something about that?
 
  Like bumping catversion?
 
 Sure.  Although note that the system catalogs are not actually
 changing, which goes to someone else's recent point about catversion
 getting bumped for things other than changes in the things for which
 the cat in catversion is an abbreviation.
 
  I would probably complain about that, except you already did it post-beta1:
  http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=9bb6d9795253bb521f81c626fea49a704a369ca9
 
 Unfortunately, I was unable to make that omelet without breaking some eggs.  
 :-(
 
  Possibly Bruce will feel like adding a check to pg_upgrade for the case.
  I wouldn't bother myself though.  It seems quite unlikely that anyone's
  depending on the feature yet.
 
 I'll leave that to you, Bruce, and whoever else wants to weigh in to
 hammer that one out.

Oh, you are worried someone might have stored hash indexes with the old
catalog format?  Seems like something we might mention in the next beta
release announcement, but nothing more.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] [BUGS] BUG #6034: pg_upgrade fails when it should not.

2011-05-24 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, May 23, 2011 at 8:26 AM, Bruce Momjian br...@momjian.us wrote:
  Sorry, I was unclear.  The question is whether the case of the _name_ of
  the locale is significant, meaning can you have two locale names that
  differ only by case and behave differently?

 That would seem surprising to me, but I really have no idea.

 There's the other direction, too: two locales that vary by something
 more than case, but still have identical behavior.  Maybe we just
 decide not to worry about that, but then why worry about this?

Well, if we remove the check, then people could easily get broken
upgrades by upgrading to a server with a different locale.  A Google
search seems to indicate the locale names are case-sensitive, so I am
thinking the problem is that the user didn't have exactly matching
locales, and needs that to use pg_upgrade.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Reducing overhead of frequent table locks

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 5:07 AM, Noah Misch n...@leadboat.com wrote:
 This drops the part about only transferring fast-path entries once when a
 strong_lock_counts cell transitions from zero to one.

Right: that's because I don't think that's what we want to do.  I
don't think we want to transfer all per-backend locks to the shared
hash table as soon as anyone attempts to acquire a strong lock;
instead, I think we want to transfer only those fast-path locks which
have the same locktag as the strong lock someone is attempting to
acquire.  If we do that, then it doesn't matter whether the
strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7:
we still have to check for strong locks with that particular locktag.

 Granted, that itself
 requires some yet-undiscussed locking.  For that matter, we can't have
 multiple strong lockers completing transfers on the same cell in parallel.
 Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
 strong locker holds for that entire if body and while decrementing the
 strong_lock_counts cell at lock release.

I was imagining that the per-backend LWLock would protect the list of
fast-path locks.  So to transfer locks, you would acquire the
per-backend LWLock for the backend which has the lock, and then the
lock manager partition LWLock, and then perform the transfer.
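
In sketch form, with invented names for the fast-path structures
(LWLockAcquire/LWLockRelease and the partition locks are the real
primitives):

    /* hypothetical transfer step, run by a would-be strong locker */
    for (i = 0; i < MaxBackends; i++)
    {
        FastPathQueue *q = &FastPathQueues[i];      /* invented */

        LWLockAcquire(q->lwlock, LW_EXCLUSIVE);     /* per-backend lock */
        LWLockAcquire(LockHashPartitionLock(hashcode), LW_EXCLUSIVE);
        /* move entries of q whose locktag matches into the shared
         * lock table, then delete them from q */
        FastPathTransferMatching(q, locktag);       /* invented */
        LWLockRelease(LockHashPartitionLock(hashcode));
        LWLockRelease(q->lwlock);
    }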

 As far as the level of detail of this pseudocode goes, there's no need to hold
 the per-backend LWLock while transferring the fast-path entries.  You just
 need to hold it sometime between bumping strong_lock_counts and transferring
 the backend's locks.  This ensures that, for example, the backend is not
 sleeping in the middle of a fast-path lock acquisition for the whole duration
 of this code.

See above; I'm lost.

 Now, a small fly in the ointment is that we haven't got, with
 PostgreSQL, a portable library of memory primitives.  So there isn't
 an obvious way of doing that sfence/lfence business.

 I was thinking that, if the final implementation could benefit from memory
 barrier interfaces, we should create those interfaces now.  Start with only a
 platform-independent dummy implementation that runs a lock/unlock cycle on a
 spinlock residing in backend-local memory.  I'm 75% sure that would be
 sufficient on all architectures for which we support spinlocks.  It may turn
 out that we can't benefit from such interfaces at this time ...

OK.

 Now, it seems to
 me that in the strong lock case, the sfence isn't really needed
 anyway, because we're about to start acquiring and releasing an lwlock
 for every backend, and that had better act as a full memory barrier
 anyhow, or we're doomed.  The weak lock case is more interesting,
 because we need the fence before we've taken any LWLock.

 Agreed.

 But perhaps it'd be sufficient to just acquire the per-backend lwlock
 before checking strong_lock_counts[].  If, as we hope, we get back a
 zero, then we do the fast-path lock acquisition, release the lwlock,
 and away we go.  If we get back any other value, then we've wasted an
 lwlock acquisition cycle.  Or actually maybe not: it seems to me that
 in that case we'd better transfer all of our fast-path entries into
 the main hash table before trying to acquire any lock the slow way, at
 least if we don't want the deadlock detector to have to know about the
 fast-path.  So then we get this:

 !        if (level > ShareUpdateExclusiveLock)
 !                ++strong_lock_counts[my_strong_lock_count_partition]
 !                for each backend
 !                        take per-backend lwlock for target backend
 !                        transfer fastpath entries with matching locktag
 !                        release per-backend lwlock for target backend
 !        else if (level <= RowExclusiveLock)
 !                take per-backend lwlock for own backend
 !                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
 !                        fast-path lock acquisition
 !                        done = true
 !                else
 !                        transfer all fastpath entries
 !                release per-backend lwlock for own backend
 !        if (!done)
 !                normal_LockAcquireEx

 Could you elaborate on the last part (the need for "else transfer all
 fastpath entries") and, specifically, how it aids deadlock avoidance?  I
 didn't think this change would have any impact on deadlocks, because all
 relevant locks will be in the global lock table before any call to
 normal_LockAcquireEx.

Oh, hmm, maybe you're right.  I was concerned about the possibility
of a backend which already holds locks going to sleep on a lock
wait, and maybe running the deadlock detector, and failing to notice a
deadlock.  But I guess that can't happen: if any of the locks it holds
are relevant to the deadlock detector, the backend attempting to
acquire those locks will transfer them before attempting to acquire
the lock itself, so it should be OK.

 To 

Re: [HACKERS] Reducing overhead of frequent table locks

2011-05-24 Thread Noah Misch
On Tue, May 24, 2011 at 08:53:11AM -0400, Robert Haas wrote:
 On Tue, May 24, 2011 at 5:07 AM, Noah Misch n...@leadboat.com wrote:
  This drops the part about only transferring fast-path entries once when a
  strong_lock_counts cell transitions from zero to one.
 
 Right: that's because I don't think that's what we want to do.  I
 don't think we want to transfer all per-backend locks to the shared
 hash table as soon as anyone attempts to acquire a strong lock;
 instead, I think we want to transfer only those fast-path locks which
 have the same locktag as the strong lock someone is attempting to
 acquire.  If we do that, then it doesn't matter whether the
 strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7:
 we still have to check for strong locks with that particular locktag.

Oh, I see.  I was envisioning that you'd transfer all locks associated with
the strong_lock_counts cell; that is, all the locks that would now go directly
to the global lock table when requested going forward.  Transferring only
exact matches seems fine too, and then I agree with your other conclusions.

  Granted, that itself
  requires some yet-undiscussed locking.  For that matter, we can't have
  multiple strong lockers completing transfers on the same cell in parallel.
  Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
  strong locker holds for that entire if body and while decrementing the
  strong_lock_counts cell at lock release.
 
 I was imagining that the per-backend LWLock would protect the list of
 fast-path locks.  So to transfer locks, you would acquire the
 per-backend LWLock for the backend which has the lock, and then the
 lock manager partition LWLock, and then perform the transfer.

I see later in your description that the transferer will delete each fast-path
lock after transferring it.  Given that, this does sound adequate.

  As far as the level of detail of this pseudocode goes, there's no need to
  hold the per-backend LWLock while transferring the fast-path entries.  You
  just need to hold it sometime between bumping strong_lock_counts and
  transferring the backend's locks.  This ensures that, for example, the
  backend is not sleeping in the middle of a fast-path lock acquisition for
  the whole duration of this code.
 
 See above; I'm lost.

It wasn't a particularly useful point.

  To validate the locking at this level of detail, I think we need to sketch
  the unlock protocol, too.  On each strong lock release, we'll decrement the
  strong_lock_counts cell.  No particular interlock with fast-path lockers
  should be needed; a stray AccessShareLock needlessly making it into the
  global lock table is no problem.  As mentioned above, we _will_ need an
  interlock with lock transfer operations.  How will transferred fast-path
  locks get removed from the global lock table?  Presumably, the original
  fast-path locker should do so at transaction end; anything else would
  contort the life cycle.  Then add a way for the backend to know which locks
  had been transferred as well as an interlock against concurrent transfer
  operations.  Maybe that's all.
 
 I'm thinking that the backend can note, in its local-lock table,
 whether it originally acquired a lock via the fast-path or not.  Any
 lock not originally acquired via the fast-path will be released just
 as now.  For any lock that WAS originally acquired via the fast-path,
 we'll take our own per-backend lwlock, which protects the fast-path
 queue, and scan the fast-path queue for a matching entry.  If none is
 found, then we know the lock was transferred, so release the
 per-backend lwlock and do it the regular way (take lock manager
 partition lock, etc.).

Sounds good.
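
In sketch form, with invented field and helper names:

    /* hypothetical release path for a lock taken via the fast path */
    if (locallock->acquired_via_fastpath)           /* invented flag */
    {
        bool    found;

        LWLockAcquire(MyFastPathQueue->lwlock, LW_EXCLUSIVE);  /* invented */
        found = FastPathQueueRemove(MyFastPathQueue,
                                    &locallock->tag.lock);     /* invented */
        LWLockRelease(MyFastPathQueue->lwlock);

        if (found)
            return;     /* done: the lock never left the fast path */
        /* else it was transferred; fall through to the normal release */
    }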

  To put it another way: the current system is fair; the chance of hitting
  lock exhaustion is independent of lock level.  The new system would be unfair;
  lock exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock
  acquisition, through no fault of that transaction.  I agree this isn't ideal,
  but it doesn't look to me like an unacceptable weakness.  Making lock slots
  first-come, first-served is inherently unfair; we're not at all set up to
  justly arbitrate between mutually-hostile lockers competing for slots.  The
  overall situation will get better, not worse, for the admin who wishes to
  defend against hostile unprivileged users attempting a lock table DOS.
 
 Well, it's certainly true that the proposed system is far less likely
 to bomb out trying to acquire an AccessShareLock than what we have
 today, since in the common case the AccessShareLock doesn't use up any
 shared resources.  And that should make a lot of people happy.  But as
 to the bad scenario, one needn't presume that the lockers are hostile
 - it may just be that the system is running on the edge of a full lock
 table.  In the worst case, someone wanting a strong lock on a table
 may end up transferring a hundred or 

Re: [HACKERS] Operator families vs. casts

2011-05-24 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 PostgreSQL 9.1 will implement ALTER TABLE ALTER TYPE operations that use a
 binary coercion cast without rewriting the table or unrelated indexes.  It
 will always rewrite any indexes and recheck any foreign key constraints that
 depend on a changing column.  This is unnecessary for 100% of core binary
 coercion casts.  In my original design[1], I planned to detect this by
 comparing the operator families of the old and would-be-new indexes.  (This
 still yields some unnecessary rewrites; oid_ops and int4_ops are actually
 compatible, for example.)

No, they aren't: signed and unsigned comparisons do not yield the same
sort order.  I think that example may destroy the rest of your argument.
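
For instance, the same 32-bit pattern lands on opposite ends of the sort
order under the two opclasses:

    SELECT (-1)::int4 < 0;              -- true: signed comparison
    SELECT 4294967295::oid < 0::oid;    -- false: same bits, unsigned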

regards, tom lane



Re: [HACKERS] Operator families vs. casts

2011-05-24 Thread Noah Misch
On Tue, May 24, 2011 at 10:10:34AM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  PostgreSQL 9.1 will implement ALTER TABLE ALTER TYPE operations that use a
  binary coercion cast without rewriting the table or unrelated indexes.  It
  will always rewrite any indexes and recheck any foreign key constraints that
  depend on a changing column.  This is unnecessary for 100% of core binary
  coercion casts.  In my original design[1], I planned to detect this by
  comparing the operator families of the old and would-be-new indexes.  (This
  still yields some unnecessary rewrites; oid_ops and int4_ops are actually
  compatible, for example.)
 
 No, they aren't: signed and unsigned comparisons do not yield the same
 sort order.

True; scratch the parenthetical comment.

 I think that example may destroy the rest of your argument.

Not that I'm aware of.



Re: [HACKERS] Reducing overhead of frequent table locks

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 10:03 AM, Noah Misch n...@leadboat.com wrote:
 Let's see if I understand the risk better now: the new system will handle lock
 load better, but when it does hit a limit, understanding why that happened
 will be more difficult.  Good point.  No silver-bullet ideas come to mind for
 avoiding that.

The only idea I can think of is to try to impose some bounds.  For
example, suppose we track the total number of locks that the system
can handle in the shared hash table.  We try to maintain the system in
a state where the number of locks that actually exist is less than
that number, even though some of them may be stored elsewhere.  You
can imagine a system where backends grab a global mutex to reserve a
certain number of slots, and store that in shared memory together with
their fast-path list, but another backend which is desperate for space
can go through and trim back reservations to actual usage.

 Will the pg_locks view scan fast-path lock tables?  If not, we probably need
 another view that does.  We can then encourage administrators to monitor for
 fast-path lock counts that get high relative to shared memory capacity.

I think pg_locks should probably scan the fast-path tables.

Another random idea for optimization: we could have a lock-free array
with one entry per backend, indicating whether any fast-path locks are
present.  Before acquiring its first fast-path lock, a backend writes
a 1 into that array and inserts a store fence.  After releasing its
last fast-path lock, it performs a store fence and writes a 0 into the
array.  Anyone who needs to grovel through all the per-backend
fast-path arrays for whatever reason can perform a load fence and then
scan the array.  If I understand how this stuff works (and it's very
possible that I don't), when the scanning backend sees a 0, it can be
assured that the target backend has no fast-path locks and therefore
doesn't need to acquire and release that LWLock or scan that fast-path
array for entries.
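
In rough code, with invented names, and assuming the barrier interfaces
discussed upthread exist:

    /* one flag per backend, in shared memory (allocation not shown) */
    static volatile uint8 *fastpath_present;    /* invented */

    /* backend, before taking its first fast-path lock */
    fastpath_present[MyBackendId] = 1;
    pg_store_fence();                           /* assumed interface */

    /* backend, after releasing its last fast-path lock */
    pg_store_fence();                           /* assumed interface */
    fastpath_present[MyBackendId] = 0;

    /* scanner */
    pg_load_fence();                            /* assumed interface */
    for (i = 0; i < MaxBackends; i++)
        if (fastpath_present[i])
            /* take backend i's LWLock and scan its fast-path entries */ ;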

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf

2011-05-24 Thread Bruce Momjian
Magnus Hagander wrote:
 On Thu, May 19, 2011 at 11:09, Dave Page dp...@pgadmin.org wrote:
  On Thu, May 19, 2011 at 2:44 PM, Selena Deckelmann sel...@chesnok.com 
  wrote:
  On Wed, May 18, 2011 at 8:20 PM, Alvaro Herrera
  alvhe...@commandprompt.com wrote:
  Excerpts from Greg Smith's message of mié may 18 23:07:13 -0400 2011:
  Two things that could be changed from this example to make it more
  useful:

  -The default database is based on your user name, which is postgres in
  most packaged builds but not if you compile your own.  I don't know
  whether it's practical to consider substituting that into this file, or
  if it's just enough to mention that as an additional doc comment.

  You mean the default username, not the default database, but yeah; so do
  we need a @default_username@ token to be replaced by initdb with
  whatever it has as effective_user?  (In this case the patch is no longer
  2 lines, but still should be trivial enough).
 
  That would be nice. So, we just add that token to initdb? Seems simple.
 
  I added some explanation of the all vs replication bit in the header 
  comments.
 
  Revision attached.
 
  Looks good to me.
 
  As I mentioned offlist, I'd like it in teal please.
 
 Applied with some further minor bikeshedding (remove trailing spaces,
 rewrap so columns aren't wider than 80 chars, etc)

Let me just point out that people who have already run initdb during
beta will not see this in their pg_hba.conf, nor in their
share/pg_hba.conf.sample, even after they have upgraded to a later beta,
unless they run initdb.  However, we have bumped the catalog version for
something else so they should then get this change.

My point is if we change configuration files and then don't bump the
catalog version, the share/*.sample files get out of sync with the files
in /data, which can be kind of confusing.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] moving toast table to its own tablespace

2011-05-24 Thread Bruce Momjian
Robert Haas wrote:
 On Thu, May 19, 2011 at 3:17 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
  Is there a reason we don't allow moving the toast table to a separate
  tablespace, other than unimplemented feature?  If not, I propose such a
  syntax as

  ALTER TABLE foo SET TOAST TABLESPACE bar;
 
 Off the top of my head, I don't see any reason not to allow that.

Added to TODO:

Allow toast tables to be moved to a different tablespace

* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00980.php 

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf

2011-05-24 Thread Magnus Hagander
On Tue, May 24, 2011 at 10:53, Bruce Momjian br...@momjian.us wrote:
 Magnus Hagander wrote:
 On Thu, May 19, 2011 at 11:09, Dave Page dp...@pgadmin.org wrote:
  On Thu, May 19, 2011 at 2:44 PM, Selena Deckelmann sel...@chesnok.com 
  wrote:
  On Wed, May 18, 2011 at 8:20 PM, Alvaro Herrera
  alvhe...@commandprompt.com wrote:
  Excerpts from Greg Smith's message of mié may 18 23:07:13 -0400 2011:
  Two things that could be changed from this example to make it more
  useful:

  -The default database is based on your user name, which is postgres in
  most packaged builds but not if you compile your own.  I don't know
  whether it's practical to consider substituting that into this file, or
  if it's just enough to mention that as an additional doc comment.

  You mean the default username, not the default database, but yeah; so do
  we need a @default_username@ token to be replaced by initdb with
  whatever it has as effective_user?  (In this case the patch is no longer
  2 lines, but still should be trivial enough).

  That would be nice. So, we just add that token to initdb? Seems simple.
 
  I added some explanation of the all vs replication bit in the header 
  comments.
 
  Revision attached.
 
  Looks good to me.
 
  As I mentioned offlist, I'd like it in teal please.

 Applied with some further minor bikeshedding (remove trailing spaces,
 rewrap so columns aren't wider than 80 chars, etc)

 Let me just point out that people who have already run initdb during
 beta will not see this in their pg_hba.conf, nor in their
 share/pg_hba.conf.sample, even after they have upgraded to a later beta,
 unless they run initdb.  However, we have bumped the catalog version for
 something else so they should then get this change.

Why would they not see it in their share/pg_hba.conf.sample?

It will not affect the existing one in $PGDATA, but why wouldn't the
installed .sample change?

 My point is if we change configuration files and then don't bump the
 catalog version, the share/*.sample files get out of sync with the files
 in /data, which can be kind of confusing.

They would be, but what you are saying above is that they would not get
out of sync, because the share/*.sample files also don't update. Just a
mistake in what you said above, or am I missing something?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Pull up aggregate subquery

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 23, 2011 at 4:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah.  For simple scan/join queries it seems likely that we only care
 about parameterizing indexscans, since the main opportunity for a win is
 to not scan all of a large table.  Restricting things that way would
 help reduce the number of extra Paths to carry around.  But I'm not sure
 whether the same argument can be made for arbitrary subqueries.

 I must be misunderstanding you, because index scans are the thing we
 already *do* parameterize; and what else would make any sense?

The point I was trying to make is that the ultimate reason for having a
parameterized portion-of-a-plan will be that there's a parameterized
indexscan somewhere down at the bottom.  I had originally imagined that
we might parameterize any old scan; for example consider replacing

Nestloop
  Join Filter: a.x = b.y
  -> Seqscan on a
  -> Seqscan on b

with

Nestloop
  -> Seqscan on a
  -> Seqscan on b
       Filter: b.y = a.x

Although this isn't nearly as useful as if we could be using an index on
b.y, there would still be some marginal gain to be had, because we'd be
able to reject rows of b without first passing them up to the join node.
But I'm afraid that going all-out like that would slow the planner down
far too much (too many Paths to consider) to be justified by a marginal
runtime gain.

So the idea I have at the moment is that we'll still only parameterize
indexscans, but then allow those to be joined to unrelated relations
while still remaining parameterized.  That should reduce the number
of parameterized Paths hanging around, because only joinclauses that
match indexes will give rise to such Paths.

But I think this is all fairly unrelated to the case that Hitoshi is on
about.  As you said earlier, it seems like we'd have to derive both
parameterized and unparameterized plans for the subquery, which seems
mighty expensive.

regards, tom lane



Re: [HACKERS] 9.2 schedule

2011-05-24 Thread David Fetter
On Mon, May 23, 2011 at 10:44:20PM -0400, Greg Smith wrote:
 At the developer meeting last week:
 http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting there
 was an initial schedule for 9.2 hammered out and dutifully
 transcribed at
 http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan ,
 and the one part I wasn't sure I had written down correctly I see
 Robert already fixed.
 
 There was a suggestion to add some publicity around the schedule for
 this release.

Already started. :)

http://www.postgresql.org/community/weeklynews/pwn20110522

 There's useful PR value to making it more obvious to
 people that the main development plan is regular and time-based,
 even if the release date itself isn't fixed.  The right time to make
 an initial announcement like that is soon, particularly if a goal
 here is to get more submitted into the first 9.2 CF coming in only a
 few weeks.  Anyone have changes to suggest before this starts
 working its way toward an announcement?

I thought we'd agreed on the timing for the first CF, and that I was
to announce it in the PostgreSQL Weekly News, so I did just that.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Reducing overhead of frequent table locks

2011-05-24 Thread Noah Misch
On Tue, May 24, 2011 at 10:35:23AM -0400, Robert Haas wrote:
 On Tue, May 24, 2011 at 10:03 AM, Noah Misch n...@leadboat.com wrote:
  Let's see if I understand the risk better now: the new system will handle
  lock load better, but when it does hit a limit, understanding why that
  happened will be more difficult.  Good point.  No silver-bullet ideas come
  to mind for avoiding that.
 
 The only idea I can think of is to try to impose some bounds.  For
 example, suppose we track the total number of locks that the system
 can handle in the shared hash table.  We try to maintain the system in
 a state where the number of locks that actually exist is less than
 that number, even though some of them may be stored elsewhere.  You
 can imagine a system where backends grab a global mutex to reserve a
 certain number of slots, and store that in shared memory together with
 their fast-path list, but another backend which is desperate for space
 can go through and trim back reservations to actual usage.

Forcing artificial resource exhaustion is a high price to pay.  I suppose it's
quite like disabling Linux memory overcommit, and some folks would like it.

 Another random idea for optimization: we could have a lock-free array
 with one entry per backend, indicating whether any fast-path locks are
 present.  Before acquiring its first fast-path lock, a backend writes
 a 1 into that array and inserts a store fence.  After releasing its
 last fast-path lock, it performs a store fence and writes a 0 into the
 array.  Anyone who needs to grovel through all the per-backend
 fast-path arrays for whatever reason can perform a load fence and then
 scan the array.  If I understand how this stuff works (and it's very
 possible that I don't), when the scanning backend sees a 0, it can be
 assured that the target backend has no fast-path locks and therefore
 doesn't need to acquire and release that LWLock or scan that fast-path
 array for entries.

I'm probably just missing something, but can't that conclusion become obsolete
arbitrarily quickly?  What if the scanning backend sees a 0, and the subject
backend is currently sleeping just before it would have bumped that value?  We
need to take the LWLock if there's any chance that the subject backend has not
yet seen the scanning backend's strong_lock_counts[] update.

nm



Re: [HACKERS] Pull up aggregate subquery

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I must be misunderstanding you, because index scans are the thing we
 already *do* parameterize; and what else would make any sense?

 The point I was trying to make is that the ultimate reason for having a
 parameterized portion-of-a-plan will be that there's a parameterized
 indexscan somewhere down at the bottom.  I had originally imagined that
 we might parameterize any old scan; for example consider replacing

        Nestloop
          Join Filter: a.x = b.y
          - Seqscan on a
          - Seqscan on b

 with

        Nestloop
          - Seqscan on a
          - Seqscan on b
               Filter: b.y = a.x

Oh, I see.  I have a general gripe with nested loop plans: we already
consider too many of them.  IIRC, when I last fooled around with this,
the number of nested loop paths that we generate far exceeded the
number of merge or hash join paths, and most of those paths suck and
are a complete waste of time.  It strikes me that we ought to be
trying to find ways to get rid of some of the paths we're already
considering, rather than adding any more.  In this particular case, if
the second plan is actually faster, it probably won't be by much; we
could think about trying to make some kind of ex-post-facto
transformation instead of throwing everything into the path machinery.

 Although this isn't nearly as useful as if we could be using an index on
 b.y, there would still be some marginal gain to be had, because we'd be
 able to reject rows of b without first passing them up to the join node.
 But I'm afraid that going all-out like that would slow the planner down
 far too much (too many Paths to consider) to be justified by a marginal
 runtime gain.

Agreed.

 So the idea I have at the moment is that we'll still only parameterize
 indexscans, but then allow those to be joined to unrelated relations
 while still remaining parameterized.  That should reduce the number
 of parameterized Paths hanging around, because only joinclauses that
 match indexes will give rise to such Paths.

That seems fine, yeah.  If anything, we might want to limit it even
more, but certainly that's a good place to start, and see how it
shakes out.

 But I think this is all fairly unrelated to the case that Hitoshi is on
 about.  As you said earlier, it seems like we'd have to derive both
 parameterized and unparameterized plans for the subquery, which seems
 mighty expensive.

That was my first thought, too, but then I wondered if I was getting
cheap.  Most of the time, the subquery will be something simple, and
replanning it twice won't really matter much.  If it happens to be
something complicated, then it will take longer, but on the other hand
that's exactly the sort of byzantine query where you probably want the
planner to pull out all the stops.  Aggregates tend to feel slow
almost invariably, because the amount of data being processed under
the hood is much larger than what actually gets emitted, so I think we
should at least consider the possibility that users really won't care
about a bit of extra work.  The case I'm concerned about is where you
have several levels of nested aggregates, and the effect starts to
multiply.  But even if that turns out to be a problem, we could handle
it by limiting consideration of the alternate path to the top 1 or 2
levels and handle the rest as we do now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Cannot build docs of 9.1 on Windows

2011-05-24 Thread Andrew Dunstan



On 05/19/2011 06:29 PM, MauMau wrote:

From: Andrew Dunstan and...@dunslane.net

On Thu, May 19, 2011 10:32 am, Robert Haas wrote:

2011/5/16 MauMau maumau...@gmail.com:

Can't open perl script make-errcodes-table.pl: No such file or
directory


I think this is the root of the problem.  We have no script called
make-errcodes-table.pl.  Can you try changing it to
generate-errcodes-table.pl and see if that works?




Building docs under Windows in the buildfarm is on my TODO list. We
already support it (as of a few weeks ago) for non-Windows build 
systems.


That will help us make sure we don't have this kind of drift.


Thank you. I could remove the error Can't open perl script
make-errcodes-table.pl: N... by changing make-errcodes-table.pl
to generate-errcodes-table.pl, but all the other results seem to be
the same as before.


Andrew, could you announce the commit when you have successfully built
the docs on Windows? Will I be able to find that out by watching
pgsql-hackers and pgsql-docs? I'll git-fetch the patch.





builddoc.bat failed on my system and reading it made my head hurt. So I 
did what I've done with other bat files and rewrote it in Perl. The 
result is attached. It works for me, and should be a drop-in replacement. 
Just put it in the src/tools/msvc directory and run perl builddoc.pl. 
Please test it and if it works for you we'll use it and make 
builddoc.bat a thin wrapper like build.bat and vcregress.bat.


cheers

andrew


builddoc.pl
Description: Perl program



Re: [HACKERS] Reducing overhead of frequent table locks

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 11:38 AM, Noah Misch n...@leadboat.com wrote:
 Another random idea for optimization: we could have a lock-free array
 with one entry per backend, indicating whether any fast-path locks are
 present.  Before acquiring its first fast-path lock, a backend writes
 a 1 into that array and inserts a store fence.  After releasing its
 last fast-path lock, it performs a store fence and writes a 0 into the
 array.  Anyone who needs to grovel through all the per-backend
 fast-path arrays for whatever reason can perform a load fence and then
 scan the array.  If I understand how this stuff works (and it's very
 possible that I don't), when the scanning backend sees a 0, it can be
 assured that the target backend has no fast-path locks and therefore
 doesn't need to acquire and release that LWLock or scan that fast-path
 array for entries.

 I'm probably just missing something, but can't that conclusion become obsolete
 arbitrarily quickly?  What if the scanning backend sees a 0, and the subject
 backend is currently sleeping just before it would have bumped that value?  We
 need to take the LWLock if there's any chance that the subject backend has not
 yet seen the scanning backend's strong_lock_counts[] update.

Can't we bump strong_lock_counts[] *first*, make sure that change is
globally visible, and only then start scanning the array?

Once we've bumped strong_lock_counts[] and made sure everyone can see
that change, it's still possible for backends to take a fast-path lock
in some *other* fast-path partition, but nobody should be able to add
any more fast-path locks in the partition we care about after that
point.
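
In pseudo-C, the strong-locker side I have in mind looks about like this
(only a sketch: strong_lock_counts[] is from the proposal upthread, and
the barrier primitive and per-backend fields are invented, since we have
no memory-barrier API in the tree today):

/* Sketch of the strong-locker side; the fp* names are invented. */
static void
BeginStrongLockAcquire(int partition, LOCKTAG *locktag)
{
	int		i;

	strong_lock_counts[partition]++;	/* deter new fast-path entries */
	pg_memory_barrier();				/* invented barrier primitive */

	/* No backend that has seen the bump can add to this partition now. */
	for (i = 0; i < MaxBackends; i++)
	{
		volatile PGPROC *proc = GetPGProcByBackendId(i);	/* invented */

		if (!proc->fpLockUsed)
			continue;
		LWLockAcquire(proc->fpLWLock, LW_EXCLUSIVE);
		TransferFastPathLocks(proc, locktag);	/* invented */
		LWLockRelease(proc->fpLWLock);
	}
}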

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 11:33 AM, David Fetter da...@fetter.org wrote:
 On Mon, May 23, 2011 at 10:44:20PM -0400, Greg Smith wrote:
 At the developer meeting last week:
 http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting there
 was an initial schedule for 9.2 hammered out and dutifully
 transcribed at
 http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan ,
 and the one part I wasn't sure I had written down correctly I see
 Robert already fixed.

 There was a suggestion to add some publicity around the schedule for
 this release.

 Already started. :)

 http://www.postgresql.org/community/weeklynews/pwn20110522

 There's useful PR value to making it more obvious to
 people that the main development plan is regular and time-based,
 even if the release date itself isn't fixed.  The right time to make
 an initial announcement like that is soon, particularly if a goal
 here is to get more submitted into the first 9.2 CF coming in only a
 few weeks.  Anyone have changes to suggest before this starts
 working its way toward an announcement?

 I thought we'd agreed on the timing for the first CF, and that I was
 to announce it in the PostgreSQL Weekly News, so I did just that.

We talked about doing a separate -announce post just for this item,
and there seemed to be some support for that.  I'm OK with either way,
though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread Tom Lane
In http://archives.postgresql.org/pgsql-bugs/2011-05/msg00171.php
Regina Obe complains that this fails in 9.1, though it worked before:

regression=# CREATE DOMAIN topoelementarray AS integer[]; 
CREATE DOMAIN
regression=# SELECT array_upper(ARRAY[[1,2], [3,4]]::topoelementarray, 1);
ERROR:  function array_upper(topoelementarray, integer) does not exist

This is a consequence of the changes I made to fix bug #5717,
particularly the issues around ANYARRAY matching discussed here:
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php

Regina is the second or third beta tester to complain about domains over
arrays no longer matching ANYARRAY, so I think we'd better do something
about it.  I haven't tried to code anything up yet, but the ideas I'm
considering trying to implement go like this:

1. If a domain type is passed to an ANYARRAY argument, automatically
downcast it to its base type (which of course had better then be an
array).  This would include inserting an implicit cast into the
expression tree, so that if the function uses get_fn_expr_argtype or
similar, it would see the base type.  Also, if the function returns
ANYARRAY, its result is considered to be of the base type not the
domain.

2. If a domain type is passed to an ANYELEMENT argument, automatically
downcast it to its base type if there is any ANYARRAY argument, or if
the function result type is ANYARRAY, or if any other ANYELEMENT
argument is not of the same domain type.  The first two cases are
necessary since we don't have arrays of domains: the match is guaranteed
to fail if we don't do this, since there can be no matching array type
for the domain.  The third case is meant to handle cases like
function(domain-over-int, 42) where the function has two ANYELEMENT
arguments: we now fail, but reducing the domain to int would allow
success.
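
To make the get_fn_expr_argtype point in #1 concrete, consider a
hypothetical C function declared at the SQL level as f(anyarray)
(get_fn_expr_argtype is the real API; the function itself is only for
illustration):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(show_argtype);

Datum
show_argtype(PG_FUNCTION_ARGS)
{
	/*
	 * Under rule #1, passing a topoelementarray here would report the
	 * base type integer[], not the domain, because the implicit
	 * downcast has been inserted into the expression tree.
	 */
	Oid		argtype = get_fn_expr_argtype(fcinfo->flinfo, 0);

	PG_RETURN_OID(argtype);
}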

An alternative rule we could use in place of #2 is just smash domains
to base types always, when they're matched to ANYELEMENT.  That would
be simpler and more in keeping with #1, but it might change the behavior
in cases where the historical behavior is reasonable (unlike the cases
discussed in my message referenced above...)  I find this simpler rule
tempting from an implementor's standpoint, but am unsure if there'll be
complaints.

Comments, better ideas?

regards, tom lane



Re: [HACKERS] 9.2 schedule

2011-05-24 Thread David Fetter
On Tue, May 24, 2011 at 11:54:19AM -0400, Robert Haas wrote:
 On Tue, May 24, 2011 at 11:33 AM, David Fetter da...@fetter.org wrote:
  On Mon, May 23, 2011 at 10:44:20PM -0400, Greg Smith wrote:
  At the developer meeting last week:
  http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting there
  was an initial schedule for 9.2 hammered out and dutifully
  transcribed at
  http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Development_Plan ,
  and the one part I wasn't sure I had written down correctly I see
  Robert already fixed.
 
  There was a suggestion to add some publicity around the schedule for
  this release.
 
  Already started. :)
 
  http://www.postgresql.org/community/weeklynews/pwn20110522
 
  There's useful PR value to making it more obvious to
  people that the main development plan is regular and time-based,
  even if the release date itself isn't fixed.  The right time to make
  an initial announcement like that is soon, particularly if a goal
  here is to get more submitted into the first 9.2 CF coming in only a
  few weeks.  Anyone have changes to suggest before this starts
  working its way toward an announcement?
 
  I thought we'd agreed on the timing for the first CF, and that I was
  to announce it in the PostgreSQL Weekly News, so I did just that.
 
 We talked about doing a separate -announce post just for this item,
 and there seemed to be some support for that.  I'm OK with either way,
 though.

For what it's worth, I think there should also be a separate -announce
(and -general, and -hackers) post for the item.  This is about getting
the message out early and broadly so people have the best chance of
getting it in time to act on it.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Pull up aggregate subquery

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 24, 2011 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The point I was trying to make is that the ultimate reason for having a
 parameterized portion-of-a-plan will be that there's a parameterized
 indexscan somewhere down at the bottom.

 Oh, I see.  I have a general gripe with nested loop plans: we already
 consider too many of them.  IIRC, when I last fooled around with this,
 the number of nested loop paths that we generate far exceeded the
 number of merge or hash join paths, and most of those paths suck and
 are a complete waste of time.

Hm, really?  My experience is that it's the mergejoin paths that breed
like rabbits, because there are so many potential sort orders.

 But I think this is all fairly unrelated to the case that Hitoshi is on
 about.  As you said earlier, it seems like we'd have to derive both
 parameterized and unparameterized plans for the subquery, which seems
 mighty expensive.

 That was my first thought, too, but then I wondered if I was getting
 cheap.

Yeah, it's certainly possible that we're worrying too much.  Usually
I only get concerned about added planner logic if it will impact the
planning time for simple queries.  Simple tends to be in the eye of
the beholder, but something with a complicated aggregate subquery is
probably not simple by anyone's definition.

In this case the sticky point is that there could be multiple possible
sets of clauses available to be pushed down, depending on what you
assume is the outer relation for the eventual upper-level nestloop.
So worst case, you could have not just one parameterized plan to
generate in addition to the regular kind, but 2^N of them ...

regards, tom lane



Re: [HACKERS] Reducing overhead of frequent table locks

2011-05-24 Thread Noah Misch
On Tue, May 24, 2011 at 11:52:54AM -0400, Robert Haas wrote:
 On Tue, May 24, 2011 at 11:38 AM, Noah Misch n...@leadboat.com wrote:
  Another random idea for optimization: we could have a lock-free array
  with one entry per backend, indicating whether any fast-path locks are
  present.  Before acquiring its first fast-path lock, a backend writes
  a 1 into that array and inserts a store fence.  After releasing its
  last fast-path lock, it performs a store fence and writes a 0 into the
  array.  Anyone who needs to grovel through all the per-backend
  fast-path arrays for whatever reason can perform a load fence and then
  scan the array.  If I understand how this stuff works (and it's very
  possible that I don't), when the scanning backend sees a 0, it can be
  assured that the target backend has no fast-path locks and therefore
  doesn't need to acquire and release that LWLock or scan that fast-path
  array for entries.

  I'm probably just missing something, but can't that conclusion become
  obsolete arbitrarily quickly?  What if the scanning backend sees a 0,
  and the subject backend is currently sleeping just before it would have
  bumped that value?  We need to take the LWLock if there's any chance
  that the subject backend has not yet seen the scanning backend's
  strong_lock_counts[] update.
 
 Can't we bump strong_lock_counts[] *first*, make sure that change is
 globally visible, and only then start scanning the array?
 
 Once we've bumped strong_lock_counts[] and made sure everyone can see
 that change, it's still possible for backends to take a fast-path lock
 in some *other* fast-path partition, but nobody should be able to add
 any more fast-path locks in the partition we care about after that
 point.

There's a potentially-unbounded delay between when the subject backend reads
strong_lock_counts[] and when it sets its fast-path-used flag.  (I didn't mean
"not yet seen" in the sense that some memory load would not show the latest
value.  I just meant that the subject backend may still be taking relevant
actions based on its previous load of the value.)  We could have the subject
set its fast-path-used flag before even checking strong_lock_counts[], then
clear the flag when strong_lock_counts[] dissuaded it from proceeding.  Maybe
that's what you had in mind?
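
Concretely, I'm picturing the fast path going roughly like this (a sketch
only; the flag, queue and barrier names are all invented):

/* Sketch: set the flag before deciding; all names here are invented. */
static bool
TryFastPathLock(int partition, LOCKTAG *locktag)
{
	MyProc->fpLockUsed = true;		/* advertise before checking */
	pg_memory_barrier();			/* invented barrier primitive */

	if (strong_lock_counts[partition] != 0)
	{
		/* Dissuaded: retract the flag, but only if we hold nothing. */
		if (FastPathQueueIsEmpty(MyProc))	/* invented */
			MyProc->fpLockUsed = false;
		return false;		/* caller falls back to the regular path */
	}

	AddFastPathEntry(MyProc, locktag);	/* invented */
	return true;
}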

That being said, it's a slight extra cost for all fast-path lockers to benefit
the strong lockers, so I'm not prepared to guess whether it will pay off.



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread Merlin Moncure
On Tue, May 24, 2011 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 1. If a domain type is passed to an ANYARRAY argument, automatically
 downcast it to its base type (which of course had better then be an
 array).  This would include inserting an implicit cast into the
 expression tree, so that if the function uses get_fn_expr_argtype or
 similar, it would see the base type.  Also, if the function returns
 ANYARRAY, its result is considered to be of the base type not the
 domain.

Does that mean that plpgsql %type variable declarations will see the
base type (and miss any constraint checks)?  I think it's fine either
way, but that's worth noting.

 An alternative rule we could use in place of #2 is just smash domains
 to base types always, when they're matched to ANYELEMENT.  That would
 be simpler and more in keeping with #1, but it might change the behavior
 in cases where the historical behavior is reasonable (unlike the cases
 discussed in my message referenced above...)  I find this simpler rule
 tempting from an implementor's standpoint, but am unsure if there'll be
 complaints.

#2a seems cleaner to me (superficially).  Got an example of a behavior
you think is changed?  In particular, is there a way the new function
would fail where it used to not fail?

merlin



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread David E. Wheeler
On May 24, 2011, at 9:12 AM, Tom Lane wrote:

 An alternative rule we could use in place of #2 is just smash domains
 to base types always, when they're matched to ANYELEMENT.  That would
 be simpler and more in keeping with #1, but it might change the behavior
 in cases where the historical behavior is reasonable (unlike the cases
 discussed in my message referenced above...)  I find this simpler rule
 tempting from an implementor's standpoint, but am unsure if there'll be
 complaints.

I'm not sure where the historical behavior manifests, but this certainly
seems like it might be the most consistent implementation as well. Which
option is least likely to violate the principle of least surprise?

Best,

David




Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Greg Smith

David Fetter wrote:

I thought we'd agreed on the timing for the first CF, and that I was
to announce it in the PostgreSQL Weekly News, so I did just that.
  


Yes, and excellent.  The other ideas were:

-Publish information about the full schedule to some of the more popular 
mailing lists


-Link to this page more obviously from postgresql.org (fixed redirect 
URL is probably the right approach) to bless it, and potentially 
improve its search rank too.


The specific new problem being highlighted to work on here is that the 
schedule and development process is actually quite good as open-source 
projects go, but that fact isn't visible at all unless you're already on 
the inside of the project.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





[HACKERS] Cascade replication (WIP)

2011-05-24 Thread Fujii Masao
Hi,

I'd like to propose a cascade replication feature (i.e., allowing a
standby to accept replication connections from another standby) for 9.2.
This feature is useful for reducing the overhead of the master, since it
lets us decrease the number of standbys connecting directly to the master.

I've attached the WIP patch, which changes walsender so that it starts
replication even during recovery. The walsender then attempts to send all
WAL that's already been fsync'd to the standby's disk (i.e., it sends WAL
up to the later of the receive location and the replay location). When the
standby is promoted, all walsenders in that standby exit, because the
resulting timeline mismatch prevents them from continuing replication.

The standby must not accept a replication connection from itself;
otherwise, since no new WAL data would ever appear in that standby,
replication could not advance. As a safeguard against this, I introduced a
new ID to identify each instance. The walsender sends that ID as the
fourth field of the reply to IDENTIFY_SYSTEM, and walreceiver then checks
whether the IDs of the two servers are the same. If they are, the standby
is connecting to itself, so walreceiver emits an ERROR.
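
On the walreceiver side, the check amounts to no more than this (a
sketch; the variable names are not the ones in the patch):

/* Sketch of the walreceiver-side check; variable names illustrative. */
if (strcmp(remote_identification_key, local_identification_key) == 0)
	ereport(ERROR,
			(errmsg("standby must not connect to itself for replication")));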

One remaining problem I'll have to tackle is this: even while walreceiver
is not running (i.e., the startup process is retrieving WAL files from the
archive), the cascading walsender should continuously send new WAL data.
This means that the walsender should send the WAL files restored from the
archive. The problem is that a restored WAL file is always named
RECOVERYXLOG, and for now walsender cannot handle a WAL file with such a
name.

To address this, I'm thinking of making the startup process restore the
WAL file under its real name instead of RECOVERYXLOG. Then, as on the
master, the walsender can read and send the restored WAL file. A required
WAL file could be recycled before being sent, though, so we might need to
make the wal_keep_segments setting take effect even on the standby.

Comments? Objections?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
***
*** 1357,1362  The commands accepted in walsender mode are:
--- 1357,1374 
/listitem
/varlistentry
  
+   varlistentry
+   term
+identificationkey
+   /term
+   listitem
+   para
+Identification key. Also useful to check that the standby is
+not connecting to that standby itself.
+   /para
+   /listitem
+   /varlistentry
+ 
/variablelist
   /para
  /listitem
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 9551,9556  GetXLogReplayRecPtr(void)
--- 9551,9572 
  }
  
  /*
+  * Get current standby flush position, ie, the last WAL position
+  * known to be fsync'd to disk in standby.
+  */
+ XLogRecPtr
+ GetStandbyFlushRecPtr(void)
+ {
+ 	XLogRecPtr	recvptr;
+ 	XLogRecPtr	redoptr;
+ 
+ 	recvptr = GetWalRcvWriteRecPtr(NULL);
+ 	redoptr = GetXLogReplayRecPtr();
+ 
+ 	return XLByteLT(recvptr, redoptr) ? redoptr : recvptr;
+ }
+ 
+ /*
   * Report the last WAL replay location (same format as pg_start_backup etc)
   *
   * This is useful for determining how much of WAL is visible to read-only
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***
*** 351,357  static void processCancelRequest(Port *port, void *pkt);
  static int	initMasks(fd_set *rmask);
  static void report_fork_failure_to_client(Port *port, int errnum);
  static CAC_state canAcceptConnections(void);
- static long PostmasterRandom(void);
  static void RandomSalt(char *md5Salt);
  static void signal_child(pid_t pid, int signal);
  static bool SignalSomeChildren(int signal, int targets);
--- 351,356 
***
*** 2410,2415  reaper(SIGNAL_ARGS)
--- 2409,2423 
  			pmState = PM_RUN;
  
  			/*
+ 			 * Kill the cascading walsender to urge the cascaded standby to
+ 			 * reread the timeline history file, adjust its timeline and
+ 			 * establish replication connection again. This is required
+ 			 * because the timeline of cascading standby is not consistent
+ 			 * with that of cascaded one just after failover.
+ 			 */
+ 			SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
+ 
+ 			/*
  			 * Crank up the background writer, if we didn't do that already
  			 * when we entered consistent recovery state.  It doesn't matter
  			 * if this fails, we'll just try again later.
***
*** 4369,4375  RandomSalt(char *md5Salt)
  /*
   * PostmasterRandom
   */
! static long
  PostmasterRandom(void)
  {
  	/*
--- 4377,4383 
  /*
   * PostmasterRandom
   */
! long
  PostmasterRandom(void)
  {
  	/*
*** a/src/backend/replication/basebackup.c
--- 

Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread Noah Misch
On Tue, May 24, 2011 at 12:12:55PM -0400, Tom Lane wrote:
 In http://archives.postgresql.org/pgsql-bugs/2011-05/msg00171.php
 Regina Obe complains that this fails in 9.1, though it worked before:
 
 regression=# CREATE DOMAIN topoelementarray AS integer[]; 
 CREATE DOMAIN
 regression=# SELECT array_upper(ARRAY[[1,2], [3,4]]::topoelementarray, 1);
 ERROR:  function array_upper(topoelementarray, integer) does not exist
 
 This is a consequence of the changes I made to fix bug #5717,
 particularly the issues around ANYARRAY matching discussed here:
 http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php
 
 Regina is the second or third beta tester to complain about domains over
 arrays no longer matching ANYARRAY, so I think we'd better do something
 about it.  I haven't tried to code anything up yet, but the ideas I'm
 considering trying to implement go like this:
 
 1. If a domain type is passed to an ANYARRAY argument, automatically
 downcast it to its base type (which of course had better then be an
 array).  This would include inserting an implicit cast into the
 expression tree, so that if the function uses get_fn_expr_argtype or
 similar, it would see the base type.  Also, if the function returns
 ANYARRAY, its result is considered to be of the base type not the
 domain.

We discussed this a few weeks ago:
http://archives.postgresql.org/message-id/20110511093217.gb26...@tornado.gateway.2wire.net

What's to recommend #1 over what I proposed then?  Seems like a discard of
functionality for little benefit.

 2. If a domain type is passed to an ANYELEMENT argument, automatically
 downcast it to its base type if there is any ANYARRAY argument, or if
 the function result type is ANYARRAY, or if any other ANYELEMENT
 argument is not of the same domain type.  The first two cases are
 necessary since we don't have arrays of domains: the match is guaranteed
 to fail if we don't do this, since there can be no matching array type
 for the domain.  The third case is meant to handle cases like
 function(domain-over-int, 42) where the function has two ANYELEMENT
 arguments: we now fail, but reducing the domain to int would allow
 success.

This seems generally consistent with other function-resolution rules around
domains.  On the other hand, existing users have supposedly coped by adding an
explicit cast to one or the other argument to get the behavior they want.  New
applications will quietly get the cast, as it were, on the domain argument(s).
I hesitate to say this is so clearly right as to warrant that change.  Even if
it is right, though, this smells like 9.2 material.

nm



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 On May 24, 2011, at 9:12 AM, Tom Lane wrote:
 An alternative rule we could use in place of #2 is just smash domains
 to base types always, when they're matched to ANYELEMENT.  That would
 be simpler and more in keeping with #1, but it might change the behavior
 in cases where the historical behavior is reasonable (unlike the cases
 discussed in my message referenced above...)  I find this simpler rule
 tempting from an implementor's standpoint, but am unsure if there'll be
 complaints.

 I'm not sure where the historical behavior manifests, but this
 certainly seems like it might be the most consistent implementation,
 as well. Which option is least likely to violate the principal of
 surprise?

Well, the basic issue here is what happens when a function like

create function noop(anyelement) returns anyelement ...

is applied to a domain argument.  Currently, the result is thought to be
of the domain type, whereas if we smash to base unconditionally, the
result will be thought to be of the domain's base type.  You can make an
argument for either behavior, but I think the argument for the current
behavior hinges on the assumption that such a function isn't doing
anything to the argument value, only passing it through as-is.

I should probably also point out the previous discussion of this area
from a couple weeks ago, notably here:
http://archives.postgresql.org/pgsql-hackers/2011-05/msg00640.php
The example I gave there seems relevant:

create function negate(anyelement) returns anyelement as
$$ select - $1 $$ language sql;

create domain pos as int check (value  0);

select negate(42::pos);

This example function isn't quite silly --- it will work on any datatype
having a unary '-' operator, and you could imagine someone wanting to do
something roughly like this in more realistic cases.  But if you want to
assume that the function returns pos when handed pos, you'd better be
prepared to insert a CastToDomain node to recheck the domain constraint.
Right now the SQL-function code doesn't support such cases:

regression=# select negate(42::pos);
ERROR:  return type mismatch in function declared to return pos
DETAIL:  Actual return type is integer.
CONTEXT:  SQL function "negate" during inlining

If we smashed to base type then this issue would go away.

On the other hand it feels like we'd be taking yet another step away
from allowing domains to be usefully used in function declarations.
I can't put my finger on any concrete consequence of that sort, since
what we're talking about here is ANYELEMENT/ANYARRAY functions not
functions declared to take domains --- but it sure seems like this
would put domains even further away from the status of first-class
citizenship in the type system.

regards, tom lane



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 On Tue, May 24, 2011 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 1. If a domain type is passed to an ANYARRAY argument, automatically
 downcast it to its base type (which of course had better then be an
 array).

 Does that mean that plpgsql %type variable declarations will see the
 base type (and miss any constraint checks?).

No, this has nothing to do with %type.  What's at stake is matching to
functions/operators that are declared to take ANYARRAY.

 #2a seems cleaner to me (superficially).  Got an example of a behavior
 you think is changed?

See my response to David Wheeler.

regards, tom lane



[HACKERS] errno not set in case of libm functions (HPUX)

2011-05-24 Thread Ibrar Ahmed
I have found a problem which is specific to the HP-UX compiler. By
default, none of the 'libm' functions on an HP-UX Integrity server set
errno. To get 'errno' set, we should compile the code with the
+Olibmerrno option, so we should add this option in
src/makefiles/Makefile.hpux. Otherwise we cannot expect this code to
work properly:


[float.c]
Datum
dacos(PG_FUNCTION_ARGS)
{
	...
	errno = 0;
	result = acos(arg1);
	if (errno != 0)
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("input is out of range")));
	...
}

Because the acos function will not set errno in case of invalid input,
this check will not trigger the error message. I have attached a patch to
add this option to the HP-UX makefile.

BTW, I have found the same kind of discussion, without any conclusion, here:

http://archives.postgresql.org/pgsql-hackers/2011-05/msg00046.php

-- 
   Ibrar Ahmed
diff --git a/src/makefiles/Makefile.hpux b/src/makefiles/Makefile.hpux
index 1917d61..f2a8f19 100644
--- a/src/makefiles/Makefile.hpux
+++ b/src/makefiles/Makefile.hpux
@@ -43,6 +43,12 @@ else
CFLAGS_SL = +Z
 endif
 
+
+# HP-UX libm functions on 'Integrity server' do not set errno by default,
+# for errno setting, compile with the +Olibmerrno option.
+
+CFLAGS := +Olibmerrno $(CFLAGS)
+
 # Rule for building a shared library from a single .o file
 %$(DLSUFFIX): %.o
 ifeq ($(GCC), yes)



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Tue, May 24, 2011 at 12:12:55PM -0400, Tom Lane wrote:
 This is a consequence of the changes I made to fix bug #5717,
 particularly the issues around ANYARRAY matching discussed here:
 http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php

 We discussed this a few weeks ago:
 http://archives.postgresql.org/message-id/20110511093217.gb26...@tornado.gateway.2wire.net

 What's to recommend #1 over what I proposed then?  Seems like a discard of
 functionality for little benefit.

I am unwilling to commit to making #2 work, especially not under time
constraints; and you apparently aren't either, since you haven't
produced the patch you alluded to at the end of that thread.  Even if
you had, though, I'd have no confidence that all holes of the sort had
been closed.  What you're proposing is to ratchet up the implementation
requirements for every PL and every C function declared to accept
polymorphic types, and there are a lot of members of both classes that
we don't control.

 I hesitate to say this is so clearly right as to warrant that change.  Even if
 it is right, though, this smells like 9.2 material.

Well, I'd been hoping to leave it for later too, but it seems like we
have to do something about the ANYARRAY case for 9.1.  Making ANYARRAY's
response to domains significantly inconsistent with ANYELEMENT's
response doesn't seem like a good plan.

regards, tom lane



Re: [HACKERS] Proposal: Another attempt at vacuum improvements

2011-05-24 Thread Robert Haas
So, first of all, thanks for putting some effort and thought into
this.  Despite the large number of improvements in this area in 8.3
and 8.4, this is still a pain point, and it would be really nice to
find a way to make some further improvements.

On Tue, May 24, 2011 at 2:58 AM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 So the idea is to separate the index vacuum (removing index pointers to dead
 tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning
 or using regular vacuum), we can spool the dead line pointers somewhere. To
 avoid any hot-spots during normal processing, the spooling can be done
 periodically like the stats collection.

What happens if the system crashes after a line pointer becomes dead
but before the record of its death is safely on disk?  The fact that a
previous index vacuum has committed is only sufficient justification
for reclaiming the dead line pointers if you're positive that the
index vacuum killed the index pointers for *every* dead line pointer.
I'm not sure we want to go there; any operation that wants to make a
line pointer dead will need to be XLOG'd.  Instead, I think we should
stick with your original idea and just try to avoid the second heap
pass.

So to do that, as you say, we can have every operation that creates a
dead line pointer note the LSN of the operation in the page.  But
instead of allocating permanent space in the page header, which would
both reduce (admittedly only by 8 bytes) the amount of space available
for tuples, and more significantly have the effect of breaking on-disk
compatibility, I'm wondering if we could get by with making space for
that extra LSN only when it's actually present. In other words, when
it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or somesuch,
increment pd_upper, and use the extra space to store the LSN.  There
is an alignment problem to worry about there but that shouldn't be a
huge issue.

When we vacuum, we remember the LSN before we start.  When we finish,
if we scanned the indexes and everything completed without error, then
we bump the heap's notion (wherever we store it) of the last
successful index vacuum.  When we vacuum or do HOT cleanup on a page,
if the page has a most-recent-dead-line pointer LSN and it precedes
the start-of-last-successful-index-vacuum LSN, then we mark all the
LP_DEAD tuples as LP_UNUSED and throw away the
most-recent-dead-line-pointer LSN.
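
In rough code, the cleanup-time test would be something like this (a
sketch; PD_HAS_DEAD_LINE_PTR_LSN is the flag proposed above, and the
accessor and helper names are made up):

/* Sketch of the prune-time test; accessors and helpers are made up. */
static void
MaybeReclaimDeadLinePointers(Page page, XLogRecPtr last_index_vacuum_lsn)
{
	PageHeader	ph = (PageHeader) page;

	if ((ph->pd_flags & PD_HAS_DEAD_LINE_PTR_LSN) == 0)
		return;

	if (XLByteLT(PageGetDeadLinePtrLSN(page), last_index_vacuum_lsn))
	{
		/* every LP_DEAD item predates the last full index vacuum */
		MarkDeadItemsUnused(page);			/* LP_DEAD -> LP_UNUSED */
		PageClearDeadLinePtrLSN(page);		/* give back the space */
	}
}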

One downside of this approach is that, if we do something like this,
it'll become slightly more complicated to figure out where the item
pointer array ends.  Another issue is that we might find ourselves
wanting to extend the item pointer array to add a new item, and unable
to do so easily because this most-recent-dead-line-pointer LSN is in
the way.  If the LSN stored in the page precedes the
start-of-last-successful-index-vacuum LSN, and if, further, we can get
a buffer cleanup lock on the page, then we can do a HOT cleanup and
life is good.  Otherwise, we can either (1) just forget about the
most-recent-dead-line-pointer LSN - not ideal but not catastrophic
either - or (2) if the start-of-last-successful-vacuum-LSN is old
enough, we could overwrite an LP_DEAD line pointer in place.

Another issue is that this causes problems for temporary and unlogged
tables, because no WAL records are generated and, therefore, the LSN
does not advance.  This is also a problem for GIST indexes; Heikki
fixed temporary GIST indexes by generating fake LSNs off of a
backend-local counter.  Unlogged GIST indexes are currently not
supported.  I think what we need to do is create an API to which you
can pass a relation and get an LSN.  If it's a permanent relation, you
get a regular LSN.  If it's a temporary relation, you get a fake LSN
based on a backend-local counter.  If it's an unlogged relation, you
get a fake LSN based on a shared-memory counter that is reset on
restart.  If we can encapsulate that properly, it should provide both
what we need to make this idea work and allow a somewhat graceful fix
for GIST-vs-unlogged problem.
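
For instance (relpersistence is real as of 9.1; the fake-LSN helpers are
invented):

/* Sketch of the proposed API; the fake-LSN helpers are invented. */
XLogRecPtr
RelationGetCleanupLSN(Relation rel)
{
	XLogRecPtr	lsn;

	switch (rel->rd_rel->relpersistence)
	{
		case RELPERSISTENCE_PERMANENT:
			lsn = GetInsertRecPtr();	/* a real WAL position */
			break;
		case RELPERSISTENCE_TEMP:
			lsn = NextBackendLocalFakeLSN();	/* backend-local counter */
			break;
		case RELPERSISTENCE_UNLOGGED:
			lsn = NextSharedFakeLSN();	/* shmem counter, reset at restart */
			break;
		default:
			elog(ERROR, "unexpected relpersistence: %c",
				 rel->rd_rel->relpersistence);
	}
	return lsn;
}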

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [GENERAL] Error compiling sepgsql in PG9.1

2011-05-24 Thread Kohei Kaigai
The attached patch makes the configure script abort when it is run with the
'--with-selinux' option but libselinux is older than the minimum requirement
for SE-PostgreSQL.

As the documentation says, it needs at least libselinux-2.0.93, because that
and later versions support selabel_lookup(3) for database object classes,
which is used for initial labeling.

The current configure script checks for the existence of libselinux, but does
no version check. (getpeercon_raw(3) has been a supported API for a long
time.) selinux_sepgsql_context_path(3) is a good watermark for
libselinux-2.0.93 instead.

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei kohei.kai...@emea.nec.com


 -Original Message-
 From: Devrim GÜNDÜZ [mailto:dev...@gunduz.org]
 Sent: 21. Mai 2011 07:46
 To: Kohei Kaigai
 Cc: Emanuel Calvo; postgresql Forums; KaiGai Kohei
 Subject: Re: [GENERAL] Error compiling sepgsql in PG9.1
 
 On Sat, 2011-05-21 at 02:50 +0100, Kohei Kaigai wrote:
  As documentation said, it needs libselinux 2.0.93 or higher.
  This version supports selabel_lookup(3) for database object classes.
 
 AFAICS, we are not checking it during configure. It might be worth to add 
 libselinux version check
 in the configure phase.
 --
 Devrim GÜNDÜZ
 Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com 
 PostgreSQL
 Danışmanı/Consultant, Red Hat Certified Engineer
 Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr 
 http://www.gunduz.org  Twitter:
 http://twitter.com/devrimgunduz


sepgsql-fix-config-version.patch
Description: sepgsql-fix-config-version.patch



Re: [HACKERS] Pull up aggregate subquery

2011-05-24 Thread Hitoshi Harada
2011/5/25 Tom Lane t...@sss.pgh.pa.us:
 Robert Haas robertmh...@gmail.com writes:
 That was my first thought, too, but then I wondered if I was getting
 cheap.

 Yeah, it's certainly possible that we're worrying too much.  Usually
 I only get concerned about added planner logic if it will impact the
 planning time for simple queries.  Simple tends to be in the eye of
 the beholder, but something with a complicated aggregate subquery is
 probably not simple by anyone's definition.

 In this case the sticky point is that there could be multiple possible
 sets of clauses available to be pushed down, depending on what you
 assume is the outer relation for the eventual upper-level nestloop.
 So worst case, you could have not just one parameterized plan to
 generate in addition to the regular kind, but 2^N of them ...

My intention is that if the join qual matches the subquery Agg's grouping
keys then the Var can be pushed down, so I'm not worried about the
exponential growth in the number of paths.

And I found the right place to hack: where set_subquery_pathlist()
pushes down some baserestrictinfo. We don't have the Vars in the
RestrictInfo now, but I guess we can put them in somehow before reaching
there.

Even if I can do it, the only case where it's effective is when the outer
side produces just one tuple. As I noted earlier, this optimization will
only be complete with the executor's cooperation, something like
gathering the parameter values into an array before starting the Agg
scan. So I'm still thinking about which of pulling up and parameterized
scan is better.

Regards,



-- 
Hitoshi Harada



Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Josh Berkus
Robert,

 -Publish information about the full schedule to some of the more popular
 mailing lists

I think that posting to pgsql-announce, PostgreSQL.org news, and this
list would be sufficient.  I'm happy to take care of that.

 -Link to this page more obviously from postgresql.org (fixed redirect
 URL is probably the right approach) to bless it, and potentially
 improve its search rank too.

I would suggest instead adding a new page to postgresql.org/developer
which lists the development schedule, rather than linking to that wiki
page.  Maybe on this page?

http://www.postgresql.org/developer/roadmap

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Reducing overhead of frequent table locks

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 12:34 PM, Noah Misch n...@leadboat.com wrote:
 There's a potentially-unbounded delay between when the subject backend reads
 strong_lock_counts[] and when it sets its fast-path-used flag.  (I didn't mean
  "not yet seen" in the sense that some memory load would not show the latest
 value.  I just meant that the subject backend may still be taking relevant
 actions based on its previous load of the value.)  We could have the subject
 set its fast-path-used flag before even checking strong_lock_counts[], then
 clear the flag when strong_lock_counts[] dissuaded it from proceeding.  Maybe
 that's what you had in mind?

I'd like to say yes, but actually, no, I just failed to notice the
race condition.  It's definitely less appealing if we have to do it
that way.

Another idea would be to only clear the fast-path-used flags lazily.
If backend A inspects the fast-path queue for backend B and finds it
completely empty, it clears the flag; otherwise it just stays set
indefinitely.
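
That is, roughly (sketch; the helper names are invented):

/* Sketch of the lazy clear by a scanning backend; names invented. */
static void
ScanFastPathQueue(volatile PGPROC *procB, LOCKTAG *locktag)
{
	LWLockAcquire(procB->fpLWLock, LW_EXCLUSIVE);
	TransferMatchingFastPathLocks(procB, locktag);	/* invented */
	if (FastPathQueueIsEmpty(procB))				/* invented */
		procB->fpLockUsed = false;	/* clear only when observed empty */
	LWLockRelease(procB->fpLWLock);
}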

 That being said, it's a slight extra cost for all fast-path lockers to benefit
 the strong lockers, so I'm not prepared to guess whether it will pay off.

Yeah.  Basically this entire idea is about trying to make life easier
for weak lockers at the expense of making it more difficult for strong
lockers.  I think that's a good trade-off in general, but we might
need to wait until we have an actual implementation to judge whether
we've turned the dial too far.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 1:35 PM, Josh Berkus j...@agliodbs.com wrote:
 Robert,

Actually, you're responding to Greg, not me.

But +1 for your suggestions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] errno not set in case of libm functions (HPUX)

2011-05-24 Thread Tom Lane
Ibrar Ahmed ibrar.ah...@gmail.com writes:
 I have found a problem which is specifically related to  HP-UX compiler.
 All 'libm' functions on HP-UX Integrity server do not set errno by
 default. For 'errno' setting we should compile the code using +Olibmerrno
 option. So we should add this option in /src/makefiles/Makefile.hpux.

This patch will break things on my admittedly rather ancient HPUX box:

$ cc +Olibmerrno
cc: warning 450: Unrecognized option +Olibmerrno.

As submitted, it would also break gcc-based builds, though that at least
wouldn't be hard to fix.

If you want to submit a configure patch to test whether the switch is
appropriate, we could consider it.
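
The test could be a small run-time probe, along these lines (an untested
sketch; the autoconf plumbing around it is omitted):

/* Untested sketch of a configure run-test: exit 0 iff acos() sets errno. */
#include <math.h>
#include <errno.h>

int
main(void)
{
	volatile double x = 2.0;	/* volatile, to defeat constant folding */

	errno = 0;
	(void) acos(x);				/* domain error: errno should become EDOM */
	return (errno == EDOM) ? 0 : 1;
}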

BTW, is it really true that HP decided they could make the compiler's
default behavior violate the C standard so flagrantly?  I could believe
offering a switch that you had to specify to save a few cycles at the
cost of nonstandard behavior; but if your report is actually correct,
their engineering standards have gone way downhill since I worked there.
I wonder whether you are inserting some other nonstandard switch that
turns on this effect.

regards, tom lane



Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Josh Berkus

 Actually, you're responding to Greg, not me.

Sorry.

 But +1 for your suggestions.

Any objections before I post something?  Greg?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Pull up aggregate subquery

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh, I see.  I have a general gripe with nested loop plans: we already
 consider too many of them.  IIRC, when I last fooled around with this,
 the number of nested loop paths that we generate far exceeded the
 number of merge or hash join paths, and most of those paths suck and
 are a complete waste of time.

 Hm, really?  My experience is that it's the mergejoin paths that breed
 like rabbits, because there are so many potential sort orders.

*scratches head*

Well, I'm pretty sure that's how it looked when I was testing it.  I
wonder how this could be different for the two of us.  Or maybe one of
us is confused.  Admittedly, I haven't looked at it in a while.

 But I think this is all fairly unrelated to the case that Hitoshi is on
 about.  As you said earlier, it seems like we'd have to derive both
 parameterized and unparameterized plans for the subquery, which seems
 mighty expensive.

 That was my first thought, too, but then I wondered if I was getting
 cheap.

 Yeah, it's certainly possible that we're worrying too much.  Usually
 I only get concerned about added planner logic if it will impact the
 planning time for simple queries.  Simple tends to be in the eye of
 the beholder, but something with a complicated aggregate subquery is
 probably not simple by anyone's definition.

 In this case the sticky point is that there could be multiple possible
 sets of clauses available to be pushed down, depending on what you
 assume is the outer relation for the eventual upper-level nestloop.
 So worst case, you could have not just one parameterized plan to
 generate in addition to the regular kind, but 2^N of them ...

Hmm.  Well, 2^N is more than 2.  But I bet most of them are boring.
Judging by his followup email, Hitoshi Harada seems to think we can
just look at the case where we can parameterize on all of the grouping
columns.  The only other case that seems like it might be interesting
is parameterizing on any single one of the grouping columns.  I can't
get excited about pushing down arbitrary subsets.  Of course, even
O(N) in the number of grouping columns might be too much, but then we
could fall back to just all or nothing.  I think the "all" case by
itself would probably extract 90%+ of the benefit, especially since
"all" will often mean the only one there is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] errno not set in case of libm functions (HPUX)

2011-05-24 Thread Andrew Dunstan



On 05/24/2011 01:44 PM, Tom Lane wrote:

Ibrar Ahmed ibrar.ah...@gmail.com writes:

I have found a problem which is specifically related to  HP-UX compiler.
All 'libm' functions on HP-UX Integrity server do not set errno by
default. For 'errno' setting we should compile the code using +Olibmerrno
option. So we should add this option in /src/makefiles/Makefile.hpux.

This patch will break things on my admittedly rather ancient HPUX box:

$ cc +Olibmerrno
cc: warning 450: Unrecognized option +Olibmerrno.

As submitted, it would also break gcc-based builds, though that at least
wouldn't be hard to fix.

If you want to submit a configure patch to test whether the switch is
appropriate, we could consider it.

BTW, is it really true that HP decided they could make the compiler's
default behavior violate the C standard so flagrantly?  I could believe
offering a switch that you had to specify to save a few cycles at the
cost of nonstandard behavior; but if your report is actually correct,
their engineering standards have gone way downhill since I worked there.
I wonder whether you are inserting some other nonstandard switch that
turns on this effect.




I have been whining for years about the lack of HP-UX support (both for 
gcc and their compiler) on the buildfarm. I really really wish HP would 
come to the party and supply some equipment and software. Failing that, 
some spare cycles being made available on a machine by someone else who 
runs it would be good.


cheers

andrew



Re: [HACKERS] errno not set in case of libm functions (HPUX)

2011-05-24 Thread Heikki Linnakangas

On 24.05.2011 20:44, Tom Lane wrote:

BTW, is it really true that HP decided they could make the compiler's
default behavior violate the C standard so flagrantly?  I could believe
offering a switch that you had to specify to save a few cycles at the
cost of nonstandard behavior; but if your report is actually correct,
their engineering standards have gone way downhill since I worked there.
I wonder whether you are inserting some other nonstandard switch that
turns on this effect.


This (http://docs.hp.com/en/B3901-90015/ch02s07.html) says:


+O[no]libmerrno

Description:

This option enables[disables] support for errno in libm functions. The default 
is +Onolibmerrno.

In C++ C-mode, the default is +Olibmerrno with -Aa option.


So the default is indeed non-standard. But I wonder if we should use -Aa 
instead? The documentation I found for -Aa 
(http://docs.hp.com/en/B3901-90017/ch02s22.html) says:



-Aa

The -Aa option instructs the compiler to use Koenig lookup and strict ANSI for 
scope rules. This option is equivalent to specifying -Wc,-koenig_lookup,on and 
-Wc,-ansi_for_scope,on.

The default is off. Refer to -Ae option for C++ C-mode description. The 
standard features enabled by -Aa are incompatible with earlier C and C++ 
features.


That sounds like what we want. Apparently that description is not 
complete, and -Aa changes some other behavior to ANSI C compatible as 
well, like +Olibmerrno. There's also -AC99, which specifies compiling in 
C99-mode - I wonder if that sets +Olibmerrno too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread Noah Misch
On Tue, May 24, 2011 at 01:28:38PM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Tue, May 24, 2011 at 12:12:55PM -0400, Tom Lane wrote:
  This is a consequence of the changes I made to fix bug #5717,
  particularly the issues around ANYARRAY matching discussed here:
  http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php
 
  We discussed this a few weeks ago:
  http://archives.postgresql.org/message-id/20110511093217.gb26...@tornado.gateway.2wire.net
 
  What's to recommend #1 over what I proposed then?  Seems like a discard of
  functionality for little benefit.
 
 I am unwilling to commit to making #2 work, especially not under time
 constraints; and you apparently aren't either, since you haven't
 produced the patch you alluded to at the end of that thread.

I took your lack of any response as non-acceptance of the plan I outlined.
Alas, the wrong conclusion.  I'll send a patch this week.

 Even if
 you had, though, I'd have no confidence that all holes of the sort had
 been closed.  What you're proposing is to ratchet up the implementation
 requirements for every PL and and every C function declared to accept
 polymorphic types, and there are a lot of members of both classes that
 we don't control.

True.  I will not give you that confidence.  Those omissions would have to
remain bugs to be fixed as they're found.

nm



Re: [HACKERS] errno not set in case of libm functions (HPUX)

2011-05-24 Thread Heikki Linnakangas

On 24.05.2011 20:56, Andrew Dunstan wrote:

I have been whining for years about the lack of HP-UX support (both for
gcc and their compiler) on the buildfarm. I really really wish HP would
come to the party and supply some equipment and software. Failing that,
some spare cycles being made available on a machine by someone else who
runs it would be good.


I'm trying to arrange access to a HP-UX box within EnterpriseDB. No luck 
this far. Hopefully I'll get a buildfarm animal up in the next week or 
so, but don't hold your breath...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Robert Haas
On Mon, May 23, 2011 at 1:12 AM, Noah Misch n...@leadboat.com wrote:
 On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote:
 On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote:
 [input functions aren't the only problematic source of uninitialized datum 
 bytes]

  We've run into other manifestations of this issue before.  Awhile ago
  I made a push to ensure that datatype input functions didn't leave any
  ill-defined padding bytes in their results, as a result of similar
  misbehavior for simple constants.  But this example shows that we'd
  really have to enforce the rule of no ill-defined bytes for just about
  every user-callable function's results, which is a pretty ugly prospect.

 FWIW, when I was running the test suite under valgrind, these were the 
 functions
 that left uninitialized bytes in datums: array_recv, array_set, 
 array_set_slice,
 array_map, construct_md_array, path_recv.  If the test suite covers this 
 well,
 we're not far off.  (Actually, I only had the check in PageAddItem ... 
 probably
 needed to be in one or two other places to catch as much as possible.)

 Adding a memory definedness check to printtup() turned up one more culprit:
 tsquery_and.

*squints*

OK, I can't see what's broken.  Help?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Noah Misch
On Tue, May 24, 2011 at 02:05:33PM -0400, Robert Haas wrote:
 On Mon, May 23, 2011 at 1:12 AM, Noah Misch n...@leadboat.com wrote:
  On Tue, Apr 26, 2011 at 11:51:35PM -0400, Noah Misch wrote:
  On Tue, Apr 26, 2011 at 07:23:12PM -0400, Tom Lane wrote:
  [input functions aren't the only problematic source of uninitialized datum 
  bytes]
 
   We've run into other manifestations of this issue before.  Awhile ago
   I made a push to ensure that datatype input functions didn't leave any
   ill-defined padding bytes in their results, as a result of similar
   misbehavior for simple constants.  But this example shows that we'd
   really have to enforce the rule of no ill-defined bytes for just about
   every user-callable function's results, which is a pretty ugly prospect.
 
  FWIW, when I was running the test suite under valgrind, these were the
  functions
  that left uninitialized bytes in datums: array_recv, array_set,
  array_set_slice,
  array_map, construct_md_array, path_recv.  If the test suite covers this
  well,
  we're not far off.  (Actually, I only had the check in PageAddItem ...
  probably
  needed to be in one or two other places to catch as much as possible.)
 
  Adding a memory definedness check to printtup() turned up one more culprit:
  tsquery_and.
 
 *squints*
 
 OK, I can't see what's broken.  Help?

QTN2QT() allocates memory for a TSQuery using palloc().  TSQuery contains an
array of QueryItem, which contains three bytes of padding between its first and
second members.  Those bytes don't get initialized, so we have unpredictable
content in the resulting datum.
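
For anyone wanting to reproduce the effect outside the backend, here is a 
minimal standalone illustration of the same failure mode (the real QueryItem 
layout differs; malloc/calloc stand in for palloc/palloc0):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
	char		type;	/* 1 byte, then 3 padding bytes before "val" */
	int			val;
} Item;

int
main(void)
{
	Item	   *a = malloc(sizeof(Item));		/* padding undefined */
	Item	   *b = calloc(1, sizeof(Item));	/* padding zeroed */

	a->type = 'q';
	a->val = 1;
	b->type = 'q';
	b->val = 1;
	/* may report a difference even though every field matches */
	printf("memcmp = %d\n", memcmp(a, b, sizeof(Item)));
	return 0;
}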



Re: [HACKERS] [GENERAL] Error compiling sepgsql in PG9.1

2011-05-24 Thread Robert Haas
2011/5/24 Kohei Kaigai kohei.kai...@emea.nec.com:
 The attached patch makes the configure script abort when it is run with the
 '--with-selinux' option but libselinux is older than the minimum requirement
 for SE-PostgreSQL.

 As the documentation says, it needs at least libselinux-2.0.93, because that
 version and later support selabel_lookup(3) for database object classes,
 which is used for initial labeling.

 The current configure script checks for the existence of libselinux, but does
 no version check. (getpeercon_raw(3) has been a supported API for a long
 time.) selinux_sepgsql_context_path(3) is a good watermark for
 libselinux-2.0.93 instead.

Looks to me like you need to adjust the wording of the error message.

Maybe libselinux version 2.0.93 or newer is required, or something like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] errno not set in case of libm functions (HPUX)

2011-05-24 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 So the default is indeed non-standard. But I wonder if we should use -Aa 
 instead?

Probably not; at least on older HPUX versions, -Aa turns off access to
assorted stuff that we do want, eg long long.  man cc on my box
saith

 -Amode Specify the compilation standard to be used by the
compiler.  mode can be one of the following letters:

   c(Default) Compile in a mode compatible with
HP-UX releases prior to 7.0.  (See The C
Programming Language, First Edition by
Kernighan and Ritchie).  This option also
defines the symbol _HPUX_SOURCE and allows the
user to access macros and typedefs provided by
the HPUX Operating System. The default
compilation mode may change in future releases.

   aCompile under ANSI mode (ANSI programming
language C standard ISO 9899:1990).  When
compiling under ANSI mode, the header files
would define only those names (macros and
typedefs) specified by the Standard. To access
macros and typedefs that are not defined by the
ANSI Standard but are provided by the HPUX
Operating System, define the symbol
_HPUX_SOURCE; or use the extension option
described below.

   eExtended ANSI mode.  Same as -Aa -D_HPUX_SOURCE
+e.  This would define the names (macros and
typedefs) provided by the HPUX Operating System
and, in addition, allow the following
extensions: $ characters in identifier names,
sized enums, sized bit-fields, and 64-bit
integral type long long.  Additional extensions
may be added to this option in the future.

The +e option is elsewhere stated to mean

  +eEnables HP value-added features while compiling
in ANSI C mode, -Aa.  This option is ignored
with -Ac because these features are already
provided.  Features enabled:

 o  Long pointers
 o  Integral type specifiers can appear in
enum declarations.
 o  The $ character can appear in
identifier names.
 o  Missing parameters on intrinsic calls

which isn't 100% consistent with what it says under -Ae, so maybe some
additional experimentation is called for.  But anyway, autoconf appears
to think that -Ae is preferable to the combination -Aa -D_HPUX_SOURCE
(that choice is coming from autoconf not our own code); so I'm not
optimistic that we can get more-standard behavior by overriding that.

regards, tom lane



Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 2:11 PM, Noah Misch n...@leadboat.com wrote:
 OK, I can't see what's broken.  Help?

 QTN2QT() allocates memory for a TSQuery using palloc().  TSQuery contains an
 array of QueryItem, which contains three bytes of padding between its first 
 and
 second members.  Those bytes don't get initialized, so we have unpredictable
 content in the resulting datum.

OK, so I guess this needs to be applied and back-patched to 8.3, then.
 8.2 doesn't have this code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread David E. Wheeler
On May 24, 2011, at 10:11 AM, Tom Lane wrote:

 regression=# select negate(42::pos);
 ERROR:  return type mismatch in function declared to return pos
 DETAIL:  Actual return type is integer.
 CONTEXT:  SQL function negate during inlining
 
 If we smashed to base type then this issue would go away.

+1

 On the other hand it feels like we'd be taking yet another step away
 from allowing domains to be usefully used in function declarations.
 I can't put my finger on any concrete consequence of that sort, since
 what we're talking about here is ANYELEMENT/ANYARRAY functions not
 functions declared to take domains --- but it sure seems like this
 would put domains even further away from the status of first-class
 citizenship in the type system.

I agree. It sure seems to me like DOMAINs should act exactly like any other 
type. I know that has improved over time, and superficially at least, the above 
will make it seem more like one than it does with the error. But maybe it's 
time to re-think how domains are implemented? (Not for 9.1, mind.) I mean, why 
*don't* they act like first class types?

Best,

David




Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 24, 2011 at 2:11 PM, Noah Misch n...@leadboat.com wrote:
 QTN2QT() allocates memory for a TSQuery using palloc().  TSQuery contains an
 array of QueryItem, which contains three bytes of padding between its first 
 and
 second members.  Those bytes don't get initialized, so we have unpredictable
 content in the resulting datum.

 OK, so I guess this needs to be applied and back-patched to 8.3, then.

Yeah.  I'm in process of doing that, actually.

regards, tom lane



Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 2:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, May 24, 2011 at 2:11 PM, Noah Misch n...@leadboat.com wrote:
 QTN2QT() allocates memory for a TSQuery using palloc().  TSQuery contains an
 array of QueryItem, which contains three bytes of padding between its first 
 and
 second members.  Those bytes don't get initialized, so we have unpredictable
 content in the resulting datum.

 OK, so I guess this needs to be applied and back-patched to 8.3, then.

 Yeah.  I'm in process of doing that, actually.

Excellent.  Are you going to look at MauMau's patch for bug #6011 also?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 On May 24, 2011, at 10:11 AM, Tom Lane wrote:
 On the other hand it feels like we'd be taking yet another step away
 from allowing domains to be usefully used in function declarations.

 I agree. It sure seems to me like DOMAINs should act exactly like any
 other type. I know that has improved over time, and superficially at
 least, the above will make it seem like more like than it does with
 the error. But maybe it's time to re-think how domains are
 implemented? (Not for 9.1, mind.) I mean, why *don't* they act like
 first class types?

Well, if they actually were first-class types, they probably wouldn't
be born with an implicit cast to some other type to handle 99% of
operations on them ;-).  I think the hard part here is having that cake
and eating it too, ie, supporting domain-specific functions without
breaking the implicit use of the base type's functions.

I guess that the question that's immediately at hand is sort of a
variant of that, because using a polymorphic function declared to take
ANYARRAY on a domain-over-array really is using a portion of the base
type's functionality.  What we've learned from bug #5717 and the
subsequent issues is that using that base functionality without
immediately abandoning the notion that the domain has some life of its
own (ie, immediately casting to the base type) is harder than it looks.

regards, tom lane



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread David E. Wheeler
On May 24, 2011, at 11:30 AM, Tom Lane wrote:

 Well, if they actually were first-class types, they probably wouldn't
 be born with an implicit cast to some other type to handle 99% of
 operations on them ;-).  I think the hard part here is having that cake
 and eating it too, ie, supporting domain-specific functions without
 breaking the implicit use of the base type's functions.

Yeah.

 I guess that the question that's immediately at hand is sort of a
 variant of that, because using a polymorphic function declared to take
 ANYARRAY on a domain-over-array really is using a portion of the base
 type's functionality.  What we've learned from bug #5717 and the
 subsequent issues is that using that base functionality without
 immediately abandoning the notion that the domain has some life of its
 own (ie, immediately casting to the base type) is harder than it looks.

Well, in the ANYELEMENT context (or ANYARRAY), what could be lost by 
abandoning the notion that the domain has some life of its own?

Best,

David




Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Adding a memory definedness check to printtup() turned up one more culprit:
 tsquery_and.

Patch applied, thanks.

regards, tom lane



Re: [HACKERS] Alignment padding bytes in arrays vs the planner

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Excellent.  Are you going to look at MauMau's patch for bug #6011 also?

No.  I don't do Windows, so I can't test it.

(On general principles, I don't think that hacking write_eventlog the
way he did is appropriate; such a function should write the log, not
editorialize.  But that's up to whoever does commit it.)

regards, tom lane



Re: [HACKERS] inconvenient compression options in pg_basebackup

2011-05-24 Thread Peter Eisentraut
On sön, 2011-05-22 at 16:43 -0400, Magnus Hagander wrote:
 On Fri, May 20, 2011 at 17:45, Peter Eisentraut pete...@gmx.net wrote:
  On fre, 2011-05-20 at 14:19 -0400, Magnus Hagander wrote:
   I suggest we add an argument-less option -z that means compress,
  and
   then -Z can be relegated to choosing the compression level.
 
  We can't just use -Z without a parameter for that?
 
  You can't portably have a command-line option with an optional argument.
 
 Ugh.
 
 In that case, I'm fine with your suggestion.

Quick patch for verification.  I chose the naming -z/--gzip to mirror
GNU tar.
diff --git i/doc/src/sgml/ref/pg_basebackup.sgml w/doc/src/sgml/ref/pg_basebackup.sgml
index 8a7b833..ce7eb52 100644
--- i/doc/src/sgml/ref/pg_basebackup.sgml
+++ w/doc/src/sgml/ref/pg_basebackup.sgml
@@ -169,8 +169,8 @@ PostgreSQL documentation
  </varlistentry>
 
  <varlistentry>
-  <term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
-  <term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
+  <term><option>-z</option></term>
+  <term><option>--gzip</option></term>
   <listitem>
    <para>
     Enables gzip compression of tar file output. Compression is only
@@ -179,6 +179,18 @@ PostgreSQL documentation
    </para>
   </listitem>
  </varlistentry>
+
+ <varlistentry>
+  <term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
+  <term><option>--compress-level=<replaceable class="parameter">level</replaceable></option></term>
+  <listitem>
+   <para>
+    Sets the compression level when gzip compression is enabled.
+    The default is the default compression level of the zlib
+    library.
+   </para>
+  </listitem>
+ </varlistentry>
 </variablelist>
   </para>
   <para>
@@ -394,11 +406,11 @@ PostgreSQL documentation
   </para>
 
   <para>
-   To create a backup of the local server with one maximum compressed
+   To create a backup of the local server with one compressed
    tar file for each tablespace, and store it in the directory
    <filename>backup</filename>, showing a progress report while running:
<screen>
-<prompt>$</prompt> <userinput>pg_basebackup -D backup -Ft -Z9 -P</userinput>
+<prompt>$</prompt> <userinput>pg_basebackup -D backup -Ft -z -P</userinput>
</screen>
   </para>
 
diff --git i/src/bin/pg_basebackup/pg_basebackup.c w/src/bin/pg_basebackup/pg_basebackup.c
index 1f31fe0..7c2cb57 100644
--- i/src/bin/pg_basebackup/pg_basebackup.c
+++ w/src/bin/pg_basebackup/pg_basebackup.c
@@ -32,7 +32,10 @@ char		format = 'p';		/* p(lain)/t(ar) */
 char	   *label = "pg_basebackup base backup";
 bool		showprogress = false;
 int			verbose = 0;
-int			compresslevel = 0;
+bool		gzip = false;
+#ifdef HAVE_LIBZ
+int			compresslevel = Z_DEFAULT_COMPRESSION;
+#endif
 bool		includewal = false;
 bool		fastcheckpoint = false;
 char	   *dbhost = NULL;
@@ -126,7 +129,8 @@ usage(void)
 	printf(_("  -D, --pgdata=DIRECTORY   receive base backup into directory\n"));
 	printf(_("  -F, --format=p|t output format (plain, tar)\n"));
 	printf(_("  -x, --xlog   include required WAL files in backup\n"));
-	printf(_("  -Z, --compress=0-9   compress tar output\n"));
+	printf(_("  -z, --gzip   compress tar output with gzip\n"));
+	printf(_("  -Z, --compress-level=0-9 compression level\n"));
 	printf(_("\nGeneral options:\n"));
 	printf(_("  -c, --checkpoint=fast|spread\n"
 			"set fast or spread checkpointing\n"));
@@ -265,7 +269,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 		else
 		{
 #ifdef HAVE_LIBZ
-			if (compresslevel > 0)
+			if (gzip)
 			{
 snprintf(fn, sizeof(fn), "%s/base.tar.gz", basedir);
 ztarfile = gzopen(fn, "wb");
@@ -289,7 +293,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 		 * Specific tablespace
 		 */
 #ifdef HAVE_LIBZ
-		if (compresslevel > 0)
+		if (gzip)
 		{
 			snprintf(fn, sizeof(fn), "%s/%s.tar.gz", basedir, PQgetvalue(res, rownum, 0));
 			ztarfile = gzopen(fn, "wb");
@@ -309,7 +313,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 	}
 
 #ifdef HAVE_LIBZ
-	if (compresslevel > 0)
+	if (gzip)
 	{
 		if (!ztarfile)
 		{
@@ -919,7 +923,8 @@ main(int argc, char **argv)
 		{"format", required_argument, NULL, 'F'},
 		{"checkpoint", required_argument, NULL, 'c'},
 		{"xlog", no_argument, NULL, 'x'},
-		{"compress", required_argument, NULL, 'Z'},
+		{"gzip", no_argument, NULL, 'z'},
+		{"compress-level", required_argument, NULL, 'Z'},
 		{"label", required_argument, NULL, 'l'},
 		{"host", required_argument, NULL, 'h'},
 		{"port", required_argument, NULL, 'p'},
@@ -952,7 +957,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:F:l:Z:c:h:p:U:xwWvP",
+	while ((c = getopt_long(argc, argv, "D:F:l:c:h:p:U:xwWvPzZ:",
 			long_options, option_index)) != -1)
 	{
 		switch (c)
@@ -978,6 +983,9 @@ main(int argc, char **argv)
 			case 'l':
 label = xstrdup(optarg);
 break;
+			case 'z':
+gzip = true;
+break;
 			case 'Z':
 compresslevel = 

Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf

2011-05-24 Thread Bruce Momjian
Magnus Hagander wrote:
   As I mentioned offlist, I'd like it in teal please.
 
  Applied with some further minor bikeshedding (remove trailing spaces,
  rewrap so columns aren't wider than 80 chars, etc)
 
  Let me just point out that people who have already run initdb during
  beta will not see this in their pg_hba.conf, nor in their
  share/pg_hba.conf.sample, even after they have upgraded to a later beta,

Oops, yes, I was wrong here.  Sorry.

  unless they run initdb.  However, we have bumped the catalog version for
  something else so they should then get this change.
 
 Why would they not see it in their share/pg_hba.conf.sample?
 
 It will not affect the existing one in $PGDATA, but why wouldn't the
 installed .sample change?

Yes, the problem is the sample will change, but the $PGDATA will not, so
anyone doing a diff of the two files to see the localized changes will
see the changes that came in as part of that commit.

  My point is if we change configuration files and then don't bump the
  catalog version, the share/*.sample files get out of sync with the files
  in /data, which can be kind of confusing.
 
 They would - but what you are saying above is that they would not get
 out of sync, because the share/*.sample also don't update. Just a
 mistake in what you said above, or am I missing something?

Yes, my mistake.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Domains versus polymorphic functions, redux

2011-05-24 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 On May 24, 2011, at 11:30 AM, Tom Lane wrote:
 I guess that the question that's immediately at hand is sort of a
 variant of that, because using a polymorphic function declared to take
 ANYARRAY on a domain-over-array really is using a portion of the base
 type's functionality.  What we've learned from bug #5717 and the
 subsequent issues is that using that base functionality without
 immediately abandoning the notion that the domain has some life of its
 own (ie, immediately casting to the base type) is harder than it looks.

 Well, in the ANYELEMENT context (or ANYARRAY), what could be lost by 
 abandoning the notion that the domain has some life of its own?

I'm starting to think that maybe we should separate the two cases after
all.  If we force a downcast for ANYARRAY matching, we will fix the loss
of functionality induced by the bug #5717 patch, and it doesn't seem
like anyone has a serious objection to that.  What to do for ANYELEMENT
seems to be a bit more controversial, and at least some of the proposals
aren't reasonable to do in 9.1 at this stage.  Maybe we should just
leave ANYELEMENT as-is for the moment, and reconsider that issue later?

regards, tom lane



Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 2:48 PM, Bruce Momjian br...@momjian.us wrote:
 Yes, the problem is the sample will change, but the $PGDATA will not, so
 anyone doing a diff of the two files to see the localized changes will
 see the changes that came in as part of that commit.

I don't think that's a serious problem.  I wouldn't want to make a
change like that in a released version, but doing it during beta seems
OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 24, 2011 at 2:48 PM, Bruce Momjian br...@momjian.us wrote:
 Yes, the problem is the sample will change, but the $PGDATA will not, so
 anyone doing a diff of the two files to see the localized changes will
 see the changes that came in as part of that commit.

 I don't think that's a serious problem.  I wouldn't want to make a
 change like that in a released version, but doing it during beta seems
 OK.

Given that we've already forced initdb for beta2, it seems like a
complete non-issue right now, anyway.

regards, tom lane



Re: [HACKERS] inconvenient compression options in pg_basebackup

2011-05-24 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 Quick patch for verification.  I chose the naming -z/--gzip to mirror
 GNU tar.

I would argue that -Z ought to turn on gzip without my having to write
-z as well (at least when the argument is greater than zero; possibly
-Z0 should be allowed as meaning no compression).

Other than that (and the ensuing docs and help changes), looks fine.
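
Something along these lines in the option-handling switch, perhaps (just a 
sketch, reusing the gzip/compresslevel variables from Peter's patch):

			case 'Z':
				compresslevel = atoi(optarg);
				if (compresslevel < 0 || compresslevel > 9)
				{
					fprintf(stderr, _("%s: invalid compression level \"%s\"\n"),
							progname, optarg);
					exit(1);
				}
				gzip = (compresslevel > 0);	/* -Z0 means no compression */
				break;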

regards, tom lane



Re: [HACKERS] [GENERAL] Error compiling sepgsql in PG9.1

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 2011/5/24 Kohei Kaigai kohei.kai...@emea.nec.com:
 The attached patch makes the configure script abort when it is run with the
 '--with-selinux' option but libselinux is older than the minimum requirement
 for SE-PostgreSQL.

 Looks to me like you need to adjust the wording of the error message.
 Maybe libselinux version 2.0.93 or newer is required, or something like 
 that.

Yeah.  Applied with that change.

BTW, it's not helpful to include the diff of the generated configure
script in such patches.  The committer will run autoconf for himself,
and from a readability standpoint the generated file is quite useless.

regards, tom lane



Re: [HACKERS] [BUGS] BUG #6034: pg_upgrade fails when it should not.

2011-05-24 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, May 23, 2011 at 2:57 PM, Bruce Momjian br...@momjian.us wrote:
  Robert Haas wrote:
  On Mon, May 23, 2011 at 8:26 AM, Bruce Momjian br...@momjian.us wrote:
   Sorry, I was unclear.  The question is whether the case of _name_ of the
   locale is significant, meaning can you have two locale names that differ
   only by case and behave differently?
 
  That would seem surprising to me, but I really have no idea.
 
  There's the other direction, too: two locales that vary by something
  more than case, but still have identical behavior.  Maybe we just
  decide not to worry about that, but then why worry about this?
 
  Well, if we remove the check then people could easily get broken
  upgrades by upgrading to a server with a different locale.  A Google
  search seems to indicate the locale names are case-sensitive so I am
  thinking the problem is that the user didn't have exact locales, and
  needs that to use pg_upgrade.
 
 I think you misread what I wrote, or I misexplained it, but never
 mind.  Matching locale names case-insensitively sounds reasonable to
 me, unless someone has reason to believe it will blow up.

OK, that's what I needed to hear.  I have applied the attached patch,
but only to 9.1 because  of the risk of breakage. (This was only the
first bug report of this, and we aren't 100% certain about the case
issue.)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 2117b7f..60c1fbb
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** static void
*** 333,345 
  check_locale_and_encoding(ControlData *oldctrl,
  		  ControlData *newctrl)
  {
! 	if (strcmp(oldctrl->lc_collate, newctrl->lc_collate) != 0)
  		pg_log(PG_FATAL,
  			   "old and new cluster lc_collate values do not match\n");
! 	if (strcmp(oldctrl->lc_ctype, newctrl->lc_ctype) != 0)
  		pg_log(PG_FATAL,
  			   "old and new cluster lc_ctype values do not match\n");
! 	if (strcmp(oldctrl->encoding, newctrl->encoding) != 0)
  		pg_log(PG_FATAL,
  			   "old and new cluster encoding values do not match\n");
  }
--- 333,346 
  check_locale_and_encoding(ControlData *oldctrl,
  		  ControlData *newctrl)
  {
! 	/* These are often defined with inconsistent case, so use pg_strcasecmp(). */
! 	if (pg_strcasecmp(oldctrl->lc_collate, newctrl->lc_collate) != 0)
  		pg_log(PG_FATAL,
  			   "old and new cluster lc_collate values do not match\n");
! 	if (pg_strcasecmp(oldctrl->lc_ctype, newctrl->lc_ctype) != 0)
  		pg_log(PG_FATAL,
  			   "old and new cluster lc_ctype values do not match\n");
! 	if (pg_strcasecmp(oldctrl->encoding, newctrl->encoding) != 0)
  		pg_log(PG_FATAL,
  			   "old and new cluster encoding values do not match\n");
  }



Re: [HACKERS] Adding an example for replication configuration to pg_hba.conf

2011-05-24 Thread Bruce Momjian
Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Tue, May 24, 2011 at 2:48 PM, Bruce Momjian br...@momjian.us wrote:
  Yes, the problem is the sample will change, but the $PGDATA will not, so
  anyone doing a diff of the two files to see the localized changes will
  see the changes that came in as part of that commit.
 
  I don't think that's a serious problem.  I wouldn't want to make a
  change like that in a released version, but doing it during beta seems
  OK.
 
 Given that we've already forced initdb for beta2, it seems like a
 complete non-issue right now, anyway.

Yes, agreed.  I was just pointing it out because people often don't
realize the effect this has.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



[HACKERS] New/Revised TODO? Gathering actual read performance data for use by planner

2011-05-24 Thread Michael Nolan
In the TODO list is this item:

* Modify the planner to better estimate caching effects
Tom mentioned this in his presentation at PGCON, and I also chatted with Tom
about it briefly afterwards.

Based on last year's discussion of this TODO item, it seems thoughts have
been focused on estimating how much data is
being satisfied from PG's shared buffers.  However, I think that's only part
of the problem.

Specifically, read performance is going to be affected by:

1.  Reads fulfilled from shared buffers.
2.  Reads fulfilled from system cache.
3.  Reads fulfilled from disk controller cache.
4.  Reads from physical media.

#4 is further complicated by the type of physical media holding that specific
block.  For example, reads that can be fulfilled from an SSD are going to be
much faster than ones that access hard drives (or even slower types of media).

System load is going to impact all of these as well.

Therefore, I suggest that an alternative to the above TODO may be to gather
performance data without knowing
(or more importantly without needing to know) which of the above sources
fulfilled the read.

This data would probably need to be kept separately for each table or index,
as some tables or indexes
may be mostly or fully in cache or on faster physical media than others,
although in the absence of other
data about a specific table or index, data about other relations in the same
tablespace might be of some use.

Tom mentioned that the cost of doing multiple system time-of-day calls for
each block read might be prohibitive; the data may also be too coarse on
some systems to be truly useful (eg, the epoch time in seconds).

If this data were available, successive executions of the same query could
end up with significantly different plans (and thus actual performance),
based on what has happened recently, so these statistics would have to be
relatively short-term and updated frequently, but without becoming
computational bottlenecks.
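
To make the cost question concrete, the per-read bookkeeping might look 
something like this (a sketch with invented names, not a patch; 
clock_gettime() with CLOCK_MONOTONIC is one candidate that is finer-grained 
than a seconds-resolution clock, though its cost varies by platform):

#include <stdint.h>
#include <time.h>

typedef struct
{
	uint64_t	nreads;		/* reads attributed to this relation */
	uint64_t	total_ns;	/* wall-clock time spent in those reads */
} RelReadStats;

static void
timed_block_read(RelReadStats *stats)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* ... issue the actual block read here ... */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	stats->total_ns += (t1.tv_sec - t0.tv_sec) * 1000000000LL
		+ (t1.tv_nsec - t0.tv_nsec);
	stats->nreads++;
}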

The problem is one I'm interested in working on.
--
Mike Nolan


[HACKERS] tackling full page writes

2011-05-24 Thread Robert Haas
While eating good Indian food and talking about aviation accidents on
the last night of PGCon, Greg Stark, Heikki Linnakangas, and I found
some time to brainstorm about possible ways to reduce the impact of
full_page_writes.  I'm not sure that these ideas are much good, but
for the sake of posterity:

1. Heikki suggested that instead of doing full page writes, we might
try to write only the parts of the page that have changed.  For
example, if we had 16 bits to play with in the page header (which we
don't), then we could imagine the page as being broken up into 16
512-byte chunks, one per bit.  Each time we update the page, we write
whatever subset of the 512-byte chunks we're actually modifying,
except for any that have been written since the last checkpoint.  In
more detail, when writing a WAL record, if a checkpoint has intervened
since the page LSN, then we first clear all 16 bits, reset the bits
for the chunks we're modifying, and XLOG those chunks.  If no
checkpoint has intervened, then we set the bits for any chunks that we
are modifying and for which the corresponding bits aren't yet set; and
XLOG the corresponding chunks.  As I think about it a bit more, we'd
need to XLOG not only the parts of the page we're actually modifying, but
also any parts that the WAL record needs in order to be correct on replay.
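
The chunk bookkeeping itself is cheap; here's a minimal sketch of the idea, 
assuming the usual 8K BLCKSZ and names invented for illustration:

#include <stdint.h>
#include <stdio.h>

#define CHUNKSZ	512				/* 8K page / 512 = 16 chunks -> 16-bit mask */

/* bitmask of the 512-byte chunks touched by bytes [offset, offset + len) */
static uint16_t
chunk_mask(int offset, int len)
{
	int			first = offset / CHUNKSZ;
	int			last = (offset + len - 1) / CHUNKSZ;
	uint16_t	mask = 0;
	int			i;

	for (i = first; i <= last; i++)
		mask |= (uint16_t) (1 << i);
	return mask;
}

int
main(void)
{
	/* a 100-byte change starting at byte 500 spans chunks 0 and 1 */
	printf("0x%04x\n", chunk_mask(500, 100));
	return 0;
}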

(It was further suggested that, in our grand tradition of bad naming,
we could name this feature partial full page writes and enable it
either with a setting of full_page_writes=partial, or better yet, add
a new GUC partial_full_page_writes.  The beauty of the latter is that
it's completely ambiguous what happens when full_page_writes=off and
partial_full_page_writes=on.  Actually, we could invert the sense and
call it disable_partial_full_page_writes instead, which would probably
remove all hope of understanding.  This all seemed completely
hilarious when we were talking about it, and we weren't even drunk.)

2. The other fairly obvious alternative is to adjust our existing WAL
record types to be idempotent - i.e. to not rely on the existing page
contents.  For XLOG_HEAP_INSERT, we currently store the target tid and
the tuple contents.  I'm not sure if there's anything else, but we
would obviously need the offset where the new tuple should be written,
which we currently infer from reading the existing page contents.  For
XLOG_HEAP_DELETE, we store just the TID of the target tuple; we would
certainly need to store its offset within the block, and maybe the
infomask.  For XLOG_HEAP_UPDATE, we'd need the old and new offsets and
perhaps also the old and new infomasks.  Assuming that's all we need
and I'm not missing anything (which I won't bet on), that means we'd
be adding, say, 4 bytes per insert or delete and 8 bytes per update.
So, if checkpoints are spread out widely enough that there will be
more than ~2K operations per page between checkpoints (roughly an 8K
full-page image divided by ~4 extra bytes per record), then it makes
more sense to just do a full page write and call it good.  If not,
this idea might have legs.

3. Going a bit further, Greg proposed the idea of ripping out our
current WAL infrastructure altogether and instead just having one WAL
record that says these byte ranges on this page changed to have these
new contents.  That's elegantly simple, but I'm afraid it would bloat
the records quite a bit.  For example, as Heikki pointed out,
HEAP_XLOG_DELETE relies on the XID in the record header to figure out
what to write, and all the heap-modification operations implicitly
specify the visibility map change when they specify the heap change.
We currently have a flag to indicate whether the visibility map
actually requires an update, but it's just one bit.  However, one
possible application of this concept is that we could add something
like this in along with our existing WAL record types.  It might be
useful, for example, for third-party index AMs, which are currently
pretty much out of luck.
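
For what it's worth, the on-disk shape of such a record could be quite 
simple; a sketch (field names invented here, and a real record would also 
need to identify the relation):

#include <stdint.h>

/* header, followed by nranges * (ByteRange + that many payload bytes) */
typedef struct
{
	uint32_t	blkno;		/* which page changed */
	uint16_t	nranges;	/* how many byte ranges follow */
} ByteRangeXlogRec;

typedef struct
{
	uint16_t	offset;		/* start of the changed bytes within the page */
	uint16_t	length;		/* number of changed bytes; new contents follow */
} ByteRange;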

That's about as far as we got.  Though I haven't convinced anyone else
yet, I still think there's some merit to the idea of just writing the
portion of the page that precedes pd_upper.  WAL records would have to
assume that the tuple data might be clobbered, but they could rely on
the early portion of the page to be correct.  AFAICT, that would be OK
for all of the existing WAL records except for XLOG_HEAP2_CLEAN (i.e.
vacuum), with the exception that - prior to the minimum recovery point
- they'd need to apply their changes unconditionally rather than
considering the page LSN.  Tom has argued that won't work, but I'm not
sure he's convinced anyone else yet...

Anyone else have good ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-05-24 Thread Peter Eisentraut
On lör, 2011-05-21 at 20:39 -0400, Robert Haas wrote:
 On Sat, May 21, 2011 at 5:47 PM, Peter Eisentraut pete...@gmx.net wrote:
  I noticed the 9.1 release notes claim that the new
  EDITOR_LINENUMBER_SWITCH thing is an environment variable, whereas it is
  actually a psql variable.

 It's probably the result of drift between the original patch and what
 was eventually committed.  IIRC, Pavel had it as an environment
 variable originally, but Tom and I didn't feel the feature was
 important enough to merit that treatment.

I think it's not really a matter of importance; it's a matter of
making things work correctly.  I have a shell configuration that sets
different environment variables, including editor, depending on what
directory I'm in.  Now I think that all the editors in question use the
+ syntax, but anyone else with something like that slightly out of the
ordinary would be stuck.  The other problem is if I change the editor
here, I have to change this other piece there.  Note that you cannot
even specify the editor itself in psqlrc.

  Another thought is that this whole thing could be done away with if we
  just allowed people to pass through arbitrary options to the editor,
  like
 
  \edit file.sql +50 -a -b -c
 
  For powerusers, this could have interesting possibilities.
 
 That's an intriguing possibility.  But part of the point of the
 original feature was to be able to say:
 
 \ef somefunc 10
 
 ...and end up on line 10 of somefunc, perhaps in response to an error
 message complaining about that line.  I don't think your proposal
 would address that.

Well, you'd write

\ef somefunc +10

instead.  Or something else, depending on the editor, but then you'd
know what to write, since under the current theory you'd have to have
configured it previously.  Using the +10 syntax also looks a bit
clearer, in my mind.






Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-05-24 Thread Peter Eisentraut
On sön, 2011-05-22 at 06:30 +0200, Pavel Stehule wrote:
 The idea of passing other options is
 interesting. It might be more usable to store these options inside a psql
 variable (to be consistent with the current state). Maybe in
 EDITOR_OPTIONS ? 

There isn't really a need for that, since if you want to pass options to
your editor, you can stick them in the EDITOR variable.  The idea would
be more to pass options per occasion.




Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 4:36 PM, Peter Eisentraut pete...@gmx.net wrote:
 That's an intriguing possibility.  But part of the point of the
 original feature was to be able to say:

 \ef somefunc 10

 ...and end up on line 10 of somefunc, perhaps in response to an error
 message complaining about that line.  I don't think your proposal
 would address that.

 Well, you'd write

 \ef somefunc +10

 instead.

But that would not put you on line 10 of the function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] minor patch submission: CREATE CAST ... AS EXPLICIT

2011-05-24 Thread Peter Eisentraut
On lör, 2011-05-21 at 15:46 +0200, Fabien COELHO wrote:
 Hello,
 
 Please find attached a minor stylish patch. It compiles and the update 
 test cases work for me.
 
 Description:
 
 Add AS EXPLICIT to CREATE CAST
 
 This gives a name to the default case of CREATE CAST, which creates a 
 cast that must be explicitly invoked.
 
 From a language definition perspective, it is helpful to have a name for 
 every case instead of an implicit fallback, without any word to describe 
 it. See for instance CREATE USER CREATEDB/NOCREATEDB or CREATE RULE ... 
 DO ALSO/INSTEAD for similar occurrences of naming default cases.

Oddly enough, we did add the DO ALSO syntax much later, and no one
complained about that, as far as I recall.




[HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-05-24 Thread Peter Geoghegan
Attached is the latest revision of the latch implementation that
monitors postmaster death, plus the archiver client that now relies on
that new functionality and thereby works well without a tight
PostmasterIsAlive() polling loop.

On second thought, it is reasonable for the patch to be evaluated with
the archiver changes. Any problems that we'll have with latch changes
are likely problems that all WL_POSTMASTER_DEATH latch clients will
have, so we might as well include the simplest such client initially.
Once I have buy-in on the latch changes, the archiver work becomes
uncontroversial, I think.

The lifesign terminology has been dropped. We now close() the file
descriptor that represents ownership - the write end of our
anonymous pipe - in each child backend directly in the forking
machinery (the thin fork() wrapper for the non-EXEC_BACKEND case),
through a call to ReleasePostmasterDeathWatchHandle(). We don't have
to do that on Windows, and we don't.

I've handled the non-win32 EXEC_BACKEND case, which I understand just
exists for testing purposes. I've done the usual BackendParameters
stuff.

A ReleasePostmasterDeathWatchHandle() call is unnecessary on win32
(the function doesn't exist there - the need to call it on Unix is a
result of its implementation). I'd like to avoid having calls to it in
each auxiliary process. It should be called in a single sweet spot
that doesn't put any burden on child process authors to remember to
call it themselves.

Disappointingly, and despite a big effort, there doesn't seem to be a
way to have the win32 WaitForMultipleObjects() call wake on postmaster
death in addition to everything else in the same way that select()
does, so there are now two blocking calls, each in a thread of its own
(when the latch code is interested in postmaster death - otherwise,
it's single threaded as before).

The threading stuff (in particular, the fact that we used a named pipe
in a thread where the name of the pipe comes from the process PID) is
inspired by win32 signal emulation, src/backend/port/win32/signal.c .

You can easily observe that it works as advertised on Windows by
starting Postgres with archiving, using task manager to monitor
processes, and doing the following to the postmaster (assuming it has
a PID of 1234). This is the Windows equivalent of kill -9 :

C:\Users\Peter>taskkill /pid 1234 /F

You'll see that it takes about a second for the archiver to exit. All
processes exit.

Thoughts?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e71090f..b1d38f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10150,7 +10150,7 @@ retry:
 	/*
 	 * Wait for more WAL to arrive, or timeout to be reached
 	 */
-	WaitLatch(&XLogCtl->recoveryWakeupLatch, 500L);
+	WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 500L);
 	ResetLatch(&XLogCtl->recoveryWakeupLatch);
 }
 else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..c60986c 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -94,6 +94,7 @@
 
 #include "miscadmin.h"
 #include "storage/latch.h"
+#include "storage/pmsignal.h"
 #include "storage/shmem.h"
 
 /* Are we currently in WaitLatch? The signal handler would like to know. */
@@ -108,6 +109,15 @@ static void initSelfPipe(void);
 static void drainSelfPipe(void);
 static void sendSelfPipeByte(void);
 
+/* 
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of 
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+
+extern int postmaster_alive_fds[2];
 
 /*
  * Initialize a backend-local latch.
@@ -188,22 +198,22 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns bit field indicating which condition(s) caused the wake-up.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile 

[HACKERS] Should partial dumps include extensions?

2011-05-24 Thread Tom Lane
There's a complaint here
http://archives.postgresql.org/pgsql-general/2011-05/msg00714.php
about the fact that 9.1 pg_dump always dumps CREATE EXTENSION commands
for all loaded extensions.  Should we change that?  A reasonable
compromise might be to suppress extensions in the same cases where we
suppress procedural languages, ie if --schema or --table was used
(see include_everything switch in pg_dump.c).

regards, tom lane



Re: [HACKERS] eviscerating the parser

2011-05-24 Thread Bruce Momjian
Robert Haas wrote:
 On Sun, May 22, 2011 at 1:38 PM, Joshua Berkus j...@agliodbs.com wrote:
  Another point is that parsing overhead is quite obviously not the
  reason for the massive performance gap between one core running simple
  selects on PostgreSQL and one core running simple selects on MySQL.
  Even if I had (further) eviscerated the parser to cover only the
  syntax those queries actually use, it wasn't going to buy more than a
  couple points.
 
  I don't know if you saw Jignesh's presentation, but there seems to be a lot 
  of reason to believe that we are lock-bound on large numbers of concurrent 
  read-only queries.
 
 I didn't see Jignesh's presentation, but I'd come to the same
 conclusion (with some help from Jeff Janes and others):
 
 http://archives.postgresql.org/pgsql-hackers/2010-11/msg01643.php
 http://archives.postgresql.org/pgsql-hackers/2010-11/msg01665.php
 
 We did also recently discuss how we might improve the behavior in this case:
 
 http://archives.postgresql.org/pgsql-hackers/2011-05/msg00787.php
 
 ...and ensuing discussion.
 
 However, in this case, there was only one client, so that's not the
 problem.  I don't really see how to get a big win here.  If we want to
 be 4x faster, we'd need to cut time per query by 75%.  That might
 require 75 different optimizations averaging 1% a piece, most likely
 none of them trivial.  I do confess I'm a bit confused as to why
 prepared statements help so much.  That is increasing the throughput
 by 80%, which is equivalent to decreasing time per query by 45%.  That
 is a surprisingly big number, and I'd like to better understand where
 all that time is going.

Prepared statements are pre-parsed/rewritten/planned, but I can't see
how decreasing the parser size would affect those other stages, and
certainly not 45%.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Peter Eisentraut
On mån, 2011-05-23 at 22:44 -0400, Greg Smith wrote:
 -Given that work in August is particularly difficult to line up with 
 common summer schedules around the world, having the other 1 month
 gap in the schedule go there makes sense. 

You might want to add a comment on the schedule page about the
June/July/August timing, because it looks like a typo, and the meeting
minutes are also inconsistent how they talk about June and July.





Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 24, 2011 at 4:36 PM, Peter Eisentraut pete...@gmx.net wrote:
 That's an intriguing possibility.  But part of the point of the
 original feature was to be able to say:
 
 \ef somefunc 10
 
 ...and end up on line 10 of somefunc, perhaps in response to an error
 message complaining about that line.  I don't think your proposal
 would address that.

 Well, you'd write
 
 \ef somefunc +10
 
 instead.

 But that would not put you on line 10 of the function.

Right.  It would also increase the cognitive load on the user to have
to remember the command-line go-to-line-number switch for his editor.
So I don't particularly want to redesign this feature.  However, I can
see the possible value of letting EDITOR_LINENUMBER_SWITCH be set from
the same place that you set EDITOR, which would suggest that we allow
the value to come from an environment variable.  I'm not sure whether
there is merit in allowing both that source and ~/.psqlrc, though
possibly for Windows users it might be easier if ~/.psqlrc worked.

regards, tom lane



Re: [HACKERS] minor patch submission: CREATE CAST ... AS EXPLICIT

2011-05-24 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On lör, 2011-05-21 at 15:46 +0200, Fabien COELHO wrote:
 From a language definition perspective, it is helpful to have a name for 
 every case instead of an implicit fallback, without any word to describe 
 it. See for instance CREATE USER CREATEDB/NOCREATEDB or CREATE RULE ... 
 DO ALSO/INSTEAD for similar occurences of naming default cases.

 Oddly enough, we did add the DO ALSO syntax much later, and no one
 complained about that, as far as I recall.

Sure, but CREATE RULE is entirely locally-grown syntax, so there is no
argument from standards compliance to consider there.

regards, tom lane



Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, May 21, 2011 at 5:47 PM, Peter Eisentraut pete...@gmx.net wrote:
 I noticed the 9.1 release notes claim that the new
 EDITOR_LINENUMBER_SWITCH thing is an environment variable, whereas it is
 actually a psql variable.
 This is perhaps sort of a Freudian slip.

 It's probably the result of drift between the original patch and what
 was eventually committed.  IIRC, Pavel had it as an environment
 variable originally, but Tom and I didn't feel the feature was
 important enough to merit that treatment.

BTW, the above is merest historical revisionism: there was never a
version of the patch that did it that way.  AFAICS the idea started
here:
http://archives.postgresql.org/pgsql-hackers/2010-08/msg00089.php
to which you immediately asked whether it should be an environmental
variable, and I said no on what might be considered thin grounds:
http://archives.postgresql.org/pgsql-hackers/2010-08/msg00182.php

I can't see any real objection other than complexity to having it look
for a psql variable and then an environment variable.  Or we could drop
the psql variable part of that, if it seems too complicated.
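
In code, the lookup order might be as simple as this (a sketch; GetVariable() 
is psql's existing accessor, and the environment variable name here is 
hypothetical):

	/* psql variable first, then the environment, else unset */
	const char *sw = GetVariable(pset.vars, "EDITOR_LINENUMBER_SWITCH");

	if (sw == NULL)
		sw = getenv("PSQL_EDITOR_LINENUMBER_SWITCH");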

Also, while we're on the subject, I'm not real sure why we don't allow
the code to provide a default value when EDITOR has a well-known value
like vi or emacs.  As long as there is a way to override that,
where's the harm in a default?

regards, tom lane



Re: [HACKERS] about EDITOR_LINENUMBER_SWITCH

2011-05-24 Thread Robert Haas
On Tue, May 24, 2011 at 5:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, May 21, 2011 at 5:47 PM, Peter Eisentraut pete...@gmx.net wrote:
 I noticed the 9.1 release notes claim that the new
 EDITOR_LINENUMBER_SWITCH thing is an environment variable, whereas it is
 actually a psql variable.
 This is perhaps sort of a Freudian slip.

 It's probably the result of drift between the original patch and what
 was eventually committed.  IIRC, Pavel had it as an environment
 variable originally, but Tom and I didn't feel the feature was
 important enough to merit that treatment.

 BTW, the above is merest historical revisionism: there was never a
 version of the patch that did it that way.

Even if you were correct, that's a snarky way to put it, and the point
is trivial anyway.  But I don't think I'm imagining the getenv() call
in this version of the patch:

http://archives.postgresql.org/pgsql-hackers/2010-07/msg01253.php

 Also, while we're on the subject, I'm not real sure why we don't allow
 the code to provide a default value when EDITOR has a well-known value
 like "vi" or "emacs".  As long as there is a way to override that,
 where's the harm in a default?

Well, the question is how many people it'll help.  Some people might
have a full pathname, others might call it vim...
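
If such a default were added anyway, comparing only the final path
component of EDITOR would at least catch the full-pathname case (a sketch
under that assumption, not committed code):

    #include <string.h>

    /*
     * Return a default line-number switch for well-known editors, or
     * NULL if the editor isn't recognized.  "vi", "vim", and "emacs"
     * all accept "+N file".  Values carrying extra flags ("vim -f")
     * would still slip through, which is the limitation noted above.
     */
    static const char *
    default_linenumber_switch(const char *editor)
    {
        const char *base = strrchr(editor, '/');

        base = base ? base + 1 : editor;
        if (strcmp(base, "vi") == 0 ||
            strcmp(base, "vim") == 0 ||
            strcmp(base, "emacs") == 0)
            return "+";
        return NULL;
    }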

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cannot build docs of 9.1 on Windows

2011-05-24 Thread MauMau

Andrew,

From: Andrew Dunstan and...@dunslane.net

builddoc.bat failed on my system and reading it made my head hurt. So I
did what I've done with other bat files and rewrote it in Perl. The
result is attached. It works for me, and should be a drop-in replacement.
Just put it in the src/tools/msvc directory and run perl builddoc.pl.
Please test it and if it works for you we'll use it and make
builddoc.bat a thin wrapper like build.bat and vcregress.bat.


It worked successfully! The doc\src\sgml\html directory and its contents were 
created, and the HTML contents appear to be correct. Thank you very much. 
The output of perl builddoc.pl was as follows:



--
perl mk_feature_tables.pl YES 
../../../src/backend/catalog/sql_feature_packages.txt 
../../../src/backend/catalog/sql_features.txt > features-supported.sgml
perl mk_feature_tables.pl NO 
../../../src/backend/catalog/sql_feature_packages.txt 
../../../src/backend/catalog/sql_features.txt > features-unsupported.sgml
perl generate-errcodes-table.pl ../../../src/backend/utils/errcodes.txt > 
errcodes-table.sgml

Running first build...
D:\pgdev\doctool/openjade-1.3.1/bin/openjade -V 
html-index -wall -wno-unused-param -wno-empty -D . -c 
D:\pgdev\doctool/docbook-dsssl-1.79/catalog -d stylesheet.dsl -i 
output-html -t sgml postgres.sgml 2>&1 | findstr /V "DTDDECL catalog entries 
are not supported"

Running collateindex...
perl D:\pgdev\doctool/docbook-dsssl-1.79/bin/collateindex.pl -f -g -i 
bookindex -o bookindex.sgml HTML.index

Processing HTML.index...
2158 entries loaded...
0 entries ignored...
Done.
Running second build...
D:\pgdev\doctool/openjade-1.3.1/bin/openjade -wall -wno-unused-param -wno-empty 
-D . -c D:\pgdev\doctool/docbook-dsssl-1.79/catalog -d stylesheet.dsl -t 
sgml -i output-html -i include-index postgres.sgml 2>&1 | findstr /V 
"DTDDECL catalog entries are not supported"

Docs build complete.
--


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM

2011-05-24 Thread Jaime Casanova
On Fri, May 20, 2011 at 12:50 PM, Magnus Hagander mag...@hagander.net wrote:

 Yes. It might be useful to note it, and then just make an override
 flag. My point, though, was that doing it for walreceiver is more
 important and a more logical first step.


ok, patch attached.

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: PostgreSQL support and training
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 6be5a14..2235c7f 100644
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
*** The commands accepted in walsender mode
*** 1315,1321 ****
      <listitem>
       <para>
        Requests the server to identify itself. Server replies with a result
!       set of a single row, containing three fields:
       </para>
  
       <para>
--- 1315,1321 ----
      <listitem>
       <para>
        Requests the server to identify itself. Server replies with a result
!       set of a single row, containing four fields:
       </para>
  
       <para>
*** The commands accepted in walsender mode
*** 1356,1361 ****
--- 1356,1372 ----
    </para>
    </listitem>
    </varlistentry>
+ 
+   <varlistentry>
+   <term>
+    xlogversion
+   </term>
+   <listitem>
+   <para>
+    Current version of xlog page format.
+   </para>
+   </listitem>
+   </varlistentry>
  
   </variablelist>
  </para>
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 0831b1b..ca39654 100644
*** a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
--- b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
***
*** 21,26 
--- 21,27 
  
  #include libpq-fe.h
  #include access/xlog.h
+ #include access/xlog_internal.h
  #include miscadmin.h
  #include replication/walreceiver.h
  #include utils/builtins.h
*** libpqrcv_connect(char *conninfo, XLogRec
*** 83,88 
--- 84,90 
  	char		standby_sysid[32];
  	TimeLineID	primary_tli;
  	TimeLineID	standby_tli;
+  	uint16  primary_xlp_magic;
  	PGresult   *res;
  	char		cmd[64];
  
*** libpqrcv_connect(char *conninfo, XLogRec
*** 114,120 
  						"the primary server: %s",
  						PQerrorMessage(streamConn))));
  	}
! 	if (PQnfields(res) != 3 || PQntuples(res) != 1)
  	{
  		int			ntuples = PQntuples(res);
  		int			nfields = PQnfields(res);
--- 116,122 
  						"the primary server: %s",
  						PQerrorMessage(streamConn))));
  	}
! 	if (PQnfields(res) != 4 || PQntuples(res) != 1)
  	{
  		int			ntuples = PQntuples(res);
  		int			nfields = PQnfields(res);
*** libpqrcv_connect(char *conninfo, XLogRec
*** 127,133 
--- 129,137 
  	}
  	primary_sysid = PQgetvalue(res, 0, 0);
  	primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0);
+  	primary_xlp_magic = atoi(PQgetvalue(res, 0, 2));
  
+ 	PQclear(res);
  	/*
  	 * Confirm that the system identifier of the primary is the same as ours.
  	 */
*** libpqrcv_connect(char *conninfo, XLogRec
*** 135,141 
  			 GetSystemIdentifier());
  	if (strcmp(primary_sysid, standby_sysid) != 0)
  	{
- 		PQclear(res);
  		ereport(ERROR,
  				(errmsg("database system identifier differs between the primary and standby"),
  				 errdetail("The primary's identifier is %s, the standby's identifier is %s.",
--- 139,144 
*** libpqrcv_connect(char *conninfo, XLogRec
*** 147,159 
  	 * recovery target timeline.
  	 */
  	standby_tli = GetRecoveryTargetTLI();
- 	PQclear(res);
  	if (primary_tli != standby_tli)
  		ereport(ERROR,
  				(errmsg("timeline %u of the primary does not match recovery target timeline %u",
  		primary_tli, standby_tli)));
  	ThisTimeLineID = primary_tli;
  
  	/* Start streaming from the point requested by startup process */
  	snprintf(cmd, sizeof(cmd), "START_REPLICATION %X/%X",
  			 startpoint.xlogid, startpoint.xrecoff);
--- 150,171 
  	 * recovery target timeline.
  	 */
  	standby_tli = GetRecoveryTargetTLI();
  	if (primary_tli != standby_tli)
  		ereport(ERROR,
  				(errmsg("timeline %u of the primary does not match recovery target timeline %u",
  		primary_tli, standby_tli)));
  	ThisTimeLineID = primary_tli;
  
+ 	/*
+ 	 * Check that the primary has a compatible XLOG_PAGE_MAGIC
+ 	 */
+  	if (primary_xlp_magic != XLOG_PAGE_MAGIC)
+  	{
+  		ereport(ERROR, 
+ 				(errmsg("XLOG pages are not compatible between primary and standby"),
+ 				 errhint("Verify PostgreSQL versions on both, primary and standby.")));
+  	}
+  
  	/* Start streaming from the point requested by startup process */
  	snprintf(cmd, sizeof(cmd), "START_REPLICATION %X/%X",
  			 startpoint.xlogid, startpoint.xrecoff);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..392cf94 100644
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*** IdentifySystem(void)
*** 279,289 
  	

Re: [HACKERS] SSI predicate locking on heap -- tuple or row?

2011-05-24 Thread Dan Ports
On Tue, May 24, 2011 at 04:18:37AM -0500, Kevin Grittner wrote:
 These proofs show that
 there is no legitimate cycle which could cause an anomaly which the
 move from row-based to tuple-based logic will miss.  They don't prove
 that the change will generate all the same serialization failures;
 and in fact, some false positives are eliminated by the change. 

Yes, that's correct. That's related to the part in the proof where I
claimed T3 couldn't have a conflict out *to some transaction T0 that
precedes T1*.

I originally tried to show that T3 couldn't have any conflicts out that
T2 didn't have, which would mean we got the same set of serialization
failures, but that's not true. In fact, it's not too hard to come up
with an example where there would be a serialization failure with the
row version links, but not without. However, because the rw-conflict
can't be pointing to a transaction that precedes T1 in the serial
order, it won't create a cycle. In other words, there are serialization
failures that won't happen anymore, but they were false positives.

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Greg Smith

On 05/24/2011 05:03 PM, Peter Eisentraut wrote:

On mån, 2011-05-23 at 22:44 -0400, Greg Smith wrote:

- Given that work in August is particularly difficult to line up with
common summer schedules around the world, having the other >1 month
gap in the schedule go there makes sense.

You might want to add a comment on the schedule page about the
June/July/August timing, because it looks like a typo, and the meeting
minutes are also inconsistent in how they talk about June and July.

Yes, I was planning to (and just did) circle back to the minutes to make 
everything match up.  It's now self-consistent, same dates as the 
schedule, and explains the rationale better.


I'm not sure how to address the feeling that the schedule page looks like 
a typo beyond that.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM

2011-05-24 Thread Fujii Masao
On Wed, May 25, 2011 at 8:26 AM, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Fri, May 20, 2011 at 12:50 PM, Magnus Hagander mag...@hagander.net wrote:

 Yes. It might be useful to note it, and then just make an override
 flag. My point, though, was that doing it for walreceiver is more
 important and a more logical first step.


 ok, patch attached.

Why is the check of WAL version required for streaming replication?
As Tom said, if the version is different between two servers, the
check of system identifier fails first. No?

+   primary_xlp_magic = atoi(PQgetvalue(res, 0, 2));

You wrongly get the third field (i.e., current xlog location) as the
WAL version.
You should call PQgetvalue(res, 0, 3), instead.
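
Spelled out against the patch's own variables, the corrected indexing
is (a sketch; field order as the patch defines it):

    /*
     * IDENTIFY_SYSTEM result row, per this patch:
     *   0: systemid, 1: timeline, 2: current xlog location, 3: xlogversion
     */
    primary_sysid = PQgetvalue(res, 0, 0);
    primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0);
    /* index 2 is the xlog location, so the page magic is at index 3 */
    primary_xlp_magic = atoi(PQgetvalue(res, 0, 3));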

 errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",

You need to change the above message.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2 schedule

2011-05-24 Thread Greg Smith

On 05/24/2011 01:35 PM, Josh Berkus wrote:

I would suggest instead adding a new page to postgresql.org/developer
which lists the development schedule, rather than linking to that wiki
page.  Maybe on this page?

http://www.postgresql.org/developer/roadmap


Now that I look at the roadmap page again, I think all that would really 
be needed here is to tweak its wording a bit.  If the description on 
there of the link to the wiki looked like this:


General development information
  A wiki page about various aspects of the PostgreSQL development 
process, including detailed schedules and submission guidelines


I think that's enough info to keep there.  Putting more information back 
onto the main site when it can live happily on the wiki seems 
counterproductive to me; if there's concerns about things like 
vandalism, we can always lock the page.  I could understand the argument 
that it looks more professional to have it on the main site, but 
perception over function only goes so far for me.


The idea of adding a link back to the wiki from the 
https://commitfest.postgresql.org/ page would make it possible to 
navigate among the three major sites here, no matter which one people 
started at.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM

2011-05-24 Thread Jaime Casanova
On Tue, May 24, 2011 at 8:52 PM, Fujii Masao masao.fu...@gmail.com wrote:

 +       primary_xlp_magic = atoi(PQgetvalue(res, 0, 2));

 You wrongly get the third field (i.e., current xlog location) as the
 WAL version.
 You should call PQgetvalue(res, 0, 3), instead.

 errdetail(Expected 1 tuple with 3 fields, got %d tuples with %d fields.,

 You need to change the above message.


Fixed.

About your comments on the check... if you read the thread, you will
find that the whole reason for the field is future improvement, but
everyone wanted some use of the field now... so I made a patch to use
it in pg_basebackup before the transfer starts, to avoid wasting time and
bandwidth, but Magnus preferred this in walreceiver...

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: PostgreSQL support and training
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 6be5a14..2235c7f 100644
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
*** The commands accepted in walsender mode
*** 1315,1321 ****
      <listitem>
       <para>
        Requests the server to identify itself. Server replies with a result
!       set of a single row, containing three fields:
       </para>
  
       <para>
--- 1315,1321 ----
      <listitem>
       <para>
        Requests the server to identify itself. Server replies with a result
!       set of a single row, containing four fields:
       </para>
  
       <para>
*** The commands accepted in walsender mode
*** 1356,1361 ****
--- 1356,1372 ----
    </para>
    </listitem>
    </varlistentry>
+ 
+   <varlistentry>
+   <term>
+    xlogversion
+   </term>
+   <listitem>
+   <para>
+    Current version of xlog page format.
+   </para>
+   </listitem>
+   </varlistentry>
  
   </variablelist>
  </para>
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 0831b1b..c3f3571 100644
*** a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
--- b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
***
*** 21,26 
--- 21,27 
  
  #include libpq-fe.h
  #include access/xlog.h
+ #include access/xlog_internal.h
  #include miscadmin.h
  #include replication/walreceiver.h
  #include utils/builtins.h
*** libpqrcv_connect(char *conninfo, XLogRec
*** 83,88 
--- 84,90 
  	char		standby_sysid[32];
  	TimeLineID	primary_tli;
  	TimeLineID	standby_tli;
+  	uint16  primary_xlp_magic;
  	PGresult   *res;
  	char		cmd[64];
  
*** libpqrcv_connect(char *conninfo, XLogRec
*** 114,120 
  						"the primary server: %s",
  						PQerrorMessage(streamConn))));
  	}
! 	if (PQnfields(res) != 3 || PQntuples(res) != 1)
  	{
  		int			ntuples = PQntuples(res);
  		int			nfields = PQnfields(res);
--- 116,122 
  						"the primary server: %s",
  						PQerrorMessage(streamConn))));
  	}
! 	if (PQnfields(res) != 4 || PQntuples(res) != 1)
  	{
  		int			ntuples = PQntuples(res);
  		int			nfields = PQnfields(res);
*** libpqrcv_connect(char *conninfo, XLogRec
*** 122,133 
  		PQclear(res);
  		ereport(ERROR,
  				(errmsg("invalid response from primary server"),
! 				 errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",
  		   ntuples, nfields)));
  	}
  	primary_sysid = PQgetvalue(res, 0, 0);
  	primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0);
  
  	/*
  	 * Confirm that the system identifier of the primary is the same as ours.
  	 */
--- 124,137 
  		PQclear(res);
  		ereport(ERROR,
  				(errmsg("invalid response from primary server"),
! 				 errdetail("Expected 1 tuple with 4 fields, got %d tuples with %d fields.",
  		   ntuples, nfields)));
  	}
  	primary_sysid = PQgetvalue(res, 0, 0);
  	primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0);
+  	primary_xlp_magic = atoi(PQgetvalue(res, 0, 3));
  
+ 	PQclear(res);
  	/*
  	 * Confirm that the system identifier of the primary is the same as ours.
  	 */
*** libpqrcv_connect(char *conninfo, XLogRec
*** 135,141 
  			 GetSystemIdentifier());
  	if (strcmp(primary_sysid, standby_sysid) != 0)
  	{
- 		PQclear(res);
  		ereport(ERROR,
  				(errmsg("database system identifier differs between the primary and standby"),
  				 errdetail("The primary's identifier is %s, the standby's identifier is %s.",
--- 139,144 
*** libpqrcv_connect(char *conninfo, XLogRec
*** 147,159 
  	 * recovery target timeline.
  	 */
  	standby_tli = GetRecoveryTargetTLI();
- 	PQclear(res);
  	if (primary_tli != standby_tli)
  		ereport(ERROR,
  				(errmsg("timeline %u of the primary does not match recovery target timeline %u",
  		primary_tli, standby_tli)));
  	ThisTimeLineID = primary_tli;
  
  	/* Start streaming from the point requested by startup process */
  	snprintf(cmd, sizeof(cmd), "START_REPLICATION %X/%X",
  			 startpoint.xlogid, startpoint.xrecoff);
--- 
