Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Simon Riggs
On Wed, Jun 15, 2011 at 6:58 AM, Fujii Masao masao.fu...@gmail.com wrote:

 Or, we should
 implement new promote mode which finishes a recovery as soon as
 promote is requested (i.e., not replay all the available WAL records)?

That's not a new feature. We had it in 8.4, but it was removed.

Originally, we supported fast failover via trigger file.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Simon Riggs
On Thu, Jun 16, 2011 at 7:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote:
 When the replication connection is terminated, the standby tries to read
 WAL files from the archive. In this case, there is no walreceiver process,
 so how does the standby calculate the clock difference?

 Good question.  Also, just because we have streaming replication
 available doesn't mean that we should force people to use it.  It's
 still perfectly legit to set up a standby that only uses
 archive_command and restore_command, and it would be nice if this
 feature could still work in such an environment.  I anticipate that
 most people want to use streaming replication, but a time-delayed
 standby is a good example of a case where you might decide you don't
 need it.  It could be useful to have all the WAL present (but not yet
 applied) if you're thinking you might want to promote that standby -
 but my guess is that in many cases, the time-delayed standby will be
 *in addition* to one or more regular standbys that would be the
 primary promotion candidates.  So I can see someone deciding that
 they'd rather not have the load of another walsender on the master,
 and just let the time-delayed standby read from the archive.

 Even if that were not an issue, I'm still more or less of the opinion
 that trying to solve the time synchronization problem is a rathole
 anyway.  To really solve this problem well, you're going to need the
 standby to send a message containing a timestamp, get a reply back
 from the master that contains that timestamp and a master timestamp,
 and then compute based on those two timestamps plus the reply
 timestamp the maximum and minimum possible lag between the two
 machines.  Then you're going to need to guess, based on several cycles
 of this activity, what the actual lag is, and adjust it over time (but
 not too quickly, unless of course a large manual step has occurred) as
 the clocks potentially drift apart from each other.  This is basically
 what ntpd does, except that it can be virtually guaranteed that our
 implementation will suck by comparison.  Time synchronization is
 neither easy nor our core competency, and I think trying to include it
 in this feature is going to result in a net loss of reliability.
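The request/reply scheme Robert describes is essentially what NTP does for one sample: the reply must have been stamped somewhere inside the round trip, so the offset estimate carries an error bound of half the round-trip time. A minimal illustrative sketch (not the proposed walreceiver protocol; all names are hypothetical):

```python
def estimate_offset(t_send, t_master, t_recv):
    """Estimate the master-standby clock offset from one request/reply cycle.

    t_send:   standby clock when the request was sent
    t_master: master clock when it stamped the reply
    t_recv:   standby clock when the reply arrived

    Returns (offset_estimate, max_error): the master's clock minus the
    standby's, assuming the reply was stamped halfway through the round
    trip, plus the worst-case error of that assumption.
    """
    round_trip = t_recv - t_send
    offset = t_master - (t_send + round_trip / 2.0)
    # The true offset can differ from the estimate by at most half the
    # round-trip time, whatever the actual network asymmetry was.
    return offset, round_trip / 2.0
```

A real implementation would then have to filter many such samples and track drift over time, which is exactly the ntpd-sized rathole the paragraph above warns about.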


This raises the question of why we need this feature at all, in the way proposed.

Streaming replication is designed for immediate transfer of WAL. File
based is more about storing them for some later use.

It seems strange to pollute the *immediate* transfer route with a
delay, when that is easily possible with a small patch to pg_standby
that can wait until the filetime delay is > X before returning.
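The pg_standby tweak suggested here amounts to refusing to return a WAL segment until its archive mtime is old enough. A standalone sketch of that wait loop (illustrative only; the real pg_standby is C and its restore protocol is more involved):

```python
import os
import time

def wait_for_file_age(path, min_age_seconds, poll_interval=0.1):
    """Block until the file at `path` has sat in the archive for at least
    min_age_seconds, judged by its modification time.  A delayed standby's
    restore_command wrapper could call this before handing the segment to
    the startup process."""
    while time.time() - os.path.getmtime(path) < min_age_seconds:
        time.sleep(poll_interval)
    return path
```

As Robert notes later in the thread, this operates on whole WAL segments, so the achieved delay is much coarser than the per-commit-record delay of the proposed patch.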

The main practical problem with this is that most people's WAL
partitions aren't big enough to store the delayed WAL files, which is
why we provide the file archiving route anyway. So in practical terms
this will be unusable, or at least dangerous to use.

+1 for the feature concept, but -1 for adding this to streaming replication.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Florian Pflug
On Jun 29, 2011, at 05:02, Jeff Davis wrote:
 On Tue, 2011-06-28 at 22:20 +0200, Florian Pflug wrote:
 I believe if we go that route we should make RANGEINPUT a full-blown
 type, having pair-of-bounds semantics. Adding a lobotomized version
 just for the sake of range input feels a bit like a kludge to me.
 
 It's not a pair, because it can be made up of 0, 1, or 2 scalar values
 (unless you count infinity as one of those values, in which case 0 or
 2). And without ordering, it's not clear that those values are really
 bounds.

Hm, yeah, the lack of an ordering operator is trouble. There also seem
to be more problems with that idea, see below for that. So scratch
the idea of turning RANGEINPUT into a full-blown type...

 I don't think that having an extra type around is so bad. It solves a
 lot of problems, and doesn't seem like it would get in the way. And it's
 only for the construction of ranges out of scalars, which seems like the
 most natural place where a cast might be required (similar to casting an
 unknown literal, which is fairly common).

What I'm concerned about is how elegantly we'd be able to tie up all
the loose ends. What'd be the result of
  select range(1,2)
for example? Or
  create table (r rangeinput)
for that matter.

I think we'd want to forbid both of these, and more or less every other
use except
  range(1,2)::some range type
but that seems to require special-casing RANGEINPUT in a lot of places.

If we don't restrict RANGEINPUT that way, I think we ought to provide
at least a basic set of operators and functions for it - e.g.
input, output, lower(), upper(), ...

*Pondering this*

But we can't do that easily, since RANGEINPUT would actually be a kind of
VARIANT type (i.e. can hold values of arbitrary types). That's something
that our type system doesn't really support. We do have RECORD, which is
similar in a way, but its implementation is about as intrusive as it
gets...

 Alternatively, we could replace RANGEINPUT simply with ANYARRAY,
 and add functions ANYRANGE -> ANYRANGE which allow specifying the
 bound operators (<, <= respectively >, >=) after construction.
 
 So you'd write (using the functions-as-fields syntax I believe
 we support)
  (ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]'
 and
  ARRAY[NULL,2]::int8range for '[-inf,2]'
 
 I think we can rule this one out:
 * The functions-as-fields syntax is all but deprecated (or should be)

Is it? That's actually too bad, since I kinda like it. But anyway,
if that's a concern it could also be
  range_bounds(ARRAY[1,2]::int8range, '(]')

 * That's hardly a readability improvement

Granted, it won't win any beauty contest, but

 * It still suffers similar problems as casting back and forth to text:
 ANYARRAY is too general, doesn't really take advantage of the type
 system, and not a great fit anyway.

I believe it alleviates the gravest problems of casting back and forth
to text. It doesn't have quoting issues and it doesn't potentially lose
information.

In any case, I wouldn't expect this to *stay* the only way to construct
a range forever. But it does have its virtues for a first incarnation of
range types, I believe, mostly because it's completely unintrusive and
won't cause any backwards-compatibility headaches in the future.

 All assuming that modifying the type system to support polymorphic
 type resolution based on the return type is out of the question... ;-)
 
 It's still not out of the question, but I thought that the intermediate
 type would be a less-intrusive alternative (and Robert seemed concerned
 about how intrusive it was).

I fear that the intermediate type will turn out to be quite intrusive,
at least if we try to handle all the corner cases and loose ends. And if
we don't, I'm concerned that we're painting ourselves into a corner here...

 There also might be a little more effort educating users if we selected
 the function based on the return type, because they might think that
 casting the inputs explicitly would be enough to get it to pick the
 right function. If it were a new syntax like RANGE[]::int8range, then I
 think it would be easier to understand.

There's certainly a risk of confusion here, simply because the relationship
between ANYRANGE and ANYELEMENT will be quite different from that of
ANYARRAY and ANYELEMENT. All we can do is state this very clearly in the
docs, I think, and explain that it must be that way to support multiple
range types over the same base type.

best regards,
Florian Pflug


[HACKERS] Bug in SQL/MED?

2011-06-29 Thread Albe Laurenz
If you invoke any of the SQL/MED CREATE or ALTER commands,
the validator function is only called if an option list was given.

That means that you cannot enforce required options at object creation
time, because the validator function is not always called.
I consider that unexpected and undesirable behaviour.

Example:
CREATE EXTENSION file_fdw;
CREATE FOREIGN DATA WRAPPER file HANDLER file_fdw_handler
    VALIDATOR file_fdw_validator;
CREATE SERVER file FOREIGN DATA WRAPPER file;
CREATE FOREIGN TABLE flat (id integer) SERVER file
    OPTIONS (format 'csv');
SELECT * FROM flat;
ERROR:  filename is required for file_fdw foreign tables

Now file_fdw does not try to enforce the filename option, but it
would be nice to be able to do so.
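The check Laurenz wants can be modeled in a few lines: if the validator ran unconditionally, a missing required option would be rejected at CREATE time instead of at the first SELECT. A standalone Python sketch of such a check (a model only, not the actual file_fdw C validator; option names follow the example above):

```python
def validate_file_fdw_options(options,
                              required=("filename",),
                              allowed=("filename", "format")):
    """Reject unknown options and enforce required ones.  A real SQL/MED
    validator function would report ereport(ERROR, ...); here we raise
    ValueError to stand in for that."""
    for name in options:
        if name not in allowed:
            raise ValueError('invalid option "%s"' % name)
    missing = [name for name in required if name not in options]
    if missing:
        raise ValueError('option "%s" is required' % missing[0])
    return True
```

With the behaviour before the patch, a table created with an empty option list never reaches this check at all, which is exactly the gap described above.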

The attached patch would change the behaviour so that the validator
function is always called.

If that change meets with approval, should file_fdw be changed so that
it requires filename at table creation time?

Yours,
Laurenz Albe


fdw.patch
Description: fdw.patch



Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 4:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Jun 16, 2011 at 7:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote:
 When the replication connection is terminated, the standby tries to read
 WAL files from the archive. In this case, there is no walreceiver process,
 so how does the standby calculate the clock difference?

 Good question.  Also, just because we have streaming replication
 available doesn't mean that we should force people to use it.  It's
 still perfectly legit to set up a standby that only uses
 archive_command and restore_command, and it would be nice if this
 feature could still work in such an environment.  I anticipate that
 most people want to use streaming replication, but a time-delayed
 standby is a good example of a case where you might decide you don't
 need it.  It could be useful to have all the WAL present (but not yet
 applied) if you're thinking you might want to promote that standby -
 but my guess is that in many cases, the time-delayed standby will be
 *in addition* to one or more regular standbys that would be the
 primary promotion candidates.  So I can see someone deciding that
 they'd rather not have the load of another walsender on the master,
 and just let the time-delayed standby read from the archive.

 Even if that were not an issue, I'm still more or less of the opinion
 that trying to solve the time synchronization problem is a rathole
 anyway.  To really solve this problem well, you're going to need the
 standby to send a message containing a timestamp, get a reply back
 from the master that contains that timestamp and a master timestamp,
 and then compute based on those two timestamps plus the reply
 timestamp the maximum and minimum possible lag between the two
 machines.  Then you're going to need to guess, based on several cycles
 of this activity, what the actual lag is, and adjust it over time (but
 not too quickly, unless of course a large manual step has occurred) as
 the clocks potentially drift apart from each other.  This is basically
 what ntpd does, except that it can be virtually guaranteed that our
 implementation will suck by comparison.  Time synchronization is
 neither easy nor our core competency, and I think trying to include it
 in this feature is going to result in a net loss of reliability.


 This raises the question of why we need this feature at all, in the way
 proposed.

 Streaming replication is designed for immediate transfer of WAL. File
 based is more about storing them for some later use.

 It seems strange to pollute the *immediate* transfer route with a
 delay, when that is easily possible with a small patch to pg_standby
 that can wait until the filetime delay is > X before returning.

 The main practical problem with this is that most people's WAL
 partitions aren't big enough to store the delayed WAL files, which is
 why we provide the file archiving route anyway. So in practical terms
 this will be unusable, or at least dangerous to use.

 +1 for the feature concept, but -1 for adding this to streaming replication.

As implemented, the feature will work with either streaming
replication or with file-based replication.  I don't see any value in
restricting it to work ONLY with file-based replication.

Also, if we were to do it by making pg_standby wait, then the whole
thing would be much less accurate, and the delay would become much
harder to predict, because you'd be operating on the level of entire
WAL segments, rather than individual commit records.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inconsistency between postgresql.conf and docs

2011-06-29 Thread Robert Haas
On Tue, Jun 28, 2011 at 11:52 PM, Josh Berkus j...@agliodbs.com wrote:
 But in the sample file, the synchronous_standby_names parameter is the
 first parameter under the heading - Streaming Replication - Server
 Settings while in the documentation, that parameter has its own
 subsection 18.5.5 after the streaming replication section 18.5.4.
 Since the rest of section 18.5.4 was more than a screenful in my
 browser, at first glance I didn't spot 18.5.5 and was confused.

 He is correct.  So, my question is, should the docs change, or should
 postgresql.conf.sample change?

Another thing that's a bit strange there is that most of the
section-header comments in postgresql.conf say:

# - Section Name -

i.e. they begin and end with a dash.  Whereas that one for some reason says:

# - Streaming Replication - Server Settings

And probably should just say:

# - Streaming Replication -

I don't have a strong feeling on whether or not we should put that
setting in its own section.  Right now, we only have one setting for
synchronous replication, so I guess maybe it depends on if we think
there will be more in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Robert Haas
On Tue, Jun 28, 2011 at 11:02 PM, Jeff Davis pg...@j-davis.com wrote:
 It's still not out of the question, but I thought that the intermediate
 type would be a less-intrusive alternative (and Robert seemed concerned
 about how intrusive it was).

I'm no great fan of our existing type system, and I'm not opposed to
trying to improve it.  However, I'm a bit wary of the theory that we
can just tweak X, Y, or Z and then everything will go more smoothly
for range types.  I fear that there will be knock-on consequences that
we'll spend a lot of time either (a) arguing about or (b) fixing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Process local hint bit cache

2011-06-29 Thread Merlin Moncure
On Tue, Jun 28, 2011 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 7, 2011 at 2:49 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Apr 7, 2011 at 1:28 PM, Merlin Moncure mmonc...@gmail.com wrote:
                        int ByteOffset = xid / BITS_PER_BYTE;

 whoops, I just noticed this was wrong -- the byte offset needs to
 take the bucket into account.  I need to clean this up some more,
 obviously, but the issues at play remain the same.
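The bug being acknowledged is that the byte offset was computed from the raw xid alone, ignoring which cache bucket holds it. An illustrative sketch of the corrected addressing (the bucket size is a made-up constant, not taken from the patch):

```python
BITS_PER_BYTE = 8
XIDS_PER_BUCKET = 1024            # hypothetical bucket capacity
BUCKET_BYTES = XIDS_PER_BUCKET // BITS_PER_BYTE

def cache_position(xid):
    """Locate the status bit for xid in a bucketed bitmap cache: which
    bucket it falls in, the byte offset within the whole cache (bucket
    base + offset inside the bucket), and the bit within that byte."""
    bucket = xid // XIDS_PER_BUCKET
    xid_in_bucket = xid % XIDS_PER_BUCKET
    byte_offset = bucket * BUCKET_BYTES + xid_in_bucket // BITS_PER_BYTE
    bit = xid_in_bucket % BITS_PER_BYTE
    return bucket, byte_offset, bit
```

The original `xid / BITS_PER_BYTE` gives the right answer only inside bucket 0; once a second bucket exists, the bucket base must be added in.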

 So, where is the latest version of this patch?

https://commitfest.postgresql.org/action/patch_view?id=553

merlin



Re: [HACKERS] Avoid index rebuilds for no-rewrite ALTER TABLE ALTER TYPE

2011-06-29 Thread Robert Haas
On Tue, Jun 28, 2011 at 3:40 PM, Noah Misch n...@leadboat.com wrote:
 On Tue, Jun 28, 2011 at 02:11:11PM -0400, Robert Haas wrote:
 On Mon, Jun 27, 2011 at 10:43 PM, Noah Misch n...@leadboat.com wrote:
  On Mon, Jun 27, 2011 at 03:45:43PM -0400, Robert Haas wrote:
  On Wed, Jun 15, 2011 at 1:03 AM, Noah Misch n...@leadboat.com wrote:
   [patch to avoid index rebuilds]
 
  With respect to the documentation hunks, it seems to me that the first
  hunk might be made clearer by leaving the paragraph of which it is a
  part as-is, and adding another paragraph afterwards beginning with the
  words In addition.
 
  The added restriction elaborates on the transitivity requirement, so I
  wanted to keep the new language adjacent to that.

 That makes sense, but it reads a bit awkwardly to me.  Maybe it's just
 that the sentence itself is so complicated that I have difficulty
 understanding it.  I guess it's the same problem as with the text you
 added about hash indexes: without thinking about it, it's hard to
 understand what territory is covered by the new sentence that is not
 covered by what's already there.  In the case of the hash indexes, I
 think I have it figured out: we already know that we must get
 compatible hash values out of logically equal values of different
 datatypes.  But it's possible that the inter-type cast changes the
 value in some arbitrary way and then compensates for it by defining
 the hash function in such a way as to compensate.  Similarly, for
 btrees, we need the relative ordering of A and B to remain the same
 after casting within the opfamily, not to be rearranged somehow.
 Maybe the text you've got is OK to explain this, but I wonder if
 there's a way to do it more simply.

 That's basically right, I think.  Presently, there is no connection between
 casts and operator family notions of equality.  For example, a cast can
 change the hash value.  In general, that's not wrong.  However, I wish to
 forbid it when some hash operator family covers both the source and
 destination types of the cast.  Note that I don't especially care whether
 the stored bits changed or not.  I just want casts to preserve equality
 when an operator family defines equality across the types involved in the
 cast.  The specific way of articulating that is probably vulnerable to
 improvement.

  It would be valuable to avoid introducing a second chunk of code that
  knows everything about the catalog entries behind an index.  That's what
  led me to put forward the most recent version as best.  What do you find
  vile about that approach?  I wasn't comfortable with it at first, because
  I suspected the checks in RelationPreserveStorage() might be important
  for correctness.  Having studied it some more, though, I think they just
  reflect the narrower needs of its current sole user.

 Maybe vile is a strong word, but it seems like a modularity violation:
 we're basically letting DefineIndex() do some stuff we don't really
 want done, and then backing out the parts of it that we don't really
 want after all.  It seems like it would be better to provide
 DefineIndex with enough information not to do the wrong thing in the
 first place.  Could we maybe pass stmt->oldNode to DefineIndex(), and
 let it work out the details?

 True.  I initially shied away from that, because we assume somewhat deep
 into the stack that the new relation will have pg_class.oid =
 pg_class.relfilenode.  Here's the call stack in question:

        RelationBuildLocalRelation
        heap_create
        index_create
        DefineIndex
        ATExecAddIndex

 Looking at it again, it wouldn't be the end of the world to add a
 relfilenode argument to each.  None of those have more than four callers.

Yeah.  Those functions take an awful lot of arguments, which suggests
that some refactoring might be in order, but I still think it's
cleaner to add another argument than to change the state around
after-the-fact.

 ATExecAddIndex()
 would then call RelationPreserveStorage() before calling DefineIndex(), which
 would in turn put things in a correct state from the start.  Does that seem
 appropriate?  Offhand, I do like it better than what I had.

I wish we could avoid the whole death-and-resurrection thing
altogether, but off-hand I'm not seeing a real clean way to do that.
At the very least we had better comment it to death.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Process local hint bit cache

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 9:39 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Jun 28, 2011 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Apr 7, 2011 at 2:49 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Apr 7, 2011 at 1:28 PM, Merlin Moncure mmonc...@gmail.com wrote:
                        int ByteOffset = xid / BITS_PER_BYTE;

 whoops, I just noticed this was wrong -- the byte offset needs to
 take the bucket into account.  I need to clean this up some more,
 obviously, but the issues at play remain the same.

 So, where is the latest version of this patch?

 https://commitfest.postgresql.org/action/patch_view?id=553

Oh, shoot.  When I looked at this last night, I thought that link went
somewhere else.  Woops.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parameterized aggregate subquery (was: Pull up aggregate subquery)

2011-06-29 Thread Yeb Havinga


On 2011-06-17 09:54, Hitoshi Harada wrote:

While reviewing the gist/box patch, I found some planner APIs that can
replace parts of my patch. Also, the comments in the includes weren't
updated appropriately. Revised patch attached.


Hello Hitoshi-san,

I read your latest patch implementing parameterizing subquery scans.

1)
In the email from June 9 with the patch you wrote: "While IndexScan
is simple since its information like costs are well known by the base
relation, SubqueryScan should re-plan its Query to gain that, which is
expensive."

Initial concerns I had were caused by misinterpreting 'replanning' as: 
for each outer tuple, replan the subquery (it sounds a bit like 
'ReScan'). Though the general comments in the patch are helpful, I think 
it would be an improvement to describe why subqueries are planned more 
than once, i.e. something like

best_inner_subqueryscan
   Try to find a better subqueryscan path and its associated plan for 
each join clause that can be pushed down, in addition to the path that 
is already calculated by set_subquery_pathlist.


The consequences of having multiple subquery paths and plans for each 
new subquery path makes the bulk of the remainder of the patch.


2)
Since 'subquery_is_pushdown_safe' is invariant under join clause push
down, it might be possible to have it called only once in
set_subquery_pathlist, i.e. by only attaching rel->preprocessed_subquery
if subquery_is_pushdown_safe passes.


3)
/*
 * set_subquery_pathlist
 *Build the (single) access path for a subquery RTE
 */
This unchanged comment is still correct, but 'the (single) access path'
might have become a bit misleading now that there are multiple paths
possible, though not created by set_subquery_pathlist.



4) somewhere in the patch
s/regsitered/registered/

5) Regression tests are missing; I think at this point they'd aid in 
speeding up development/test cycles.


6) Before patching Postgres, I could execute the test with the size_l 
and size_m tables, after patching against current git HEAD (patch 
without errors), I get the following error when running the example query:

ERROR:  plan should not reference subplan's variable

I get the same error with the version from june 9 with current git HEAD.

7) Since both set_subquery_pathlist and best_inner_subqueryscan push 
down clauses, as well as add a path and subplan, could a generalized 
function be made to support both, to reduce duplicate code?


regards,
Yeb Havinga

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data




Re: [HACKERS] Process local hint bit cache

2011-06-29 Thread Simon Riggs
On Sat, Apr 2, 2011 at 8:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:

 aside:
 Moving TransactionIdInProgress below TransactionIdDidCommit can help
 in one sense: TransactionIdDidCommit grabs the XidStatus but discards
 the knowledge if the transaction is known aborted.

 Doesn't the single-entry cache help with that?

Yes, it does.

Merlin,

I'm a little worried by some of the statements around this patch. It
would be good to get a single clear analysis, then build the patch
from that.

* TransactionIdDidCommit grabs XidStatus and then keeps it - if the
xact is known aborted or known committed.
* In the top of the patch you mention that "It's tempting to try and
expand the simple last transaction status in transam.c but this check
happens too late in the critical visibility routines to provide much
benefit.  For the fastest possible cache performance with the least
amount of impact on scan heavy cases, you have to maintain the cache
here if you want to avoid dirtying the page based on interaction with
the cache."

We call TransactionIdIsKnownCompleted() before we touch shared memory
in TransactionIdIsInProgress(), which seems early enough for me.
Maybe I've misunderstood your comments.
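For reference, the single-entry cache being discussed behaves roughly like this: remember the last xid whose status was fetched, so a repeat lookup for the same xid never touches the shared clog at all. An illustrative Python model (the real logic lives in transam.c and is C, with more states than shown here):

```python
class SingleEntryXidCache:
    """Model of a last-xid status cache in front of a clog fetch."""

    def __init__(self, fetch_from_clog):
        self.fetch_from_clog = fetch_from_clog  # e.g. reads pg_clog pages
        self.cached_xid = None
        self.cached_status = None
        self.clog_reads = 0                     # instrumentation for the model

    def xid_status(self, xid):
        # Only go to the (shared, lock-protected) clog on a cache miss.
        if xid != self.cached_xid:
            self.cached_status = self.fetch_from_clog(xid)
            self.clog_reads += 1
            self.cached_xid = xid
        return self.cached_status
```

This is also why the hint-bit trade-off Simon describes exists: hint bits make the status visible to *all* backends, while a local cache like this only helps the backend that owns it.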

Also, the reason we set the hint bits is (in my understanding) so that
other users will avoid having to do clog lookups. If we only cache xid
status locally, we would actually generate more clog lookups, not
less. I realise that reduces I/O, so it seems to be a straight
tradeoff between I/O and clog access.

I think we need to be clear what the goals of this are - different
people may have different tuning goals - perhaps we need a parameter
for that because in different situations either one might make sense.

For me, it makes sense for us to set hint bits on already-dirty pages
when they are written out by the bgwriter. At the moment we do very
little to amortise the I/O that must already happen.

You could easily get frustrated here and lose interest. I hope not.
This is interesting stuff and you are well qualified to find a
solution. I just want to be sure we define exactly what the problem
is, and publish all of the analysis first. Writing the final patch is
probably the easiest part of that task.

Anyway, I don't think this patch is ready for commit this time around,
but good hunting. There is some treasure to be found here, I'm
certain.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed

2011-06-29 Thread Magnus Hagander
On Tue, Jun 28, 2011 at 19:45, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Tom Lane's message of Tue Jun 28 10:39:22 -0400 2011:

 I think it would be sensible to block branch removal, as there's
 basically never a scenario where we'd do that during current usage.
 I'm not excited about blocking branch addition, although I worry
 sooner or later somebody will accidentally push a private development
 branch :-(.  Is it possible to block only removal and not addition?

 If we can tweak the thing, how about we only allow creating branches
 that match a certain pattern, say ^REL_\d+_\d+_STABLE$?

I've put this in place - except I used ^REL\d+... and not what you
suggested, since that's how we name our branches :P

I'm going to push an actual valid branch, let's say 9.7, and then
remove it again, just to make sure things worked (with it installed I
cannot push an invalid branch, so i can't test the branch removal
block). So don't panic if you see that one :)
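The effect of such a server-side hook can be modeled with a simple pattern check. Note the exact regex is an assumption: Magnus only says it starts with ^REL\d+, and the full form below just follows the REL9_1_STABLE naming convention:

```python
import re

# Assumed full pattern for release branches (REL<major>_<minor>_STABLE);
# the real hook's regex was only partially quoted in the thread.
BRANCH_PATTERN = re.compile(r"^REL\d+_\d+_STABLE$")

def branch_creation_allowed(ref_name):
    """Return True only for branch names matching the release-branch
    convention; anything else (e.g. a private development branch pushed
    by accident) would be rejected by the hook."""
    return BRANCH_PATTERN.match(ref_name) is not None
```

In a real git deployment this logic would live in an `update` hook, which also sees deletions (new sha of all zeroes) and so can block branch removal separately.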

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed

2011-06-29 Thread Magnus Hagander
On Wed, Jun 29, 2011 at 16:30, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Jun 28, 2011 at 19:45, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 Excerpts from Tom Lane's message of Tue Jun 28 10:39:22 -0400 2011:

 I think it would be sensible to block branch removal, as there's
 basically never a scenario where we'd do that during current usage.
 I'm not excited about blocking branch addition, although I worry
 sooner or later somebody will accidentally push a private development
 branch :-(.  Is it possible to block only removal and not addition?

 If we can tweak the thing, how about we only allow creating branches
 that match a certain pattern, say ^REL_\d+_\d+_STABLE$?

 I've put this in place - except I used ^REL\d+... and not what you
 suggested, since that's how we name our branches :P

 I'm going to push an actual valid branch, let's say 9.7, and then
 remove it again, just to make sure things worked (with it installed I
 cannot push an invalid branch, so i can't test the branch removal
 block). So don't panic if you see that one :)

Ok, I'm done experimenting and this is now in production.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed

2011-06-29 Thread Alvaro Herrera
Excerpts from Magnus Hagander's message of Wed Jun 29 10:30:51 -0400 2011:
 On Tue, Jun 28, 2011 at 19:45, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Tom Lane's message of Tue Jun 28 10:39:22 -0400 2011:
 
  I think it would be sensible to block branch removal, as there's
  basically never a scenario where we'd do that during current usage.
  I'm not excited about blocking branch addition, although I worry
  sooner or later somebody will accidentally push a private development
  branch :-(.  Is it possible to block only removal and not addition?
 
  If we can tweak the thing, how about we only allow creating branches
  that match a certain pattern, say ^REL_\d+_\d+_STABLE$?
 
 I've put this in place - except I used ^REL\d+... and not what you
 suggested, since that's how we name our branches :P

Hah!  It took me a while to notice the difference :-D

Thanks!  I know I feel safer with this in place :-P

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Process local hint bit cache

2011-06-29 Thread Merlin Moncure
On Wed, Jun 29, 2011 at 9:09 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, Apr 2, 2011 at 8:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:

 aside:
 Moving TransactionIdInProgress below TransactionIdDidCommit can help
 in once sense: TransactionIdDidCommit grabs the XidStatus but discards
 the knowledge if the transaction is known aborted.

 Doesn't the single-entry cache help with that?

 Yes, it does.

correct.  that line of analysis was refuted both in terms of benefit
and on structural grounds by Tom.

 I'm a little worried by some of the statements around this patch. It
 would be good to get a single clear analysis, then build the patch
 from that.

 * TransactionIdDidCommit grabs XidStatus and then keeps it - if the
 xact is known aborted or known committed.
 * In the top of the patch you mention that It's tempting to try and
 expand the simple last transaction status in
    transam.c but this check happens too late in the critical visibility
    routines to provide much benefit.  For the fastest possible cache
    performance with the least amount of impact on scan heavy cases, you have
    to maintain the cache here if you want to avoid dirtying the page
    based on interaction with the cache.

 We call TransactionIdIsKnownCompleted() before we touch shared memory
 in TransactionIdIsInProgress(), which seems early enough for me.
 Maybe I've misunderstood your comments.

This is correct.  If you are scanning tuples all with the same
transaction, the transam.c cache will prevent some of the worst
problems.  However, going through all the mechanics in the transam.c
covered case is still fairly expensive.  You have to call several
non-inlined functions, including XLogNeedsFlush(), over and over.
This is more expensive than it looks, and a transam.c cache
implementation is hamstrung out of the gate if you are trying to avoid
i/o... it isn't really all that much cheaper than jumping out to the
slru and will penalize repetitive scans (I can prove this with
benchmarks).  Bear in mind I started out with the intent of working in
transam.c, and dropped it when it turned out not to work.

 Also, the reason we set the hint bits is (in my understanding) so that
 other users will avoid having to do clog lookups. If we only cache xid
 status locally, we would actually generate more clog lookups, not
 less. I realise that reduces I/O, so it seems to be a straight
 tradeoff between I/O and clog access.

That is not correct.   Any cache 'miss' on a page continues to fall
through to SetHintBitsCache() which dirties the page as it always has.
This is a key innovation of the patch, IMNSHO.  Your worst case is the
old behavior -- the maximum number of times the fault to clog can
happen for a page is once.  It's impossible to even get one more clog
access than you did with stock postgres.

 For me, it makes sense for us to set hint bits on already-dirty pages
 when they are written out by the bgwriter. At the moment we do very
 little to amortise the I/O that must already happen.

I could agree with this.  However if the bgwriter is forcing out pages
before the hint bits can be set you have a problem.  Also adding more
work to the bgwriter may cause other secondary performance issues.

merlin



Re: [HACKERS] Process local hint bit cache

2011-06-29 Thread Merlin Moncure
On Wed, Jun 29, 2011 at 10:09 AM, Merlin Moncure mmonc...@gmail.com wrote:
 That is not correct.   Any cache 'miss' on a page continues to fall
 through to SetHintBitsCache() which dirties the page as it always has.

er, SetHintBits(), not SetHintBitsCache()

merlin



[HACKERS] default privileges wording

2011-06-29 Thread Andrew Dunstan


I was just reading the docs on default privileges, and they say this:

   Depending on the type of object, the initial default privileges
   might include granting some privileges to PUBLIC. The default is no
   public access for tables, columns, schemas, and tablespaces; CONNECT
   privilege and TEMP table creation privilege for databases; EXECUTE
   privilege for functions; and USAGE privilege for languages. The
   object owner can of course revoke these privileges.


I had to read it several times before I understood it properly, so I'm 
not terribly happy with it. I'm thinking of revising it slightly like this:


   Depending on the type of object, the initial default privileges
   might include granting some privileges to PUBLIC, including CONNECT
   privilege and TEMP table creation privilege for databases, EXECUTE
   privilege for functions, and USAGE privilege for languages. For
   tables, columns, schemas and tablespaces the default is no public
   access. The object owner can of course revoke any default PUBLIC
   privileges.

That seems clearer to me, but maybe other people can make it clearer still.

Comments?

cheers

andrew



Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Jeff Davis
On Wed, 2011-06-29 at 13:35 +0200, Florian Pflug wrote:
 What I'm concerned about is how elegantly we'd be able to tie up all
 the loose ends. What'd be the result of
   select range(1,2)
 for example? Or
   create table (r rangeinput)
 for that matter.
 
 I think we'd want to forbid both of these, and more or less every other
 use except
   range(1,2)::some range type
 but that seems to require special-casing RANGEINPUT in a lot of places.

We could make it a pseudo-type and make the IO functions generate
exceptions. That should prevent most mistakes and effectively hide it
from the user (sure, they could probably use it somewhere if they really
want to, but I wouldn't be worried about breaking backwards
compatibility with undocumented usage like that). There are plenty of
types that are hidden from users in one way or another -- trigger, void,
internal, fdw_handler, etc., so I don't see this as special-casing at
all.

That might make it slightly harder to document, but I think it can be
done. All we have to do is document the range constructors saying you
must cast the result to a valid range type; trying to use the result of
these functions directly raises an exception. In fact, I think I'll
take back the hard to document claim from before: it will be pretty
easy to document, and if someone gets it wrong, we can throw a helpful
error and hint.

Robert didn't really seem to like the idea of throwing an error though
-- Robert, can you expand on your reasoning here?

I tend to lean toward throwing an error as well, but I don't really have
much of an opinion.

 If we don't restrict RANGEINPUT that way, I think we ought to provide
 at least a basic set of operators and functions for it - e.g.
 input, output, lower(), upper(), ...
 
 *Pondering this*
 
 But we can't do that easily, since RANGEINPUT would actually be a kind of
 VARIANT type (i.e. can hold values of arbitrary types). That's something
 that our type system doesn't really support. We do have RECORD, which is
 similar in a way, but its implementation is about as intrusive as it
 gets...

I don't want to go down the road of making this a fully supported type.
I don't see any use case for it at all, and I think it's a bad idea to
design something with no idea how people might want to use it.

 Is it? That's actually too bad, since I kinda like it. But anyway,
 if that's a concern it could also be
   range_bounds(ARRAY[1,2]::int8range, '(]')

What type would the result of that be? What value?

  * It still suffers similar problems as casting back and forth to text:
  ANYARRAY is too general, doesn't really take advantage of the type
  system, and not a great fit anyway.
 
 I believe it alleviates the gravest problems of casting back and forth
 to text. It doesn't have quoting issues and it doesn't potentially lose
 information.

I think it still circumvents the type system to a degree. We're just
putting stuff in an array with no intention of really using it that way.

 In any case, I wouldn't expect this to *stay* the only way to construct
 a range forever. But it does have its virtues for a first incarnation of
 range types, I believe, mostly because it's completely unintrusive and
 won't cause any backwards-compatibility headaches in the future

I'm not sure that your overloading of arrays is completely immune from
backwards-compatibility problems, should we decide to change it later.

But regardless, we have quite a lot of time to make a decision before
9.2 is released; so let's do it once and do it right.

 I fear that the intermediate type will turn out to be quite intrusive,
 at least if we try to handle all the corner cases and loose ends. And if
 we don't, I'm concerned that we're painting ourselves into a corner here...

Can you expand on some of the corner-cases and loose ends you're
concerned about? Does marking it as a pseudotype and making the IO
functions throw exceptions handle them?

Regards,
Jeff Davis




Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Jeff Davis
On Wed, 2011-06-29 at 08:52 -0400, Robert Haas wrote:
 On Tue, Jun 28, 2011 at 11:02 PM, Jeff Davis pg...@j-davis.com wrote:
  It's still not out of the question, but I thought that the intermediate
  type would be a less-intrusive alternative (and Robert seemed concerned
  about how intrusive it was).
 
 I'm no great fan of our existing type system, and I'm not opposed to
 trying to improve it.  However, I'm a bit wary of the theory that we
 can just tweak X, Y, or Z and then everything will go more smoothly
 for range types.  I fear that there will be knock-on consequences that
 we'll spend a lot of time either (a) arguing about or (b) fixing.

Agreed.

Regards,
Jeff Davis




Re: [HACKERS] default privileges wording

2011-06-29 Thread Alvaro Herrera
Excerpts from Andrew Dunstan's message of mié jun 29 11:21:12 -0400 2011:
 
 I was just reading the docs on default privileges, and they say this:
 
 Depending on the type of object, the initial default privileges
 might include granting some privileges to PUBLIC. The default is no
 public access for tables, columns, schemas, and tablespaces; CONNECT
 privilege and TEMP table creation privilege for databases; EXECUTE
 privilege for functions; and USAGE privilege for languages. The
 object owner can of course revoke these privileges.
 
 
 I had to read it several times before I understood it properly, so I'm 
 not terribly happy with it. I'm thinking of revising it slightly like this:
 
 Depending on the type of object, the initial default privileges
 might include granting some privileges to PUBLIC, including CONNECT
 privilege and TEMP table creation privilege for databases, EXECUTE
 privilege for functions, and USAGE privilege for languages. For
 tables, columns, schemas and tablespaces the default is no public
 access. The object owner can of course revoke any default PUBLIC
 privileges.

Some types of objects [have/include/grant] no privileges to PUBLIC by
default.  These are tables, columns, schemas and tablespaces.  For other
types, the default privileges granted to PUBLIC are as follows: CONNECT
privilege and TEMP table creation privilege for databases; EXECUTE
privilege for functions; and USAGE privilege for languages.  The object
owner can, of course, revoke [these/any default] privileges.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [v9.2] Fix leaky-view problem, part 1

2011-06-29 Thread Kohei KaiGai
2011/6/28 Noah Misch n...@leadboat.com:
 On Tue, Jun 28, 2011 at 10:11:59PM +0200, Kohei KaiGai wrote:
 2011/6/28 Noah Misch n...@leadboat.com:
  Suppose your query references two views owned by different roles.  The quals
  of those views will have the same depth.  Is there a way for information to
  leak from one view owner to another due to that?
 
 Even if multiple subqueries were pulled-up and qualifiers got merged,
 user-given qualifiers shall have a smaller depth value, so they will always
 be launched after the qualifiers originally used in the subqueries.

 Of course, it is another topic in the case when the view is originally
 defined with
 leaky functions.

 Right.  I was thinking of a pair of quals, one in each of two view
 definitions.  The views are mutually-untrusting.  Something like this:

 CREATE VIEW a AS SELECT * FROM ta WHERE ac = 5;
 ALTER VIEW a OWNER TO alice;
 CREATE VIEW b AS SELECT * FROM tb WHERE bc = 6;
 ALTER VIEW b OWNER TO bob;
 SELECT * FROM a, b;

 Both the ac=5 and the bc=6 quals do run at the same depth despite enforcing
 security for different principals.  I can't think of a way that one view owner
 could use this situation to subvert the security of the other view owner,
 but I wanted to throw it out.

Even if the view owner sets a trap in his view, we have no way to reference
variables coming from outside of the view. In the above example, even if I
added f_leak() into the definition of VIEW A, it cannot be given an argument
that references VIEW B.

 I was referring to this paragraph:

  On the technical side, I am pretty doubtful that the approach of adding a
  nestlevel to FuncExpr and RelOptInfo is the right way to go.  I believe we
  have existing code (to handle left joins) that prevents quals from being
  pushed down too far by fudging the set of relations that are supposedly 
 needed
  to evaluate the qual.  I suspect a similar approach would work here.

It seems to me the latter half of this paragraph is talking about the problem of
unexpected qualifier pushing-down over the security barrier; I'm trying to solve
that problem with the part.2 patch.
The scenario the part.1 patch tries to solve is the order in which qualifiers
are launched, not unexpected pushing-down.

As long as we focus on the ordering problem, it is a reasonable approach to
track the original nestlevel and force qualifiers coming from deeper levels
to be launched earlier, because it damages performance if we prevent pull-up
of subqueries coming from security views.

For example, if we cannot pull-up this subquery, we will need to scan the
relation twice.
  SELECT * FROM (SELECT * FROM tbl WHERE f_policy(x)) WHERE f_leak(y);

Normally, it should be pulled-up into the following form:
  SELECT * FROM tbl WHERE f_policy(x) AND f_leak(y);

  In addition, the implementation will become complex if qualifiers pulled up
  from security barrier views and qualifiers pulled up from regular views are
  mixed within a single qualifier list.

 I only scanned the part 2 patch, but isn't the bookkeeping already happening 
 for
 its purposes?  How much more complexity would we get to apply the same 
 strategy
 to the behavior of this patch?

If conditional, what criteria should we use to reorder the qualifiers?
The current patch checks the depth first, then checks cost if the depth is
the same. It is quite a simple rule. I have no idea what criteria to use to
order mixed qualifiers coming from security-barrier views and regular views.
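
For illustration, the two-key ordering rule described above (depth first, then cost) can be sketched as follows; the names and structures are hypothetical stand-ins, not the actual C planner code from the patch:

```python
# Hedged sketch of the qualifier ordering rule: quals pulled up from deeper
# (security barrier) levels run first; ties within a depth are broken by
# cost so cheap quals still run before expensive ones.
from dataclasses import dataclass

@dataclass
class Qual:
    expr: str
    depth: int      # nesting depth the qual was pulled up from
    cost: float     # estimated per-tuple evaluation cost

def order_quals(quals):
    # deeper quals first (higher depth), then cheaper first within a depth
    return sorted(quals, key=lambda q: (-q.depth, q.cost))

quals = [
    Qual("f_leak(y)", depth=0, cost=100.0),   # user-supplied qual
    Qual("f_policy(x)", depth=2, cost=10.0),  # from the security view
    Qual("x > 0", depth=0, cost=0.0025),
]
print([q.expr for q in order_quals(quals)])
# → ['f_policy(x)', 'x > 0', 'f_leak(y)']
```

This mirrors the example in the thread: f_policy(x), originating inside the security view, is evaluated before the user's f_leak(y), even though f_leak(y) quotes a higher cost.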

  On Mon, Jun 06, 2011 at 01:37:11PM +0100, Kohei Kaigai wrote:
  --- a/src/backend/optimizer/plan/planner.c
  +++ b/src/backend/optimizer/plan/planner.c
 
  +     else if (IsA(node, Query))
  +     {
  +             depth += 2;
 
  Why two?
 
 This patch is groundwork for the upcoming row-level security feature.
 In the case when the backend appends security policy functions, they should
 be launched prior to user-given qualifiers. So, I intend to give odd depth
 numbers to these functions so they have higher priority than the others.
 Of course, 1 might work well right now.

 I'd say it should either be 1 until such time as that's needed, or it needs a
 comment noting why it's 2.

OK, I'll add comment to introduce why the depth is incremented by 2.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] hint bit cache v6

2011-06-29 Thread Robert Haas
On Fri, May 13, 2011 at 3:48 PM, Merlin Moncure mmonc...@gmail.com wrote:
 what's changed:
 *) as advertised, i'm no longer bothering to cache invalid bits.  hint
 bit i/o via rollbacked transactions is not a big deal IMNSHO, so they
 will work as they have always done.
 *) all the tuple visibility routines are now participating in the cache
 *) xmax commit bits are now being cached, not just xmin.  this
 prevents hint bit i/o following deletes.
 *) juggled the cache interaction a bit so the changes to the
 visibility routines could be a lot more surgical. specifically, I
 reverted SetHintBits() to return void and put the cache adjustment
 inside 'SetHintBitsCache()' which sets the bits, dirties, and adjusts
 the cache.  SetHintBitsNonDirty() only sets the bits without dirtying
 the page, and is called when you get a cache hit.
 *) various minor cleanups, decided to keep pageindex as type
 TransactionId since that's what clog does.
 *) made a proper init function, put cache data into
 CacheMemoryContext, and moved the cache data pointer references into
 the page structure

 performance testing done:
 *) select only pgbench and standard i/o pgbech tests don't show
 performance difference within statistical noise.

 The test I need to do, a cache and clog thrashing test, I haven't
 found a clean way to put together yet.  I'm really skeptical that any
 problems will show up there. That said, I am satisfied the patch does
 what it is supposed to do -- eliminates hint bit i/o for cases
 where it is a big headache, without penalizing other cases.  Anyone
 that would like to comment on the technique or other points of the
 patch please do so (otherwise it's time for the 'fest).

OK, so I read through this patch.  I agree with Simon's comment on the
other thread that it's probably not ready for commit right at the
moment, but also want to offer a few other thoughts.

The patch fails to conform to our coding style in a variety of ways,
but I don't want to get bogged down in that at this point.  The first
question we need to answer here is: What is this patch doing?  And do
we want that?

With regards to the first question, it seems to me that the patch is
doing two separate but related things.

1. It allocates four 8kB pages in backend-local memory, each of which
can store one bit for each XID in a 64K range.  The bit is set if the
XID is known to be committed.  If we find this bit set, then we
needn't consult CLOG.  This reduces CLOG traffic, at the cost of
potentially doing more work in the case where the cache misses.  Every
so often, we reconsider which XID ranges should be represented in our
cache and consider replacing an existing page in the cache with some
other one.

2. When we probe the cache and hit, we set the hint bit on the tuple
*without dirtying the page*.  Thus, the hint bit will only get written
back to disk if the page is dirtied for some other reason.  This will
frequently reduce the amount of write I/O generated by large table
scans, at the cost of possibly generating additional work for other
backends who try to read this data later.
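
To make the mechanics of point 1 concrete, here is a hedged sketch of the address arithmetic such a page layout implies (Python for illustration; the patch itself is C, and these names are invented, not taken from it):

```python
# Each 8kB page holds one known-committed bit per XID in a contiguous
# 64K-XID range: 8192 bytes * 8 bits = 65536 bits.
XIDS_PER_PAGE = 8192 * 8   # 65536

def cache_coords(xid: int):
    """Map an XID to (range start, byte offset, bit mask) within a page."""
    page_start = (xid // XIDS_PER_PAGE) * XIDS_PER_PAGE
    offset = xid - page_start
    return page_start, offset // 8, 1 << (offset % 8)

class CommitBitPage:
    """One cached page of committed-XID bits (four such pages per backend)."""
    def __init__(self, page_start: int):
        self.page_start = page_start
        self.bits = bytearray(8192)

    def set_committed(self, xid: int):
        _, byte, mask = cache_coords(xid)
        self.bits[byte] |= mask

    def is_known_committed(self, xid: int) -> bool:
        _, byte, mask = cache_coords(xid)
        return bool(self.bits[byte] & mask)

page = CommitBitPage(cache_coords(100_000)[0])
page.set_committed(100_000)
print(page.is_known_committed(100_000))  # True, without consulting CLOG
```

A bit set here means CLOG need not be consulted at all for that XID; a clear bit means nothing (the XID may still be committed), so a miss falls through to the normal path.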

Now, the interesting thing about the patch is that the downsides of #2
are considerably mitigated by #1.  If we assume for the sake of
argument that the backend-local cache is really, really fast, and that
the cache replacement policy is sound, then one is just as good as the
other, and the only time we need the hint bits is when the cache
overflows, which just so happens to be exactly when the patch forces them
to be written out to disk.  That's pretty clever.  The exact way this
is implemented is a little bit grotty, but I feel like it can work.
So I'm inclined to think that, at 10,000 feet, if we can convince
ourselves that the basic idea of sticking a cache in here is OK, then
the additional refinement of declining to dirty the page when the
cache, rather than CLOG, tell us that the XID is committed is probably
OK too.

With respect to the cache itself, I think the part that concerns me
most is the cache replacement algorithm.  It's obvious that straight
LRU can't work here, since that could easily result in horrible
thrashing behavior.  But it's not clear to me that the actual
algorithm, which considers evicting a page after every 100 misses, is
all that great either.  Each page stores 64K transaction IDs, and
after being in cache for a while, you might have 25% or 50% of those
bits set, which is quite an asset.  Now you hit a run of 100 tuples
with some XID not represented in that cache and you chuck that page
out the window and replace it with a page that has just that one bit
set.  Now you hit another run of 100 tuples with XIDs that would have
hit the old page and flip it back; but now you've lost all those
valuable bits you had set.  I'm not sure how likely that is to happen
in real life, but it certainly seems like it could be a bit painful if
it did.  The cache replacement algorithm is not cheap.
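
The ping-ponging scenario above can be illustrated with a deliberately simplified model — a one-page cache and a bare evict-after-100-misses rule (both assumptions made for the sake of the sketch; the real patch keeps four pages and a more involved replacement policy):

```python
# Two XID ranges are probed in alternating runs of 100, so every run
# evicts the page caching the other range and all its accumulated bits
# are thrown away -- the thrashing behavior described above.
MISS_LIMIT = 100

class OnePageCache:
    def __init__(self):
        self.range = None    # which 64K-XID range is currently cached
        self.bits = set()    # committed XIDs recorded in the cached page
        self.misses = 0

    def probe(self, xid):
        rng = xid // 65536
        if rng == self.range:
            self.bits.add(xid)           # simplified: record on hit
            return
        self.misses += 1
        if self.misses >= MISS_LIMIT:    # evict and replace the page
            self.range, self.bits, self.misses = rng, {xid}, 0

cache = OnePageCache()
evictions = 0
for run in range(10):                    # alternate runs of 100 XIDs
    base = (run % 2) * 65536
    before = cache.range
    for i in range(100):
        cache.probe(base + run * 100 + i)
    if cache.range != before:
        evictions += 1
print(evictions)  # → 10: every single run replaces the cached page
```

Under this toy policy the cache is replaced on every run and never accumulates the 25-50% bit density that makes a resident page valuable, which is exactly the cost the paragraph above worries about.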

Rather than just fiddling with the thresholds, it 

Re: [HACKERS] patch: Allow \dd to show constraint comments

2011-06-29 Thread Merlin Moncure
On Thu, May 19, 2011 at 9:36 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 On Thu, May 19, 2011 at 10:26 AM, Robert Haas robertmh...@gmail.com wrote:
 At the risk of opening a can of worms, if we're going to fix \dd,
 shouldn't we fix it completely, and include comments on ALL the object
 types that can have them?  IIRC it's missing a bunch, not just
 constraints.

 You opened this can up, so I hope you like worms ;-) Let's break down
 the objects that the COMMENT docs say one can comment on. [Disclaimer:
 It's likely I've made some mistakes in this list, data was gleaned
 mostly from perusing describe.c]

 1.) Comments displayed by \dd:

  TABLE
  AGGREGATE
  OPERATOR
  RULE
  FUNCTION
  INDEX
  SEQUENCE
  TRIGGER
  TYPE
  VIEW

 2.) Comments displayed in the backslash commands for the object:

  AGGREGATE                     \da
  COLUMN                        \d+ tablename
  COLLATION                     \dO
  DATABASE                      \l+
  EXTENSION                     \dx
  FUNCTION                      \df+
  FOREIGN TABLE                 \d+ tablename (I think?)
  LARGE OBJECT                  \dl
  ROLE                          \dg+
  SCHEMA                        \dn+
  TABLESPACE                    \db+
  TYPE                          \dT
  TEXT SEARCH CONFIGURATION     \dF
  TEXT SEARCH DICTIONARY        \dFd
  TEXT SEARCH PARSER            \dFp
  TEXT SEARCH TEMPLATE          \dFt

 3.) Objects for which we don't display the comments anywhere:
  CAST
  CONSTRAINT (addressed by this patch)
  CONVERSION
  DOMAIN
  PROCEDURAL LANGUAGE

 4.) My eyes are starting to glaze over, and I'm too lazy to figure out
 how or if comments for these objects are handled:
  FOREIGN DATA WRAPPER
  OPERATOR CLASS
  OPERATOR FAMILY
  SERVER

 Another thought is that I wonder if it's really useful to have a
 backslash commands that dumps out comments on many different object
 types.

 ISTM that \dd is best suited as a command to show the comments for
 objects for which we don't have an individual backslash command, or
 objects for which it's not practical to show the comment in the
 backslash command.

 In some cases, e.g. \db+, we include the description for the
 object in the output of the backslash command that lists objects just
 of that type, which seems like a better design.

 I agree that's the way to go for objects where such display is
 feasible (the backslash command produces columnar output, and we can
 just stick in a comment column).

 I wouldn't want to try to pollute, say, \d+ with comments about
 tables, rules, triggers, etc. But for the few objects displayed by
 both \dd and the object's respective backslash command (aggregates,
 types, and functions), I would be in favor of ripping out the \dd
 display.

 Of course we have no
 backslash command for constraints anyway

 Precisely, and I think there's a solid argument for putting
 constraints into bucket 1 above, as this patch does, since there's no
 good room to display constraint comments inside \d+, and there's no
 backslash command specifically for constraints.

 I was kind of hoping to avoid dealing with this can of worms with this
 simple patch, which by itself seems uncontroversial. If there's
 consensus that \dd and the other backslash commands need further
 reworking, I can probably devote a little more time to this. But let's
 not have the perfect be the enemy of the good.

Patch applies clean, does what it is supposed to do, and matches other
conventions in describe.c.  Passing to committer.  pg_comments may be
a better way to go, but that is a problem for another day...

merlin



Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 11:41 AM, Jeff Davis pg...@j-davis.com wrote:
 Robert didn't really seem to like the idea of throwing an error though
 -- Robert, can you expand on your reasoning here?

I guess I don't have any terribly well-thought out reasoning - maybe
it's fine.  It just seems strange to have a type that you can't
display.

But now that I'm thinking about this a little more, I'm worried about this case:

CREATE TABLE foo AS SELECT RANGE('something'::funkytype, 'somethingelse'::funkytype);
DROP TYPE funkytype;

It seems to me that the first statement had better fail, or else the
second one is going to create a hopeless mess (imagine that a new type
comes along and gets the OID of funkytype).

It also seems a bit strange to me that we're contemplating a system
where users are always going to have to cast the return type.
Generally, casts are annoying and we want to minimize the need for
them.  I'm not sure what the alternative is, though, unless we create
separate constructor functions for each type: int8range_cc(1, 2).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Inconsistency between postgresql.conf and docs

2011-06-29 Thread Josh Berkus

 I don't have a strong feeling on whether or not we should put that
 setting in its own section.  Right now, we only have one setting for
 synchronous replication, so I guess maybe it depends on if we think
 there will be more in the future.

I believe there will be more in the future.   However, given that the
replication section isn't exactly overpopulated, I think we could
consolidate.

My preference would be to have:

# REPLICATION

# - Master Settings -
# these settings affect the master role in replication
# they will be ignored on the standby

... settings ...

# - Standby Settings -
# these settings affect the standby role in replication
# they will be ignored on the master

... settings ...


That's how I've been setting up the file for my customers; it's fairly
clear and understandable.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-29 Thread Alvaro Herrera
Excerpts from Robert Haas's message of lun jun 27 10:35:59 -0400 2011:
 On Mon, Jun 27, 2011 at 3:08 AM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:

  I would summarise the consistency requirements as:
 
  1). ADD CONSTRAINT should leave both parent and child tables in the
  same state as they would have been if the constraint had been defined
  at table creation time.
 
  2). DROP CONSTRAINT should leave both parent and child tables in the
  same state as if the constraint had never existed (completely
  reversing the effects of ADD CONSTRAINT).
 
  I don't have a strong opinion as to whether or not the NOT NULL part
  of a PK should be inherited, provided that it is consistent with the
  above.
 
  I guess that if I were forced to choose, I would say that the NOT NULL
  part of a PK should not be inherited, since I do think of it as part
  of the PK, and PKs are not inherited.
 
 OK, I see your point, and I agree with you.

Interesting.  This whole thing requires quite a bit of rejiggering in
the initial transformation phase, I think, but yeah, I see the points
here and I will see to them.  Does this mean that NOT NULL PRIMARY KEY
now behaves differently?  I think it does, because if you drop the PK
then the field needs to continue being not null.

And here I was thinking that this was just a quick job to enable NOT
VALID constraints ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread David E. Wheeler
On Jun 28, 2011, at 8:02 PM, Jeff Davis wrote:

 I think David Wheeler was trying to make a similar point, but I'm still
 not convinced.
 
 It's not a pair, because it can be made up of 0, 1, or 2 scalar values
 (unless you count infinity as one of those values, in which case 0 or
 2). And without ordering, it's not clear that those values are really
 bounds.
 
 The type needs to:
 * represent two values, either of which might be a special infinite
 value
 * represent the value empty
 * represent inclusivity/exclusivity of both values
 
 and those things seem fairly specific to ranges, so I don't really see
 what other use we'd have for such a type. But I'm open to suggestion.
 
 I don't think that having an extra type around is so bad. It solves a
 lot of problems, and doesn't seem like it would get in the way. And it's
 only for the construction of ranges out of scalars, which seems like the
 most natural place where a cast might be required (similar to casting an
 unknown literal, which is fairly common).

I'm fine with that, but my point is that if it's going to be exposed to users 
somehow, it needs to be useful on its own, without casting. Because some wit 
will make a column of this type. If it's not somehow useful on its own, then it 
should be an implementation detail or internal that I never see in SQL. IMHO.
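The requirements Jeff lists above could be modeled, schematically, as a composite type (purely hypothetical, for illustration -- not a proposal from this thread):

```sql
-- Hypothetical shape of the intermediate value a range constructor
-- would produce, before a cast resolves the concrete range type:
CREATE TYPE range_bounds AS (
    lower       text,      -- NULL standing in for -infinity
    upper       text,      -- NULL standing in for +infinity
    lower_inc   boolean,   -- is the lower bound inclusive?
    upper_inc   boolean,   -- is the upper bound inclusive?
    is_empty    boolean    -- the empty range carries no bounds at all
);
```

Whether such a value should ever be user-visible is exactly the point under debate.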

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread David E. Wheeler
On Jun 29, 2011, at 8:41 AM, Jeff Davis wrote:

 We could make it a pseudo-type and make the IO functions generate
 exceptions. That should prevent most mistakes and effectively hide it
 from the user (sure, they could probably use it somewhere if they really
 want to, but I wouldn't be worried about breaking backwards
 compatibility with undocumented usage like that). There are plenty of
 types that are hidden from users in one way or another -- trigger, void,
 internal, fdw_handler, etc., so I don't see this as special-casing at
 all.

That could work.

 I don't want to go down the road of making this a fully supported type.
 I don't see any use case for it at all, and I think it's a bad idea to
 design something with no idea how people might want to use it.

+1

I'm still not clear, though, on why the return type of range() should not be 
related to the types of its arguments. So

range(1, 5)

Should return intrange, and

range(1::int8, 5::int8)

Should return int8range, and

range('foo', 'f')

Should return textrange.

Best,

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 12:51 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of lun jun 27 10:35:59 -0400 2011:
 On Mon, Jun 27, 2011 at 3:08 AM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:

  I would summarise the consistency requirements as:
 
  1). ADD CONSTRAINT should leave both parent and child tables in the
  same state as they would have been if the constraint had been defined
  at table creation time.
 
  2). DROP CONSTRAINT should leave both parent and child tables in the
  same state as if the constraint had never existed (completely
  reversing the effects of ADD CONSTRAINT).
 
  I don't have a strong opinion as to whether or not the NOT NULL part
  of a PK should be inherited, provided that it is consistent with the
  above.
 
  I guess that if I were forced to choose, I would say that the NOT NULL
  part of a PK should not be inherited, since I do think of it as part
  of the PK, and PKs are not inherited.

 OK, I see your point, and I agree with you.

 Interesting.  This whole thing requires quite a bit of rejiggering in
 the initial transformation phase, I think, but yeah, I see the points
 here and I will see to them.  Does this mean that NOT NULL PRIMARY KEY
 now behaves differently?  I think it does, because if you drop the PK
 then the field needs to continue being not null.

Yeah, I think an implicit not-null because you made it a primary key
is now different from one that you write out.

 And here I was thinking that this was just a quick job to enable NOT
 VALID constraints ...

Bwahaha.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Florian Pflug
On Jun29, 2011, at 18:34 , Robert Haas wrote:
 It also seems a bit strange to me that we're contemplating a system
 where users are always going to have to cast the return type.
 Generally, casts are annoying and we want to minimize the need for
 them.  I'm not sure what the alternative is, though, unless we create
 separate constructor functions for each type: int8range_cc(1, 2).

Well, if we want multiple range types per base type (which we do), then
the user needs to specify which one to use somehow. A cast seems the most
natural way to do that to me - after all, casting is *the* way to coerce
a value to a certain type.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Florian Pflug
On Jun29, 2011, at 19:05 , David E. Wheeler wrote:
 I'm still not clear, though, on why the return type of range()
 should not be related to the types of its arguments. So
 
range(1, 5)
 
 Should return intrange, and
 
range(1::int8, 5::int8)
 
 Should return int8range, and
 
range('foo', 'f')
 
 Should return textrange.

Because there might be more than one range type for a
base type. Say there are two range types over text, one
with collation 'de_DE' and one with collation 'en_US'.
What would the type of
  range('foo', 'f')
be?
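A sketch of how an explicit cast would resolve that ambiguity (the type names here are hypothetical; range types were still under design at this point):

```sql
-- Suppose two range types over text, differing only in collation:
--   textrange_de (collation 'de_DE') and textrange_en (collation 'en_US').
-- An unadorned constructor call is ambiguous:
--   SELECT range('foo', 'f');            -- which text range type?
-- An explicit cast on the result supplies the missing type information:
SELECT range('foo', 'f')::textrange_de;   -- bounds ordered under de_DE
SELECT range('foo', 'f')::textrange_en;   -- bounds ordered under en_US
```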

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread David E. Wheeler
On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote:

 Because there might be more than one range type for a
 base type. Say there are two range types over text, one
 with collation 'de_DE' and one with collation 'en_US'.
 What would the type of
  range('foo', 'f')
 be?

The one that corresponds to the current LC_COLLATE setting.

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default privileges wording

2011-06-29 Thread David Fetter
On Wed, Jun 29, 2011 at 11:50:38AM -0400, Alvaro Herrera wrote:
 Excerpts from Andrew Dunstan's message of mié jun 29 11:21:12 -0400 2011:
  
  I was just reading the docs on default privileges, and they say this:
  
  Depending on the type of object, the initial default privileges
  might include granting some privileges to PUBLIC. The default is no
  public access for tables, columns, schemas, and tablespaces; CONNECT
  privilege and TEMP table creation privilege for databases; EXECUTE
  privilege for functions; and USAGE privilege for languages. The
  object owner can of course revoke these privileges.
  
  
  I had to read it several times before I understood it properly, so I'm 
  not terribly happy with it. I'm thinking of revising it slightly like this:
  
  Depending on the type of object, the initial default privileges
  might include granting some privileges to PUBLIC, including CONNECT
  privilege and TEMP table creation privilege for databases, EXECUTE
  privilege for functions, and USAGE privilege for languages. For
  tables, columns, schemas and tablespaces the default is no public
  access. The object owner can of course revoke any default PUBLIC
  privileges.
 
 Some types of objects [have/include/grant] no privileges to PUBLIC by
 default.  These are tables, columns, schemas and tablespaces.  For other
 types, the default privileges granted to PUBLIC are as follows: CONNECT
 privilege and TEMP table creation privilege for databases; EXECUTE
 privilege for functions; and USAGE privilege for languages.  The object
 owner can, of course, revoke [these/any default] privileges.

How about this?

Some types of objects deny all privileges to PUBLIC by default.  These
are tables, columns, schemas and tablespaces.  For other types, the
default privileges granted to PUBLIC are as follows: CONNECT privilege
and TEMP table creation privilege for databases; EXECUTE privilege for
functions; and USAGE privilege for languages.  The object owner can,
of course, revoke both default and expressly granted privileges.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parameterized aggregate subquery (was: Pull up aggregate subquery)

2011-06-29 Thread Hitoshi Harada
2011/6/29 Yeb Havinga yebhavi...@gmail.com:

 On 2011-06-17 09:54, Hitoshi Harada wrote:

 While reviewing the gist/box patch, I found some planner APIs that can
 replace parts in my patch. Also, comments in includes wasn't updated
 appropriately. Revised patch attached.

 Hello Hitoshi-san,

Hi Yeb,

 I read your latest patch implementing parameterizing subquery scans.

 6) Before patching Postgres, I could execute the test with the size_l and
 size_m tables, after patching against current git HEAD (patch without
 errors), I get the following error when running the example query:
 ERROR:  plan should not reference subplan's variable

 I get the same error with the version from june 9 with current git HEAD.

It seems that something has changed since I first developed the patch.
The last version and the one before are not very different from each
other, especially in terms of runtime, but they might not have been
tested enough. I tried winding my local git branch back to around June
10, but the same error occurs. The error message itself is not in the
recently changed parts, so I guess some surrounding change affects it.

 7) Since both set_subquery_pathlist and best_inner_subqueryscan push down
 clauses, as well as add a path and subplan, could a generalized function be
 made to support both, to reduce duplicate code?

I tried it, but decided to leave things as they are, for future
extensibility. The subquery parameterization will (or can) be extended
in more complicated ways. I'm not sure we'd be better off gathering some
small parts into one at the cost of future capability.

The other things are all good points. Thanks for the elaborate review!
Above all, I'm going to fix issue 6), or at least find its cause.

Regards,
-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-06-29 Thread Radosław Smogura
Review of patch 
https://commitfest.postgresql.org/action/patch_view?id=580


=== Patch description ===
SUMMARY: When a text()-based XPATH expression is invoked, the output is not
XML-escaped


DESCRIPTION: Submitter invokes the following statement:
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1]
He expects the (escaped) result &lt;, but gets <.

AFFECTS: Possibly this affects situations where the user wants to insert
output from the above expression into an XML column.


PATCH CONTENT:
A. 1. The patch fixes the above problem (I don't treat this as a problem,
but as an enhancement).

A. 2. In addition, the patch contains test cases for the above.
A. 3. The patch changes the behaviour of xml_xmlnodetoxmltype, invoked by
xpath_internal, by adding an escape_xml() call.

A. 4. I don't see any stability issues with this.
A. 5. Performance may be reduced and memory consumption may increase due
to the internals of escape_xml.


=== Review ===
B. 1. Patch applies cleanly.

B. 2. Source compiles, and the patch works as the Submitter wants.

B. 3. Personally, I missed notes in the documentation stating that such 
expressions will be escaped (this should be clearly documented, as live 
functionality is changed).


B. 4. The Submitter has possibly revealed some missed, minor functionality 
of PostgreSQL XML, in that he expects XML-escaped output.


B. 5. Currently XPATH produces escaped output for Element nodes, and 
unescaped output for all other types of nodes (including text, 
comment, etc.)


B. 6. The current behaviour _is intended_ (there is an if to check the node 
type) and _natural_. In this particular case the user asks for the text 
content of some node, and this content is actually <.


B. 7. Similar behaviour may be observed, e.g., in SQL Server(tm):
SELECT x.value('(//text())[1]', 'varchar(256)') FROM #xml_t;
produces <.

B. 8. Even if the current behaviour may be treated as wrong, it has been 
exposed, and others may depend on the unescaped content.


B. 9. I think the correct approach would be to create a new function 
(based on an existing one) that is able to escape the above. In this 
situation the call should look like (for example only!):
SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>')))[1]
or
SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>'))[1])
One method to operate on an array, one to operate on a single XML datum.
Or we could think about something like xmltext(). (Compare the current 
status of xml.c.)

B. 10. Whether such a function (B.9.) is needed and whether it will be 
included is out of scope for this review.


Based mainly on A.1, B.6. and B.8., and bearing B.10. in mind, in my opinion 
this patch should be rejected as needing more work, or as invalid.


The detailed explanation of why such behaviour should not be implemented I 
will send in the review of https://commitfest.postgresql.org/action/patch_view?id=565.


Regards,

Radosław Smogura

P. S.
I would like to say sorry for such a late answer, but I sent this from 
another mail address, which was not subscribed to the mailing list. Blame 
KDE KMail, not me :)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default privileges wording

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 1:20 PM, David Fetter da...@fetter.org wrote:
 On Wed, Jun 29, 2011 at 11:50:38AM -0400, Alvaro Herrera wrote:
 Excerpts from Andrew Dunstan's message of mié jun 29 11:21:12 -0400 2011:
 
  I was just reading the docs on default privileges, and they say this:
 
      Depending on the type of object, the initial default privileges
      might include granting some privileges to PUBLIC. The default is no
      public access for tables, columns, schemas, and tablespaces; CONNECT
      privilege and TEMP table creation privilege for databases; EXECUTE
      privilege for functions; and USAGE privilege for languages. The
      object owner can of course revoke these privileges.
 
 
  I had to read it several times before I understood it properly, so I'm
  not terribly happy with it. I'm thinking of revising it slightly like this:
 
      Depending on the type of object, the initial default privileges
      might include granting some privileges to PUBLIC, including CONNECT
      privilege and TEMP table creation privilege for databases, EXECUTE
      privilege for functions, and USAGE privilege for languages. For
      tables, columns, schemas and tablespaces the default is no public
      access. The object owner can of course revoke any default PUBLIC
      privileges.

 Some types of objects [have/include/grant] no privileges to PUBLIC by
 default.  These are tables, columns, schemas and tablespaces.  For other
 types, the default privileges granted to PUBLIC are as follows: CONNECT
 privilege and TEMP table creation privilege for databases; EXECUTE
 privilege for functions; and USAGE privilege for languages.  The object
 owner can, of course, revoke [these/any default] privileges.

 How about this?

 Some types of objects deny all privileges to PUBLIC by default.  These
 are tables, columns, schemas and tablespaces.  For other types, the
 default privileges granted to PUBLIC are as follows: CONNECT privilege
 and TEMP table creation privilege for databases; EXECUTE privilege for
 functions; and USAGE privilege for languages.  The object owner can,
 of course, revoke both default and expressly granted privileges.

Or, since I find the use of the word deny a bit unclear:

When a table, column, schema, or tablespace is created, no privileges
are granted to PUBLIC.  But for other objects, some privileges will be
granted to PUBLIC automatically at the time the object is created:
CONNECT privilege and TEMP table creation privilege for database, ...
etc., the rest as you have it
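The defaults being described can be seen directly in SQL (a sketch; the object names are invented):

```sql
-- Tables, columns, schemas and tablespaces: nothing is granted to PUBLIC.
CREATE TABLE t (x int);

-- Functions: EXECUTE is granted to PUBLIC automatically at creation time,
CREATE FUNCTION add_one(i int) RETURNS int
    AS 'SELECT $1 + 1' LANGUAGE sql;
-- so any role may call it:
--   SELECT has_function_privilege('some_role', 'add_one(int)', 'EXECUTE');

-- The owner can revoke the automatic grant:
REVOKE EXECUTE ON FUNCTION add_one(int) FROM PUBLIC;
```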

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Simon Riggs
On Wed, Jun 29, 2011 at 1:24 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 29, 2011 at 4:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Jun 16, 2011 at 7:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote:
 When the replication connection is terminated, the standby tries to read
 WAL files from the archive. In this case, there is no walreceiver process,
 so how does the standby calculate the clock difference?

 Good question.  Also, just because we have streaming replication
 available doesn't mean that we should force people to use it.  It's
 still perfectly legit to set up a standby that only use
 archive_command and restore_command, and it would be nice if this
 feature could still work in such an environment.  I anticipate that
 most people want to use streaming replication, but a time-delayed
 standby is a good example of a case where you might decide you don't
 need it.  It could be useful to have all the WAL present (but not yet
 applied) if you're thinking you might want to promote that standby -
 but my guess is that in many cases, the time-delayed standby will be
 *in addition* to one or more regular standbys that would be the
 primary promotion candidates.  So I can see someone deciding that
 they'd rather not have the load of another walsender on the master,
 and just let the time-delayed standby read from the archive.

 Even if that were not an issue, I'm still more or less of the opinion
 that trying to solve the time synchronization problem is a rathole
 anyway.  To really solve this problem well, you're going to need the
 standby to send a message containing a timestamp, get a reply back
 from the master that contains that timestamp and a master timestamp,
 and then compute based on those two timestamps plus the reply
 timestamp the maximum and minimum possible lag between the two
 machines.  Then you're going to need to guess, based on several cycles
 of this activity, what the actual lag is, and adjust it over time (but
 not too quickly, unless of course a large manual step has occurred) as
 the clocks potentially drift apart from each other.  This is basically
 what ntpd does, except that it can be virtually guaranteed that our
 implementation will suck by comparison.  Time synchronization is
 neither easy nor our core competency, and I think trying to include it
 in this feature is going to result in a net loss of reliability.
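The bounds Robert describes can be written down concretely: if the standby records its send time t0, the master stamps the reply with t1, and the standby receives it at t2 (each by its own clock), then the master-minus-standby offset must lie between t1 - t2 and t1 - t0. A sketch in SQL, with sample values:

```sql
-- t0 = standby send time, t1 = master timestamp, t2 = standby receive time
SELECT t1 - t2                       AS min_offset,   -- reply used the whole round trip
       t1 - t0                       AS max_offset,   -- request used the whole round trip
       ((t1 - t2) + (t1 - t0)) / 2.0 AS midpoint_estimate
FROM (VALUES (100.0, 105.0, 102.0)) AS sample(t0, t1, t2);
-- min_offset = 3.0, max_offset = 5.0, midpoint_estimate = 4.0
```

Narrowing that interval over many cycles, and tracking drift, is the part ntpd does well and a home-grown version likely would not.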


 This begs the question of why we need this feature at all, in the way 
 proposed.

 Streaming replication is designed for immediate transfer of WAL. File
 based is more about storing them for some later use.

It seems strange to pollute the *immediate* transfer route with a
delay, when that is easily possible with a small patch to pg_standby
that can wait until the filetime delay is > X before returning.
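For illustration, a recovery.conf sketch of that idea -- note the delay option shown here is hypothetical; pg_standby has no such flag today, which is exactly the small patch being proposed:

```
# hold each archived WAL file back until it is at least 3600 seconds old
restore_command = 'pg_standby -D 3600 /mnt/server/archive %f %p %r'
```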

 The main practical problem with this is that most people's WAL
 partitions aren't big enough to store the delayed WAL files, which is
 why we provide the file archiving route anyway. So in practical terms
 this will be unusable, or at least dangerous to use.

 +1 for the feature concept, but -1 for adding this to streaming replication.

 As implemented, the feature will work with either streaming
 replication or with file-based replication.

That sounds like the exact opposite of yours and Fujii's comments
above. Please explain.

 I don't see any value in
 restricting to work ONLY with file-based replication.

As explained above, it won't work in practice because of the amount of
file space required.

Or, an alternative question: what will you do when it waits so long
that the standby runs out of disk space?

If you hard-enforce the specified time delay then you just make
replication fail during heavy loads.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Review of patch Bugfix for XPATH() if expression returns a scalar value

2011-06-29 Thread Radosław Smogura
This is a review of patch
https://commitfest.postgresql.org/action/patch_view?id=565
Bugfix for XPATH() if expression returns a scalar value

The patch applies cleanly, and compiles cleanly too; I didn't check the tests.

From the discussion about the patch, and the thread referenced in it 
(http://archives.postgresql.org/pgsql-general/2010-07/msg00355.php), if I 
understand correctly, such functionality is desired.

This patch, I think, takes a good approach to dealing with scalar values, 
and converting a value to its string representation is good too (consistent 
with the current xml support), with one exception detailed below.

In this patch the submitter, similarly to 
https://commitfest.postgresql.org/action/patch_view?id=580, added 
functionality for XML-escaping some kinds of values (I think only string 
scalars are escaped), which is not natural and may lead to double escaping 
in some situations. An example query:

SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM (SELECT 
(XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name 
root, XMLATTRIBUTES('<n' AS xmlns, 'v' AS value),'t'))) v(x)) as foo;

        xmlelement
--------------------------
 <root sth="&amp;lt;n"/>

It's clearly visible that the value of the attribute is &lt;n, not <n. Every 
parser will read this as &lt;n, which is not natural and will require the 
consumer/developer to de-escape it on his side - roughly speaking, this will 
be reported as a serious bug.
I didn't find good reasons why XML-escaping should be included. The 
submitter wrote about inserting this into an xml column and the possibility 
of data damage, but didn't give an example. Such an example is desired.

Conclusion:
I vote +1 for this patch if the escaping is removed.

Regards,
Radoslaw Smogura

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 1:50 PM, Simon Riggs si...@2ndquadrant.com wrote:
 As implemented, the feature will work with either streaming
 replication or with file-based replication.

 That sounds like the exact opposite of yours and Fujii's comments
 above. Please explain.

I think our comments above were addressing the issue of whether it's
feasible to correct for time skew between the master and the slave.
Tom was arguing that we should try, but I was arguing that any system
we put together is likely to be pretty unreliable (since good time
synchronization algorithms are quite complex, and to my knowledge no
one here is an expert on implementing them, nor do I think we want
that much complexity in the backend) and Fujii was pointing out that
it won't work at all if the WAL files are going through the archive
rather than through streaming replication, which (if I understand you
correctly) will be a more common case than I had assumed.

 I don't see any value in
 restricting to work ONLY with file-based replication.

 As explained above, it won't work in practice because of the amount of
 file space required.

I guess it depends on how busy your system is and how much disk space
you have.  If using streaming replication causes pg_xlog to fill up on
your standby, then you can either (1) put pg_xlog on a larger file
system or (2) configure only restore_command and not primary_conninfo,
so that only the archive is used.

 Or, an alternative question: what will you do when it waits so long
 that the standby runs out of disk space?

I don't really see how that's any different from what happens now.  If
(for whatever reason) the master is generating WAL faster than a
streaming standby can replay it, then the excess WAL is going to pile
up someplace, and you might run out of disk space.   Time-delaying the
standby creates an additional way for that to happen, but I don't
think it's an entirely new problem.

I am not sure exactly how walreceiver handles it if the disk is full.
I assume it craps out and eventually retries, so probably what will
happen is that, after the standby's pg_xlog directory fills up,
walreceiver will sit there and error out until replay advances enough
to remove a WAL file and thus permit some more data to be streamed.
If the standby gets far enough behind the master that the required
files are no longer there, then it will switch to the archive, if
available.  It might be nice to have a mode that only allows streaming
replication when the amount of disk space on the standby is greater
than or equal to some threshold, but that seems like a topic for
another patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch file questions?

2011-06-29 Thread Casey Havenor
I'm obviously new, but I'm making great progress in PostgreSQL with my new
application...

Setup:
 
I'm running on Mac. 
PostgreSQL 9.0.4. 
Virtual machine with application dev in Linux. 

Problem:
 
I, like many others, have come across the inheritance issues. 

I found the thread here about such issue... 
http://postgresql.1045698.n5.nabble.com/FK-s-to-refer-to-rows-in-inheritance-child-td3287684.html
 

I grabbed the fk_inheritance.v1.patch file and have been trying to install
it for the last two hours. 

Where am I supposed to put this file? 
When it asks me for the file to patch, what should I input? 
What else am I missing? 

I tore through Google looking for examples, and everything I found or tried
didn't work. 

Warmest regards, 

Casey Havenor


--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Patch-file-questions-tp4536025p4536025.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch file questions?

2011-06-29 Thread Alvaro Herrera
Excerpts from Casey Havenor's message of mié jun 29 14:24:11 -0400 2011:

 Problem:
  
 I like many other have come across the inherit issues. 
 
 I found the thread here about such issue... 
 http://postgresql.1045698.n5.nabble.com/FK-s-to-refer-to-rows-in-inheritance-child-td3287684.html
  
 
 I grabbed the fk_inheritance.v1.patch file and have been trying to install
 it for the last two hours. 

I suggest you do not run a patched version of Postgres.  The reason this
patch hasn't been included is that it doesn't completely solve the
problem, it might cause others because it hasn't been fully tested, and
to make matters worse, you are on your own if you run a system with it
because we cannot support you if you have weird problems.

Particularly so if you're new to Postgres.  Doubly so if you're new to
patching source code.

My advice would be to use some other design that does not involve FKs on
inheritance hierarchies.  (Yes, partitioning becomes a hard problem).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hint bit cache v6

2011-06-29 Thread Merlin Moncure
On Wed, Jun 29, 2011 at 11:11 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, May 13, 2011 at 3:48 PM, Merlin Moncure mmonc...@gmail.com wrote:
 what's changed:
 *) as advertised, i'm no longer bothering to cache invalid bits.  hint
 bit i/o via rollbacked transactions is not a big deal IMNSHO, so they
 will work as they have always done.
 *) all the tuple visibility routines are now participating in the cache
 *) xmax commit bits are now being cached, not just xmin.  this
 prevents hint bit i/o following deletes.
 *) juggled the cache interaction a bit so the changes to the
 visibility routines could be a lot more surgical. specifically, I
 reverted SetHintBits() to return void and put the cache adjustment
 inside 'SetHintBitsCache()' which sets the bits, dirties, and adjusts
 the cache.  SetHintBitsNonDirty() only sets the bits without dirtying
 the page, and is called when you get a cache hit.
 *) various minor cleanups, decided to keep pageindex as type
 TransactionId since that's what clog does.
 *) made a proper init function, put cache data into
 CacheMemoryContext, and moved the cache data pointer references into
 the page structure

 performance testing done:
 *) select-only pgbench and standard i/o pgbench tests don't show
 performance difference within statistical noise.

 The test I need to do, a cache and clog thrashing test, I haven't
 found a clean way to put together yet.  I'm really skeptical that any
 problems will show up there. That said, I am satisfied the patch does
 what it is supposed to do -- eliminates hint bit i/o for cases
 where it is a big headache, without penalizing other cases.  Anyone
 that would like to comment on the technique or other points of the
 patch please do so (otherwise it's time for the 'fest).

 OK, so I read through this patch.  I agree with Simon's comment on the
 other thread that it's probably not ready for commit right at the
 moment, but also want to offer a few other thoughts.

 The patch fails to conform to our coding style in a variety of ways,
 but I don't want to get bogged down in that at this point.  The first
 question we need to answer here is: What is this patch doing?  And do
 we want that?

Well, thanks for the review!  Considering feedback I'll pull it from
the current 'fest and see if it can be brought up to muster.

 With regards to the first question, it seems to me that the patch is
 doing two separate but related things.

 1. It allocates four 8kB pages in backend-local memory, each of which
 can store one bit for each XID in a 64K range.  The bit is set if the
 XID is known to be committed.  If we find this bit set, then we
 needn't consult CLOG.  This reduces CLOG traffic, at the cost of
 potentially doing more work in the case where the cache misses.  Every
 so often, we reconsider which XID ranges should be represented in our
 cache and consider replacing an existing page in the cache with some
 other one.

 2. When we probe the cache and hit, we set the hint bit on the tuple
 *without dirtying the page*.  Thus, the hint bit will only get written
 back to disk if the page is dirtied for some other reason.  This will
 frequently reduce the amount of write I/O generated by large table
 scans, at the cost of possibly generating additional work for other
 backends who try to read this data later.

Right -- exactly.  If you work through all the cases the main price to
pay is the overhead of the cache itself.
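The page/bit arithmetic behind point #1 is simple to show directly (illustrative only -- the patch does this in C with bitmask operations):

```sql
-- One 8kB page holds one committed-bit per XID for a 64K-XID range:
SELECT xid / 65536       AS page_range,    -- which 64K range, i.e. which cache page
       (xid % 65536) / 8 AS byte_in_page,  -- which of the page's 8192 bytes
       xid % 8           AS bit_in_byte    -- which bit within that byte
FROM (VALUES (123456789)) AS t(xid);
-- page_range = 1883, byte_in_page = 6562, bit_in_byte = 5
```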

 Now, the interesting thing about the patch is that the downsides of #2
 are considerably mitigated by #1.  If we assume for the sake of
 argument that the backend-local cache is really, really fast, and that
 the cache replacement policy is sound, then one is just as good as the
 other, and the only time we need the hint bits is when the cache
 overflows, which just so happens to be exactly when the patch forces them
 to be written out to disk.  That's pretty clever.  The exact way this
 is implemented is a little bit grotty, but I feel like it can work.
 So I'm inclined to think that, at 10,000 feet, if we can convince
 ourselves that the basic idea of sticking a cache in here is OK, then
 the additional refinement of declining to dirty the page when the
 cache, rather than CLOG, tells us that the XID is committed is probably
 OK too.

 With respect to the cache itself, I think the part that concerns me
 most is the cache replacement algorithm.  It's obvious that straight
 LRU can't work here, since that could easily result in horrible
 thrashing behavior.  But it's not clear to me that the actual
 algorithm, which considers evicting a page after every 100 misses, is
 all that great either.  Each page stores 64K transaction IDs, and
 after being in cache for a while, you might have 25% or 50% of those
 bits set, which is quite an asset.  Now you hit a run of 100 tuples
 with some XID not represented in that cache and you chuck that page
 out the window and replace it with a page that has just that one bit
 set.  Now you 

Re: [HACKERS] SSI modularity questions

2011-06-29 Thread Heikki Linnakangas

On 29.06.2011 00:33, Kevin Grittner wrote:

Heikki Linnakangas  wrote:

On 28.06.2011 20:47, Kevin Grittner wrote:



Hmm, the calls in question are the ones in heapgettup() and
heapgettup_pagemode(), which are subroutines of heap_getnext().
heap_getnext() is only used in sequential scans, so it seems safe
to remove those calls.


I haven't found anything to the contrary; if I understand correctly,
Dan found the same, and all the tests pass without them.  Here's a
patch to remove them.  This makes the recently-added
rs_relpredicatelocked boolean field unnecessary, so that's removed in
this patch, too.


Thanks, committed. I also moved the PredicateLockRelation() call to 
heap_beginscan(), per earlier discussion.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Repeated PredicateLockRelation calls during seqscan

2011-06-29 Thread Heikki Linnakangas

On 26.06.2011 23:49, Kevin Grittner wrote:

Kevin Grittner  wrote:

Kevin Grittner wrote:

Heikki Linnakangas wrote:



BTW, isn't bitgetpage() in nodeBitmapHeapscan.c missing
PredicateLockTuple() and CheckForSerializableConflictOut() calls
in the codepath for a lossy bitmap? In the non-lossy case,
heap_hot_search_buffer() takes care of it, but not in the lossy
case.


I think the attached addresses that.


Don't commit that patch, it's not holding up in testing here.

I'll look at it some more.


Version 2 is attached.


Thanks, applied this too.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] SSI modularity questions

2011-06-29 Thread Kevin Grittner
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 On 29.06.2011 00:33, Kevin Grittner wrote:
 Heikki Linnakangas  wrote:
 On 28.06.2011 20:47, Kevin Grittner wrote:

 Hmm, the calls in question are the ones in heapgettup() and
 heapgettup_pagemode(), which are subroutines of heap_getnext().
 heap_getnext() is only used in sequential scans, so it seems
 safe to remove those calls.

 I haven't found anything to the contrary, if I understand
 correctly, Dan found the same, and all the tests pass without
 them.  Here's a patch to remove them.  This makes the
 recently-added rs_relpredicatelocked boolean field unnecessary,
 so that's removed in this patch, too.
 
 Thanks, committed. I also moved the PredicateLockRelation() call
 to heap_beginscan(), per earlier discussion.
 
Thanks!
 
Before we leave the subject of modularity, do you think the entire
else clause dealing with the lossy bitmaps should be a heapam.c
function called from nodeBitmapHeapscan.c?  With the move of the
PredicateLockRelation() call you mention above, that leaves this as
the only place in the executor which references SSI, and it also is
the only place in the executor to call PageGetMaxOffsetNumber() and
OffsetNumberNext(), which seem like AM things.  The logic seems
somewhat similar to heap_hot_search_buffer() and such a function
would take roughly the same parameters.
 
On the other hand, it's obviously not a bug, so maybe that's
something to put on a list to look at later.
 
-Kevin



[HACKERS] Feature request: new column is_paused in pg_stat_replication

2011-06-29 Thread Emanuel Calvo
Hi hackers,

I don't know if it's possible or not, but it could be interesting to add
a column in pg_stat_replication to know if the standby is paused. It
could be useful if
somebody has several standby servers and uses this function for some tasks.

I know that there is an indirect way to calculate this in the primary,
just comparing
the other locations with the replay one.

Any thoughts?


-- 
--
              Emanuel Calvo
              Helpame.com



[HACKERS] Patch file questions?

2011-06-29 Thread Casey Havenor
I'm obviously new.   But making great progress in PostgreSQL with my new
application...

Setup:
 
I'm running on Mac. 
PostgreSQL 9.0.4 
Virtual machine with application dev in Linux. 

Problem:
 
I, like many others, have come across the inheritance issues. 

I found the thread here about such issue... 
http://postgresql.1045698.n5.nabble.com/FK-s-to-refer-to-rows-in-inheritance-child-td3287684.html
 

I grabbed the fk_inheritance.v1.patch file and have been trying to install
it for the last two hours. 

Where am I supposed to put this file? 
When it asks me for the file to patch what should I input? 
What else am I missing? 

Tore through Google looking for some examples, and everything I found or
tried didn't work. 

Warmest regards, 

Casey Havenor


--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Patch-file-questions-tp4536003p4536003.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] Patch file questions?

2011-06-29 Thread Casey Havenor
Partitioning becomes impossible as I'd have to hunt down every single row
from every table within the hierarchy when needed.   I've got an
object-driven system with permissions for users, so I'll easily have
thousands of rows to manage across hundreds of tables. 

For inheritance I'm using it for the following.  ONLY on/with UNIQUE
CONSTRAINTS and FOREIGN KEYS with OIDS enabled - which, from my understanding,
shouldn't be an issue, as there shouldn't be any duplicate entries that
cause a deadlock.   -- So I would think this patch would be ok?  What am I
missing here? 

Is there another way that won't be such a headache - cost tons of man hours
- and still be efficient? 

Either way I'd still like to know how to apply a patch to PostgreSQL just to
satisfy my curiosity? :) 

Thanks for the fast response. 

-
Warmest regards, 

Casey Havenor
--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Patch-file-questions-tp4536031p4536413.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] Feature request: new column is_paused in pg_stat_replication

2011-06-29 Thread Simon Riggs
On Wed, Jun 29, 2011 at 8:49 PM, Emanuel Calvo postgres@gmail.com wrote:

 I don't know if it's possible or not, but it could be interesting to add
 a column in pg_stat_replication to know if the standby is paused. It
 could be useful if
 somebody has several standby servers and uses this function for some tasks.

 I know that there is an indirect way to calculate this in the primary,
 just comparing
 the other locations with the replay one.

Interesting thought, I'll look into it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Patch file questions?

2011-06-29 Thread Alvaro Herrera
Excerpts from Casey Havenor's message of Wed Jun 29 15:53:31 -0400 2011:
 Partitioning becomes impossible as I'd have to hunt down every single row
 from every table within the hierarchy when needed.   I've got an object
 driven system with permissions for users so I'll easily have thousands of
 rows to manage across 100's of tables. 

I suggest you raise this issue on pgsql-sql, -general or maybe
-performance, for design advice.

 Either way I'd still like to know how to apply a patch to PostgreSQL just to
 satisfy my curiosity? :) 

You need the patch utility, and either the -p1 or -p0 switch depending
on how the file was built.  When done, recompile.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] hint bit cache v6

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 3:16 PM, Merlin Moncure mmonc...@gmail.com wrote:
 I think it's a fair point to ask how often thrashing cases will truly
 come up where you don't have some other significant cost like i/o.
 Even when you do thrash, you are just falling back on stock postgres
 behaviors minus the maintenance costs.

Agree.

 With the algorithm as coded, to fully flush the cache you'd have to
 find a series of *unhinted* tuples distributed across minimum of four
 64k wide page ranges, with the number of tuples being over the 5%
 noise floor.  Furthermore, to eject a cache page you have to have more
 hits than a page already in the cache got -- hits are tracked on the
 existing pages and the candidate pages are effectively compared with
 the existing pages based on hits, with the best four picked.  Also, is it
 reasonable to expect the cache to help in these types of cases
 anyways?

*scratches head*

Well, surely you must need to reset the counters for the pages you've
chosen to include in the cache at some point.  If you don't, then
there's a strong likelihood that after some period of time the pages
you have in there will become so firmly nailed into the cache that
they can never be evicted.

To be clear, I don't really think it matters how sensitive the cache
is to a *complete* flush.  The question I want to ask is: how much
does it take to knock ONE page out of cache?  And what are the chances
of that happening too frequently?  It seems to me that if a run of 100
tuples with the same previously-unseen XID is enough to knock over the
applecart, then that's not a real high bar - you could easily hit that
limit on a single page.  And if that isn't enough, then I don't
understand the algorithm.

 If your concern is the cost of replacement, then you can
 micro-optimize (a hand rolled qsort alone should squeeze out quite a
 bit of fat) or experiment with a new algorithm as you suggested.
 However i should note that at least for pgbench fsync=off oltp tests
 (the only way I could think of to exercise cache replacement), it
 (replacement costs) did not show up on the radar.

It might be useful, just for debugging purposes, to add an elog(LOG)
in there that triggers whenever a cache flush occurs.  And then play
with different workloads and try to make it happen as often as you
can.  One idea I had was to hack the XID generator so that it only
returns every (say) 200th XID, instead of consecutive ones.  That
might make it easier to identify the sorts of workloads that are prone
to causing problems, and then we could further analyze those workloads
and see whether they are a problem in real life, and/or what can be
done to minimize the impact.

 OTOH, If your main concern is about losing data out of the cache via
 page swap, increasing the number of cache pages (or use smaller ones)
 might work but this incrementally makes the testing the cache more
 expensive.

Yeah, I am inclined to think that going very far in that direction is
going to be a big pile of fail.  TBH, I'm somewhat surprised that you
managed to get what you already have to work without a performance
regression.  That code path is extremely hot, and injecting more
complexity seems like it ought to be a loser.  For purposes of this
review, I'm taking it as given that you've tested this carefully, but
I'll admit to lingering skepticism.

 I did briefly experiment with trying to bootstrap incoming cache pages
 with data gathered out of the clog -- but it didn't help much and was
 a lot more trouble than worth.

I believe it.

 One possible enhancement would be to try and bias pages with more data
 in them so that it's more difficult to get them out -- or even
 maintain separate pools with ejection priority.  I'm not in a hurry
 and suggestions are welcome.

I think the basic problem is that the cache pages are so large.  It's
hard to make them smaller because that increases the cost of accessing
the cache, as you already noted, but at the same time, making an
eviction decision on a cache that holds 64K entries based on 100 data
points seems like waving around a pretty large-caliber weapon in a
fairly uncontrolled fashion.  Maybe it works OK 90% of the time, but
it's hard to avoid the niggling fear that someone might accidentally
get shot.

Just for the sake of argument, let's suppose we made an array with 64K
elements, each one representing one 64K chunk of the XID space.  Each
element is a 4-byte unsigned integer, so the array takes up 256kB of
memory... probably not good, but I'm just thinking out loud here.
Every time we're asked about an XID, we bump the corresponding
counter.  Periodically (say, every 64K probes) we zip through all the
counters and consider performing a cache replacement.  We look at the
not-already-cached page that has the highest counter value and compare
it to the counter value for the cached pages.  If it is higher and the
difference exceeds some safety margin (to protect against hysteresis),
then we do the replacement.  I think 
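
That thought experiment can be sketched roughly as follows (the sweep interval, the safety margin value, and the choice to reset counters after each sweep are assumptions filling in details the mail leaves open):

```python
XID_CHUNK = 64 * 1024        # XIDs per chunk, one counter per chunk
NPAGES = 64 * 1024           # counters covering the whole 32-bit XID space
SWEEP_EVERY = 64 * 1024      # probes between replacement decisions
CACHE_SIZE = 4
MARGIN = 1000                # hysteresis safety margin (value is a guess)

class CounterDirectory:
    """Model of the proposed per-chunk probe counters."""

    def __init__(self):
        self.counts = [0] * NPAGES   # ~256 kB of uint32s in the real thing
        self.cached = set()          # chunks currently held in the cache
        self.probes = 0

    def probe(self, xid):
        page = (xid // XID_CHUNK) % NPAGES
        self.counts[page] += 1
        self.probes += 1
        if self.probes >= SWEEP_EVERY:
            self._sweep()
        return page in self.cached   # hit -> no CLOG lookup needed

    def _sweep(self):
        self.probes = 0
        # Highest-traffic chunk that is not already cached ...
        best = max((p for p in range(NPAGES) if p not in self.cached),
                   key=lambda p: self.counts[p])
        if len(self.cached) < CACHE_SIZE:
            self.cached.add(best)
        else:
            # ... replaces the coldest cached chunk, but only if it wins
            # by a clear margin, to protect against hysteresis.
            victim = min(self.cached, key=lambda p: self.counts[p])
            if self.counts[best] > self.counts[victim] + MARGIN:
                self.cached.discard(victim)
                self.cached.add(best)
        # Resetting after each sweep makes the counters reflect recent
        # traffic only; the mail leaves this particular detail open.
        self.counts = [0] * NPAGES
```

Compared with the 100-miss scheme, a replacement decision here is always backed by a full sweep's worth of traffic, so a short run of tuples with one unseen XID cannot by itself evict a page.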

[HACKERS] serializable installcheck

2011-06-29 Thread Kevin Grittner
I've discovered (the hard way) that if you run `make installcheck`
with default_transaction_isolation = 'serializable' and you have a
serializable read write transaction sitting idle (or prepared) that
the transactions test will block indefinitely waiting for a safe
time to run, because of the test of the READ ONLY DEFERRABLE code. 
I think that's OK, but figured I should make people aware of the
fact and see if everyone agrees.
 
I will admit that it can be confusing until you find the idle or
prepared transaction.
 
-Kevin



Re: [HACKERS] Review of patch Bugfix for XPATH() if expression returns a scalar value

2011-06-29 Thread Florian Pflug
On Jun29, 2011, at 19:57 , Radosław Smogura wrote:
 This is a review of the patch
 https://commitfest.postgresql.org/action/patch_view?id=565
 Bugfix for XPATH() if expression returns a scalar value
 
 SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM 
 (SELECT 
 (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name 
 root, XMLATTRIBUTES('<n' AS xmlns, 'v' AS value),'t'))) v(x)) as foo;
 
   xmlelement
 -------------------------
  <root sth="&amp;lt;n"/>
 
   It's clearly visible that the value from the attribute is &lt;n, not <n. 
 Every parser will read this as &lt;n, which is not natural and will require 
 the consumer/developer to de-escape it on his side - roughly speaking this 
 will be reported as a serious bug.

There's a problem there, no doubt, but your proposed solution just moves the
problem around.

Here's your query, reformatted to be more legible 

SELECT
  XMLELEMENT(name root,
 XMLATTRIBUTES(foo.namespace AS sth)
  )
FROM (
  SELECT 
(XPATH('namespace-uri(/*)', x))[1] AS namespace
  FROM (VALUES (XMLELEMENT(name root,
XMLATTRIBUTES('<n' AS xmlns,
  'v' AS value),'t')
  )
) v(x)) as foo;

What happens is that the XPATH expression selects the xmlns
attribute with the value '<n'. To be well-formed xml, that value
must be escaped, so what is actually returned by the XPATH
call is '&lt;n'. The XMLATTRIBUTES() function, however, doesn't
distinguish between input of type XML and input of type TEXT,
so it figures it has to represent the *textual* value '&lt;n'
in xml, and produces '&amp;lt;n'.

Not escaping textual values returned by XPATH isn't a solution,
though. For example, assume someone inserts the value returned
by the XPATH() call in your query into a column of type XML.
If XPATH() returned '<n' literally instead of escaping it,
the column would contain non-well-formed XML in a column of type
XML. Not pretty, and pretty fatal during your next dump/reload
cycle.

Or, if you want another example, simply remove the XMLATTRIBUTES
call and use the value returned by XPATH as a child node directly.

SELECT
  XMLELEMENT(name root,
 foo.namespace
  )
FROM (
  SELECT 
(XPATH('namespace-uri(/*)', x))[1] AS namespace
  FROM (VALUES (XMLELEMENT(name root,
XMLATTRIBUTES('<n' AS xmlns,
  'v' AS value),'t')
  )
) v(x)) as foo;

      xmlelement 
 --------------------
  <root>&lt;n</root>

Note that this is correct! The root node contains a text node
representing the textual value '<n'. If XPATH() hadn't returned
the value escaped, the query above would have produced
  
  <root><n</root>

which is obviously wrong. It works in this case because XMLELEMENT
is smart enough to distinguish between child nodes given as TEXT
values (those are escaped) and child nodes provided as XML values
(those are inserted unchanged).

XMLATTRIBUTES, however, doesn't make that distinction, and simply
escapes all attribute values independent of their type. Now, the
reason it does that is probably that *not* all XML values are
well-formed attribute values without additional escaping. For example,
you cannot have literal '<' and '>' in attribute values. So if
XMLATTRIBUTES, like XMLELEMENT never escaped values which are
already of type XML, the following piece of code

  XMLELEMENT(name a, XMLATTRIBUTES('<node/>'::xml as v))

would produce

  <a v="<node/>"/>

which, alas, is not well-formed :-(

The correct solution, I believe, is to teach XMLATTRIBUTES to
only escape those characters which are invalid in attribute
values but valid in XML (probably only '<' and '"', but I'll
check). If there are no objections, I'll prepare a separate patch
for that. I don't want to include it in this patch because it's
really a distinct issue (it even modifies a different function).
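
The escaping chain described above, and the narrower attribute escaping being proposed, can be illustrated with Python's xml.sax.saxutils (the helper name escape_xml_for_attribute and the exact character set it handles are assumptions, pending the "I'll check" above):

```python
from xml.sax.saxutils import escape

# What XPATH('namespace-uri(/*)', ...) hands back when xmlns was '<n':
# the value arrives already escaped for XML well-formedness.
xpath_result = escape("<n")            # -> '&lt;n'

# Current XMLATTRIBUTES behavior: it treats every input as text and
# escapes again, producing the doubly-escaped attribute from the report.
double_escaped = escape(xpath_result)  # -> '&amp;lt;n'

# The narrower escaping sketched above for input already of type XML:
# touch only characters that can be well-formed XML content but are not
# allowed inside a (double-quoted) attribute value.
def escape_xml_for_attribute(xml_value):
    return xml_value.replace("<", "&lt;").replace('"', "&quot;")

attr_safe = escape_xml_for_attribute(xpath_result)      # stays '&lt;n'
fragment_in_attr = escape_xml_for_attribute("<node/>")  # -> '&lt;node/>'
```

With the narrower rule, an already-escaped XPATH result passes through unchanged, while an XML fragment like '<node/>' still becomes legal inside an attribute.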

best regards,
Florian Pflug







Re: [HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-06-29 Thread Florian Pflug
On Jun29, 2011, at 19:34 , Radosław Smogura wrote:
 B. 6. Current behaviour _is intended_ (there is an if to check node type) and 
 _natural_. In this particular case the user asks for the text content of some 
 node, and this content is actually '<'.

I don't buy that. The check for the node type is there because
two different libxml functions are used to convert nodes to
strings. The if has absolutely *zero* to do with escaping, except
for that missing escape_xml() call in the else case.

Secondly, there is little point in having a type XML if we
don't actually ensure that values of that type can only contain
well-formed XML.

 B. 7. Similar behaviour may be observed e. g. in SQL Server(tm)
 SELECT x.value('(//text())[1]', 'varchar(256)') FROM #xml_t;
 Produced '<'.

What's the *type* of that value? I'm not familiar with the XPATH
support in SQL Server, but since you pass 'varchar(256)' as
the second parameter, I assume that this is how you tell it
the return type you want. Since you chose a textual type, not
quoting the value is perfectly fine. I suggest that you try
this again, but this time evaluate the XPATH expression so
that you get a value of type XML (Assuming such a type exists
in SQL Server)

 B. 8. Even if current behaviour may be treated as wrong, it was exposed and 
 others may depend on the unescaped content.

Granted, there's the possibility of breaking existing applications
here. But the same argument could have been applied when we made the
check against invalid characters (e.g. invalid UTF-8 byte sequences)
tighter for textual types.

When it comes to data integrity (and non-well-formed values in
XML columns are a data integrity issue), I believe that the advantages
of tighter checks outweigh potential compatibility problems.

 B. 9. I think the correct approach would be to create a new function (based 
 on an existing one) that will be able to escape the above. In this situation
 the call should look like (for example only!):
 SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>')))[1]
 or
 SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>'))[1])
 One method to operate on an array, one to operate on a single XML datum.
 Or to think about something like xmltext(). (Compare current status of xml.c)

-1. Again, values of type XML should always be well-formed XML.
Requiring every future user of posgres XML support to remember
to surround XPATH() calls with XML_ESCAPE() will undoubtedly lead
to bugs.

I'm all for having escaping and un-escaping functions though,
but these *need* to have the signatures

  XML_ESCAPE(text) RETURNS xml
  XML_UNESCAPE(XML) RETURNS text
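
A minimal model of that signature pair, using Python's xml.sax.saxutils as a stand-in (the function names mirror the proposal; the exact escaping rules PostgreSQL would apply are not specified in the mail):

```python
from xml.sax.saxutils import escape, unescape

def xml_escape(text):
    """Stand-in for the proposed XML_ESCAPE(text) RETURNS xml."""
    return escape(text)

def xml_unescape(xml_value):
    """Stand-in for the proposed XML_UNESCAPE(xml) RETURNS text."""
    return unescape(xml_value)

# The point of pairing the signatures text -> xml and xml -> text is
# that the two functions round-trip: un-escaping an escaped value
# recovers the original text exactly.
sample = 'a < b & c'
assert xml_unescape(xml_escape(sample)) == sample
```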

best regards,
Florian Pflug

PS: Next time, please post your Review as a follow-up of the mail
that contains the patch. That makes sure that people who already
weighted in on the issue don't overlook your review. Or, the
patch author, for that matter - I nearly missed your review among
the large number of mails in my postgres folder.




Re: [HACKERS] default privileges wording

2011-06-29 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed Jun 29 13:42:34 -0400 2011:

  How about this?
 
  Some types of objects deny all privileges to PUBLIC by default.  These
  are tables, columns, schemas and tablespaces.  For other types, the
  default privileges granted to PUBLIC are as follows: CONNECT privilege
  and TEMP table creation privilege for databases; EXECUTE privilege for
  functions; and USAGE privilege for languages.  The object owner can,
  of course, revoke both default and expressly granted privileges.
 
 Or, since I find the use of the word deny a bit unclear:
 
 When a table, column, schema, or tablespace is created, no privileges
 are granted to PUBLIC.  But for other objects, some privileges will be
 granted to PUBLIC automatically at the time the object is created:
 CONNECT privilege and TEMP table creation privilege for database, ...
 etc., the rest as you have it

Hmm, I like David's suggestion better, but I agree with you that deny
isn't the right verb there.  I have no better suggestions at moment
though.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-29 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Wed Jun 29 13:07:25 -0400 2011:
 On Wed, Jun 29, 2011 at 12:51 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of Mon Jun 27 10:35:59 -0400 2011:

  Interesting.  This whole thing requires quite a bit of rejiggering in
  the initial transformation phase, I think, but yeah, I see the points
  here and I will see to them.  Does this mean that NOT NULL PRIMARY KEY
  now behaves differently?  I think it does , because if you drop the PK
  then the field needs to continue being not null.
 
 Yeah, I think an implicit not-null because you made it a primary key
 is now different from one that you write out.

Actually, it wasn't that hard, but I'm not really sure I like the
resulting code:

	/*
	 * We want to inherit NOT NULL constraints, but not primary keys.
	 * Since the attnotnull flag in pg_attribute stores both, we want to
	 * keep only the attnotnull flag from those columns that have it from
	 * NOT NULL constraints.  To do this, we create a copy of the table's
	 * descriptor and scribble on it by resetting all the attnotnull bits
	 * to false, and then setting them true for columns that appear in a
	 * NOT NULL constraint.
	 *
	 * Note: we cannot use CreateTupleDescCopy here, because it'd lose
	 * the atthasdef bits, as well as constraints.
	 */
	tupleDesc = CreateTupleDescCopyConstr(RelationGetDescr(relation));
	constr = tupleDesc->constr;
	parent_nns = GetRelationNotNullConstraints(relation);

	for (parent_attno = 1; parent_attno <= tupleDesc->natts;
		 parent_attno++)
		tupleDesc->attrs[parent_attno - 1]->attnotnull = false;

	foreach (cell, parent_nns)
	{
		NotNullConstraint *constr = lfirst(cell);

		tupleDesc->attrs[constr->attnum - 1]->attnotnull = true;
	}


Here's a simple example (sorry for the Spanish):

alvherre=# create table foo (a int primary key, b int not null);
NOTICE:  CREATE TABLE / PRIMARY KEY creará el índice implícito «foo_pkey» para 
la tabla «foo»
CREATE TABLE
alvherre=# create table bar () inherits (foo);
CREATE TABLE

alvherre=# \d foo
Tabla «public.foo»
 Columna |  Tipo   | Modificadores 
-+-+---
 a   | integer | not null
 b   | integer | not null
Índices:
foo_pkey PRIMARY KEY, btree (a)
Número de tablas hijas: 1 (Use \d+ para listarlas.)

alvherre=# \d bar
Tabla «public.bar»
 Columna |  Tipo   | Modificadores 
-+-+---
 a   | integer | 
 b   | integer | not null
Hereda: foo


alvherre=# create table baz (a int not null primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY creará el índice implícito «baz_pkey» para 
la tabla «baz»
CREATE TABLE
alvherre=# create table qux () inherits (baz);
CREATE TABLE
alvherre=# \d baz
Tabla «public.baz»
 Columna |  Tipo   | Modificadores 
-+-+---
 a   | integer | not null
Índices:
baz_pkey PRIMARY KEY, btree (a)
Número de tablas hijas: 1 (Use \d+ para listarlas.)

alvherre=# \d qux
Tabla «public.qux»
 Columna |  Tipo   | Modificadores 
-+-+---
 a   | integer | not null
Hereda: baz

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] default privileges wording

2011-06-29 Thread David Fetter
On Wed, Jun 29, 2011 at 04:49:15PM -0400, Alvaro Herrera wrote:
 Excerpts from Robert Haas's message of Wed Jun 29 13:42:34 -0400 2011:
 
   How about this?
  
   Some types of objects deny all privileges to PUBLIC by default.
    These are tables, columns, schemas and tablespaces.  For other
   types, the default privileges granted to PUBLIC are as follows:
   CONNECT privilege and TEMP table creation privilege for
   databases; EXECUTE privilege for functions; and USAGE privilege
   for languages.  The object owner can, of course, revoke both
   default and expressly granted privileges.
  
  Or, since I find the use of the word deny a bit unclear:
  
  When a table, column, schema, or tablespace is created, no
  privileges are granted to PUBLIC.  But for other objects, some
  privileges will be granted to PUBLIC automatically at the time the
  object is created: CONNECT privilege and TEMP table creation
  privilege for database, ...  etc., the rest as you have it
 
 Hmm, I like David's suggestion better, but I agree with you that
 deny isn't the right verb there.  I have no better suggestions at
 the moment though.

I chose deny in the sense of default deny, which is a term of art
in security engineering referring to an access control policy.

http://en.wikipedia.org/wiki/Security_engineering#Security_stance

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Patch file questions?

2011-06-29 Thread Casey Havenor
Googled patch utility - not sure which one to get, as none I found use the
.patch extension. 

So is this the workflow...
- Get patch
- Get patch utility - Not sure which one - Any recommendations for Win7,
Mac, Linux? 
- Apply patch to installer for PostgreSQL or after PostgreSQL is installed? 
- Recompile?
- Reinstall? 
- Re-import existing database structure?  --- If I apply this patch - I've
already got a schema built for my database - does that mean I'll lose it or
have to import it back in?

Never done this have lots of questions... 

Thanks again. 

-
Warmest regards, 

Casey Havenor
--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Patch-file-questions-tp4536031p4536722.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] Patch file questions?

2011-06-29 Thread Andrew Dunstan



On 06/29/2011 05:18 PM, Casey Havenor wrote:

Googled patch utility - not sure of which one to get as none I found use the
.patch extensions?

So is this the workflow...
- Get patch
- Get patch utility - Not sure which one -


On Linux run man patch. If it's not installed, then install your 
distro's patch package. For example, on Fedora you'd run something like 
yum install patch.


If you've never done any of this stuff you probably have quite a steep 
learning curve ahead of you.


cheers

andrew



Re: [HACKERS] default privileges wording

2011-06-29 Thread Andrew Dunstan



On 06/29/2011 05:16 PM, David Fetter wrote:


Hmm, I like David's suggestion better, but I agree with you that
deny isn't the right verb there.  I have no better suggestions at
the moment though.

I chose deny in the sense of default deny, which is a term of art
in security engineering referring to an access control policy.

http://en.wikipedia.org/wiki/Security_engineering#Security_stance




If two of our own most deeply invested hackers find it unclear, many 
others will too, term of art or not.


cheers

andrew








Re: [HACKERS] Range Types, constructors, and the type system

2011-06-29 Thread Peter Eisentraut
On ons, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote:
 On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote:
 
  Because there might be more than one range type for a
  base type. Say there are two range types over text, one
  with collation 'de_DE' and one with collation 'en_US'.
  What would the type of
   range('foo', 'f')
  be?
 
 The one that corresponds to the current LC_COLLATE setting.

Yes, or more generally, we have logic that determines, for example, what
collation to use for

'foo'  'f'

The same logic can be used to determine what collation to use for

range('foo', 'f')

(In fact, if you implement range() as a user-space function, that will
happen automatically.)





Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 4:59 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mié jun 29 13:07:25 -0400 2011:
 On Wed, Jun 29, 2011 at 12:51 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of lun jun 27 10:35:59 -0400 2011:

  Interesting.  This whole thing requires quite a bit of rejiggering in
  the initial transformation phase, I think, but yeah, I see the points
  here and I will see to them.  Does this mean that NOT NULL PRIMARY KEY
  now behaves differently?  I think it does, because if you drop the PK
  then the field needs to continue being not null.

 Yeah, I think an implicit not-null because you made it a primary key
 is now different from one that you write out.

 Actually, it wasn't that hard, but I'm not really sure I like the
 resulting code:

What don't you like about it?

My concern is that I'm not sure it's correct...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-29 Thread Alvaro Herrera
Excerpts from Robert Haas's message of mié jun 29 18:16:20 -0400 2011:
 On Wed, Jun 29, 2011 at 4:59 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of mié jun 29 13:07:25 -0400 2011:
  On Wed, Jun 29, 2011 at 12:51 PM, Alvaro Herrera
  alvhe...@commandprompt.com wrote:
   Excerpts from Robert Haas's message of lun jun 27 10:35:59 -0400 2011:
 
   Interesting.  This whole thing requires quite a bit of rejiggering in
   the initial transformation phase, I think, but yeah, I see the points
   here and I will see to them.  Does this mean that NOT NULL PRIMARY KEY
   now behaves differently?  I think it does, because if you drop the PK
   then the field needs to continue being not null.
 
  Yeah, I think an implicit not-null because you made it a primary key
  is now different from one that you write out.
 
  Actually, it wasn't that hard, but I'm not really sure I like the
  resulting code:
 
 What don't you like about it?

Scribbling on attnotnull like that seems ... kludgy (we have to walk the
attr list three times: first to copy, second to reset all the attnotnull
flags, third to set those of interest).  The fact that we need a copy to
scribble on, seems wrong as well (we weren't creating a copy before).
The existing mechanisms to copy tupledescs aren't very flexible, but
improving that seems overengineering to me.

 My concern is that I'm not sure it's correct...

Ah, well, I don't see any reason not to trust it currently.  I am afraid
it could easily break in the future though.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] Adding Japanese README

2011-06-29 Thread Tatsuo Ishii
I would like to propose to add Japanese translated version of README
in the source tree(for example against backend/storage/buffer/README).
Benefits to our community would include:

- Increase the possibility of bringing in more Japanese-speaking
  developers by helping them understand the source code through READMEs
  written in their native language

- Make Japanese-speaking developers' lives easier. Most of them can read
  the English READMEs, of course, but reading a Japanese one will
  increase their productivity and thus their contribution to PostgreSQL

- Including the Japanese READMEs in the source tree, rather than placing
  them somewhere else, helps keep the English and Japanese versions in sync.

Please note that I am not against adding READMEs translated into other
languages. I just don't understand any language other than Japanese.

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] [COMMITTERS] pgsql: Branch refs/heads/REL9_1_STABLE was removed

2011-06-29 Thread Michael Paquier
On Thu, Jun 30, 2011 at 12:03 AM, Alvaro Herrera alvhe...@commandprompt.com
 wrote:

 Excerpts from Magnus Hagander's message of mié jun 29 10:30:51 -0400 2011:
  On Tue, Jun 28, 2011 at 19:45, Alvaro Herrera
  alvhe...@commandprompt.com wrote:
   Excerpts from Tom Lane's message of mar jun 28 10:39:22 -0400 2011:
  
   I think it would be sensible to block branch removal, as there's
   basically never a scenario where we'd do that during current usage.
   I'm not excited about blocking branch addition, although I worry
   sooner or later somebody will accidentally push a private development
   branch :-(.  Is it possible to block only removal and not addition?
  
   If we can tweak the thing, how about we only allow creating branches
   that match a certain pattern, say ^REL_\d+_\d+_STABLE$?
 
  I've put this in place - except I used ^REL\d+... and not what you
  suggested, since that's how we name our branches :P

Thanks, +10.
This is cool and will surely avoid future problems.
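As a sketch of the policy (hypothetical - the actual hook on the master
repository may be implemented differently), a server-side git hook could
simply pattern-match the branch name, in the spirit of Magnus's ^REL\d+
variant:

```shell
# Hypothetical sketch of the branch-name check a server-side git hook
# could apply: only master and release-style branches may be created.
allow_branch() {
    case $1 in
        master) return 0 ;;   # the main development branch is always allowed
        *) printf '%s\n' "$1" | grep -Eq '^REL[0-9]+_[0-9]+_STABLE$' ;;
    esac
}
allow_branch REL9_1_STABLE && echo allowed        # matches the pattern
allow_branch private-dev-branch || echo rejected  # would be blocked
```

An update hook receives the ref name as its first argument, so the real
check would run against ${refname#refs/heads/} and exit nonzero to reject
the push.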

-- 
Michael Paquier
http://michael.otacoo.com


[HACKERS] SECURITY LABEL on shared database object

2011-06-29 Thread Joe Conway
I signed up to do a review on $subject patch for the commitfest. In
order to do that, I want to get SELinux and contrib/sepgsql properly set
up so that I can test. I ran into a problem when trying to do:

cd contrib/sepgsql
make install   (succeeds)
make installcheck  (fails)

I get this:

== creating database contrib_regression ==
ERROR:  could not determine which collation to use for string
comparison
HINT:  Use the COLLATE clause to set the collation explicitly.
command failed: /usr/local/pgsql-head/bin/psql -X -c CREATE
DATABASE \contrib_regression\ TEMPLATE=template0 postgres
make: *** [installcheck] Error 2

So I installed sepgsql into the postgres database anyway and do this:

postgres=# SELECT sepgsql_restorecon(NULL);
ERROR:  could not determine which collation to use for string
comparison
HINT:  Use the COLLATE clause to set the collation explicitly.

Ok, so now I go look at the docs to figure out what exactly a COLLATE
clause is. Only searching the online docs brings up no hits on the
keyword COLLATE. Google brings me to TODO wiki page:

http://wiki.postgresql.org/wiki/Todo:Collate

But that isn't much help either. Grepping the source gets hits in 9.1
and master. So I guess:

1) COLLATE clause is a new feature in 9.1?
2) The doc search feature on postgresql.org does not search the 9.1
   documentation?

I looked in the 9.1 docs under SQL Commands / SELECT and could find no
reference to COLLATE. Can anyone point me to some documentation that
would explain what that error message means and how to resolve it?

Thanks,

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support





[HACKERS] avoid including rel.h in execnodes.h

2011-06-29 Thread Alvaro Herrera
This simple patch moves two struct declarations (Trigger and
TriggerDesc) from rel.h into a new file, reltrigger.h.  The benefit is
that execnodes.h only needs to include the latter.  Since execnodes.h is
very widely included, this change means there are fewer files that
indirectly include rel.h now, which is a good thing because rel.h
includes a ton of other files.  (Of course, rel.h itself needs to
include the new header).

I also included rel.h in spi.h, because it was previously indirectly
included via execnodes.h and with this patch it would no longer be,
which is a problem because it'd cause external code to fail to compile.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org


0001-Move-Trigger-and-TriggerDesc-structs-out-of-rel.h-in.patch
Description: Binary data



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Kevin Grittner
Tatsuo Ishii is...@postgresql.org wrote:
 
 I would like to propose to add Japanese translated version of
 README in the source tree
 
 Comments?
 
The obvious concern would be drift.  As README files are patched,
someone would need to stay on top of the translation process.  Any
ideas on how that could be reasonably managed?
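One mechanical way to surface drift - purely illustrative, using file
timestamps on invented file names (a real check would compare git commit
dates, but the idea is the same):

```shell
# Illustrative sketch only: flag an English README that has been modified
# more recently than its (hypothetical) Japanese counterpart README.ja.
dir=$(mktemp -d)
touch -t 202001010000 "$dir/README.ja"   # translation last touched earlier
touch -t 202006010000 "$dir/README"      # English file changed later
stale=$(find "$dir" -name README -newer "$dir/README.ja")
[ -n "$stale" ] && echo "translation may be out of date: $stale"
```

Run over the whole tree, such a script could produce the "needs
retranslation" list before each release.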
 
-Kevin



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 I would like to propose to add Japanese translated version of README
 in the source tree(for example against backend/storage/buffer/README).

I think this is basically a bad idea, because 90% of the developers will
be unable to update such a file when changing the source code, or even
verify whether it's an accurate translation.  Thus it will inevitably
become out-of-date, and that seems worse than useless.  I'm fairly
dubious that someone who doesn't read English will be able to make much
sense of the source code anyway.

 Please note that I am not against adding README translated in other
 language. I just don't understand other than Japanese.

That's another problem: if we add Japanese, why not six or seven other
languages?

regards, tom lane



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Tatsuo Ishii
 I would like to propose to add Japanese translated version of
 README in the source tree
  
 Comments?
  
 The obvious concern would be drift.  As README files are patched,
 someone would need to stay on top of the translation process.  Any
 ideas on how that could be reasonably managed?

My idea is defining a maintainer for each README. Of course, I am ready
to take that role for the Japanese one.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mié jun 29 19:24:07 -0400 2011:
 Tatsuo Ishii is...@postgresql.org writes:
  I would like to propose to add Japanese translated version of README
  in the source tree(for example against backend/storage/buffer/README).
 
 I think this is basically a bad idea, because 90% of the developers will
 be unable to update such a file when changing the source code, or even
 verify whether it's an accurate translation.  Thus it will inevitably
 become out-of-date, and that seems worse than useless.  I'm fairly
 dubious that someone who doesn't read English will be able to make much
 sense of the source code anyway.

What we could do instead is add a link to a translation in the wiki,
with a note that the wiki page may not be up to date.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Kevin Grittner
Tatsuo Ishii is...@postgresql.org wrote:
 
 The obvious concern would be drift.  As README files are
 patched, someone would need to stay on top of the translation
 process. Any ideas on how that could be reasonably managed?
 
 My idea is defining maintainer for each README. Of course I am
 ready for Japanese one.
 
That would only cover part of the problem, and not on a permanent
basis.
 
How do you notice that a README has changed?
 
How does the community know when the changes have been completely
incorporated into the translation?
 
What do we do if we have README translations which haven't been
updated when it's time to tag a release?  Delete them?  Leave old,
misleading content?
 
What if you move on to something else and are no longer active in
the project?
 
I think this needs a well-defined and sustainable *process*, not
just a set of volunteers.  I'm skeptical that a workable process can
be devised, but I'm willing to be proven wrong.
 
Now, if someone wanted to set up a web site or Wiki page with
translations, that would be up to them.  I doubt anyone would object
to a link from the Developer FAQ to such a page.  It seems to me
that this would provide most of the same benefit, without much in
the way of a down side.  It might be wise to go so far as to put a
last modified date in each README (which would be copied without
change into each translation which brought things up to date), so
that it would be easy for someone looking at translation to
determine whether it was current.
 
-Kevin



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Tatsuo Ishii
 I think this is basically a bad idea, because 90% of the developers will
 be unable to update such a file when changing the source code, or even
 verify whether it's an accurate translation.  Thus it will inevitably
 become out-of-date, and that seems worse than useless.  I'm fairly
 dubious that someone who doesn't read English will be able to make much
 sense of the source code anyway.

This is the same argument Kevin raised. My idea is defining a
maintainer for each non-English README who is responsible for updating
it in a timely manner. If the maintainer is too lazy, we would be able
to remove the file before releasing a new version. In other words, the
released version of PostgreSQL will always have accurate READMEs, which
is itself valuable for those who are in charge of maintaining older
versions.

 Please note that I am not against adding README translated in other
 language. I just don't understand other than Japanese.
 
 That's another problem: if we add Japanese, why not six or seven other
 languages?

That's basically the same problem as message translation. If we have
to wait until speakers of six or seven languages appear before adding a
README written in a particular language, why didn't we do it that way
for message translations? It seems your argument is not fair here.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] SECURITY LABEL on shared database object

2011-06-29 Thread Joe Conway
On 06/29/2011 04:18 PM, Joe Conway wrote:
 1) COLLATE clause is a new feature in 9.1?
 2) The doc search feature on postgresql.org does not search the 9.1
documentation?
 
 I looked in the 9.1 docs in SQL Commands-SELECT and could find no
 reference to COLLATE. Can anyone point me to some documentation that
 would explain what that error message means and how to resolve it?

A little more information. It seems that sepgsql_restorecon calls
GetSecurityLabel in the backend. Here's the backtrace:

8-
#0  errfinish (dummy=0) at elog.c:374
#1  0x007b7c70 in varstr_cmp (arg1=0x7faeb724efa9 selinux,
len1=7, arg2=0x16a1d8c selinux~, len2=7, collid=0) at varlena.c:1312
#2  0x007b7f01 in text_cmp (arg1=0x7faeb724efa8, arg2=0x16a1d88,
collid=0) at varlena.c:1468
#3  0x007b8424 in bttextcmp (fcinfo=0x708e2b30) at
varlena.c:1610
#4  0x00819c93 in FunctionCall2Coll (flinfo=0x708e30a0,
collation=0, arg1=140388373688232, arg2=23731592) at fmgr.c:1319
#5  0x0049a905 in _bt_compare (rel=0x7faeb6f71540, keysz=3,
scankey=0x708e3090, page=0x7faeb724cfc0 , offnum=1) at nbtsearch.c:406
#6  0x0049a2c6 in _bt_binsrch (rel=0x7faeb6f71540, buf=74,
keysz=3, scankey=0x708e3000, nextkey=0 '\000') at nbtsearch.c:285
#7  0x0049b17b in _bt_first (scan=0x169ce28,
dir=ForwardScanDirection) at nbtsearch.c:877
#8  0x00498971 in btgettuple (fcinfo=0x708e3af0) at nbtree.c:315
#9  0x00819c93 in FunctionCall2Coll (flinfo=0x168e0d8,
collation=0, arg1=23711272, arg2=1) at fmgr.c:1319
#10 0x0048eeed in index_getnext (scan=0x169ce28,
direction=ForwardScanDirection) at indexam.c:487
#11 0x0048da81 in systable_getnext (sysscan=0x16a1020) at
genam.c:315
#12 0x007fa322 in SearchCatCache (cache=0x1613700, v1=1,
v2=1262, v3=23731592, v4=0) at catcache.c:1201
#13 0x008091c0 in SearchSysCache (cacheId=44, key1=1, key2=1262,
key3=23731592, key4=0) at syscache.c:859
#14 0x005a7016 in GetSecurityLabel (object=0x708e43d0,
provider=0x7faeb9713830 selinux) at seclabel.c:157
8-

The third key passed to SearchSysCache is CStringGetTextDatum(provider).
Ultimately FunctionCall2Coll gets called with collation == 0 and
varstr_cmp fails due to the ambiguity.

Is there something new that should be used in place of
CStringGetTextDatum that would convey a collation here?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support





Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Tatsuo Ishii
 My idea is defining maintainer for each README. Of course I am
 ready for Japanese one.
  
 That would only cover part of the problem, and not on a permanent
 basis.
  
 How do you notice that a README has changed?

Commit messages, obviously.

 How does the community know when the changes have been completely
 incorporated into the translation?

Why don't you worry about message translations then? Translation is a
human process, and there's no way to guarantee the translation is
perfect.

 What do we do if we have README translations which haven't been
 updated when it's time to tag a release?  Delete them?  Leave old,
 misleading content?

I would say delete them. The reason for this is:

 If the maintainer is too lazy, we would be
 able to remove the file before releasing a new version. In other words,
 the released version of PostgreSQL will always have accurate READMEs,
 which is itself valuable for those who are in charge of maintaining older
 versions.

 What if you move on to something else and are no longer active in
 the project?

 I think this needs a well-defined and sustainable *process*, not
 just a set of volunteers.  I'm skeptical that a workable process can
 be devised, but I'm willing to be proven wrong.

Kevin, this is an open source project. Nobody can guarantee that
he/she can continue to contribute forever. If we worry that much about
this, we will never be able to accept any contribution at all.

BTW, I will talk to some Japanese-speaking developers about my idea, if
the community agrees to add Japanese READMEs to the source tree, so
that I am not the only one contributing to this effort. (I can commit
that my coworkers will help the project; however, I prefer to gather
contributors from outside my company in addition to inside it, since
that will bring more sustainable activity.)

 Now, if someone wanted to set up a web site or Wiki page with
 translations, that would be up to them.  I doubt anyone would object
 to a link from the Developer FAQ to such a page.  It seems to me
 that this would provide most of the same benefit, without much in
 the way of a down side.  It might be wise to go so far as to put a
 last modified date in each README (which would be copied without
 change into each translation which brought things up to date), so
 that it would be easy for someone looking at translation to
 determine whether it was current.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] default privileges wording

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 4:49 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of mié jun 29 13:42:34 -0400 2011:

  How about this?
 
  Some types of objects deny all privileges to PUBLIC by default.  These
  are tables, columns, schemas and tablespaces.  For other types, the
  default privileges granted to PUBLIC are as follows: CONNECT privilege
  and TEMP table creation privilege for databases; EXECUTE privilege for
  functions; and USAGE privilege for languages.  The object owner can,
  of course, revoke both default and expressly granted privileges.

 Or, since I find the use of the word deny a bit unclear:

 When a table, column, schema, or tablespace is created, no privileges
 are granted to PUBLIC.  But for other objects, some privileges will be
 granted to PUBLIC automatically at the time the object is created:
 CONNECT privilege and TEMP table creation privilege for database, ...
 etc., the rest as you have it

 Hmm, I like David's suggestion better, but I agree with you that deny
 isn't the right verb there.  I have no better suggestions at the moment
 though.

Well, I think the only relevant verb is grant, so that's why I was
trying to phrase it in terms of the negative of that - i.e. explain
that, in this case, we don't grant anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SECURITY LABEL on shared database object

2011-06-29 Thread Joe Conway
On 06/29/2011 05:34 PM, Joe Conway wrote:
 The third key passed to SearchSysCache is CStringGetTextDatum(provider).
 Ultimately FunctionCall2Coll gets called with collation == 0 and
 varstr_cmp fails due to the ambiguity.
 
 Is there something new that should be used in place of
 CStringGetTextDatum that would convey a collation here?

(Sorry about replying to myself again...)

In fmgr.h we have:

8-
/* These macros allow the collation argument to be omitted (with a
default of
 * InvalidOid, ie, no collation).  They exist mostly for backwards
 * compatibility of source code.
 */
#define DirectFunctionCall1(func, arg1) \
DirectFunctionCall1Coll(func, InvalidOid, arg1)
#define DirectFunctionCall2(func, arg1, arg2) \
DirectFunctionCall2Coll(func, InvalidOid, arg1, arg2)
#define DirectFunctionCall3(func, arg1, arg2, arg3) \
DirectFunctionCall3Coll(func, InvalidOid, arg1, arg2, arg3)
[...]
8--

Perhaps instead of InvalidOid here we should be using DEFAULT_COLLATION_OID?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting,  24x7 Support





Re: [HACKERS] default privileges wording

2011-06-29 Thread David Fetter
On Wed, Jun 29, 2011 at 08:42:58PM -0400, Robert Haas wrote:
 On Wed, Jun 29, 2011 at 4:49 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of mié jun 29 13:42:34 -0400 2011:
 
   How about this?
  
   Some types of objects deny all privileges to PUBLIC by default.  These
   are tables, columns, schemas and tablespaces.  For other types, the
   default privileges granted to PUBLIC are as follows: CONNECT privilege
   and TEMP table creation privilege for databases; EXECUTE privilege for
   functions; and USAGE privilege for languages.  The object owner can,
   of course, revoke both default and expressly granted privileges.
 
  Or, since I find the use of the word deny a bit unclear:
 
  When a table, column, schema, or tablespace is created, no privileges
  are granted to PUBLIC.  But for other objects, some privileges will be
  granted to PUBLIC automatically at the time the object is created:
  CONNECT privilege and TEMP table creation privilege for database, ...
  etc., the rest as you have it
 
  Hmm, I like David's suggestion better, but I agree with you that deny
   isn't the right verb there.  I have no better suggestions at the moment
   though.
 
 Well, I think the only relevant verb is grant, so that's why I was
 trying to phrase it in terms of the negative of that - i.e. explain
 that, in this case, we don't grant anything.

How about this?

PostgreSQL grants some types of objects some default privileges to
PUBLIC.  Tables, columns, schemas and tablespaces grant no privileges
to PUBLIC by default.  For other types, the default privileges granted
to PUBLIC are as follows: CONNECT and CREATE TEMP TABLE for databases;
EXECUTE privilege for functions; and USAGE privilege for languages.
The object owner can, of course, REVOKE both default and expressly
granted privileges.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] default privileges wording

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 8:53 PM, David Fetter da...@fetter.org wrote:
 How about this?

 PostgreSQL grants some types of objects some default privileges to
 PUBLIC.  Tables, columns, schemas and tablespaces grant no privileges
 to PUBLIC by default.  For other types, the default privileges granted
 to PUBLIC are as follows: CONNECT and CREATE TEMP TABLE for databases;
 EXECUTE privilege for functions; and USAGE privilege for languages.
 The object owner can, of course, REVOKE both default and expressly
 granted privileges.

That looks pretty good to me.  I'd probably say grants default
privileges on some types of objects rather than grants some types of
objects default privileges, but YMMV.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Itagaki Takahiro
On Thu, Jun 30, 2011 at 09:42, Tatsuo Ishii is...@postgresql.org wrote:
 BTW I will talk to some Japanese speaking developers about my idea if
 community agree to add Japanese README to the source tree so that I am
 not the only one who are contributing this project

 Now, if someone wanted to set up a web site or Wiki page with
 translations, that would be up to them.

IMHO, the Wiki approach seems more reasonable than a README file.
It would be suitable for adding non-Japanese translations, and
non-core developers could join in translating or fixing the docs.

If we want to promote those README files as documentation, we could
move them into the internals section of the SGML doc tree. If so,
the translated READMEs would become part of the doc translation project.

-- 
Itagaki Takahiro



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Tatsuo Ishii
 IMHO, the Wiki approach seems more reasonable than a README file.
 It would be suitable for adding non-Japanese translations, and
 non-core developers could join in translating or fixing the docs.

I doubt anyone other than developers can translate those README files,
since the words used in the files were chosen to be understandable to
those who are familiar with PostgreSQL internals.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Michael Paquier
On Thu, Jun 30, 2011 at 10:43 AM, Tatsuo Ishii is...@postgresql.org wrote:

  IMHO, the Wiki approach seems more reasonable than a README file.
  It would be suitable for adding non-Japanese translations, and
  non-core developers could join in translating or fixing the docs.

 I doubt anyone other than developers can translate those README files,
 since the words used in the files were chosen to be understandable to
 those who are familiar with PostgreSQL internals.

You may be surprised by the quality of translations from non-developers
found in a lot of open-source software.

-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Josh Berkus
Robert,

 I don't really see how that's any different from what happens now.  If
 (for whatever reason) the master is generating WAL faster than a
 streaming standby can replay it, then the excess WAL is going to pile
 up someplace, and you might run out of disk space.   Time-delaying the
 standby creates an additional way for that to happen, but I don't
 think it's an entirely new problem.

Not remotely new.  "xlog partition full" is currently 75% of the emergency
support calls PGX gets from clients on 9.0 (if only they'd pay attention
to their Nagios alerts!)

 I am not sure exactly how walreceiver handles it if the disk is full.
 I assume it craps out and eventually retries, so probably what will
 happen is that, after the standby's pg_xlog directory fills up,
 walreceiver will sit there and error out until replay advances enough
 to remove a WAL file and thus permit some more data to be streamed.

Nope, it gets stuck and stops there.  Replay doesn't advance unless you
can somehow clear out some space manually; if the disk is full, the disk
is full, and PostgreSQL doesn't remove WAL files without being able to
write files first.

Manual (or scripted) intervention is always necessary if you reach disk
100% full.
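A minimal sketch of the kind of monitoring check that catches this before it becomes an emergency (illustrative only; the path, threshold, and variable names are assumptions, not from this thread):

```shell
# Warn before the WAL partition hits 100%, since once it is full the
# standby cannot free space on its own.  PGXLOG is an assumed path;
# point it at your pg_xlog directory.
PGXLOG=${PGXLOG:-.}
usage=$(df -P "$PGXLOG" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
if [ "$usage" -ge 90 ]; then
    echo "WARNING: pg_xlog partition at ${usage}%; intervene before it is full"
fi
```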

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] default privileges wording

2011-06-29 Thread Andrew Dunstan



On 06/29/2011 09:20 PM, Robert Haas wrote:

On Wed, Jun 29, 2011 at 8:53 PM, David Fetter da...@fetter.org wrote:

How about this?

PostgreSQL grants some types of objects some default privileges to
PUBLIC.  Tables, columns, schemas and tablespaces grant no privileges
to PUBLIC by default.  For other types, the default privileges granted
to PUBLIC are as follows: CONNECT and CREATE TEMP TABLE for databases;
EXECUTE privilege for functions; and USAGE privilege for languages.
The object owner can, of course, REVOKE both default and expressly
granted privileges.

That looks pretty good to me.  I'd probably say "grants default
privileges on some types of objects" rather than "grants some types of
objects default privileges", but YMMV.


Yeah, that sounds good. The second sentence reads oddly to me - it's not 
the objects that are doing (or not doing) the granting; rather they are 
the subjects of the (lack of) granted privileges. Maybe we should say:


No privileges are granted to PUBLIC by default on tables, columns, 
schemas or tablespaces.


cheers

andrew



Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Josh Berkus
On 6/29/11 11:11 AM, Robert Haas wrote:
 If the standby gets far enough behind the master that the required
 files are no longer there, then it will switch to the archive, if
 available.

One more thing:

As I understand it (and my testing shows this), the standby *prefers*
the archive logs, and won't switch to streaming until it reaches the end
of the archive logs.  This is desirable behavior, as it minimizes the
load on the master.
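For reference, a hypothetical recovery.conf showing the setup described above, where the standby has both a restore_command (archive) and primary_conninfo (streaming); the paths and connection details are invented examples:

```
standby_mode = 'on'
# Tried first: replay everything available in the archive...
restore_command = 'cp /mnt/server/archivedir/%f %p'
# ...then connect to the master and stream the rest.
primary_conninfo = 'host=master.example.com port=5432 user=replication'
```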

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Robert Haas
On Wed, Jun 29, 2011 at 9:54 PM, Josh Berkus j...@agliodbs.com wrote:
 I am not sure exactly how walreceiver handles it if the disk is full.
 I assume it craps out and eventually retries, so probably what will
 happen is that, after the standby's pg_xlog directory fills up,
 walreceiver will sit there and error out until replay advances enough
 to remove a WAL file and thus permit some more data to be streamed.

 Nope, it gets stuck and stops there.  Replay doesn't advance unless you
 can somehow clear out some space manually; if the disk is full, the disk
 is full, and PostgreSQL doesn't remove WAL files without being able to
 write files first.

 Manual (or scripted) intervention is always necessary if you reach disk
 100% full.

Wow, that's a pretty crappy failure mode... but I don't think we need
to fix it just on account of this patch.  It would be nice to fix, of
course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Dry Run mode for pg_archivecleanup

2011-06-29 Thread Fujii Masao
On Tue, Jun 28, 2011 at 6:33 AM, Gabriele Bartolini
gabriele.bartol...@2ndquadrant.it wrote:
   I have added the '-n' option to pg_archivecleanup, which performs a dry run
 and outputs the names of the files to be removed to stdout (making it
 possible to pass the list via a pipe to another process). Please find
 attached the small patch.

Please add the patch into the next CommitFest.
https://commitfest.postgresql.org/action/commitfest_view?id=11
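To illustrate the piping idea, a hypothetical invocation of the proposed switch (the archive path and segment name are made-up examples, and this has not been run against the actual patch):

```
# Dry run: print the WAL segments that would be removed, then hand the
# list to another process (here, compressing instead of deleting).
pg_archivecleanup -n /mnt/server/archivedir 000000010000000000000040 \
    | xargs gzip
```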

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Fujii Masao
On Wed, Jun 29, 2011 at 11:14 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote:
 After we run pg_ctl promote, shouldn't time-delayed replication be disabled?
 Otherwise, failover might take a very long time when we set
 recovery_time_delay to a high value.

 PFA a patch that I believe will disable recovery_time_delay after
 promotion.  The only change from the previous version is:

 diff --git a/src/backend/access/transam/xlog.c 
 b/src/backend/access/transam/xlog
 index 1dbf792..41b3ae9 100644
 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -5869,7 +5869,7 @@ pg_is_xlog_replay_paused(PG_FUNCTION_ARGS)
  static void
  recoveryDelay(void)
  {
 -       while (1)
 +       while (!CheckForStandbyTrigger())
        {
                long    secs;
                int             microsecs;

Thanks for the updated patch! I have a few comments:

ISTM recoveryDelayUntilTime needs to be calculated also when replaying the
*compact* commit WAL record (i.e., record_info == XLOG_XACT_COMMIT_COMPACT).

When the user uses only two-phase commit on the master, ISTM he or she
cannot use this feature, because recoveryDelayUntilTime is never set in
that case. Is this intentional?

Should we also disable this feature after recovery reaches the stop point
(specified by recovery_target_xxx)?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Fujii Masao
On Thu, Jun 30, 2011 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote:
 Nope, it gets stuck and stops there.  Replay doesn't advance unless you
 can somehow clear out some space manually; if the disk is full, the disk
 is full, and PostgreSQL doesn't remove WAL files without being able to
 write files first.

 Manual (or scripted) intervention is always necessary if you reach disk
 100% full.

 Wow, that's a pretty crappy failure mode... but I don't think we need
 to fix it just on account of this patch.  It would be nice to fix, of
 course.

Yeah, we need to fix that as a separate patch. The difficult point is that
we cannot delete WAL files until we replay the checkpoint record and a
restartpoint occurs. But if the disk is full, there is no space to receive
the checkpoint record, so we can never delete the WAL files.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] time-delayed standbys

2011-06-29 Thread Fujii Masao
On Thu, Jun 30, 2011 at 12:14 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Should we also disable this feature after recovery reaches the stop
 point (specified by recovery_target_xxx)?

Another comment: it would be very helpful to document the behavior of a
delayed standby when it is promoted or after it reaches the stop point.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Online base backup from the hot-standby

2011-06-29 Thread Fujii Masao
2011/6/28 Jun Ishiduka ishizuka@po.ntts.co.jp:

 Considering everything that has been discussed on this thread so far.

 Do you still think your patch is the best way to accomplish base backups
 from standby servers?
 If not what changes do you think should be made?

 I have reconsidered the approach so that it does not use pg_stop_backup().

 Process of online base backup on standby server:
  1. pg_start_backup('x');
  2. copy the data directory
  3. copy *pg_control*

Who deletes the backup_label file created by pg_start_backup()?
Isn't pg_stop_backup() required to do that?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Online base backup from the hot-standby

2011-06-29 Thread Magnus Hagander
On Jun 30, 2011 5:59 AM, Fujii Masao masao.fu...@gmail.com wrote:

 2011/6/28 Jun Ishiduka ishizuka@po.ntts.co.jp:
 
  Considering everything that has been discussed on this thread so far.
 
  Do you still think your patch is the best way to accomplish base
backups
  from standby servers?
  If not what changes do you think should be made?
 
  I have reconsidered the approach so that it does not use pg_stop_backup().
 
  Process of online base backup on standby server:
   1. pg_start_backup('x');
   2. copy the data directory
   3. copy *pg_control*

 Who deletes the backup_label file created by pg_start_backup()?
 Isn't pg_stop_backup() required to do that?

You need it to take the system out of backup mode as well, don't you?

/Magnus


Re: [HACKERS] pgbench--new transaction type

2011-06-29 Thread Jeff Janes
On Sun, Jun 19, 2011 at 3:30 PM, Greg Smith g...@2ndquadrant.com wrote:
...

 Things to fix in the patch before it would be a commit candidate:

 -Adjust the loop size/name, per above
 -Reformat some of the longer lines to try and respect the implied right
 margin in the code formatting
 -Don't include the "plpgsql function created." line unless in debugging mode.
 -Add the docs.  Focus on how this measures how fast the database can execute
 SELECT statements using server-side code.  An explanation that the
 transaction block size is 512 is important to share.  It also needs a
 warning that time based runs (-T) may have to wait for a block to finish
 and go beyond its normally expected end time.
 -The word "via" in the transaction type output description is probably not
 the best choice.  Changing it to "SELECT only using PL/pgSQL" would
 translate better, and follow the standard casing for the name of that
 language.

Hi Greg,

Thanks for the review.  I've realized that I will not get a chance to
incorporate these changes during this commitfest due to work and
travel schedules, so I have changed it to Returned with Feedback and
will update it for the next commitfest.

One more thought I had: would it make sense to change this from the
creation of a permanent PL/pgSQL function to instead use the recently
added DO anonymous block syntax?  I think that would be somewhat cleaner
about not leaving cruft behind in the database.  But it would increase
the overhead of each outer execution, and would also mean that it would
not be backwards compatible with servers before 9.0.
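A rough sketch of what that DO variant might look like (table and column names follow pgbench's defaults, but the block itself is invented here, not taken from the patch):

```sql
-- Runs a batch of 512 SELECTs server-side without creating any
-- permanent object; requires PostgreSQL 9.0+ for DO.
DO $$
DECLARE
    v integer;
BEGIN
    FOR i IN 1..512 LOOP
        SELECT abalance INTO v
          FROM pgbench_accounts
         WHERE aid = 1 + (random() * 99999)::integer;
    END LOOP;
END;
$$;
```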

Cheers,

Jeff



Re: [HACKERS] hint bit cache v6

2011-06-29 Thread Jim Nasby
On Jun 29, 2011, at 3:18 PM, Robert Haas wrote:
 To be clear, I don't really think it matters how sensitive the cache
 is to a *complete* flush.  The question I want to ask is: how much
 does it take to knock ONE page out of cache?  And what are the chances
 of that happening too frequently?  It seems to me that if a run of 100
 tuples with the same previously-unseen XID is enough to knock over the
 applecart, then that's not a real high bar - you could easily hit that
 limit on a single page.  And if that isn't enough, then I don't
 understand the algorithm.

Would it be reasonable to keep a second-level cache that stores individual
XIDs instead of blocks? That would provide protection for XIDs that are
extremely common but don't fit well with the pattern of XID ranges that
we're caching. I would expect this to happen if you had a transaction that
touched a bunch of data (i.e., a bulk load or update) some time ago (so the
other XIDs around it are less likely to be interesting) but not long enough
ago to have been frozen yet. Obviously you couldn't keep too many XIDs in
this secondary cache, but if you're just trying to prevent certain
pathological cases then hopefully you wouldn't need to keep that many.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] pgbench--new transaction type

2011-06-29 Thread Greg Smith

On 06/30/2011 12:13 AM, Jeff Janes wrote:

One more thought I had: would it make sense to change this from the
creation of a permanent PL/pgSQL function to instead use the recently
added DO anonymous block syntax?  I think that would be somewhat cleaner
about not leaving cruft behind in the database.  But it would increase
the overhead of each outer execution, and would also mean that it would
not be backwards compatible with servers before 9.0.


I think some measurement of the overhead difference would be needed to
know for sure about the first part.  I suspect that, given the block size
of 512 now being targeted, it would end up not mattering very much.


pgbench's job is to generate a whole database full of cruft, so I can't
say I'd find an argument from either side of that to be very compelling.
I'm also not doing much performance testing of PostgreSQL instances from
before 9.0 anymore, so whether this mode is compatible with them isn't
very compelling either.  Just a mixed bag all around in those areas.


I would say take a look at what the performance change looks like, and 
see if it turns out to make the patch to pgbench less obtrusive.  The 
main objection against committing this code I can see is that it will 
further complicate pgbench for a purpose not many people care about.  So 
if the DO version ends up with a smaller diff and less impact on the 
codebase, that would likely be a more substantial tie-breaker in its 
favor than any of these other arguments.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD





Re: [HACKERS] Small patch for GiST: move childoffnum to child

2011-06-29 Thread Jeff Janes
On Tue, Jun 28, 2011 at 10:21 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 Actually, there is no more direct need for this patch because I've
 rewritten the insert function for fast build. But there are still two
 points in favor of these changes:
 1) As noted before, it simplifies the code a bit.
 2) It would be better if childoffnum had the same semantics in regular
 insert and fast-build insert.

Hi Alexander,

I've looked at your patch and have done a partial review.  It
applies cleanly and makes without warnings, and passes make check plus
some additional testing I've done (inserting lots of stuff into
regression's test_tsvector table, in parallel, with the gist index in
place) under --enable-cassert.  I repeated that test without
--enable-cassert, and saw no degradation in performance over unpatched
code.  No changes to documentation or make check code should be
needed.  The formatting looks OK.

My concern is that I am unable to prove to myself, simply by reading the
code, that the 24-line chunk deleted from gistFindPath (near *** 919,947)
is no longer needed.  My familiarity with the gist code is low enough that
it is not surprising that I cannot prove this to myself from first
principles.  I have no reason to believe it is not correct; I just can't
convince myself that it is.

So I tried provoking situations where this surrounding code section
would get executed, both patched and unpatched.  I have been unable to
do so--apparently this code is for an incredibly obscure situation
which I can't induce at will.

I would love to use this as an opportunity to study the gist code
until I can convince myself this patch is correct, but I'm afraid I
won't be able to do that promptly, or for the remainder of this commit
fest.

Since Heikki has already looked at this patch, perhaps he can provide
the assurance that I cannot, or another reviewer can.

Sorry I couldn't do a more thorough review,

Jeff



Re: [HACKERS] Patch file questions?

2011-06-29 Thread Casey Havenor
What recommendations do you have for the other systems (Win7, Win XP, and
Mac)?

Thanks for the info - learning curves don't bother me - love a challenge! 

-
Warmest regards, 

Casey Havenor



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Hitoshi Harada
2011/6/30 Itagaki Takahiro itagaki.takah...@gmail.com:
 On Thu, Jun 30, 2011 at 09:42, Tatsuo Ishii is...@postgresql.org wrote:
 BTW, I will talk to some Japanese-speaking developers about my idea if
 the community agrees to add a Japanese README to the source tree, so that
 I am not the only one contributing to this project.

 Now, if someone wanted to set up a web site or Wiki page with
 translations, that would be up to them.

  IMHO, the Wiki approach seems more reasonable than a README file.
  It would be suitable for adding non-Japanese translations, and
  non-core developers could join in to translate or fix the docs.

+1. If we really want to prove the demand, let's start with the Wiki,
which is less invasive than a README (though I doubt such pages would be
kept up to date in the end).

Regards,

-- 
Hitoshi Harada
